{"id":92115,"date":"2021-08-13T01:43:26","date_gmt":"2021-08-13T01:43:26","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=92115"},"modified":"2021-08-24T14:18:00","modified_gmt":"2021-08-24T14:18:00","slug":"loading-existing-data-into-sql-graph-objects","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/loading-existing-data-into-sql-graph-objects\/","title":{"rendered":"Loading Existing Data Into SQL Graph Objects"},"content":{"rendered":"<p>The technique I will use in this blog can be used to load a graph with existing data you have. In my blog <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/using-a-modeling-tool-to-draw-and-import-a-graph-into-sql-server\/\" target=\"_blank\" rel=\"noopener\">Using a Modeling Tool To Draw and Import a Graph into SQL Server,<\/a>\u00a0I demonstrated how to use a free (and nicely featured) tool to draw a graph and then get that data into a set of tables using a Powershell script. Now the task is to get this (or any data) into edges and nodes.<\/p>\n<p>To refresh you memory, using this query:<\/p>\n<pre class=\"lang:none theme:none\">SELECT *\r\nFROM   NodeStaging.Node\r\nWHERE  Node.Filename = 'NodeType-DefaultEdgeType-Sample';\r\n\r\nSELECT *\r\nFROM   NodeStaging.Edge\r\nWHERE  Edge.Filename = 'NodeType-DefaultEdgeType-Sample'\r\nORDER BY Edge.FromNodeId,Edge.ToNodeId;\r\nGO<\/pre>\n<p>You can see the data I am going to load:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"590\" height=\"533\" class=\"wp-image-92116\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/08\/word-image.png\" \/><\/p>\n<p>To load this data into a set of SQL Graph tables, I will create the next two tables:<\/p>\n<pre class=\"lang:none theme:none \">DROP TABLE IF EXISTS NodeType, DefaultEdgeType;\r\n\r\nCREATE TABLE NodeType (\r\n\tNodeId   int  NOT NULL UNIQUE,\r\n\tNodeName nvarchar(30) NOT NULL \r\n) AS NODE;\r\n\r\nCREATE TABLE DefaultEdgeType  AS EDGE<\/pre>\n<p>The first method is to just load the data using straightforward joins. The nodes are simple, just take the nodes and insert them. Adding nodes to a set of graph tables is pretty simple. The tables are just tables, and all the graph internals are done for you.<\/p>\n<pre class=\"lang:none theme:none\">INSERT INTO dbo.NodeType(NodeId, NodeName)\r\nSELECT NodeId, Node.Name\r\nFROM   NodeStaging.Node\r\nWHERE  FileName = 'NodeType-DefaultEdgeType-Sample'\r\n  AND  Node.NodeType = 'NodeType';<\/pre>\n<p>But the edges are where things get &#8220;complicated&#8221;, because you need to get the node_id values for the from and to nodes.<\/p>\n<pre class=\"lang:none theme:none\">WITH BaseRows AS (\r\n--note, for this example it is technically not needed\r\n--to do this join, but you usually will need other data.\r\nSELECT FromNode.NodeId AS FromNodeId, \r\n       ToNode.NodeId AS ToNodeId\r\nFROM   NodeStaging.Edge\r\n\t\tJOIN NodeStaging.Node AS FromNode\r\n\t\t\tON FromNode.NodeId = Edge.FromNodeId\r\n\t\tJOIN NodeStaging.Node AS ToNode\r\n\t\t\tON ToNode.NodeId = Edge.ToNodeId\r\n--extra where stuff to deal with the fact that I am loading from my copy database\r\nWHERE  Edge.Filename = 'NodeType-DefaultEdgeType-Sample'\r\n  AND  Edge.EdgeType = 'DefaultEdgeType'\r\n  AND  FromNode.FileName = 'NodeType-DefaultEdgeType-Sample'\r\n  AND  FromNode.NodeType = 'NodeType'\r\n  AND  ToNode.FileName = 'NodeType-DefaultEdgeType-Sample'\r\n  AND  ToNode.NodeType = 'NodeType')\r\nINSERT INTO dbo.DefaultEdgeType ($From_id, $to_id) \r\nSELECT FromNode.$node_id AS FromNode, \r\n       ToNode.$node_id AS ToNode\r\nFROM   BaseRows\r\n\t\tJOIN dbo.NodeType AS FromNode\r\n\t\t\tON FromNode.NodeId = BaseRows.FromNodeId\r\n\t\tJOIN dbo.NodeType AS ToNode\r\n\t\t\tON ToNode.NodeId = BaseRows.ToNodeId;<\/pre>\n<p>Now though, you a set of nodes you can easily query:<\/p>\n<pre class=\"lang:none theme:none\">SELECT FromNode.NodeName, ToNode.NodeName\r\nFROM   dbo.NodeType AS FromNode,\r\n\t   dbo.DefaultEdgeType,\r\n\t   dbo.NodeType AS ToNode\r\nWHERE  MATCH(FromNode-(DefaultEdgeType)-&gt;ToNode)<\/pre>\n<p>This returns:<\/p>\n<pre class=\"lang:none theme:none\">NodeName                       NodeName\r\n------------------------------ ------------------------------\r\n3                              4\r\n1                              4\r\n2                              1\r\n2                              3\r\nNode Name                      1\r\n4                              5\r\n6                              5\r\n7                              6\r\n7                              Possesssive's Node Name\r\nPossesssive's Node Name        8<\/pre>\n<p>While this method is perfectly acceptable, due to the joins it may not be quite as fast if you are loading a lot of data. Luckily there is an easier way that is not super obvious, but actually makes loading a graph from existing data very easy.<\/p>\n<p>The idea lies in the data structure of the values you see in the special columns. The node_id values look like this:<\/p>\n<pre class=\"lang:none theme:none\">{\"type\":\"node\",\"schema\":\"dbo\",\"table\":\"NodeType\",\"id\":0}<\/pre>\n<p>As I <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/translating-index-key-errors-from-graph-tables-in-sql-server\/\">demonstrated here<\/a>, that value actually doesn&#8217;t exist in the database, but is really just shorthand for a few integer values (which is really annoying when error handling, hence the blog). But this shorthand is something you can exploit to load data because you can form the data yourself.<\/p>\n<p>So I will clear the tables:<\/p>\n<pre class=\"lang:none theme:none\">DELETE FROM NodeType ;\r\nDELETE FROM DefaultEdgeType;<\/pre>\n<p>To create the string, we need to add several values together in our insert. You can do this with CONCAT and put together the value, but to simplify, I will create a user-defined function that will return the node mapping value that looks just the string you see in the values:<\/p>\n<pre class=\"lang:none theme:none\">IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'Tools')\r\n\tEXEC ('CREATE SCHEMA Tools')\r\nGO\r\nCREATE OR ALTER FUNCTION Tools.GraphTable$GenerateId(\r\n\t@Type sysname, --Node or Edge\r\n\t@SchemaName sysname , --Schema of the Node or Edge Table\r\n\t@tableName sysname, --Name of the Node or Edge Table\r\n\t@graphItemId int -- The id of the graph object\r\n\t)\r\nRETURNS nvarchar(2000)\r\nAS\r\nBEGIN\r\n\tRETURN CONCAT('{\"type\":\"', @Type ,\r\n\t               '\",\"schema\":\"', @schemaName,\r\n\t\t\t\t   '\",\"table\":\"',@tableName, \r\n\t\t\t\t   '\",\"id\":', @graphItemId,'}')\r\nEND;<\/pre>\n<p>For example, say you want a node for the table we are building, like for id =1:<\/p>\n<pre class=\"lang:none theme:none\">SELECT Tools.[GraphTable$GenerateId]\r\n                       ('node','dbo','NodeType',1);<\/pre>\n<p>This returns:<\/p>\n<pre class=\"lang:none theme:none\">{\"type\":\"node\",\"schema\":\"dbo\",\"table\":\"NodeType\",\"id\":1}<\/pre>\n<p>And luckily, not only can you use this when querying for a value, but you can also create a new row\u2019s graph surrogate value by providing your own. By taking control of the id values, things get a lot easier. Of course, while it will greatly simplify the edge loading, you have one easy step to take when loading a node, providing the id value:<\/p>\n<p>So the insert becomes simply (I am including the id value because we needed it in the previous example and left it in the table):<\/p>\n<pre class=\"lang:none theme:none\">INSERT INTO dbo.NodeType($Node_id, NodeId, NodeName)\r\nSELECT dbo.[GraphTable$GenerateId]\r\n                    ('node','dbo','NodeType',NodeId),\r\n\t   NodeId, Node.Name\r\nFROM   NodeStaging.Node\r\nWHERE  FileName = 'NodeType-DefaultEdgeType-Sample'\r\n  AND  Node.NodeType = 'NodeType';<\/pre>\n<p>Loading the edge, since we have the integer values for the from and to values, and they match the id values we just created. we use the function and output the node values:<\/p>\n<pre class=\"lang:none theme:none\">INSERT INTO dbo.DefaultEdgeType ($from_id, $to_id) \r\nSELECT dbo.[GraphTable$GenerateId]\r\n            ('node','dbo','NodeType',FromNodeId) AS FromNode,\r\n       dbo.[GraphTable$GenerateId]\r\n            ('node','dbo','NodeType',ToNodeId) AS ToNode\r\nFROM   NodeStaging.Edge\r\nWHERE  FileName = 'NodeType-DefaultEdgeType-Sample'\r\n  AND  Edge.EdgeType = 'DefaultEdgeType';<\/pre>\n<p>Note: There was a bug with inserting your own nodes in SQL Server that was fixed in <a href=\"https:\/\/support.microsoft.com\/en-us\/topic\/kb5004524-cumulative-update-12-for-sql-server-2019-45b2d82a-c7d0-4eb8-aa17-d4bad4059987#bkmk_14129474\">SQL Server 2019 CU 12<\/a>.<\/p>\n<p>Run the following code:<\/p>\n<pre class=\"lang:none theme:none\">SELECT FromNode.NodeName, ToNode.NodeName\r\nFROM   dbo.NodeType AS FromNode,\r\n\t   dbo.DefaultEdgeType,\r\n\t   dbo.NodeType AS ToNode\r\nWHERE  MATCH(FromNode-(DefaultEdgeType)-&gt;ToNode);<\/pre>\n<p>And you get the same output as before.<\/p>\n<p>What is cool is that you can do this with any of your data that has integer ids. For example, you might load a set of graph tables to analyze your data. For a concrete example, consider the many-to-many relationship between customers and products. There is an example of this in the WideWorldImporters database. It isn&#8217;t a straight connection, of course, because the connection goes through a sales order. But once you do the join through the sales order and line items, the process is exactly the same because you have a from and to value ready made:<\/p>\n<pre class=\"lang:none theme:none\">USE WideWorldImporters\r\nGO<\/pre>\n<p>Then add the Tools function:<\/p>\n<pre class=\"lang:none theme:none\">IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'Tools')\r\n\tEXEC ('CREATE SCHEMA Tools')\r\nGO\r\nCREATE OR ALTER FUNCTION Tools.GraphTable$GenerateId(\r\n\t@Type sysname, --Node or Edge\r\n\t@SchemaName sysname , --Schema of the Node or Edge Table\r\n\t@tableName sysname, --Name of the Node or Edge Table\r\n\t@graphItemId int -- The id of the graph object\r\n\t)\r\nRETURNS nvarchar(2000)\r\nAS\r\nBEGIN\r\n\tRETURN CONCAT('{\"type\":\"', @Type ,\r\n\t               '\",\"schema\":\"', @schemaName,\r\n\t\t\t\t   '\",\"table\":\"',@tableName, \r\n\t\t\t\t   '\",\"id\":', @graphItemId,'}')\r\nEND;<\/pre>\n<p>Next create a couple of node and edge tables to hold the data:<\/p>\n<pre class=\"lang:none theme:none\">IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'AnalysisGraph')\r\n\tEXEC ('CREATE SCHEMA AnalysisGraph')\r\nGO\r\n\r\nCREATE TABLE AnalysisGraph.Customer(\r\n\tCustomerId int NOT NULL,\r\n\tCustomerName nvarchar(100) NOT NULL\r\n) AS NODE;\r\n\r\nCREATE TABLE AnalysisGraph.Product(\r\n\tProductId int NOT NULL,\r\n\tProductName nvarchar(100) NOT NULL\r\n) AS NODE;\r\n\r\nCREATE TABLE AnalysisGraph.Ordered \r\n(\r\n\tOrderedQuantityCount int NOT NULL \r\n     --included an attribute to see what product was \r\n     --ordered most by the connected customer\r\n)\r\nAS EDGE;<\/pre>\n<p>Now load the data:<\/p>\n<p>Clean the tables before starting, as you will want to refresh all the data, or build some ETL to handle changes for a real database..<\/p>\n<pre class=\"lang:none theme:none\">TRUNCATE TABLE AnalysisGraph.Customer;\r\nTRUNCATE TABLE AnalysisGraph.Product;\r\nTRUNCATE TABLE AnalysisGraph.Ordered;<\/pre>\n<p>Just as before, using keys from the table, load the customer and product tables.<\/p>\n<pre class=\"lang:none theme:none \">INSERT INTO AnalysisGraph.Customer($Node_id, CustomerId, CustomerName)\r\nSELECT Tools.[GraphTable$GenerateId]\r\n          ('node','AnalysisGraph','Customer',CustomerId),\r\n\t   CustomerId, CustomerName\r\nFROM   Sales.Customers;\r\n\r\nINSERT INTO AnalysisGraph.Product($Node_id, ProductId, ProductName)\r\nSELECT Tools.[GraphTable$GenerateId]\r\n          ('node','AnalysisGraph','Product',StockItemId),\r\n\t   StockItemId, StockItems.StockItemName\r\nFROM   Warehouse.StockItems;<\/pre>\n<p>In the Edge insert, the CTE aggregates all the orders of the database, so we can pick the related customer that has the most ordered items for a related product.<\/p>\n<pre class=\"lang:none theme:none\">WITH AggregatedRows AS (\r\nSELECT Customers.CustomerID, OrderLines.StockItemID,\r\n       SUM(OrderLines.Quantity) AS OrderedQuantityCount\r\nFROM  Warehouse.StockItems\r\n\t\tJOIN Sales.OrderLines\r\n\t\t\tON OrderLines.StockItemID = \r\n                         StockItems.StockItemID\r\n\t\tJOIN Sales.Orders\r\n\t\t\tON Orders.OrderID = OrderLines.OrderID\r\n\t\tJOIN Sales.Customers\r\n\t\t\tON Customers.CustomerID = Orders.CustomerID\r\nGROUP BY Customers.CustomerID, OrderLines.StockItemID)\r\nINSERT INTO AnalysisGraph.Ordered \r\n              ($from_id, $to_id,OrderedQuantityCount) \r\nSELECT Tools.[GraphTable$GenerateId]\r\n             ('node','AnalysisGraph','Customer',\r\n                 AggregatedRows.CustomerId) AS FromNode,\r\n       Tools.[GraphTable$GenerateId]\r\n             ('node','AnalysisGraph','Product',\r\n                 AggregatedRows.StockItemId) AS ToNode,\r\n\t\tAggregatedRows.OrderedQuantityCount\r\nFROM AggregatedRows;<\/pre>\n<p>Loading these tables on my SQL Server 2019 on my local computer takes around 3 or 4 seconds. It is not a very beefy computer, so this is pretty awesome. There are: 663 customers, 227 products, and 115140 edges created, so this is no slouch. Now you can execute the following query:<\/p>\n<pre class=\"lang:none theme:none\">SELECT TOP 10 Product.ProductName, \r\n\t   customer2.CustomerName,Ordered2.OrderedQuantityCount\r\nFROM   AnalysisGraph.Customer, \r\n\t   AnalysisGraph.Ordered,\r\n\t   AnalysisGraph.Product,\r\n\t   AnalysisGraph.Ordered AS Ordered2,\r\n\t   AnalysisGraph.Customer AS Customer2\r\nWHERE MATCH(Customer-(Ordered)-&gt;Product&lt;-(Ordered2)-Customer2)\r\nAND Customer.CustomerName = 'Wingtip Toys (Mauldin, SC)'\r\nORDER BY Ordered2.OrderedQuantityCount desc;<\/pre>\n<p>From this query you get the following result, showing top 10 products that customers had in common with &#8216;Wingtip Toys (Mauldin, SC)&#8217; that made the same product orders:<\/p>\n<pre class=\"lang:none theme:none\">ProductName                                        CustomerName                     OrderedQuantityCount\r\n-------------------------------------------------- -------------------------------- --------------------\r\nBlack and orange fragile despatch tape 48mmx100m   Kumar Naicker                    1800\r\nBlack and orange fragile despatch tape 48mmx75m    Manca Hrastovsek                 1584\r\nBlack and orange fragile despatch tape 48mmx75m    Wingtip Toys (Straughn, IN)      1548\r\nBlack and orange fragile despatch tape 48mmx75m    Tailspin Toys (Hodgdon, ME)      1476\r\nBlack and orange fragile despatch tape 48mmx75m    Kertu Sokk                       1440\r\nBlack and orange fragile despatch tape 48mmx100m   Tailspin Toys (Kalvesta, KS)     1440\r\nShipping carton (Brown) 500x310x310mm              Wingtip Toys (Bourneville, OH)   1400\r\nBlack and orange fragile despatch tape 48mmx100m   Wingtip Toys (Jamison, IA)       1332\r\nBlack and orange fragile despatch tape 48mmx75m    Anindya Ghatak                   1260\r\nBlack and orange fragile despatch tape 48mmx100m   Tailspin Toys (Larose, LA)       1260<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The SQL Server graph database feature allows you to model relationships that would be too complex for traditional relational tables. In this post, Louis Davidson demonstrates how to load existing data into graph objects in SQL Server. &hellip;<\/p>\n","protected":false},"author":56085,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[19684],"class_list":["post-92115","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92115","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/56085"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=92115"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92115\/revisions"}],"predecessor-version":[{"id":92292,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92115\/revisions\/92292"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=92115"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=92115"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=92115"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=92115"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}