Loading Existing Data Into SQL Graph Objects

The SQL Server graph database feature allows you to model relationships that would be too complex for traditional relational tables. In this post, Louis Davidson demonstrates how to load existing data into graph objects in SQL Server.

The technique I will use in this blog can be used to load a graph with existing data you have. In my blog Using a Modeling Tool To Draw and Import a Graph into SQL Server, I demonstrated how to use a free (and nicely featured) tool to draw a graph and then get that data into a set of tables using a Powershell script. Now the task is to get this (or any data) into edges and nodes.

To refresh you memory, using this query:

You can see the data I am going to load:

To load this data into a set of SQL Graph tables, I will create the next two tables:

The first method is to just load the data using straightforward joins. The nodes are simple, just take the nodes and insert them. Adding nodes to a set of graph tables is pretty simple. The tables are just tables, and all the graph internals are done for you.

But the edges are where things get “complicated”, because you need to get the node_id values for the from and to nodes.

Now though, you a set of nodes you can easily query:

This returns:

While this method is perfectly acceptable, due to the joins it may not be quite as fast if you are loading a lot of data. Luckily there is an easier way that is not super obvious, but actually makes loading a graph from existing data very easy.

The idea lies in the data structure of the values you see in the special columns. The node_id values look like this:

As I demonstrated here, that value actually doesn’t exist in the database, but is really just shorthand for a few integer values (which is really annoying when error handling, hence the blog). But this shorthand is something you can exploit to load data because you can form the data yourself.

So I will clear the tables:

To create the string, we need to add several values together in our insert. You can do this with CONCAT and put together the value, but to simplify, I will create a user-defined function that will return the node mapping value that looks just the string you see in the values:

For example, say you want a node for the table we are building, like for id =1:

This returns:

And luckily, not only can you use this when querying for a value, but you can also create a new row’s graph surrogate value by providing your own. By taking control of the id values, things get a lot easier. Of course, while it will greatly simplify the edge loading, you have one easy step to take when loading a node, providing the id value:

So the insert becomes simply (I am including the id value because we needed it in the previous example and left it in the table):

Loading the edge, since we have the integer values for the from and to values, and they match the id values we just created. we use the function and output the node values:

Note: There was a bug with inserting your own nodes in SQL Server that was fixed in SQL Server 2019 CU 12.

Run the following code:

And you get the same output as before.

What is cool is that you can do this with any of your data that has integer ids. For example, you might load a set of graph tables to analyze your data. For a concrete example, consider the many-to-many relationship between customers and products. There is an example of this in the WideWorldImporters database. It isn’t a straight connection, of course, because the connection goes through a sales order. But once you do the join through the sales order and line items, the process is exactly the same because you have a from and to value ready made:

Then add the Tools function:

Next create a couple of node and edge tables to hold the data:

Now load the data:

Clean the tables before starting, as you will want to refresh all the data, or build some ETL to handle changes for a real database..

Just as before, using keys from the table, load the customer and product tables.

In the Edge insert, the CTE aggregates all the orders of the database, so we can pick the related customer that has the most ordered items for a related product.

Loading these tables on my SQL Server 2019 on my local computer takes around 3 or 4 seconds. It is not a very beefy computer, so this is pretty awesome. There are: 663 customers, 227 products, and 115140 edges created, so this is no slouch. Now you can execute the following query:

From this query you get the following result, showing top 10 products that customers had in common with ‘Wingtip Toys (Mauldin, SC)’ that made the same product orders: