Using a Modeling Tool to Draw and Import a Graph into SQL Server

As I am starting to prep for writing about graphs in SQL Server, I have reached the most challenging part of the process… demo data. That’s right, the code, the words, etc., all so much easier than making an example that demonstrates everything you want it to and has an easy way to present it (and is malleable for when you realize you forgot about a case that you meant to cover but have spent 10 hours honing your point already). For some of the data in the book, I have created data generators that will spit out huge amounts of data to test performance, but most of the sample data needs to be interesting to the reader to go beyond the technical details and into the problem being solved and give them a story to hand the concept on.

To visualize graphs for my Database Design book and blogs, I have been using a tool name yEd (https://www.yworks.com/products/yed) that does a pretty great job of letting you visualize a graph. For example, say I have drawn the following graph to cover the fairly simple test case:

The problem was, if I wanted to recreate this graph in data, I had to type in a bunch of SQL statements (something I generally enjoy to a certain point, but one of my sample files cover the geography of Disney World, and it would take a very long time to manually type that into a database as it took quite a while just to do one section of the park). 

So I went hunting for a tool to do this for me, but ended right back with yEd. The default file type when you save in yEd is GraphML, which is basically some pretty complex XML that was well beyond my capabilities using XML in SQL or Powershell. Realistically I don’t care that much about anything other than just the nodes and edges, and what I found was that you can save graphs in the tool a format named Trivial Graph Format (TGF). TGF is a very simple format that you can read more about here (https://en.wikipedia.org/wiki/Trivial_Graph_Format) but from that article, but format is simply:

Be careful to not save the graph as TGF only, unless you are fine having to reformat constantly. This is only the nodes and edges with name data, so when you open the TGF file in yEd, all the nodes will be on top of each other.

The node and edge names are optional, and while I would not desire to have empty node names (and my importer will fail if that occurs), my plan for the a model will be that, all nodes will be the same type, but edge types could vary. For example, say you have this basic structure:

Each of these edges represents a containership, so I don’t want to enumerate that on the diagram. However, if I add another municipality to the graph, I might want to say it is adjacent to another node, so I model the following, giving the edge a name.:

To do this extra metadata, I will use a filename pattern of “NodeType-DefaultEdgeType-Description or Uniqueifier” (the Uniqueifier basically is there to make because we might have duplicate node types and edges, but all the file names obviously have to be unique.

Going back to my original model, it is stored in the TGF file as:

Simple enough, but it does have a few complexities in the parsing. All of the fields in this file are space delimited, but the names have spaces in them… Not a terribly large deal, but it did make it more interesting.

To solve the problem, I decided to make some reasonably primative Powershell that could take a directory of TGF files and import them into SQL Server. I stopped short of automatically importing the data, but instead made the code generate a set of SQL Files that create a staging table and insert the data. I didn’t make it a module or function, but just a simple ps1 script to run interactively (hardcoding directories) as this is the kind of process that if you need to do something similar, you probably will make a change or two. (And adding automatic inserting of the data would not be terribly difficult, but typically wouldn’t be necessary).

I will include the Powershell code at the end of the post (and provide a link to github to fetch it), but the output is most interesting. The first block creates the tables if they don’t exist in your database (a USE statement is added and is parameterizable in the ps1 file), and a DELETE statement removes data from a file of the same name (for when you are running this over and over getting it just right):

Note: the code was formatted using SQL Prompt, it is a lot more spread out from the generator.

Next there are two sets of inserts. Nodes:

And then edges:

The final bit of code shows you the data for your imported nodes:

Run this code with the sample file, the output is:

Now I can use this data to easily load a set of nodes in a tree, using SQL Graph or an adjacency list. 

The following code (and two sample files, one that is the first graph in the blog, the second a subgraph of that graph) is available here: https://github.com/drsqlgithub/TGFImport/:

Hop this is a help as you start using graph tables also.  If you are keen to export a graph,  Arvind Shyamsundar has the basis of that code in a project that was done to visualize Pass Summit sessions in 2017, to output the graph as a GRAPHML file. (https://deep.data.blog/2017/11/03/how-we-did-it-pass-2017-summit-session-similarity-using-sql-graph-and-python/) .