Graph structures are something that I have lightly covered in my Database Design book in the last few book editions (the last coinciding with SQL Server 2016), implementing them with relational tables. In the next edition of my book that I am starting to write now, my goal will be to cover one of the most important new topics for database design in SQL Server: using the graph db extensions added in SQL Server 2017.
Before SQL Server 2017, the techniques I covered were centered around using relational structures implementing tree constructs. A tree structure is, generally speaking, one that lets a node have one or zero parents, but multiple child nodes. A common example is a corporate employee structure. Everyone has one person they report to generally for administrative purposes but may be the manager of multiple people.
A common (and fun) example of a graph structure is based on the movie Six Degrees of Kevin Bacon. Using the website: Oracle of Bacon, you can see a link between different celebrities linked to other celebrities. For example, Kevin Bacon to James Arnold Taylor (who voices Obi Wan Kenobi on the Clone Wars, amongst many other things).
James Arnold Taylor has been in many projects, and each of those projects also had many other people involved. Jim Cummings was in “Comic Book: The Movie” with him, and Jim Cummings was in “Balto” with Kevin Bacon. Running this query a second time, I found a link between Mr Taylor and Edie McClurg for the movie “Foodfight!”and Edie McClurg was in “Planes, Trains, and Automobiles” with Kevin Bacon. If you keep executing the query, you will see many other examples.
In this design, the nodes are the actors and movies, and the edges are links between them. Building the relational tables to support these objects is a straightforward task, and querying for simple cases, like actors in movies, also not terribly complex. For example, I should easily be able to query for all people who have acted in a movie with James Arnold Taylor. And with that list, I can query to see if any of those actors have been in a move with Kevin Bacon. Or take that list of people and get everyone who as acted with them in film and continue the pattern. It should be clear that this is going to get messy pretty quick for even a moderate sized set of data. (Note that the algorithm I have clumsily described is referred to as a recursive algorithm, where you use the result of a query and run the same query on the results over and over until you find a termination point (in this case Kevin Bacon.)
Now consider that not only do we want to find movies that an actor has been in with Kevin Bacon, but what if we need to be able to expand this to directing, producing, invested in, etc. As the number of tables increases, the complexity grows considerably.
If your database is 100% centered on such analysis, there are databases that store and query graph structures natively that will probably work better for your need that using SQL Server. I have no belief that the people who decided to put graph structures into SQL Server’s engine were thinking they were out to unseat Neo4j, and certainly not their own Azure Cosmos DB offering.
So what I am setting out to decide, understand, and start sharing, is how we might best use graph-based objects intertwined with the already established relational objects we have used for many years. Could they be a great blanket replacement for the many-to-many resolution tables, such as CustomerInterest in the following diagram?
Might they simply be best served to take data such as in the classic Customer to Product relationship, through the invoice/sales order, such as:
Transform this data and other similar data into a graph structure for analysis?
This is the goal I will be working on for the next several months. Partly for the book, but also for a new session I am working towards, with the following name/abstract (this is technically the first draft).
Mixing Graph with Relational Data in SQL Server
For many years, working with many-to-many relationships in a relational database has been one of the most complex tasks to handle as a database designer. A simple many-to-many relationship is easy to create with a table bridging two objects, but querying them is quite messy in all but the simplest cases. Yet, traversing many-to-many relationships are where you start to discover interesting details about your customer’s behavior. For example, in a customer relationship management system, customers might be connected to other customers, interested in and have purchased different types of products, and indicated multiple interests. Shared behavior and interests may indicate future behavior. Enter the graph db components in SQL Server, greatly enhanced in SQL Server 2019, providing advanced capabilities for handling many-to-many relationships, albeit in a manner that sits uncomfortably close to violating classic relational database design patterns. In this session, we are going to discuss what graph tables can do that relational tables cannot, and strategies for how to mix graph tables with relational tables for new and existing databases.
Hopefully I can, through some experimentation, and even more learning from the people who wrote the graph db stuff, and others who have actually used it, find the proper way to use the objects for the best possible database designs