SQL Server Graph Databases – Part 2: Querying Data in a Graph Database

Along with Graph Databases, Microsoft has introduced the new MATCH function for querying them. Robert Sheldon explains how to use the new function.

The series so far:

  1. SQL Server Graph Databases - Part 1: Introduction
  2. SQL Server Graph Databases - Part 2: Querying Data in a Graph Database
  3. SQL Server Graph Databases - Part 3: Modifying Data in a Graph Database
  4. SQL Server Graph Databases - Part 4: Working with hierarchical data in a graph database
  5. SQL Server Graph Databases - Part 5: Importing Relational Data into a Graph Database

Microsoft incorporated the graph database in SQL Server 2017, providing a logical structure for storing and querying data sets that contain complex many-to-many or hierarchical relationships. The first article in this series introduced you to the basics of graph databases and described how to define node and edge tables and populate them with data. In this article, we turn to the querying side of the equation, with a focus on retrieving related data in multiple node tables.

For the examples in this article, I used the FishGraph database from the first article in this series, except that I added more sample data. The database is based on a fictitious fish-lovers forum and includes three node tables (FishSpecies, FishLover, and FishPost) and three edge tables (Likes, Posts, and LinksTo). The following figure shows the data model used to build the database.

The rectangles represent the nodes, and the arrows connecting the nodes represent the edges, with the arrows pointing in the direction of the relationships. You can download the T-SQL script used to create and populate the database at the bottom of this article.

Introducing the MATCH Function

For the most part, querying tables in a graph database works much the same way as querying regular relational tables, although there are some limitations, such as not being able to perform cross-database queries on graph tables. (For more information about graph database limitations, refer to the Microsoft document SQL Graph Architecture.)

Despite the limitations, you should find that most queries work as expected. For example, the following SELECT statement joins the FishLover, Likes, and FishSpecies tables in order to retrieve a list of users who like certain fish species:

The SELECT statement uses the $node_id, $from_id, and $to_id column aliases to join the tables and return the data shown in the following figure

As you can see, querying graph tables is a fairly straightforward process, especially when you take advantage of the graph column aliases. Of course, if you’re trying to retrieve data based on more complex relationships, the query itself also becomes more complex and can even get a bit unwieldy. For this reason, Microsoft has added the MATCH function for retrieving related data from graph tables.

The MATCH function lets you define a search pattern based on the relationships between nodes. You can use the function only in the WHERE clause of a SELECT statement that queries node and edge tables. The following syntax shows the elements that go into defining the search pattern:

A search pattern can define one or more relationships. For each relationship, you must identify the originating and terminating nodes, as well as the edge that ties the two nodes together. You must also specify the direction of the relationship, using dashes and arrows, with the edge situated between the two nodes. For example, if you want to define a single relationship that originates with node1 and terminates with node2, you would use the following syntax for your WHERE clause:

Notice that the edge is enclosed in parentheses, with a dash preceding the edge, and a dash and right arrow following the edge. This defines a relationship that moves from left-to-right. You can reverse this order by specifying node1 on the right side of the search pattern and node2 on the left side, with the arrow pointing in the opposite direction:

In either case, the search pattern indicates that the WHERE clause should return only those rows in which a relationship exists between node1 (the originating node) and node2 (the terminating node), as defined in the edge element.

With these basics in mind, you can rewrite the SELECT statement above to simplify the query:

In the FROM clause, you simply list the participating tables – without the ON clause– providing table aliases where appropriate. You can then reference the aliases in the search pattern of the MATCH function. In this case, the search pattern defines the relationship fish lover likes a fish species. The statement will return the same results as those returned by the SELECT statement above.

You can also reverse the order of the relationship so that it is defined from right-to-left:

Again, the SELECT statement returns the same results as the previous two SELECT statements.

You can include other elements in the WHERE clause, in addition to the MATCH function. For example, the following WHERE clause adds a search condition specifying that only rows with a Username value of hooked should be returned:

The statement now returns the results shown in the following figure.

As you can see, using the MATCH function to define a search pattern based on a single relationship is a fairly straightforward process. In many cases, however, you’ll want to return data based on multiple relationships, which is where the function can be particularly handy.

Creating Compound MATCH Expressions

If you refer back to the syntax for the MATCH function, you’ll notice that you can use the AND operator when defining your search pattern, allowing you to string together multiple relationships within a single expression. Note, however, that the MATCH function does not support the OR operator or the NOT operator, so the logic you can define is somewhat limited. Even so, the AND operator can still be very useful. For example, the following search pattern uses the operator to string together two relationships:

For the SELECT statement to return a row, a fish lover must like a fish post and the fish post must link to a fish species. In this case, the statement returns only two rows, as shown in the following figure.

By being able to link together multiple relationships, you can dig more deeply into how the nodes in a graph database are interconnected. For example, the preceding SELECT statement might help to determine whether users are more inclined to like a post that specifically links to a fish species.

In some cases, you can link together relationships without using the AND operator, as long as your search pattern defines the same logic. For instance, you can rewrite the preceding SELECT statement by eliminating the AND operator and one of the references to the Post table alias:

The SELECT statement returns the same results as the preceding one, even though the search pattern has been simplified. You can also define a search pattern that contains two relationships terminating with the same node, as in the following example:

The first relationship (fish lover likes a fish species) is defined from left-to right, and the second relationship (fish post links to a fish species) is defined from right-to-left. As a result, the SELECT statement returns only those rows in which a fish species is both liked and linked to, giving us the results shown in the following figure.

You can also define search patterns that include more than two relationships, using the AND operator where appropriate. For example, the search pattern in the following SELECT statement includes three relationships but only one instance of the AND operator:

Because the Likes table is referenced twice within the search pattern, it must be included twice in the FROM clause, with a different alias assigned to each instance. The search pattern then uses these aliases when defining the three relationships (fish lover likes a fish post, fish post links to a fish species, and fish lover likes a fish species). The SELECT statement will return only those rows in which a fish species is both liked and linked to from a post that is liked, as shown in the following figure.

In this case, the user underwatercasey likes both the fish post and fish species, and the fish post links to the fish species. You can also rewrite the search pattern to eliminate the AND operator altogether:

The SELECT statement returns the same results as the preceding statement but simplifies the search pattern. In some cases, however, you might find that the AND operator makes it easier to read and troubleshoot your code as you heap on more and more relationships.

Defining a Self-referencing Query

Because of the way in which a graph database is structured, it is just as easy to perform a self-referencing query as any other type of query. For example, the following SELECT statement returns a list of fish posts that link to other posts:

To define the relationship in the search pattern, you must include two instances of the FishPost table in the FROM clause, assigning a different alias to each instance, similar to how you included multiple instances of the Likes edge table in the preceding two examples. The SELECT statement returns the results shown in the following figure.

You can use the same logic to determine which fish lovers like other fish lovers:

This time the FROM clause includes two instances of the FishLover table, with a unique alias assigned to each one, giving us the results shown in the following figure.

You can also use this approach when defining more than two relationships in your search pattern. For example, the following SELECT statement returns a list of users that are liked by user hooked, along with the users that they like:

The FROM clause now includes three instances of the FishLover table and two instances of the Likes table. The search pattern uses these instances to define the relationships fish lover1 likes fish lover2 and fish lover2 likes fish lover3, giving us the results shown in the following figure.

Essentially what we have here is a friend-of-a-friend type scenario:

  • The user hooked likes the users powerangler and deepdive.
  • The user powerangler likes the user deepdive.
  • The user deepdive likes the user underwatercasey.

Self-referencing queries make it easy to discover how data is related wherever it resides. Because the relationships are stored within the edge tables, it does not matter if the relationships cross multiple nodes, are contained within a single node, or are a combination of both.

Combining the MATCH Function with Other T-SQL Features

So far in this article, the SELECT statements we’ve covered have included only the SELECT, FROM, and WHERE clauses. However, you can use the MATCH function with other query types, such as queries that group and aggregate data. For example, the following SELECT statement includes the GROUP BY and ORDER BY clauses:

The GROUP BY clause groups the data by the Username values in the FishLover table in order to provide a count of total likes per user, as shown in the following results.

Although this is a fairly basic example, it demonstrates how you can build various types of queries based on data from a graph database, just like queries against relational tables. You can even use a graph database in conjunction with Microsoft Machine Learning Services (MLS). For example, the following T-SQL script retrieves data from a graph database and then uses it in a Python script to generate a bar chart:

I won’t go into too much detail about running the Python script because I covered it in my series on Python and MLS. (The series starts with the article SQL Server Machine Learning Services – Part 1: Python Basics.) Basically, the T-SQL script does the following:

  • Declares the @pscript variable and assigns the Python script to the variable. The script creates a data frame based on the data returned from the graph database and then generates a bar chart based on the data.
  • Declares the @sqlscript variable and assigns the SELECT statement to the variable. This is the same SELECT statement shown in the previous example.
  • Passes in the two variables as parameter values to the sp_execute_external_script stored procedure, which retrieves the graph data and runs the Python code.

When you run the sp_execute_external_script stored procedure, the Python engine generates the bar chart shown in the following figure.

As these examples demonstrate, graph databases have been well integrated into the SQL Server ecosystem, providing you with a great deal of flexibility for working with graph data. MLS in particular opens up a wide range of possibilities for analyzing and visualizing the data in a variety of ways.

Conclusion

As with storing and populating graph tables, querying those tables is a relatively painless process, once you understand the fundamentals. Where applicable, you can use the MATCH function to streamline your queries. When that’s not practical, you can revert to regular queries to get the data you need. The only trick is to make sure you’re working against an instance of SQL Server 2017 and not an earlier version. Fortunately, you can download the Developer’s edition for free, which comes with all the features available to the Enterprise edition, including support for graph databases. You can then use what you learned so far in this series to start exploring graph databases on your own.