{"id":85165,"date":"2019-09-11T14:35:45","date_gmt":"2019-09-11T14:35:45","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=85165"},"modified":"2021-06-03T16:48:18","modified_gmt":"2021-06-03T16:48:18","slug":"sql-server-2019-graph-database-and-shortest_path","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/sql-server-2019-graph-database-and-shortest_path\/","title":{"rendered":"SQL Server 2019 Graph Database and SHORTEST_PATH"},"content":{"rendered":"<p>My crystal ball seems to be working again: The new addition to SQL Server 2019, <code>shortest_path<\/code>, was the subject of many of the technical sessions I delivered as one of the missing features of SQL Server Graph Database.<\/p>\n<p>I will use an example that is similar to the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/t-sql-programming\/sql-graph-objects-sql-server-2017-good-bad\/\">first article<\/a> I wrote on this topic. You can create the database and tables needed for this article using this <a href=\"https:\/\/1drv.ms\/u\/s!ApgSbfuN8DmZnMZyL0ATb0ZnmOkvqg?e=VwBGjy\">script<\/a>. Maybe you would also like to read the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/graph-edge-constraints\/\">previous blog post<\/a> about some recent improvements in Graph Databases.<\/p>\n<p>Since the data is essentially a graph, a solution to visually show the results is helpful. An application like Gephi to can be used to view the graph. (You can download this application <a href=\"https:\/\/gephi.org\/users\/download\/\">here<\/a>.) To use Gephi with SQL Server, make sure that the TCP\/IP protocol and mixed authentication are enabled. The app only supports SQL authentication, so you will need a SQL Server login to connect.<\/p>\n<h2>Using Gephi<\/h2>\n<p>Follow these steps to view the graph in Gephi.<\/p>\n<ul>\n<li>After opening Gephi, select <em>New Project<\/em> on the <em>Welcome<\/em> screen<\/li>\n<li>On the <em>File<\/em> menu, select <em>Import Database-&gt;Edge List. <\/em>This will open the <em>Database settings <\/em>screen.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"824\" height=\"686\" class=\"wp-image-85166\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-cell-phone-description-automati.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<ul>\n<li>In the <em>Configuration Name<\/em> field, create a name for the configuration, such as <em>SQLServer2019.<\/em><\/li>\n<li>In the <em>Driver<\/em> field, select <em>SQL Server.<\/em><\/li>\n<li>In the <em>Host<\/em> field, insert the machine\/instance name of your SQL Server<\/li>\n<li>In the <em>Port<\/em> field, enter the port used, typically 1433. If you\u2019re not sure, look for the port number in the SQL Server Configuration Manager program.<\/li>\n<li>In the <em>Database<\/em> field, insert the name of the database which contains the graph tables which is probably <em>GraphDemo<\/em> if you have been following along<\/li>\n<li>In the <em>Username<\/em> field, enter the username of a SQL Server login with <em>SELECT<\/em> permission to the tables in the GraphDemo database.<\/li>\n<li>In the <em>Password<\/em> field, insert the password for the SQL login.<\/li>\n<\/ul>\n<p>In addition to the configuration required to connect to SQL Server, the <em>Database settings<\/em> screen also requires two queries: One to retrieve the list of nodes from the server and another to retrieve the list of edges from the server. I\u2019ll explain those queries next.<\/p>\n<p>The screen itself contains information at the top of the dialog about the columns the queries need to return, and it&#8217;s important to notice these columns are case sensitive.<\/p>\n<p>The nodes table in SQL Server has the pseudo-column <code>$node_id<\/code>, which is a JSON column. A good option is to extract the id from the JSON and use the <code>MemberName<\/code> as the label for the node. The query will look like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT\u00a0Json_value($node_id,\u00a0'$.id')\u00a0AS\u00a0id,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0membername\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS\u00a0label\u00a0\r\nFROM\u00a0\u00a0\u00a0forummembers\u00a0<\/pre>\n<p>The edge query has also to extract the id from\u00a0 <code>$to_node<\/code> and <code>$from_node<\/code><strong>,<\/strong> but besides that, you need to filter the nodes returned, because the nodes table from the previous article has a relation between two members and between members and messages. For this example, return only the relation between two members. Here is the query:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT\u00a0Json_value($from_id,\u00a0'$.id')\u00a0AS\u00a0source,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Json_value($to_id,\u00a0'$.id')\u00a0\u00a0\u00a0AS\u00a0target\u00a0\r\nFROM\u00a0\u00a0\u00a0likes\u00a0\r\nWHERE\u00a0Json_value($from_id,\u00a0'$.table')\u00a0=\u00a0'ForumMembers'\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AND\u00a0Json_value($to_id,\u00a0'$.table')\u00a0=\u00a0'ForumMembers'\u00a0<\/pre>\n<p>It&#8217;s important to note that the source and target id&#8217;s in the edge query need to match with the ids in the node query. There are some additional columns that could be used, but for this example, you\u2019ll only need these.<\/p>\n<p>The final configuration will look similar to the following image:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"602\" height=\"499\" class=\"wp-image-85167\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/word-image-9.png\" \/><\/p>\n<p>After you click <em>OK<\/em>, the next window, <em>Import Report<\/em>, involves many details but they will not be covered in this article. What&#8217;s important here is the number of nodes and edges found, confirming that the queries are correct. However, there is only a single step to be done on this screen:<\/p>\n<p>Select the option <em>Append to existing workspace<\/em>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"807\" height=\"685\" class=\"wp-image-85168\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-cell-phone-description-automati-1.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>After this first screen, you will see the <em>Workspace1<\/em> and <em>Preview<\/em> tabs and <em>Preview Settings<\/em> window. It\u2019s time to build the graph.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"626\" height=\"964\" class=\"wp-image-85169\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-social-media-post-description-a.png\" alt=\"A screenshot of a social media post\n\nDescription automatically generated\" \/><\/p>\n<p>Click on the <em>Refresh<\/em> button inside the <em>Preview Settings<\/em> pane. This will result in an image similar to the one below. Note that yours may look different, but in the next step, you can verify that the nodes are connected correctly.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"622\" height=\"609\" class=\"wp-image-85170\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-picture-containing-sky-kite-water-outdoor-de.png\" alt=\"A picture containing sky, kite, water, outdoor\n\nDescription automatically generated\" \/><\/p>\n<p>Improve this graph just a bit by making these changes in the <em>Preview Settings<\/em> tab.<\/p>\n<ul>\n<li>Change the <em>Opacity<\/em> property to 0 under <em>Nodes<\/em>.<\/li>\n<li>Mark the <em>Show Labels<\/em> property under <em>Node Labels.<\/em><\/li>\n<li>Change the <em>Outline opacity<\/em> under <em>Node Labels<\/em> to 0<\/li>\n<\/ul>\n<p>After clicking refresh, the graph will change to something like the image below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"631\" height=\"606\" class=\"wp-image-85171\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-close-up-of-a-map-description-automatically-gen.png\" alt=\"A close up of a map\n\nDescription automatically generated\" \/><\/p>\n<p>As you may notice, even a simple graph with a small amount of data can be quite complex to identify information such as the shortest path between two nodes in the graph. That&#8217;s why you need a tool to calculate this, and SQL Server 2019 can make this kind of calculation.<\/p>\n<h2>Calculating the Path<\/h2>\n<p>When you think about a function to calculate the shortest path between two points, you may think that it will be a simple function. To make this calculation work, however, you need way more than a simple function. You must establish paths among the graph data. The data has many paths, and each one has many nodes with a beginning and an end. Each path is a group of nodes, in some ways like the group by function.<\/p>\n<p>One similarity, for example, is that you can&#8217;t read the column directly from the path. You need to apply aggregation functions to the set of nodes that are part of the path to read the information.<\/p>\n<p>The syntax to build this query resembles the <code>GROUP BY<\/code> clause, requiring you to use aggregate functions to make calculations on every (shortest) path of nodes, getting grouped results.<\/p>\n<p>To make this simpler, start to build the query over the model piece by piece. First, the <code>From<\/code> clause<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">FROM\u00a0\r\n\u00a0\u00a0\u00a0\u00a0ForumMembers\u00a0P1,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0ForumMembers\u00a0FOR\u00a0PATH\u00a0as\u00a0P2,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0Likes\u00a0FOR\u00a0PATH\u00a0as\u00a0IPO<\/pre>\n<p>The <code>FOR<\/code> <code>PATH<\/code> expression in the edge and node tables indicates these tables will be used to calculate paths, a grouping, in other words. Over the columns of these particular tables you can only apply aggregation functions; you can&#8217;t retrieve the columns directly.<\/p>\n<p>You may have noticed the node table appears twice in the query, one using the <code>FOR<\/code> <code>PATH<\/code> and another without using the <code>FOR<\/code> <code>PATH<\/code>. Since you can&#8217;t retrieve columns directly from the <code>FOR<\/code> <code>PATH<\/code> tables, you can include the nodes table twice to retrieve individual values of the node columns, usually for the start node of the path.<\/p>\n<p>Now analyse the list of columns and expressions in the <code>SELECT<\/code> list:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT\u00a0\r\n\u00a0\u00a0\u00a0\u00a0P1.MemberID,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0P1.MemberName,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0STRING_AGG(P2.MemberName,\r\n        '-&gt;')\u00a0WITHIN\u00a0GROUP\u00a0(GRAPH\u00a0PATH)\u00a0AS\u00a0[MemberName],\u00a0\r\n\u00a0\u00a0\u00a0\u00a0LAST_VALUE(P2.MemberName)\u00a0WITHIN\u00a0GROUP\u00a0(GRAPH\u00a0PATH)\u00a0\r\n        AS\u00a0FinalMemberName,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0COUNT(P2.MemberId)\u00a0WITHIN\u00a0GROUP\u00a0(GRAPH\u00a0PATH)\u00a0AS\u00a0Levels<\/pre>\n<p>The first two columns come from the P1 table, which is not marked as <code>FOR<\/code> <code>PATH<\/code>. The <em>MemberID<\/em> and <em>MemberName<\/em> columns are from the first node of the path. The <code>WHERE<\/code> clause will also expose this.<\/p>\n<p>The other three expressions use aggregate functions. They use the following functions:<\/p>\n<p><code>Count<\/code><strong>:<\/strong> is a well know aggregate function<\/p>\n<p><code>STRING_AGG<\/code><strong>:<\/strong> was introduced in a recent version of SQL Server and can be used to concatenate string values<\/p>\n<p><code>LAST_VALUE<\/code><strong>:<\/strong> is a windowing function, but it can be used with any kind of aggregation, including a graph path.<\/p>\n<p>Besides each aggregation function, you have the <code>WITHIN<\/code> <code>GROUP<\/code> <code>(GRAPH PATH)<\/code> statement, a special statement created for the grouping generated by the <code>shortest_path<\/code> function.<\/p>\n<p>Finally, here\u2019s the <code>WHERE<\/code> clause:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">WHERE\u00a0MATCH(SHORTEST_PATH(P1(-(IPO)-&gt;P2)+))<\/pre>\n<p>It&#8217;s like a regular <code>MATCH<\/code> clause that you already know about, but using the new function, <code>SHORTEST_PATH<\/code>. The first table, which is not part of the grouping, is related to the edge and node tables, which is part of the grouping. The edge and second node tables appear between parenthesis. The <code>SHORTEST_PATH<\/code> function understands this as an instruction to use recursion, creating the groups.<\/p>\n<p>The &#8216;<code>+<\/code>&#8216; symbol indicates that you would like information about the entire path between each member of P1 and P2, without limiting the number of hops<\/p>\n<p>Here\u2019s the final query:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">SELECT\u00a0\r\n\u00a0\u00a0\u00a0\u00a0P1.MemberID,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0P1.MemberName,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0STRING_AGG(P2.MemberName,\r\n       '-&gt;')\u00a0WITHIN\u00a0GROUP\u00a0(GRAPH\u00a0PATH)\u00a0AS\u00a0[MemberName],\u00a0\r\n\u00a0\u00a0\u00a0\u00a0LAST_VALUE(P2.MemberName)\u00a0WITHIN\u00a0GROUP\u00a0(GRAPH\u00a0PATH)\u00a0\r\n       AS\u00a0FinalMemberName,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0COUNT(P2.MemberId)\u00a0WITHIN\u00a0GROUP\u00a0(GRAPH\u00a0PATH)\u00a0AS\u00a0Levels\u00a0\r\n\u00a0\u00a0FROM\u00a0\r\n\u00a0\u00a0\u00a0\u00a0ForumMembers\u00a0P1,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0ForumMembers\u00a0FOR\u00a0PATH\u00a0as\u00a0P2,\u00a0\r\n\u00a0\u00a0\u00a0\u00a0Likes\u00a0FOR\u00a0PATH\u00a0as\u00a0IPO\u00a0\r\n\u00a0\u00a0WHERE\u00a0MATCH(SHORTEST_PATH(P1(-(IPO)-&gt;P2)+));<\/pre>\n<p>The image below gives an idea of the result. You have the information about the first member of the path, you have the entire path (not including the first member) created by the function <code>STRING_AGG<\/code>, you have the last name of the path created by the <code>LAST_VALUE<\/code> function, and you also have the number of levels in the path, created by the function <code>COUNT<\/code><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"911\" height=\"301\" class=\"wp-image-85172\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/word-image-10.png\" \/><\/p>\n<p>You may be wondering about how Carl connects to Carl. This image shows the path:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"602\" height=\"578\" class=\"wp-image-85173\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/word-image-11.png\" \/><\/p>\n<h2>Performance<\/h2>\n<p>The execution plan is big. There is no doubt that you will need to take care when using it in large environments. The full execution plan doesn&#8217;t fit here, but you may notice by the piece below that the plan makes extensive use of <code>tempdb<\/code>.<\/p>\n<p>You may notice the following details in the execution plan:<\/p>\n<ul>\n<li>It starts with the edge table<\/li>\n<li>It creates three temporary tables from the edge table. You will see many table operations in the plan, but checking the names, you will notice there are only three<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1449\" height=\"467\" class=\"wp-image-85174\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-computer-description-automatica.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<ul>\n<li>It joins one of the temporary tables with the edge table, the nodes table and a second temporary table, aggregating the result and inserting in the temporary tables<\/li>\n<li>It has a <code>Sequence<\/code> operator, which then is joined with the node table to get the results<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1446\" height=\"455\" class=\"wp-image-85175\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-computer-description-automatica-1.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>Filtering<\/h2>\n<p>By adding one more predicate, you can view just the paths starting from one forum member:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT \r\n   P1.MemberID, \r\n   P1.MemberName,\r\n   STRING_AGG(P2.MemberName,'-&gt;') WITHIN GROUP \r\n       (GRAPH PATH) AS [MemberName],\r\n   LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) \r\n        AS FinalMemberName,\r\n   COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels\r\nFROM ForumMembers P1,\r\n\t ForumMembers FOR PATH as P2,\r\n\t Likes FOR PATH as IPO\r\nWHERE MATCH(SHORTEST_PATH(P1(-(IPO)-&gt;P2)+))\r\n\t  AND p1.MemberID=7;<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"919\" height=\"241\" class=\"wp-image-85176\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-social-media-post-description-a-1.png\" alt=\"A screenshot of a social media post\n\nDescription automatically generated\" \/><\/p>\n<p>The execution plan changes. You may notice the following:<\/p>\n<ul>\n<li>The sequence is still the middle of the execution plan<\/li>\n<li>The plan starts by the node, not the edge<\/li>\n<li>After the sequence, the Query Optimizer is able to use an index for one of the nodes<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1448\" height=\"373\" class=\"wp-image-85177\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-video-game-description-automati.png\" alt=\"A screenshot of a video game\n\nDescription automatically generated\" \/><\/p>\n<ul>\n<li>There is a fourth temporary table, called <em>Source<\/em><\/li>\n<li>The number of paths before the sequence increases<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1449\" height=\"569\" class=\"wp-image-85178\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-cell-phone-description-automati-2.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>If you would like to see only the path between <em>Jonh<\/em> and <em>Steve<\/em>, you may be surprised that you have to separate the query into a CTE or subquery and filter the results. One reason is that a windowing function cannot be directly filtered. The <code>MATCH<\/code> statement doesn\u2019t offer a solution either, you can\u2019t filter by a P2 field, for example, because it\u2019s marked as <code>FOR<\/code> <code>PATH<\/code>.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">WITH qry AS (\r\n  SELECT \r\n   P1.MemberID, \r\n   P1.MemberName,\r\n   STRING_AGG(P2.MemberName,'-&gt;') WITHIN GROUP (GRAPH PATH) \r\n     AS [Path],\r\n   LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) \r\n     AS FinalMemberName,\r\n   COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels\r\n  FROM\r\n   ForumMembers P1,\r\n   ForumMembers FOR PATH as P2,\r\n   Likes FOR PATH as IPO\r\n  WHERE MATCH(SHORTEST_PATH(P1(-(IPO)-&gt;P2)+))\r\n   and p1.MemberID=7)\r\nSELECT * FROM qry \r\nWHERE FinalMemberName='Steve';<\/pre>\n<p>It&#8217;s not difficult to predict: The query plan is bad. You are calculating the path from <em>Jonh<\/em> to all other members and only after this does the filter kick in to get the path between <em>Jonh<\/em> and <em>Steve<\/em>. In the image below you may notice the filter for <em>Steve<\/em> only after the sequence, after calculating all the paths from <em>Jonh<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1447\" height=\"555\" class=\"wp-image-85179\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/09\/a-screenshot-of-a-cell-phone-description-automati-3.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>You may have already noticed the + symbol in the <code>Match<\/code> clause. It means you allow an unlimited number of hops. However, you can use a slightly different syntax to find all the forum members that are, for example, up to two hops away from others:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT \r\n   P1.MemberID, \r\n   P1.MemberName,\r\n   STRING_AGG(P2.MemberName,'-&gt;') WITHIN GROUP (GRAPH PATH) \r\n      AS [MemberName],\r\n   LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) \r\n      AS FinalMemberName,\r\n   COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels\r\nFROM\r\n   ForumMembers P1,\r\n   ForumMembers FOR PATH as P2,\r\n   Likes FOR PATH as IPO\r\nWHERE MATCH(SHORTEST_PATH(P1(-(IPO)-&gt;P2){1,2}));<\/pre>\n<p>However, the start needs always to be 1. If you want to see the people that are an exact number of hops from other, once again you will need to filter the result of the query:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">WITH qry as\r\n(\r\n  SELECT \r\n   P1.MemberID, \r\n   P1.MemberName,\r\n   STRING_AGG(P2.MemberName,'-&gt;') WITHIN GROUP (GRAPH PATH) AS [path],\r\n   LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) AS FinalMemberName,\r\n   COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels\r\n  FROM\r\n   ForumMembers P1,\r\n   ForumMembers FOR PATH as P2,\r\n   Likes FOR PATH as IPO\r\n  WHERE MATCH(SHORTEST_PATH(P1(-(IPO)-&gt;P2){1,2}))\r\n\t)\r\nSELECT * FROM qry WHERE levels=2;<\/pre>\n<p>The <code>shortest_path<\/code> function is a great new feature for the <strong>SQL Server<\/strong> graph database, but being unable to filter the end node or the exact number of hops without performing the entire calculation and only then filter the result is still a problem for query performance.<\/p>\n<h2>Indexing<\/h2>\n<p>You can create indexes on the pseudo-columns of the edge and nodes. Considering the execution plans you saw, you can create the following clustered index for the edge:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE CLUSTERED INDEX indLikes ON likes($from_id,$to_id);<\/pre>\n<p>For the nodes, on the other hand, the index didn&#8217;t help so much. If you create a clustered index, it will result in a scan operation. If you create a non-clustered index, you would need to include all the columns involved in the operation. It may sound a bit strange, but one of the columns is the <code>graph_id<\/code>. There is no pseudo-column for the <code>graph_id<\/code> in the nodes table. There is one function to retrieve the <code>graph_id, GRAPH_ID_FROM_NODE_ID, <\/code> but you can&#8217;t use this function to create a computed column. This instruction bellow, for example, will fail:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">ALTER TABLE forumMembers ADD graph_id \r\nAS (GRAPH_ID_FROM_NODE_ID($node_id))\r\nPERSISTED;<\/pre>\n<p>On the other hand, for the queries where you are filtering for the start member, a non-clustered index on the member id helps:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE NONCLUSTERED INDEX indMembers ON ForumMembers(memberid);<\/pre>\n<p>There is not much news about the more complex situations filtering by levels or end node: It&#8217;s an additional filter after all the calculations of <code>shortest_path.<\/code><\/p>\n<h2>Conclusion<\/h2>\n<p>The function <code>shortest_path<\/code> is an excellent addition to the graph database features; however, the two limitations it has may create heavy queries in the environment:<\/p>\n<ul>\n<li>You can&#8217;t specify an end node.<\/li>\n<li>You can&#8217;t specify a start number of hops which would allow selecting an exact number of hops with same start and end.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>With SQL Server 2019 on the way, it\u2019s time to begin learning about all the new features. In this article, Dennes Torres demonstrates how to use the new SHORTEST_PATH function that is part of the Graph Database feature. &hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143531],"tags":[95509],"coauthors":[6810],"class_list":["post-85165","post","type-post","status-publish","format-standard","hentry","category-featured","category-t-sql-programming-sql-server","tag-standardize"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/85165","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=85165"}],"version-history":[{"count":10,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/85165\/revisions"}],"predecessor-version":[{"id":85360,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/85165\/revisions\/85360"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=85165"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=85165"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=85165"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=85165"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}