{"id":104550,"date":"2024-11-13T02:03:51","date_gmt":"2024-11-13T02:03:51","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=104550"},"modified":"2024-11-13T02:03:52","modified_gmt":"2024-11-13T02:03:52","slug":"kql-the-top-of-a-group-in-an-easy-way","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/kql-the-top-of-a-group-in-an-easy-way\/","title":{"rendered":"KQL: The top of a group in an easy way"},"content":{"rendered":"<p>When learning something new, we always compare it with what we know. In this case, we end up comparing KQL with SQL.<\/p>\n<p>In SQL, when we need to get the top rows based on a grouping, the process is not easy. For example, let&#8217;s say that in a table containing taxi rides, we want to retrieve the record of the ride with the highest fare on each day.<\/p>\n<p>There are multiple ways to do this, none is too easy. One of the methods is to create a <strong>row_number<\/strong> based on the day. This can be achieved using what&#8217;s called in SQL as window function. In this case <strong>row_number<\/strong> with a stablished window in the result based on the date.<\/p>\n<p>After creating the <strong>row_number<\/strong>, we select every row where the <strong>row_number<\/strong> is 1: The topmost row for each date.<\/p>\n<p><strong>Here comes the good news:<\/strong> In Kusto we can achieve this in a way easier way<\/p>\n<h2>The ARG_MAX KQL function<\/h2>\n<p>The documentation of this function is quite complex. In summary, this function can bring the records with the topmost value of one field.<\/p>\n<p>This can be used together a Summary statement or not and it will create different combinations of result.<\/p>\n<p>As an example, I will use the RIDES table from the New York taxi ride. This can be used in real time samples inside Fabric.<\/p>\n<h2>No Grouping<\/h2>\n<p>Using the <strong>ARG_MAX<\/strong> with no grouping means bringing the topmost record from the table based in one of the fields. Let&#8217;s consider the field <strong>FARE_AMOUNT<\/strong>.<\/p>\n<p>The query will be like this:<\/p>\n<pre class=\"lang:tsql decode:true \">rides\n| summarize arg_max(fare_amount,*)<\/pre>\n<p>The &#8216;*&#8217; means we want all the fields from the table returned. The <strong>FARE_AMOUNT<\/strong> specifies we want only the record with the topmost fare amount. The result will be a single record.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1072\" height=\"118\" class=\"wp-image-104551\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104550-1.png\" \/><\/p>\n<h2>Single KQL Grouping<\/h2>\n<p>Let&#8217;s consider we want the record with the topmost fare for each day. In this scenario, we can use the &#8220;BY&#8221; clause together with the <strong>SUMMARIZE<\/strong> to achieve this result.<\/p>\n<pre class=\"lang:tsql decode:true \">rides\n| extend rideDate=startofday(tpep_pickup_datetime)\n| summarize arg_max(fare_amount,*) by rideDate\n| order by rideDate<\/pre>\n<p>On the image below, for each day, we get the row with the highest fare amount.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"221\" class=\"wp-image-104552\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-3.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>Multiple Grouping<\/h2>\n<p>Let&#8217;s consider a more complex scenario. For each day and each pickup location, we want to discover what&#8217;s the drop off location with the highest number of rides.<\/p>\n<p>In other words, on each day, from one neighborhood, what&#8217;s the destination with the highest number of rides?<\/p>\n<p>In this scenario we have 3 groupings: Date, Pick Up Location and drop off location. We want to get the topmost count according to the 3 and get the Drop off location with this value for each combination of the first 2.<\/p>\n<p>The query will be like this:<\/p>\n<pre class=\"lang:tsql decode:true \">rides\n| extend rideDate=startofday(tpep_pickup_datetime)\n| summarize total=count() by rideDate, PULocationID, DOLocationID\n| summarize arg_max(total, DOLocationID) by rideDate, PULocationID\n| order by PULocationID, rideDate<\/pre>\n<p>After calculating the <strong>COUNT<\/strong>, we use the <strong>arg_max<\/strong> together with the summarize to get the DOLocationID with maximum total according to the Date and Pickup location.<\/p>\n<p>The image below illustrates how we get the Drop Off location with the highest number of rides for each Day and Pickup Location combination<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"546\" height=\"415\" class=\"wp-image-104553\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-data-description-automatically.png\" alt=\"A screenshot of a data\n\nDescription automatically generated\" \/><\/p>\n<h2>Summary<\/h2>\n<p>The documentation doesn&#8217;t help too much, but this function makes complex analysis in KQL way easier than it would be in SQL.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When learning something new, we always compare it with what we know. In this case, we end up comparing KQL with SQL. In SQL, when we need to get the top rows based on a grouping, the process is not easy. For example, let&#8217;s say that in a table containing taxi rides, we want to&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":104555,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,159160,159164],"tags":[],"coauthors":[6810],"class_list":["post-104550","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blogs","category-business-intelligence","category-microsoft-fabric"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104550","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=104550"}],"version-history":[{"count":2,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104550\/revisions"}],"predecessor-version":[{"id":104556,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104550\/revisions\/104556"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/104555"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=104550"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=104550"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=104550"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=104550"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}