{"id":91869,"date":"2021-08-09T17:00:30","date_gmt":"2021-08-09T17:00:30","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=91869"},"modified":"2021-08-09T20:21:51","modified_gmt":"2021-08-09T20:21:51","slug":"creating-kusto-sub-queries","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/creating-kusto-sub-queries\/","title":{"rendered":"Creating Kusto sub-queries"},"content":{"rendered":"<p>In the same way as other query environments, <strong>Kusto<\/strong> queries in <strong>Log Anaytics<\/strong> can become complex. We need similar features in <strong>Kusto<\/strong> as we have in <strong>SQL Queries<\/strong> and one of these features is sub-queries.<\/p>\n<h2>The Problem<\/h2>\n<p style=\"padding-left: 30px\">On the example below I&#8217;m building a query over my blog&#8217;s <strong>Log Analytics<\/strong> Data to identify the amount of access to my blog.<\/p>\n<p style=\"padding-left: 30px\"><strong>Log Analytics<\/strong> register the IP&#8217;s of the users making access to the blog. Web Site Access 101: When analyzing the access to a website, we have requests and sessions. Every time a user access a page, this is a request. While navigating the site, the user can access many pages, this is a navigation session.<\/p>\n<p style=\"padding-left: 30px\">The problem is that we usually have some outliers that increase a lot the number of requests. These usually are indexing tools requesting our pages to index and these indexing tools hide the real number of user requests our site receives, mixing it with indexing requests.<\/p>\n<p style=\"padding-left: 30px\">In order to make calculations with the real number of requests we need to discover the outliers IP addresses and remove them from the calculation. It&#8217;s basically two queries in one: Find the outliers and make the calculation removing the outliers. In other words, a sub-query.<\/p>\n<h2>Building the sub-query<\/h2>\n<p style=\"padding-left: 30px\">In <strong>Kusto,<\/strong> sub-queries have some similarities with <strong>CTEs<\/strong>: We use the statement <strong>LET<\/strong> to define a name for a sub-query. After that, we can user this query by name on our main query. As you may be imagining, we can create as many sub-queries as we would like in a single <strong>Kusto<\/strong> query.<\/p>\n<p style=\"padding-left: 30px\">The rule to find outliers is a choice in each case. In my example, I will consider an outlier any IP address with more than 100 requests in a single day. Let&#8217;s recover the list of these IP addresses:<\/p>\n<p style=\"padding-left: 30px\">let outliers=<br \/>\n\u00a0\u00a0\u00a0 AppServiceHTTPLogs<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">summarize<\/span> Total=<span style=\"color: #0000ff\">count<\/span>() by CIp,bin(TimeGenerated,1d)<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">where<\/span> Total &gt; 100<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">project<\/span> CIp;<\/p>\n<p style=\"padding-left: 30px\">Most of the details of this sub-query are just some <strong>Kusto<\/strong> syntax rules:<\/p>\n<p style=\"padding-left: 60px\">1) The query is called outliers<br \/>\n2) We are totaling the calls by Ip in a 1 day interval. The bin statement establishes the time-frame<br \/>\n3) Any Ip with a total of more than 100 requests will be listed<br \/>\n4) The query needs to finish with a semi-colon because it&#8217;s a sub-query<\/p>\n<h2>Using the sub-query<\/h2>\n<p style=\"padding-left: 30px\">\nLet&#8217;s use our sub-query in a second query. This second query is totaling the requests by day of the week:<\/p>\n<p style=\"padding-left: 30px\">AppServiceHTTPLogs<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">where<\/span> CIp !in (outliers)<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">summarize<\/span> Total=<span style=\"color: #0000ff\">count<\/span>() <span style=\"color: #0000ff\">by<\/span> dayofweek(TimeGenerated),Week=bin(TimeGenerated,7d)<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">project<\/span> Day=format_timespan(Column1,&#8217;d&#8217;),Week,Total<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">order by<\/span> Day asc<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">project<\/span> Day,Week,Total<\/p>\n<p style=\"padding-left: 60px\">1) The first step is to exclude the outliers IPs from the calculation<br \/>\n2) The summary is using <em>dayofweek<\/em> <strong>Kusto<\/strong> function and the bin as usual, but providing a field name for the bin result<br \/>\n3) The <em>dayofweek<\/em> function returns a time span, we still need to format it using <em>format_timespan<\/em> function. Since we didn&#8217;t provide a name for the <em>dayofweek<\/em> result on the summarize, it&#8217;s called <em>Column1<\/em><br \/>\n4) Finally we order and get the final fields<\/p>\n<p>&nbsp;<\/p>\n<h2>The Final Query<\/h2>\n<p style=\"padding-left: 30px\">\nThese two queries need to be used together, one is a sub-query for the other (or <strong>CTE<\/strong>, if you prefer). The final result is this:<\/p>\n<p style=\"padding-left: 30px\">let outliers=<br \/>\n\u00a0\u00a0\u00a0 AppServiceHTTPLogs<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">summarize<\/span> Total=<span style=\"color: #0000ff\">count<\/span>() by CIp,bin(TimeGenerated,1d)<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">where<\/span> Total &gt; 100<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">project<\/span> CIp;<br \/>\nAppServiceHTTPLogs<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">where<\/span> CIp !in (outliers)<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">summarize<\/span> Total=<span style=\"color: #0000ff\">count<\/span>() <span style=\"color: #0000ff\">by<\/span> dayofweek(TimeGenerated),Week=bin(TimeGenerated,7d)<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">project<\/span> Day=format_timespan(Column1,&#8217;d&#8217;),Week,Total<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">order by<\/span> Day asc<br \/>\n\u00a0\u00a0\u00a0 |<span style=\"color: #0000ff\">project<\/span> Day,Week,Total<\/p>\n<h2>References<\/h2>\n<p style=\"padding-left: 30px\"><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-explorer\/kusto\/query\/?WT.mc_id=DP-MVP-4014132\">Overview of Kusto Queries<\/a><\/p>\n<p style=\"padding-left: 30px\"><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-explorer\/kusto\/query\/letstatement?WT.mc_id=DP-MVP-4014132\">Kusto LET Statement<\/a><\/p>\n<h2>Conclusion<\/h2>\n<p style=\"padding-left: 30px\">The <strong>KQL<\/strong> language is very powerful and flexible to support our needs<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the same way as other query environments, Kusto queries in Log Anaytics can become complex. We need similar features in Kusto as we have in SQL Queries and one of these features is sub-queries. The Problem On the example below I&#8217;m building a query over my blog&#8217;s Log Analytics Data to identify the amount&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":92096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[126250,126249,143562],"coauthors":[6810],"class_list":["post-91869","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blogs","tag-kql","tag-kusto","tag-log-analytics"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/91869","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=91869"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/91869\/revisions"}],"predecessor-version":[{"id":91892,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/91869\/revisions\/91892"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/92096"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=91869"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=91869"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=91869"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=91869"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}