{"id":106032,"date":"2025-03-26T19:45:00","date_gmt":"2025-03-26T19:45:00","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=106032"},"modified":"2025-03-26T20:12:10","modified_gmt":"2025-03-26T20:12:10","slug":"fabric-query-a-sql-endpoint-from-a-notebook","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/fabric-query-a-sql-endpoint-from-a-notebook\/","title":{"rendered":"Fabric: Query a SQL Endpoint from a Notebook"},"content":{"rendered":"<p>Let&#8217;s analyse why we would want to query a SQL Endpoint. Once we understand why, we can dig into how to make a query to a SQL Endpoint.<\/p>\n<p>We use notebooks to connect directly to a lakehouse. Except for the T-SQL notebook, notebooks have a default lakehouse and work directly with it from Spark. However, accessing other data objects may be more complex.<\/p>\n<h2>Why execute a Query on a SQL Endpoint from a Notebook<\/h2>\n<p>The reasons to query a SQL Endpoint from a notebook are not very common, but there are many.<\/p>\n<ul>\n<li>Most data objects have a SQL Endpoint: Data Warehouse, Fabric Database, mirrored database. If you need to access data from these objects, you will need to query the SQL Endpoint.<\/li>\n<li>I wrote before about a <a href=\"https:\/\/www.linkedin.com\/pulse\/sql-endpoint-secrets-you-need-know-dtower-software-3soxf\/\" target=\"_self\" rel=\"noopener\">delay in SQL Endpoints to reflect lakehouse data<\/a>. You may want to query the SQL Endpoint to analyse this.<\/li>\n<li>There are some queries and features we can use through SQL Endpoints but can&#8217;t execute directly in a lakehouse.<\/li>\n<\/ul>\n<p>These are only a few examples I identified. Let&#8217;s talk in the comments about what other scenarios may create this need.<\/p>\n<h2>Querying the SQL Endpoint: The method to use<\/h2>\n<p>A SQL Endpoint uses TDS, the same connection protocol as SQL Server and other Microsoft SQL flavours. 
Most software development languages are capable of making a connection to a Microsoft SQL flavour.<\/p>\n<p>The notebooks use powerful languages, such as pySpark and Scala. As a result, there are multiple methods to make the connection to a SQL Endpoint. For example, we can use ODBC, JDBC, or many other connection options.<\/p>\n<p>However, I was looking for a more &#8220;Fabric native&#8221; method. There is one. It&#8217;s a bit strange, but at the moment, it is the most &#8220;Fabric native&#8221; method.<\/p>\n<p>This method requires us to use libraries only available in the Scala language. As a result, the notebook will need to be in Scala.<\/p>\n<h2>Querying the SQL Endpoint in Scala<\/h2>\n<p>This is the basic code to query a SQL Endpoint using Scala:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; border: solid gray; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<pre style=\"margin: 0; line-height: 125%;\" class=\"crayon:false\"><span style=\"color: #008800; font-weight: bold;\">import<\/span> <span style=\"color: #0e84b5; font-weight: bold;\">com.microsoft.spark.fabric.tds.implicits.read.FabricSparkTDSImplicits._<\/span>\n<span style=\"color: #008800; font-weight: bold;\">import<\/span> <span style=\"color: #0e84b5; font-weight: bold;\">com.microsoft.spark.fabric.Constants<\/span>\n\n<span style=\"color: #008800; font-weight: bold;\">val<\/span> df <span style=\"color: #008800; font-weight: bold;\">=<\/span> spark<span style=\"color: #333333;\">.<\/span>read<span style=\"color: #333333;\">.<\/span>option<span style=\"color: #333333;\">(<\/span><span style=\"color: #bb0066; font-weight: bold;\">Constants<\/span><span style=\"color: #333333;\">.<\/span><span style=\"color: #bb0066; font-weight: bold;\">WorkspaceId<\/span><span style=\"color: #333333;\">,<\/span> datausageWorkspaceId<span style=\"color: #333333;\">).<\/span>option<span style=\"color: #333333;\">(<\/span><span 
style=\"color: #bb0066; font-weight: bold;\">Constants<\/span><span style=\"color: #333333;\">.<\/span><span style=\"color: #bb0066; font-weight: bold;\">DatabaseName<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"color: #bb0066; font-weight: bold;\">SQLEndpoint_name<\/span><span style=\"color: #333333;\">).<\/span>synapsesql<span style=\"color: #333333;\">(<\/span>qry<span style=\"color: #333333;\">.<\/span>query<span style=\"color: #333333;\">)<\/span>\n<\/pre>\n<\/div>\n<p>These are the details about this code:<\/p>\n<ul>\n<li>The two imports only work in Scala<\/li>\n<li>The two constants contain strings with the option name we need to use for the read operation<\/li>\n<li>We need to have the Workspace Id and SQL Endpoint name<\/li>\n<li>The Workspace is considered the server and the SQL Endpoint is considered the database<\/li>\n<li>At the end, we execute the query<\/li>\n<\/ul>\n<h2>Interaction between pySpark and Scala<\/h2>\n<p>My entire environment uses pySpark, while this solution requires Scala.<\/p>\n<p>In some situations, we may need to create an interaction between the pySpark environment and the Scala environment.<\/p>\n<p>What we can do:<\/p>\n<ul>\n<li>The parameters cell can be in Scala and the notebook can be called by pySpark notebooks. There is no problem with this.<\/li>\n<li>The Scala cells can call pySpark notebooks.<\/li>\n<li>By &#8220;call&#8221;, I mean either &#8216;%Run&#8217; or &#8216;notebook.run&#8217;<\/li>\n<\/ul>\n<p>What we can&#8217;t do:<\/p>\n<ul>\n<li>Scala cells can&#8217;t access pySpark variables, nor the opposite<\/li>\n<\/ul>\n<h2>Exchanging values between languages<\/h2>\n<p>You may need to exchange values between the two languages from time to time. It depends on each scenario.<\/p>\n<p>The solution for this is to use Spark configurations. 
In one language, you can set a configuration with the variable value and read the configuration in the other language to retrieve the value.<\/p>\n<p>Example &#8211; Set the configuration in pySpark:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; border: solid gray; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<pre style=\"margin: 0; line-height: 125%;\" class=\"crayon:false\">spark<span style=\"color: #333333;\">.<\/span>conf<span style=\"color: #333333;\">.<\/span>set(<span style=\"background-color: #fff0f0;\">\"lakemirrorSQLEndpointId\"<\/span>,lakemirrorSQLEndpointId)\n<\/pre>\n<\/div>\n<p>Retrieve the configuration in Scala:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; border: solid gray; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<pre style=\"margin: 0; line-height: 125%;\" class=\"crayon:false\"><span style=\"color: #008800; font-weight: bold;\">val<\/span> lakemirrorSQLEndpointId<span style=\"color: #008800; font-weight: bold;\">=<\/span>spark<span style=\"color: #333333;\">.<\/span>conf<span style=\"color: #333333;\">.<\/span>get<span style=\"color: #333333;\">(<\/span><span style=\"background-color: #fff0f0;\">\"lakemirrorSQLEndpointId\"<\/span><span style=\"color: #333333;\">)<\/span>\n<\/pre>\n<\/div>\n<h2>Exceptions to the basic syntax<\/h2>\n<p>The &#8220;synapsesql&#8221; method doesn&#8217;t accept all the syntaxes supported by a SQL Endpoint.<\/p>\n<p>These syntaxes may be perfectly acceptable to the SQL Endpoint, but &#8220;synapsesql&#8221; will reject them. It doesn&#8217;t help that the error message is completely meaningless, taking us in the wrong direction.<\/p>\n<h2>Another syntax option: Breaking the query in two<\/h2>\n<p>The queries can be broken down into two parts. This is only one of the ways to solve the problem. 
Let&#8217;s analyse how this syntax works.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; border: solid gray; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<pre style=\"margin: 0; line-height: 125%;\" class=\"crayon:false\"><span style=\"color: #008800; font-weight: bold;\">val<\/span> df <span style=\"color: #008800; font-weight: bold;\">=<\/span> spark<span style=\"color: #333333;\">.<\/span>read<span style=\"color: #333333;\">.<\/span>option<span style=\"color: #333333;\">(<\/span><span style=\"color: #bb0066; font-weight: bold;\">Constants<\/span><span style=\"color: #333333;\">.<\/span><span style=\"color: #bb0066; font-weight: bold;\">WorkspaceId<\/span><span style=\"color: #333333;\">,<\/span> wksId<span style=\"color: #333333;\">).<\/span>option<span style=\"color: #333333;\">(<\/span><span style=\"color: #bb0066; font-weight: bold;\">Constants<\/span><span style=\"color: #333333;\">.<\/span><span style=\"color: #bb0066; font-weight: bold;\">DatabaseName<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"color: #bb0066; font-weight: bold;\">EndPointName<\/span><span style=\"color: #333333;\">).<\/span>option<span style=\"color: #333333;\">(<\/span><span style=\"background-color: #fff0f0;\">\"prepareQuery\"<\/span><span style=\"color: #333333;\">,<\/span> firstpart<span style=\"color: #333333;\">).<\/span>synapsesql<span style=\"color: #333333;\">(<\/span>secondpart<span style=\"color: #333333;\">)<\/span>\n<\/pre>\n<\/div>\n<p>This first option breaks the query into two parts. The first part is sent using the &#8220;option&#8221; method, and the second part is sent using the &#8220;synapsesql&#8221; method.<\/p>\n<p>The relation between one part and the other is made using CTEs. Other options, such as temporary views, would be way more expensive. 
The solution using CTEs looks like the code below:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; border: solid gray; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<pre style=\"margin: 0; line-height: 125%;\" class=\"crayon:false\"><span style=\"color: #333333;\">%%<\/span>spark\n\n<span style=\"color: #008800; font-weight: bold;\">import<\/span> <span style=\"color: #0e84b5; font-weight: bold;\">com.microsoft.spark.fabric.tds.implicits.read.FabricSparkTDSImplicits._<\/span>\n<span style=\"color: #008800; font-weight: bold;\">import<\/span> <span style=\"color: #0e84b5; font-weight: bold;\">com.microsoft.spark.fabric.Constants<\/span>\n\n<span style=\"color: #008800; font-weight: bold;\">val<\/span> df <span style=\"color: #008800; font-weight: bold;\">=<\/span> spark<span style=\"color: #333333;\">.<\/span>read<span style=\"color: #333333;\">.<\/span>option<span style=\"color: #333333;\">(<\/span><span style=\"color: #bb0066; font-weight: bold;\">Constants<\/span><span style=\"color: #333333;\">.<\/span><span style=\"color: #bb0066; font-weight: bold;\">WorkspaceId<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: #fff0f0;\">\"bb457c73-bf10-4f53-8933-8c409192747a\"<\/span><span style=\"color: #333333;\">)<\/span>\n<span style=\"color: #333333;\">.<\/span>option<span style=\"color: #333333;\">(<\/span><span style=\"color: #bb0066; font-weight: bold;\">Constants<\/span><span style=\"color: #333333;\">.<\/span><span style=\"color: #bb0066; font-weight: bold;\">DatabaseName<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: #fff0f0;\">\"interlake\"<\/span><span style=\"color: #333333;\">)<\/span>\n<span style=\"color: #333333;\">.<\/span>option<span style=\"color: #333333;\">(<\/span><span style=\"background-color: #fff0f0;\">\"prepareQuery\"<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: 
#fff0f0;\">\"with qry as (select * from fact_sale_1y_full where CustomerKey=204)\"<\/span><span style=\"color: #333333;\">)<\/span>\n<span style=\"color: #333333;\">.<\/span>synapsesql<span style=\"color: #333333;\">(<\/span><span style=\"background-color: #fff0f0;\">\"select count(*) as total from qry\"<\/span><span style=\"color: #333333;\">)<\/span>\n\ndisplay<span style=\"color: #333333;\">(<\/span>df<span style=\"color: #333333;\">)<\/span>\n<\/pre>\n<\/div>\n<h2>PrepareQuery and SynapseSQL difference<\/h2>\n<p>The <strong>SynapseSQL<\/strong> method is processed by Spark. Spark tries to optimize the query and make additional checks. As a result, some native syntaxes are not supported in synapsesql.<\/p>\n<p>The <strong>prepareQuery<\/strong>, on the other hand, sends the query &#8220;as is&#8221; to the target. In this way, the query is not affected by any Spark processing.<\/p>\n<p>There are many syntaxes which can cause this problem and require the query to be broken down using CTEs and sent in two pieces:<\/p>\n<ul>\n<li>Common Table Expressions (CTEs)<\/li>\n<li>Deeply nested queries<\/li>\n<li>Window functions<\/li>\n<li>Custom SQL Functions<\/li>\n<li>Dynamic SQL<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>After analysing some reasons that would require executing a query on a SQL Endpoint from a notebook, we analysed an interesting method to do this.<\/p>\n<p>This method still has limitations in relation to the language, but it&#8217;s interesting to notice how it uses Fabric libraries directly.<\/p>\n<p>This is far from being the only method to execute these queries; other methods depend on the driver used to connect to the SQL Endpoint.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let&#8217;s analyse why we would want to query a SQL Endpoint. Once we understand why, we can dig into how to make a query to a SQL Endpoint. We use notebooks to connect directly to a lakehouse. 
Except for the T-SQL notebook, notebooks have a default lakehouse and work directly with it from Spark. However,&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":106041,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,159164],"tags":[158997,159146,159288,159289],"coauthors":[6810],"class_list":["post-106032","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blogs","category-microsoft-fabric","tag-microsoft-fabric","tag-pyspark","tag-scala","tag-sqlendpoint"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/106032","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=106032"}],"version-history":[{"count":2,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/106032\/revisions"}],"predecessor-version":[{"id":106043,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/106032\/revisions\/106043"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/106041"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=106032"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=106032"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=106032"},{"taxonomy":"author","embeddable":true,"hr
ef":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=106032"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}