{"id":97031,"date":"2023-06-19T20:55:05","date_gmt":"2023-06-19T20:55:05","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=97031"},"modified":"2023-06-02T20:56:36","modified_gmt":"2023-06-02T20:56:36","slug":"snowflake-a-cloud-warehouse-solution-for-analytics","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/snowflake\/snowflake-a-cloud-warehouse-solution-for-analytics\/","title":{"rendered":"Snowflake: A Cloud Warehouse Solution for Analytics"},"content":{"rendered":"<p>This article explains what Snowflake data warehouse is and how it is different from other traditional data warehouses. This article will cover how to create virtual warehouses and different types of tables, how to handle semi-structured data, how to profile queries, and the benefits of clustering keys. Practical demonstrations of the concepts explained will be provided throughout the article.<\/p>\n<p>Snowflake is a cloud-based solution only, with no on-premises version that you can purchase and install. For more information on cloud versus on on-premises, check <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/on-premises-vs-cloud\/\">my blog here<\/a>.<\/p>\n<h2>What Is Snowflake?<\/h2>\n<p>Snowflake is a cutting-edge cloud-based data warehousing platform that stands out for its ability to efficiently store and analyse vast volumes of data. What sets Snowflake apart is its innovative architecture, which effectively separates storage and compute operations. Unlike traditional database solutions like Teradata and Oracle, Snowflake eliminates the need for installing, configuring, and managing hardware or software. It operates entirely on public cloud infrastructure. However, what truly distinguishes Snowflake and makes it unparalleled is its exceptional architecture and robust data sharing capabilities.<\/p>\n<p>At its core, Snowflake Data Cloud is a fully managed, cloud-native solution that allows users to store and manage structured and semi-structured data from various sources in a central location. With Snowflake, organizations can scale up or down as needed to meet changing demand, without worrying about infrastructure or maintenance.<\/p>\n<p>Moreover, Snowflake is available on multiple cloud platforms, including AWS, Azure, and Google Cloud Platform, providing users with the flexibility to choose the cloud provider that best suits their needs. In traditional databases compute and storage resources are limited. In those databases, increases in storage and compute resources must be applied together, as they are tightly coupled.<\/p>\n<p>Snowflake solves the problem of tight coupling: customers can add compute and storage to their data warehouses independently from each other, and storage is virtually unlimited. You pay for what you use, and you don\u2019t need to manage any hardware or software, as it is all fully managed by snowflake.<\/p>\n<p>Snowflake is an incredibly versatile platform that excels in numerous data-related domains, including data warehousing, data engineering, data lakes, data science, and data application development. It seamlessly integrates with popular visualization tools like Tableau and Power BI, allowing users to create visually stunning and interactive dashboards. These dashboards provide a comprehensive view of data, enabling users to easily identify trends, patterns, and anomalies. With Snowflake&#8217;s integration capabilities, organizations can unlock the full potential of their data and make data-driven decisions with enhanced clarity and understanding.<\/p>\n<h2>How Can You Access Snowflake?<\/h2>\n<p>To access Snowflake, open your browser to <a href=\"https:\/\/www.snowflake.com\">https:\/\/www.snowflake.com<\/a>. To access a free 30-day trial, no credit card required, just provide your personal details, and choose a cloud provider, and your account will be ready within minutes. The free trial is a great option for initial learning. I won\u2019t go into more detail about the free trial setup here, but I can share more in the comments if anyone requires. However, it is very easy to set up.<\/p>\n<h2>What Is Unique About the Snowflake Architecture?<\/h2>\n<p>In this section I will share some of the ways that Snowflake&#8217;s architecture is unique (or certainly not common to most database management systems.<\/p>\n<p>The Snowflake architecture represents a new architecture that was designed and built from the ground up. It is considered a hybrid of traditional shared-disk and shared-nothing database architectures, and it does not copy any traditional database architecture.<\/p>\n<p>Figure 1 shows a high-level view of the Snowflake architecture.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1107\" height=\"704\" class=\"wp-image-97032\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/diagram-description-automatically-generated.png\" alt=\"Diagram\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 1: Architecture of snowflake<\/strong><\/p>\n<p>As shown in Figure 1, the unique Snowflake architecture consists of three key layers:<\/p>\n<h3>Cloud Services<\/h3>\n<p>This is the \u201cbrain\u201d of Snowflake where activities such as authentication, infrastructure management, metadata management, query parsing and optimization, and access control are all managed. These services tie together all the different components of Snowflake to process user requests, from login to query dispatch.<\/p>\n<p><strong>Query Processing<\/strong><\/p>\n<p>This layer is also known as Compute layer. Considered the \u201cbrawn\u201d of Snowflake, the Query Processing layer performs data manipulations, transformations, and queries (data fetching) requested by users. All this processing is done by Snowflake using virtual warehouses. <br \/>\nEach virtual warehouse is a massively parallel processing (MPP) compute resource cluster. Each virtual warehouse consists of multiple nodes with CPU and memory automatically provisioned in the Cloud by Snowflake.<\/p>\n<p>Here are some benefits of virtual warehouses:<\/p>\n<ul>\n<li>Comes in different sizes from extra small to 6XL and you can change the size of an existing virtual warehouse based upon your need. When you resize your virtual warehouse, running queries remain unaffected; only new queries submitted will use the new warehouse size.<\/li>\n<li>You only pay for the resources you utilize. If there are no workloads running, there are no costs associated with virtual warehouse compute. However, it&#8217;s important to note that you are charged for data storage regardless of usage. Can be started or stopped at any time. You can also auto-suspend, or auto-resume a Virtual Warehouse based upon a specific period of inactivity.<\/li>\n<li>Can be set to auto-scale. This means if more queries are launched on a Virtual Warehouse, increasing the load on it, it can automatically launch new clusters to accommodate the increased load and automatically scale down when the load decreases.<\/li>\n<\/ul>\n<h3>Database Storage Layer<\/h3>\n<p>Snowflake is a columnar database. When data is loaded into Snowflake, it is internally reorganized into an optimized, compressed, columnar format. Snowflake stores this optimized data in Cloud storage. Snowflake&#8217;s database storage layer is tasked with the responsibility of storing and managing data. It utilizes a scalable and elastic storage infrastructure, effectively decoupling compute, and storage operations. The data is organized into numerous micro-partitions, optimizing data access and query performance. Notably, Snowflake&#8217;s storage layer exhibits a high level of durability, automatically replicating data across multiple availability zones to ensure reliability. Moreover, it is designed to seamlessly handle extensive data volumes and supports a wide array of data formats, enabling organizations to efficiently store and analyse diverse data types.<\/p>\n<p>The internal representation of data objects is not accessible, nor is it visible, to customers. Snowflake manages all aspects of storage file organization, metadata compression, and data storage statistics. This ensures that users do not need to worry about data distribution across different micro partitions.<\/p>\n<p>As the storage layer is independent, customers need to pay only for the average monthly storage used. Since Snowflake is provisioned on the Cloud, storage is elastic and is charged as per terabyte (TB) of usage per month. Compute nodes connect with the Database Storage layer to fetch data for query processing. The main benefit of columnar databases store is it stores data in columns rather than rows, they are well-suited for analytical workloads that involve querying large amounts of data. Columnar databases can retrieve data more quickly than row-oriented databases because they only need to read the columns that are required for a particular query. This can result in significant performance improvements, particularly for complex queries that involve multiple joins and aggregations.<\/p>\n<h2>Creating a Virtual Warehouse<\/h2>\n<p>Now let\u2019s discuss how to create virtual warehouses. By default, when we create a Snowflake account it comes with a default warehouse named <code>COMPUTE_WH<\/code>. You can create new warehouses at any time using the <code>ACCOUNTADMIN<\/code> and <code>SYSADMIN<\/code> security roles in Snowflake.<\/p>\n<p>To create a new warehouse, log into the Snowflake web UI, go to the Warehouses tab and click the \u201c+ Warehouse\u201d button. Give your new warehouse a meaningful name. I have named my new warehouse <code>DEMO_WAREHOUSE<\/code> and kept the size as <code>X-Small<\/code>. You can also select <strong>Advanced Settings<\/strong> like <strong>Auto Resume<\/strong> and <strong>Auto Suspend<\/strong> from this <strong>New Warehouse<\/strong> dialog. Figure 2 shows the New Warehouse dialog.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1893\" height=\"838\" class=\"wp-image-97033\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/graphical-user-interface-application-description.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 2: how to create a virtual warehouse from snowflake UI.<\/strong><\/p>\n<p>A warehouse is created within a few seconds, giving us a glimpse of the power of Cloud computing and Snowflake. We did not purchase any hardware, performed no installation, and yet we now have a data warehouse up and running within minutes. The main advantage is we can change the compute power on the fly, have multiple warehouses\u2014like one for each department\u2014and only pay for the resources we use. Figure 3 shows the Warehouses summary screen after we create our warehouse.<\/p>\n<p><strong><img loading=\"lazy\" decoding=\"async\" width=\"1872\" height=\"848\" class=\"wp-image-97034\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/graphical-user-interface-application-table-exce.png\" alt=\"Graphical user interface, application, table, Excel\n\nDescription automatically generated\" \/><\/strong><\/p>\n<p><strong>Figure 3: Virtual warehouse window after creation.<\/strong><\/p>\n<p>You can also create warehouse using the Snowflake <code>SQL CREATE WAREHOUSE<\/code> statement, shown in Figure 4.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE OR REPLACE WAREHOUSE DEMO_WAREHOUSE\r\nWITH WAREHOUSE_SIZE = XSMALL\r\n COMMENT = 'This is for demo purpose only.'\r\n  AUTO_SUSPEND = 300\r\n  WAREHOUSE_TYPE = STANDARD;<\/pre>\n<p>Execute this code and you should see:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"709\" height=\"94\" class=\"wp-image-97035\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/word-image-97031-4.png\" \/><\/p>\n<p><strong>Figure 4: SQL\u2019s and parameters for creating virtual warehouse.<\/strong><\/p>\n<p>You can find the full syntax and additional options related to virtual warehouse creation at the following link: <strong><a href=\"https:\/\/docs.snowflake.com\/en\/sql-reference\/sql\/create-warehouse.html\">https:\/\/docs.snowflake.com\/en\/sql-reference\/sql\/create-warehouse.html<\/a><\/strong><\/p>\n<p>The Web UI provides access to additional features like transferring ownership, roles, and many other warehouse management functions. In the Warehouse Activity tab, you can see individual query execution metrics like executed query text, query executor, and duration, etc. Figure 5 shows the Warehouse Activity screen.<\/p>\n<p><strong><img loading=\"lazy\" decoding=\"async\" width=\"1888\" height=\"888\" class=\"wp-image-97036\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/graphical-user-interface-application-table-desc.png\" alt=\"Graphical user interface, application, table\n\nDescription automatically generated\" \/><\/strong><\/p>\n<p><strong>Figure 5: Web UI demo for available information.<\/strong><\/p>\n<h3>Time travel and fail safe in Snowflake:<\/h3>\n<p>Time travel in Snowflake enables users to retrieve historical versions of data that were previously stored in the system. This feature allows users to query the data as it appeared at a specific moment in time, regardless of any subsequent changes or deletions that may have occurred.<\/p>\n<p>For more details on Snowflake time travel, see the following documentation:<\/p>\n<p><a href=\"https:\/\/docs.snowflake.com\/en\/user-guide\/data-time-travel\">https:\/\/docs.snowflake.com\/en\/user-guide\/data-time-travel<\/a><\/p>\n<p>Within the context of Snowflake, the term &#8220;fail-safe&#8221; pertains to the system&#8217;s capability to prevent data loss, corruption, or compromise in the event of a system failure or outage. The fail-safe mechanisms implemented in Snowflake are intended to uphold data integrity and to reduce the possibility of any downtimes or data loss.<\/p>\n<p>For more details on Snowflake fail-safe, see the following documentation:<\/p>\n<p><a href=\"https:\/\/docs.snowflake.com\/en\/user-guide\/data-failsafe\">https:\/\/docs.snowflake.com\/en\/user-guide\/data-failsafe<\/a><\/p>\n<h2>Snowflake Tables<\/h2>\n<p>In this section, I will cover some of characteristics of Snowflake tables to help you understand the basic architecture of how tables are built and utilized.<\/p>\n<h3>Table Types<\/h3>\n<p>Snowflake provides 3 different types of tables: Temporary, Transient, and Permanent.<\/p>\n<h4>Temporary Tables<\/h4>\n<p>Temporary tables are like permanent tables with a caveat that they only exist for the current session. A temporary table is only visible to the user who created it and cannot be accessed by anyone else. When the session ends, the data in the temporary table gets purged.<\/p>\n<p>The common use case for temporary tables is storing data, which is non-permanent and can be recreated, like aggregates, etc. Temporary tables serve as a staging area for data prior to undergoing transformations or being loaded into permanent tables. They are handy for storing intermediate results during complex data processing tasks and can improve query performance by materializing intermediate results.<\/p>\n<p>Figure 6 shows the creation of a temporary table.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE OR REPLACE TEMPORARY TABLE DEMO_TEMP\r\n(\r\nNAME VARCHAR(100),\r\nEMPLOYER VARCHAR(100),\r\nID NUMBER\r\n); <\/pre>\n<p><strong>Figure 6: Code for creating a temporary table. <\/strong><\/p>\n<p>You can find the full syntax and additional options related to create table creation at the following link: <a href=\"https:\/\/docs.snowflake.com\/en\/sql-reference\/sql\/create-table\">https:\/\/docs.snowflake.com\/en\/sql-reference\/sql\/create-table<\/a>.<\/p>\n<h4>Transient Tables<\/h4>\n<p>Transient tables operate like permanent tables except for the fact that they do not have a fail-safe period associated with them. The maximum time travel retention period for transient tables is one day, while permanent tables in the Enterprise Edition or higher can retain data for up to 90 days. Data in the transient table is persisted until the table is explicitly dropped, so unlike a temporary table the transient table will remain even after the session it was created in ends.<\/p>\n<p>There is no fail-safe cost associated with it, but it does contribute to overall storage expenses for accounts.<\/p>\n<p>Figure 7 shows how to create a transient table.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE OR REPLACE TRANSIENT TABLE DEMO_TEMP\r\n(\r\nNAME VARCHAR,\r\nEMPLOYER VARCHAR,\r\nID NUMBER\r\n);          <\/pre>\n<p><strong>Figure 7: Code to create a transient table. <\/strong><\/p>\n<h4>Permanent Tables<\/h4>\n<p>Permanent tables have a fail-safe period and provide an additional level of data recovery and protection. A permanent table is created when data must be stored for a longer period. Permanent tables persist until explicitly dropped. Figure 8 shows the creation of a permanent table.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE OR REPLACE  TABLE DEMO_PERM\r\n(\r\nNAME VARCHAR(100),\r\nEMPLOYER VARCHAR(100),\r\nID NUMBER\r\n);<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Figure 8: Web UI creating a permanent table. <\/strong><\/p>\n<p>Inserting data into <code>DEMO_PERM<\/code> table:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">INSERT INTO DEMO_PERM \r\nVALUES ('Pankaj Gupta','Test Employer','1');\r\nINSERT INTO DEMO_PERM \r\nVALUES ('Ravi Jain','DataDog','2');<\/pre>\n<p>This will insert the following data as shown in Figure 9:<\/p>\n<p><strong><img loading=\"lazy\" decoding=\"async\" width=\"1313\" height=\"157\" class=\"wp-image-97037\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/word-image-97031-6.png\" \/><\/strong><\/p>\n<p><strong>Figure 9: Web UI showing inserted data. <\/strong><\/p>\n<p>Table 1 summarizes the differences between Snowflake table types and their impact on time travel and fail-safe functionality.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<p><strong>Table Type<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Persistence<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Time\u00a0Travel\u00a0Retention\u00a0Period\u00a0(Days)<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Fail-safe\u00a0Period\u00a0(Days)<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Temporary<\/p>\n<\/td>\n<td>\n<p>Remainder of session<\/p>\n<\/td>\n<td>\n<p>0 or 1 (default is 1)<\/p>\n<\/td>\n<td>\n<p>0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Transient<\/p>\n<\/td>\n<td>\n<p>Until explicitly dropped<\/p>\n<\/td>\n<td>\n<p>0 or 1 (default is 1)<\/p>\n<\/td>\n<td>\n<p>0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><a href=\"https:\/\/docs.snowflake.com\/en\/user-guide\/intro-editions.html\">Permanent (Standard Edition)<\/a><\/p>\n<\/td>\n<td>\n<p>Until explicitly dropped<\/p>\n<\/td>\n<td>\n<p>0 or 1 (default is 1)<\/p>\n<\/td>\n<td>\n<p>7<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><a href=\"https:\/\/docs.snowflake.com\/en\/user-guide\/intro-editions.html\">Permanent (Enterprise Edition and higher)<\/a><\/p>\n<\/td>\n<td>\n<p>Until explicitly dropped<\/p>\n<\/td>\n<td>\n<p>0 to 90 (default is configurable)<\/p>\n<\/td>\n<td>\n<p>7<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Table 1. Differences Between Snowflake Table Types.<\/p>\n<h3>Clustering Keys and Query Performance<\/h3>\n<p>In this section we will consider clustering keys in Snowflake and their effect on query performance for large tables.<\/p>\n<p>The clustering key in Snowflake is a designated column or set of columns that dictates the physical storage order of data within a table. It plays a crucial role in determining how data is structured and grouped together on disk, leading to notable effects on query performance and the efficiency of data retrieval. By carefully selecting the clustering key, you can optimize the organization of data on disk, resulting in improved query performance and faster access to relevant information.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"674\" height=\"293\" class=\"wp-image-97038\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-picture-containing-text-screenshot-number-fon.png\" alt=\"A picture containing text, screenshot, number, font\n\nDescription automatically generated\" \/><\/p>\n<p>Fig. 14 shows a cluster table where data with same month in kept in same micro partitions.<\/p>\n<p>Within Snowflake&#8217;s architecture, micro-partitions refer to individual units of data storage that comprise a section of a table&#8217;s data. These micro-partitions are designed to contain a relatively small amount of data, usually ranging between 50 MB to 500 MB. Moreover, data stored within each micro-partition is compressed, columnar-organized, and saved in a specialized format, which is fine-tuned to optimize query processing performance.<\/p>\n<p>Clustering in snowflake relates to arrangement of rows co-located with similar rows in the same micro-partitions. The effect of clustering can be seen for very large tables (larger than 1+ TB) when ordering of data when initially loaded data was not ordered or an excessive number of Data Manipulation Language (DML) operations have been performed against the table (like <code>INSERT<\/code>, <code>UPDATE<\/code>, etc.), causing the table\u2019s natural clustering to degrade over a period.<\/p>\n<p>When a clustering key is defined on a table, the automatic clustering service will co-locate rows with similar values in their cluster key columns to the same micro-partitions.<\/p>\n<p><strong>The Benefits of Defining Clustering Keys:<\/strong><\/p>\n<ol>\n<li>Better scan efficiency, making your queries execute more efficiently.<\/li>\n<li>Less table maintenance is required if auto-clustering is enabled, and data is not being inserted in order.<\/li>\n<li>Better column compression.<\/li>\n<\/ol>\n<p>A clustering key can be defined on a table at the time of creation or by using the <code>ALTER TABLE<\/code> statement if the table already exists.<\/p>\n<p>To demonstrate, we will create a table without any clustering key and insert data in it, using the following code. The results of this query are shown in Figure 15.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE OR REPLACE TABLE DEMO_NO_CLUSTERING (\r\nEMP_ID NUMBER(10,0),\r\nDEPT_ID NUMBER(4,0),\r\nEMP_NAME VARCHAR(4),\r\nHOME_ADDR VARCHAR(1000)\r\n);\r\nINSERT OVERWRITE INTO DEMO_NO_CLUSTERING\r\nSELECT\r\nseq4() AS EMP_ID,\r\nuniform(1, 1000, random()) AS DEPT_ID,\r\nrandstr(4, random()) AS EMP_NAME,\r\nrandstr(1000, random()) AS HOME_ADDR\r\nFROM table(generator(rowcount =&gt; 550000));<\/pre>\n<p>This will return:<\/p>\n<p><strong><img loading=\"lazy\" decoding=\"async\" width=\"308\" height=\"139\" class=\"wp-image-97039\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-screenshot-of-a-graph-description-automatically.png\" alt=\"A screenshot of a graph\n\nDescription automatically generated with low confidence\" \/><\/strong><\/p>\n<p><strong>Figure 15: Snowflake web UI showing SQL run of query.<\/strong><\/p>\n<p>We can then select a record from our sample table using a SELECT query with a WHERE clause filter, as shown in the following code:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT * \r\nFROM DEMO_NO_CLUSTERING \r\nWHERE DEPT_ID = 1;<\/pre>\n<p>The output is shown in Figure 16.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1757\" height=\"349\" class=\"wp-image-97040\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-screenshot-of-a-computer-description-automatica.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with low confidence\" \/><\/p>\n<p><strong>Figure 16. SELECT from sample non-clustered table.<\/strong><\/p>\n<h2>Semi-Structured Data in Snowflake<\/h2>\n<p>Here we will look at the power of Snowflake\u2019s semi-structured data handling capability. We will create a table with the <code>VARIANT<\/code> data type and use it to load semi-structured <code>JSON<\/code> data. Figure 10 shows the creation of a table with a <code>VARIANT<\/code> data type column and loading <code>JSON<\/code> data into it.<\/p>\n<p>For example:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE OR REPLACE   TABLE DEMO_TEMP_VARIANT\r\n(\r\n  LOCATION VARIANT\r\n);<\/pre>\n<p>Insert below test data for loading into table.<\/p>\n<pre class=\"lang:none theme:none\">INSERT INTO DEMO_TEMP_VARIANT \r\nSELECT parse_json(column1)\r\n  from values\r\n  ('{\r\n    \"id\": \"1234\",\r\n    \"name\": {\r\n      \"first\": \"Pankaj\",\r\n      \"last\": \"Gupta\"\r\n    },\r\n    \"company\": \"demo_company\",\r\n    \"email\": \"pankaj.gupta@demo_company.info\",\r\n    \"phone\": \"+1 (123) 456-7890\",\r\n    \"address\": \r\n      \"268 Havens Place, Dunbar, Rhode Island, 7725\"\r\n  }')\r\n  , ('{\r\n    \"id\": \"5678\",\r\n    \"name\": {\r\n      \"first\": \"Arpan\",\r\n      \"last\": \"Rai\"\r\n    },\r\n    \"company\": \"DIGIGEN\",\r\n    \"email\": \"Arpan.Rai@DIGIGEN.net\",\r\n    \"phone\": \"+1 (111) 222-3021\",\r\n    \"address\": \r\n     \"441 Dover Street, Ada, New Mexico, 5922\"\r\n  }');<\/pre>\n<p><strong>Figure 10: Running SQL and INSERT query in Snowflake web UI.<\/strong><\/p>\n<p>Once our <code>JSON<\/code> data has been inserted into the Snowflake table, we can view it with a <code>SELECT<\/code> query, as shown in Figure 11.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT  * \r\nFROM DEMO_TEMP_VARIANT;<\/pre>\n<p>Executing this code will return:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1798\" height=\"399\" class=\"wp-image-97041\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-screenshot-of-a-computer-description-automatica-1.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with medium confidence\" \/><\/p>\n<p><strong>Figure 11: Showing data in Snowflake web UI.<\/strong><\/p>\n<p>Now we\u2019ll consider the technique for parsing the <code>JSON<\/code> data in this table. One thing to keep in mind is that <code>VARIANT<\/code> values are not strings; <code>VARIANT<\/code> values contain strings. The operators \u201c:\u201d and subsequent \u201c.\u201d and \u201c[]\u201d always return <code>VARIANT<\/code> values containing strings.<\/p>\n<p>Below are some pre-defined functions which can be used with variant data.<\/p>\n<ul>\n<li><code>PARSE_JSON<\/code> &#8211; pre-defined function in Snowflake that facilitates the extraction of data from strings that are formatted in <code>JSON<\/code>. This function takes a solitary parameter, which is a string that contains the data formatted as <code>JSON<\/code>, and subsequently returns a variant data type that can be used in SQL queries.<\/li>\n<li><code>CHECK_JSON<\/code> built-in function in Snowflake that validates whether a string contains valid <code>JSON<\/code> data. This function can be used to check if a given string is formatted correctly as <code>JSON<\/code> and returns a Boolean value of <code>TRUE<\/code> or <code>FALSE<\/code><\/li>\n<\/ul>\n<p><strong>Sample query <\/strong><\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT CHECK_JSON( \r\n'\r\n{\r\n    \"id\": \"1234\",\r\n    \"name\": {\r\n      \"first\": \"Pankaj\",\r\n      \"last\": \"Gupta\"\r\n    }\r\n  }')\r\nas is_valid;<\/pre>\n<p>Using Snowflake\u2019s semi-structured data extraction operators, our <code>JSON<\/code> data can simply be fetched in a flat table format using a simple query like the one shown in Figure 12.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT\r\n    (LOCATION:id) as id,\r\n    (LOCATION:name:first) as first_name,\r\n    (LOCATION:name:last) as last_name,\r\n    (LOCATION:company) as company,\r\n    (LOCATION:email) as email,\r\n    (LOCATION:phone) as phone,\r\n    (LOCATION:address) as address\r\nFROM\r\n    DEMO_TEMP_VARIANT;<\/pre>\n<p><strong>Figure 12:SQL for parsing JSON in Snowflake web UI.<\/strong><\/p>\n<p>Figure 13 shows the results of our JSON extraction query.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1284\" height=\"187\" class=\"wp-image-97042\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-screenshot-of-a-computer-description-automatica-2.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with low confidence\" \/><\/p>\n<p><strong>Figure 13: Flattened semi structured JSON data.<\/strong><\/p>\n<h2>Query History and Profiling<\/h2>\n<p>To look as the query plan of queries that have been executed. You<\/p>\n<p>can go to query profiling page through activity tab \ud83e\udc6a <strong>Query History<\/strong>:<\/p>\n<p><strong><img loading=\"lazy\" decoding=\"async\" width=\"996\" height=\"636\" class=\"wp-image-97043\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/graphical-user-interface-text-application-descr.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/strong><\/p>\n<p><strong>Figure 17: query history tab <\/strong><\/p>\n<p>Select the query for which you want to see profile.<\/p>\n<p>Now when we look at the query profile, we notice this table had 23 micro-partitions, and all 23 were scanned while searching for records where <code>DEPT_ID = 1<\/code>. This result is shown in Figure 18.<\/p>\n<p><strong><img loading=\"lazy\" decoding=\"async\" width=\"1806\" height=\"787\" class=\"wp-image-97044\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/graphical-user-interface-description-automaticall.png\" alt=\"Graphical user interface\n\nDescription automatically generated\" \/><\/strong><\/p>\n<p><strong>Figure 18: Snowflake web UI showing query profile result for table with no clustering key.<\/strong><\/p>\n<p>Now let\u2019s create an equivalent table with a clustering key and assess how same select query works. The query in Figure 19 creates our new table with a clustering key.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE OR REPLACE TABLE DEMO_CLUSTERING \r\n                   cluster by (DEPT_ID) (\r\nEMP_ID NUMBER(10,0),\r\nDEPT_ID NUMBER(4,0),\r\nEMP_NAME VARCHAR(4),\r\nHOME_ADDR VARCHAR(1000)\r\n);\r\n    \r\nINSERT OVERWRITE INTO DEMO_CLUSTERING\r\nSELECT\r\nseq4() AS EMP_ID,\r\nuniform(1, 1000, random()) AS DEPT_ID,\r\nrandstr(4, random()) AS EMP_NAME,\r\nrandstr(1000, random()) AS HOME_ADDR\r\nFROM table(generator(rowcount =&gt; 550000)) \r\nORDER BY DEPT_ID;<\/pre>\n<p><strong>Figure 19: DDL and SQL for execution in snowflake web UI<\/strong><\/p>\n<p>Now run the equivalent <code>SELECT<\/code> query from our previous example against the new table, as shown.<\/p>\n<p>&nbsp;<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT *\r\nFROM DEMO_CLUSTERING\r\nWHERE DEPT_ID = 1;<\/pre>\n<p>When we look at the query profile, we can clearly see that with a clustering key enabled on the <code>DEPT_ID<\/code> column, this query scans only 1 partition of the 23 available. If there are thousands of partitions, then clustering key can increase CPU resource efficiency and result in more efficient execution. The key thing here is that data should be inserted for the clustering key in the sorted order in our table. This is why we apply the <code>ORDER BY<\/code> clause to the <code>SELECT<\/code> query that feeds our <code>INSERT<\/code> statement. If you do not load data in sorted order for <code>DEPT_ID<\/code>, it will not organize micro-partitions properly and will still scan all the micro-partitions. Figure 20 shows our query profile after applying a clustering key and inserting our data into the table in clustering key order.<\/p>\n<p><strong><img loading=\"lazy\" decoding=\"async\" width=\"1812\" height=\"782\" class=\"wp-image-97045\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/graphical-user-interface-description-automaticall-1.png\" alt=\"Graphical user interface\n\nDescription automatically generated\" \/><\/strong><\/p>\n<p><strong>Figure 20: Snowflake web UI showing query profile result for table with clustering key.<\/strong><\/p>\n<p>Within most traditional database management systems (DBMSs), utilization of the <code>ORDER BY<\/code> clause within a <code>SELECT<\/code> query that feeds a DML statement, such as <code>UPDATE<\/code> or <code>DELETE<\/code>, does not inevitably guarantee ordering of the resulting data. Consequently, rows retrieved by such a query might not consistently maintain the same order every time the query is executed, even if the <code>ORDER BY<\/code> clause is used.<\/p>\n<p>Nevertheless, this characteristic distinguishes Snowflake from most other DBMSs. When using Snowflake, the <code>ORDER BY<\/code> clause in a <code>SELECT<\/code> query that feeds a DML statement does guarantee ordering of the resulting data. Hence, rows returned by the query will persistently maintain the same order, regardless of how many times the query is executed.<\/p>\n<p>This guarantee of ordering can be attributed to Snowflake&#8217;s unique architecture. Its micro-partitioning technique and query optimization methods permit highly efficient and parallelized query execution, resulting in ordered outputs even when using the <code>ORDER BY<\/code> clause in a DML statement.<\/p>\n<h2>Summary<\/h2>\n<p>In this article we considered the basic differences between an on-premises infrastructure and a Cloud-based infrastructure. We then discussed the Snowflake architecture and created a new virtual warehouse. We discussed the different types of tables Snowflake supports and created each type of table. We also covered how to load semi-structured data in Snowflake in a <code>VARIANT<\/code> data type column, and how to extract <code>JSON<\/code> data in a tabular format. Finally, we discussed clustering keys and performed a practical demonstration that showed the difference between tables created with and without a clustering key.<\/p>\n<h2>References<\/h2>\n<p><strong>Snowflake Documentation: <\/strong><a href=\"https:\/\/docs.snowflake.com\/en\/\">https:\/\/docs.snowflake.com\/en\/<\/a><\/p>\n<p>Here are a few popular Cloud providers you can visit to get additional information. There are many more, and this article does not endorse any specific one. The links are for informational purposes only.<\/p>\n<p><strong>Amazon Web Services:<\/strong> <a href=\"https:\/\/aws.amazon.com\/\">https:\/\/aws.amazon.com\/<\/a><\/p>\n<p><strong>Microsoft Azure:<\/strong> <a href=\"https:\/\/azure.microsoft.com\/en-us\/\">Cloud Computing Services | Microsoft Azure<\/a><\/p>\n<p><strong>Google Cloud Platform:<\/strong> <a href=\"https:\/\/cloud.google.com\/\">Cloud Computing Services | Google Cloud<\/a><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article explains what Snowflake data warehouse is and how it is different from other traditional data warehouses. This article will cover how to create virtual warehouses and different types of tables, how to handle semi-structured data, how to profile queries, and the benefits of clustering keys. Practical demonstrations of the concepts explained will be&#8230;&hellip;<\/p>\n","protected":false},"author":341000,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,159001],"tags":[],"coauthors":[159000],"class_list":["post-97031","post","type-post","status-publish","format-standard","hentry","category-featured","category-snowflake"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341000"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=97031"}],"version-history":[{"count":3,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97031\/revisions"}],"predecessor-version":[{"id":97047,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97031\/revisions\/97047"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=97031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=97031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=97031"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=97031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}