Synapse Serverless and Dedicated Pool: The differences no one told you about

The basic differences between Synapse Serverless and the Dedicated Pool are obvious: the serverless pool doesn't store data, it only accesses data from storage accounts and scales the MPP environment automatically, while the dedicated SQL Pool keeps a static number of servers according to the service level we choose and a constant number of distributions (always 60).

Covering these basic differences, I wrote three articles about Synapse Serverless:

  1. How to query blob storage with SQL using Azure Synapse
  2. How to query private blob storage with SQL and Azure Synapse
  3. Performance of querying blob storage with SQL 

I also wrote an article to help you decide whether your volume of data creates the need for a dedicated SQL Pool, and I have YouTube videos explaining the internal details of a dedicated SQL Pool.

Besides all these obvious differences, there are some differences in drivers and SQL syntax which are easily missed. Let's talk about them.

OPENROWSET is not supported in the Synapse Dedicated SQL Pool

The OPENROWSET syntax we use in the serverless pool is not supported in the dedicated SQL Pool. In the dedicated SQL Pool we are required to create an external data source, an external file format and an external table. Very frequently we will need to create a database scoped credential as well.
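
For comparison, this is a sketch of the serverless OPENROWSET syntax the dedicated pool rejects (the path reuses the storage account from the example below; adjust it to your own files):

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://lakedemo.dfs.core.windows.net/opendatalake/trips/*.parquet',
    FORMAT = 'PARQUET'
) AS trips;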

Example of the objects required in the dedicated SQL Pool:

CREATE DATABASE SCOPED CREDENTIAL [WorkspaceIdentity]
WITH IDENTITY = 'Managed Identity';
GO
 
CREATE EXTERNAL DATA SOURCE [sourceNativeABFS] WITH
(
     LOCATION = 'abfss://opendatalake@lakedemo.dfs.core.windows.net/',
     CREDENTIAL = [WorkspaceIdentity]
);
GO
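
The external table below references a file format named [parquetcompressed], which must exist before the table is created. A minimal sketch for snappy-compressed parquet files could look like this (the format name and compression codec are my assumptions, not part of the original example):

CREATE EXTERNAL FILE FORMAT [parquetcompressed]
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
GO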
 
CREATE EXTERNAL TABLE nativeTable
(
    [DateID] int,
    [MedallionID] int,
    [HackneyLicenseID] int,
    [PickupTimeID] int,
    [DropoffTimeID] int,
    [PickupGeographyID] int,
    [DropoffGeographyID] int,
    [PickupLatitude] float,
    [PickupLongitude] float,
    [PickupLatLong] varchar(50),
    [PickupLatLong2] varchar(50),
    [DropoffLatitude] float,
    [DropoffLongitude] float,
    [DropoffLatLong] varchar(50),
    [DropoffLatLong2] varchar(50),
    [PassengerCount] int,
    [TripDurationSeconds] int,
    [TripDistanceMiles] float,
    [PaymentType] varchar(50),
    [FareAmount] float,
    [SurchargeAmount] float,
    [TaxAmount] float,
    [TipAmount] float,
    [TollsAmount] float,
    [TotalAmount] float
)
WITH (
    LOCATION = 'trips/',
    DATA_SOURCE = [sourceNativeABFS],
    FILE_FORMAT = [parquetcompressed]
);

The access drivers are different

The serverless SQL Pool uses a native driver to access Azure storage accounts, always over HTTPS.

The Dedicated Pool, on the other hand, uses PolyBase. PolyBase is a technology that has been evolving since SQL Server 2016. The current PolyBase is very different from the old one from those days, but still, this is what the dedicated pool uses.

Due to this difference, the syntax of the statements to access external data is different. But to add more salt to this salad, the dedicated pool now supports a native driver to access storage accounts as well. However, the native driver in the dedicated pool has different capabilities from the native driver used in the serverless pool.

Let’s make a table with some of the differences:

Feature | Serverless SQL Pool | Dedicated SQL Pool
--- | --- | ---
Supported drivers | Native | Native and Hadoop
Protocols supported by the native driver | https | abfss, wasbs and https
Supported sources | Storage accounts | Multiple sources, using Hadoop
Storage authentication | Keys, SAS, Managed Identity | Managed Identity is only supported with abfss; https doesn't support managed identity, the opposite of the serverless pool
Native driver capabilities | Read-write, with CETAS support | Read-only, no CETAS support
CETAS | Only works with the ".dfs" endpoint | Only works with the Hadoop driver; the protocol and endpoint (abfss or wasbs) don't matter

Difference between Native and Hadoop Data Sources

In the dedicated pool, the native driver is read-only. It's not possible, for example, to use a CETAS statement to create a table through a native data source.
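
A sketch of the failing attempt, reusing the objects from the earlier example (the table name exportedTrips and the export location are hypothetical):

CREATE EXTERNAL TABLE exportedTrips
WITH (
    LOCATION = 'export/trips/',
    DATA_SOURCE = [sourceNativeABFS],   -- native data source: CETAS fails here
    FILE_FORMAT = [parquetcompressed]
)
AS
SELECT * FROM nativeTable;
GO

If you try to do so, the resulting error message doesn't explain much about the problem: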

Msg 110802, Level 16, State 1, Line 2
110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Error occurred while accessing HDFS external file[/files/QID21876782_20230303_91720_0.parq.snappy][0]: Java exception raised on call to HdfsBridge_CreateRecordWriter. Java exception message:
HdfsBridge::createRecordWriter - Unexpected error encountered when creating a record writer: ClassCastException: class com.microsoft.polybase.storage.input.ParquetFileToPaxBlockInputFormat cannot be cast to class com.microsoft.polybase.storage.output.SharedWriterOutputFormat (com.microsoft.polybase.storage.input.ParquetFileToPaxBlockInputFormat and com.microsoft.polybase.storage.output.SharedWriterOutputFormat are in unnamed module of loader 'app')

There is one small difference in the syntax between a native source and a Hadoop source: the TYPE attribute.

Native Data Source example:

CREATE EXTERNAL DATA SOURCE [sourceNativeABFS] WITH
(
    LOCATION = 'abfss://opendatalake@lakedemo.dfs.core.windows.net/',
    CREDENTIAL = [WorkspaceIdentity]
);
GO

Hadoop Data Source example:

CREATE EXTERNAL DATA SOURCE [sourceHadoopABFS] WITH
(
    TYPE = HADOOP,
    LOCATION = 'abfss://opendatalake@lakedemo.dfs.core.windows.net/',
    CREDENTIAL = [WorkspaceIdentity]
);
GO

Partitioning

Synapse Serverless implemented partitioning techniques through the OPENROWSET function, which I explained in my article about the performance of storage access from the serverless pool. Since then, there has been a race to see which one will implement partitioning first: SQL Server, Azure SQL, Managed Instance or the Dedicated SQL Pool.

The dedicated SQL Pool has not implemented it yet; for now, partitioning is a feature limited to the serverless SQL Pool.
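
For reference, this is a sketch of the partitioning technique in the serverless pool, using wildcards and the filepath() function to eliminate folders from the scan (the year=/month= folder layout is a hypothetical example on the same storage account used earlier):

SELECT COUNT(*) AS tripCount
FROM OPENROWSET(
    BULK 'https://lakedemo.dfs.core.windows.net/opendatalake/trips/year=*/month=*/*.parquet',
    FORMAT = 'PARQUET'
) AS trips
WHERE trips.filepath(1) = '2019'   -- first wildcard: only year=2019 folders are read
  AND trips.filepath(2) = '6';     -- second wildcard: only month=6 folders are read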

Protocol Differences

  • Only abfss supports managed identity authentication in the dedicated pool.
  • https supports managed identity in the serverless pool, but not in the dedicated pool.
  • wasbs requires the ".blob." storage endpoint.
  • abfss requires the ".dfs." storage endpoint (see the location examples after this list).
  • wasbs is optimized for storage with Hierarchical Namespace disabled, but works with it enabled as well.
  • In a test with parquet format and 20,000 records, wasbs and abfss had similar performance (1 second) while https executed in 6 seconds.
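
To illustrate the protocol and endpoint pairing, here are sketches of LOCATION values, reusing the storage account from the earlier examples:

LOCATION = 'abfss://opendatalake@lakedemo.dfs.core.windows.net/'    -- abfss pairs with the .dfs. endpoint
LOCATION = 'wasbs://opendatalake@lakedemo.blob.core.windows.net/'   -- wasbs pairs with the .blob. endpoint
LOCATION = 'https://lakedemo.dfs.core.windows.net/opendatalake'     -- https form, as used by the serverless pool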

Native vs Hadoop

The native mode is intended to have better performance than the Hadoop mode for read-only operations, but the Hadoop mode provides additional sources and read-write behaviour. Besides that, the data type mapping may be different: especially with parquet files, data types which work in native mode may sometimes need to be changed for the Hadoop mode.

Summary

The many flavours of SQL have so many slight variations that I hope this blog can shed some light on a few of the differences and help you avoid some mistakes which generate error messages without detailed explanations.