Synapse Serverless and Dedicated Pool: The differences no one told you about

The basic differences between Synapse Serverless and the Dedicated Pool are obvious: the serverless pool doesn't store data, it only accesses data from storage accounts and scales the MPP environment automatically, while the dedicated SQL Pool keeps a static number of servers according to the service level we choose and a constant number of distributions (always 60).

Covering these basic differences, I wrote three articles about Synapse Serverless:

  1. How to query blob storage with SQL using Azure Synapse
  2. How to query private blob storage with SQL and Azure Synapse
  3. Performance of querying blob storage with SQL 

I also wrote an article to help you decide whether your volume of data creates the need for a dedicated SQL Pool, and I have YouTube videos explaining the internal details of a dedicated SQL Pool.

Besides all these obvious differences, there are some differences in drivers and SQL syntax which are easily missed. Let's talk about them.

OPENROWSET is not supported in the Synapse Dedicated SQL Pool

The OPENROWSET syntax we use in the serverless pool is not supported in the dedicated SQL Pool. In the dedicated SQL Pool we are required to create an external data source, an external file format and an external table. Very frequently we will need to create a database scoped credential as well.
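
For comparison, this is a sketch of the serverless OPENROWSET syntax the dedicated pool rejects (the path reuses the storage account from the example below; adjust it to your own files):

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://lakedemo.dfs.core.windows.net/opendatalake/trips/*.parquet',
    FORMAT = 'PARQUET'
) AS trips;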

Example of the objects required in the dedicated SQL Pool:

CREATE DATABASE SCOPED CREDENTIAL [WorkspaceIdentity]
WITH IDENTITY = 'Managed Identity';
GO
 
CREATE EXTERNAL DATA SOURCE [sourceNativeABFS] WITH
(
     LOCATION = 'abfss://opendatalake@lakedemo.dfs.core.windows.net/',
     CREDENTIAL = [WorkspaceIdentity]
);
GO
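
The external table below references a file format named [parquetcompressed], which must exist before the table is created. A minimal sketch for snappy-compressed parquet files could look like this (the format name and compression codec are my assumptions, not part of the original example):

CREATE EXTERNAL FILE FORMAT [parquetcompressed]
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
GO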
 
CREATE EXTERNAL TABLE nativeTable
(
    [DateID] int,
    [MedallionID] int,
    [HackneyLicenseID] int,
    [PickupTimeID] int,
    [DropoffTimeID] int,
    [PickupGeographyID] int,
    [DropoffGeographyID] int,
    [PickupLatitude] float,
    [PickupLongitude] float,
    [PickupLatLong] varchar(50),
    [PickupLatLong2] varchar(50),
    [DropoffLatitude] float,
    [DropoffLongitude] float,
    [DropoffLatLong] varchar(50),
    [DropoffLatLong2] varchar(50),
    [PassengerCount] int,
    [TripDurationSeconds] int,
    [TripDistanceMiles] float,
    [PaymentType] varchar(50),
    [FareAmount] float,
    [SurchargeAmount] float,
    [TaxAmount] float,
    [TipAmount] float,
    [TollsAmount] float,
    [TotalAmount] float
)
WITH (
    LOCATION = 'trips/',
    DATA_SOURCE = [sourceNativeABFS],
    FILE_FORMAT = [parquetcompressed]
);

The access drivers are different

The serverless SQL Pool uses a native driver to access Azure storage accounts, always over HTTPS.

The Dedicated Pool, on the other hand, uses PolyBase. PolyBase is a technology that has been evolving since SQL Server 2016. The current PolyBase is very different from the old one from those days, but still, this is what the dedicated pool uses.

Due to this difference, the syntax of the statements to access external data is different. But to add more salt to this salad, the dedicated pool now supports a native driver to access storage accounts as well. However, the native driver in the dedicated pool has different capabilities from the native driver used in the serverless pool.

Let’s make a table with some of the differences:

Feature | Serverless SQL Pool | Dedicated SQL Pool
--- | --- | ---
Supported drivers | Native | Native and Hadoop
Protocols supported by the native driver | https | abfss, wasbs and https
Supported sources | Storage accounts | Multiple sources, using Hadoop
Storage authentication | Keys, SAS, Managed Identity | Managed Identity is only supported with abfss; https doesn't support managed identity, the opposite of the serverless pool
Native driver capabilities | Read-write, with CETAS support | Read-only, no CETAS support
CETAS | Only works with the ".dfs" endpoint | Only works with the Hadoop driver; the protocol and endpoint (abfss or wasbs) don't matter

Difference between Native and Hadoop Data Sources

In the dedicated pool, the native driver is read-only. It's not possible, for example, to use a CETAS statement to create a table through a native data source.
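
A sketch of the failing attempt, reusing the objects from the earlier example (the table name exportedTrips and the export location are hypothetical):

CREATE EXTERNAL TABLE exportedTrips
WITH (
    LOCATION = 'export/trips/',
    DATA_SOURCE = [sourceNativeABFS],   -- native data source: CETAS fails here
    FILE_FORMAT = [parquetcompressed]
)
AS
SELECT * FROM nativeTable;
GO

If you try to do so, the resulting error message doesn't explain much about the problem: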

Msg 110802, Level 16, State 1, Line 2
110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Error occurred while accessing HDFS external file[/files/QID21876782_20230303_91720_0.parq.snappy][0]: Java exception raised on call to HdfsBridge_CreateRecordWriter. Java exception message:
HdfsBridge::createRecordWriter - Unexpected error encountered when creating a record writer: ClassCastException: class com.microsoft.polybase.storage.input.ParquetFileToPaxBlockInputFormat cannot be cast to class com.microsoft.polybase.storage.output.SharedWriterOutputFormat (com.microsoft.polybase.storage.input.ParquetFileToPaxBlockInputFormat and com.microsoft.polybase.storage.output.SharedWriterOutputFormat are in unnamed module of loader 'app')

There is one small difference in the syntax between a native source and a Hadoop source: the TYPE attribute.

Native Data Source example:

CREATE EXTERNAL DATA SOURCE [sourceNativeABFS] WITH
(
    LOCATION = 'abfss://opendatalake@lakedemo.dfs.core.windows.net/',
    CREDENTIAL = [WorkspaceIdentity]
);
GO

Hadoop Data Source example:

CREATE EXTERNAL DATA SOURCE [sourceHadoopABFS] WITH
(
    TYPE = HADOOP,
    LOCATION = 'abfss://opendatalake@lakedemo.dfs.core.windows.net/',
    CREDENTIAL = [WorkspaceIdentity]
);
GO

Partitioning

Synapse Serverless implemented partitioning techniques through the OPENROWSET function, which I explained in my article about the performance of storage access from the serverless pool. Since then, there has been a race to see which one will implement partitioning first: SQL Server, Azure SQL, Managed Instance or the Dedicated SQL Pool.

The dedicated SQL Pool has not implemented it yet; for now, partitioning is a feature limited to the serverless SQL Pool.
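
For reference, this is a sketch of the partitioning technique in the serverless pool, using wildcards and the filepath() function to eliminate folders from the scan (the year=/month= folder layout is a hypothetical example on the same storage account used earlier):

SELECT COUNT(*) AS tripCount
FROM OPENROWSET(
    BULK 'https://lakedemo.dfs.core.windows.net/opendatalake/trips/year=*/month=*/*.parquet',
    FORMAT = 'PARQUET'
) AS trips
WHERE trips.filepath(1) = '2019'   -- first wildcard: only year=2019 folders are read
  AND trips.filepath(2) = '6';     -- second wildcard: only month=6 folders are read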

Protocol Differences

  • Only abfss supports managed identity authentication in the dedicated pool.
  • https supports managed identity in the serverless pool, but not in the dedicated pool.
  • wasbs requires the ".blob." storage endpoint.
  • abfss requires the ".dfs." storage endpoint (see the location examples after this list).
  • wasbs is optimized for storage with Hierarchical Namespace disabled, but works with it enabled as well.
  • In a test with parquet format and 20,000 records, wasbs and abfss had similar performance (1 second) while https executed in 6 seconds.
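
To illustrate the protocol and endpoint pairing, here are sketches of LOCATION values, reusing the storage account from the earlier examples:

LOCATION = 'abfss://opendatalake@lakedemo.dfs.core.windows.net/'    -- abfss pairs with the .dfs. endpoint
LOCATION = 'wasbs://opendatalake@lakedemo.blob.core.windows.net/'   -- wasbs pairs with the .blob. endpoint
LOCATION = 'https://lakedemo.dfs.core.windows.net/opendatalake'     -- https form, as used by the serverless pool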

Native vs Hadoop

The native mode is intended to have better performance than the Hadoop mode for read-only operations, but the Hadoop mode provides additional sources and read-write behaviour. Besides that, the data type mapping may be different: especially with parquet files, data types which work in native mode may sometimes need to be changed for the Hadoop mode.

Summary

The many flavours of SQL have so many slight variations that I hope this blog can shed some light on a few of the differences and help you avoid some mistakes which generate error messages without detailed explanations.