Synapse Serverless: CETAS requires DFS to work

CETAS is the acronym for Create External Table As Select. This allow us to export the result of a select to an external table in different formats, such as PARQUET or CSV.

We can also convert the format from one external table to another. The SELECT used on the CETAS can be an external table query. In this way we would be converting files in a data lake from one format to another. For example, we could convert CSV files to PARQUET.

The following query is an example of CETAS:

CREATE EXTERNAL TABLE csvsampletables
WITH (
      LOCATION = ‘filescsv/’,
      DATA_SOURCE = SynapseSQLwriteable,
      FILE_FORMAT = ParquetFF
)
AS
select *, tradedetail.filepath(1) as datetime,
        tradedetail.filepath(2) as ticks from
OPENROWSET(
        BULK ‘https://euwe01devqigsa01.blob.core.windows.net/staging/TradeDetail/’,
        FORMAT=‘CSV’,
        HEADER_ROW=true,
        parser_version=‘2.0’
) AS [tradedetail]

This query has two blob addresses: The SELECT query uses one address, while the create table uses a different address hidden in the Data Source called SynapseSQLWritable.

While making some CETAS tests, I discovered an interesting new behaviour. The following error message was displayed and it was very strange:

Msg 16539, Level 16, State 1, Line 1
Operation failed since the external data source ‘https://euwe01devqigsa01.blob.core.windows.net/dennes/filescsv/’ has underlying storage account that is not in the list of Outbound Firewall Rules on the server. Please add this storage account to the list of Outbound Firewall Rules on your server and retry the operation.

The Solution For CETAS

The error message doesn’t make much sense, because the Synapse Serverless Pool doesn’t have exposed Outbound Firewall Rules. In fact, the root cause and solution for this message is completely different from the message itself: CETAS is not supported using the BLOB protocol, only with the data lake protocol (DFS).

Mind the query using the BLOB protocol: This IS NOT the problem. The query can use the blob protocol with absolutely no problem. The problem is hidden in the data source called SynapseSQLWritetable. We will need to drop this data source and create again using the DFS protocol:

drop external data source [SynapseSQLwriteable]

CREATE EXTERNAL DATA SOURCE [SynapseSQLwriteable] WITH (
         LOCATION = ‘https://euwe01devqigsa01.dfs.core.windows.net/dennes/’,
         CREDENTIAL = [WorkspaceIdentity]
);GO

Once the data source is replaced, executing the query again it will work.

More About Synapse Serverless

  1. How to query blob storage with SQL using Azure Synapse
  2. How to query private blob storage with SQL and Azure Synapse
  3. Performance of querying blob storage with SQL