SQL Server 2025: Zstandard Backup Compression

This article is part of a series of posts on the forthcoming SQL Server 2025 release. Once 2025 has been released, these posts will remain as-is where the feature hasn't changed for the final release. You can find the other posts in this series in this collection.

 

Whether you are a bank or a hot dog stand, creating backups is a boring but essential part of managing databases. Compressing backups – like other types of data compression – can save time and storage space, at the usually unavoidable cost of CPU. There has been little change in compression throughout SQL Server’s long history, but this year, in SQL Server 2025, there is an exciting change coming.

But first…

A very brief history of backup compression

In the SQL Server 2005 timeframe, there were several vendors who offered backup management products that included compression as a feature, since SQL Server did not offer this capability. The three I remember using were Redgate SQL Backup, Idera SQL Safe Backup, and Quest LiteSpeed. These products still exist but, in SQL Server 2008, Microsoft started taking their lunch by adding native backup compression to Enterprise Edition. By SQL Server 2012, it had made its way to all editions. These products offer other functionality too but, if all you were after was some compression, you no longer needed a 3rd party tool.

Microsoft has been using the MS_XPRESS compression algorithm, which they have openly documented since 2011.

In my experience, the native backup compression has always worked well. So much so that, at least a decade ago, I adopted the firm stance that all backups should use compression.

Nothing much in this landscape really changed until SQL Server 2022, when they added an algorithm called QAT (“QuickAssist Technology”), which is most effective when used with specific Intel hardware. I haven’t played with this algorithm, but Glenn Berry has written about it.

My quick take: It is complex and disruptive to configure and feels like overkill if you can’t get the full benefit. Which also explains why I haven’t tried it.

And now…

Enter Zstandard

SQL Server 2025 adds the option to use Zstandard (“ZSTD”) compression, which promises to produce better results than MS_XPRESS while being easier to configure than QAT. ZSTD as a technology has been around for nine years; it was introduced by Facebook’s Yann Collet and Chip Turner back in 2016.

Now you will be able to say:
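Something along these lines, at least as of CTP 2.0 – the database name and path here are just for illustration:

BACKUP DATABASE StackOverflow
  TO DISK = N'X:\backup\StackOverflow.bak'
  WITH COMPRESSION (ALGORITHM = ZSTD);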

And you can specify the level (LOW, MEDIUM, or HIGH):
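Again purely illustrative, but along these lines:

BACKUP DATABASE StackOverflow
  TO DISK = N'X:\backup\StackOverflow.bak'
  WITH COMPRESSION (ALGORITHM = ZSTD, LEVEL = HIGH);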

On the scale from LOW to HIGH, higher compression means a smaller backup in exchange for more CPU.

You can even set the default to use ZSTD, but currently there isn’t a way to set the default level to LOW, MEDIUM, or HIGH – so by default it will always use LOW:
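Presumably this goes through the existing “backup compression algorithm” server setting; the value-to-algorithm mapping below is my assumption:

-- set the server-wide default compression algorithm
-- (presumably 1 = MS_XPRESS, 2 = QAT_DEFLATE, 3 = ZSTD)
EXEC sys.sp_configure N'backup compression algorithm', 3;
RECONFIGURE;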

But not so fast! If you try this in CTP 2.0, the maximum allowed value is still hard-coded to 2, so (as the release notes warn) you can’t actually make ZSTD the server-wide default yet.

Surely, they’ll fix that before RTM (and maybe add a way to specify the default compression level, too).

If you use Ola Hallengren’s backup solution, he has already updated it with support for this new algorithm. All you’ll have to change is the interface(s) you use to call his stored procedure.

But do I want to switch?

Of course, you have to be punishing your servers with large enough backups to make the payoff worthwhile and noticeable – never mind being sure it won’t add substantial CPU overhead. If you don’t have large databases, or you’re not trying to reduce the space or time their backups require, then maybe your current solution is already good enough.

I’ve seen a few posts on this already, and I have to say, they haven’t moved me – because they don’t really show me anything. Reducing the size and duration of a 640kb WideWorldImporters or 13kb AdventureWorks backup that takes 2 seconds on modern hardware – even without compression – is hardly worth lifting a finger and, as a demo, not particularly exciting.

Luckily, I have access to a large, real-world database that most of you don’t: Stack Overflow.

So, I decided to take a sanitized copy of this ~8 TB database and put it through the wringer. As a starting point: in its existing environment, with MS_XPRESS compression, the Stack Overflow backup has recently been around 2 TB, across 32 files, and takes roughly 2.5 hours across the network. I don’t have any reference for what it takes without compression because we simply don’t ever do that.

I restored the sanitized copy in a VM in GCP with the following characteristics:

VM class:          m3-ultramem-128
Cores:             64 (Intel Ice Lake)
Memory (GB):       3,904
Backup Disk:       Hyperdisk Balanced (20 TB, 20K IOPs, 2,400 MB/s throughput)
Data/Log Disk:     Hyperdisk Balanced (20 TB, 20K IOPs, 2,400 MB/s throughput)
Operating system:  Windows Server 2022 21H2 (20348.3692)
SQL Server:        SQL Server 2025 CTP 2.0 (17.0.700.9)

To approximate what we do in production, I wanted to back up to 32 files, using the default MS_XPRESS algorithm, then follow up with each level of the ZSTD algorithm. I wanted to analyze both generating new backups and restoring from them which means, unless I wanted to provision hundreds of terabytes of disk, I’d have to be dropping and deleting everything I create before starting the next test.

I wrote a stored procedure to help run tests and a table to store the results. This way I didn’t have to watch paint dry or interact with the system after each test. Here is the table:
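A sketch along these lines (the column names here are illustrative):

CREATE TABLE dbo.BackupTestResults
(
    TestID           int IDENTITY(1,1) PRIMARY KEY,
    TestName         nvarchar(128) NOT NULL,  -- e.g. ZSTD - MEDIUM, plus a timestamp
    Algorithm        nvarchar(20)  NOT NULL,  -- MS_XPRESS or ZSTD
    CompressionLevel nvarchar(10)  NULL,      -- LOW / MEDIUM / HIGH (NULL for MS_XPRESS)
    FileCount        int           NOT NULL,
    BackupStart      datetime2(3)  NULL,
    BackupFinish     datetime2(3)  NULL,
    BackupSizeBytes  bigint        NULL,      -- compressed_backup_size from msdb
    RestoreStart     datetime2(3)  NULL,
    RestoreFinish    datetime2(3)  NULL
);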

And here is the procedure:
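In broad strokes, a much-simplified sketch of it looks like this (every object name, parameter, and path below is illustrative, and the COMPRESSION / LEVEL syntax reflects CTP 2.0 as I understand it):

CREATE OR ALTER PROCEDURE dbo.RunBackupRestoreTest
    @SourceDB    sysname,
    @FileCount   int,
    @Algorithm   nvarchar(20),         -- N'MS_XPRESS' or N'ZSTD'
    @Level       nvarchar(10) = NULL,  -- N'LOW' / N'MEDIUM' / N'HIGH' (ZSTD only)
    @BackupPath  nvarchar(260),        -- e.g. N'X:\backup\'
    @RestorePath nvarchar(260),        -- e.g. N'D:\data\'
    @Debug       bit = 0
AS
BEGIN
    SET NOCOUNT ON;

    -- unique names for the test, the restored copy, and the backup files,
    -- so each run is easy to find in backup/restore history
    DECLARE @Stamp     nvarchar(20)  = FORMAT(SYSDATETIME(), N'yyyyMMdd_HHmmss');
    DECLARE @TestName  nvarchar(128) = CONCAT(@Algorithm, COALESCE(N' - ' + @Level, N''), N' ', @Stamp);
    DECLARE @RestoreDB sysname       = CONCAT(@SourceDB, N'_test_', @Stamp);

    -- build the striped file list: DISK = N'...', DISK = N'...', ...
    DECLARE @FileList nvarchar(max);
    SELECT @FileList = STRING_AGG(CONVERT(nvarchar(max),
               CONCAT(N'DISK = N''', @BackupPath, @RestoreDB, N'_', value, N'.bak''')), N', ')
      FROM GENERATE_SERIES(1, @FileCount);

    -- backup command with the requested compression algorithm / level
    DECLARE @Backup nvarchar(max) = CONCAT(
        N'BACKUP DATABASE ', QUOTENAME(@SourceDB), N' TO ', @FileList,
        N' WITH INIT, DESCRIPTION = N''', @TestName,
        N''', COMPRESSION (ALGORITHM = ', @Algorithm,
        COALESCE(N', LEVEL = ' + @Level, N''), N');');

    -- restore command: same file list, plus a WITH MOVE for every data/log file
    DECLARE @Move nvarchar(max);
    SELECT @Move = STRING_AGG(CONVERT(nvarchar(max),
               CONCAT(N'MOVE N''', name, N''' TO N''', @RestorePath, @RestoreDB, N'_', name,
                      CASE type WHEN 1 THEN N'.ldf' ELSE N'.mdf' END, N'''')), N', ')
      FROM sys.master_files
     WHERE database_id = DB_ID(@SourceDB)
       AND type IN (0, 1);              -- data and log files

    DECLARE @Restore nvarchar(max) = CONCAT(
        N'RESTORE DATABASE ', QUOTENAME(@RestoreDB), N' FROM ', @FileList,
        N' WITH ', @Move, N', STATS = 5;');

    DECLARE @Drop nvarchar(max) = CONCAT(N'DROP DATABASE ', QUOTENAME(@RestoreDB), N';');

    IF @Debug = 1
    BEGIN
        -- PRINT truncates long strings, which is fine with a small @FileCount
        PRINT @Backup;
        PRINT @Restore;
        PRINT @Drop;
        RETURN;
    END;

    -- run everything, capturing timings, then log the results
    DECLARE @t0 datetime2(3) = SYSDATETIME();
    EXEC sys.sp_executesql @Backup;
    DECLARE @t1 datetime2(3) = SYSDATETIME();
    EXEC sys.sp_executesql @Restore;
    DECLARE @t2 datetime2(3) = SYSDATETIME();
    EXEC sys.sp_executesql @Drop;

    INSERT dbo.BackupTestResults
        (TestName, Algorithm, CompressionLevel, FileCount,
         BackupStart, BackupFinish, BackupSizeBytes, RestoreStart, RestoreFinish)
    SELECT @TestName, @Algorithm, @Level, @FileCount, @t0, @t1,
           (SELECT TOP (1) compressed_backup_size
              FROM msdb.dbo.backupset
             WHERE [description] = @TestName
             ORDER BY backup_finish_date DESC),
           @t1, @t2;

    -- deleting the .bak stripes is left out of this sketch
    -- (xp_delete_files, PowerShell, etc. would go here)
END;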

There’s a lot to unpack there. I needed to:

  • Create unique names to use for database names, files on disk, and for the backup description to easily identify items from backup/restore history for each test;
  • Create a backup command that would output TO DISK or TO URL the exact number of files I specify, and apply the right compression type and level;
  • Build a restore command that also accurately indicated the files from the backup, as well as valid WITH MOVE options for the database’s data and log files; and,
  • Drop the database and delete the backup files.

With the debug argument set to true (and a limited number of files for brevity), the following command…
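(again using the illustrative procedure and names sketched above)

EXEC dbo.RunBackupRestoreTest
    @SourceDB    = N'StackOverflow',
    @FileCount   = 4,
    @Algorithm   = N'ZSTD',
    @Level       = N'MEDIUM',
    @BackupPath  = N'X:\backup\',
    @RestorePath = N'D:\data\',
    @Debug       = 1;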

…outputs everything it would have run (with a little formatting finesse here from yours truly):
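(still with illustrative names and paths, assumed logical file names, and trimmed for readability)

BACKUP DATABASE [StackOverflow]
  TO DISK = N'X:\backup\StackOverflow_test_20250601_093000_1.bak',
     DISK = N'X:\backup\StackOverflow_test_20250601_093000_2.bak',
     DISK = N'X:\backup\StackOverflow_test_20250601_093000_3.bak',
     DISK = N'X:\backup\StackOverflow_test_20250601_093000_4.bak'
  WITH INIT, DESCRIPTION = N'ZSTD - MEDIUM 20250601_093000',
       COMPRESSION (ALGORITHM = ZSTD, LEVEL = MEDIUM);

RESTORE DATABASE [StackOverflow_test_20250601_093000]
  FROM DISK = N'X:\backup\StackOverflow_test_20250601_093000_1.bak', /* ...files 2, 3, 4... */
  WITH MOVE N'StackOverflow'     TO N'D:\data\StackOverflow_test_20250601_093000_StackOverflow.mdf',
       MOVE N'StackOverflow_log' TO N'D:\data\StackOverflow_test_20250601_093000_StackOverflow_log.ldf',
       STATS = 5;

DROP DATABASE [StackOverflow_test_20250601_093000];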

Then I just needed to write a little harness to run each of the tests:
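Something like this – one call per compression option, repeated for five rounds, with the same illustrative names as above:

DECLARE @Round int = 1;

WHILE @Round <= 5
BEGIN
    EXEC dbo.RunBackupRestoreTest @SourceDB = N'StackOverflow', @FileCount = 32,
         @Algorithm = N'MS_XPRESS',                  @BackupPath = N'X:\backup\', @RestorePath = N'D:\data\';

    EXEC dbo.RunBackupRestoreTest @SourceDB = N'StackOverflow', @FileCount = 32,
         @Algorithm = N'ZSTD', @Level = N'LOW',      @BackupPath = N'X:\backup\', @RestorePath = N'D:\data\';

    EXEC dbo.RunBackupRestoreTest @SourceDB = N'StackOverflow', @FileCount = 32,
         @Algorithm = N'ZSTD', @Level = N'MEDIUM',   @BackupPath = N'X:\backup\', @RestorePath = N'D:\data\';

    EXEC dbo.RunBackupRestoreTest @SourceDB = N'StackOverflow', @FileCount = 32,
         @Algorithm = N'ZSTD', @Level = N'HIGH',     @BackupPath = N'X:\backup\', @RestorePath = N'D:\data\';

    SET @Round += 1;
END;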

The table made it much easier to analyze the results than trying to parse and collect the output from the messages pane in SSMS. The only thing it doesn’t capture is average CPU over the course of each operation, but I can pull that from other monitoring.
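For example, a quick aggregation over that (illustrative) results table produces the per-option averages:

SELECT Algorithm,
       CompressionLevel,
       AVG(DATEDIFF(second, BackupStart,  BackupFinish))  AS avg_backup_seconds,
       AVG(BackupSizeBytes) / POWER(1024.0, 4)            AS avg_backup_tb,
       AVG(DATEDIFF(second, RestoreStart, RestoreFinish)) AS avg_restore_seconds
FROM dbo.BackupTestResults
GROUP BY Algorithm, CompressionLevel
ORDER BY Algorithm, CompressionLevel;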

Initial Tests

Here are the results averaged across 5 tests for each compression option:

Test          Backup time (s)   Backup CPU   Backup size   Restore time (s)   Restore CPU
MS_XPRESS     3,274             13%          1.92 TB       3,267              6%
ZSTD – LOW    3,272             13%          1.92 TB       3,271              6%
ZSTD – MED    3,273             14%          1.68 TB       3,259              7%
ZSTD – HIGH   3,454             16%          1.59 TB       3,262              8%

The backups were all capped by the storage, which has a maximum throughput of 2,400 MB/s. In fact, the MB/sec figure reported in every one of the output messages was almost exactly that.

This means, in GCP, at least with our current chosen disk (Hyperdisk Balanced), no amount of compression is going to improve our backup duration, unless we introduce more drives capable of the same throughput (on both the read and the write sides). It’s not likely we’ll be upgrading our storage across the board anytime soon as our current solution performs perfectly fine. Plus, we back up across the network or to S3, so faster local drives will still be subject to network and other constraints anyway.

What was important for me from this initial test was saving more than a quarter of a terabyte in the final output. This came at the cost of a modest and acceptable increase in CPU (~2-3%) and an equally acceptable uptick in duration (~5%).

What if I enable trace flag 3042?

These tests were run without trace flag 3042, which means the .bak files were initially pre-allocated on disk at an inflated estimate of their final size (in this case, 2.48 TB total), then populated with data, and shrunk down at the end of the operation to < 2 TB.

3042 prevents that from happening. Under the trace flag, the .bak files are initialized at 0 bytes, grow to accommodate data as it is written, and then there is no shrink operation required at the end. This can be useful in scenarios where you’re desperately short on space.

I tried that:
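That is, enabling the flag globally for the next round and re-running the same harness:

-- enable trace flag 3042 instance-wide for the next round of tests
DBCC TRACEON (3042, -1);

-- ...then re-run the same test harness as before...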

It made very little difference. Backup sizes were identical down to the byte, and most timings were within a second or two, aside from a minor saving in backup time for ZSTD HIGH and slightly slower restores at the ZSTD MEDIUM and HIGH levels:

Test          Backup time (s)   Backup CPU   Backup size   Restore time (s)   Restore CPU
MS_XPRESS     3,274             13%          1.92 TB       3,266              6%
ZSTD – LOW    3,273             13%          1.92 TB       3,266              6%
ZSTD – MED    3,274             14%          1.68 TB       3,323              7%
ZSTD – HIGH   3,430             16%          1.59 TB       3,323              8%

I turned the trace flag back off.

What about a better disk class?

As its name implies, the Hyperdisk Balanced disk is, paraphrased, decent performance at a decent price. You can configure the number of IOPs you want (up to 50,000) and the throughput you want (capping out at 2,400 MB/s), and this just dictates how many network-attached disks will be behind that drive. What you actually get in performance also depends on the machine class you attach it to – an M3 with 64 cores will be capable of better I/O to these same disks than a C4 with 8 cores.

For our mix of machines, we landed on 20,000 IOPs and the max throughput for all of our database servers, after reviewing performance tests and taking budget into consideration.

Hyperdisk Extreme is a little bit different, in that you provision IOPs only (up to 350,000), and not throughput. At the top level, the documentation states you can get 5,000 MB/s. The performance here is expected to be far superior to the balanced, though we didn’t spend a lot of time testing this initially because it wasn’t feasible to outfit this disk type across the board. But I’m happy to test it in isolation!

I created a 20 TB drive of this variety, with the max IOPs offered, then ran the tests again with this drive as the output, just to see what would be possible given unlimited budget.

As it turns out, again, no big difference, except the duration of ZSTD HIGH came back in line with the other methods:

Test          Backup time (s)   Backup CPU   Backup size   Restore time (s)   Restore CPU
MS_XPRESS     3,279             13%          1.92 TB       3,280              6%
ZSTD – LOW    3,274             13%          1.92 TB       3,266              6%
ZSTD – MED    3,275             14%          1.68 TB       3,319              8%
ZSTD – HIGH   3,279             15%          1.59 TB       3,318              8%

While these disks might be capable of better performance for lots of small I/Os, streaming a backup is still limited by the bandwidth constraint of 2,400 MB/s, presumably because of the weakest link: the read side. In order to get better runtimes, I would likely have to have more capable storage on both sides.

While that is unlikely to happen in real life, let’s try it, for science, and just to be thorough. I created two more Hyperdisk Extreme disks, restored the copy of Stack Overflow to one, and ran the tests again with the restore destination set to the other. Results:

Test          Backup time (s)   Backup CPU   Backup size   Restore time (s)   Restore CPU
MS_XPRESS     1,581             24%          1.92 TB       1,606              14%
ZSTD – LOW    1,579             25%          1.92 TB       1,569              13%
ZSTD – MED    1,580             29%          1.68 TB       1,626              15%
ZSTD – HIGH   1,581             34%          1.59 TB       1,637              15%

With the read bottleneck reduced, all tests fared a lot better in terms of speed (a little better than half the duration), but at double the CPU or more – and, as expected, they still produced the same compressed sizes.

Clearly SQL Server can work faster under more capable I/O – and would likely do even better if the surrounding hardware would let it.

This is still a win!

You might expect me to be disappointed to share that we were already hitting the limits of our storage, and that the boost from ZSTD won’t ever improve our backup durations on our current hardware. However, from the start, my primary excitement about this new feature was the disk footprint, not how long a backup takes – we don’t have a dire need to make our daily backups faster, especially at the cost of more extravagant storage options that would sit idle most of the time. But it will be quite handy to have fewer bytes to shuffle around and archive.

If it can make this difference on the size of our Stack Overflow backups, once we’re on SQL Server 2025, I’m going to turn it on by default and stop thinking about it. If it has zero – or even negative – impact on the rest of the backups, it will be at worst a wash, because that database is so much bigger than all the others combined. And I’ll be happy going back to our 2.5 hour backups over the network and onto a less capable drive, knowing that they’ll be a significant percentage smaller simply by using a more efficient compression algorithm.

If I can get smaller files in the same amount of time, with only marginal extra CPU cost, and on the hardware we’ve already budgeted, that’s a win for me.

About the author

Aaron Bertrand

Aaron Bertrand (he/him), a cheery Canadian with industry experience dating back to SQL Server 6.5, is a Staff Database Reliability Engineer for Stack Overflow. He also blogs at sqlblog.org, mssqltips.com, and thebertrandfamily.com.