Columnstore Index Improvements in SQL Server 2025 

A stock image showing someone working on a laptop
Comments 0

Share to social media

Columnstore indexes are a powerful tool for storing analytic data directly in SQL Server. This feature has improved in every version of SQL Server since their inception over ten years ago, and SQL Server 2025 is no exception! 

The newest enhancements are laser-focused on business continuity and performance. Ordered clustered columnstore indexes, ordered non-clustered columnstore indexes, and database/file shrink operations are all given significant boosts that are worth the time to introduce and learn. 

In this article we will dive into each of these changes, how they impact columnstore workloads in SQL Server, and demonstrate their operation. 

Ordered Non-Clustered Columnstore Indexes 

Ordered clustered columnstore indexes were introduced in SQL Server 2022 and provided an optional ORDER clause to the index creation statement, allowing data to be pre-sorted on a key column or set of key columns. Sorted data allows for better segment elimination and can improve performance and compression on the table. 

SQL Server 2025 adds the same syntax to non-clustered columnstore indexes. This use-case is particularly interesting as prior to this feature, a non-clustered columnstore index would inherit its order from the clustered rowstore index. If analytics needs aligned with the clustered rowstore order, then performance would be quite good. If not, then there was little that could be done without creating new tables or re-architecting the existing one. 

To test this feature, a large rowstore table will be created: 

This T-SQL creates a table with 14.72 million rows in it. Not a huge amount, but enough to test columnstore index queries, performance, and metadata with meaningful results. 

The realistic use-case for a non-clustered columnstore index on this data would be a need for real-time operational analytics that filter primarily on OrderDate. Whether a quarterly sales report or a live sales dashboard, there are many ways this data can be sliced and diced. Consider this analytic query used to power a daily sales dashboard that includes current data, in addition to historical data: 

The results show a day-by-day analysis of a few sales-related metrics: 

Picture 1, Picture

A quick check of STATISTICS IO and the execution shows that this query is reading a LOT of data: 

Picture 2, Picture

Picture 3, Picture

Running this frequently against a large transactional table ultimately causes conflict with users that are trying to place orders, update them, or check on their status. In addition, if this server is not running Enterprise edition, then batch mode will not be used when processing this data. To alleviate this pressure, a non-clustered columnstore index is created on the table: 

Once created, performance improves greatly and the execution plan now reflects a columnstore index scan and significantly fewer reads: 

Picture 4, Picture

The improvement is noticeably better, but when I look closely, I notice that all 16 rowgroups in the columnstore index were read. This is denoted in STATISTICS IO via the segment reads, which show that zero were skipped. This indicates that no segment elimination occurred when executing the query. 

The ideal way to improve segment elimination and reduce the number of rowgroups that need to be read is by controlling data order. If this were a clustered columnstore index, the solution would be to rebuild the table with a rowstore index and swap it with a clustered columnstore index. Alternatively, an ordered clustered columnstore index could be used in SQL Server 2022 and higher. Since this is a clustered rowstore table, data order is directly inherited from that index into the non-clustered columnstore index. 

To resolve this, an ordered non-clustered columnstore index will be created:

This index will order the columnstore index rows by OrderDate prior to creating rowgroups. Once complete, the analytic query from earlier can be run again and its STATISTICS IO inspected: 

Picture 7, Picture

Five of 11 rowgroups were skipped and logical reads were reduced by about 1/3. The execution plan did not change as a part of this. The change was that the filter predicate on OrderDate was used to successfully skip 5 rowgroups worth of data, which equates to about 5 million rows or about 1/3 of the table. 

For real-time operational analytics where performance is critical, using an ordered non-clustered columnstore index can allow for segment elimination where normally it would not be possible. This provides additional performance gains over an unordered non-clustered columnstore index and comparatively massive gains when compared to querying a rowstore table directly. 

Online Index Rebuilds for Ordered Columnstore Indexes 

One of the significant drawbacks to ordered columnstore indexes when they were first released was that index rebuilds were an offline operation. While this did operate on a partition-by-partition basis, it meant that rebuilds would make data unavailable for the duration of the index maintenance operation. 

For clustered columnstore indexes, this was typically a bit more tolerable as maintenance windows are easier to secure for analytic data than transactional data. For the newly-introduced ordered non-clustered columnstore indexes, though, taking data offline while rebuilding is far less acceptable. 

SQL Server 2025 supports online index rebuilds for both clustered and non-clustered ordered columnstore indexes. This is a game-changer for this feature as it ensures business continuity when ordered columnstore indexes are created or rebuilt. Note that online rebuilds are an enterprise-only feature. If a server is not running SQL Server 2025+ enterprise edition, then this improvement will not be relevant. This change does not introduce online index rebuilds to standard, web, or other editions of SQL Server. Developer edition, being all-inclusive, does feature online rebuilds for any indexes that support them, which now includes ordered columnstore indexes. 

Consider the ordered non-clustered columnstore index that was created in the last demonstration: 

A rebuild of this index will lock it for the entirety of its runtime. Users trying to read or write data will be left waiting, which is not desirable for a typical transactional application. Here is the T-SQL to rebuild that index: 

In a separate query, we will check dm_tran_locks  for locks against the table while the index creation statement is running. While the index creation is running, the following are the results: 

Picture 9, Picture

Ah, well, that is awkward. The columnstore index is locked, and so the system view dm_tran_locks cannot return any data. Instead, the analytic query from earlier will be used to test it: 

As expected, results are not returned as the query as I was waiting for the rebuild to complete: 

Picture 10, Picture

As soon as the index rebuild is complete, the results are returned as expected: 

Picture 11, Picture

A small adjustment to the index rebuild statement allows it to be executed online, without needing to block users trying to access the table: 

When the analytic query from earlier is executed, results are immediately returned. Similarly, the query against dm_tran_locks not only executes immediately without waiting, but also returns results: 

Picture 12, Picture

The locks taken during the index rebuild are all schema and intent-shared locks, which ensure transactional integrity for the index being rebuilt, but do not block users from accessing data in the table. Finally, a check of the execution plan for the analytic query confirms that the non-clustered columnstore index was used to return its results: 

Picture 13, Picture

This confirms that the analytic query was using the index as it was being rebuilt online and was not using the rowstore index instead. 

For SQL Servers running enterprise edition, the ability to rebuild ordered columnstore indexes as an online operation greatly improves the usability of this feature. Index maintenance can run on columnstore indexes without fear of blocking users trying to access real-time operational analytics. 

One final (and critical) note on columnstore indexes: rebuild operations are not often needed. The reorganize operation performs a wide variety of useful tasks on columnstore indexes, including: 

  • Remove soft-deleted rows 
  • Merge undersized rowgroups 
  • Process the contents of the delta store into compressed rowgroups 

Rebuilds are rarely needed and typically are most helpful when data becomes unordered and segment elimination suffers enough that end-user performance is degraded. Because of this, when considering index maintenance processes for columnstore indexes, start with regular reorganize operations and only schedule rebuilds when the data order needs to be re-created. 

Database and File Shrink Operations 

Columnstore indexes are optimized for data-warehouse-style data. While strings can benefit from columnstore compression algorithms, the savings are not as pronounced. One of the biggest drawbacks to storing data in LOB string data fields was that data stored on those pages could not be moved during database shrink operations. The data types to be addressed here are: 

  • VARCHAR(MAX) 
  • NVARCHAR(MAX) 
  • VARBINARY(MAX) 

This meant that shrink operations often did not free up space in the amounts anticipated by an administrator. The direct result of this was wasted space. The most common workaround for this situation was to normalize string columns to a page compressed clustered rowstore table and provide a lookup ID in the columnstore index. While this is an effective workaround, the overall situation was still poorly documented. Many people were confused by shrink operations not behaving the way they expected them to

Starting in SQL Server 2025, when a clustered columnstore index contains any columns of the above data types, shrink operations can move data pages used by those columns. The two operations included in this change are DBCC SHRINKDATABASE and DBCC SHRINKFILE. 

This is a straightforward and beneficial change that impacts all editions of SQL Server. If you have a columnstore index that contains large string columns and the database is subject to database shrink operations, then improved savings will be provided by those operations. The impact of this will become more dramatic as time passes and the space consumed by string data increases. 

This improvement addresses a somewhat-edge case and will not impact all columnstore indexes, but is still a welcome change to those that need to reclaim space on a database with large columnstore indexes containing MAX length string columns. 

Conclusion 

Columnstore indexes continue to be improved with each version of SQL Server. This feature has grown and matured greatly over the years and is the ideal way to store analytic data in SQL Server. 

The newest updates with SQL Server 2025 are focused on improving real-time operational analytics. For workloads that depend heavily on non-clustered columnstore indexes or on clustered columnstore indexes that reside in application databases, these performance and availability updates are welcome changes. 

Ordered columnstore indexes help solve the challenge of segment elimination in a more automated fashion. Rather than needing to maintain data order via deliberate/scripted processes, allowing SQL Server to pre-sort data allows for index maintenance to re-order data, thus automating what had previously been a more complex task. 

Future versions of SQL Server will continue to improve columnstore indexes. Whether for traditional data warehouse/data lake scenarios or for real-time analytics in application environments, the use-cases for this feature continue to expand and become more compelling over time. 

If you store data in SQL Server and have yet to use columnstore indexes, this is an ideal time to get started! Try them out for analytic data, log/archive data, or where reporting data is needed quickly from transactional databases. What are your thoughts on these new columnstore upgrades? Let me know! 

Article tags

Load comments

About the author

Edward Pollack

See Profile

Ed Pollack has 20+ years of experience in database and systems administration, which has developed his passion for performance optimization, database design, and making things go faster. He has spoken at many SQLSaturdays, 24 Hours of PASS, and PASS Summit. This led him to organize SQLSaturday Albany, which has become an annual event for New York’s Capital Region. In his free time, Ed enjoys video games, traveling, cooking exceptionally spicy foods, and hanging out with his amazing wife and sons.