String manipulation is an inevitable task for developers and data professionals alike. Despite all the best efforts to normalize databases, eventually we are faced with some sort of text-based data stored within a relational database and need to extract detailed information from it. Those of us who have tackled these challenges fully understand how code … Read more
It is always an afterthought. New objects are created that start off small and current. New feature development takes over and the recently architected data structures become old news. Over time, data grows and suddenly a previously small table contains millions or billions of rows. Is all that data necessary? How long should it be … Read more
Packing intervals is a classic SQL task that involves packing groups of intersecting intervals to their respective continuous intervals. In mathematics, an interval is the subset of all values of a given type, e.g., integer numbers, between some low value and some high value. In databases, intervals can manifest as date and time intervals representing … Read more
Nearly a decade ago, I wrote a post called “Bad habits : Counting rows the hard way.” In that post, I talked about how we can use SQL Server’s metadata to instantly retrieve the row count for a table. Typically, people do the following, which has to read the entire table or index: To largely … Read more
Tell me if you’ve heard this one before: I changed data from my application, but when I checked the database, I couldn’t see the change! I’ve seen this. Loads. It can be quite perplexing for folks because they expect to see an error message if the insert, update, or delete failed. I put this post … Read more
Recently someone posted a question where they couldn’t quite figure out how to construct a predicate based on a bit parameter. They tried to write a procedure like this, which wouldn’t parse, of course: I explained that you can’t have control-of-flow inside a SQL statement like that, at least not in T-SQL. And that the … Read more
There are plenty of applications and tools available that allow for the movement of data in and out of SQL Server. Some tools are built by Microsoft, such as SSIS or Azure Data Factory. Others are created by third parties, such as Databricks or Snowflake. Still other available options make use of SQL Server features … Read more
I’ve quietly resolved performance issues by re-writing slow queries to avoid DISTINCT. Often, the DISTINCT is there only to serve as a “join-fixer,” and I can explain what that means using an example. Let’s say we have the following grossly simplified schema, representing customers, products, and product categories: And then we have tables for orders … Read more
While this article is specifically geared to SQL Server, the concepts apply to any relational database platform. The Stack Exchange network logs a lot of web traffic – even compressed, we average well over a terabyte per month. And that is just a summarized cross-section of our overall raw log data, which we load into … Read more
When people start learning a new field, for example T-SQL, it’s tempting to spend very little time trying to understand the fundamentals of the field so that you can quickly get to the advanced parts. You might think that you already understand what certain ideas, concepts and terms mean, so you don’t necessarily see the … Read more
If you haven’t already heard, SQL Server 2022 introduced a new built-in system function called GREATEST. Simply put, it is to a set of columns, variables, expressions etc. what the MAX function is to a set of values (i.e., rows) of a single column or expression. The opposite of GREATEST function is LEAST function which … Read more
In 2012, Aaron Bertrand said most everything I knew (and a bit more) about the issues with using the sp_ prefix. Procedures prefixed with sp_ have special powers when placed in the the master database in that it can be executed anywhere on the server after that. Nothing much has change in those suggestions. It … Read more
At Stack Overflow, our environment has multiple implementations of a largely – but not 100% – identical schema. (By “schema,” I mostly mean the set of tables within a database.) I sometimes need to retrieve or update data across a large number of sites. Let’s say, pulling or removing information from the Users table, and … Read more
Did you know you can include a SELECT statement within another SELECT statement? When a SELECT statement is embedded within another statement it is known as a subquery. There are two types of subqueries: basic subquery and correlated subquery. In this article I will be discussing both types of subqueries and will be providing examples … Read more
A student of mine sent me a T-SQL challenge involving gap-filling of missing dates and balances. I found the challenge interesting and common enough and figured that my readers would probably find it interesting to work on as well. I’ll present the challenge and three possible solutions, as well as the results of a performance … Read more
This is the beginning of a series on SQL Server Dynamic Data Masking. Dynamic Data Masking is a concept familiar with all developers and users of sensitive data. It is implemented in SQL Server with simplicity and elegance, requiring minimal changes to front end applications, including reporting, and almost no changes to queries. The series … Read more
When I first saw a bullet item stating SQL Server 2022 would support instant file initialization for log file growth, I was excited. When I later learned it only applies to automatic growths, and only those of 64 MB or less, I was a little less excited. With those limitations, I was skeptical this enhancement … Read more
One of the more challenging technical details of columnstore indexes that regularly gets attention is the need for data to be ordered to allow for segment elimination. In a non-clustered columnstore index, data order is automatically applied based on the order of the underlying rowstore data. In a clustered columnstore index, though, data order is … Read more
I have come across a lot of use cases for manufacturing rows on the fly, aside from the common goal of populating a large data set such as a numbers or calendar table. A few favorites include building sample data, pivoting an unknown number of columns, data extrapolation, and filling gaps in date or time … Read more