The Basics of Good T-SQL Coding Style

TSQL Code must work properly and efficiently. That's not enough though. Unless you are working alone, have perfect memory and plan to never change job, then you need to comment and document your code, it must be inherently readable, well laid out, use informative and obvious names, and it must be robust and resilient; written defensively. It must not rely on deprecated features of SQL Server, or assume particular database settings. Robert Sheldon starts a series of articles that explains the basics.

The series so far:

  1. The Basics of Good T-SQL Coding Style
  2. The Basics of Good T-SQL Coding Style – Part 2: Defining Database Objects
  3. The Basics of Good T-SQL Coding Style – Part 3: Querying and Manipulating Data
  4. The Basics of Good T-SQL Coding Style – Part 4: Performance

Everyone has an opinion when it comes to writing T-SQL code, whether talking about formatting, variable assignments, system functions, ANSI standards, or any number of issues. And those opinions can vary far and wide, as evidenced by the multitude of forum postings and heated article comments. Despite these differences, most would agree that turning out readable and high-performing code is the best solution for everyone. The trick, of course, is getting consensus on how to achieve that goal.

Often a good place to start within an organization is to create a set of coding standards that T-SQL developers can help to define and then reference when building their solutions. In this way, the standards provide a focal point for ironing out differences and working toward a common purpose.

To help with this process, this article and the ones to follow will look at various considerations to take into account when trying to formulate T-SQL standards in your organization. The series will cover such topics as readability, accuracy, performance, standardization, and other factors. The series in not intended as a prescriptive how-to list that dictates what approaches you should take, but rather as a set of guidelines for evaluating the types of issues you should address when coming up with your own approach to T-SQL coding.

Because this is the first article in the series, I focus on general considerations that can apply to your code regardless of the types of T-SQL statements you’re defining, covering such issues as formatting, system functions, deprecated language elements, and user-defined variables. In the articles to follow, I’ll go deeper into the specifics of defining database objects and querying SQL Server data. With that in mind, let’s get started.

Readability and other oddities

A great number of articles have been written about T-SQL formatting, and for good reason. At some point, most code will need to be revisited, and a lot of time can be wasted just trying to understand what is going on, let alone identify potential problems. The sloppier the formatting, the more difficult it is to decipher and troubleshoot the code.

Take, for example, the following USE and SELECT statements, which are written in all lowercase, with no line breaks or spaces except where required:

By default, SQL Server is configured with a case-insensitive collation, so there is no reason you can’t take this approach, or take an approach similar to the next example, which uses all uppercase characters:

Again, this is perfectly permissible. It’s just unreadable. Imagine having to review a script that contains hundreds of lines of such code. Even code that is not nearly this bad can present challenges if it is inconsistent and put together haphazardly. But look at what happens when we break the SELECT statement into multiple lines, indent certain elements, make keywords all uppercase, and add a comment to explain the statement:

Certainly it would be a lot easier to review code like this than what we saw in the proceeding examples. The careful use of line breaks, spaces, capitalization, and indentation makes it infinitely easier to review T-SQL code, especially when everyone is following the same standards.

But before we go further—and I get slammed with a horde of comments about the “proper” way to format code—let me make it clear that I am not advocating you follow the formatting I have used here, or any particular formatting styles for that matter. Issues such as capitalization and indentation have been discussed and argued in countless places, with opinions ranging across the spectrum of database developers.

I am merely making the point that all T-SQL should be consistently and carefully formatted so it is readable and understandable by everyone who reviews it. To this end, you and your team should take into account a wide range of factors, such as capitalization, indentation, and the judicious use of line breaks.

When deciding how to address such issues, you’ll have to weigh each one carefully because there are often good arguments for and against a particularly convention or approach (along with an endless stream of sentiments). Just be sure you take into account all those niggling little details, such as whether to add a space on either side of a comparison operator, when and how to indent, or where to place commas in the select list. But be prepared for the ensuing battles.

With that said, let’s move onto comments, which provide a simple and effective mechanism for describing what’s going on in the code, making the code easier to understand, review, and troubleshoot. Carefully placed comments can save everyone a heap of time and might even help to point out inconsistencies or reveal other potential problems. Even if you’re reviewing your own code, you’ll appreciate the careful documentation, especially if you wrote that code several years earlier.

Related to this, you should also consider whether to include extended proprieties when creating database objects and, if so, when and how to use them. Extended properties can help to provide further insights into the database and make it easier to document the database and its components.

Another way to make your code easier to understand is to use parentheses when appropriate. In the preceding example, I added parentheses to the WHERE clause to demonstrate this concept. Although they’re really not necessary in this case, they serve as a reminder, just like the comment. Parentheses can help clarify a statement’s logic to make code reviews faster and easier and to ensure that you’re implementing the correct logic in your conditional clauses.

Also be aware that it is generally considered a best practice to terminate your statements with the semicolon. Although SQL Server does not require the semicolon in most cases, Microsoft has been threatening that in an eventual release, the semicolon will become mandatory. It has been part of the ANSI standards for quite a while now.

You should also try to avoid the use of GOTO commands, which can make code difficult to review, especially if there is a lot of code and many instances of the command.

Again, the goal is to come up with a way to consistently format and present your T-SQL code so it is easy to read and understand and troubleshoot. How you get there is up to you and your team, but the sooner you get those standards in place, the better for everyone.

For a good perspective on issues related to formatting—and T-SQL coding in general—you might want to check out some of Joe Celko’s articles, starting with Formatting SQL Code – Part the Second. He often explains the logic and history behand his reasoning, which can help to inform your decisions, even if you decide to go a different route.

Getting your code right

Even more important than how you format your code is getting it right, and that includes making sure you’re using the most current T-SQL elements and you’re using them correctly. Like it’s ANSI counterpart, T-SQL is an evolving language, so you must be diligent to ensure you’re not using outdated or inappropriate syntax.

An often cited example of stale code is a join based on pre-ANSI SQL-92 standards, in which the join condition is defined in the WHERE clause, as shown in the following example:

Although SQL Server still supports this approach, the general consensus among the database gurus is that we should all be sticking with the newer model, which is to include the join condition in the FROM clause:

A good reason for taking the second approach is readability. It is much easier to pick out the join condition if it is not buried in the WHERE clause with several other conditions.

The second approach has another advantage. If you forget to include the join condition in either statement, the first example returns a cross join, which can include an enormous amount of data, whereas the second statement returns an error, letting you know that something is amiss.

Another example of outdated code can sometimes be seen when a T-SQL statement includes a TOP expression. In the past, you would specify the numeric expression without parentheses, as in the following example:

Although this still works in SQL Server, the correct syntax is now to include the parentheses:

Although you can use either approach, there’s no telling when and if Microsoft might pull the plug on the first one.

A database team must be proactive when it comes to outdated code. Many T-SQL and ANSI SQL elements have been deprecated over the years or are slated for deprecation at some point in the future. Chances are good you have deprecated elements lurking somewhere in your production code, given how the standards are always changing. For example, in SQL Server 2016, the SET ROWCOUNT option has been deprecated for INSERT, UPDATE, and DELETE statements. So too have the data types text, ntext, and image. Even the CREATE DEFAULT and DROP DEFAULT statements are (finally) approaching their sunset years.

When you’re defining your coding standards, be sure to specify how to handle deprecated elements. It’s easy enough to say that developers should avoid them, but it is seldom that simple, so you might want to also explain when and how they should be removed when discovered. Fortunately, Microsoft provides details about what’s being deprecated with each SQL Server release. You’ll find a list of the latest items in Deprecated Database Engine Features in SQL Server 2016.

Now let’s look at another example of questionable code. The following select statement does something no developer should do—use a comparison operator (not equal) against a NULL value:

Although the table includes rows with a Title value of NULL, this statement returns no rows nor does it return an error. If you’re not careful, you could end up with incorrect results without realizing there’s a problem. For this reason, you should avoid this type of construct and instead use the IS NULL or IS NOT NULL operator to return the correct rows, as in the next SELECT statement:

Of course, this is only one example of the many places developers can run into issues when working with NULL values. If you want to learn about other pitfalls, check out my article How to Get NULLs Horribly Wrong in SQL Server. You might want to include the more common pitfalls in your coding standards to help developers along.

You will also need to decide about such issues as column alias assignments. One approach often taken is to specify the alias first, followed by an equal sign, as shown in the following example:

Another approach is to add the alias at the end, separated by the optional AS keyword:

Proponents of the first approach claim that it is more readable, which is often true, especially when you have a lot of columns with aliases. However, proponents of the second approach also claim better readability, arguing that the first approach can be confused with variable assignments. They also point out that their way is consistent with the ANSI standards and with how table aliases get assigned.

Like numerous T-SQL issues, you will find ample discussion and many, many opinions about this one. You will have to decide what is best for your organization. Just be consistent.

Another issue worth pointing out is the use of undocumented stored procedures. For example, the following SELECT statement uses the sp_mstablespace stored procedure to return the number of rows and amount of space used by the Person table:

The stored procedure currently works great and, like other undocumented procedures, can be quite handy at times. But undocumented procedures do not belong in a production environment. You have no way of knowing when Microsoft will change them or yank them altogether. You could end up with broken code and not even realize it.

There are, of course, all sorts of issues to take into account when standardizing and verifying your code. A big part of the equation is how important it is that your code comply with the ANSI standards. Some would argue that it should always comply with ANSI (at least as much as possible) because it makes the code more portable and familiar among developers who work with different database systems. On the other hand, you might lose important functionality by giving up the proprietary elements. You must decide what is best for your team and organization.

Functional misuse

SQL Server provides a number of system functions for carrying out various operations. As with any statement element, you must ensure that they are being used correctly. A misused function might not return an error, but it can return incorrect results without you realizing what has happened. Or it might return an error, but not in the way you expected or tested for.

For example, the ISNUMERIC function is notorious for returning unpredictable results, which we can demonstrate with the following example:

First, we create a table variable and populate it with different types of values, all passed in as strings. We then use the ISNUMERIC function to test whether a value is numeric. If it is (returns 1), we try to convert it to the INT data type. In this case, however, when the database engine hits the value $456, it baulks and returns the following error message:

The problem is that the ISNUMERIC function will sometimes call a value numeric that cannot be converted to a numeric data type, as with $456. It will even interpret values such as 7e9 and $., as numeric. A better solution is to use the TRY_CONVERT function to test whether a conversion will work before actually trying to convert the value:

Data conversion in general can be a slippery slope in SQL Server, so you must be on your guard. If you want to know about other ways in which conversions can cause your team big-time headaches, check out my article How to Get SQL Server Data-Conversion Horribly Wrong.

When working with system functions, you’ll run into those that can be used to perform similar operations or return similar types of information. For example, you can use either the CAST or CONVERT function to convert a value’s data type. The CAST function is part of the ANSI standards but has limited capabilities. The CONVERT function is not part of the standards but provides a number of options for formatting date/time values, although in many cases, it’s better to leave the formatting to the application tier. Regardless, you will need to make a decision about their use and how important it is to conform to the ANSI standards.

Another example of interchangeable functions is COUNT and EXISTS, when used to confirm the existence of certain data. For example, the following IF statement checks whether the Person table contains rows that have an EM value in the PersonType column:

Although this statement works just fine, you will see better performance when using the EXISTS function, especially for large data sets:

Function misuse is not always a function-to-function issue. For example, you might find that using the SCOPE_IDENTITY() function will provide more accurate information than the @@IDENTITY system variable, depending on the situation. Although both return the last identity value generated for a table in the current session, the SCOPE_IDENTITY() function applies only to a specific scope, but the @@IDENTITY variable does not, which can make a difference in the reliability of the returned value. For more information about this issue, see the SQL Server topic SCOPE_IDENTITY (Transact-SQL).

The important takeaway here is that that the use of system functions should not be arbitrary. Careful consideration should be given to which ones are implemented and how they’re used, just like any other T-SQL element.

Variables and parameters

There’s an easy rule to follow when using variables: Make sure you do it right. Let’s look at a simple example that demonstrates some of the issues that can arise:

In this short set of T-SQL statements, we’ve managed to commit a number of coding blunders:

  • We declare @b and assign a value, but never use it in our SELECT statement.
  • We declare @c with the VARCHAR(2) data type, rather than using the CHAR(2) date type.
  • We declare @d as VARCHAR, without specifying a length, and we do not assign a value. We then go on to use the variable in our SELECT statement.
  • We use @f in our SELECT statement, although we never declare it or assign a value.

Obviously, you should avoid any of these scenarios, but when you’re working with complex scripts that contain hundreds of rows of data, it’s easy to miss a few undeclared or unused variables. Fortunately, SQL Server will let you know about variables you try to use without declaring them, but not about the other scenarios. At the very least, your coding standards should include a few reminders so developers check that they’re using variables properly.

While we’re on the topic of variables, another issue worth raising is the use of SELECT to assign variable values. The advantage to this approach is that we can assign values to multiple variables in a single statement, something we cannot do with SET. On the other hand, this approach is not part of the ANSI standards, and it can be confusing when presented in proximity with other SELECT statements. Again, you’ll have to make the decision based on what works best for your organization. Just be consistent.

Wonderful world of collations

When developing or reviewing T-SQL code, you must also take into account how collations are being implemented at the server, database, and column levels. If you use only the collation assigned at the server level throughout your databases, there should be no problem. However, if your databases and columns use collations different from the server default, then collations can become an issue.

Suppose your SQL Server instance is configured with the SQL_Latin1_General_CP1_CI_AS collation and you then create the following database and table:

You’re now dealing with different collations at the server, database, and column levels, which can impact how you query data, create temporary tables, define foreign keys, and take other steps. For this reason, when developing your coding standards, you should address the use of collations and the impact of implementing them at the various levels. For more information about working with collations, consider checking out my article Questions About SQL Server Collations You Were Too Shy to Ask.

Just the beginning

There is, of course, much more to good coding practices than what we’ve covered here. In this article, which is the first in the series, we’ve focused primarily on general issues related to code formatting and accuracy, topics that can apply to a wide range of T-SQL statements. In the rest of the series, we’ll cover more specific topics, such as defining database objects, querying and manipulating data, and maximizing performance.

Throughout the series, we’ll focus on the types of issues you should be aware of to ensure high-quality, high-performing T-SQL code. The goal here is not to tell you how to write your T-SQL statements or what format you should be using, but to help you assess what steps you need to take to ensure that your team is working toward a common strategy of good coding practices.