Transact-SQL Formatting Standards (Coding Styles)

How should SQL code be formatted? What sort of indentation should you use? Should keywords be in upper case? How should lists be lined up? SQL is one of those languages that will execute anyway however you treat whitespace and capitalization. However, the way SQL is laid out will effect its readability and the time taken to review and understand it. Standardisation of code layout is an important issue, but what standard should you adopt? Rob avoids a direct answer, but tells you the sort of answers you'll need to decide upon when creating a strategy for formatting SQL code.

Few topics generate as much debate among database engineers as how Transact-SQL code should be formatted. For every issue raised, you’ll likely receive twice as many opinions as there are people. Yet standardizing the T-SQL code across an organization is essential to facilitating effective and efficient code reviews, troubleshooting T-SQL script, supporting joint development efforts, and handing off projects from one group to another. DB engineers might disagree about what the code should look like, but few will question the wisdom of implementing such standards.

Yet as anyone knows who’s participated in this process, it’s no small task. And the more DB folks involved, the more Herculean the project. But you’ve got to start somewhere, and that’s what this article is all about-to provide you with an outline of the types of decisions you’ll need to make when trying to standardize your code’s format. Whether you’re working with beautification software such as the Format SQL feature included in Red Gate’s SQL Prompt or formatting all your code manually, you need to take into account a variety of factors when determining how to style your T-SQL script.

A word of warning, however. This is not a discussion about optimizing your code, maximizing performance, or determining what policies to implement to address such issues as indexing, using GUIDs as primary keys, whether to store BLOB content, how to set security policies, and so on. Of course, these are all important considerations, but they’re not what we’re focusing on here.

In fact, because of the volatile nature of discussions about formatting, I will try to hold back on my opinions altogether, and instead focus on the issues themselves, and not their resolution. The final decisions about how you format your code are up to you. What I plan to provide here are the questions you should ask yourself as you’re mapping out your formatting strategy. I also provide examples that demonstrate different ways you can format your code.

Case (Capitalization)

Case refers to the way in which T-SQL should or should not be capitalized in your code. For example, some developers prefer to make all reserved keys uppercase, others prefer lowercase, and some mix and match. It’s all a matter of preference. When determining what strategies to implement with regard to case, you should take into account the following considerations:

  • How should you categorize Transact-SQL in terms of keywords, data types, user-defined objects, etc.? For example, should all reserved words be treated as one category, or do you prefer to treat data types differently from other keywords? What about user-defined objects, such as tables and views? Some sources break this category down into scalars (columns, parameters, variables) and schema objects (tables, views, stored procedures). Others make no distinction and treat all T-SQL elements as a single category.
  • What case rules will you apply to each category? For example, should keywords be all uppercase, and data types all lower case? Should you use camel case (compound words with an initial capital) for user-defined objects such as table names?

Let’s look at some examples that demonstrate different capitalization strategies. In the first statement, all T-SQL reserved words are uppercase, but user-defined object names, such as tables and columns are camel case, with the first letter capitalized:

In the next example, all the reserved words are lowercase, and the user-defined object names are camel case:

Here’s one other example. This time, the keywords are uppercase, the data types lowercase, and the object names camel case:

The key to a capitalization strategy should be readability. There’s no real right or wrong, as long as the standards are applied consistently throughout the organization (and you’re consistent with the collation of the identifiers used in the target databases).

Object References

When you reference an object such as a table or column, you must decide about such issues as whether to qualify the object name, whether wildcards are permitted, and so on. The following questions provide an outline of the types of details you should take into account when deciding how to reference objects:

  • How extensively should you qualify object names? For example, should you always include the schema name with a table even if the schema is dbo? Should you include the database name as well?
  • Do different types of code call for different naming strategies? For instance, should you reference objects in a stored procedure differently from how you would reference objects in a data definition language (DDL) statement?
  • Is the asterisk (*) wildcard permitted in SELECT lists rather than specifying the column names? Are there circumstances when a wildcard would be permitted? Although most guidelines recommend against using the wildcard in a SELECT list, you cannot assume that everyone follows these guidelines. For this reason, you need to be very specific about these sorts of style decisions.
  • Should column names always be specified in INSERT and UPDATE statements? For some of these statements, it’s not necessary to include the column names, but best practices generally recommend that you do include them. As with the use of wildcards in SELECT lists, you should be very specific.
  • Is the use of column numbers in place of names permitted in your T-SQL statements? For example, you can use column numbers in the ORDER BY clause when referencing columns in the SELECT list.

Let’s take a look at a few examples that demonstrate some of these issues. In the first example, a wildcard is used in the SELECT list, in place of the column names:

Now compare this example to the following example, which specifies the column names in the SELECT list:

 

Clearly, there’s no doubt which columns should be retrieved. Even if new columns are added to the table, the code will still work. Notice also in this example that the table name is qualified with the database and schema names. Again, you must decide how to handle these sorts of issues. Now take a look at an example that includes an ORDER BY clause. In this case, the clause uses an integer to refer to the DocTitle column, the second column in the SELECT list:

 

Most best practices guidelines recommend against using the integer in this way, but like the use of the wildcard in the SELECT list, it’s still done. In the next example, the ORDER BY clause specifies the column name:

As you can see in this example, the SELECT list includes the column names, rather than a wildcard, the table name is qualified with the database and schema names, and the ORDER BY clause contains the column name. Your standards must be very specific about all these types of issues and not assume anything.

Now let’s take a look at another issue. In the next example, an INSERT statement adds data to the ProductDocs table without specifying the column names:

Depending on the table definition, you can often insert data without specifying the columns. However, if the table definition should change, this statement would fail. For that reason, best practices guidelines often specify that the column names should be included, as the in following example:

Notice in the last two examples that the table name has been qualified with only the schema name. Again, this is a decision that you’ll have to make when you define your policies about calling objects.

Aliases

Aliases let you assign temporary names to objects (or assign names to computed columns) so they’re easier to work with when writing and reviewing code. When determining what strategies to employ for aliases, you should take into account several considerations:

  • Will you implement a different strategy for different alias types? For example, should table aliases be treated the same as column aliases? Should all column aliases be treated the same, or should computed columns be treated differently from renamed columns?
  • What types of alias names are permitted? Can single letters be used for table aliases? Can abbreviations be used for any type of aliases? Should column aliases follow the same naming conventions as those used for object names?
  • Should the AS keyword always be used? Never used?

In the following example, the column alias (FullName) is a compound word, but the table aliases are single letters:

In complex joins, single-letter aliases can sometimes make the code more difficult to follow because it’s not always intuitive which columns are associated with which tables. The following example uses more meaningful abbreviations for the table aliases:

Notice the use of the AS keyword in the last two examples. In the first example AS is used. In the second example, it is not. Again, your formatting standards should specify whether the AS keyword is included.

Commas

One of the quickest ways to wreak havoc among T-SQL developers is to start a discussion about how commas should be treated within T-SQL code, particularly in a SELECT list. Even so, a standard should be set, and to do so, you should take into account several factors:

  • Should commas be treated differently in different circumstances? For example, should columns in a SELECT list be treated the same as columns in an ORDER BY clause? What about the commas used in a list of parameters?
  • For object names that are separated into multiple lines, as is sometimes the case for a SELECT list, should commas end the line or begin the line?
  • Should spaces or tabs ever be used before or after the commas?

The questions are pretty straightforward; an agreed-upon solution is not. Take a look online and you’ll see endless comments dedicated to this issue. Before you do that, however, let’s look at a few examples of how commas can be treated. The following SELECT statement includes two series that require commas-the SELECT list and the ORDER BY clause:

Notice that in the SELECT list the commas are placed at the end of the column names. However, in the ORDER BY clause, the column names are on one line, so the comma comes after the first column name, followed by a space. Another approach you can take is to precede the column names in the SELECT list with a comma, as in the following example:

In this case, the comma is placed directly in front of the column names in the SELECT list. However, you can also choose to add a fixed number of spaces after each column:

Now let’s take a look at an example in which the ORDER BY clause is treated like the SELECT list:

As you can see, there are several approaches you can take with commas. And comma use is by no means limited to SELECT lists and ORDER BY clauses. In the following DDL statements, commas are used to separate the elements of the OBJECT_ID function and the column definitions:

In this case, the function parameters are treated like the column definitions. Undoubtedly, you’ll come up with your own strategies for how to handle such issues as functions and columns. However, whatever strategy you implement, it must take into account the various ways that commas are used. And part of those considerations will depend on how you space and align the various elements in your statements.

Spacing and Aligning

Everyone has an opinion about how his or her code should be spaced, indented, and broken across lines. There are as many ways to lay out your code as there are people developing that code. In fact, trying to determine the spacing and alignment of the various T-SQL elements might prove to be one the more challenging tasks in your standards process, but the more consistent the code across your organization, the better. For that reason, you should take into account a number of factors when planning your code’s layout:

  • What will the policy be for new lines (linefeeds)? For example, should there be a new line for each clause? Should there be a new line for each clause keyword and a separate line for the arguments to that keyword? For instance, should the FROM clause be on one line and the table name be on the next line?
  • What is the maximum number of characters permitted in each line? Should line widths provide for printing the code without adding unplanned line breaks?
  • How will you handle clauses that exceed the maximum line width? Should subsequent lines be indented?
  • What is the general policy for indentation? For example, in a SELECT statement, should each clause after the SELECT clause be indented? Should arguments to the clause be indented more?
  • Will you use spaces or tabs for indentation? If you use spaces, how many spaces per indent?
  • How will you handle joins in your queries? Will the JOIN and ON keywords start new lines? Should each clause in a join be on separate lines?
  • How should XML-related queries be handled? For example, how should you format a query that includes an XML method in the SELECT list? What if the method references a namespace?

Now let’s look at a few examples that demonstrate different spacing strategies. In the first example, each clause begins on a separate line:

Compare this example with the next one, in each clause and each set of clause arguments begin on separate lines:

In both cases, the code can be easily read because these are very simple statements. However, the more complex your statements, the more important it is to have formatting standards that can deal with these complexities. For instance, if your clauses don’t fit into a single line, you must determine how you will handle line wraps, such as the SELECT list in the following example:

Notice that the SELECT list now spans two lines and that the second line is indented. However, instead of taking this approach you might separate the elements in the SELECT list and ORDER BY clause onto separate lines:

Notice also that the elements that make up each clause are indented three spaces to clearly delineate each clause and their elements. In fact, there are numerous indenting strategies you can implement. For example, in the next SELECT statement, each clause after the SELECT clause is indented:

Another approach you can take is to indent the elements so they all start at the same margin, as in the following example:

            As you can see, all the columns and table names line up. Also, in this case, tabs are used instead of spaces to indent the elements. The use of tabs instead of spaces can initially be easier, but when you start moving code around and cutting and pasting, spaces seem more effective. But then again, this is a judgment call, one that you have to make yourself. In the meantime, take a look at the following example, which includes a join in the FROM clause:

            In this example, the JOIN keyword starts a new line, following by the joined table. The ON clause follows on another new line and is indented. However, you might take a different strategy when working with joins, such as that in the following statement:

This time the JOIN and ON keywords are added before the line break. Another strategy you might employ is to indent the entire join condition, as in the following example:

When setting up your formatting standards, you should also take into account XML-related queries. For instance, the following select statement uses the XML query() method to retrieve the job candidate’s name:

Notice how the method’s argument is set off from the rest of the SELECT list. Also notice that the namespace wraps to a second line. Another approach you can take is to isolate the method’s argument even further:

XML can be tricky when trying to implement formatting standards because the XQuery elements can become quite long, particularly when the namespace is being referenced, as in the examples above.

Code Blocks

For this article, I use the term code block loosely. In this case, it refers to any sort of block of code, whether it’s a TRY/CATCH construction, column definitions enclosed in parentheses, a complex Boolean expression, a subquery, or any other grouping of code. And the way you lay out code blocks is tied to the decisions you make about spacing and alignment. When determining how to format blocks of code, you should take into account the following considerations:

  • How should you handle code blocks such as BEGIN/END, IF/ELSE, or TRY/CATCH? Should you specify policies for each type of code block? How will you handle the elements in each block? Should inside elements be indented? Should the keywords themselves be in separate lines?
  • How will you handle code blocks when one is imbedded in another? For example, if you use BEGIN/END blocks within your IF/ELSE construction, how should each element be treated? Should you indent the BEGIN/END blocks? Should keywords such as ELSE and BEGIN be placed on separate lines?
  • How should subqueries be handled in your T-SQL code? Should they be treated the same regardless of where they occur? For example, should a subquery in a SELECT list be treated differently from a subquery in a WHERE clause? What about indentation? Should subqueries be on lines separate from other elements?
  • How should expressions be formatted in your T-SQL statements? Should they be separated from other elements in the statement? Should conditional operators be on separate lines?
  • How will you treat parentheses and braces in your statements? Should line breaks and indentation be used to set them off from other statement elements? Should their format be based on how they’re used? After all, parentheses are used for expressions, data definition language (DDL) statement, function parameters, and subqueries. Should they be treated differently in each case?

Let’s look at some code they help to illustrate these issues. In the following CREATE TABLE statement, the set of parentheses that encloses the column definition are set apart from the rest of the statement:

As you can see, the opening parenthesis is on a separate line, the closing parenthesis is on its own line, and everything in between is indented. However, parentheses and the elements they enclose can be treated in a variety of way. For example, the following CREATE TABLE statement, the new line comes after opening parenthesis and after the closing parentheses:

Another issue is how to handle code blocks such as IF/ELSE. In the following example, each keyword that starts or ends a block is on its own line:

In this case, only the contents of the BEGIN/END blocks are indented. However, because the BEGIN/END blocks are embedded in the IF/ELSE blocks, you might decide to indent to the BEGIN/END keywords as well, as shown in the following example:

Another consideration is how to handle expressions, such as the Boolean expression after the IF keyword. In the previous example, the expression is on the same line as the IF keyword. However, you can separate the expression elements as well, as shown in the following example:

Notice also that first BEGIN comes at the end of the Boolean expression and the second BEGIN is on the same line as ELSE. Again, you must decide how you want to handle each of these elements when defining your standards. Now let’s look at an example SELECT statement that includes a subquery:

As you can see, the subquery is an element in the SELECT list and is long enough to require two lines. You can also choose to separate out the subquery even more:

Again, all these details are the types of considerations you should take into account so that you can have consistent code across your organization.

Comments

Comments are fairly self-explanatory. If you plan to use them, then you should have policies that take into account their usage, in which case you should consider the following:

  • Should comments always be included in code files and T-SQL statement?
  • If opening comments in a code file are used, should they be standardized? For example, should they always be in the same format and include the same information, such as the developer’s name, revision history, and purpose of the code?
  • Should you use only one type of comment on your code? For example, should you use the slash/asterisk (/*…*/) convention, rather than the double-dash (–) convention?

The following example shows both types of comments supported by T-SQL and how comments can be used before or after the code, or within the code:

Naming Conventions

Determining naming conventions is not really a formatting issue, per se, but you’ll seldom get into discussions about formatting without eventually talking about how to name things, so I thought I would raise this issue so it’s on your radar as you set up your standards. That said, here are a few considerations to take into account:

  • Should you use different naming standards for different object types? For example, should you differentiate between tables, columns, stored procedures, functions, views, constraints, indexes, and so on?
  • If you implement naming standards based on the object type, what will the standards be? For example, should all stored procedure names be based on an action? Should primary key constraints be based on column names?
  • Should you standardize names for different types or groupings of objects within a database? For instance, should all tables related to marketing be prefixed with “Mkt”? Should all column names that specify an identification integer value end in “ID”? Should many-to-many bridge tables reflect the names of the tables they are bridging?
  • Should table and view names be singular or plural, such as BookTitle versus BookTitles? What about nouns that have a plural form, such as people versus person?
  • Should object names be prefixed or suffixed? Should the use of  prefixes or suffixes be based on the object type? For example, would you use “usp_” to prefix the names of user-defined stored procedures, or “pk_” to prefix the names of primary key constraints? Should some object types be prefixed but not others?
  • Can underscores (_) or other special characters be used in object names? What about keywords? Spaces? Should there be a policy that no name be used that cannot be called without enclosing that name in brackets?
  • Should abbreviations be used for object names? How about single letters?

Moving Forward

In addition to the number of issues I’ve raised above, you should also determine whether to develop a set of standards for each type of statement. For example, do DDL and DML statements require different rules. Your standards might also want to cover how statements are placed within a code file. For instance, should all variable assignments (DECLARE and SET) in a stored procedure be at the beginning of the procedure definition?

There will, of course, be other issues that you’ll want to address when putting together your formatting standards. But the questions above should, at least, provide you with a starting point. They should also demonstrate the breadth of this undertaking as you try to negotiate the many details that will need to be decided upon in order to implement a set of policies that define how your organization’s T-SQL code should be formatted.

Keep in mind, however, that even with the best effort, you’ll find that defining standards is an ongoing process. You should try to lock down and implement your styles as soon as possible, but know  you’ll inevitably run into situations when those standards don’t cover what you need or will have to be modified to meet changing requirements. For that reason, along with the standards, you should also put in place a process that allows you to implement changes as quickly and as smoothly as possible. And you should try to keep your standards documents as current as possible. In the meantime, this article should give you a sense of the types of issues you’ll need to tackle and the number of decisions you’ll have to make to implement a comprehensive and effective formatting strategy.