Laying out SQL Code

It is important to ensure that SQL code is laid out the best way for the team that has to use and maintain it. Before you work out how to enforce a standard, one has to work out what that standard should be for the application. So do you dive into detail or create an overall logic to the way it is done?

‘This article is just rudimentary. What are u professional guys fussing about?’
Anonymous comment on Bob Sheldon’s article ‘Transact SQL formatting standards’

 SQL, unlike a language such as Python, doesn’t require any formatting, or laying out, of its code in order  to compile and run. On the other hand, the layout of SQL is important because SQL was always intended to be close to a real declarative human sentence, with phrases for the various parts of the command. It was written in the days when it was considered that a computer language should be easy to understand.  Laying SQL out carefully, and choosing sensible object names helps greatly.

You may feel compelled to format SQL code as if it were Pascal or Java, for the reason that you may have come from a Pascal background and find any other way of doing it disconcerting. Every language marks its practitioners for keeps. ForTran programmers tend to write thin columns of abbreviated code,  Java programmers often like their SQL code to be in lower case, and BASIC programmers never seem to get used to multi-line strings.  In fact, it is often possible to detect what language a database developer first cut his teeth on from looking at the way they format SQL. Because we all have had different influences on the way we write code, there will never be any agreement about the way that code is laid out.

There is no single correct way of laying out SQL or naming your database objects. However, when a developer is spending forty hours a week staring at SQL code, he or she gets to like it laid out to a particular style. Code in this style will be considerably easier for them to read and understand. Other people’s code looks all wrong.  This only causes difficulties when team members find no way of agreeing on a format, and much time is wasted lining things up or changing the case of object names before starting to work on existing code.

Why revisit this topic? It has been discussed in a number of places and was the subject of a book (Joe Celko’s SQL Programming Style).   The reason I’m interested is that a growing number of applications have been devised that will do this automatically, and I even wrote one myself in SQL that did this in a rudimentary way. The subject is even more interesting to publishers such as Simple Talk, because they have to render SQL code for publication  in a way that is generally acceptable. It isn’t easy to come up with the detail for doing this.

There are several things that need to be decided before you start a development project. One job that is worth tackling is to produce a style guide before any code is cut. Any style guide should, I think, cover Object Naming Conventions and Code Layout. I would keep the topic of structured code-headers, and code-portability, separate. Although ISO/IEC 11179 will help a great deal in defining a common language for talking about metadata, it is, inevitably,  less prescriptive when discussing the practicalities of a style guide for a project. I have not found any adopted standard at all for layout, so I hope I can help with some suggestions for this in this article.

Object Naming Conventions.

Object naming is really a different subject altogether from layout, and it is impossible to refactor the naming of SQL objects automatically as one can the layout.  It has to be done right from the start. Because object-naming is so bound-up with our culture, it causes many arguments in development teams.  There are standards for doing this (ISO/IEC 11179-5. Naming and identification principles for data elements), but everyone likes to bring their own familiar rituals to the process.  Here are a few points that cause arguments.

Tibbling.

The habit most resistant to eradication is ‘Tibbling’, the use of reverse Hungarian notation, a habit caught from starting out with Microsoft Access.  ‘Tibbling’ is named after the habit of prefixing the name of a table with the ‘tbl’ prefix, thereby making it difficult to pronounce. A tibbler will take a table that should be called ‘Node’, and call it ‘tblNode‘.  Stored procedures will be called something like spCreateCustomer and table-valued functions will be called tvfSubscription. This makes talking about your data difficult.  ‘Tibbling’ is now rather entrenched at Microsoft, in a mutated version that gives a PK_, FK_ IX_SP_or DF_ prefix to object names (but not mercifully to tables yet), so I doubt  that it will ever be eradicated amongst SQL Server programmers. It has never been part of any national or international standard for naming data objects.

Pluralising

A Pluraliser will always name a table after a quantity of entities rather than an entity.  The Customer table will be called Customers, and Invoice will be Invoices.  Ideally, the use of a collective name for the entities within a table is best, but failing that, the singular noun is considered better than the plural.

Abbreviating (or abrvtng)

An Abbreviator usually picks up the habit from interpreted procedural programs of trying to abbreviate all names, under the mistaken belief that the code will run faster, take less space or be, in some mystical sense, more efficient. Heaving out the vowels (the ‘vowel movement’) is a start, so that Subscription becomes Sbscrptn, but the urge towards the mad extreme will end up with Sn. I’ve heard of this being call ‘Custing’, after the habit of using the term Cust instead of Customer. To them, I dedicate the following routines

This habit came from the old ForTran days when you could only use 6 characters at the most. SQL 92 allows 18 characters, but SQL Server has no limit.

[Escaping]

Because spaces are not allowed in object names, unless the name is escaped, SQL names need some way of separating words.  One could write customerAccounts, CustomerAccounts, customer_Accounts or Customer_Accounts.  Yes, you need to make up your mind.

Because desktop databases such as Access are more liberal about the character set you can use for object names, the idea came of ‘escaping’, ‘quoting’, or delimiting  such names so that they could be copied without modification into a full relational database.  Those of us who take trouble to write legal SQL object names find the rash of square brackets that are generated by SSMS acutely irritating. Here is some code that really runs perfectly happily in SQL Server, purely because of the use of escaping with square brackets.

It is true that ‘delimited’ names used to be handy  for non-Latin languages such as Chinese, but nowadays you can use Unicode characters for names so this compiles perfectly happily

Herein lies another horrifying possibility. SQL Server will allow you to use ‘shapes’.

Restricting

A habit that has crept into SQL from ex-Cobol programmers, I believe, is the use of a very restricted vocabulary of terms.  This is rather like the development of cool street-argot with a highly restricted set of 400 words, rather than the 40,000 that are within the grasp of the normal adult.  With sql, this typically uses words like ‘GET’, ‘PUT’  or ‘SAVE’ in a variety of  SQL is perfectly happy to oblige, even though the results are difficult to understand. 

Taking this to extremes, this code is perfectly acceptable to SQL Server.

The existing standards for naming objects are more concerned with the way of discussing how you name database objects, and the sort of ways you might document your decisions. However, one can come to a reasonable basis for agreement with the following:

  • Procedures should be a phrase with singular nouns and a verb, in the present tense,  in it to describe what they do
  • Be consistent with the way you denote word-boundaries (the use of the underline character, the use of a capital letter, or hyphen.)
  • Tables, sets, views  and other collections should use  a collective name, a name for a group of entities, such as ‘flock’, ‘ledger’, ‘team’, ‘staff’
  • Scalar names  should be in the singular
  • Any object name should use only commonly-understood abbreviations. Such as ZIP for ‘Zone Improvement Plan’
  • Use standard and consistent  postfixes  (eg _ID, _name, _date, _quantity)
  • Use commonly understood words for relationship tables. (e.g. meeting, booking, marriage, purchase)
  • names should consist of one or more of the following components
    • Object Class:  The name can include just one ‘Object class’  which is the terminology used within the community of users of the application. Words like ‘Cost’ or  purchase
    • Property Term: Property terms represent the category of the data. e.g: Total Amount, Date, Sequence, size, height
    • Qualifiers These can be used if necessary to describe the data element and make it unique within a specified context, in no particular order, but they must precede the term being qualified. Qualifier terms are optional. Example: Budget Period
    • The Representation term This describes the representation of the valid value set of the data element. There should be only one, as the final part of the name, and it should add precision to the preceding terms. 

It isn’t always easy to come up with a word to attach to a table.

‘Not all ideas are simply expressed in a natural language, either. For example, “women between the ages of 15 and 45 who have had at least one live birth in the last 12 months” is a valid object class not easily named in English’.

ISO/IEC 11179-1:2004(E): Page 19

Coding layout

Indenting and line-breaking.

Line Breaks

SQL code doesn’t have to be broken into short lines like a Haiku poem.  Since SQL is designed to  be as intelligible as an English sentence, it can be written as an English sentence. It can, of course, be written as a poem, but not as a thin smear down the left-hand side of the query window. The urge to insert large numbers of line-breaks comes from procedural coders where a vertical style is traditional, dating back to the days of FORTRAN and Basic. An advantage of the vertical style is that, when an error just reports a line-number, it takes less time to work out what the problem was. However, it means an over familiarity with the scroll-bar, if the routine runs to any length.

Indenting and Line-Breaking is done purely to emphasize the structure of SQL, and aid readability.  line-breaks have to be inserted at certain points (I rather like to have a line-break at around the 80th character), and they shouldn’t be mid-phrase. However, to specify that here must always be a line-break between each phrase (before the FROM, ON, and WHERE clause, for example) can introduce an unnecessary amount of white-space into code.  Such indenting should never become mere ritual activity to make things look neat, like obsessively painting the rocks in front of your house with white paint.

Generally, the use of tabs for doing indenting has resulted in indenting that is way too wide.  Of course, written text can have wide indents, but it isn’t done to around eight levels, skidding the text hard-against the right-hand side of the page. Usually, two or three spaces is fine.

Indenting

Code without indenting is very difficult to follow. Indentation follows a very similar practice to a structured document where the left margin is indented according to the nesting of the section heading.  It should be a fixed number of spaces for each level of nesting.

It is at the point where we need to decide what comprises a change in the nesting level that things get difficult.  We can be sure that, in a SELECT statement, all clauses are subordinate to the SELECT.  Most of us choose to indent the FROM or the WHERE clause at the same level, though, but one usually sees the lists of columns indented.  On the other hand, it is quite usual to see AND, ORDER BY, OR, and so on indented to the next level.  What rules lie behind the current best-practice?  Many of us like to have different rules for DDL code, such as CREATE TABLE statements,  from DML such as INSERT, UPDATE or SELECT  statements.

Formatting lists

Lists occur all over the place in code. As in printed text, you can handle them in a number of different ways.  If, for example, you are just listing entities then you’d do it like this. I like many French cheeses, including  Abbaye de Belloc, Baguette Laonnaise, Brie de Melun,  Calenzana,  Crayeux de Roncq, Esbareich, Frinault, Mixte, Pave du Berry, Port-Salut, Quercy Petit, Regal de la Dombes, Sainte Maure, Sourire Lozerien, Truffe and Vignotte. Now no typesetter would agree to arrange this in a vertical list because the page would contain too much white space.

I like many French cheeses, including

  • Abbaye de Belloc,
  • Baguette Laonnaise,
  • Brie de Melun, 
  • Calenzana,  
  • Crayeux de Roncq,
  • Esbareich,
  • (…etc …)

…and they’d be most unlikely to want to put commas at the beginning of list elements. However, if the list elements consisted of longer strings, then it would be perfectly acceptable.  In the same way, the rules for formatting SQL have to take into account the type of SQL statement being formatted, and the average length of each list element.

Punctuation

Commas used as list-separators are often put at the beginning of lines. I realise that it makes the ‘commenting out’ of list-members easier during development, but it makes it difficult for those of us that are used to reading English text in books. Commas are at the end of phrases, with no space before them, but if they are followed by a word or phrase on the same line, then there is a space after the comma.  Semi-colons are a rather more unfamiliar, but increasing, punctuation mark in SQL.  Their treatment should be similar.  In development work, one can, and should, do all sorts of wild formatting of SQL, but once it is tested, and  ‘put to bed’, it should be tidied up to make it easier for others to understand.

Capitalisation

Before we start. I’d like to define what I mean by the various terms

  • This_Is_Capitalised

  • This_is_lowercase -or minuscule

  • this_Is_Camelcase

  • THIS_IS_UPPERCASE-or majuscule

Schema objects are, I believe, better capitalised. I would strongly advise against using a binary or Case-sensitive collation for the database itself, since this will cause all sorts of unintended errors. A quirk of all European languages is that words mean the same thing whether capital or lower-case letters are used. Uppercase, or majuscule, lettering was used exclusively by the Roman Empire, and lowercase or minuscule was developed later on  purely as a cursive script. The idea that the case of letters changed the meaning of words is a very recent novelty.  Of course, one would not expect programmers to be so perverse as to do this sort of thing, but I’ve seen code that approaches this scale of awfulness in C#

So get off the fence

I wouldn’t want to impose my views on anyone else. However, if you are looking for recommendations, here what I usually suggest. I’d stick to the conventions of

  • Keeping your database Case-insensitive even if your data has to be case-sensitive, unless you are developing in a language for which this is inappropriate.
  • Capitalising all the Scalars and Schema object names (e.g. Invoice, Basket, Customer, CustomerBase, Ledger)
  •  Uppercasing all reserved words (such as SELECT, WITH, PIVOT, FROM, WHERE)
  • putting a line-break between list items only when each list item averages more than thirty or so characters.
  • putting block delimiters (such as BEGIN END)  on a new line by themselves, correctly indented
  • putting line-breaks within SQL statements before the clause (FROM, ON, WHERE, HAVING, GROUP BY) only where it aids clarity in long statements, but not in every case.
  • use the semi-colon to aid the reading of code, even where SQL syntax states that it is only optional.
  • using an increased indent for subordinate clauses  if the ON, INTO, and HAVING statement is at the start of the line

For sheer practicality, I’d opt for a layout that can be achieved automatically by your favorite code-layout tool (I use SQL Refactor or SQL Prompt, but there are several others) There is nothing more irritating than to find that someone has trashed a beautifully laid-out procedure by mangling it with a badly-set-up layout tool.

I tend to write my SQL fast and sloppily, to get some initial results quickly, and then refining and rewriting the code until it is fast and efficient. At that point, it is usually a mess and it it very satisfying to run it through a layout tool to smarten it up. In fact, some time ago, before layout tools existed for SQL, I created a stored procedure that tidied up SQL code. It gradually ended up as the SQL Prettifier, repurposed to render SQL in HTML, and with the formatting part taken out once SQL Refactor appeared.  A tool like this can save a lot of inevitable arguments amongst developers as to the ‘correct’ way to format SQL code.

Further reading