15 November 2018

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Phil Factor

15 November 2018

SQL Prompt Code Analysis: A Hint is Used (PE004-7)

Phil Factor suggests a philosophy of "the SQL query optimizer knows best" when it comes to choosing the right execution plan. Use hints as a last resort, and evaluate them carefully whenever SQL Prompt warns you of their presence in your SQL code.

Guest post

“Because the SQL Server query optimizer typically selects the best execution plan, we recommend that hints be used only as a last resort by experienced developers and database administrators” — Microsoft SQL Server Documentation

Really, there should be no discussion about this, because the above warning, in the documentation, summarizes it so well. However, I feel sympathy with the frustration of database developers who see the ever-increasing range of hints that are available or see the wide use of hints in commercial database applications, some of which sound like they can help solve their query performance problems. The dangers of hints need to be explained.

SQL Prompt has several performance-related code analysis rules that will warn you of the use of index hints (PE004), join hints (PE005), tables hints (PE006) and query hints (PE007).

Misuse of Hints

I was once a development team leader, managing the operations of a telecoms database system. When I found hints in the SQL Code, I put out a draconian ban on the use of all hints, except OPTION(RECOMPILE). It caused a little ill-feeling in the group, because they could see hints being used in the source of SQL Server, and they felt that it was an acceptable way to fine-tune their SQL. It seemed like the ‘Turbo Button‘ that once adorned the front of PCs that used to make everything run faster.

I relented to the point that I allowed any hint, as long as the developers were able to explain why they needed it, what it did, what the gain was, the consequences for scalability and the effects on ACID compliance. I asked them also to prepare a maintenance plan that identified and re-tested all hints in response to changes in the data distribution, or in response to an upgrade to a new SQL Server version, or application of a new service pack, to verify that the hint still offered optimal performance. The use of hints soon disappeared.

I would hate to imply that all hints are bad, only the reckless use of them. There will always be a rare requirement for rare requirement for a query hint, possibly, once in a blue moon, a join hint, and even a legendary table hint, despite the scary warnings, Microsoft keep introducing them. Most frequently, hints are used if the Query Optimizer can’t produce a good plan, or a plan that offers stable performance, which usually happens when it gets its estimate of the number of rows very wrong. This is more likely to happen as queries get more complicated, or they get too clever.

A better approach, when faced with a query that doesn’t get the best plan, is to back off and write it a simpler way. A slow-running query is nature’s way of telling you that you’re doing something the wrong way. Most often, you are using the wrong approach to get the information, or the database design is flawed in some way. This flaw is most commonly an indexing problem. I’ve almost never succeeded in successfully ‘hinting’ away a problem, even in what Microsoft tactfully calls ‘the last resort’. I realize that, in the real world of pragmatism, it is sometimes impossible to do more than a gaffer-tape remedy, but it is always worth remembering that an act of pragmatism can turn into a time-bomb when you upgrade the SQL Server version, or when either the data in the database changes or the indexes are altered and the optimizer can’t change the query plan in response.

Is it OK to use the NOLOCK hint?

Let’s take NOLOCK table hint as the classic example. It is misnamed, because the hinted query still obtains a Sch-S (schema stability) lock on the target table and, if used with a data modification statement, also exclusive locks. Really, it just ensures that the hinted query acquires no shared locks on that table and is essentially a directive that dirty reads are allowed, for the duration of the transaction. This means that any data modification transactions running concurrently will not be blocked from modifying data that a NOLOCK query is reading. Also, when the hint is applied to data modification transactions, another query can still read that ‘in-flight’ data. This means that it might read data that is never committed, perhaps because the hinted transaction subsequently rolls back.

If you really need to allow dirty reads on a table, for the duration of a transaction, then it’s better to use the equivalent, and more informative hint, READUNCOMMITTED. This is as if you had set the query to work at the READ_UNCOMMITTED isolation level. NOLOCK is an anachronism that is already in maintenance mode for data modifications (UPDATE and DELETE)

If you do use it then, naturally, in a hard-working OLTP system, you’ll get a frequent variety of errors depending on timings, but they can include errors for your transaction (e.g. error 601), incorrect aggregations such as totals, doppelganger data (data seen twice), time-travel data (column data from different ‘commits’ in the same row) or phantom reads. You can even get index corruption if you use NOLOCK on anything other than a SELECT.

One might think that query with a NOLOCK table hint is able to do its reckless work regardless of existing locks on table, but it is stopped dead in its tracks when another query holds a Sch-M (schema modification) lock that that is issued by a DDL operation. Also, a DDL operation, such as index maintenance, is prevented by a query using a NOLOCK hint, because it has issued a Sch-S lock.

If the reason for using the NOLOCK hint is to avoid the overhead of locking and blocking, then a far better approach is to use one of the snapshot-based isolation levels. One option is to allow transactions to use the default READ COMMITTED isolation level, but with the READ_COMMITTED_SNAPSHOT database option set to ‘ON‘. This means that queries see a transactionally-consistent snapshot of the data as it existed at the start of the statement. This avoids dirty reads, but without the need for transactions to acquire shared read locks.

To prevent all read phenomena (dirty reads, non-repeatable reads, phantom reads), you can set the ALLOW_SNAPSHOT_ISOLATION database option to ON, and then at the session-level, use the SET TRANSACTION ISOLATION LEVEL command to specify the SNAPSHOT isolation level. This ensures that statements in a transaction get a snapshot of the committed data as it existed at the start of the transaction.

Edge cases for use of NOLOCK

One can’t say that the use of NOLOCK is always wrong, especially as they are so evident in the source of the metadata functions on SQL Server. They are used there because a higher isolation level runs the risk of causing unwanted contention on non-user data structures. It is very much an edge-case.

It is difficult to argue that it is wrong to use it where a query must read a high proportion of rows in a large read-only table. However, even where NOLOCK is not a bad choice, it is still better to use isolation levels such as SNAPSHOT, or the snapshot-based version of READ COMMITTED, instead.

Another edge case is when you must execute diagnostic queries in a production database. The last thing you want to do is to accidentally block other users. In this case, there is value in a NOLOCK even though you can’t expect exact values.

Another time when you might justifiably use hints is as a temporary expedient, when you need to test out a badly performing query in development with different query plans to work out why the optimizer isn’t using the best-performing strategy. It can lead you to out-of-date distribution statistics or missing columns in a covering index.

I’ve heard it suggested that it is fine to use NOLOCK on a database that has only static data in it that isn’t going to be changed. However, such databases have a habit of suddenly changing data when you don’t expect it. If on the other hand, you set the database to READ_ONLY, you don’t need the hint anyway because shared locks aren’t taken out for a read-only database.

Dealing with bad hints you can’t delete

If you append OPTION(TABLE HINT(<tablename>)) to a query it will override (disable) any other table hints issued by that query. You can enforce this ‘override’ hint for all queries by adding it to a plan guide. Nobody is upset by the disappearance of a cherished hint, and developers can continue to believe that their hint is really doing a good job (see Override Bad Table Hints with A Plan Guide).

Conclusion

Normally, with SQL Server, the features work as advertised. Hints are an exception. They aren’t hints at all but directives. They generally do what they say they do but it isn’t always obvious that there is a big downside: they must be constantly reviewed and assessed as the data changes in either size or distribution, or as the indexing strategy changes. Whenever the SQL Server is upgraded, the hints must be rechecked.

They remove the query optimizer’s free hand and generally condemn the database execution strategy to exist in time warp. When the Query Optimizer is making the wrong choice, it is not because it is stupid, because it isn’t. It is because your queries are obtuse, it has the wrong information on which it is making its choice, or you are providing indexes that don’t meet the requirements of the queries you make. To make an uncharacteristic biblical reference, using hints is like Balaam using the whip on his donkey because it refuses to move. It turns out that the donkey was more perceptive than he was.

Tools in this post

SQL Prompt

Write, format, and refactor SQL effortlessly

Find out more

SQL Prompt

Write, format, and refactor SQL effortlessly

Find out more

Product articles SQL Prompt SQL Code Analysis SQL Prompt Code Analysis: A Hint is…

Guest post

SQL Prompt Code Analysis: A Hint is Used (PE004-7)

Guest post

Misuse of Hints

Is it OK to use the NOLOCK hint?

Edge cases for use of NOLOCK

Dealing with bad hints you can’t delete

Conclusion

Tools in this post

SQL Prompt

SQL Prompt

You may also like

SQL Code Analysis from a PowerShell Deployment Script

How to format a set of SQL scripts using SQL Prompt

Avoid use of the MONEY and SMALLMONEY datatypes (BP022)

A day in the life of a developer with SQL Prompt

Using SQL Prompt Code Snippets for Repetitive Tasks

Product articles SQL Prompt SQL Code Analysis
SQL Prompt Code Analysis: A Hint is…