The default trace in SQL Server – the power of performance and security auditing

Since the introduction of SQL Server 2005, there is a simple lightweight trace that is left running by default on every SQL Server. This provides some very valuable information for the DBA about the running server, but it isn't well-documented. Feodor reveals many of the secrets of this facility and shows how to get reports from it.

SQL Server provides us with variety of tools for auditing. All of them have their advantages and pitfalls. The default trace, introduced in SQL Server 2005, has the great advantage of being switched on by default, and is usually already there to use. It provides comprehensive information about changes in the system.

Firstly, let’s start by answering some basic questions:

What is the default trace? The default trace is enabled by default in SQL Server and is a minimum weight trace which consists by default of five trace files ( .trc) located in the SQL Server installation directory. The files are rolled over as time passes.

How do we know that the default trace is running? We can run the following script in order to find out if the default trace is running:

If it is not enabled, how do we enable it? We can run this script in order to enable the default trace:

What is logged in the Default Trace? If we open the Default trace file in Profiler and look at the trace definition we will see that events in 6 categories are captured: Database, Errors and Warnings, Full-Text, Objects, Security Audit and Server. Also, all available columns are selected for every sub-event.

1254-Fig1.jpg

Figure 1: This is how the Default trace looks like

So, how can we benefit from each audited category? In the following sections I will explain briefly what each category means, as well as some of the sub-events, and will provide essential scripts for auditing the events in the Default Trace.

Database Events

Let’s start with the first event: the Database. As we can see, the sub-events are pretty much self-explanatory – the growth and shrinkage of data and log files, together with the changes in mirroring status. It is important to monitor file growths and shrinkages; It would be a vast topic to explain why, but in an nutshell, it is because of possible performance issues. Every time a file is grown or shrunk, SQL Server will halt and wait for the disk system to make the file available again. And halt, in this case, means halt: no transactions processed until the action is completed.

These are the database events that are monitored:

  • Data file auto grow
  • Data file auto shrink
  • Database mirroring status change
  • Log file auto grow
  • Log file auto shrink

Here is a script which will list the data file growths and shrinkages:

The output of the script will not tell you why the database grew, but will show you how long it took to grow. (Be careful about the value of the Duration column, since it might be in milliseconds or in microseconds, depending on the SQL Server version).

Also, I would recommend extending this query to search for databases which took longer than, say, a second to grow (this is just a guideline).

Here is another query which will return the log growths and log shrinking.

Also keep in mind that the query will not tell you if your junior DBA has been shrinking the data and log files. In the default trace we can find only the AUTO growth and shrink events and not the ones triggered by the ALTER DATABASE statement.

Errors and Warnings

Now let’s move on to the next section of the events: the Errors and Warnings. As we can see, there is an abundance of information here.

The Errorlog sub-event occurs when something is written to the SQL Server event log; Hash and Sort warnings happen generally when a sort or a hash match operation is spilled to disk (and since the disk subsystem is the slowest, then our queries become much slower.) Missing column statistics events will occur only when the ‘Auto create statistics’ option is set to off. In this case SQL Server indicates that it might have chosen a bad execution plan. The missing join predicate occurs when two tables do not have a join predicate and when both tables have more than one row. This can result in a long running queries or unexpected results.

These categories of errors and warnings are:

  • Errorlog
  • Hash warning
  • Missing Column Statistics
  • Missing Join Predicate
  • Sort Warning

Here is a script which will outline the errors:

Note that this script has neither  EndTime nor Duration columns, for obvious reasons.

Here is another script which will outline the sort and hash warnings:

The Full Text Events

The Full-Text event category shows information about the Full-Text population events: If it is aborted, then you should look into the event log for a more detailed message; the FT Crawl Started sub-event indicates that the population request has been picked up by the workers. FT Crawl Stopped indicates either a successful completion or stop by error.

Full-Text events are…

  • FT Crawl Aborted
  • FT Crawl Started
  • FT Crawl Stopped

Here is a script which will return the Full text events:

Notice that the records in the DatabaseName column are null, so we have to get the database name from the DB_NAME() function.

Object events

Here is where the real detective work starts: the changes of the object. In this category we have altered, created and deleted objects, and this includes anything from index rebuilds, statistics updates, to database deletion.

Object events include:

  • Object Altered
  • Object Created
  • Object Deleted

Here is a script which will give you the most recently manipulated objects in your databases.

Keep in mind that SQL Server by default has five trace files, 20 MB each and there is no known supported method of changing this. If you have a busy system, the trace files may roll over far too fast (even within hours) and you may not be able to catch some of the changes. This article will not discuss in detail any workarounds for efficient SQL Server tracing, instead I will do this in a separate article later on.

Security Audit Events

Another part of the default trace is the Security Audit. As you can see from the event list below, this is one of the richest parts of the default trace. In general, what this event group tells us is what significant security events are occurring in our system.

Security events include…

  • Audit Add DB user event
  • Audit Add login to server role event
  • Audit Add Member to DB role event
  • Audit Add Role event
  • Audit Add login event
  • Audit Backup/Restore event
  • Audit Change Database owner
  • Audit DBCC event
  • Audit Database Scope GDR event (Grant, Deny, Revoke)
  • Audit Login Change Property event
  • Audit Login Failed
  • Audit Login GDR event
  • Audit Schema Object GDR event
  • Audit Schema Object Take Ownership
  • Audit Server Starts and Stops

Let’s take it one step at a time and:

  • create a SQL Server login
  • assign read permissions to this user in one of our databases.

By running the following query we will be able to track what users have been created on our SQL Server instance:

Here is how the result of the query looks like after we have created one login with giving it read permission to one database:

1254-Fig2.jpg

As we can see, the first row is announcing the creation of the login in the master database, together with the creator (SessionLoginName column) and the create user (TargetLoginName column).

The next two rows are as follows: creating the database user and granting it database access, and last – adding the database user to a DB role.

Keep in mind that if you add the user to more than one role and if you give the login access to more than one database, then you will see several rows noting every event in your default trace.

Now let’s audit the dropped users and logins by running the following query:

As you can see, the event name is the same for both creating and dropping logins: i.e. Audit Addlogin Event, however the subclass column value is what defines the difference, ie. In the case of creation of a login the subclass would be ‘Add’ and in the case of deletion it would be ‘Drop’.

In fact, if we drop the database user and the SQL login we created earlier, this query will return two rows – one for each event together with the dropped user and login names and the login name of the user who deleted the user and the login.

The following query will give us all the failed logins contained in our default trace file:

There are quite a few events in the Security Audit class and for the sake of compactness of this article I will turn your attention only to one more event, namely to the ‘Audit Server Starts and Stops’.

The following query will give you only the server start event:

Yes, you read it correctly: the above query will return only the Server Start event, and never the Server Stop event. Here is the explanation: as I mentioned earlier, SQL Server’s default trace consists of five trace files in total, which are 20 MB each. These five trace files are rotated (‘refurbrished’ or ‘recycled’, if you like) upon several conditions: when the instance starts or when the file size reaches 20 MB. Now, let’s think about this for a second: the queries I have listed so far in this article are returning the results only from the current trace file, i.e. the most recent one. Further, since the default trace file is rolled over every time the instance starts, this means that the event indicating the Server Stop will remain in the previous default trace file. Put simply, after the SQL Service restarts, our current default trace file will have the Server Start event as a first row. If you really wish to know when your SQL Server instance was stopped, you will need to include at least the contents of the previous file, but in fact we can include the contents of the other four default trace files to our result set. We can do this by changing the way we call sys.fn_trace_gettable so that it appends all default trace files. This function accepts 2 parameters – file location and name and number of files; if we pass as the first parameter the file location and the name of the oldest default trace file, then the sysfn_trace_gettable will append the newest ones, as long as we specify the appropriate value for the second parameter (the number of files). If we specify the newest file as a parameter to the function (as it is the case in all scripts in this article) then the older files will not be appended. As the filename contains the index of the file and they increment as each new file is created, it is easy to calculate the name of the oldest file.

To find the exact file location of the default trace files, you just need to execute the following query:

Server Memory Change Events

And now, let’s move on to the last event class in our default trace: the Server class. It contains only one event – Server Memory Change.

The following query will tell us when the memory use has changed:

The event subclass indicates if the memory has increased or decreased.

Conclusion

The default trace is a very powerful way to examine the health and the security of your SQL Server instance. There are several pitfalls to keep in mind – mainly related to file rollovers and size limitations, but with some programming the workarounds are not impossible. It is important to remember that the queries presented in this article will return the result from the single most recent default trace file. Depending on how busy the SQL Server instance is, the files may roll over way too fast for a DBA to catch all significant events; therefore, some automation is needed.

Further Reading

I have written a set of reports for SSMS which query the default trace.

Here is some documentation: Default Trace Audit documentation and FAQ

And here are some screenshots of some of them: Default Trace Audit documentation and FAQ: the screenshots/