Product articles SQL Monitor SQL Server Security Monitoring
Checking for Database Drift using…

5 October 2018

4 Comments

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

5 October 2018

4 Comments

Checking for Database Drift using Extended Events and SQL Monitor

Phil Factor uses Extended Events and a SQL Monitor custom metric to detect when the metadata of a database has 'drifted', meaning that a database object has been created, deleted or modified outside of the official change management process.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

You need to make sure that nobody makes unauthorized changes to the metadata of your production databases.

Governance and testing processes during the deployment pipeline are there to ensure that any proposed database changes aren’t going to open any legal or security issues, and that they won’t introduce any bugs as a side effect. Once a database has been deployed, it shouldn’t suffer any changes, other than hotfixes that have gone through an auditable change-management process. Any drift in the state of the database, from the one that was deployed, is a security issue and must be investigated.

How do you make sure that the schema of the database, in the sense of the metadata, hasn’t been changed surreptitiously? Firstly, you want to be alerted when a change happens, and then to get a list of all the changes, who did them, what type of change, to which object, when they happened and from which client application it came. It sounds simple.

How to monitor for ‘uncontrolled’ database changes

The default trace records any DDL changes to the database. This is fine for one-off investigations, but it doesn’t give us quite enough information for our requirements. I’ll discuss how to use it for tracking changes in the access control of your databases in the next article. Fortunately, there are Extended Events that track every database object creation, deletion or update, for every database on the system. However, you will soon tire, for example, of the changes happening in the tempdb database. Your applications will be creating plenty of temporary objects such as temporary tables, prepared statements, table variables, and the like. You are also unlikely to be interested in certain types of objects, such as statistics.

You need a way to check the objects for one or more databases on the server, but not all. This filtering is best done by Extended Events, so that it collects and stores only the event data for the required databases. You also will need to filter out certain objects that change legitimately such as statistics objects, and therefore provide ‘noise’. This filtering of objects is likely to cause some head-scratching because the Extended Events definition of an object is different from the definition of an object within the system tables. A database, for example, is a type of object in Extended Events. You may need to filter out a range of Extended Events objects, depending on what is going on in background with the database you are monitoring. In my example, I’ve just filtered out the Statistics objects, but it is easy to change the code.

Another issue that you need to decide on is whether to store the events in a ring buffer or a file. I prefer a ring-buffer, but the problem with this is that it will eventually jettison old records to recycle the space. However, if you want to ensure the permanence of the event data, then it is easy to shred the XML document into a relational format and store the relevant rows in a table.

Why use SQL Monitor?

First, we define and test an Extended Event session that will collect the events we need to detect any unauthorized object changes and provide the details of who made what changes to which objects.

With the event session running, the job of SQL Monitor is then to query the collected event data, on a schedule, to see if any ‘object-change’ events have been raised within a specified period. We can set this up easily using a ‘Database Drift’ custom metric that will query the event data for the number of objects changed in, say, the past half hour.SQL Monitor will record the metric value, and can alert the team when it detects any of these uncontrolled database changes.

This will also allow us to analyse the ‘Database Drift’ metric data in a timeline graph, and look for correlations with other metrics, such as the ‘Suspicious Errors’, ‘Permission Changes’ or ‘Unauthorized Configuration Changes’ metrics.

One of the other advantages of SQL Monitor is that it is an external system, outside the bounds of the database user. It is difficult to tamper with. While Extended Events are useful for detecting intrusion or unauthorized changes, it suffers the potential weakness that Extended Event sessions can be switched off by a hacker who gains sufficient privileges. A simple way to get around this is to use SQL Monitor to track the number of active Extended Event sessions and fire an alert when the number drops.

How to be certain of ‘drift’

Once you are alerted to the fact that database drift has happened, you need to establish the difference between the database as it exists now, and the deployed version. I won’t show that here, but it’s easy to do using SQL Compare, or SQL Change Automation. I’ve also shown how to do this with the DacPac and a DacPac-registered database.

The problem with tracking the schema changes, without comparing the database with the deployed version, is twofold. Firstly, it is not always possible to tell easily what has changed when an object is altered, though you can see the batch that did it. Secondly, you might have missed some changes, and an alteration to an object doesn’t always mean a change away from what it should be. Permanent database objects are often altered by disabling constraints before a BCP operation, for example. There will therefore be two alterations of the table, one before and one afterwards, but the database ends up being in the state it should be.

Developing and testing the Extended Events session

The first task is to create an event session that will capture the events data we need to detect draft and tell us who changed what. This is, I believe, best done with a test harness.

It starts with a test harness that creates, alters and deletes a variety of database objects, and you then just work on the event session until it detects everything you want it to do and nothing that you don’t want. For every test run, you start the event session, run the test harness, display the result and stop the session. By doing this, you can restrict the number of recorded events by refining the filters until you get all the information you want but no irrelevant information. If, for example, you were designing an Extended Event session to look for, say, failed login attempts, you would be unwise to choose to record all errors, including informational messages and warning because you can easily get overwhelmed by what is returned by the session.

A simple test harness for database drift detection

To get started, I used a rather trivial test harness that creates, alters or deletes several different database objects.

Listing 1

I hope I’ve included enough to make the point. You can also test for changed to logins and users, but I want to deal with that in a separate metric, in a separate article.

Creating the event session

Listing 2 shows how to create and start the WhoChangedWhat event session that will capture each of the three “object changed” events, in the Objects Event Category, namely object_altered, object_created, and object_deleted.

In addition to the default ‘payload’ of event columns for each of these events, we also capture the optional database_name event column, so that we see the name of the database on which the event occurred. We add actions to capture the name of the client application, and associated database principal, that made the change, and the text of the DDL that was executed. We record the event data for all databases, except master or tempdb, and we also exclude changes to statistics objects.

With any monitoring that uses a combination of Extended Events and SQL Monitor, we must create and start the session before we can use the results. When a session is stopped, you lose the results.

Listing 2

If you have a single production database, then you’ll probably want to set up the event session to monitor just that database, rather than all databases minus master and tempdb. If so, simply swap the existing:

AND [database_name]<>N'tempdb' AND [database_name]<>N'master'

For:

AND [database_name]=N'MyDatabase'

This is the joy of Extended Events sessions: it’s very easy to alter them to suit your task.

Checking the results: who changed what?

Having created and started the event session, we can run our test harness code (Listing 1) and then check the event data that’s collected. We can view the data directly, by using the ‘Watch Live data’ in SSMS, but I much prefer to extract the data using SQL, so I see a nice simple report of who change what objects and how.

Listing 3

As you can see, it shows us when the change happened, what sort of change, the type and name of the object changed, the affected database, and the SQL Server principal that made the change.

Scrolling right, there is also the text of the batch that did the alteration to the object, which is essential to try to work out what changed, because it isn’t always obvious how an object was altered.

Of course, this sort of detailed investigation isn’t, and shouldn’t be, part of SQL Monitor. Once we’re sure that the event session is reporting all the details we need, SQL Monitor’s job is simply to run the SQL that gathers the “number of object changes in last x minutes” metric, then displays the metric values, and alerts you if or when you need to investigate the details, which you can then do using Listing 3.

Adding SQL Monitor into the mix

Listing 4 shows the code that we can then install into SQL Monitor as a custom metric. ‘It will capture and display the number of object changes in the past half hour, to get a chunky blip on the graph.

Listing 4

When we create the metric, we specify that it runs only on the master database, which is ironic when we consider that we are specifically excluding metadata events in master. If you were to run it on every database, you’d get the same result for each, because events in Extended Events are server-wide, like errors.

In the next screenshot, we are creating custom metric in SQL Monitor, pasting in the code from Listing 4.

We are going to run this on master database of a staging ”server. I’ve not shown it here, but you’ll also need to set the frequency at which the metric data is collected. Of course, this will need to be less than the period over which you’re collecting the metadata changes, in Listing 4, to ensure that nothing gets missed. On a second screen, we configure the associated alert. If SQL Monitor is set to alert if this metric value is greater than zero, then you will know about every change.

To simulate a suspicious attack, where someone is attempting to modify database objects, we run our test harness (Listing 1) with a random delay between test items:

Use a SELECT within a batch instead of the DECLARE after the initial declaration in the batch. You can only have one delay in a batch that used the DECLARE, and it must be the first, so the test code will look like this:

The next screenshot shows the resulting analysis graph in SQL Monitor, displaying the ‘Database Drift’ metric (in red) alongside ‘the ‘suspicious errors’ metric (in blue), which I’ve already written about (I was simultaneously running Listing 3 from that article in a separate window).

If you see that, you can be sure that there is an alert:

And you can drill into the detail up to a point, using Listing 3 to investigate exactly what has been collected by the event session.

Conclusions

We’ve now got a way of detecting the ‘smell’ of database drift. We can see where the metadata of a database has been changed. We can also see the SQL in the batch that made the changes. However, the only certain way to tell that the database has drifted is to compare it with what is in source control for that version of the database, and the settings in the configuration management system.

We are, at this point, getting close to detecting the signs of a database being tampered with. To complete the picture, we need two more nuggets of information, we need to know what changes there are to the users, logins, passwords and principals in the security part of SQL Server, and we need to see if anyone has tampered with SQL Agent. We’ll tackle the security and access control events in the next article.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter
  • Kay

    Thanks for the great article,

    I’m using this to monitor creation and deletion of objects but it seems to catch creation of temporary tables which are created from within stored procedures. These are not filtered with the current settings, and I am wondering if it is possible to create a filter which will leave out all the creation of temporary tables even if they occur within a stored procedure? I my environment I get 50.000 changes every day, which makes it hard to use.

    • Phil Factor

      Yes, it is possible to filter out the TempDB table creation. I’ll check the code and post it as a comment.

    • Phil Factor

      The filter in the XE specifically excludes master and tempdb from the recording of ‘create’ ‘delete’ and ‘alter’ events. This is to try to make generic code that will run anywhere. I’ve re-run my own test harness and on my test, no tables from TempDB are recorded. I’ve run stored procedures that create and delete temp objects, and no temp table or temp stored procedure events are recorded. I suspect that there is a server setting that is causing the difference.

      The simplest and most obvious thing to try is a more specific session that only records these events in the database.

      CREATE EVENT SESSION [WhoChangedWhat] ON SERVER
      ADD EVENT sqlserver.object_altered(SET collect_database_name=(1)
      ACTION(sqlserver.client_app_name,sqlserver.database_name,sqlserver.server_principal_name,sqlserver.session_id,sqlserver.sql_text)
      WHERE ([package0].[equal_uint64]([ddl_phase],(1)) AND [sqlserver].[equal_i_sql_unicode_string]([database_name],N'MyDatabase') AND [package0].[not_equal_uint64]([object_type],(21587)))),
      ADD EVENT sqlserver.object_created(SET collect_database_name=(1)
      ACTION(sqlserver.client_app_name,sqlserver.server_principal_name,sqlserver.session_id,sqlserver.sql_text)
      WHERE ([package0].[equal_uint64]([ddl_phase],(1)) AND [sqlserver].[equal_i_sql_unicode_string]([database_name],N'MyDatabase') AND [package0].[not_equal_uint64]([object_type],(21587)))),
      ADD EVENT sqlserver.object_deleted(SET collect_database_name=(1)
      ACTION(sqlserver.client_app_name,sqlserver.server_principal_name,sqlserver.session_id,sqlserver.sql_text)
      WHERE ([package0].[equal_uint64]([ddl_phase],(1)) AND [sqlserver].[equal_i_sql_unicode_string]([database_name],N'MyDatabase') AND [package0].[not_equal_uint64]([object_type],(21587))))
      ADD TARGET package0.ring_buffer
      WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
      GO

      You’ll notice that I’ve used a unicode string comparison [sqlserver].[equal_i_sql_unicode_string] rather than the simple ‘=’ equality. What happens if you alter the original code from AND [database_name]N'tempdb' to AND [sqlserver].[equal_i_sql_unicode_string]([database_name],N'tempdb' ) ?

      • Kay

        It was my fault. The procedure I was calling had more code below which created tables in my database also, not only in tempdb. All of the items that I do not need, belongs to the same schema, so maybe possible to filter them by excluding a schema when saving the results to a table.

        I tested this and it seems to work:

        AND OBJECT_SCHEMA_NAME(the.event_data.value(‘(data[@name=”object_id”]/value)[1]’, ‘int’))’staging’

You may also like

  • Webinar

    Data privacy & protection: A logical extension to DevOps

    Are you considering data privacy and protection as part of your DevOps process? In light of legislation like GDPR, making sure that any personally identifiable information (PII) is protected as it moves through your development and testing environments, is now an essential part of the process to ensure that your Database DevOps practices are compliant.

  • Article

    Using SQL Monitor to Detect Problems on Databases that use Snapshot-based Transaction Isolation

    Use of the read committed snapshot isolation level is often an effective way to alleviate blocking problems in SQL Server, without needing to rewrite the application. However, it can sometimes lead to tempdb contention. This article offers a small-scale solution (not suitable for use on large tables) to detect cases when tempdb contention is related to use of RCSI.

  • Article

    Scaling SQL Monitor to Large SQL Server Estates

    Tony Davis describes the features and capabilities of SQL Monitor that allow it to scale smoothly to monitor a growing estate of servers and databases, while still providing a single, simple dashboard that gives the team all of the essential SQL Server metrics and alerts, establishes baselines, and detects trends in behavior.

  • Webinar

    Change your SQL Server troubleshooting from reactive to proactive with a monitoring tool

    Redgate’s James King and Ben Emmett will demonstrate how to troubleshoot your SQL Server estate, and discuss how it shouldn’t be about reacting to problems when they occur, but rather having the tools to make proactive decisions to plan ahead, identify potential disruptions, fix common problems and make improvements.

  • Forums

    SQL Monitor Forum

    Real-time SQL Server performance monitoring, with alerts and diagnostics