Consolidating SQL Server Error Logs from Multiple Instances Using SSIS

SQL Server hides a lot of very useful information in its error log files. Unfortunately, the process of hunting through all these logs, file-by-file, server-by-server, can be slow and painful. Rodney Landrum offers a solution which will allow you to pull error log records from multiple servers into a central database, for analysis and reporting with T-SQL.

SQL Server hides a wealth of useful information inside its error log files. Aside from the details of specific errors that have occurred, which its name implies it must contain, there are details of successful and unsuccessful login attempts, the output of DBCC commands, and more. You can even raise your own errors or events to the log, for example to indicate when an ‘index rebuild’ process is complete.
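For instance, a minimal example of raising your own informational event to the error log (the message text here is purely illustrative):

```sql
-- Severity 10 is informational; WITH LOG writes the message
-- to the SQL Server error log (and the Windows Application log)
RAISERROR ('Nightly index rebuild complete.', 10, 1) WITH LOG;
```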

Unfortunately, I use the term “hides” advisedly. Although it puts a lot of useful information in there, getting it out again in a useful, easy-to-digest format is often a long and painful trial, unless of course you’re the sort of DBA who cherishes nothing more in life than a few free hours each morning to scroll through hundreds of thousands of SQL Server error log entries, server-by-server, using the Log Viewer application, raptly focused on each and every interesting message.

What would be useful to all other DBAs, I hope, is the means to consolidate the error log entries for all your servers into a central location, and to report on them from there, quickly firing off SQL queries that could tell you instantly, for example:

  • “When was the last time X logged in or accessed the server?”
  • “Have there been any prolific login failure attempts by unknown accounts?”
  • “Have any of these specific errors occurred on any of my servers in the last x days?”

The solution I offer here uses an SSIS package to pull error log data from the current error log on any number of remote servers, and load the data into a staging table in a central database. From here, any new log records are pushed into a permanent table using the new SQL Server 2008 MERGE command, and from there the log records can be sliced and diced with T-SQL, to your heart’s content.

Shortcomings of the Standard Error Log Reader

Before we dive into our custom “error log consolidation” solution, let’s take a brief look at the SQL Server Log Viewer application for SQL Server 2005 and beyond. Figure 1 shows the Log Viewer window, with an applied filter, which searches for messages containing the term “failed”, across multiple error logs, on a server instance named SRVSAT (one that is not much-used).

1001-RL1.JPG

Figure 1 – Standard SQL Server 2005 Log Viewer

Notice that, while I’m specifically searching the SQL Server error logs here, I could also choose to view the SQL Server Agent, Database Mail and Windows NT logs. There is also a search function that lets us find specific log entries based on a keyword.

In my filtered search of the SQL Server logs, you can see that there are two entries on this particular server that contain the word “failed”; one is a login failure for sa and the other is a failure in opening a database file. The filter capability is limited, however: a filtered search for “Failed” or “Login Failed” will return login failure results, but a filtered search on “Failed Logins” will return no results.

Overall, the Log Viewer is a reasonable ad-hoc tool for tracking down errors on a given server. However, the fact that it can only be directed at one server at a time makes it a rather tedious means of tracking down specific errors across multiple servers. Also, its filtering and searching capabilities certainly do not match what can be achieved with the power of T-SQL.

Custom Error Log Consolidation

As noted in the Introduction, this solution is built on SSIS. I first began using SSIS back in 2007, when I developed the “DBA Repository” solution, described in full detail in my SQL Server Tacklebox book. It uses an SSIS package to execute a range of useful scripts across all servers under my care, and pull the results back into a central SQL Server database, for reporting and analysis. A fringe benefit is that it’s been relatively easy to update and adapt this original solution to accommodate new requirements, such as the error log consolidation I describe here.

My central database, DBA_Rep, is housed on a SQL Server 2008 instance and contains all manner of tables in which to store various bits of useful SQL Server information. In order to support my error log consolidation, I simply needed to add two new tables to this database:

  • StageErrorLog – a temporary holding table for the error log data read from each server.
  • SQL_ErrorLog – the permanent storage table for the error log data

The two tables’ schemas are identical except that the permanent table, SQL_ErrorLog, contains an identity column called LogID. The code to create these tables is included as part of the code download bundle for this article, along with the package code, sample queries and so on.
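The authoritative DDL is in that download; as a rough sketch, the columns mirror the output of sp_readerrorlog plus a Server column (the data types shown here are my assumption, not necessarily those in the bundle):

```sql
-- Staging table: emptied at the start of each package run
CREATE TABLE dbo.StageErrorLog
(
    LogDate     DATETIME       NULL,
    ProcessInfo NVARCHAR(50)   NULL,
    Text        NVARCHAR(MAX)  NULL,
    Server      NVARCHAR(128)  NULL
);

-- Permanent table: identical, plus the LogID identity column
CREATE TABLE dbo.SQL_ErrorLog
(
    LogID       INT IDENTITY(1,1) NOT NULL,
    LogDate     DATETIME       NULL,
    ProcessInfo NVARCHAR(50)   NULL,
    Text        NVARCHAR(MAX)  NULL,
    Server      NVARCHAR(128)  NULL
);
```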

As I describe the error log solution, I’ll focus on those aspects most closely related to gathering and analyzing the log data, rather than on the specific SSIS techniques used, for example, to assign object and string variables in the ForEachLoop containers within the packages. The latter is covered in detail in one of my previous Simple-Talk articles.

The SSIS Package

The SSIS package that, in a single run, polls, consolidates and merges the error logs from any number of servers is really quite simple. A single screenshot, shown in Figure 2, captures the entire package, comprising just five steps.

1001-RL2.JPG

Figure 2 – Error Log Reader SSIS Package

Let’s examine each step in turn, with most emphasis on the Error Log Reader for-each loop, and the MERGE step.

Get the Server Names into an Object Variable

The first task, “Populate ADO Variable For SQL Server Names”, populates the SQL_RS object variable with the list of servers from which we will be extracting error log information. In a later step, the package iterates through this list and populates a string variable, SRV_Conn. This, in turn, is used to dynamically update the ServerName property of the MultiServer Connection Manager object, which controls the connection to each SQL Server instance we need to interrogate. There is also a second Connection Manager object, DBA_Rep, which is used to connect to the central repository. This whole step is described in fuller detail in my previous article, Using SSIS to monitor SQL Server Databases.

1001-RL3.JPG

Figure 3 – Connection Manager Objects

Truncate the Staging table

The second step is straightforward: it simply issues a single T-SQL command, TRUNCATE TABLE StageErrorLog, to empty the StageErrorLog staging table, which temporarily holds the error log data for each server as the package executes.

The Error Log Reader ForEachLoop Container

The “Error Log Reader” ForEachLoop container forms the heart of the package, and performs two important functions:

  1. It sets the connection string for each server using the two variables SQL_RS and SRV_Conn
  2. It executes the objects contained within, namely “Create Temp Table for Error Logs” and “Get Error Logs From All Servers“.

Figure 4 shows the SSIS Task Editor for the Create Temp Table for Error Logs task. It uses the MultiServer Connection Manager to create a temporary table on the remote server, in which to store the error log data during package execution.

1001-RL4.JPG

Figure 4 – Create Temp Table For Error Logs Task Editor

The T-SQL statement that is executed within this task is shown in Listing 1. First, it drops (if it exists) and recreates a temp table, wErrorLog, in TempDB on the remote server, and then populates the table with the output of the sp_readerrorlog system stored procedure.

The sp_readerrorlog procedure takes a parameter value, in this case 0, which instructs it to read only the current log file. While it is possible to read historic logs, I decided that reading only the current log would improve performance. I run the task daily, to minimize the risk of losing log data, and run an additional step once a month that interrogates all of the available logs to catch anything that may have been missed. The three actions that I know will initiate a new error log on SQL Server are running sp_cycle_errorlog, running DBCC ERRORLOG, and restarting SQL Server.

Enumerating through multiple log files
If you choose to interrogate all the log files, rather than just the current one, a script that may prove useful is included in my DBA Script Thumb.

The final step in the Read Error Logs script alters the temp table wErrorLog to add a Server column and then updates the records with the ServerProperty(ServerName) function. This step is critical as it allows us to distinguish from which server the error logs came, once the records are all merged into the final destination table (a step I will explain in short order).
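The full script ships in the code download; reconstructed from the description above, it looks something like this:

```sql
-- Runs on each remote instance via the MultiServer connection
IF OBJECT_ID('tempdb.dbo.wErrorLog') IS NOT NULL
    DROP TABLE tempdb.dbo.wErrorLog;

CREATE TABLE tempdb.dbo.wErrorLog
(
    LogDate     DATETIME,
    ProcessInfo NVARCHAR(50),
    Text        NVARCHAR(MAX)
);

-- Parameter 0 = read only the current error log
INSERT INTO tempdb.dbo.wErrorLog (LogDate, ProcessInfo, Text)
EXEC sp_readerrorlog 0;

-- Tag every row with the originating server name
ALTER TABLE tempdb.dbo.wErrorLog ADD Server NVARCHAR(128) NULL;

-- Dynamic SQL, so the UPDATE compiles after the new column exists
EXEC ('UPDATE tempdb.dbo.wErrorLog
       SET Server = CONVERT(NVARCHAR(128), SERVERPROPERTY(''ServerName''))');
```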

Listing 1 – The Read Error Logs script, executed by the Create Temp Table for Error Logs task

The output of the sp_readerrorlog stored procedure, executed on a SQL Server 2005 instance, can be seen in Figure 5. Notice that three columns are returned, LogDate, ProcessInfo and Text.

1001-RL5.JPG

Figure 5 – Output of sp_readerrorlog

When the package executes it will create and populate the wErrorLog temp table on each of the remote servers. The next step is to pull the data from each of the wErrorLog temp tables into the staging table, StageErrorLog, in the DBA_Rep central repository.

So why do we need temp tables on each remote server, as well as a staging table in the central database? The main reason is to provide the next task in this container, the Get Error Logs from All Servers data flow task, a valid source table on the remote server from which to select. This task pulls all of the error log data from the remote servers into a single staging table in my central SQL Server 2008 database, DBA_Rep. From there, in the final step in the package (discussed shortly), I can use the MERGE command to move any new error log records into the SQL_ErrorLog table. Some of the remote servers were running SQL Server 2005 instances, so it was not possible to run the MERGE command directly against these sources; it would fail syntactically.

Figure 6 shows the Get Error Logs From All Servers data flow task with the Server Source (wErrorLog) and the Repository Destination (StageErrorLog).

1001-RL6.JPG

Figure 6 – Data Flow Task for Error Log

Merging the Error Log Data into the Central Database

After all of the remote servers have deposited their error log fruit into the StageErrorLog basket, all that remains is to merge that data into its final destination, SQL_ErrorLog, from where it can be analyzed via standard T-SQL queries.

I chose to use the MERGE statement for a number of reasons. I felt that the typical “truncate and refresh” solution is inefficient for large data loads. I could have built an equivalent “UPSERT” solution without MERGE, using IF EXISTS constructs to control INSERTs and UPDATEs, as well as DELETEs, but MERGE meant I could avoid all of that work. Finally, I just wanted a good excuse to experiment with a new command, and this was the perfect project for it!

I have not found an equivalent of MERGE in the SSIS Toolbox. There are some third party tools that will provide this functionality, at a cost, but instead I decided simply to execute the MERGE command within an Execute SQL task.

The MERGE command is shown in Listing 2. It compares the source table, StageErrorLog, to the target table, SQL_ErrorLog, matching on all four field values (LogDate, ProcessInfo, Text and Server), and INSERTs into SQL_ErrorLog any records that are NOT MATCHED, i.e. any new log entries. I know I will never DELETE or UPDATE the records, so new INSERTs are all I am concerned about.
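In outline (assuming the four-column schema described earlier), the statement looks like this:

```sql
MERGE dbo.SQL_ErrorLog AS target
USING dbo.StageErrorLog AS source
    ON  target.LogDate     = source.LogDate
    AND target.ProcessInfo = source.ProcessInfo
    AND target.Text        = source.Text
    AND target.Server      = source.Server
WHEN NOT MATCHED BY TARGET THEN
    -- Only brand-new log entries reach the permanent table
    INSERT (LogDate, ProcessInfo, Text, Server)
    VALUES (source.LogDate, source.ProcessInfo, source.Text, source.Server);
```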

Listing 2 – merging new log records into the SQL_ErrorLog table

Executing and Testing the Package

So that is it, five fairly simple steps. Now all we have to do is execute the package to start consolidating logs for all of the SQL Servers. I would imagine that the list of servers would be limited to critical production environments, where logs must, by mandate, be stored for audit purposes and for problem/resolution analysis. Figure 7 shows the package executing, with each task turning green as it completes successfully.

1001-RL7.JPG

Figure 7 – Executing the Error Log Reader Package

It is quite easy to determine if the package is working as expected. If you have Audit Logins enabled, you can simply, on a database in your test environment, deliberately generate failed login attempts. You can then rerun the package and make sure that only those new records were inserted in the SQL_ErrorLog table. In my test environment, I have two servers. On the last execution of the package, I had 951 records in the SQL_ErrorLog table, 778 for Server1 and 173 for Server2, which the following query showed.
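That query was a straightforward count by server, something along these lines:

```sql
-- Row counts per server in the consolidated log table
SELECT Server, COUNT(*) AS LogRecords
FROM dbo.SQL_ErrorLog
GROUP BY Server
ORDER BY Server;
```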

After a couple of failed login attempts, using the SQL-authenticated logins “sa” and “bad user”, I reran the package. The row counts increased, as expected, by only a handful of records per server, making the new totals 779 and 178 for Server1 and Server2 respectively. I verified that the failed logins were in the SQL_ErrorLog table by executing the following query:
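A query such as this, filtering on the standard “Login failed” message text, does the job:

```sql
-- Failed login attempts captured from all servers
SELECT LogDate, Server, Text
FROM dbo.SQL_ErrorLog
WHERE Text LIKE 'Login failed%'
ORDER BY LogDate DESC;
```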

As Figure 8 shows, “bad user” tried to log in and I captured it via the package execution. The important part is that only the new records were inserted, so the MERGE statement worked as anticipated.

1001-RL8.JPG

Figure 8 – Failed Login Attempts From SQL_ErrorLog table.

Querying the Error Log Data

With the records loaded and the package tested, it is time to get creative with the T-SQL analysis of the data. If you know the specific errors (or log entries, I should say) you are looking for, such as DBCC, memory or deadlock messages, then you can simply use a LIKE predicate, as in the following two queries that scan for “CPU” and “MEMORY”. Of course, you could do this with one query, but I broke them out for emphasis.
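Reconstructed from the description, the two scans would look something like this:

```sql
-- Entries mentioning the CPU
SELECT LogDate, Server, Text
FROM dbo.SQL_ErrorLog
WHERE Text LIKE '%CPU%'
ORDER BY LogDate DESC;

-- Entries mentioning memory
SELECT LogDate, Server, Text
FROM dbo.SQL_ErrorLog
WHERE Text LIKE '%memory%'
ORDER BY LogDate DESC;
```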

The results are already proving valuable, as you can see from the highlighted rows regarding memory in Figure 9. Apparently I am not using all the memory that I could, and SQL Server 2008 is helpfully telling me what to do to increase the amount of memory it can utilize. Thanks, Error Log.

1001-RL9.JPG

Figure 9 – Memory- and CPU-related data in SQL_ErrorLog

Of course, I will leave it to you, as DBAs, to be creative in developing useful queries for your own environments. Heck, you can even add a full-text index if you like. In fact, you may notice in the code download that the SQL_ErrorLog DDL adds an identity column called LogID, to facilitate the creation of just such a full-text index. That process is an article in itself, and one which I will tackle next.

Conclusion (with a few small caveats)

I do not like putting the caveat section so close to the end of the article, but you may have guessed by now that there are a few caveats, so I will note them quickly and move on to the conclusion before you have a chance to utter “Aw, shucks”. First, this solution only pulls error logs from SQL Server 2005 and higher, due to the different way sp_readerrorlog returns data in earlier versions. Second, the solution does not pull SQL Agent or Database Mail logs, as the Log Viewer application does. Finally, the MERGE statement in the package requires SQL Server 2008 to house the DBA_Rep database. Not too horrible after all, I hope; if I had said it only works on SQL Server 2000, that might have been a problem.

I hope this error log consolidation solution, based on SSIS, and making use of the new SQL 2008 MERGE command, proves useful to many DBAs. If it also saves you lots of time and money then great; I did my small part for the economy.