It is accepted best practise to set some alerts on every SQL instance you install. They aren’t particularly well publicised but I have never seen any one not recommend setting up alerts for Error 823, 824 and 825. These alerts are focussed on successful access(IO) to the hard drives that SQL Server is using. If there are any errors when reading or writing to the drives then one of these errors will be returned. Having the alerts on these errors means that any IO issues will be brought to the DBAs attention well before a disk actually fails.
The way these alerts work is that SQL Server will try to read/write to a disk up to 4 times, if after that it fails the Error 823 or 824 get fired and there is an entry on the error log reflecting this. Now because you are an outstanding DBA you check your error log every day and if you see this sort of thing you take action. (For the exact (boring) details of what, when and how this happens then you can read all about it at http://support.microsoft.com/kb/828339 (823) and http://support.microsoft.com/kb/2015756 (824)). The important thing to note however is that the failure has to happen 4 times for the errors to be raised, what if there is an IO that fails once, twice or three times? This could be an early(ier) sign of an impending hard drive crash that you want to be aware of. That’s where error 825 comes in (http://support.microsoft.com/kb/2015757) , it is the error that signifies a read or write retry happened, but there is no error added to the error log. This is because the severity of Error 825 is different from the others:
SELECT m.[message_id] ,
FROM sys.messages AS m
WHERE language_id = 1033
AND m.[message_id] IN ( 823, 824, 825 )
AND is_event_logged = 1
ORDER BY severity ,
Checking the Raiserror help we know that “Severity levels from 20 through 25 are considered fatal. If a fatal severity level is encountered, the client connection is terminated after receiving the message, and the error is logged in the error and application logs.”
Setting an alert on these errors however means that regardless of your inspection of the Error Log you will know if they happen, even Alert 825. Alerts can be set to automatically email Operators when they fire so you can sit back watching Wimbledon SQL Monitor‘s traces of your server performance, and not worry you are missing advance notice of a possible hardware crash.
However, make sure that the Alerts are set up correctly.
I have recently had to pick up on a case where these 3 alerts were in place on these errors to alert the DBA team by email. When a server restarted a drive was offline. The 823 alert did it’s job and fired an email via the Exchange server to the DBA email group. However the drive didn’t come online as there was a power supply issue. The Exchange server did it’s best to keep up but croaked at approx. 160k emails. The alert fired over 200k times, right up until the server was rebooted with the power supply to all drives as expected. The issue was that the Alert was set up with no delay between responses value. The default was to alert every time the drive was referenced, it should have had a delay – say every ten minutes or so and that would have massively reduced the number of alerts and the unexpected exercise for the Exchange server
Just to make sure you have these alerts, and that they are set up with a delay you can use this script on your servers:
[dbo].[sysalerts] AS s
--( message_id IN ( 823, 824, 825 )
-- AND [s].[delay_between_responses] <> 600
( ENABLED = 1
AND [s].[delay_between_responses] = 0
Comment / uncomment the WHERE clause sections as you need to to locate all alerts and/or those that are set up with no delay.
P.S. If you have never set up any Alerts, this is where to start:
I’ll do a follow up blog on creating Operators and Alerts in a while, get in touch if you want it done soon.