{"id":101067,"date":"2024-03-11T18:15:08","date_gmt":"2024-03-11T18:15:08","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=101067"},"modified":"2024-12-10T20:22:55","modified_gmt":"2024-12-10T20:22:55","slug":"get-the-most-out-of-sql-server-agent-logs","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/database-administration-sql-server\/get-the-most-out-of-sql-server-agent-logs\/","title":{"rendered":"Get the most out of SQL Server Agent logs"},"content":{"rendered":"<p>If you haven&#8217;t migrated your workloads to a managed database platform yet, you&#8217;re probably still relying on SQL Server Agent for various maintenance and other scheduled tasks. Most of the time, these processes just work. But when it&#8217;s time to troubleshoot, it can be cumbersome to get to the root of some problems. In this post, I&#8217;ll share some ideas to help you minimize the level of annoyance and tedium when you have to figure out what went wrong with the execution of a job.<\/p>\n<p>Recently, I was investigating a job that ran a stored procedure against multiple databases in sequence, and ultimately failed. Here was the output of the job step as seen in Management Studio&#8217;s View Job History dialog:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tex decode:true whitespace-before:1 whitespace-after:1\"> Executed as user: sa. \n ...ation. [SQLSTATE 01003] (Message 8153)  \n Warning: Null value is eliminated by an aggregate \n or other SET operation. [SQLSTATE 01003] (Message 8153)  \n Warning: Null value is eliminated by an aggregate \n or other SET operation. [SQLSTATE 01003] (Message 8153)  \n Nonqualified transactions are being rolled back. Estimated \n rollback completion: 0%. [SQLSTATE 01000] (Message 5060) \n Nonqualified transactions are being rolled back. Estimated \n rollback completion: 100%. [SQLSTATE 01000] (Message 5060)\n\n ... lots more of that ...\n\n Warning: Null value is eliminated by an aggregate \n or other SET operation. [SQLSTATE 01003] (Message 8153)  \n Warning: Null value is eliminated by an aggregate \n or other SET operation. [SQLSTATE 01003] (Message 8153)  \n Nonqualified transactions are being rolled back. Estimated \n rollback completion: 0%. [SQLSTATE 01000] (Message 5060) \n Nonqualified transactions are being rolled back. Estimated \n rollback completion: 100%. [SQLSTATE 01000] (...\n The step failed.<\/pre>\n<p>This wall of text was absolutely useless. The reason? The table <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">dbo.sysjobhistory<\/code> stores the output in the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">message<\/code> column, and this column is <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">nvarchar(4000)<\/code>. If your output is longer than 4,000 characters, then all you&#8217;ll see here is a subset. In no case so far has that subset contained the important information I was looking for.<\/p>\n<p>While it is the quick and easy way in a lot of cases, we need to stop relying only on <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">msdb<\/code> and SSMS to troubleshoot jobs because, sometimes, it isn&#8217;t enough.<\/p>\n<h4>Log output to a file<\/h4>\n<p>Thankfully, we also log step output to a file, especially when there is any kind of loop (like backing up all databases). When the output might exceed the capacity of agent history, you can send the output to files by going to the Advanced tab of any job step&#8217;s Properties dialog and populating an Output File:<\/p>\n<p><img decoding=\"async\" style=\"border: 1px solid #333; width: 85%; min-width: 320px; max-width: 640px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/01\/job-step-output-file.png\" alt=\"Job Step Advanced Properties dialog\" \/><\/p>\n<p>Reviewing the output file is much easier and does not have any size limits; it is also easier to script and automate and build parsing tools around. And you can standardize this for all jobs, perhaps to feed the files into an org-wide system like Datadog.<\/p>\n<p>Some people shy away from this option because you have to hard code the path and file name, and they have very different configurations from server to server, making it harder to automate. But this entry supports tokens, so you can tokenize everything from the local SQL Server log directory and the current date and time to the job and step names and IDs. For example, this path:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tex decode:true whitespace-before:1 whitespace-after:1 \"> $(ESCAPE_SQUOTE(SQLLOGDIR))\\\n    $ESCAPE_SQUOTE(JOBNAME))_$(ESCAPE_SQUOTE(JOBID))\n   _$(ESCAPE_SQUOTE(STEPID))_$(ESCAPE_SQUOTE(STRTDT))\n   _$(ESCAPE_SQUOTE(STRTTM)).txt<\/pre>\n<p>Might resolve to:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tex decode:true whitespace-before:1 whitespace-after:1\"> C:\\Program Files\\Microsoft SQL Server\\150\\Log\\\n   JobA_0x0A15AF54FEEECC4BB9A662C8C288D178_1_20240215_123456.txt<\/pre>\n<p>Tokens are a pretty big topic on their own, but you can get a start in the official docs here:<\/p>\n<ul>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/ssms\/agent\/use-tokens-in-job-steps\" target=\"_blank\" rel=\"noopener\">Use Tokens in Job Steps<\/a><\/li>\n<\/ul>\n<p>One thing you&#8217;ll want to make sure is that the SQL Server Agent account has sufficient permissions, including the important <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">SeAssignPrimaryTokenPrivilege<\/code> right; see this article for more info:<\/p>\n<ul>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/ssms\/agent\/configure-sql-server-agent?view=sql-server-ver16&amp;redirectedfrom=MSDN#Permissions\" target=\"_blank\" rel=\"noopener\">Configure SQL Server Agent<\/a><\/li>\n<\/ul>\n<p>Without these permissions, the job will still work, but the file will not appear. If you have set an output file but the file never shows up, you can troubleshoot this by reviewing SQL Server Agent&#8217;s own error log.<\/p>\n<p>One other note: SQL Server Agent won&#8217;t clean up these files for you, so they will pile up. This may become important as systems age, particularly if your jobs have <em>a lot<\/em> of output.<\/p>\n<h4>What else can we do?<\/h4>\n<p>Even when you log output to a file, it can still be tedious scrolling through a lot of noise in the log file. You can reduce the noise there with a few techniques:<\/p>\n<ul>\n<li>If your procedure has a lot of intentional debugging output, use a <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@debug<\/code> flag to print the output by default, but override it from the job step. Or, at least, only output the bad stuff. Or suppress all output by default and only turn it on when <em>really<\/em> debugging. Some useful tidits from Erik Darling <a href=\"https:\/\/erikdarling.com\/the-art-of-the-sql-server-stored-procedure-debugging\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/li>\n<li>Use <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">SET NOCOUNT ON;<\/code> to eliminate <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DONE_IN_PROC<\/code> messages (like <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">n row(s) affected<\/code>). I talked about <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">NOCOUNT<\/code> <a href=\"https:\/\/sqlperformance.com\/2016\/02\/t-sql-queries\/nocount\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/li>\n<li>For <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DBCC<\/code> commands that support it, use the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">WITH NO_INFOMSGS<\/code> argument.<\/li>\n<\/ul>\n<h4>ANSI_WARNINGS<\/h4>\n<p>One of the items you might think is missing from the previous list is <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">ANSI_WARNINGS<\/code> &#8211; this was intentional, as I wanted to treat this option separately. The output I showed above had the following:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tex decode:true whitespace-before:1 whitespace-after:1\"> Warning: Null value is eliminated by an aggregate or\n other SET operation.<\/pre>\n<p>You can completely suppress all of this output with the following:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> SET ANSI_WARNINGS OFF;<\/pre>\n<p>But you need to be careful with this. Some code may suddenly break, particularly if you use linked servers. I added this line to the stored procedure I was troubleshooting, and the job then failed with:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tex decode:true whitespace-before:1 whitespace-after:1\"> [SQLSTATE 01000] (Error 0) Heterogeneous queries require\n the ANSI_NULLS and ANSI_WARNINGS options to be set for \n the connection. This ensures consistent query semantics. \n Enable these options and then reissue your query.<\/pre>\n<p>I was reminded of this one the hard way.<\/p>\n<h4>On the other end<\/h4>\n<p>Sometimes the problem isn&#8217;t that there is job step history that is incomplete; it&#8217;s that there isn&#8217;t enough job history available at all. While logging can give you greater control over both what is in job history and how long it&#8217;s kept in those files, you may also want to optimize how much history is kept in <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">msdb<\/code>. You can control this instance-wide using the following SQL Server Agent Properties dialog:<\/p>\n<p><img decoding=\"async\" style=\"border: 1px solid #333; width: 85%; min-width: 320px; max-width: 640px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/agent-properties.png\" alt=\"SQL Server Agent Properties\" \/><\/p>\n<p>If you have a complex system with some jobs that run every minute and others that run weekly or monthly, this is a very generic brush that will mean some jobs keep way too much history, and others roll away almost immediately. You can be more granular by writing your own process to manage history <em>per job<\/em> or even to only keep history for job executions that had steps that failed; I wrote about this technique in great detail in &#8220;<a href=\"https:\/\/www.mssqltips.com\/sqlservertip\/5969\/manage-sql-server-agent-history-more-effectively\/?utm_source=AaronBertrand\" target=\"_blank\" rel=\"noopener\">Manage SQL Server Agent History More Effectively<\/a>.&#8221;<\/p>\n<h3>Conclusion<\/h3>\n<p>There are a lot of little ways you can make reviewing and troubleshooting job history easier and more sensible. Logging job output to a file is an approach that has a huge payoff and is very easy to automate. What techniques have you used to wrangle SQL Server Agent?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you haven&#8217;t migrated your workloads to a managed database platform yet, you&#8217;re probably still relying on SQL Server Agent for various maintenance and other scheduled tasks. Most of the time, these processes just work. But when it&#8217;s time to troubleshoot, it can be cumbersome to get to the root of some problems. In this&#8230;&hellip;<\/p>\n","protected":false},"author":341115,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143527,53],"tags":[4151],"coauthors":[158980],"class_list":["post-101067","post","type-post","status-publish","format-standard","hentry","category-database-administration-sql-server","category-featured","tag-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101067","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341115"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=101067"}],"version-history":[{"count":34,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101067\/revisions"}],"predecessor-version":[{"id":104826,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101067\/revisions\/104826"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=101067"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=101067"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=101067"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=101067"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}