{"id":87553,"date":"2020-07-08T17:58:27","date_gmt":"2020-07-08T17:58:27","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=87553"},"modified":"2020-07-08T17:58:27","modified_gmt":"2020-07-08T17:58:27","slug":"batch-framework","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/batch-framework\/","title":{"rendered":"Batch Framework"},"content":{"rendered":"<p>In the SDLC (software development life cycle) process, creating a batch job is the last step. As an application is maintained over time and new components are added to the overall process, batch support can become complicated. This article provides guidelines for creating batch jobs that can be well supported in the future. Though this article and its examples focus on an investment management application, the framework can be applied to other domains.<\/p>\n<h2>Nomenclature of the job name<\/h2>\n<p>Organizations often have a dedicated support team that monitors batch jobs, and that team usually has many application batches to monitor. To make their life easier, it is recommended to follow a naming convention when naming jobs. 
There are two ways to name your batch job.<\/p>\n<h3>Short form &#8211; ABC12345<\/h3>\n<table>\n<tbody>\n<tr>\n<td>\n<p>Position<\/p>\n<\/td>\n<td>\n<p>Description<\/p>\n<\/td>\n<td>\n<p>Explanation\/Example<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>First 3 characters<\/p>\n<\/td>\n<td>\n<p>Denotes your system<\/p>\n<\/td>\n<td>\n<p>TRD \u2013 Trading<\/p>\n<p>REP \u2013 Reporting<\/p>\n<p>REC \u2013 Reconciliation<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>4<sup>th<\/sup> character<\/p>\n<\/td>\n<td>\n<p>Denotes the schedule of your job<\/p>\n<\/td>\n<td>\n<p>1 \u2013 Daily<\/p>\n<p>2 \u2013 Biweekly<\/p>\n<p>3 \u2013 Once in 3 weeks<\/p>\n<p>4 \u2013 Monthly<\/p>\n<p>5 \u2013 Yearly<\/p>\n<p>9 \u2013 On demand\/Ad hoc<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>5<sup>th<\/sup> &amp; 6<sup>th<\/sup> characters<\/p>\n<\/td>\n<td>\n<p>Denotes your interface\/internal process<\/p>\n<\/td>\n<td>\n<p>Your own sequence number for each interface<\/p>\n<p>Examples:<\/p>\n<p>01-Ratings data<\/p>\n<p>02-Custodian data<\/p>\n<p>03-NAIC data<\/p>\n<p>04-Factors<\/p>\n<p>05-Coupons<\/p>\n<p>06-Foreign exchange<\/p>\n<p>09-Amortization calculation<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>7<sup>th<\/sup> &amp; 8<sup>th<\/sup> characters<\/p>\n<\/td>\n<td>\n<p>Denotes your batch purpose<\/p>\n<\/td>\n<td>\n<p>Your own sequence number for each job type<\/p>\n<p>01 \u2013 File watcher<\/p>\n<p>02 \u2013 Validation job<\/p>\n<p>03 \u2013 ETL job<\/p>\n<p>04 \u2013 FTP job<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Example: TRD10101<\/p>\n<p>In this example,<\/p>\n<p>The first 3 characters, TRD, denote the Trading system<br \/>\nThe 4<sup>th<\/sup> character, 1, denotes a daily job<br \/>\nThe 5<sup>th<\/sup> &amp; 6<sup>th<\/sup> characters, 01, denote Ratings data<br \/>\nThe 7<sup>th<\/sup> &amp; 8<sup>th<\/sup> characters, 01, denote that the job is a file watcher<\/p>\n<p>Overall, TRD10101 denotes a daily job that belongs to the Trading system and is a file watcher job for the ratings file.<\/p>\n<h3>System-Sch-App-Process<\/h3>\n<p>If 
your organization doesn\u2019t have restrictions on the job name, then you can follow this naming convention. The job name contains the system name followed by the schedule, then the interface and the process name.<\/p>\n<p>Example: Invest-Dly-Rating-FW<\/p>\n<p>This example denotes a Ratings file watcher job that runs daily for the investment system.<\/p>\n<h2>Structuring of batch job<\/h2>\n<p>Now that we have defined the naming convention for individual jobs, we have to group them appropriately.<\/p>\n<h3>Grouping of jobs based on Process flow\/Events<\/h3>\n<p>Group individual jobs based on process, such as Inbound, Critical, Outbound and FTP.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"314\" height=\"128\" class=\"wp-image-87564\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/07\/word-image-35.png\" \/><\/p>\n<p><strong>Fig 1. Grouping of jobs based on process flow\/events<\/strong><\/p>\n<p>Inbound \u2013 Include all jobs that feed data to your application. For example, in an investment management application batch, this group will contain the batch jobs that process security master, trade files, ratings, foreign exchange, factors, coupons, etc.<\/p>\n<p>Critical \u2013 Group all jobs that do the core processing for your system. For example, in an investment management system, the core jobs include nightly processing, accounting updates, amortization calculation, market value calculation, etc.<\/p>\n<p>Outbound \u2013 Group all jobs that send files or reporting extracts to downstream systems.<\/p>\n<h3>Grouping of jobs based on Timelines<\/h3>\n<p>Group the batch jobs based on timeframe. If the batch cycle for your system starts at 6pm, then you can group your jobs into windows such as 6pm-9pm, 9pm-2am and 2am-6am.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"316\" height=\"129\" class=\"wp-image-87565\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/07\/word-image-36.png\" \/><\/p>\n<p>Fig 2. 
Grouping of jobs based on timelines<\/p>\n<h2>Designing of individual batch job<\/h2>\n<h3>3-part design<\/h3>\n<ul>\n<li>Part 1: <br \/>\nMake sure that the job is triggered by one of these events:<\/p>\n<ul>\n<li>a time dependency<\/li>\n<li>a file watcher<\/li>\n<li>a dependency on an existing job<\/li>\n<\/ul>\n<p>Create a late shout (an alert) if the job has not run by a certain time.<\/li>\n<li>Part 2: <br \/>\nThe next part is the actual job to run. This is the core design of the system.<\/li>\n<li>Part 3: <br \/>\nThe last part is housekeeping. Rename the file produced by the process with a date-time stamp.<\/li>\n<\/ul>\n<h3>Ability to rerun the job during failure<\/h3>\n<p>During the SDLC phase, the developer should design the process so that, on failure, the job can either be scheduled to rerun or be aborted for manual intervention, without causing data loss or duplicate data. Let\u2019s say your job runs an ETL process and one of the records fails. The developer should design the system either to skip the record, continue the batch process and notify the support\/business team about the skipped record, or to fail the job. This choice should be made as part of system design.<\/p>\n<p>Most schedulers have a feature to rerun a job automatically, after a time lag, when it fails the first time. A batch sometimes fails due to network connectivity problems, and enabling this feature can resolve such failures automatically.<\/p>\n<h3>Fail a job with the right attitude<\/h3>\n<p>Though the objective of the batch framework is to run jobs without manual intervention, we also want to capture the errors generated during the batch run. Scheduler applications such as BMC Control-M have a feature to fail a job based on a keyword in the output log returned to the scheduler.<\/p>\n<p>Let\u2019s say you have a file watcher job and a processing job that depends on it. If the file received from upstream is empty, the processing job will complete without any issue. 
If the log reports 0 records processed, you can pro-actively fail the job.<\/p>\n<p>This depends on the pre-defined business rules.<\/p>\n<h2>Let\u2019s prepare for a BAD Day<\/h2>\n<h3>Run book for individual job<\/h3>\n<p>The run book is the bible for the application support team, so it should capture detailed instructions for each job. If the job is a file watcher, document the point of contact for the job, including the phone number and email of the support team responsible for transmitting the file. For a processing job, capture the procedure for logging in to the server, navigating to the application folder, the steps to fix the problem (such as running a script to exclude a bad record that is causing the batch to fail), the mandatory files required to rerun the job, and the procedure for communicating the failure to downstream systems or business users. The run book should also contain the escalation procedure. An application support run book is a living document: it cannot be perfect initially, and the support team should keep updating it as the team gains more knowledge.<\/p>\n<h3>Recovering batch to a critical point<\/h3>\n<p>Take a backup of the database at the beginning, in the middle and at the end of the batch. If your system has a non-recoverable process, it is better to take a backup before and after that process. For example, an investment management application has a process to roll the accounting system date forward to the next date. This process is very critical, and a system failure during it can at times make the system non-recoverable. So, it is recommended to take a database backup before and after the critical process.<\/p>\n<p>Another advantage of taking backups during the batch is troubleshooting. As application data changes over the nightly batch, finding the root cause of a production issue can be really challenging. 
With this additional backup, the application support team can restore a test database from the backup taken during the batch and proceed with the investigation.<\/p>\n<h2>Housekeeping<\/h2>\n<ul>\n<li>Archiving\/backup of files\u00a0<br \/>\nAfter the daily batch, archive the files that were received and processed. Zip them and name the archive with the date. <br \/>\nFor example \u2013 InboundArch_12082015 <br \/>\nAt the end of the month, move the individual zipped files to the corresponding monthly folder. Similarly, at the end of the year, archive all monthly folders to the year folder.<\/li>\n<li>Scheduling calendar <br \/>\nIt is recommended to align batch jobs to the enterprise scheduling calendar. Create a custom calendar for scheduling criteria that do not align with the enterprise scheduling calendar.<\/li>\n<li>Disabling a job vs Decommissioning a job\u00a0<br \/>\nAs part of maintenance, you will sometimes need to stop running a job. Do this by documenting and validating all the dependencies and making sure that no upstream or downstream application is waiting for this job. Once this is validated and signed off by all the teams, the first step is to disable the job. Let the batch run without it for a month. Finally, retire the job.<\/li>\n<\/ul>\n<h2>Reactive vs Pro-active<\/h2>\n<p>Application support is reactive by nature, but the additional batch steps below can turn it from reactive to pro-active:<\/p>\n<ul>\n<li>Create checkpoint jobs at regular intervals or after critical flows. These checkpoints help you know how far the batch has run and, in case of delay, how long it will take to complete the batch cycle.<\/li>\n<li>Identify the long-running jobs and update the run book for known errors.<\/li>\n<li>Track batch performance on a daily basis and evaluate it at regular intervals. This evaluation helps to determine the patterns\/trends of the overall batch.<\/li>\n<\/ul>\n<p>For example, an accounting application usually runs longer during month end and on the first business day. 
Batch support can be informed about this trend pro-actively, and downstream systems can be informed about a possible delay in file delivery.<\/p>\n<ul>\n<li>As application support evolves over time, the team can pro-actively identify bad data in the system that could cause a failure in the overnight batch.\u00a0<br \/>\nFor example, we don\u2019t want bad foreign exchange data in the system. The support team can have a batch job that identifies bad foreign exchange data before the start of the critical process.<\/li>\n<li>Monthly vs Quarterly vs Yearly vs Special holiday\u00a0<br \/>\nAccounting\/finance-related applications have special processes or reports that run at month end, quarter end, year end or on special holidays. The support team can pro-actively create a checklist to verify that batch jobs are scheduled appropriately.<\/li>\n<\/ul>\n<h2>Batch framework from a software development life cycle (SDLC) perspective<\/h2>\n<p>Now that we have seen the different components of the batch framework, let\u2019s see how we can incorporate these features into the software development life cycle process (both waterfall and agile).<\/p>\n<p>The waterfall methodology is the traditional process and consists of requirement gathering, analysis, design, coding, testing and production release.<\/p>\n<h2>Requirement gathering:<\/h2>\n<p>As part of requirement gathering, it is essential to identify the following:<\/p>\n<ul>\n<li>Availability of the system\/application to end users. This helps us identify the window available for our batch to run. Based on this window, we can decide whether to run some jobs in parallel to gain time. When gathering requirements for an end user report or an extract to a downstream system, we should identify the hard target time for delivering the report or extract. We should also ask whether, in case of an error during the batch, the user or downstream system will accept an extract or report that has missing records.<\/li>\n<li>Schedule of the process\/extract. 
For an extract or report, identify whether it is daily or monthly.<\/li>\n<\/ul>\n<h2>Analysis &amp; Design:<\/h2>\n<ul>\n<li>As developers\/programmers perform analysis and design for the core requirement, they also need to design the job for failure and rerun scenarios. It is also essential to design processes that can run in parallel without causing locks on files or database tables. If a job cannot run in parallel with other processes, it should be made sequential. Most scheduling applications have a concept of allocating resources to the scheduler. If we decide to run all the jobs sequentially, then the allocated resource should be only 1 at any point in time. The scheduler will then wait for a job to complete before starting the next one.<\/li>\n<\/ul>\n<h3>Testing:<\/h3>\n<ul>\n<li>The test plan should cover batch stress testing, batch regression testing and simulated failure scenarios.<\/li>\n<li>Batch stress testing\n<ul>\n<li>Some upstream systems send a large number of transactions on certain days, so it is critical to perform stress testing. For example, the mortgage-backed securities pay date is the 15<sup>th<\/sup> of the month, and a lot of transactions are expected on the 15<sup>th<\/sup> of the month.<\/li>\n<\/ul>\n<\/li>\n<li>Batch regression testing\n<ul>\n<li>It is recommended to run the complete batch for at least a week in the test environment. If a system modification affects the monthly process, then it is necessary to run the complete set of month-end scheduled jobs.<\/li>\n<\/ul>\n<\/li>\n<li>Simulated failure scenario\n<ul>\n<li>Stress testing and batch regression testing cannot cover all failure scenarios. 
So, it is the responsibility of the project team to simulate failure scenarios specific to the batch and test them.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>SDLC \u2013 Agile methodology<\/h3>\n<p>The agile methodology follows an iterative development approach. Because of this, planning, development, prototyping and other software development phases may occur more than once. So, implementing the batch framework should be part of the overall process. If possible, there should be a separate task for batch testing as part of sprint planning.<\/p>\n<h2>Conclusion:<\/h2>\n<p>Teams often think that a batch is just a call to a process from a scheduling tool, but bad design and an inconsistent approach can make application support complicated. Implementing a batch framework creates consistency across applications.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the SDLC (software development life cycle) process, creating a batch job is the last step. As an application is maintained over time and new components are added to the overall process, batch support can become complicated. This article provides guidelines for creating batch jobs that can be well supported in the future. 
Though this&#8230;&hellip;<\/p>\n","protected":false},"author":324217,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[105880],"class_list":["post-87553","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/87553","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/324217"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=87553"}],"version-history":[{"count":11,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/87553\/revisions"}],"predecessor-version":[{"id":104597,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/87553\/revisions\/104597"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=87553"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=87553"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=87553"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=87553"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}