Redgate Monitor

Whitepaper

An Integrated Approach to Enterprise Database Monitoring and Incident Management

Maintaining high service levels in enterprise database systems requires a robust and scalable monitoring and incident management strategy. Unpredictable failures occur even in resilient software and hardware architectures, which makes continuous monitoring and maintenance crucial. This whitepaper describes an integrated approach in which the comprehensive metrics and intelligent alerting of a specialized database monitoring tool (Redgate Monitor) enable an efficient, tiered incident response strategy.

Through a simple integration, the database monitoring tool sends notifications for urgent alerts to a unified notification system, where the first-line (Tier 1) response team triages them to ensure a prioritized response. Incidents then escalate to operational and technology specialists for diagnosis (Tier 2), and from there to subject-matter experts for resolution (Tier 3). This strategy encourages collaboration and engages the full range of skills in dealing with incidents, optimizing response times while minimizing disruption to the core work of each team.
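
The tiered escalation path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tier` and `Incident` names, the `escalate` method, and the sample alert text are hypothetical and are not part of Redgate Monitor or of any notification system's API.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical tier labels mirroring the three response levels."""
    TRIAGE = 1      # first-line response team
    DIAGNOSIS = 2   # operational and technology specialists
    RESOLUTION = 3  # subject-matter experts


@dataclass
class Incident:
    """An incident opened from an urgent alert (illustrative model only)."""
    alert: str
    tier: Tier = Tier.TRIAGE
    history: list = field(default_factory=list)

    def escalate(self, note: str) -> None:
        """Record a hand-off note, then move to the next tier if one exists."""
        self.history.append((self.tier, note))
        if self.tier < Tier.RESOLUTION:
            self.tier = Tier(self.tier + 1)


# Example: an alert is triaged, diagnosed, then handed to an expert.
incident = Incident(alert="Blocking process on sql-prod-01")
incident.escalate("Confirmed urgent; documented first-line steps did not clear it")
incident.escalate("Traced to a long-running index rebuild")
```

Keeping the hand-off notes with the incident means each tier inherits what the previous one already established, which is one way to limit disruption to the next team.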

The database monitoring tool enables this strategy by providing: 

  • Comprehensive and Automated Data Collection: Continuous tracking and analysis across all database server types and platforms. 
  • Service-oriented Approach: Incorporating custom metrics to track application KPIs. 
  • Intelligent Alerting: Using categorization, filtering, and aggregation to prevent 'flooding.' 
  • Estate-wide Dashboard: Making data accessible to all teams involved in incident response for quick identification and resolution. 
  • Incident Prevention: Baselines, projections, and analysis allow correction of issues before they affect service levels. 
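
As a rough illustration of how filtering and aggregation can prevent alert 'flooding', the sketch below drops alerts under a severity threshold and merges repeats of the same alert into one notification. The alert fields (`server`, `category`, `severity`, `time`), the function name, and the fixed five-minute window are all assumptions made for this example; they do not describe how Redgate Monitor implements alerting.

```python
from collections import defaultdict

# Hypothetical severity ranking used for filtering and comparison.
SEVERITY = {"low": 0, "medium": 1, "high": 2}


def aggregate_alerts(alerts, min_severity="medium", window_seconds=300):
    """Filter out alerts below min_severity, then merge alerts that share
    a server and category and fall in the same fixed time bucket."""
    threshold = SEVERITY[min_severity]
    grouped = defaultdict(list)
    for alert in alerts:
        if SEVERITY[alert["severity"]] < threshold:
            continue  # filtering: ignore low-priority noise
        bucket = alert["time"] // window_seconds  # fixed, non-sliding window
        grouped[(alert["server"], alert["category"], bucket)].append(alert)
    return [
        {
            "server": server,
            "category": category,
            "count": len(items),
            # Report the most severe alert seen in the group.
            "severity": max((a["severity"] for a in items), key=SEVERITY.get),
        }
        for (server, category, _), items in grouped.items()
    ]


alerts = [
    {"server": "sql-01", "category": "blocking", "severity": "high", "time": 10},
    {"server": "sql-01", "category": "blocking", "severity": "medium", "time": 40},
    {"server": "sql-01", "category": "disk", "severity": "low", "time": 50},
    {"server": "sql-02", "category": "blocking", "severity": "high", "time": 60},
]
notifications = aggregate_alerts(alerts)
```

Here four raw alerts collapse into two notifications, one per affected server, so the first-line team sees each problem once rather than once per occurrence.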

The whitepaper concludes by highlighting further optimization through DevOps practices: 'shifting left' by extending monitoring into development and test systems so that issues are detected early, and having teams collaborate on documented response procedures to support an effective first-line response.