Effective Data Governance: Being Grown Up About Data
William Brewer explains how to make data governance a continuous organizational activity, based on well-established standards and practices, rather than a knee-jerk response, and which skills and tools will help you achieve compliance, including SQL Data Catalog for discovery and classification of data held in SQL Server.
Why bother with data governance?
If you occasionally clutch your forehead and cry despairingly, “No! Not more data regulation!”, you may be doing it wrong. You are probably taking a ‘piecemeal’ approach to data governance, meeting each new regulation, such as the Sarbanes–Oxley Act (SOX), Basel I, Basel II, HIPAA, GDPR or cGMP, with an individual response that tightens up existing data practices.
What is required instead is for any organization that handles data to establish practices and data controls that amount to a once-and-for-all change, one that is likely to keep it compliant with all foreseeable national and international legislation.
How, though, do you set about the task?
Established standards and codes of practice
When GDPR was passed into European law, the regulation itemized what was required without needing to spell out how to achieve it. Encryption is hardly mentioned, and data access controls not at all. This is because the relevant standards and practices are already well known, and there is a consensus on how to achieve compliance.
The term data governance was initially applied to highly regulated industries such as insurance and financial services. These industries pioneered the essential organization-wide practices that allow them to comply with regulatory frameworks for data security and management while restricting their business processes as little as possible. The lessons they learned in achieving data quality, security, retention, and compliance fed through into internationally accepted best practices and guidelines such as COBIT and ISO/IEC 38500.
Data stewards
Data governance is most successful where it has strong buy-in from the organization's management. There should be a data governance team that advises all data stewards, in other words everyone with responsibilities for data, on how to ensure that their data is accurate, unduplicated, protected, consistent, complete, timely, resilient, and valid. This team should report directly to the organization's leadership, who are legally responsible for any consequences of deficiencies in data governance. It will be assisted by specialists such as the security and legal compliance teams. The team is not responsible for implementation, only for advising, reporting, and auditing. Existing operational responsibilities remain unchanged, and data stewards will be the normal point of liaison with the team, keeping it up to date.
The essential aspects of data governance
- Data Discovery, Mapping and Modeling
 Knowing what data the organization holds, where it is held, and why. This involves cataloging the data in its various sources, including relational databases, flat files, JSON documents, NoSQL stores and others (see The Data Catalog comes of Age), and then classifying it according to the level of resilience and protection it requires. It also involves identifying the paths by which the organization processes data, from its source (data provenance) to its destination.
- Data Accountancy
 Ensuring that the board, committee, trustees, or partners of the organization have frank and realistic reports on the degree to which the organization’s data currently meets the data standards that it has adopted. Also, checking regularly that the framework of data standards is sufficient to meet all current and projected regulations, either legal or industry-based.
- Data Compliance
 Having in place an audit process that checks that data, on its route through the organization, is stored, processed, published, changed and deleted in compliance with the regulatory framework and with the data practices the organization has adopted.
- Data Retention
 Creating a document that lists the retention periods for every type of data, whether they are defined in law, by common practice, or by the organization's management.
- Data Protection
  - Data Security
    Documenting all the steps taken to keep data secure, however it is stored: on a laptop, on the network, in offsite secure storage, or in a database. Wherever data is held, whether as a spreadsheet, a backup, or a database, the security precautions taken must be documented and should meet the organization’s standards.
  - Appropriate Data Access
    Ensuring that each member of the organization can see only the data they need to perform their role. This is normally done with role-based security, ideally combined with schema-based security.
  - Data Resilience
    An umbrella term for implementing the appropriate level of disaster recovery, providing a level of service that makes data loss highly unlikely, and matching resilience to the level of risk.
  - Data Distribution Policies (a.k.a. Data Movement)
    Ensuring that wherever data is processed by another organization, or is provided to another organization as a data feed, there is a legitimate reason for this and the distribution is covered by a written agreement.
- Data Quality
  - Data Controls and Constraints
    Implementing checks on the data, wherever possible, to make sure that it is complete and conforms to its definition in terms of value and range.
  - Data Consistency
    Ensuring that data has not become corrupted in any way that leads to inconsistency. Data that relies on caching or denormalization, or that has been restored incorrectly, can end up in an inconsistent state even though it conforms to the expected value and range. Data must have point-in-time consistency, transactional consistency, and allocation consistency (no errors in the way that pages are linked to objects). It also pays to ensure that each data item is always recorded in the same unit of measurement.
  - Information Quality
    Being confident that data is accurate and up to date, and that processes are in place to check its correctness.
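To make the discovery-and-classification step more concrete, here is a minimal Python sketch of the kind of first-pass classification a data catalog might suggest from column names alone. The rules, labels, and the `Column`, `Suggestion` and `suggest_classifications` names are all hypothetical illustrations, not the behavior of SQL Data Catalog or any other product; real classification would also consider data types and, above all, human review.

```python
import re
from dataclasses import dataclass

# Hypothetical classification rules: a regular expression on the column name,
# the information type it suggests, and a sensitivity label. These are
# illustrative assumptions, not any product's built-in taxonomy.
RULES = [
    (re.compile(r"e-?mail", re.IGNORECASE), "Contact Info", "Confidential"),
    (re.compile(r"phone|mobile", re.IGNORECASE), "Contact Info", "Confidential"),
    (re.compile(r"birth|dob", re.IGNORECASE), "Date of Birth", "Confidential - GDPR"),
    (re.compile(r"salary|wage", re.IGNORECASE), "Financial", "Highly Confidential"),
    (re.compile(r"post(al)?_?code|zip", re.IGNORECASE), "Address", "Confidential"),
]


@dataclass(frozen=True)
class Column:
    database: str
    table: str
    name: str


@dataclass(frozen=True)
class Suggestion:
    column: Column
    information_type: str
    sensitivity: str


def suggest_classifications(columns):
    """Suggest a classification for each column whose name matches a rule.

    Columns matching no rule are left out, to be reviewed by a person.
    Name matching is only a heuristic and will produce false positives.
    """
    suggestions = []
    for col in columns:
        for pattern, info_type, sensitivity in RULES:
            if pattern.search(col.name):
                suggestions.append(Suggestion(col, info_type, sensitivity))
                break  # the first matching rule wins
    return suggestions


if __name__ == "__main__":
    sample = [
        Column("HR", "dbo.Employee", "DateOfBirth"),
        Column("HR", "dbo.Employee", "Salary"),
        Column("Sales", "dbo.Customer", "EmailAddress"),
        Column("Sales", "dbo.Customer", "CustomerID"),
    ]
    for s in suggest_classifications(sample):
        print(f"{s.column.database}.{s.column.table}.{s.column.name}: "
              f"{s.information_type} ({s.sensitivity})")
```

The output is a list of suggestions, not decisions: the point of such a pass is to give data stewards a short, prioritized list to confirm or correct.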
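Role-based access with schema-level grants can be modeled in a few lines. The following Python sketch assumes a hypothetical `AccessModel` in which permissions are granted to roles on whole schemas and users gain access only through role membership; the role, user and schema names are invented for illustration, and the sketch shows the principle rather than any particular database's security system.

```python
from dataclasses import dataclass, field


@dataclass
class AccessModel:
    """Hypothetical role-based, schema-level access model."""

    # role name -> set of schemas that the role may read
    role_schemas: dict = field(default_factory=dict)
    # user name -> set of roles that the user belongs to
    user_roles: dict = field(default_factory=dict)

    def grant_schema(self, role, schema):
        # Permissions are only ever granted to roles, never to users directly.
        self.role_schemas.setdefault(role, set()).add(schema)

    def add_member(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def visible_schemas(self, user):
        """All schemas a user can read through any of their roles."""
        schemas = set()
        for role in self.user_roles.get(user, set()):
            schemas |= self.role_schemas.get(role, set())
        return schemas

    def can_read(self, user, qualified_table):
        """Check a 'schema.table' name against the user's role-derived schemas."""
        schema, _, _ = qualified_table.partition(".")
        return schema in self.visible_schemas(user)


if __name__ == "__main__":
    model = AccessModel()
    model.grant_schema("PayrollClerk", "Payroll")
    model.grant_schema("SalesAnalyst", "Sales")
    model.add_member("alice", "PayrollClerk")
    model.add_member("bob", "SalesAnalyst")
    print(model.can_read("alice", "Payroll.Salary"))  # True
    print(model.can_read("bob", "Payroll.Salary"))    # False
```

In SQL Server itself, the equivalent is typically a database role granted permissions on a schema, for example `GRANT SELECT ON SCHEMA::Payroll TO PayrollClerk`, with users added to the role rather than granted permissions individually. Granting at schema level means that tables added to the schema later are covered without further grants.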
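Checks on completeness, value, and range are easiest to see in code. This Python sketch validates a hypothetical `Employee` record; the fields and the bounds (a working age of 16 to 100, and up to 60 contracted hours a week) are illustrative assumptions, not rules taken from any real data definition.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative bounds; a real organization would take these from its own
# documented data definitions.
MIN_AGE, MAX_AGE = 16, 100
MAX_WEEKLY_HOURS = 60


@dataclass
class Employee:
    employee_id: int
    name: str
    date_of_birth: date
    weekly_hours: float


def age_on(born, today):
    """Age in whole years on the given date."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def check_employee(emp, today):
    """Return a list of rule violations; an empty list means the record passes."""
    problems = []
    # Completeness: a mandatory field must be present and not just whitespace.
    if not emp.name or not emp.name.strip():
        problems.append("name is missing")
    # Range: age must fall within the assumed working-age bounds.
    age = age_on(emp.date_of_birth, today)
    if not MIN_AGE <= age <= MAX_AGE:
        problems.append(f"age {age} outside {MIN_AGE}-{MAX_AGE}")
    # Range: contracted hours must be positive and no more than the maximum.
    if not 0 < emp.weekly_hours <= MAX_WEEKLY_HOURS:
        problems.append(
            f"weekly_hours {emp.weekly_hours} outside (0, {MAX_WEEKLY_HOURS}]"
        )
    return problems
```

Where the data lives in a relational database, the same rules are better enforced declaratively with `NOT NULL` and `CHECK` constraints, so that no application can bypass them; application-side checks like these then act as a second line of defense, for example when validating an incoming data feed before loading it.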
What benefits will come from data governance, other than compliance?
Once the organization knows where data is held and how it is used, it can help the users of the data, particularly knowledge workers, to determine what is available and to avoid duplication. It helps greatly in managing rapid company growth, sudden emergencies such as a pandemic or a disaster, and corporate mergers and reorganizations. It also removes the temptation to force through departmental initiatives that take too little account of the overall context and so lead to incongruent and redundant data quality processes. Probably the most popular side-effect of successful data governance is that data interchange becomes as easy as possible.
What skills and tools are necessary for data governance?
The most important day-to-day skills for the data governance team are communicating and listening well, having the knowledge to understand complex data management issues, and being able to record information clearly.
Tools can help, of course, and make the process much more rapid. The type of tool chosen will depend, in part, on the methodology used for tracking and representing enterprise data, and for mapping, profiling, and monitoring it.
Finding out where data is currently stored, what type of data it is, and how it is held is much easier with access to the source of the databases, wherever possible, and, for database-driven applications such as accounts, invoicing or payroll that are bought in or managed by third-party data processors, with a way of checking the database itself.
It is at this point that it will be useful to survey the enterprise’s databases. SQL Data Catalog can quickly establish what categories of data are held in SQL Server databases across the enterprise, some of which may be hosted in the cloud. It can come as a surprise how rarely a large organization has a single source of knowledge about where its data is held, so if a merger or acquisition happens, a retrospective search can prove painful. Where a SQL data catalog is maintained across databases, it can be particularly helpful in spotting duplicated data resources such as postcode lookups, time zones, product lists and so on. See my previous article, The Need for a Data Catalog, for more discussion on the role such a tool plays within data governance.
Tools that can perform high-level UML mapping of data are also of great use, but the maps and models they produce must be easy to change.
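Spotting duplicated reference data is, at its simplest, a matter of comparing catalog entries across databases. The Python sketch below assumes a hypothetical catalog extract of `(database, table, columns)` tuples and flags tables in different databases that have the same set of column names; `CATALOG` and `find_possible_duplicates` are illustrative names, not part of SQL Data Catalog.

```python
from collections import defaultdict

# Hypothetical catalog extract: (database, schema-qualified table, column names).
CATALOG = [
    ("Sales", "dbo.PostcodeLookup", ["Postcode", "Latitude", "Longitude"]),
    ("Logistics", "ref.Postcodes", ["postcode", "longitude", "latitude"]),
    ("HR", "dbo.TimeZone", ["TimeZoneID", "UtcOffset"]),
    ("Sales", "dbo.Customer", ["CustomerID", "Name", "Postcode"]),
]


def find_possible_duplicates(catalog):
    """Group tables in different databases that share the same column names.

    Matching is case-insensitive and ignores column order. A match only
    flags a candidate for investigation; it does not prove duplication.
    """
    groups = defaultdict(list)
    for database, table, columns in catalog:
        signature = frozenset(c.lower() for c in columns)
        groups[signature].append((database, table))
    return [
        sorted(tables)
        for tables in groups.values()
        if len({db for db, _ in tables}) > 1  # present in more than one database
    ]


if __name__ == "__main__":
    print(find_possible_duplicates(CATALOG))
    # [[('Logistics', 'ref.Postcodes'), ('Sales', 'dbo.PostcodeLookup')]]
```

Matching on exact column sets is deliberately crude: it will miss a postcode lookup that carries one extra column, and it can flag unrelated tables that happen to share column names, so each match is only a prompt for a data steward to investigate.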
Conclusion
Rather than waiting with trepidation for each new wave of legislation and then concocting a way of complying, it makes economic sense to make data governance a continuous activity, one that ensures the organization follows the industry codes of practice for data governance that are likely, eventually, to be enforced by legislation.
This reduces the risk of unexpected costs in re-engineering information technology and work practices within the organization when a new law appears, such as the CCPA (California Consumer Privacy Act), Brazil’s LGPD (Lei Geral de Proteção de Dados), India’s PDPB (Personal Data Protection Bill) and Thailand’s PDPA (Personal Data Protection Act).
It might seem extravagant to adopt a culture of consistent data governance within an organization, but it aims at a single approach to being grown-up and responsible with data, with a policy that is easy for everyone in the organization to understand and assimilate.