Building a Culture of Data

One of the major trends in enterprise computing, and really in enterprises themselves, is an increased emphasis on data. My career has always revolved around data, but this is a new focus for many parts of the organization. Even business units that traditionally don’t care about data realize that access to more, and better, data can make their job easier or expand their capabilities.

Creating an overall focus on data isn’t a short-term project. It can’t be done via a single team dictating “everyone uses data more, or else!” It is a coordinated effort, with technical efforts over many teams, business efforts over many teams, and help from management. This is similar to the focus on security seen by many companies in the early 2000s. It is an effort that is never done and must be a continued focus to be successful.

If creating a culture of data is an objective, it has to be a top priority or it won’t happen. Consider your top initiatives. Anything beyond the second or third in priority is essentially getting shelved (and be reasonable… I don’t mean adding yet one more top priority, either!) There isn’t enough time or resources to put toward the less essential items. I don’t mean your daily or weekly task list – I’m referring to your big items. Things like moving to the cloud, upgrading your accounting system, or in this case, making a major culture shift. It’s going to be a painful process. There is going to be resistance. “This is not how we did it before!”

If you are ready for the challenges, it can be done.

You don’t have to have complete buy-in on the concept for this to work. You can make small, continuous improvements to show the value. A pilot project is useful. But you do need to have a commitment from management and policies enacted. This won’t work as well without full commitment, but if a small project gets the process started, it can help showcase the potential benefits.

Like many initiatives in technology, the actual technology isn’t the hard part. The hard part is all of the conceptual work, organizational work, and team changes. Don’t underestimate the difficulty of planning and coordinating a change of this size. Planning something correctly saves later work. Everyone who interacts with data has to care about data quality and consistency, management must support and drive this effort, and tools and technology must be put in place to assist this and be used consistently.

What is a culture of data?

A culture of data can mean different things to different people or different organizations, so I’ll lay out my definition. When I say building a culture of data, I mean that one of the top priorities of the organization is data. Not just saying that data is important, but making decisions that clearly put data at the forefront. Good design practices. Access to data while preserving security. Enabling better decision-making practices. A culture of data will analyze the impacts of data utilization for most projects and seek ways to use all of the relevant data available to an organization. Teams should think of data first when making changes. Everyone needs to emphasize data quality. And it also increases data availability for everyone.

Symptoms

The question that may come up is, why put such a large emphasis on data? Why would you want to change your focus to data? I’m going to approach the answer to this from the other direction. What happens when you don’t emphasize the importance of data and just hope that everyone is doing the right thing? What happens when you don’t have a culture of data in the modern age?

Most organizations don’t have an active culture of data, and it may not be right for every organization. Clearly, I’m biased, given my background and interests, but almost every company would improve with more rigor in data processes. In the following sections, I will cover some of the key indicators that will help you argue for improvements or help you realize you can benefit by focusing on data culture.

Data errors/data change requests

Data errors are inevitable. Excessive data errors are not. If you notice data quality issues, frequent custom scripts to fix data, manual processes, and other errors in your data, the root cause needs to be examined. You can’t predict every anomaly, especially with external datasets, but errors should go down over time. These errors have a direct impact on the business and decision-making. You could miss a manufacturing window or need to change to a more expensive shipping method if the data is inaccurate. A change to shipping method may not sound like much, but I’ve seen situations where the difference is millions of dollars. Data quality matters.

This can be quantified by looking at the number of data change requests in your system. It may not be exact, but it can get you closer to an understanding of your quality issues. Change requests don’t always equate to errors, so error tickets can be combined with general data tickets to create this measure. Even if imperfect, it’s a better metric than guessing and can be accurately gauged over time.
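As a rough sketch of how this measure could be tracked, assume tickets can be exported from your ticketing system as an opened date plus a category label. The category names and the function name below are made up for illustration; your system will have its own.

```python
from collections import Counter
from datetime import date

# Hypothetical category labels; substitute whatever your ticketing system uses.
DATA_CATEGORIES = {"data_error", "data_change"}


def data_tickets_per_month(tickets):
    """Count data-related tickets per (year, month).

    `tickets` is an iterable of (opened_on, category) pairs, where
    opened_on is a datetime.date. Error tickets and general data change
    tickets are combined into a single count, as described above.
    """
    counts = Counter()
    for opened_on, category in tickets:
        if category in DATA_CATEGORIES:
            counts[(opened_on.year, opened_on.month)] += 1
    # Sort by (year, month) so the trend reads in chronological order.
    return dict(sorted(counts.items()))
```

The absolute numbers matter less than the direction. If the monthly counts are not trending down, that is a signal to look at root causes.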

Data proliferation/duplication

Data getting copied excessively has several issues. It’s difficult for users to know which source is the definitive source and should be used. It increases the workload, usually for multiple teams. It increases the chance of misinterpreting data or misconfiguring data. It makes securing the data much more difficult. Data proliferation is another item that should be addressed as you move to a culture of data.

Bad/no standards

When I review code, the ideal situation is that I can’t tell who wrote the code simply by looking at the style. The code is never actually quite that standardized, but that’s the goal. Standards help the organization to work the same way, so everyone can just know what they are looking at when they get into the code. Poor performance, confusing code, and very different implementations for similar problems are other indicators of standards not being followed.

Frustrated users

User frustration is less tangible than things like poorly written code but is still an important indicator of the state of your data culture. This will manifest as comments from users or support tickets. This is a clear indication of frustration. When users get very frustrated, they also lose trust in the process. When trust is lost, it can be difficult to earn it back.

Another big indication of frustrated users is when they stop using IT services and build their own solutions. This could be due to scheduling issues and the IT department getting overloaded, or overly restrictive requirements. End users and business departments likely won’t have the same rigor that IT departments require and may not understand the need.

The reasons for the frustration are important. The analysis of that frustration and understanding of the root causes is more important. A user may be frustrated that they can’t get a particular piece of data for a report, there may be an issue with data discoverability, it could be that the data isn’t exposed, there could be a lack of resources to service requests, or many other things. You can’t fix the frustration if you don’t understand the real cause.

Support path undefined or not followed

This ties into the previous indicator, frustrated users. End users need to be able to get help with data, especially in a data-centric, user-empowered environment. If the support path is undefined or not followed, it is another item that needs to be corrected and put on the roadmap. Without this, you will also have confused and unhappy users of your data products. This will show up as comments or complaints to management or just general frustration. This is an item that can hurt morale.

Creating the culture (fixing the problem)

Creating a data-focused culture has to be a conscious decision. It won’t happen by accident or even through the will of a few key individuals. Changing the focus of a business is very disruptive and costly from a resource perspective. It takes time and it is a continuous, ongoing effort.

All of the following ideas are interrelated and depend on each other. You can’t prioritize data quality without the right resources. A cohesive training program relates to the quality of the workers. These aren’t discrete groups, but they are presented so you can think about ways to infuse a culture of data into your organization. 

Top-down Solution

Putting data first and making data part of your culture has to start at the top of the organization. This is a big initiative and needs some real force behind it. This point can’t be emphasized enough. Getting the enterprise to focus on data requires a tremendous amount of coordination and effort. Without the full support of all decision-makers, you just end up with people discussing whether or not data ought to be considered a primary goal. If you don’t have that sponsorship, the effort won’t be successful. A single team or team member can make a difference, but you won’t be able to change the whole organization.

Sincerity

Teams aren’t naive; they understand when a stated objective isn’t a true priority. If management doesn’t put data culture in the forefront, it won’t be a priority with the teams. When management states that data is the most important aspect of a project but doesn’t listen to the team, doesn’t put the right resources on a problem, or puts every other priority before data and data quality, the team notices and adjusts its priorities to match the implicit ones. It doesn’t matter if data culture is listed on every story in some way or if it is discussed during team meetings. It has to be an actual priority. Management and other team leaders need to be sincere and make decisions that are consistent with the philosophy.

Prioritize Data Quality

This is obvious, but if you want to create a data-centric culture, data quality needs to be a major part of that process. Data quality is about ensuring business rules are enforced and data can be trusted to be correct wherever it is accessed. For example, if it isn’t possible to produce less than zero units or the price of a product can’t be less than zero, that should be reflected in the data. These rules should be enforced as early as possible in the data flow. The data entry screen is the best place. This requires that all development teams understand the importance of data quality and know how to enforce that quality in their applications. It also requires that the business can relay good requirements for data quality.
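To make the two example rules concrete, here is a minimal sketch of a check that could run at data entry, before a record is saved. The record shape, field names, and function name are assumptions for illustration only; in practice the same rules would typically also be enforced in the database itself.

```python
from dataclasses import dataclass


@dataclass
class ProductionRecord:
    """Hypothetical record captured at a data entry screen."""
    product_id: str
    units_produced: int
    unit_price: float


def validate_record(record: ProductionRecord) -> list[str]:
    """Return a list of business-rule violations; empty means the record is valid."""
    errors = []
    # Rule 1: it isn't possible to produce less than zero units.
    if record.units_produced < 0:
        errors.append("units_produced cannot be less than zero")
    # Rule 2: the price of a product can't be less than zero.
    if record.unit_price < 0:
        errors.append("unit_price cannot be less than zero")
    return errors
```

Returning every violation at once, rather than stopping at the first, lets the entry screen show the user everything that needs fixing in a single pass.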

If data comes from outside sources, you hopefully have some leverage to insist on quality. If you are paying for a product, that should be part of the user agreement or contract. If the data comes from another business unit, you will need to develop a good relationship with those teams. You want to fix issues with data as close to the source as possible. This means working with these outside teams, which can be challenging, but in the long run, it will also have benefits for them.

Data governance tools can assist with data quality. Using the same naming standards, defining your source of truth, defining the team responsible for a particular data product, defining algorithms, and agreeing on business rules all make the data more consistent, which improves data quality.

Quality at all Levels

Quality at all levels simply means that all teams that are responsible for data take it seriously and constantly strive to improve systems and data. Quality at all levels is a must and it isn’t something that just happens. It starts with end-users and goes all the way to development teams and management. Teams need to support related teams and business units. When another team finds an issue, it can’t be viewed as a nuisance – it is an opportunity to improve and produce a better product. A truly collaborative environment is a must. Every team is helping the organization and related teams improve. This requires an open mindset.

You must take responsibility at every level. It can be an uncomfortable process to look at weaknesses. Without this examination, real change isn’t possible. You can’t fix an issue if you won’t acknowledge it’s there. But it shouldn’t be a witch hunt. Finding problems and proposing solutions is what you want, not finding someone to blame. This is striving for quality.

Part of Each Project and Each Team

Data must be important to each team that produces, curates, or interacts with data. That narrows it down to all teams and business units. Every team must care about data quality, data security, and the accessibility of data. It should be part of every conversation related to new projects. Modifications to projects should have the same emphasis.

Hire Data Professionals

This is another obvious item, but you must hire data professionals. Hiring is a difficult process. It’s always a leap of faith in some ways. Here I want to make a distinction. Most positions in the company that make decisions will use data. But that doesn’t make them a data professional. You need people who understand the platform and the architecture or you will not be able to achieve 

This can be a delicate topic, but without an emphasis on involving true data professionals on your teams, you can have teams that aren’t qualified to work on your data, which can make your problems bigger. Using data to make business decisions and achieving a culture of data is great, but if too much of the data used is garbage, the decisions made could be worse than the decisions made on the feelings of the staff that have “been doing this job forever.” It isn’t fun to think about changing team members, but it can be necessary. There is always resistance to learning new things with some team members. You probably thought of someone as you read that statement. It might be easier to get new team members than to fight those old patterns. But that also has challenges.

Be careful, though: some of the people who have done this forever really do know what they are talking about. They may have even been doing a really good job making decisions without the mass of data you can capture. The goal is to get such people to buy in and realize that the data can help them hone their skills.

When necessary, the process of adjusting your company’s staffing needs to be transparent. If you use euphemisms, such as upskill, when what you mean is replacing some current workers with workers with a different skill set, you will sow distrust. The expected skills need to be defined. If workers want to adjust and gain those skills, a training path is needed.

Training

Continued training is an understood part of being in IT. Technology is always changing and if you want to keep even remotely up-to-date, you have to train in some fashion. It can be reading blogs or books, watching videos, formal classes, or peer-to-peer sessions. It is one of the constants of IT: there is always something new to learn.

This may be less obvious to teams outside of IT, but if a data-centered culture is the goal, all users who interact with that data need to stay current as well. The burden isn’t as high with business users as IT staff, but there are expectations. For instance, if users are creating their own reports, they need to understand the reporting tools and have a good understanding of the data.

Keeping current workers is much more efficient than importing new workers without business knowledge. Training has long-term benefits and also improves morale and lets workers know they are valued.

Empowerment

Letting teams make improvements (within standards) in the way that best fits their team skills and workflow is key to building a data-centric organization. Teams must also be allowed to fix issues when they find them and help contribute to the standards. This could be changes to the technical standards, business definitions, methods for fixing data, and prioritization of efforts.

Everyone has agency to make the data better. Not all groups will have aligned goals, and some may even be at odds. However, finding common goals and making data quality a top priority will help the strategy win. Empowerment also fits with the Agile ethos.

This is similar to sincerity. Employees can tell if they are actually empowered and can enact changes.

Creating and Enforcing Standards

In the next entry in this series, I will discuss more of the technical aspects of creating a data culture, but some aspects blur the line. Standards are one of them: they are technical, since they mostly concern how you apply technology, but they are also non-technical, since they are based largely on how people feel about how technology is applied.

Style

Overall, there needs to be agreement on processes and standards. Standards should include code style, common patterns, methods and approaches for specific problems, tools used, and non-functional requirements (NFR).

Fonts, code style, and how icons are applied to a screen don’t impact functionality directly, but they have an indirect impact on functionality. If code is difficult to read or non-standard compared to the rest of the code base, it is harder to review and harder to revise when changes are needed. Using a standard code style also helps when getting new team members, changing teams, or if you need to review code from other teams.

This isn’t limited to data-centric code. This is a good rule for all code produced, and really all artifacts. For example, consider a set of directories to hold data files or documents. Not really a very technical concern, but it will have a big effect on usability.

It often isn’t productive to refactor code to meet style standards, especially code in maintenance. If you do need to modify code that is badly out-of-standard, it usually is better to update the code for style standards at the same time.

Algorithms

Finally, one last standard to mention is algorithms. Some will seem obvious. If you need the amount of an invoice, it should add up to Price * Quantity for every line in the invoice. If you want the average sales in the first quarter on non-sales days to good customers… that gets trickier to define. What is a “good” customer? Is this an attribute, a calculation, etc.?
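Even the obvious case benefits from being written down once and shared, so every team computes it the same way. A minimal sketch of the invoice calculation, assuming each invoice line is a (price, quantity) pair and using Decimal to avoid floating-point rounding on currency (the function name is hypothetical):

```python
from decimal import Decimal


def invoice_total(lines):
    """Return the invoice amount: the sum of price * quantity over every line.

    `lines` is an iterable of (price, quantity) pairs, with price as a
    Decimal and quantity as an int.
    """
    # Start from Decimal("0") so an empty invoice totals to a Decimal, not the int 0.
    return sum((price * quantity for price, quantity in lines), Decimal("0"))
```

The “good customer” case is harder precisely because there is no equally short definition to agree on. That agreement, not the code, is the standard.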

Once someone does something interesting it is possibly going to become important. Then you will need to redo it. Creating models that can be reused by other teams becomes more important.

Support Path

No matter how you use data, the support path must be clear and easily accessible. It becomes even more important if you are moving toward a self-service model for data access. Users become frustrated quickly and just say “This is too hard” and want to move on if they can’t get quick and reliable help.

Finding help

Much of the purpose of a data-centric enterprise is to enable users to access and use data as needed. They don’t have to wait for the IT department to scope, prioritize, and develop new reports for them. They can do it themselves. Whether you consider self-service, democratized data an overambitious aspiration or well within the grasp of average users doesn’t matter. Either way, part of building a data-centric organization is having a support path and training for users.

It isn’t enough to make the data available; users need to understand how to use the data and how they can get help when they hit a barrier. That can be a technical barrier such as getting access to data or loading the right tools, or it can be an educational barrier such as not understanding the data model.

Data governance

A strong data governance team helps with this too. Using a common language for your entities and attributes (tables and columns) and your processes makes things much easier. A good data governance program can also help ensure standards are followed between different teams.

Data governance will also likely help with the implementation of a data dictionary/business glossary. This is key to enabling users to find data without seeking out the help of key team members as their starting point. The data has to be discoverable, defined, reliable, and current. This helps solve the discoverability issue.

Summary

The work of building data quality and a culture of data is front-loaded. It gets easier as you go. As systems are put into place and habits are formed, they build on each other. Getting started is the hardest part.

Teams have many priorities and this impacts what they are able to accomplish. Insisting on data as a priority is necessary or it will get shoved down and deprioritized. Maybe not to the bottom of the stack, but it won’t happen unless you insist. Improving data quality and improving processes requires teams to be self-critical. Teams need to always look for ways to improve. It’s easy to get stuck in a rut, whether that is bad habits or just your regular routine that follows the current standards. It’s good to do a reset on attitude and goals on a regular basis, especially if you are struggling to make effective changes. A data-centric culture requires time and effort, coordination between teams, and support from all levels of management to work.