Challenges to Database DevOps: How Do You Maintain Master Data?

When it comes to deploying databases, one of many challenges that arises is dealing with your master data, lookup data, or sometimes even reference data. It’s those tables that store the lists of choices like a Suffix to a name such as Dr., or Ms. Or something else. In fact, during a recent panel discussion one of the questions that came up was:

One of my main challenges is how to maintain master data. I have not been able to find a reliable solution, especially when you have multiple instances of the database.

Indeed. This is a challenge. There are probably as many ways to solve this problem as there are different kinds of master data. All of them have shortcomings and caveats. I’m going to outline two mechanisms that I’ve used. One is simple and works for smaller data sets. One is incredibly complex, but will let you work with just about any kind of lookup data you may have.

Data In Source Control

Your source control system is not meant to manage data. It’s not the strength of the system and in fact will likely prove to be difficult to maintain over time. However, if you only have small sets of lookup data, source control becomes a viable option.

First though, we have to talk about what we mean when we say “small”. For one person, small might be 10-20 rows. For another, small is 10,000-20,000 rows. In my case, when I say small, I’m envisioning no more than 100 rows and probably a great deal less. We’re talking quite small data sets then.

Next, I’d say that you should also ensure that this data is not highly volatile. Most lookup data should be relatively stable by it’s nature. Volatile data is usually transactional data, not simple master data. However, situationally, you may have lookup data that changes a lot. This data probably wouldn’t lend itself well to be stored in source control.

However, if we find that we’re talking small, stable data sets, then you can put them into source control. If you use a tool like SQL Source Control, it will assist you in getting your data into, and out of, your source control system. If you are not using SQL Source Control, you can still maintain the data in source control, but you’ll have to decide how you’re going to save it there. Then, you have to decide how you’re going to retrieve it and apply it.

If you choose to put your master data into source control, but you’re not taking advantage of a tool to help, I’ve found that the code you put in there should run the data commands for you. This means you’ll want to store your data, not as data, but as a script that can INSERT/UPDATE/DELETE data for you on the fly. Frankly, this quickly becomes a giant pain. In fact, if you’re not going to use a tool, I’d advocate against using source control entirely for master data.

Data In a Database

The best place to manage data is in a database. So, when I was presented with the need to maintain master data, a database was my first choice. However, you quickly realize that you can’t simply maintain a database with your lookup/master data. Why? Well, because there are different versions of your master data for different releases of the application. Further, even though you have shared lookup data across your environments, different applications consume that lookup data in different ways (more or less columns, rows, etc.). All that combined makes this a giant pain.

However, you can design a database that stores master data. Also that defines the version for the data. You can then create a function/procedure, that when you provide a given version, it retrieves the correct data. You can also define applications and the columns that those applications support. Add this to your function that works from the version so that now, you pass in the version and the application and you get back a complete set of correct lookup data in the correct form.

If that sounds like a ton of work, it is. Sadly, the original IP for this is completely owned by the organization I developed it for, so I can’t share the details with you here. However, if you follow one of these methods for storing versioned data, you can pretty easily build out this system. Then it’s just a question of using functions to create dynamic views of the data. With a dynamic view available, you could again implement a tool, like SQL Data Compare, to use to generate the appropriate change scripts.

Conclusion

The management of master data, or lookup data, can be very simple or very complex. It’s really dependent on the needs of your system, not the lookup data itself. Even if you find yourself in a position where, initially, it seems like you have very simple lookup data needs, experience tells me you should be prepared for that to change.