Unit Testing, Databases, and You

As database changes are added to the DevOps pipeline, unit testing of those changes becomes crucial. In this article, Adam Hafner describes unit testing and how it applies to databases.

One of the biggest complaints I experience is that of developers and end users who run into bugs. GASP! It’s a four-letter word (only because I pluralized it, right, but it counts). When one reflects on those bugs, one of the biggest causes for them boils down to “The unknown.” You can point the finger at requirements, configuration, design, etc., but so many times, those items will all boil down to…

  • I didn’t know that should be a requirement
  • Who would have thought to talk to the DBA about how that database was configured?
  • Why would we have designed a table relationship like that?
  • Who would have guessed that Carl would win the lottery and leave the company?

These issues and questions can’t be solved with a tool, which is usually the first place people go. What I would like to talk about is how unit testing can get you thinking in a way that will encourage more discussion and, hopefully, find the underlying cause of some of these bugs earlier.

I am not going to bore you with the definition of what unit testing is. (Here’s a link to Wikipedia Unit Testing if you want to read it) What it boils down to is this question: Does my code do what it says it is supposed to do? Unit testing breaks down the requirements, either explicitly defined or implied by how your code is defined. You can get into a discussion over different methodologies of testing (Test Driven, Behavior Driven, etc.), but I am going to save that discussion for another day. The focus here is on code accuracy. If the requirement is to write some code to calculate a person’s age, will it return the person’s age? At face value, this is a pretty simple request, especially in SQL. However, what about the quirks that live deeper in that request? Will the code work if someone enters an invalid date format? Will the code work if someone enters a date that is in the future? How about if the person was born in the 1800s? These are types of questions that can be answered by unit testing. Another aspect is that it ensures that if someone changes your code, that it will still work as it was intended to do. How many times have you heard someone say “Oh, that’s an easy change” and once the change is made chaos ensues? While unit tests can’t provide absolute confidence, they can go a long way in making you certain that when something is changed, it will throw an error when it isn’t working like it was originally designed.

How Unit Testing Works

So, how does it do this? Unit testing is a way of taking those questions and presenting them in small, quick and understandable tests that will answer each of those questions independently of one another and independently of anything else other than the code you wrote. Before jumping into the complexities of unit testing, here are some core fundamentals:

  • Fast – unit tests can’t take a long time to run. The longer things run, the longer it takes to write the code since the developer will have to wait to know if a correction is needed.
  • Repeatable – unit tests have to be executed multiple times with ease so that you can make code adjustments and rerun the test to determine if the requirement has been met
  • Self-Checking – unit tests have to result in a single, binary result. Unit tests pass or fail; there is no gray area
  • Timely – unit tests cannot take a disproportionately long time to write compared to the code being tested
  • Isolation – unit tests cannot rely on outside data, objects, people or processes

Why are these important? I would argue that they are not just important, but they are critical to database unit testing. Ask yourself these questions:

  • If the unit tests take hours to run, will developers really find value in running them if they are sitting around waiting for a test to complete rather than coding?
  • If a unit test takes hours to clear data and rerun data in order to perform a test, will developers really go through the effort of determining how to do that just to pass some tests?
  • If a unit test result has to be interpreted by complex scenarios or require getting end users interpretations each time, will developers want to spend time doing that instead of coding?
  • If a unit test takes days to write to validate that a number isn’t a letter, will developers see any value to writing the unit test?
  • If a unit test has to rely on someone else’s availability or wait on a vendor to provide data that won’t be ready for weeks, will a developer want to be told to be patient and find something else to do?

The answers to these questions should all be “no.” Developers need to be able to do their jobs and not be weighed down by these situations. Unit testing needs to and does address these scenarios.

The way in which unit testing achieves the goals of those fundamentals is by utilizing frameworks. Just like any other technology, a framework provides functionality and patterns for working through a problem. Specifically, for databases, there are several frameworks to choose from. There are implications that depend on the framework you select. One of the implications is the language in which the tests are written. The tests in some frameworks are written in SQL or C#, while others may use their own proprietary language. This will affect the learning curve necessary to start writing tests and should be based on the skillsets of the people that will be responsible for writing and maintaining the tests.

You should also look at how the framework interacts with other technologies, specifically how it integrates with objectives like continuous improvement/continuous deployment and DevOps. If the tool doesn’t have any way to integrate to your tooling, you may need to develop a custom integration between systems to make this work. Whereas another framework may work with standards like MS Unit or NUnit out of the box which are widely adopted by many CI/CD platforms.

Some of the common features that should be included in a framework are stub/mock/fake functionality, assertions and test suites. Stub/mock/fake functionality refers to the ability of the testing framework to allow for creating database objects that resemble an object that is being tested. For example, faking a database table would allow a table to exist that resembles all or at least some of the schema of the original table. This allows unit tests to be written to be isolated from the original object and isolated from existing data that may exist in the table. Assertions are predefined methods that are used to provide binary results for testing conditions. They are methods that provide for the comparison of objects, comparison of data, or comparison of expected outcomes.

The last item would be test suites functionality, which is the ability to group tests that evaluate the same objects together so that test can be executed quickly and provide a binary result to indicate that the object unit tests all passed or if there was a failure that indicates that the object is not suitable for deployment. These features are necessities for unit test development, which will be apparent when you start writing unit tests.

Unit Testing and Databases

So far, the discussion has focused around some basics about unit testing and some of the implications they have on databases. This hasn’t really given any insight to why this is important to database centric development. I don’t want to glaze over this by assuming that everyone has an understanding of unit testing value. So think back to a time when you wrote your last stored procedure. You were probably given some sort of requirement, or at the very least someone gave you a vague request for something. The stored procedure probably used tables that had foreign keys on them between tables both involved and not involved with the code that you had to write to get the results you needed. On top of that, the data in the tables doesn’t have all the data in them to support being able to see the results of the query to validate that your stored procedure works. With that in mind, consider how you could approach the development of that stored procedure. You are generally left with a few options:

  1. You can use your expertise and build the query as best you can without having any supporting data or only the data that you have at that time
  2. You can spend a bunch of time trying to update or insert data into the tables to support the scenario you have
  3. You can try to build tables in a sandbox environment that closely resemble the original tables with data that you need
  4. You can ask someone to load data into the databases that meet all of the requirements of the scenario that your stored procedure needs to use

With each of these approaches, there are risks and repercussions. In option #1, if you build the procedure without data, you are not going to have high confidence that there aren’t any errors or typos in your code. In option #2, you will spend time going through and mocking up data that could lead down a rabbit hole of ancillary tables that have to be updated in order to satisfy constraints. You also risk creating data in the database that does not adhere to other standards or states of data, which leads to bugs or false bugs and potential instabilities. In option #3, if you build things in a sandbox, there is a risk that you don’t build the objects in such a way that it resembles the environment closely enough and your design may be flawed. You would also potentially have issues where moving your code from the sandbox to the development area could cause conflicts or typos. Lastly, in option #4, this could be time intensive as you may need to wait for the users to create the data, they may need more information to build out all the necessary data to satisfy system requirements, or they may have competing requests that will delay your development. The other issue that all four of these scenarios have in common is that they aren’t easily repeatable and require separate documentation in order for other developers or testers to maintain and execute.

So how does unit testing address this? One of the most impactful ways that unit tests address this is through the use of fake/stub/mock objects. Because our development is directly impacted and driven by the data, the ability to craft our own test data at will allows us to write tests that ensure our code operates correctly. In the scenario outlined previously, the tables that are involved in the stored procedure can be faked. This means that during the execution of the test, and only during the execution of the test, a table is created with the same name that is structurally identical to the table used in the stored procedure. The important difference is that the tables that are created during this process do not have any data in them and do not have any foreign keys or any other constraints. This enables all kinds of possibilities because you can fill the fake table with only the data that is needed in order to verify the conditions of the procedure. You won’t need to populate any unnecessary foreign keys; you won’t need to populate any unnecessary fields in the table that are not being used by the stored procedure, and as mentioned earlier it is completely automated. That means the existing data and constraints will be quickly moved, replaced with the data needed for the test, and then when the test is completed the previous data and constraints will be moved back to table and it will look like nothing happened to the table at all. This makes the entire development process easier and infinitely easier to validate that is doing what it needs to do. This is a very basic example of how faking functionality can become a huge asset to unit testing. Other scenarios could be having to fake a function used within the stored procedure, a stored procedure within a stored procedure, or a database view used in the stored procedure. This capability alone will save you time and make writing tests very easy!

You are probably saying to yourself, “Why wouldn’t everyone be writing unit tests if it was this easy”? The answers to that vary, but largely, I believe it is due to the lack of tools to incorporate the database into the development life cycle. That lack of tools is definitely changing due largely to the rise of DevOps, Agile, CI/CD and other methodologies that strive for automation, iterative development and continual improvement. If things change frequently, you can’t spend hours or days doing regression testing. This lack of tools has established a culture of exclusion. I have heard things like “oh, we can’t do that because it’s in a table” or “that a function we call in the database, it just works”. Those kinds of statements somehow granted database development a pass for a long time, but now that data is at the heart of so many decisions, it’s imperative to have verified confidence in what is designed and developed. In addition, I will hear a lot of the same excuses:

  • The stored procedure code is too complex to test.
    • If it’s that complex, maybe it should be broken down into a manageable size. No one likes a giant stored procedure that takes hours to read and understand what is happening.
  • The test I would have to write for that is too complex.
    • If it’s that complex then what you wrote is probably too complex and needs to be broken down.
  • I don’t know how to write a test.
    • That’s what software development is all about, learning new things. As new things come out, we need to embrace learning them and putting them to work in our processes, not ignoring them and hoping they go away.
  • I don’t have time to write a test.
    • One way or another, you will find time. Either it will break at some point and you will have to make time to identify what is broke and fix it with everyone watching you (including the CEO/VP/Director) or you will get fired, and then you have TONS of time on your hands.
  • I don’t know what tests to write.
    • This is a topic all-in-of-itself as there are a few different methods for determining what tests to write. You can always start with discussing the requirements with the business, testers, or analysts to determine what tests are necessary.

Conclusion

The important thing to learn from this is that unit testing is important and, in some way, shape, or form you should incorporate it into your development practices. No matter if you are the only database person in your company or if there are 5000 people in your company. No matter if you have one database or 1000 databases. The application of unit testing allows you to ensure once you write something, it can be verified and then when it needs to change you can verify that all of the tests that previously passed will continue to pass. That will give you confidence that your changes will be more accurate and less likely to break other things that rely on your code. That will inevitably help you sleep at night, enjoy your vacations more and be more confident in what you develop.