Why Design and Test Your Utility Code for Publishing?

As the saying goes: do to others as you would have them do to you. When writing code, I would suggest a complimentary saying of “do to yourself as you would to do others”.  Write your software as if it will be used by a very wide audience, even if you aren’t planning on being the next Ola Hallengren and publishing your tools for everyone to use.  The biggest reason for thinking this way is to get yourself out of the tunnel-visioned, this is the only thing I am trying to solve mindset. Or at least will make you make your code intentionally narrow in an obvious way. 

On my website, at http://drsql.org/code, I have a set of utilities and code available for download by whomever wants it. These utilities are not the sort to compete with Redgate, SQL Sentry, or any vendor (in usage, or quality, for that matter), but rather are bits of code that I find useful for doing common coding or administration tasks, generally because I needed something similar to solve a real problem and I couldn’t find it easily on the web. I am currently working on several things: a set of metadata objects to get things like a max datatype value; and the one that prompted this article, a tool to clear out all of the objects in a database. 

I will describe the process I went through, not so much as a guide to writing this tool, but more for the process that I think many of us writing software for our own uses commonly go through. It is also somewhat my punishment for how cavalier I was initially in the coding process.

This database clearing tool (actually it will allow filtering to the schema level) was borne out of a need to clear out data from a database I was using to test downloading from an external source during my day job. I kept writing DELETE and TRUNCATE TABLE statements, and after a few tables were added, I said: “I need to build a tool to do this” (I am a programmer/engineer at heart. If I have to do something more than 2 times, I start to code a solution.)  15 minutes later, I had the basic format:

FOR EACH TABLE IN A SCHEMA
         IF [TableName] is the ReferencedObject in a FK
                DELETE [TableName]
         ELSE
                TRUNCATE TABLE [TableName]

I won’t go into every detail in pseudocode, but the next step is to deal with the fact that the DELETE statements needed to be ordered in a given order to work (can’t delete the parent rows before the child rows) so I used my hierarchy skills, which I had just used to present at the SQL Saturday in Atlanta, to attempt to order the DELETE statements, and given the test I tried:

GrandParentTable <– ParentTable <– ChildTable
                                                               \-<–ChildTable2

It worked! At this point, I think that this is good enough, and in fact, maybe even good enough to publish it on my blog and website. So I sat down quite a few nights ago and said, let’s make sure this is bulletproof enough to publish (very little software is bug free, but you need to get better than I had at that point.) At this point I had unit tested the code, and felt relatively good enough about it. 

But I knew I hadn’t tried the tool on the breadth of cases that I needed to put this out as a utility for any case other than this one database. So that night, and several after, I decided to try using the tool on WideWorldImporters to see what I missed. Oh, my, so many things wrong. Temporal Tables can’t be truncated. Neither can In-Memory OLTP tables. And you can’t delete from a Temporal Table’s archive table. After fixing simple issues like this, I realized that I hadn’t thought through the hierarchies of relationships in WideWorldImporters well enough and the ordering of tables to delete was very off. I realized that I would either need to do a lot more work to order the FKs, or use another utility that I built to drop the foreign keys, truncate or delete the data, then add back the constraints. And then test some more. (My next attempt involves the latter, since making tables able to be truncated is a win anyhow.)

But the point of this blog is not  how I end up solving his problem, or if I ever get this published. The point is that because I am trying to achieve a tool that can be published, the work is going to be better for my usage at my day job. If I write a utility that does something useful, but don’t consider all of the things that can go wrong, the tool may eventually become a liability . But taking a bit of time to think about how your utilities may be used by anyone helps to future proof your software in your own organization, even if you never think about publishing it.

Honestly, you don’t even need to solve every problem that you run into. It is perfectly acceptable to check the tables you are planning to delete from and say: “This utility will not work on temporal tables.” Or, “This utility will only work on the following types of table objects, and your code includes type X. Type X is not supported.” Now, the tool is acceptable for other uses and anyone employing it is expected to make sure it works before using it in a production manner. It will be a bummer to the user that hits this limitation, but it won’t be some esoteric error that needs to be diagnosed. Some things may be parameters. For example, you can delete from a temporal table easily, which will write rows to the table’s archive. This may be desired, or you can include parameters like “should the tool clear the archive table also?” should be considered, and warnings added, or figure out how to delete the archive in the procedure. That sounds like more work for sure, but unless you are writing commercial, paid for utilities or general purpose code, if you don’t want to support temporal tables, it is your choice. 

Figuring out how to solve a problem, and looking for the ways it can fail is truly one of the things that has kept me writing code, blogs and doing presentations for the past 17 or so years. You can’t just know what you need to get a single task done. You need to think about the breadth of tasks so you can get things right, and be able to know about the entire problem set. No one, not even yourself, should have to use your code if it is written so naively that it only handles one problem while ignoring all of the others. At least thinking through and understanding the breadth of problems you may encounter (and then testing for them) will elevate your game and make you more valuable.