Database Documentation – Lands of Trolls: Why and How?

When database documentation is mentioned in an IT Department, everybody nods wisely, yet everyone does their best to avoid doing it. Attention to the database documentation can be the best invertment in time a development group can make. It is essential, and no system can be properly maintained without it. Feodor gives a sensible explanation and guideline for the unloved task of creating database documentation.

Database documentation often seems to be one of those uncharted territories, unknown grounds, lands of trolls and even monsters. In my experience as a DBA, some 80% of production databases don’t have sufficient documentation. This seems quite well accepted, perhaps even as an implicit rule. It seems that the larger the organization and the management overhead therein, the more cursory the documentation is.

What is the purpose of the documentation (the WHY)

Just as there is sufficient documentation in the cockpit of every airplane, so, in a production database, there should be the equivalent of maintenance checklists, pre-flight checklists, and onboard manuals that give the detail of all the routines that should be executed in case of a significant malfunction or untoward event. On an airliner, if one of the thousands of indicators on the dashboard lights up, the co-pilot checks the documentation, determines the reason for the event, and starts executing a routine to correct the reason.

So it should be with an IT organization; even if failure doesn’t cost lives, the risk is as high. The mission of the DBA team can be critical to the survival of an organisation.

Database documentation is important because it meets several needs. For example, it:

  • Provides a common language between business decision makers and IT personnel – two different breeds, usually with no common language, but with a shared ability to read a nice flow chart and discuss it.
  • Provides a shortcut to finding ‘hot-spots’ – by looking at a global functionality chart, one can easily outline the most troublesome parts of the system. In this way, educated decisions can be made about hardware and software requirements, and the outcome can be easily communicated to the financial department.
  • Facilitates a ‘no-panic’ rule – by having proper documentation and using it, the chances of making a wrong decision are diminished because the risk is easier to assess.
  • Makes maintenance easier, and reduces risk when extending or upgrading a system.
  • Reduces training costs, by acting as a mediator between newcomers and existing staff. For example, when a consultant or a new hire comes to the company, having up-to-date documentation reduces the time required from the existing staff to transfer knowledge.
  • Improves productivity of both newcomers and seasoned employees, reducing the likelihood of costly misunderstandings by providing a glossary of commonly used terms, naming conventions, and even commonly-used strategy patterns.

Who will use the documentation (the WHO)?

In a company, we usually have quite a few roles participating in the development process: testers, developers, team leaders, project managers and so on. Whatever the organizational structure might be, the database documentation proves necessary as a way of coordinating the development activities and should be accessible to all involved.

Of course, when it comes to the maintenance of the documentation, it is most likely to be a common task of the testers and the developers to keep the documentation up-to-date.

Aside from the technical staff, the documentation is a powerful tool in the hands of the business executives; the people who vote on the budget and decide on the company’s priorities.

Here are couple examples from everyday life:

Business people and upper management staff respond to pictures and well-documented proof as to why they need to make a decision and/or release some of that precious budget. If you walk up to your manager and say ‘we really need a new server, because…’, the manager will look at you and will say ‘Hmm, we’ll see.’ And that’s about it. However, if you walk up to your manager and back up your request with a picture of the environment highlighting the hotspots in red, you will get a more positive response.

Very simply put, management tends to respond very well to highlighted reports and schemas. Nevertheless, you should keep in mind that however ‘pretty’ a schema, it still needs a solid base of factual data and impeccable documentation logic in order to offer a complete picture.

The database documentation must cater for a variety of audiences, so there must be different representations of the same documentation for each target audience.

Walk up to a developer or IT person, with a ‘pretty’ diagram, highlighted in red, and they will almost certainly frown. This is simply because a developer or an IT specialist is usually first on the line and, logically, when they see red it tells them that they will have some kind of trouble to deal with. It is a compelling message. You will be confronted by a defensive wall even before you have started talking, a wall that will take time to get around. Technical personnel tend to react much more favourably to a factual consolidation of application logic, instead of business mechanics highlighted in red

In conclusion, database documentation is important for a wide range of roles: we have already mentioned technical staff (developers, DBAs, testers and so on), business executives, and IT specialists, but there is a whole other set of people involved in the work process, such as external consultants (on the business or on the IT sides), auditors, and even potential buyers of the company. However, it is important to remember that the representation of the documentation has to be tailored according to the target audience.

Documentation types (the WHAT)

The most common classification of the different aspects of database documentation is by tier: for example, application logic documentation, database logic documentation. There is also hardware environment documentation – installation, maintenance, DR (disaster recovery) planning and testing.

Different levels of documentation exist at different stages of a project. For example, there is a project planning documentation, which is used as a common language between the business department and the technical department; there is development cycle documentation during the development of new system modules; finally, there is a deployment documentation, which ensures that releases and release upgrades are not an unpleasant surprise for the end user.

The process of defining the documentation model is critical, as parts of it will reflect the unique requirements, goals, and resources of the particular organization. It will vary according to the point in time at which the documentation is created; whether, for example, it is part of the development process, or whether it is a retrospective gift for the tenth anniversary of the existence of the project or organization.

Methodology (the HOW)

Before starting the documentation

There is probably no company or organization which does not have some kind of documentation. At the very least, there will be some financial information about the project, its participants, and its purpose. Quite often though, the existing documentation is scattered throughout the organization in various forms and versions.

Start by locating whatever documentation exists already. This could be in a ‘volatile’ state in someone’s head, on someone’s workstation, on a common shared directory, in a central management system etc.

Also, try to locate any person who is directly involved in the development of the system, or who has been in touch with the vendors and has been passing application specifications and requirements to them.

The questionnaire

Nowadays, there are probably very few businesses that have a one-to-one correspondence between application and database. A database might serve several applications, and an application may span several databases.

In complex environments, we usually see many applications, related to different departments, developed and maintained by different people. This would be perfectly fine if the applications did not depend on each other and the company could afford one server (or one clustered instance) per application. Wouldn’t this be nice? In reality, however, the environment is shared, sliced and diced, the hardware is exposed to different kinds of workloads and the overall responsibility has to reside somewhere.

In order to get a clear picture of an undocumented or poorly documented system, we would need to create a standard questionnaire and require all the application owners to fill it in, so as to gather the basic information about the application, its description, demands and expectations, as well as technical and owner contact information.

The general topics touched in the application/database questionnaire should be as follows:

  • Responsible contacts: owner, technical, other
  • Performance: expected load, peak times
  • Scalability: initial and expected data growth
  • Security: auditing, encryption
  • Compatibility: collations, versioning (which SQL Server version is the application tested for, etc)
  • Expected network bandwidth needs
  • Network connectivity specifications
  • Other: depends on individual business priorities

The detailed documentation

In a standard organization we would usually have several modules or departments, and their data would probably overlap.

Let’s say, for the purposes of this example, that we have several modules such as a user interface (e.g. a banking system with a web UI), a financial/billing department, customer support, and an administrative interface.

Each department will consume data from the same source, but with different requirements: the user interface will always use the ‘hottest’, most current data, plus some historical data. The financial/billing department will deal with reports on the data from the previous day backwards, mostly aggregations and analytical dimensions. And so on.

The point is that the documentation should eventually bring an overview of the processes of the entire system and the way the data is treated by the different departments, which will make the changes in the system easier, better, and more reliable.

The priorities of the attributes of the application, such as performance, security, and scalability, are unique to each organisation. Therefore, the results of the questionnaire should be discussed inside the organization in order to clarify the relative priorities.

This is a good time to mention that there are plenty of methods of documenting a system; the idea of this article is not to list the various methods, but to outline a ‘bird’s-eye view’ methodology of what to document in an organized manner, no matter what approach is used. I strongly recommend the Red Gate Guide to SQL Server Team-based Development, which will give you plenty of options and detailed examples of the database development, documentation and refactoring.

There are various tools that can be used to document a system – visual tools, analytical tools, metadata tools and so on. Whether you use Red Gate’s SQL Doc, SQL Dependency Tracker 2, MS Visio, or even Adobe Photoshop, the following points will give you a brief idea of what to document.

As I suggested earlier, you should look into how processes are defined in the organization, establish the order in which they should be documented, and attempt to identify the main aspects of the system and sort them by importance.

A documentation template

Here is an example of the contents of a process documentation template:

  • Process flow description: A short introduction, what is the flow for each process? How is it related to the system as a whole?
  • Flow classification: What characterizes the process? What makes it similar or distinguishes it from other processes?
    • data traffic and the load on the database:
                      <insert performance metrics here>
    • primary users: …
    • etc …
                     
  • List of objects involved in the process:
    • file group: …
    • schema: …
    • full-text catalog: …
    • etc …
  • Business logic flow paths: Is this process wrapped in a bigger piece of application logic? Is it a multi-step process itself?And so on…
  • Business logic restrictions: What conditions exist in the application layer in order for this process to take place?
  • Expected input parameters based on the business logic flow and restrictions:
    • prerequisites
    • mandatory information
    • optional information
  • Output: What is the expected outcome of the process? For example, returns some information to the user/application, confirmation, and so on.
  • Error handling: How are the errors handled? Is there a ‘try-catch’ code? And so on.
  • Create diagrams, flow charts, dependency lists etc â⬔ this will make it easier for people from different backgrounds to talk about the process.

When the template is ready, start filling it out until you have all processes documented.

Where should the documentation be kept and how should it be distributed (the WHERE)?

To recap, here are the most common documentation storage methods in a company:

  • in the mind of the senior developer or system designer
  • in a file on some computer
  • in an unknown share
  • in a centralized repository or under source control
  • in a database
  • in a specialised Wiki

It is generally up to the organization itself to decide which way is best; however, you can guess that the risks of the first few storages mentioned above are much higher than the rest. What I would recommend is to use a content management system which allows searching and indexing.

How to encourage the use of the documentation (the PERSISTENCE)

Of course, creating the documentation is just part of the road to ultimate happiness in database development. The rest of the path is to encourage the use of it.

Again, it is up to the organization to encourage the use of the documentation in meetings, discussions, and in the decision making processes.

Keep in mind that proper documentation can increase productivity and minimize the risk of incorrect decisions.

How to make sure the documentation is up to date (the RELEVANCE)

It would be so nice if we could use Policy Based Management to track if changes introduced to the system are also well documented… It would save a lot of grief in production if this were enforceable.

Well, unfortunately, it does not work this way. It is up to the developers and system administrators to evaluate changes and document them as they happen.

Depending on how the organization is structured, it might be a good idea to introduce rules that enforce teams to sign off their output by presenting the documentation for it. Or, if the organization can afford it, there could be a separate role defined in the organization for ‘documentation officer’.

Keep in mind that out-of-date or inaccurate documentation is almost as dangerous as the lack of documentation.

Conclusion

Documentation is a time-consuming, but rewarding process. It is also unique to each organization and the way it operates. The quality of documentation depends greatly on resources and time allocated to it and the priority it is given.

After reading this article I hope you have developed your own ideas on how to develop and use documentation, as well as a clear vision of how you will benefit from it.