Storing and Retrieving the Initialization and Configuration Data for Applications

All developers hit the problem of how and where to store and set their configuration, profile, or initial data. A long time ago, it was generally decided that simple text files containing key/values were best, stored with the application. After all, you are relying on being able to entice busy people to get the permanent settings right for their requirements, folks who are generally not interested in your elegant computer science constructs. Not only that, but the settings must be parsed very quickly and efficiently, otherwise a process that uses the tool will slow to a crawl.

Since then, this simple software device has been elaborated in many different, sophisticated and incompatible ways. We are less worried about performance now, given the superior hardware resources we have, and we have less cultural resistance to using complex data documents, even XML and JSON, for configuration data.

Although it is usually believed that configuration data can be private to the application it serves, a standard for configuration data is useful for any process that needs to know what settings are in place for all the tools engaged in the process. When working with Flyway, for example, a scripted process such as a callback often needs more details than are offered as environment variables, so the process has to read the configuration to get the missing details. Anyone administering a scripted database-deployment process will take a close interest in the settings for all the tools in the chain. The joy of a simple format is that a script can read a tool's settings as easily as the tool's users can.

The Simple Requirements for INI files

As well as the obvious requirement to read an entire INI file into an application, converted into an object such as a hash table, it must be possible to read, write and update individual key/value pairs, whilst preserving the comments and section order within the INI file.

The easy part of dealing with an INI file is reading it. With the simplest type of .conf file, you just need to read in every key and establish its value. The Windows INI file established the use of sections, using square brackets. This means that the section name needs to be read so that the subsequent key-value pairs can be placed into their correct section. Windows provides the interface to allow individual keys to be read, inserted and updated, without having to read in the whole file, which makes it easy to preserve comments when making changes, and that interface can be used in PowerShell. In Unix, the use of dotted notation for the keys of .conf files is as common as using sections, but this is relatively easy to read.
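
As a rough sketch of how that Windows interface can be reached from PowerShell, the following declares the two kernel32 functions GetPrivateProfileString and WritePrivateProfileString via Add-Type and uses them to update and read back a single key. The namespace, type name, temporary file name and sample values are all assumptions made for illustration, and the code only does anything on Windows.

```powershell
# Windows-only sketch: the Profile API lives in kernel32.dll.
# 'Win32Sketch.ProfileApi' is a made-up name for the wrapper type.
if ($env:OS -eq 'Windows_NT') {
    Add-Type -Namespace Win32Sketch -Name ProfileApi -MemberDefinition @'
[DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
public static extern uint GetPrivateProfileString(
    string lpAppName, string lpKeyName, string lpDefault,
    System.Text.StringBuilder lpReturnedString, uint nSize, string lpFileName);

[DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
public static extern bool WritePrivateProfileString(
    string lpAppName, string lpKeyName, string lpString, string lpFileName);
'@

    # The API needs a full path, otherwise it looks in the Windows directory.
    $iniPath = Join-Path ([System.IO.Path]::GetTempPath()) 'profile-api-sketch.ini'
    Set-Content -Path $iniPath -Value '; this comment should survive the update', '[Database]', 'port=143'

    # Update one key in place, without reading or rewriting the file ourselves.
    [void][Win32Sketch.ProfileApi]::WritePrivateProfileString('Database', 'port', '1433', $iniPath)

    # Read the single key back into a buffer.
    $buffer = New-Object System.Text.StringBuilder 256
    [void][Win32Sketch.ProfileApi]::GetPrivateProfileString('Database', 'port', '', $buffer, 256, $iniPath)
    $port = $buffer.ToString()

    # Display the file so that you can check what happened to the comment line.
    Get-Content -Path $iniPath
}
```

Because the update is done by the operating system rather than by rewriting the file from a parsed object, the script never has to reconstruct comments or section order itself.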

Why no consistent convention for configuration files?

All simple software conventions have to be actively defended against over-complexity. Problems build up fairly quickly otherwise. Solutions require compromise, of course. But even obvious improvements can escalate. If you use sections in configuration files, for example, then a simple array of assignments soon turns into a hashtable. Once the average programmer smells the scent of a hashtable, you get the demand for storing hierarchies. After all, there are no fixed standards for INI, CFG, or conf files. The simple lists that you assign to keys soon become arrays that could suddenly, in turn, host hashtables. This all seems like fun to the propeller-head geeks, but soon the users of your tool, and the scripts that need to read your configuration, are frowning. You’ve lost the essential simplicity of configuration settings. These settings are likely to be simple anyway, because they are traditionally provided as parameters in the command tail. This is sufficient motivation to keep file-based config information simple.

Going back into the mists of time, the old configuration files were just comments and key/value pairs. They were easy to read, even when parameters were ‘folded’ due to their length. The only complication is in reading a setting only once, even if it is duplicated in different files, and in filtering out null assignments and comments. In PowerShell, once you have gathered all the content of your configuration files into a collection of lines, then getting a hashtable of the settings is as simple as this…
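
A minimal sketch of that idea follows. The sample lines, the setting values and the regular expression are illustrative assumptions; it takes the first assignment of each key as the winner, and skips comments and assignments that have no value.

```powershell
# Lines gathered from one or more configuration files, highest precedence first.
# These sample values are made up for illustration.
$lines = @(
    '# settings gathered from several files',
    'flyway.url=jdbc:sqlserver://localhost;databaseName=Example',
    'flyway.user=',
    '; a comment in the other common style',
    'flyway.locations=filesystem:migrations',
    'flyway.url=jdbc:sqlserver://ignored'
)

$settings = @{}
foreach ($line in $lines) {
    # A key must not start with a comment character; the value must be non-empty.
    if ($line -match '^\s*([^#;=\s][^=]*?)\s*=\s*(.+?)\s*$') {
        $key = $Matches[1]
        $value = $Matches[2]
        # Read each setting only once: the first assignment wins.
        if (-not $settings.ContainsKey($key)) { $settings[$key] = $value }
    }
}
$settings
```

Here the empty flyway.user assignment and the second flyway.url are both ignored, so the hashtable ends up with two entries. 'Folded' multi-line parameters are not handled in this sketch.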

Windows refined the existing convention, allowing categories (Sections) and comments. For Windows programmers, the means of reading and writing individual entries in an INI file was in every SDK, and was called the ‘Profile API’. The name of these configuration files, ‘INI’, comes from the widely used filename extension.

In your command-line application, you could be casual about reading from and writing to an INI file, because the code that supported the use of INI files was highly optimized. You didn’t stray from the convention because it was so well-supported. If your application was dealing with complex objects in its configuration, it became usual to use XML files instead.

Linux, by contrast, inherited the more laid-back approach in that there was less effort to standardize, and the means of parsing the files wasn’t in the standard library. Configuration files are typically plain text files with variations in the syntax depending on the application. They generally use a format of key-value pairs, often with support for comments (typically using # or ;). However, the format can vary widely between applications and services. Linux systems provide a variety of tools and libraries, reflecting the diverse nature of Unix usage.

Windows tried to kill off the whole idea of using INI files in favor of using the registry. This allowed a hierarchical organization of configuration items and supported more complex values, and it was accessed through a separate API, the Windows Registry API. This wasn’t a particularly popular move, and developers assented only reluctantly. Although support for INI files was maintained, Windows applications were encouraged to use the registry instead.

Whereas the INI file had a clearly defined standard, and clearly understood limitations, configuration files have continued to be developed and elaborated. They’ve been described as a ‘federation of dialects’, usually incompatible in complex and perplexing ways, some of them even being case-sensitive. TOML has represented the first organized attempt to create a standard: although it is easy to criticize the details, it demands respect because it is properly documented, and the implementations have an excellent suite of tests.

Why not use data documents instead?

Why bother with an INI file now that data documents such as YAML and JSON are increasingly easy to read? Because many of the complications inherent in data documents are unnecessary in a file that just has to hold the current configuration data.

The simple answer is that INI files must be easy to read or update, both by automated scripts and manually by humans. They are configuration files, not data documents. Despite recent enthusiasm for YAML, it is always a relief to the user that things are kept simple. There was, at one time, a clear and important distinction between an object serialization format such as JSON, XML and YAML on one hand, and a configuration format such as INI on the other. As well as the obvious requirement that INI files must be fast to read or write, there are more subtle distinctions: the value of a configuration item is always a string, which the application it supports can interpret as a particular type such as binary, numeric, float, date or string, and can validate when it is consumed. This approach usually works well. It can occasionally cause problems, particularly with dates, though not when the application that produces or owns the INI file is also the one that reads it. It is only when different applications share configuration files that there can be a conflict.
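
To illustrate the point about string values, here is a small sketch of an application interpreting and validating its own settings at the moment it consumes them. The setting names, the values and the choice of an ISO-style date format are assumptions for the example; fixing the format and culture is one way of avoiding the date ambiguity mentioned above.

```powershell
# Everything arrives from the INI file as a string (sample values are made up).
$raw = @{ port = '143'; verbose = 'true'; lastRun = '2024-03-01' }

# The consuming application decides what type each value should be.
$invariant = [cultureinfo]::InvariantCulture
$port      = [int]::Parse($raw.port, $invariant)
$verbose   = [bool]::Parse($raw.verbose)
# An explicit format avoids day/month confusion between cultures.
$lastRun   = [datetime]::ParseExact($raw.lastRun, 'yyyy-MM-dd', $invariant)

# Validation happens at the point of use.
if ($port -lt 1 -or $port -gt 65535) { throw "port out of range: $port" }
```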

If you need to keep a more complex configuration file in place for a PowerShell script, and it needn’t be particularly simple for a human to edit, then it is easy. Nowadays, anyone who is scripting in PowerShell, whatever the platform, can create a configuration object as a file in PowerShell object notation (PSON). This is made up of array literals and hashtable literals, and it isn’t particularly difficult to read or write.
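
For illustration, a configuration written this way might look something like the following. The section names, keys and values are hypothetical; the point is only that nested hashtable literals and array literals are enough to describe the whole object.

```powershell
# A hypothetical configuration object in PowerShell object notation.
$config = @{
    # comments are allowed wherever PowerShell allows them
    Owner    = @{
        Name         = 'Jane Example'
        Organization = 'Example Widgets Ltd'
    }
    Database = @{
        Server    = '192.0.2.62'
        Port      = 143                       # a number, not a string
        Locations = @('migrations', 'callbacks')
    }
}

$config.Database.Locations[0]
```

In a file, you would keep just the `@{ ... }` literal, without the assignment.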

You might leap up excitedly at this point and say that this is PowerShell code, and executing code from a file like this is a huge security risk. It is PowerShell code, but if the processor is created to convert this code into an object the right way, it is not an issue. The right way is to use the $scriptBlock.CheckRestrictedLanguage method to check the code first.
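
A minimal sketch of that check follows, assuming the PSON text has already been read from a file into a string. The sample content is hypothetical. CheckRestrictedLanguage takes a list of allowed commands, a list of allowed variables and a flag for environment variables, and throws if the script block contains anything beyond that.

```powershell
# Hypothetical PSON text, as it might be read from a file with Get-Content -Raw.
$psonText = @'
@{
    # comments survive in the file, though not in the resulting object
    Owner    = @{ Name = 'Jane Example'; Organization = 'Example Widgets Ltd' }
    Database = @{ Server = '192.0.2.62'; Port = 143; Locations = @('migrations', 'callbacks') }
}
'@

$scriptBlock = [scriptblock]::Create($psonText)
# No commands, no extra variables, no environment variables: data literals only.
$scriptBlock.CheckRestrictedLanguage([string[]]@(), [string[]]@(), $false)
$config = & $scriptBlock     # only reached if the check did not throw

# Text that tries to run a command is rejected before it is ever executed.
$unsafe = [scriptblock]::Create('@{ Name = (Get-Date) }')
try {
    $unsafe.CheckRestrictedLanguage([string[]]@(), [string[]]@(), $false)
    $rejected = $false
}
catch {
    $rejected = $true
}
```

If the configuration is kept in a .psd1 file, the built-in Import-PowerShellDataFile cmdlet (PowerShell 5 and later) is another way to load it as data rather than as executable code.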

This is fine for configuring scripted processes. You can also write out in PSON (PowerShell Object Notation) with ConvertTo-PSON, but of course, you lose all those useful comments that you’ve added to your PSON file!

Extending the INI/Conf convention

There is a good case for carrying on using specialized configuration files, though it would need to be a dialect that allows sections and comments. It’s a well-tested convention. Why not just carry on using the classic format of INI files? The main reason is that the sections that contain the objects they describe are single-level rather than nested, and they can’t easily depict arrays. Also, multiline string values aren’t consistently supported. Binary data isn’t handled consistently, and there is no concept of a schema that would allow the object to be validated as it is read in. Another problem is that there is no certain way of ensuring that everything that was intended has been read in: the text can still be parsed successfully even if the file, message or string containing it has been truncated.

Extending the traditional INI file

When you are scripting, you have to take whatever methods of use come from the CLI applications you are driving. It would be nice to think that there were standard conventions for passing parameters to CLI applications, or even for running them, but that doesn’t even apply to the ‘help’ parameter. Even the way that you execute them in Bash or PowerShell will vary. You can’t be picky. I need to read the dialects of INI file that allow hierarchies to be created using section nesting, such as those of GLib, Python’s configparser, libconfini and PHP, as well as TOML. I also need to read multi-line strings. All this is in TOML, but not many applications that use TOML files for configuration actually max out on using all the features. Although TOML has many features, even arrays of hashtables, it doesn’t preserve one of the most useful nuggets of information, which is the comments. Another aspect of the old INI system is now missing too, which is writing to and updating values whilst preserving the comments.

There are other features that come from dialects of INI that are generally useful and are occasionally used, such as arrays, error detection in parameters and multiline strings. The reason to use the ConvertFrom-INI cmdlet is to be able to read other applications’ configuration files when my scripts need to know their settings.

You might wonder why I don’t just implement TOML in its entirety. My objective as a scripter, and integrator, is to read as many different dialects of configuration file as possible, and TOML is no longer entirely compatible with the old ‘INI’ file standard, which is my primary target. My purpose is to read as broad a range of .INI files as possible, so features such as strict parsing of value types aren’t of so much value, and Inline Tables just muddy the waters.

With this in mind, I use a cmdlet that has evolved to read in INI files in as liberal a way as possible. The code for the conversion cmdlet is a bit long to publish in this article, and changes as I require extra features, but the latest version is here on GitHub.

ConvertFrom-INI will deal with the most widely used parts of TOML but not yet the entire standard.

Conventional INI file

A simple ini file is represented as a hashtable. This example has two sections, Owner and Database

and when represented in JSON gives this
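
As an illustration, here is a hypothetical INI file with Owner and Database sections, together with a deliberately minimal reader that turns it into a two-level ordered hashtable and then into JSON. This is a sketch of the general technique only, not the ConvertFrom-INI cmdlet, and the sample names and values are assumptions.

```powershell
# A hypothetical conventional INI file with two sections.
$iniText = @'
; last modified by a hypothetical administrator
[Owner]
name=Jane Example
organization=Example Widgets Ltd

[Database]
; use an IP address in case name resolution is not working
server=192.0.2.62
port=143
file="payroll.dat"
'@

$config = [ordered]@{}
$section = $null
foreach ($line in ($iniText -split '\r?\n')) {
    if ($line -match '^\s*[;#]') {
        continue                                   # comment line
    }
    elseif ($line -match '^\s*\[(.+?)\]\s*$') {
        $section = $Matches[1]                     # start of a new section
        $config[$section] = [ordered]@{}
    }
    elseif ($line -match '^\s*([^=]+?)\s*=\s*(.*?)\s*$') {
        $key = $Matches[1]
        $value = $Matches[2].Trim('"')             # drop surrounding double quotes
        if ($null -ne $section) { $config[$section][$key] = $value }
    }
}

$json = $config | ConvertTo-Json
$json
```

Every value in this sketch stays a string, so the port appears in the JSON as "143" rather than as a number.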

Nested Hashtables

The traditional .conf or .ini file translates easily into a single-level hashtable. However, you may want to define hierarchies. There are two alternative ways of doing this: either in the section names or in the keys. In this case, within the [build] section, we use the dotted syntax at the beginning of the section name to denote a subsection.

Which will produce (in a JSON representation)
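
The following sketch shows one way that dotted names can be turned into nested hashtables. It assumes fully qualified dotted section names such as [build.options] as well as dotted keys, and does not handle a leading dot in a section name. The Set-NestedValue helper and the sample settings are hypothetical, and this is not the ConvertFrom-INI implementation.

```powershell
# Hypothetical helper: walk (and create) nested dictionaries along a path.
# It assumes no path segment collides with an existing plain value.
function Set-NestedValue {
    param(
        [System.Collections.IDictionary]$Root,
        [string[]]$Path,
        $Value
    )
    $node = $Root
    for ($i = 0; $i -lt $Path.Count - 1; $i++) {
        $name = $Path[$i]
        if (-not $node.Contains($name)) { $node[$name] = [ordered]@{} }
        $node = $node[$name]
    }
    $node[$Path[-1]] = $Value
}

# Sample file: hierarchy expressed both in a section name and in a key.
$iniText = @'
[build]
target=release
[build.options]
optimize=true
output.dir=bin
'@

$config = [ordered]@{}
$sectionPath = @()
foreach ($line in ($iniText -split '\r?\n')) {
    if ($line -match '^\s*\[(.+?)\]\s*$') {
        $sectionPath = @($Matches[1] -split '\.')
    }
    elseif ($line -match '^\s*([^=;#][^=]*?)\s*=\s*(.*?)\s*$') {
        $value = $Matches[2]
        $keyPath = @($Matches[1] -split '\.')
        Set-NestedValue -Root $config -Path ($sectionPath + $keyPath) -Value $value
    }
}

$config | ConvertTo-Json -Depth 5
```

Treating the section path and the key path as one combined path means that either notation, or a mixture of both, ends up in the same nested structure.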

Arrays as values

Config items are often lists. Here, the value of flyway locations is presented as a delimited list. This is converted into an array of string values.

…will provide a hashtable whose contents will look like this in JSON
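
A small sketch of the conversion, assuming a comma-delimited value and made-up location names (the filesystem: and classpath: prefixes are the ones Flyway uses for locations):

```powershell
# One hypothetical line from a Flyway-style configuration file.
$line = 'flyway.locations=filesystem:migrations, filesystem:callbacks,classpath:db/extra'

# Split once on the first '=' so that the value may itself contain '='.
$key, $rawValue = $line -split '=', 2

# Split the list on commas, trim each item and drop any empty entries.
$values = @($rawValue -split ',' | ForEach-Object { $_.Trim() } | Where-Object { $_ })

$settings = @{}
$settings[$key.Trim()] = $values
$settings | ConvertTo-Json
```

Wrapping the result in @( ) keeps the value as an array even when the list has only one item, so a consumer of the hashtable doesn't have to handle two shapes.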

Conclusions

I can remember when I first discovered Douglas Crockford’s new and wonderful invention, JSON. The format is logical, simple and unbreakable. Its masterstroke is its simplicity, even when describing the data of complex objects. The configuration data of applications can be just as complex, and one yearns for an equivalent standard that is just as simple yet is as versatile. It must be a standard that is actively defended against any unnecessary elaboration.

However, from my worm’s-eye view of the technology, I have to be able, in a script, to read a configuration file easily, deal with things like ‘precedence’ (when default settings are overwritten within the application) and any arcane ways of representing the various settings. I also need to be able to edit configuration files, when necessary, without suffering ‘curly-bracket anxiety syndrome’.

About the author

Phil Factor

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 40 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.
