Representing Hierarchical Data for Mere Mortals

Why is it that we use XML, but with so little enthusiasm when it does so much, and is so feature-rich? Phil Factor argues that there are better ways of doing it, more complete than JSON, but easier to read than XML. To try to convince you, he gives a set of flying demos, using PowerShell and his PSYaml module, to illustrate how YAML can let you work faster, and more accurately.

DBAs are often faced with having to create, maintain or inspect documents that contain hierarchical data and information. Sometimes, hierarchical information has even been shoehorned into a table or two, but at other times it happens for no better reason than the culture behind SMO and SSMS. SMO is resolute in recording SQL Server’s information an object-oriented hierarchical way, even though it is, within any relational database, entirely based on relational tabular data and Metadata. Codd insists on it.

The grey-muzzle DBA will, of course, have secure tricks for saving the more familiar tabular information, all the way from CSV to SQLite. Most of us, however, scratch our heads when faced with XML or JSON. The former is mark-up, tricky to inspect and over-ambitious in scope; the second is a subset of JavaScript, and inadequate for specifying data types in sufficient detail. My well-thumbed copy of ‘XML in a Nutshell’ runs to 680 pages. YAML is more intuitive to use, though: Although it is a superset of JSON, you can jettison those brackets and use an intuitive style that has been used for representing hierarchical information since before computers. A good YAML Deserializer will deal with JSON as well as YAML and output JSON dialect if you choose.

Yes, YAML is equally readable and updateable by either machines or humans. YAML has been slow to gain traction because it has never had a human-oriented textbook or adequate training materials. The extremely clever people who devised it weren’t natural communicators.

In celebration of the introduction of PSYaml, I want to demonstrate how you’d use YAML with PowerShell for recording server information or recording results that are easy to read into applications. (.NET developers will find it even easier because they can use  Antoine Aubry’s excellent YamlDotNet library directly). We’ll start simply.

Following Along

To follow along, you’ll need my PSYaml module. I’ve tried to make it self-installing and you’ll see it described here.

Getting Query Results from a Server in a Hierarchical Way

Simple queries with metadata

Let’s start with a simple PowerShell function that executes a query that returns a result. It not only records the data in a way that makes it easy to inspect, but also records when and where you executed the query and lists the table columns you’d need to store the data.

The indentation represents the nesting level. The number of spaces used at any level isn’t significant as long as the left margin lines up. The ‘-‘ sign within the left margin means that your data is a list item. Most collections of data are list items or key:value pairs. The latter are represented by the key being followed by a colon’:’ followed by a value.

You’ll notice that we are mixing up lists and ordered hash tables. In YAML, you can represent collections of data as:

  • maps, which are unordered set of key: value pairs without duplicates.
  • omaps (ordered maps) that are ordered sequences of key: value pairs without duplicates.
  • pairs: which are ordered sequence of key: value pairs allowing duplicates.
  • sets which are unordered set of non-equal values.
  • seqs (Sequences) consisting of a sequence of arbitrary values.

Generally, our data consists of structures of omaps and seqs. JSON has unordered sets of name/value pairs without duplicates (maps) and arrays, which are ordered collections of values, equivalent to a seq. The YAML documentation and reference card will show how to represent the other types.

We’re giving you the SQL column names at the top of the list and then providing each row as an ordered list. This is a compact way of storing longer results.

One problem with this rendering is that it is done in a style that is economical on space. Because I know the column headings, I’m doing each row as a simple list, as if it were CSV. You might something that is easier to read.

If I do this

Then I get each row looking like this:

Whichever way you do it, you’ll notice in the full listing that the columns in the ‘schema’ section at the top get a datatype as well as the column name. Here is the extract.

That suits me fine. With any structured document you can add all sorts of detail that are impossible with a tabular representation such as CSV. We tend to get around this with comments or descriptive filenames when there is no other way, but that boxes you in when you are doing automation work, so it is definitely a bonus to have this feature.

YAML isn’t alone in giving you this luxury. You can, of course use JSON or XML, Python or PowerShell. Here we’ll show JSON. Here is just one record.

This can, incidentally, be deserialized successfully with ConvertFrom-YAML, so you get the choice of either style.

Queries that return XML columns

You’ll notice that .NET’s provider split your SpatialLocation up into its components. With ConvertTo-YAML, we can do the same for XML columns too if you want.

Yes, you don’t see the XML, but a proper hierarchical rendering of the data. This is because the YAML converter is set to unwrap XMLDocuments as if they were objects. You can change that behaviour, of course if you wish to preserve the XML document. (I only show a bit of this YAML document)

If you are still wondering why anyone would find this useful, try running the same queries in SSMS just to see how informative the result pane is in comparison.

So, what does the query plan look like in YAML? Well, it isn’t exactly pretty, but it is a lot easier to scan by eye.

Just a sample here, of course. You’ll have to try it out to see more.

Here is the PowerShell code to get the top five slow-running queries from which I took that excerpt, and produce a YAML document from them, including the SQL and the execution plan.

Normally, you’d want to do a xpath query to just get the nodes you really want because there is a lot of clutter in an XML query plan.

You can, of course, easily create an XML hierarchical result that was created with the FOR XML PATH syntax. The only stricture is that you must supply a ROOT. You must also coerce it into an XML datatype as it is otherwise sent as ntext and can’t easily be recognised as XML.

Which gives.. (just an extract again )

The function that gave these results, Get-QueryResults, is this

Using SMO

Saving Endpoint Configuration Settings

Let’s now go to a slightly different task, recording configuration data. Endpoints, for example, can cause a problem, especially if you have a fairly subtle setting to outwit intruders. You need to record your settings in your CMS so you can reconstitute them on another server. We’ll start by getting hold of SMOs Server object, also known as
Microsoft.SqlServer.Management.SMO.server

Then we’ll get the endpoint information, but also use the opportunity to add the essential details about the server and the date the record was made. No need to worry about what extra information you record in the YAML document. If it is useful, add it because it doesn’t affect the ease with which we can import the settings as an object if we wish to transfer these settings to a different server.

This will give you something like this (I’ve only included only a couple of endpoints)

I won’t go into the nuts and bolts of how you save this to a file. If you’re unsure, I’ll show how in the next illustration.

Without sliding too far off-topic, you can store this endpoint configuration in source control and use it in a PowerShell script for building your server to host a database application. After all, endpoint settings are as much part of a database as any other aspect of access control.

Saving and Comparing Configuration Settings

Perhaps you just want to have your basic configuration settings written out in your CMS. (we will be using the same server object $srv in this code)

This will give you a simple list of your configuration data that can subsequently be saved or read from disk into a PowerShell object for analysis.

What use is this? Well, in this case, I’ve saved the information in a temporary location and I’ve saved it for a few days. I’ve then changed a setting. Can I find when I changed it? Well, yes.

Or you can let your source control system do the work for you, in which you’d definitely want to remove the date!

Now you have these on file you can access single settings from PowerShell. These get the setting for the cost threshold for parallelism for 27th September and checks the date on the file!

Getting the Quantities of each Object in Each Database.

So what about something a bit more spectacular? A stock-take of your database objects? Here is some simple code for getting the number of each type of database object from all the databases on a server. Once again, we’ll leave out the obvious details of creating and connecting the SMO server object and cut to the essential details.

So, on a nice clean instance, we get

We can, of course read this YAML into PowerShell and find out, for example, the total number of database objects, and the maximum number in any database

..to give…

Conclusion

As far as I’m aware, YAML is the only format or language of data documents that is designed for both machines and humans. It is for XML what Markdown is for HTML. As data professionals, we can no longer avoid representing hierarchical information, and YAML is a good compromise. Already PostgreSQL is using YAML for EXPLAIN plans. It is great for headers of function and procedure scripts. You can pack a lot of information into an extended property as well!

References

  1. PSYaml: PowerShell does YAML
  2. Converting XML files to YAML or PSON
  3. Getting Data Into and Out of PowerShell Objects
  4. JSON and other data serialization languages
  5. Documenting your SQL Server Database