PSYaml: PowerShell does YAML

PSYaml is a simple PowerShell module that I've written that allows you to serialize PowerShell objects to "YAML Ain't Markup Language" (YAML) documents and deserialize YAML documents to PowerShell objects. It uses Antoine Aubry's excellent YamlDotNet library.

To start, you can simply copy the PowerShell file and the manifest from the module's home on GitHub (PSYaml) into your PowerShell modules directory, or use a script like the one in the next listing.

I do this initial load, or update, via a script like this.
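
Here is a minimal sketch of such a script. It is an illustration rather than the module's own installer: the raw-file URLs, file names and destination folder below are assumptions, so check the PSYaml project on GitHub for the current layout before using it.

# Download (or refresh) the PSYaml module files into a folder on the module path.
# NOTE: the URLs and file names below are illustrative assumptions - check the
# GitHub repository for the actual locations.
$moduleDir = "$HOME\Documents\WindowsPowerShell\Modules\PSYaml"
if (-not (Test-Path $moduleDir)) { New-Item -ItemType Directory -Path $moduleDir | Out-Null }
foreach ($file in 'PSYaml.psm1', 'PSYaml.psd1')
{
    Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Phil-Factor/PSYaml/master/$file" `
                      -OutFile (Join-Path $moduleDir $file)
}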

Beware that you need to be clear about your execution policy before you start, and check the file before you load the module. Once you are ready, you can load the module into your PowerShell session like this.
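
For example, assuming the files ended up in a PSYaml folder on your module path:

# relax the execution policy for this session only, if your current policy blocks the script
Set-ExecutionPolicy RemoteSigned -Scope Process
Import-Module PSYaml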

The first time you do it, you need to be connected to the internet so it can load the latest version of the YamlDotNet library from NuGet.

Once the module is in place and working, you can execute code like this
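
(Here, a few properties of the running processes serve as a convenient hierarchical object to serialize; ConvertTo-YAML is one of the module's exported functions, and I'm assuming it takes the object as its first parameter.)

ConvertTo-YAML (Get-Process | Select-Object -First 3 -Property Name, Id, Handles)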

to give you a YAML representation of the data that is easy to assimilate.

Try the same data with something like Format-Table and you'll probably agree that there is a place for rendering hierarchical information in a human-oriented way.

YAML and PowerShell

When you need to use structured data in PowerShell, you have to think of writing it out – serializing it – and reading it back into an object – deserializing it. You'll hear talk of serializing objects but, really, you're only serializing the data within them, such as properties, lists, collections, dictionaries and so on, rather than the methods. In a compiled language, a serialized object can't do anything for itself once it has been serialized and then deserialized; it is just a container for data. PowerShell is unusual in that it can include scripts in objects, as ScriptMethods and ScriptProperties, so it is theoretically possible to transfer both data and code between PowerShell applications, but that is outside the scope of this article.

You've got some choice in PowerShell about how you serialize objects into structured documents, and back again. The two built-in formats are XML and JSON. I'll be showing you how to use a third: YAML.

Why YAML?

You'd need a good reason for not using XML. It is the obvious format for juggling with data, PowerShell allows you to query it and treat it as an object, and if you use XML Schemas you have a very robust system. The downside of XML is that it is complex and arcane, XML documents can't easily be read or altered by humans, and it can take a long time to process.

JSON is popular because it is so simple that any language can be used to read or write it. The downside is that it doesn't do much and has a restricted range of datatypes: you can't actually specify the data type of a value, for example. Nor is it an intuitive way of laying data out on the page. YAML is a formalization of the way that we used to lay out taxonomies and other forms of structured data before computers; it is easy to understand, and when you start doing bulleted lists within lists, it starts to look like YAML. As far as readability goes, here is a YAML document:
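
# a small invoice, invented for illustration
invoice: 34843
date: 2016-01-14
bill-to:
  given: Chris
  family: Dumars
items:
  - part_no: A4786
    description: Water bucket (filled)
    price: 1.47
    quantity: 4
  - part_no: E1628
    description: High-heeled shoes
    price: 133.70
    quantity: 1
comments: Late delivery - leave at the back door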

And here is the same in JSON.
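
{
  "invoice": 34843,
  "date": "2016-01-14",
  "bill-to": { "given": "Chris", "family": "Dumars" },
  "items": [
    { "part_no": "A4786", "description": "Water bucket (filled)", "price": 1.47, "quantity": 4 },
    { "part_no": "E1628", "description": "High-heeled shoes", "price": 133.70, "quantity": 1 }
  ],
  "comments": "Late delivery - leave at the back door"
}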

I haven’t really the space in this article for the XML version.

YAML is now officially a superset of JSON, so a YAML serializer can usually be persuaded to use the JSON 'brackety' style if you prefer, or require, that. The PSYaml module has a function just to convert from the indented dialect of YAML to the 'brackety' dialect, aka JSON. Beware that not everything in YAML will convert to JSON, so it is possible to get errors as a consequence.
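
Assuming the converter is exported as Convert-YAMLtoJSON (run Get-Command -Module PSYaml to check the exact name in your copy), you could feed it a fragment of the invoice document from earlier…

$YamlInvoice = @'
invoice: 34843
bill-to:
  given: Chris
  family: Dumars
'@
Convert-YAMLtoJSON $YamlInvoice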

…which will give the same document back in JSON form.

YAML also allows you to specify the data type of its values explicitly. If you wish to ensure that a datatype is read correctly, and Mr and Mrs Null will agree with me on this, you can precede the value with !!float, !!int, !!null, !!timestamp, !!bool, !!binary, !!yaml or !!str. These are the most common YAML datatypes that you are likely to come across, and any deserializer must cope with them. YAML also allows you to specify a data type that is specific to a particular language or framework, such as geographic coordinates. YAML also has references, which refer to an existing element in the same document: if an element is repeated later in a YAML document, you can simply refer to it using a short-hand name.
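
For example, an invented fragment that forces the reading of a couple of values:

price: !!float 3       # read as the float 3.0 rather than the integer 3
quantity: !!str 24     # read as the string "24" rather than a number
delivered: !!bool true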

Another advantage to YAML is that you can specify the type of set or sequence, and whether it is ordered or unordered. It is much more attuned to the rich variety of data that is around.

I use YAML a great deal for documentation and for configuration settings. I started off by using PowerYAML, which is a thin layer around YamlDotNet. Unfortunately, although YamlDotNet is excellent, PowerYAML hadn't implemented any serializer, hadn't implemented data type tags, and couldn't even auto-detect the data type. As it wasn't being actively maintained, and was incompatible with the current version of the YamlDotNet library that was doing all the heavy work, I wrote my own module using YamlDotNet directly. Note that there is a very good JavaScript YAML library that works in Node.js, but this isn't going to be any help to us in PowerShell. PSYaml gives you a lot more.

You merely load the module:
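
# assuming the PSYaml folder is somewhere on $env:PSModulePath
Import-Module PSYaml
Get-Command -Module PSYaml     # lists the exported functions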

and you will have a number of functions that you will require.

You don't really need a special module, of course. Using YamlDotNet directly isn't a big deal if you don't want to bother with PSYaml: you just need to import a single library. To get hold of the latest version of YamlDotNet, you should get it from NuGet. You'd get hold of NuGet.exe and run something like this:
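
# download the latest YamlDotNet package into .\packages (the output folder is just an example)
.\nuget.exe install YamlDotNet -OutputDirectory .\packages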

Don’t worry about this unless you would like to work directly with YamlDotNet for special purposes. In my module, I have a function that does all this for you and allows you to keep up-to-date with the latest version of YamlDotNet.

In our simple PowerShell script we load this library
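
The path depends on where NuGet unpacked the package and which framework build you choose, so treat the version and folder here as placeholders:

# load the YamlDotNet assembly; adjust the version and framework folder to match your machine
Add-Type -Path '.\packages\YamlDotNet.3.8.0\lib\net35\YamlDotNet.dll'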

And we can then create some simple functions
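
Here is a minimal sketch of the two core functions, simplified versions of what PSYaml provides rather than its actual code. It uses YamlDotNet's Serializer and Deserializer classes; the exact constructors and method signatures have changed between YamlDotNet releases (recent versions use SerializerBuilder and DeserializerBuilder), so adjust to suit the version you loaded.

function ConvertTo-YAML
{
    # serialize a PowerShell object (hashtables, arrays, simple values) into a YAML string
    param($InputObject)
    $writer = New-Object System.IO.StringWriter
    $serializer = New-Object YamlDotNet.Serialization.Serializer
    $serializer.Serialize($writer, $InputObject)
    $writer.ToString()
}

function ConvertFrom-YAML
{
    # deserialize a YAML string into .NET objects (dictionaries, lists and strings)
    param([string]$YamlString)
    $reader = New-Object System.IO.StringReader($YamlString)
    $deserializer = New-Object YamlDotNet.Serialization.Deserializer
    $deserializer.Deserialize($reader)
}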

This will give us the basics. Naturally, there is a lot more we can, and will, do; but this will get you started. Of course, this is all done for you in PSYaml and you can access these very functions.

Now we just want a simple YAML string to test out the plumbing.
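
Something like this will do; it is the invented invoice from earlier, held in a here-string:

$YamlString = @'
invoice: 34843
date: 2016-01-14
bill-to:
  given: Chris
  family: Dumars
items:
  - part_no: A4786
    description: Water bucket (filled)
    price: 1.47
    quantity: 4
comments: Late delivery - leave at the back door
'@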

So let's create a PowerShell object from it, and convince ourselves that it has been read in correctly by taking the object produced, accessing properties from it and then outputting the whole thing as JSON.
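
Using the sketch functions defined above:

$invoice = ConvertFrom-YAML $YamlString

# pick a couple of values out of the object it produced
$invoice['invoice']
$invoice['bill-to']['given']

# and round-trip the whole structure into JSON to inspect it
ConvertTo-Json -InputObject $invoice -Depth 5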

You should get the simple invoice back again. Job done? Well, possibly, but if you need to process the results in PowerShell, you may still hit problems. You’d expect, from using ConvertFrom-JSON, that this would work:
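
$invoice = ConvertFrom-YAML $YamlString
$invoice.invoice             # you'd expect 34843
$invoice.'bill-to'.given     # ...and 'Chris'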

But it doesn't. What is also bad is that, in the PowerShell ISE, you don't get the IntelliSense prompt for the object either. You want the equivalent of this to happen with YAML:
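
# the JSON equivalent: ConvertFrom-Json gives you a PSCustomObject, so dot notation just works
$fromJson = '{"invoice":34843,"bill-to":{"given":"Chris","family":"Dumars"}}' | ConvertFrom-Json
$fromJson.invoice
$fromJson.'bill-to'.given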

…and whatever else in terms of accessing the data via dot notation that you care to try. The problem is that the YAML deserializer creates .NET objects, which is entirely correct and useful, but it is just more convenient to have PowerShell objects so that they are full participants.

Refining the Deserializing process

Generally speaking, a good library for parsing and emitting data documents does so in two phases. The main work on a string containing XML, YAML, CSV or JSON is to create a representational model. The second phase is to turn that representational model into real data structures that are native to your computer language.

In the case of YAML, you can have several separate documents in a single YAML string, so the parser will return a representational model for every data document within the file. Each representational model consists of a number of 'nodes', and all you need to do is to examine each node recursively to create a data object. Each node contains the basics: the style, tag and anchor. The mapping-style of a node is the way it is formatted in the document; the anchor is used where a node references another node to get its value; and a tag tells you explicitly what sort of data type it holds. This will be 'omap', 'seq' or 'map' where the node contains an ordered mapping, a sequence or a dictionary, or 'float', 'int', 'null', 'bool' or 'str' if it has a simple value. You can also specify your own special data types, such as coordinates, table data or whatever you wish.

A typical YAML library will parse the presentation stream and compose the representation graph. The final input process is to construct the native data structures from that YAML representation. The advantage of this is that you can then specify how your special data types are treated in the conversion process. Because YAML is a superset of JSON, you still have to allow untyped values, which then have to be checked to see what sort of data they contain.

Here is a routine that takes a representational model as a parameter and converts it into a PowerShell object. It is easy to check this by converting the resulting object to XML, JSON or even YAML.
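
The sketch below gives the flavour of such a routine; it is a simplified version rather than the module's code verbatim. It assumes the YamlDotNet representation-model classes (YamlMappingNode, YamlSequenceNode, YamlScalarNode) and only handles the common scalar tags:

function ConvertFrom-YAMLNode
{
    # recursively turn a node of the YAML representation model into PowerShell data
    param($Node)

    switch ($Node.GetType().Name)
    {
        'YamlMappingNode'
        {
            # a mapping becomes a PSCustomObject, so dot notation works on the result
            $properties = [ordered]@{}
            foreach ($key in $Node.Children.Keys)
            {
                $properties[$key.Value] = ConvertFrom-YAMLNode $Node.Children[$key]
            }
            [pscustomobject]$properties
        }
        'YamlSequenceNode'
        {
            # a sequence becomes an array
            @($Node.Children | ForEach-Object { ConvertFrom-YAMLNode $_ })
        }
        'YamlScalarNode'
        {
            $value = $Node.Value
            switch -Wildcard ("$($Node.Tag)")
            {
                '*:int'       { [long]$value; break }
                '*:float'     { [double]$value; break }
                '*:bool'      { [System.Convert]::ToBoolean($value); break }
                '*:null'      { $null; break }
                '*:timestamp' { [datetime]$value; break }
                default
                {
                    # no explicit tag, so make a rough guess at the type
                    if     ($value -match '^-?\d+$')       { [long]$value }
                    elseif ($value -match '^-?\d*\.\d+$')  { [double]$value }
                    elseif ($value -in 'true', 'false')    { [System.Convert]::ToBoolean($value) }
                    else                                   { $value }
                }
            }
        }
    }
}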

In order to use this, all you need to do is to load the text of the YAML document into a YAML stream.
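
Reusing the here-string and the conversion function from earlier, that looks something like this:

# parse the YAML text into its representation model, then convert the root node of
# the first document (a YAML stream can contain several documents)
$yamlStream = New-Object YamlDotNet.RepresentationModel.YamlStream
$yamlStream.Load([System.IO.StringReader]$YamlString)

$invoice = ConvertFrom-YAMLNode $yamlStream.Documents[0].RootNode
$invoice.'bill-to'.given     # dot notation now behaves as you'd expect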

So there you have it. We now wrap this last code in a function and we have a PowerShell module that we can use whenever we need to parse YAML. I won’t bother to list that here as I’ve put it on GitHub for you.

I also have added ConvertTo-YAML, because this is handy if you need plenty of control over the way that your PowerShell objects are serialized. Some of these objects are very unwieldy, with a lot of irrelevant information, and if you try serializing them without any sort of filtering, you will accidentally contribute to the Big Data crisis.

Last but most important, I wanted a way of loading a third-party .NET library into a module from NuGet. I therefore added a function that loads the library using Add-Type, but which first checks to make sure that everything is there, and fetches it to the right place if it isn't. You can call it explicitly to check that you have the latest version of YamlDotNet. If an update breaks something, you just delete the directory that the new version was put in: the module always loads the latest version in the YamlDotNet directory that it can find.

Simple Example of use

Here is a way of producing a YAML result from any SQL expression on a database
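
The sketch below uses the System.Data.SqlClient classes; the server, database and query are only examples, and it assumes the PSYaml module (for ConvertTo-YAML) is already loaded:

# run a query and serialize the rows to YAML
$connectionString = 'Server=(local);Database=AdventureWorks;Integrated Security=SSPI'
$query = 'SELECT TOP 3 FirstName, LastName, ModifiedDate FROM Person.Person'

$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($query, $connectionString)
$table = New-Object System.Data.DataTable
$adapter.Fill($table) | Out-Null

# turn each DataRow into an ordered hashtable so that only the column values are serialized
$rows = foreach ($row in $table.Rows)
{
    $item = [ordered]@{}
    foreach ($column in $table.Columns) { $item[$column.ColumnName] = $row[$column] }
    $item
}

ConvertTo-YAML $rows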

This will give the result as a YAML document, with a mapping for each of the three rows.

So what is the point of all this?

Besides the fact that it is an intuitive way of representing data, one of the most important advantages of YAML over JSON is that YAML allows you to specify your data types. You don't need to, but it can resolve ambiguity. I've implemented the standard YAML scalar tags of timestamp, binary, str, bool, float, int and null. If there is no scalar tag, I auto-detect the value to try to coerce it to the right data type.

YAML also has a rather crude way of allowing you to represent relational data by means of node anchors. These anchors have an '&' prefix. An alias node, prefixed with '*', can then be used to indicate additional inclusions of the anchored node. It means that you don't have to repeat nodes in a document: you just write a node once and then refer to it by its anchor.
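
A small, invented example: the address is written once, anchored as &address, and then reused via the alias *address:

billing:
  name: Chris Dumars
  address: &address
    street: 458 Walkman Dr
    city: Royal Oak
shipping:
  name: Chris Dumars
  address: *address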

I find YAML to be very useful. What really convinced me of the power of YAML was being able to walk the representational model to do special-purpose jobs such as processing hierarchical data to load into SQL. It was at that point that I finally decided that YAML had a lot going for it as a data document format.