Getting Data Into and Out of PowerShell Objects

You can execute PowerShell code that creates the data of an object, but there is no cmdlet to generate the ‘object notation’ code from an existing PowerShell object; until now, that is. Phil Factor presents a ConvertTo-PSON function, along with a ConvertTo-YAML function, and explains how they both work, with illustrative code.

When you need to pass the data of an object, in all its hierarchical intricacy, to another process, or save it to storage, there was, until recently, no alternative for ‘serialization’ other than XML or ASN.1 (Abstract Syntax Notation One). However, every computer language has a way of describing and representing data structures. PowerShell has its own terse but powerful style of object notation, richer than JSON and easier to comprehend than XML. Why, I thought, was there no ConvertTo-PSON (PowerShell Object Notation) to give us the means to get the PowerShell script for the data from objects, in a format capable of recreating the object, data-wise, just like JSON.stringify()? Why, come to think of it, is there no ConvertTo-YAML to make it easier to inspect this data as if it were a hierarchical list?

‘Why not?’, I muttered, as I strode to the keyboard.

Object Notation

Any data object, in the same way as  a database table,  is pretty useless without the means to easily get the data into or out of it. To transfer object data across a network, or to save it in a database or file, it has to be ‘serialized’ into a representation of its object hierarchy, generally in XML.  When it is rehydrated, or ‘de-serialized’, the reverse process recreates the object hierarchy. In PowerShell, this sometimes happens under the covers when accessing remote object data.

We are most familiar with JSON as an object notation because it is increasingly used for data exchange and storage, but it started life as almost 100% standard JavaScript. It was valuable because it was so easy to ‘de-serialize’: one could, if one was feeling reckless, merely execute it as JavaScript code. You can produce JSON in JavaScript with the JSON.stringify() function. JavaScript isn’t the only language where one can pull this trick: every .NET language has its own way of representing the data within objects for the purposes of construction or persistence. PowerShell is no exception. Instead of the JSON array notation ‘[‘ and ‘]’, you have ‘@(‘ and ‘)’, and the ‘{‘ ‘}’ blocks need an ‘@’ in front, ‘@{‘ … ‘}’, to make a hash table. The colon ‘:’ becomes the assignment operator ‘=’. (<Pedantry>PowerShell officially doesn’t have an array notation; the @( … ) is an array sub-expression.</Pedantry>)

I used the term ‘reckless’ to describe the old habit of executing object notations. JSON and PSON (and to a lesser extent YAML) have a fundamental security weakness: although a JSON document could simply be executed in order to create the JavaScript object, any embedded JavaScript would be executed along with it. This would be an opportunity for a malicious hacker to get code executed. The same is true of PSON: you can just execute it with Invoke-Expression, and it is just too easy to slip in malicious PowerShell code. With JavaScript, they closed the exploit by adding JSON.parse(), which is now in the ECMA-262 standard. I know of no way of doing this with PSON, so I put in a rabbit-proof fence of regexes before doing anything like this.
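As a sketch of the risk (the strings here are contrived examples of my own, not from any real exploit):

```powershell
# A benign PSON string: executing it merely rehydrates the data
$pson = "@{ name = 'Phil'; age = 52 }"
$object = Invoke-Expression $pson
$object.name     # Phil

# But Invoke-Expression will run anything, so a doctored document
# can smuggle in arbitrary code via a sub-expression
$malicious = '@{ name = $(Get-Process | Out-Null; "gotcha"); age = 52 }'
# Invoke-Expression $malicious   # don't: the sub-expression executes
```

Anything inside a `$( … )` sub-expression runs with the full rights of the session, which is why executing untrusted PSON is no safer than running an untrusted script.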

PowerShell’s PSON (PowerShell Object Notation) is more comprehensive than JSON in that it allows you to specify the datatype, and allows many more datatypes. YAML, by contrast, was designed to be as easy as possible for humans to read, but this makes it hard to create a parser for it.

So, the classic JSON example
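For illustration, take a small JSON document along these lines (a sample of my own devising):

```json
{
    "name": "Phil",
    "age": 52,
    "skills": ["SQL", "PowerShell"],
    "address": {
        "town": "Cambridge",
        "country": "UK"
    }
}
```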

 … becomes the similar-looking PowerShell equivalent …
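In PSON, the same sort of data would look like this (again, a sample of my own devising):

```powershell
@{
    name    = 'Phil'
    age     = 52
    skills  = @('SQL', 'PowerShell')
    address = @{
        town    = 'Cambridge'
        country = 'UK'
    }
}
```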

…as you will see if you then pass it through ConvertTo-JSON -depth 4 to get...

In YAML, this becomes
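Rendered as YAML, the same sample data would read:

```yaml
name: Phil
age: 52
skills:
  - SQL
  - PowerShell
address:
  town: Cambridge
  country: UK
```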

There are other good reasons for using YAML as well. I use it for embedding information in routines and procedures; it is excellent because it can be read easily and updated automatically. It is great for document headers for the same reason. YAML is just so close to existing conventions for writing structured information that it has many uses. PowerShell shouldn’t be without it. To de-serialize YAML, I use YAML.NET.

Taking it for a spin.

Let’s just see what a SQL Server table looks like in YAML. We’ll just grab a table in PowerShell and examine a few rows.
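A minimal sketch of the sort of script involved (the server name is a placeholder, Invoke-Sqlcmd is assumed to be available, and ConvertTo-YAML is the function developed later in this article):

```powershell
# Grab a row from AdventureWorks and render it as YAML.
# Assumes the SQL Server module for Invoke-Sqlcmd, and the
# ConvertTo-YAML function defined later in the article.
$rows = Invoke-Sqlcmd -ServerInstance 'MyServer' -Database 'AdventureWorks' `
          -Query 'SELECT TOP 1 * FROM Person.Contact'
$rows | ConvertTo-YAML
```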

Here is just one row from AdventureWorks contact table.

…and here are the first five rows from the production.Location table in PowerShell (PSON) instead.

Here is a bit of RSS feed rendered the same way. (the PowerShell script is in the  header of the article).

Creating a ConvertTo Cmdlet.

There was once only one object-notation conversion available in PowerShell, and that was to XML using ConvertTo-XML. You can access XML directly using dot-notation. To create JSON output from a PowerShell object, you can now use the built-in ConvertTo-JSON, and there is a symmetrical ConvertFrom-JSON. There is no equivalent ConvertTo-PSON or ConvertTo-YAML, but it is comparatively easy to create them. Theoretically, you don’t need a ConvertFrom-PSON because you can just execute the string as a script block, though in reality it would be wise to have one, to close the security loophole. You can convert from YAML with YAML.NET.

Here is a quick demo that shows a PowerShell object being converted into a string, which is then executed and finally turned into JSON to check that nothing got lost!
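A minimal version of such a demo might look like this (ConvertTo-PSON being the function developed later in the article; its parameter style here is an assumption):

```powershell
# An object to round-trip
$original = @{
    name   = 'Phil'
    skills = @('SQL', 'PowerShell')
}
# Serialize the object to a PSON string
$pson = ConvertTo-PSON $original
# Execute the string to rehydrate the object (reckless with untrusted input!)
$copy = Invoke-Expression $pson
# Compare the two via JSON to check that nothing got lost
(ConvertTo-Json $original) -eq (ConvertTo-Json $copy)
```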

Since the built-in ConvertTo-XML and ConvertTo-JSON both do a lot of work to produce their object-notation output, it would seem, at first glance, wisest to just use these, with some clever regex work to subsequently translate the result into PowerShell object notation. However, although you can get quick results this way, it starts to get complicated when you try to refine it: you are limited in what you can do with PowerShell data types such as script blocks and XML. Likewise, you can implement a ConvertTo-YAML very quickly using YamlNet or SharpYAML, but YAML can be represented in a number of ways (it is effectively a superset of JSON, so you could even use ConvertTo-JSON!) and it is likely you’ll want more control. (I use Scott Muc’s PowerYaml.)

In creating these Cmdlets, I chose, instead, to use a recursive routine that could keep a count of the recursion level for formatting purposes, and which  iterated through the arrays and hash tables using ForEach.   This gives a lot of freedom in choosing how the various types of data are displayed.  You can use a strict ‘canonical’ form that specifies the datatype to avoid ambiguity, or use a looser, more readable form. You can indent the code as you wish. Just alter the function to taste.

The only real difficulty I hit was in dealing with typical PowerShell objects such as ProcessThreadCollections, or SMO database classes. It is easy to get this information, held as properties, but there are huge numbers of them, and there is little consistency in the way the various cmdlets deal with this.

Just to get a flavour of the enormous amount of data in the first contained objects of a System.Diagnostics.Process object, try this…
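One way of doing it (my own version of such a probe):

```powershell
# Take the current process and see how much data lurks in just
# its first contained object, a ProcessThread
$process = Get-Process -Id $PID
# Count the properties exposed by the first thread...
($process.Threads[0] | Get-Member -MemberType Property).Count
# ...and then watch them all scroll past
$process.Threads[0] | Format-List *
```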

Even if you could get them all, it is most unlikely that you’d need them, since many are obsolete. I ended up reckoning that, for ConvertTo-PSON and ConvertTo-YAML, it is better to specify the properties you’re interested in, using the Select-Object cmdlet.

Just to show you that I’m no wimp, I settled on an algorithm that prevented any further recursion on a property of a complex object, and merely told you what the value was if it is a simple leaf value, or else the type of object it is. This is what ConvertTo-XML does. This still makes the lights dim when it hits something like an SMO Database or Server object. It is much better to tell it what you want via Select-Object.
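For example (ConvertTo-YAML being the function developed below; the choice of properties here is just an illustration):

```powershell
# Cut a Process object down to the properties you actually want
# before handing it to the serialiser
Get-Process |
    Select-Object -First 3 -Property Name, Id, WorkingSet64 |
    ConvertTo-YAML
```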

You can render the object represented by the XML, or you can treat it just as an InnerXML string. The former method allows us to convert XML files directly into YAML or PSON. Useful? Sure. It is easier to demo than to explain. First, rendering it as an object…

…giving …

…and the other way, simply giving it as an XML fragment (note the parameter if you want that behaviour), is…

… giving …

So, here is the basic function to produce YAML from a PowerShell object, ranging from an integer to a complex object.
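As a flavour of the recursive approach, here is a heavily cut-down sketch of such a function (my own simplification, handling only the basic cases; the real thing deals with many more datatypes and formatting choices):

```powershell
function ConvertTo-YAML
{
    # A cut-down sketch: recursively renders hashtables, arrays and
    # simple values as YAML lines, and reports anything more complex
    # as a leaf giving only its type name
    [CmdletBinding()]
    param (
        [Parameter(ValueFromPipeline = $true)] $InputObject,
        [int] $NestingLevel = 0
    )
    process
    {
        $padding = ' ' * (2 * $NestingLevel)
        if ($null -eq $InputObject) { "$padding~" }
        elseif ($InputObject -is [string]) {
            # quote strings, doubling any embedded single quotes
            "$padding'$($InputObject -replace "'","''")'"
        }
        elseif ($InputObject -is [bool]) {
            "$padding$(([string]$InputObject).ToLower())"
        }
        elseif ($InputObject -is [ValueType]) { "$padding$InputObject" }
        elseif ($InputObject -is [hashtable]) {
            foreach ($key in $InputObject.Keys) {
                "$padding${key}:"
                ConvertTo-YAML -InputObject $InputObject[$key] -NestingLevel ($NestingLevel + 1)
            }
        }
        elseif ($InputObject -is [System.Collections.IEnumerable]) {
            foreach ($item in $InputObject) {
                $rendered = @(ConvertTo-YAML -InputObject $item -NestingLevel ($NestingLevel + 1))
                # put the '- ' list marker on the first rendered line
                $rendered[0] = $padding + '- ' + $rendered[0].TrimStart()
                $rendered
            }
        }
        else {
            # a complex object: don't shred it, just name its type
            "$padding[$($InputObject.GetType().Name)]"
        }
    }
}
```

Something like `ConvertTo-YAML @{ name = 'Phil'; skills = @('SQL','PowerShell') }` then emits the YAML one line at a time; `-join` the result with newlines if you need a single string.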

How it works.

ConvertTo-YAML and ConvertTo-PSON are almost identical. There are just too many differences to make it worth trying to fold them into one function, but they should give you a start on how to write a ConvertTo-CSharp or a ConvertTo-VB.

This is a recursive function that just returns the representation of whatever object is passed to it. If it is a simple data value, then it is rendered according to the rules for the notation. The only difficulties come with strings, XML, bitmaps and so on. If it is a complex object, then there are a number of ways it can be shredded to get at the individual elements. Where the object is a hashtable or array, one merely needs to pass it through a pipeline that explores and returns the branch. There could be all sorts of nested arrays and hashtables in there. Different types of objects require different treatments, but the various alternative pipelines are somewhat similar.

The function is written as an advanced PowerShell function so it can be used in a pipeline as well. Where possible, I’ve made it work in a similar way to ConvertTo-JSON. The only difference is in the handling of complex objects, where ConvertTo-JSON and ConvertTo-XML rely on the -Depth parameter to stop you from returning huge, elaborate structures. I preferred to handle them by stopping at the first level at which the function finds properties in complex objects (Depth=1). I’m uneasy about .NET properties and the way they’re abused. Properties are generally supposed to expose the internal state of an object, which is what we want; functions, which we avoid, return a query or modify the state. Unfortunately, in .NET, there are plenty of functions disguised as properties.

Different flavors of Object Notations.

Because of the different requirements of any object notation, it is difficult to package up a universal solution without a bewildering number of  switch-parameters to specify what you want. It could be better to roll your own function based on a working ‘core’. One could, of course, go entirely the other way and produce a function that will serialize to any object notation and positively bristles with options. I’d rather not.

Formatting

There are many different ways of formatting JSON. The same is true of PSON, because the indenting is pure decoration. You can leave out whitespace entirely outside string literals to save space, or have formatting optimized to make it easy on the eye. Few will agree on the best way of doing this. YAML is different, because indenting is used, Pythonesque, instead of bracketing to indicate nesting. YAML has its own creative outlets, since JSON is effectively a subset of YAML: lists can be bracketed and comma-delimited, and long strings can be represented in a number of different ways, depending on how easily you’d like the YAML to be read. In fact, YAML allows so many different choices that the creation of a YAML deserialiser is extraordinarily difficult. Yes, there are probably too many options and preferences to make a simple built-in serialiser entirely useful.

Datatype definition

As well as formatting, both PowerShell and YAML allow a great deal of latitude in the way that a datatype is specified. In both cases, it can be left implied. It is reasonably easy to tell the difference between an integer and a decimal number, and one can represent a float in a way that makes it obvious. Other datatypes are less obvious and need to be specified. YAML has a canonical form that is more exact but less readable. PSON can, likewise, be liberally sprinkled with type accelerators in square brackets to specify the datatype unambiguously.
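In PSON, that sprinkling looks something like this (my own example values):

```powershell
@{
    # type accelerators make each datatype unambiguous
    When  = [datetime] '2012-06-20'
    Price = [decimal] 24.99
    Big   = [int64] 12345678901
    Id    = [guid] 'f5b1c2d0-0000-0000-0000-000000000001'
}
```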

Object-Depth and hierarchical-depth

Even though .NET objects can contain simple hierarchies consisting of simple values, arrays and hashtables, they can also have contained objects, and so act more like graphs or networks. If you explore them in too much depth you can end up in all sorts of difficulties, particularly with loops, and simple properties can be disguised as functions. The built-in ConvertTo-XML cmdlet arbitrarily limits the depth of contained objects to a set figure, which prevents infinite looping. The default value is 1; anything more than that produces a cacophony of useless information. Worse, one can get into an endless circular reference. ConvertTo-XML has a proper understanding of what a contained object is (data structures like hashtables and arrays aren’t counted as contained objects).

ConvertTo-JSON, by contrast, has an entirely different concept of what a contained object is. It seems to equate hierarchical nesting level with depth, and produces an entirely different result.
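A quick way of seeing the difference (the nested object here is my own test case):

```powershell
# A four-level nest of plain hashtables
$nested = @{ a = @{ b = @{ c = @{ d = 'deep' } } } }

# ConvertTo-Json equates depth with nesting level: levels beyond
# -Depth are flattened to strings
$nested | ConvertTo-Json -Depth 2

# ConvertTo-Xml counts contained objects rather than plain
# hashtable/array nesting, so the same -Depth gives a different result
($nested | ConvertTo-Xml -Depth 2).OuterXml
```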

… producing …

In the ConvertTo-YAML and ConvertTo-PSON routines, I only iterate through the property values of complex objects as a last resort, and then only as a leaf: I’ll describe the nature of the leaf but not shred it. This is equivalent to ConvertTo-XML at level 1. This way I can make a reasonable representation of one of those objects without going too far into the details. There was a time I removed that constraint and kicked up the depth figure just to see what happened. I got the scrolling screen of terror: it looked for all the world as if half of Wikipedia and several binary images went scrolling up the console.

 There is another reason for caution: Some SMO objects use lazy loading, and to iterate through all the possible data ‘within’ contained objects would bring the SQL Server instance to its knees. They were designed to be eaten in small bites.

More irritating for me than the byzantine complexity of many Windows objects was seeing that cmdlets such as Format-Table are able to display just the more important property values of a contained object. Perhaps a kind reader can tell me how they know. At that point, I felt there was enough complexity for an article.

The point of recounting all this is to illustrate the suggestion that there may be some virtue in ‘rolling your own’ object serialiser for special purposes.

Conclusions.

I hope I’ve shown that there is nothing hideously difficult in ‘stringifying’ object data into forms that can be consumed by other processes, or for the purposes of searching or exploring data.

I’ve had mixed motives for doing this. I wanted to show how to use PowerShell to read and write YAML in headers, and to use PSON to gain object persistence in PowerShell. I wanted to suggest a more economical way of passing PowerShell objects across a network. I also wanted to hint that, instead of using XML or JSON as a universal ‘Esperanto’ data-transfer format, one could use whatever format is native to the destination.

To take a human parallel, it is easier with data transfer  to speak a foreign language than it is to understand one. To make this point, I’ve done two PowerShell functions to convert to PowerShell Object notation and YAML. To do it in Visual Basic or C# is easy. What about SQL? A ridiculous idea, you might think, but one can use the SELECT…VALUES statement to pass data, and one could represent a hierarchy in a hierarchy table.

I like the idea of databases being able to consume object data without the application developer having to feel responsible for the mental gymnastics required to convert the object viewpoint of data into the relational form. We currently have the technology to hand XML, in all its trickiness, through to the database to be shredded there into its relational data. Rather than repeat the process with other object-notation formats, why not devise an intermediate form, a hierarchy table, that can be passed via its SQL object notation or a table-valued parameter to a stored procedure that performs the mystery of updating base tables with the data in a transactional way, as we once dreamed of with the updategram?