Convert from XML

I have to deal with XML data in PowerShell. That’s fine, but I always find myself sighing when I need to read an XML object. I have a copy of ‘XML in a Nutshell’. It is over six hundred pages of essential information on XML. Some nutshell. XML is, I believe, a way of representing data that should be kept at arm’s length, especially if the arm is reaching out of the window.

I want to convert reasonably small XML files to hash tables and PowerShell objects. PowerShell has never had a ConvertFrom-XML cmdlet because gulping a large XML file into a PowerShell data object is expensive in resources, most of all in the sheer time it takes. Instead, you have to use the XmlDocument object to navigate to the data you want, or use an XPath query. It is all well and good to handle XML in this way, but it is inconsistent to have no ConvertFrom-XML cmdlet. After all, there are ConvertFrom cmdlets for CSV, JSON, and a variety of other text-based data. It would be good to have one for XML as well. Usually, I just want to consume relatively small XML files and pick out the data I want. I hoped that a cmdlet that worked would turn up, but somehow it never did. So I wrote my own.
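
To make the existing options concrete, here is a minimal sketch of the two approaches just mentioned: navigating the XmlDocument object and running an XPath query. The sample document and its element names are invented for illustration.

```powershell
# Hypothetical sample data, invented for this illustration.
[xml]$doc = @'
<catalog>
  <book id="bk101"><title>First book</title><price>10.00</price></book>
  <book id="bk102"><title>Second book</title><price>12.50</price></book>
</catalog>
'@

# 1. Navigate the XmlDocument with PowerShell's dot notation.
$titles = $doc.catalog.book | ForEach-Object { $_.title }

# 2. Run an XPath query directly against the document.
$secondTitle = $doc.SelectSingleNode("/catalog/book[@id='bk102']/title").InnerText

# 3. Use the Select-Xml cmdlet, which wraps each matching node in a result object.
$prices = Select-Xml -Xml $doc -XPath '/catalog/book/price' |
    ForEach-Object { $_.Node.InnerText }
```

All three work, but each one requires you to know the shape of the document in advance, which is exactly what a general-purpose conversion would save you from.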

There are certain problems in writing a routine that has to convert all the permutations of XML into arrays and hash tables. XML doesn’t handle arrays natively but implies them by giving sibling elements the same name. It allows empty elements, and elements that contain only other elements. There is no built-in concept of NULL. It can have elements that contain only text, or that mix text and elements. Additionally, attributes don’t have any intrinsic order whereas elements do. It is interesting to see how the online conversion utilities fare with these cases: there is little consensus among them.
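
A short sketch shows several of these ambiguities using nothing more than PowerShell's built-in [xml] type. The sample document is invented for illustration.

```powershell
# Hypothetical sample data covering the awkward cases.
[xml]$doc = @'
<root>
  <single><item>a</item></single>
  <many><item>a</item><item>b</item></many>
  <empty/>
  <mixed>some text <b>bold</b> more text</mixed>
</root>
'@

# One <item> comes back as a scalar, two come back as an array. Nothing in
# the XML itself says whether <single> was meant to hold a list.
$oneIsArray  = $doc.root.single.item -is [array]
$manyIsArray = $doc.root.many.item   -is [array]

# Wrapping the value in @() gives a uniform array either way.
$oneCount  = @($doc.root.single.item).Count
$manyCount = @($doc.root.many.item).Count

# An empty element has no child nodes. XML cannot say whether that means
# null, an empty string, or an empty container.
$emptyNode        = $doc.SelectSingleNode('/root/empty')
$emptyHasChildren = $emptyNode.HasChildNodes
$emptyText        = $emptyNode.InnerText

# Mixed content interleaves text nodes with elements.
$mixedKinds = $doc.SelectSingleNode('/root/mixed').ChildNodes |
    ForEach-Object { $_.NodeType.ToString() }
```

Here $oneIsArray is false and $manyIsArray is true, so a converter has to decide for itself whether a lone child element is a scalar or a one-element list.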

In addition, the requirements of users vary. How do you distinguish attributes? Do you prefix them with a character such as ‘@’ or ‘-’? Do you show the document element? You soon come to appreciate how difficult it is to interpret XML consistently.
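
These choices are easiest to see on a tiny example. The sketch below is not the routine described in this article: Get-ElementTable and its AttributePrefix parameter are hypothetical names, and the helper only flattens a single element whose children contain plain text.

```powershell
# Hypothetical helper, not the ConvertFrom-XML routine described in this article.
# It flattens one element: attributes first (with a prefix), then child elements as text.
function Get-ElementTable {
    param(
        [Parameter(Mandatory)] [System.Xml.XmlElement] $Element,
        [string] $AttributePrefix = '@'
    )
    $table = [ordered]@{}
    foreach ($attribute in $Element.Attributes) {
        $table[$AttributePrefix + $attribute.LocalName] = $attribute.Value
    }
    foreach ($child in $Element.ChildNodes) {
        if ($child.NodeType -eq [System.Xml.XmlNodeType]::Element) {
            $table[$child.LocalName] = $child.InnerText
        }
    }
    $table
}

# Invented sample data.
[xml]$doc = '<person id="42" active="true"><city>London</city></person>'
$root = $doc.DocumentElement

$withAt   = Get-ElementTable -Element $root                       # keys '@id', '@active', 'city'
$withDash = Get-ElementTable -Element $root -AttributePrefix '-'  # keys '-id', '-active', 'city'

# Optionally keep the document element as the outermost key.
$withRoot = [ordered]@{}
$withRoot[$root.LocalName] = $withAt

$withAt   | ConvertTo-Json -Compress   # {"@id":"42","@active":"true","city":"London"}
$withRoot | ConvertTo-Json -Compress   # {"person":{"@id":"42","@active":"true","city":"London"}}
```

Both outputs are defensible renderings of the same XML, which is why a general-purpose converter needs to expose these decisions as options.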

XML is better understood as a document language that can be used as a data description language, but it is too open-ended to be optimal for data interchange. Because it is so open-ended, there are fewer certainties about how it is used for storing data. This makes it more difficult to produce a function that renders any XML file as PowerShell objects. Hopefully, this is one of those routines that can be improved with experience.

Testing this routine has been an interesting experience. The method I’ve used is to take a range of XML files and pass them through some online XML-to-JSON translation systems. I pick the output that seems the best fit. Then I take the output of this routine and check that it produces the same JSON, using the ConvertTo-Json cmdlet.
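
The comparison step might look something like the following sketch. Test-SameJson is a hypothetical helper name, and the reference JSON and converted object are invented stand-ins for an online converter's output and this routine's output. The reference JSON is round-tripped through ConvertFrom-Json and ConvertTo-Json so that differences in whitespace don't cause false alarms.

```powershell
# Hypothetical helper: compares reference JSON with a converted PowerShell object.
function Test-SameJson {
    param(
        [Parameter(Mandatory)] [string] $ExpectedJson,
        [Parameter(Mandatory)] $Actual,
        [int] $Depth = 10
    )
    # Round-trip the reference JSON so both sides share the same formatting.
    $expected = ConvertTo-Json -InputObject (ConvertFrom-Json -InputObject $ExpectedJson) -Depth $Depth -Compress
    $got      = ConvertTo-Json -InputObject $Actual -Depth $Depth -Compress

    # -cne is the case-sensitive comparison; the default -ne would ignore case.
    if ($expected -cne $got) {
        Write-Warning "JSON mismatch.`n expected: $expected`n actual:   $got"
        return $false
    }
    return $true
}

# Invented example: reference JSON as an online converter might produce it,
# and the object a conversion routine might return for the same XML.
$reference = '{ "person": { "@id": "42", "city": "London" } }'
$converted = [ordered]@{ person = [ordered]@{ '@id' = '42'; city = 'London' } }

$same = Test-SameJson -ExpectedJson $reference -Actual $converted
```

Note that this comparison is sensitive to key order, so it assumes the converter emits keys in the same order as the reference JSON.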

Here is a sample of the tests, which are placed in an array and executed in turn. Any problems, and a warning appears.