COM Automation of Office Applications via PowerShell

There need be no shame in using Office by automating it via COM. It was designed to be used that way, and with PowerShell, the various Office applications can be used as glorious output devices for data. Phil Factor uses some practical examples to try to persuade you to take the plunge.

There is something rather satisfying in using Office applications via COM automation in order to cut corners in development work. It can be a very useful way of providing functionality very quickly, and there is a lot of use locked within these applications. In this article, I’ll be giving a few illustrations of what can be achieved, with scripts that I find useful. I’ll use PowerShell, but you can get the same effect in a similar way in any .NET language, but with rather more effort. I’ve stripped each script down to the bare minimum so as to make it possible to embed them into this article. We’ll end by automatically producing a database build script within a Word document and drawing an Excel graph from the raw data. We’ll start a bit simpler, though.

Of course, we’ll use automation for silly reasons, such as getting a simple scripted way of generating speech

…or animating the awful animated paper-clip. (In the interests of public safety I won’t show you how to do this, but it is great for livening up PowerShell presentations. See Jeffrey Snover’s code to do this, on page 412 0f Bruce Payette’s book ‘PowerShell in Action’. )

For simple PowerShell scripting of Word and Excel, you’d probably prefer to use Out-MSWord by Jeffery Hicks (OUT-MSWord Revised) for using MS Word as means of  output. and the equivalent IMPORT-EXCEL and EXPORT-EXCEL However, we’ll try to go beyond this simple sort of usage.

Scripting MS Word

Microsoft Word is an obvious choice for automation because it is a glorious means of providing output. I once designed a web- application that emailed beautiful PDF invoices using Word. (it read in an HTML file generated by the application). I had it running within hours of an accountant embarrassingly asking ‘how does it generate invoices and credit notes’ at a presentation. After making it robust and auditable, it remained for the life of the application.

We’ll start off with taking a webpage and saving this as a Word document. Actually, for this example we’ll save a whole list of articles from Simple-Talk and save them with all their formatting.

1298-WordImport.PNG

Of course, you can save in a variety of formats, such as HTML, Text or PDF (if you have the necessary filter) Although this works pretty well, there is a quirk of Word that means that it sometimes makes a better job of the formatting and conversion if the document is pasted from Internet Explorer. OK. we can cope. By modifying the routine, It gives us some advantages because we can get more information about the file at the same time, such as the title of the document.

So what have we done? We’ve automated two key Microsoft products, Internet Explorer, and Microsoft Word. I’ve just taken a manual process and scripted it

The script….

  • Checks whether  the directory where you want to save the Word files actually exists, and if not it then creates it.
  • Fires up internet Explorer
  • Fires up MS Word
  • navigates to the correct page
  • Finds the next article number of the article you want to save as a word file
  • Creates a new Word file
  • Works out he URL and navigates to the page using IE
  • Waits until the page loads
  • Select all
  • Copy the whole rendered page onto the clipboard
  • Pastes the rendered page into Microsoft Word
  • Works out the title of the file
  • Checks to see if it is already saved in the directory. If so it deletes it
  • Saves the file and closes it
  • Finds the next article to save
  • If nothing else to save, quits

Before you creating a script like this, it pays to be very clear about what the process consists of. If the process is using Word, Excel or Access, you can record a manual process as VBA (Visual Basic for Applications), and convert the VBScript files to PowerShell. Other components such as IE take more fishing around in Books-On-Line to use.

Why do this elaborate way when MSWord’s own file-conversion isn’t too bad? The answer is that you may want to assemble parts of different websites, various images, or a collection of HTML fragments,  to create a document. Typically, it would be in making an integrated report from a number of places on an intranet. You can, instead of selecting the whole page, select just a part of it, and by selecting and pasting from various places you can concatenate a document.

You can create HTML, text or PDF versions of your MSWord files very easily. Here, a word file is being saved as a rather voluminous HTML file

To turn to a more practical usage of  Word automation, here is a cut-down version of a build-script generator I use to save SQL Server database build scripts as MS Word files. As generating a build script is rather slow, it is best done on the back burner, by running it on the scheduler. I also have a version in SQL Scripts Manager, for doing ad-hoc build scripts.

1298-Buildscript.PNG

Just page 115 of 225! The build script in a Word document. Shouldn’t it be in color?

You’ll see that  the process is extraordinarily simple. As everything is formatted the same way, I do that at the end of the import, simply by highlighting everything and formatting everything at once; automatically, of course.

Automating Excel

For me, the easiest way of getting data into Excel is by pasting it in via Internet Explorer, or reading it in as an HTML table fragment.  It can be done by iterating through the cells via script, but it is a relatively slow way of doing it. If you have a dataSet or dataTable, you can convert it to an HTML table very easily in PowerShell. There are other ways such as writing to the ExcelXML format via the .NET SDK.  No, me neither, I’ll stick to the wild way. Here is a script that illustrates the ‘wild man’ approach to pasting data into Excel. it is almost identical to the technique I used with Word.

1298-Quotes.PNG

Our data (quotes in this case, is imported into Excel and formatted automatically

You’ll have noticed that, just to show that we have been able to get in and format the columns that we’ve entered into Excel, we’ve changed the widths of the columns and made on italic, and the other one aligned bottom right. Excel has auto-detected the datatype of the values, and generally seems to get this right. (It sometimes needs a little help with dates)

Having proved that it is easy to get data into Excel automatically, we ought to do something more serious with it. In the next example, we’ll read in the weather data for Cambridge (UK) from 1961 to 2010. Then we’ll create two new computed columns, the first to hold the date, and the second to hold the rainfall in inches rather than the new-fangled centimeters.

Having done that, we’ll produce a graph and add a few features, such as a trend-line (the good news is there is no significant change in the rainfall figures)

The objective of all this is to prove that one can generate graphical reports from a simple data set with just a small amount of automation, but with more versatility than one can get from reporting services.

1298-RainDataImported.PNG

Here is the imported data and the two added columns that calculate the date and the rainfall in inches

1298-Graph.jpg

The automatically-generated Excel graph of rainfall. Click on it to see it full-size.

A few traps for the unwary.

Most automation examples are in VBS. Not many have been converted to PowerShell yet. None of the Office applications will record your actions as PowerShell scripts yet, which is odd, considering Microsoft’s keen adoption of PowerShell: You are stuck with VBA. There are one or two things that can cause puzzlement.

 COM collections are indexed through the Item parameterised-property. Visual Basic uses that Item property as the default if you don’t specify it. PowerShell doesn’t, because the general .NET interop method isn’t perfect, so occasionally you have to add it. PowerShell uses brackets instead of Visual Basic’s parentheses for indexing collections. COM collections follow the Microsoft standard of indexing items in a collection from 1 to n whereas  .NET and PowerShell collections use the index 0- n-1  for that. Another problem you’ll hit is that methods in VBA don’t require the empty parentheses, whereas in PowerShell, they do. You’ll spot this problem if you start getting PowerShell listing the properties of the object rather than doing the method!

More awkward than this is the problem that the recorded scripts have to follow the logic of the way you approach the GUI. Usually, you work by selecting the object you want to work with and then manipulate its attributes via a series of dialog boxes. This means that scripts tend to work on the same approach even when it isn’t the quickest and most efficient way of automating it. I find it is better to get something running fairly quickly and then rework the code to walk the object model rather than work on the selected object.

Sometimes, the documentation you find is pretty rudimentary. Microsoft technical authors for  Office Automation are notoriously tight-lipped. Fortunately, PowerShell is very good at telling you what is available at any point. An easy way to get to grips with a COM object is to get its members. This is a way of doing it. In this example, we are getting the members exposed for COM automation by Internet Explorer.

…Which gives the result…

  Name MemberType Definition
  ClientToWindow Method void ClientToWindow (int, int)
  ExecWB Method void ExecWB (OLECMDID, OLECMDEXECOPT, Variant, Variant)
  GetProperty Method Variant GetProperty (string)
  GoBack Method void GoBack ()
  GoForward Method void GoForward ()
  GoHome Method void GoHome ()
  GoSearch Method void GoSearch ()
  Navigate Method void Navigate (string, Variant, Variant, Variant, Variant)
  Navigate2 Method void Navigate2 (Variant, Variant, Variant, Variant, Variant)
  PutProperty Method void PutProperty (string, Variant)
  QueryStatusWB Method OLECMDF QueryStatusWB (OLECMDID)
  Quit Method void Quit ()
  Refresh Method void Refresh ()
  Refresh2 Method void Refresh2 (Variant)
  ShowBrowserBar Method void ShowBrowserBar (Variant, Variant, Variant)
  Stop Method void Stop ()
  AddressBar Property bool AddressBar () {get} {set}
  Application Property IDispatch Application () {get}
  Busy Property bool Busy () {get}
  Container Property IDispatch Container () {get}
  Document Property IDispatch Document () {get}
  FullName Property string FullName () {get}
  FullScreen Property bool FullScreen () {get} {set}
  Height Property int Height () {get} {set}
  HWND Property int HWND () {get}
  Left Property int Left () {get} {set}
  LocationName Property string LocationName () {get}
  LocationURL Property string LocationURL () {get}
  MenuBar Property bool MenuBar () {get} {set}
  Name Property string Name () {get}
  Offline Property bool Offline () {get} {set}
  Parent Property IDispatch Parent () {get}
  Path Property string Path () {get}
  ReadyState Property tagREADYSTATE ReadyState () {get}
  RegisterAsBrowser Property bool RegisterAsBrowser () {get} {set}
  RegisterAsDropTarget Property bool RegisterAsDropTarget () {get} {set}
  Resizable Property bool Resizable () {get} {set}
  Silent Property bool Silent () {get} {set}
  StatusBar Property bool StatusBar () {get} {set}
  StatusText Property string StatusText () {get} {set}
  TheaterMode Property bool TheaterMode () {get} {set}
  ToolBar Property int ToolBar () {get} {set}
  Top Property int Top () {get} {set}
  TopLevelContainer Property bool TopLevelContainer () {get}
  Type Property string Type () {get}
  Visible Property bool Visible () {get} {set}
  Width Property int Width () {get} {set}

you may get a “Old format or invalid type library” error when automating Excel. the problem is in your Windows regional settings. If the client computer runs the English version of Excel and the locale for the current user is configured for a language other than English, Excel will try to locate the language pack for the configured language. If the language pack is not found, the error is reported. To fix this problem, it is best to to install the multilingual user interface pack. (See Microsoft Knowledge Base article http://support.microsoft.com/default.aspx?scid=kb;en-us;320369 )

Further reading.

For COM automation of office applications, the internet is a frustrating source of wisdom and insight. There are, however, some very good PowerShell books by people who know PowerShell very well, with chapters on COM automation.

  • Pro Windows PowerShell  by Hristo Deshev
  • Windows PowerShell in action by Bruce Payette
  • Windows PowerShell Cookbook by Lee Holmes

Because it is reasonably searchable, the Technet script repository is a good source of ideas, though the quality of script will vary!  There are some other archives with very good COM scripts to Office applications in them, but this is the first port of call.