A Gentle Introduction to .NET Code Generation

Code-generation has been used throughout the age of the digital computer. The use of code to generate code might, at first glance, seem an odd thing to want to do, but the technique is alive and well, and is widely used in .NET.  Nick Harrison explains, and introduces the CodeDom...

Why Generate Code?

I have a dream that one day the computer will write its own code.  Sure, this may be farfetched but why not?

There are, however, certain types of code that are more sensibly generated by the computer.  We should be able to explain what the code should look like and how it should be written, but leave its generation to the computer.  Code that is tedious and error-prone is better left to computers that thrive on tedious tasks.  Code that requires creative, innovative thought is better left to creative programmers.  By letting the computer handle the tedious code, we are free to focus on the more creative aspects of coding.

There is also a great deal of code where there aren’t yet any established best practices.  This is also typically very tedious code, but worse still, this code may have to be updated to reflect new best practices.  Several examples quickly come to mind, such as data access logic, Web Service wrappers, and XML proxies.  They all have rather tedious implementations that easily conform to a well-defined pattern.  This makes them ideal candidates to be generated instead of handwritten.

Code Generation: an Overview

In the DotNet world, there are two basic schools of thought for generating code.  One is template driven.  The other relies on the objects in the CodeDom namespace.  Each approach has its own strengths and weaknesses.

Products like Tier Builder, CodeSmith, MyGeneration, etc. are based on the template approach.  This is similar to XSLT.  You pass metadata about the code that you want to generate through your templates, and the various engines will produce code based on the template.  You end up having to learn a new language for the templates, but the templates make it relatively easy to describe the code that you want to have generated.

Tools like the WSDL code generator, which produces web service proxies, and the generator for strongly typed data sets rely on the objects in the CodeDom namespace to generate their code.  The biggest advantage of the CodeDom is its language independence.  Once you have expressed your program logic in terms of the CodeDom objects, CodeDom can output code in any DotNet language that supplies a code provider.

Regardless of the approach you choose, there is a golden rule that must always be followed when dealing with generated code:

Never directly edit generated code.  Modify a derived class or modify a separate file in a partial class.

The rest of this article will focus on the CodeDom approach to code generation.

CodeDom: an Overview

The CodeDom namespace includes objects to represent nearly every language construct in a language-independent fashion.  I say nearly every language construct because there are key areas missing.  We will go over these areas a little later.

To use the CodeDom, we need to create a language tree populated by these objects.  Each DotNet language provides a CodeProvider, which can generate code from this language tree.

The process of creating such a language tree is similar to diagramming sentences from grammar class.   Do you remember diagramming sentences?

[Figure: a diagrammed sentence]

Every word in a sentence was assigned its part of speech and placed appropriately on a diagram.  I loved these exercises.  CodeDom takes you through a similar exercise mapping the components of a language structure back to CodeDom objects.

Some statements are easy to map:

 This statement easily maps back to a simple CodeAssignStatement. The Left is a simple CodeVariableReferenceExpression and the Right is a CodePrimitiveExpression.
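The statement this paragraph describes is not reproduced here.  As a minimal sketch, assuming a hypothetical statement that assigns the literal 42 to a variable named count (both are my own placeholders), the mapping might look like this in C#.  On .NET Framework the CodeDom types ship in System.dll; on newer runtimes they come from the System.CodeDom package.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class SimpleAssignSketch
{
    // Builds the tree for a hypothetical statement: count = 42
    public static CodeAssignStatement Build()
    {
        // Left: a reference to the (hypothetical) variable "count".
        var left = new CodeVariableReferenceExpression("count");
        // Right: a primitive value.
        var right = new CodePrimitiveExpression(42);
        return new CodeAssignStatement(left, right);
    }

    // Renders a single statement as C# source text.
    public static string Render(CodeStatement statement)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(statement, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```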

Other statements may be more difficult to pull apart:

This statement is also a CodeAssignStatement.  The Left is still a simple CodeVariableReferenceExpression, but the Right gets a bit more complicated.

Here we have a CodeMethodInvokeExpression.  The call to DateTime.Parse is straightforward, but its “Parameters” collection is a little harder to pull apart.  We have only one parameter:

This is a CodeMethodInvokeExpression.  This time the “Parameters” is simple.  There are none, but the CodeMethodReferenceExpression has some complications.   The TargetObject property for the CodeMethodReferenceExpression is more complicated than we typically see:

The TargetObject is a CodeArrayIndexerExpression.  The TargetObject for the indexer is a CodeVariableReferenceExpression to the viewState variable.  The Indices will also be a CodeVariableReferenceExpression, this time to the index variable.

Pulling it all together, the CodeDom to produce a statement like:

Might look like this:
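The original listing is not reproduced here.  The following is a sketch reconstructed from the walkthrough above, assuming the statement has the shape lastVisit = DateTime.Parse(viewState[index].ToString()).  The viewState and index names come from the text; lastVisit is a placeholder of mine, because the name of the variable on the left is not given.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class AssignStatementSketch
{
    public static CodeAssignStatement Build()
    {
        // viewState[index]: an array indexer whose target and index are variables.
        var indexer = new CodeArrayIndexerExpression(
            new CodeVariableReferenceExpression("viewState"),
            new CodeVariableReferenceExpression("index"));

        // viewState[index].ToString(): a method call with no parameters,
        // whose TargetObject is the indexer expression.
        var toStringCall = new CodeMethodInvokeExpression(
            new CodeMethodReferenceExpression(indexer, "ToString"));

        // DateTime.Parse(...): a static call taking the ToString() call as its only parameter.
        var parseCall = new CodeMethodInvokeExpression(
            new CodeMethodReferenceExpression(
                new CodeTypeReferenceExpression("DateTime"), "Parse"),
            toStringCall);

        // "lastVisit" is a hypothetical variable name for the Left side.
        return new CodeAssignStatement(
            new CodeVariableReferenceExpression("lastVisit"), parseCall);
    }

    // Renders the statement as C# so the tree can be checked by eye.
    public static string Render(CodeStatement statement)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(statement, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```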

Why Use the CodeDom?

Many people may look at the above code sample and conclude that it looks like too much work.  It is rather verbose, taking 12 lines of code to generate one line.  From the outside, it hardly looks like a good return on investment.

But there are some great benefits making the steep learning curve worth the effort.

Breaking your logic up into its base parts is a great way to learn about what you are writing.  The same way that diagramming sentences in school helped you understand English grammar, expressing your logic in terms of CodeDom objects helps you understand the logic you write.  You become more attuned to duplicated code.  You gain greater insight into the patterns in your code.

Sometimes the code that you want to generate cannot be expressed in a template.  This may require you to use CodeDom.  Consider a code generator to build regular expressions based on the format of a fixed length record.  Such a code generator is rather easy to specify in CodeDom but surprisingly difficult as a template.

CodeDom is also a self-contained solution.  With template code generation, you need to version the templates and the template engine as well as your metadata to be able to regenerate the code when needed.  Because CodeDom is self-contained, this versioning is simplified.

Finally the CodeDom is language independent.  Once you have defined your logic in terms of a CodeDom graph, you can then output your logic in any DotNet language.

What’s wrong with the CodeDom?

All of this is not to say that there is nothing wrong with the CodeDom.  There are many critics and they raise some valid concerns.

We have already seen that the code needed can be very verbose.  This scares many people away, but there are parsers and libraries to help lessen this impact.  Refly is one such library.  Refly lets you operate at a slightly higher level of abstraction and can tremendously lessen the learning effort.

There are certain language constructs that are missing.  Some of these you could arguably say should never be used in the first place; others are truly frustrating.  The syntax for expressing a foreach-style loop is klutzy at best.
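To give a feel for the foreach problem, here is one possible sketch that emulates a foreach loop with a CodeIterationStatement driven by an enumerator.  The variable names items and e, and the Console.WriteLine body, are placeholders of mine; other workarounds exist.

```csharp
using System.CodeDom;
using System.Collections;

public static class ForEachSketch
{
    // Emulates: foreach (object x in items) Console.WriteLine(x);
    // as a for loop over an IEnumerator, since CodeDom has no foreach construct.
    public static CodeIterationStatement Build()
    {
        var enumerator = new CodeVariableReferenceExpression("e");

        // Init: IEnumerator e = items.GetEnumerator()
        var init = new CodeVariableDeclarationStatement(
            typeof(IEnumerator), "e",
            new CodeMethodInvokeExpression(
                new CodeVariableReferenceExpression("items"), "GetEnumerator"));

        // Test: e.MoveNext()
        var test = new CodeMethodInvokeExpression(enumerator, "MoveNext");

        // Increment: nothing to do, so an empty snippet stands in for it.
        var increment = new CodeSnippetStatement(string.Empty);

        // Body: Console.WriteLine(e.Current)
        var body = new CodeExpressionStatement(
            new CodeMethodInvokeExpression(
                new CodeTypeReferenceExpression("Console"), "WriteLine",
                new CodePropertyReferenceExpression(enumerator, "Current")));

        return new CodeIterationStatement(init, test, increment, body);
    }
}
```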

There are a handful of operators that are annoyingly missing.  Most of the binary bit operators are missing: LeftShift (<<), RightShift (>>), UnsignedRightShift (>>>), and ExclusiveOr (^).  All unary operators are missing as well.  So much for ++ or --.

The missing operators are annoying, but this can be worked around fairly easily by calling a helper method in a library class.

One annoyance that I have yet to find a reasonable workaround for is that you cannot attach custom attributes everywhere that you would like.  Most problematic is not being able to attach custom attributes to the get and set of a property.  I want to attach attributes to generated properties directing the debugger to not step into them.
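As a sketch of that workaround, assume a hand-written library class named BitHelper with a LeftShift method (both names are hypothetical).  The generated code then calls the helper instead of using the operator CodeDom cannot express.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical hand-written helper that lives in a library, not in generated code.
public static class BitHelper
{
    public static int LeftShift(int value, int count)
    {
        return value << count;
    }
}

public static class MissingOperatorSketch
{
    // Builds BitHelper.LeftShift(value, 2) in place of the inexpressible value << 2.
    public static CodeMethodInvokeExpression Build()
    {
        return new CodeMethodInvokeExpression(
            new CodeTypeReferenceExpression("BitHelper"), "LeftShift",
            new CodeVariableReferenceExpression("value"),
            new CodePrimitiveExpression(2));
    }

    // Renders a single expression as C# source text.
    public static string Render(CodeExpression expression)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromExpression(expression, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```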

It used to be said that the Laser was a tool looking for a use.  Many people feel the same way about Code Generation in general.  Many developers may view a code generator as a threat to their job.  Others have had bad experiences with code generation in the past and are now apprehensive about using one.

Code generators will not take work away from developers.  Every compiler is essentially a code generator keeping us from having to write in binary.  Effective use of code generation frees developers from getting bogged down in tedious error prone code and allows us to focus on more innovative ways to solve real world problems.  Code generation done properly will not box you into a corner where you are not able to make any changes to the system.  Code generation done properly should never force you to maintain “ugly” generated code.  Code generation done properly should reinforce the need for solid object oriented design, not detract from it.

The key to effective code generation is the golden rule mentioned earlier:

Never directly edit generated code.  Modify a derived class or modify a separate file in a partial class.

This requires effective object oriented design.  Not modifying the generated code means that we really don’t care about what the code looks like.  We should never look at it.  If we want to change what the generated code does, we need to change the metadata that was used to generate the code or modify a derived class.  In fact, I will often take the generation a step further and compile the generated code so that there is no opportunity to modify it.  It can be viewed only through tools like Reflector.

That being said, the language providers in general produce readable, well-formatted code.

Basic Tricks

Almost any code generation exercise will employ some common tasks.  Let’s review how to create the skeleton for a class, how to create the language provider, and how to render generated code.

Class Skeletons out of the Closet

At a high level, a class skeleton can be thought of as:

- Namespace
    - Type
        - Properties
        - Methods

Note the CodeNamespaceImport objects.  We simply list the namespaces that we want to use (or import, in VB terms).  Note that we only give the name of the namespace.  The individual languages handle getting the syntax right.
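The namespace listing this note refers to is not reproduced here.  A minimal sketch, assuming a hypothetical namespace name of Sample.Generated and two imports chosen purely for illustration, might look like this:

```csharp
using System.CodeDom;

public static class NamespaceSketch
{
    public static CodeNamespace Build()
    {
        // "Sample.Generated" is a hypothetical namespace name.
        var ns = new CodeNamespace("Sample.Generated");

        // Only the namespace names are given; each language provider
        // emits its own syntax ("using" in C#, "Imports" in VB).
        ns.Imports.Add(new CodeNamespaceImport("System"));
        ns.Imports.Add(new CodeNamespaceImport("System.Data"));
        return ns;
    }
}
```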

The BaseTypes property is a collection, even though every DotNet language supports only a single base class.  The first entry in the collection is the base class, and the subsequent entries are the interfaces implemented.  VB and CSharp use different syntax for specifying the base class and any interfaces that an object implements, and the VB language provider always treats the first entry as the base class.  This causes problems when a type implements an interface but lists no base class.  If you are implementing an interface, you should specify a base class just in case the code needs to be rendered in VB.  If you are implementing an interface and do not have a natural base class, explicitly specify System.Object.  With this simple workaround, VB and CSharp will both generate correct code.
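A sketch of a type declaration that follows this advice is shown below.  The type name CustomerEntity and the interface name IEntity are hypothetical; IEntity is assumed to be a marker interface defined elsewhere.

```csharp
using System.CodeDom;
using System.Reflection;

public static class TypeSketch
{
    public static CodeTypeDeclaration Build()
    {
        // "CustomerEntity" is a hypothetical type name.
        var type = new CodeTypeDeclaration("CustomerEntity")
        {
            IsClass = true,
            TypeAttributes = TypeAttributes.Public
        };

        // First entry: the base class. System.Object is listed explicitly
        // so that the VB provider does not mistake the interface for it.
        type.BaseTypes.Add(new CodeTypeReference(typeof(object)));

        // Subsequent entries: interfaces. "IEntity" is a hypothetical marker interface.
        type.BaseTypes.Add(new CodeTypeReference("IEntity"));
        return type;
    }
}
```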

The logic for the method will be housed in the CodeStatementCollection exposed as Statements.  Any logic not expressly dependent on the metadata should be defined in a base class.
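As an illustration, a method skeleton might be built along these lines.  The method name Describe, its prefix parameter, and its trivial body are placeholders of mine, not the article's original example.

```csharp
using System.CodeDom;

public static class MethodSketch
{
    // Builds a hypothetical method: public string Describe(string prefix) { return prefix; }
    public static CodeMemberMethod Build()
    {
        var method = new CodeMemberMethod
        {
            Name = "Describe",
            // Public | Final yields a public, non-virtual method.
            Attributes = MemberAttributes.Public | MemberAttributes.Final,
            ReturnType = new CodeTypeReference(typeof(string))
        };

        method.Parameters.Add(
            new CodeParameterDeclarationExpression(typeof(string), "prefix"));

        // The method body lives in the Statements collection.
        method.Statements.Add(
            new CodeMethodReturnStatement(
                new CodeArgumentReferenceExpression("prefix")));
        return method;
    }
}
```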

Properties represent one of the most annoying problems that I have with the CodeDom.  You cannot attach custom attributes to the Get and Set methods of a property.   I would like to attach DebuggerHidden attributes here.
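A sketch of a generated property with a backing field follows; the names Name and _name are hypothetical.  Notice that the getter and setter are plain statement collections.  CustomAttributes exists on the property as a whole, but there is no equivalent slot on GetStatements or SetStatements.

```csharp
using System.CodeDom;

public static class PropertySketch
{
    // Hypothetical backing field: private string _name;
    public static CodeMemberField BuildField()
    {
        return new CodeMemberField(typeof(string), "_name")
        {
            Attributes = MemberAttributes.Private
        };
    }

    // Hypothetical property: public string Name { get { return this._name; } set { this._name = value; } }
    public static CodeMemberProperty BuildProperty()
    {
        var property = new CodeMemberProperty
        {
            Name = "Name",
            Type = new CodeTypeReference(typeof(string)),
            Attributes = MemberAttributes.Public | MemberAttributes.Final
        };

        var fieldRef = new CodeFieldReferenceExpression(
            new CodeThisReferenceExpression(), "_name");

        // Getter and setter bodies are just statement collections.
        property.GetStatements.Add(new CodeMethodReturnStatement(fieldRef));
        property.SetStatements.Add(
            new CodeAssignStatement(fieldRef, new CodePropertySetValueReferenceExpression()));
        return property;
    }
}
```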

Putting it all together

We pull it all together by adding our generated type to the Types collection of the namespace, and adding our generated method and property to the type.
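In sketch form, with deliberately small stand-ins for the namespace, type, method, and property (all names are hypothetical), the assembly step might look like this:

```csharp
using System.CodeDom;

public static class AssembleSketch
{
    public static CodeNamespace Build()
    {
        // Minimal stand-ins for the pieces; all names are hypothetical.
        var ns = new CodeNamespace("Sample.Generated");
        var type = new CodeTypeDeclaration("CustomerEntity");

        var method = new CodeMemberMethod
        {
            Name = "Load",
            Attributes = MemberAttributes.Public | MemberAttributes.Final
        };

        var property = new CodeMemberProperty
        {
            Name = "Name",
            Type = new CodeTypeReference(typeof(string)),
            Attributes = MemberAttributes.Public | MemberAttributes.Final
        };
        property.GetStatements.Add(
            new CodeMethodReturnStatement(new CodePrimitiveExpression("unknown")));

        // Members go on the type; the type goes on the namespace.
        type.Members.Add(method);
        type.Members.Add(property);
        ns.Types.Add(type);
        return ns;
    }
}
```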

Where’s My Provider

Every DotNet language defines a language provider that gives us access to the Generator and Compiler for that language.  Each of these derives from System.CodeDom.Compiler.CodeDomProvider.

You can easily specify your favorite DotNet language here.
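One way to obtain a provider is simply to instantiate the one for your language.  The ForLanguage helper and its "cs"/"vb" keys below are my own convention for illustration; CSharpCodeProvider and VBCodeProvider are the framework's providers for C# and VB.

```csharp
using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

public static class ProviderSketch
{
    // Hypothetical helper: picks a provider by a short language key.
    public static CodeDomProvider ForLanguage(string language)
    {
        switch (language.ToLowerInvariant())
        {
            case "cs":
                return new CSharpCodeProvider();
            case "vb":
                return new VBCodeProvider();
            default:
                throw new ArgumentException("Unsupported language: " + language, "language");
        }
    }
}
```

Because both providers derive from CodeDomProvider, the rest of the generator does not need to know which language was chosen.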

Rendering the Fruits of Our Labor

Once we have the appropriate provider, we are ready to generate code.  My preferred approach is to attach a StringBuilder to a StringWriter and pass that into the Generate method.  Once the code is generated, we can retrieve it from the StringBuilder.  You could also use a StreamWriter to write the code directly to a file on disk.
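A sketch of that approach is shown below.  It calls GenerateCodeFromNamespace on the provider, which is one of several Generate methods available; the namespace and type names in Demo are hypothetical, and the BracingStyle option is only a formatting preference.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using System.Text;
using Microsoft.CSharp;

public static class RenderSketch
{
    // Renders a namespace through any provider and returns the source text.
    public static string Render(CodeDomProvider provider, CodeNamespace ns)
    {
        var builder = new StringBuilder();
        using (var writer = new StringWriter(builder))
        {
            // BracingStyle "C" puts opening braces on their own line.
            var options = new CodeGeneratorOptions { BracingStyle = "C" };
            provider.GenerateCodeFromNamespace(ns, writer, options);
        }
        return builder.ToString();
    }

    // Small demonstration with a hypothetical namespace and an empty class.
    public static string Demo()
    {
        var ns = new CodeNamespace("Sample.Generated");
        ns.Imports.Add(new CodeNamespaceImport("System"));
        ns.Types.Add(new CodeTypeDeclaration("CustomerEntity"));

        using (var provider = new CSharpCodeProvider())
        {
            return Render(provider, ns);
        }
    }
}
```

Swapping the StringWriter for a StreamWriter opened on a file path would write the same text straight to disk.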

Output

With such a generator, our skeleton class will look like this:

For VB:

For C#:

This skeleton is the pattern that I often follow while generating code.

You can use this skeleton, adding properties for every column in a table and you will have the beginnings of a very useful Business Entity Generator.

Conclusion

Code generation takes us the next step towards being able to describe to the computer what the code should look like and letting the computer write its own code.  In essence, every compiler is a code generator generating machine code.

Code generation is a powerful technique, allowing us to focus on more interesting, exciting components while the computer handles the tedious details.  With DotNet, you have two options for generating your code.  Depending on your needs, template-driven generation may be ideal, or the CodeDom may provide the better solution.  This is just another tool in your toolbox.