LINQ Lycanthropy: Transformations into LINQ

LINQ is one of the few technologies that you can start to use without a lot of preliminary learning. Also, it lends itself to learning by trying out examples. With Michael's help, you can watch as your conventional C# code changes to ravenous LINQ before your very eyes.

[Image: A SQL Server database is surprised by an unexpected LINQ query]

Much has been written about the benefits of LINQ, and on the features of LINQ. Either focus is valid, to be sure, but to someone new to the technology, LINQ can still present quite a formidable initiation. But LINQ is one of those rare commodities that can both be learned incrementally and applied incrementally. Consider, for example, learning C#. You have to understand many different concepts and learn a good part of the language’s syntax before you can even write the quintessential “Hello, World!” program. With LINQ, on the other hand, you can learn just a little bit about something you need to use now and then immediately apply just that little bit.[1] In this article I want to illustrate this incremental nature of learning LINQ by showing how to convert conventional code to LINQ code.[2]

A Brief LINQ Introduction

LINQ, short for Language Integrated Query, was introduced with the .NET 3.5 framework. MSDN describes its purpose succinctly:

“Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support. Furthermore, you have to learn a different query language for each type of data source: SQL databases, XML documents, various Web services, and so on. LINQ makes a query a first-class language construct in C# and Visual Basic. You write queries against strongly typed collections of objects by using language keywords and familiar operators.”

The main advantages of using LINQ are that:

  • Code is more concise and readable, especially when using multiple filters.
  • You can filter, order, and group results with little coding.
  • You can easily port a LINQ query from one data source to another with little or no modification.

LINQ Operations

With LINQ, you can filter data to retrieve a subset, transform a sequence of elements to new forms, perform aggregate operations (average, max, count, etc.), combine data from multiple sources, and more. Table 1 summarizes these operations:

Table 1 LINQ Operations

LINQ Providers

Traditionally, most of the operations above are done on a database. While LINQ of course supports the database domain, the power of LINQ is that it can be used in the very same fashion when accessing data from a variety of domains, via appropriate LINQ providers. The LINQ providers supplied “out of the box” from Microsoft include:

  • LINQ to Objects
  • LINQ to XML
  • LINQ to SQL
  • LINQ to DataSet
  • LINQ to Entities

There is a plethora of others created by independent software developers. This list comes mostly from Charlie Calvert’s blog entry Link to Everything: A List of LINQ Providers, with additional entries from Wikipedia’s list of LINQ providers (each source supplies hyperlinks for the providers in its list):

  • Active Directory
  • Amazon
  • Bindable Sources (SyncLINQ)
  • C# project
  • Continuous Data (CLinq)
  • CRM
  • CSV
  • Excel
  • Expressions (MetaLinq)
  • Flickr
  • Geo (Geospatial Data)
  • Google
  • Indexes (LINQ & i40)
  • IQueryable
  • JavaScript
  • JSON
  • LDAP
  • LLBLGen Pro
  • Lucene
  • MAPI
  • Metaweb
  • MySQL, Oracle, PostgreSql
  • NCover
  • NHibernate
  • Opf3
  • Parallel (PLINQ)
  • RDF Files
  • Sharepoint
  • SimpleDB
  • Streams
  • System Search
  • Twitter
  • WebQueries
  • Wikipedia
  • WIQL
  • WMI
  • XSD
  • XtraGrid

The Two Faces of LINQ

One of the first points of confusion about learning LINQ is that it looks like two different languages. Rather, it is a single language with two different syntaxes: query syntax and method syntax. Query syntax looks much like a SQL query though with the clauses in a different order. The from … where … orderby … select sequence is the canonical example of query syntax:
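As a minimal illustration of that clause sequence, consider the sketch below. The word list and the QuerySyntaxDemo class are hypothetical and are not taken from HostSwitcher; they exist only to show the shape of a query expression.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QuerySyntaxDemo
{
    // Returns the words shorter than five characters, sorted alphabetically.
    public static List<string> ShortWordsSorted(IEnumerable<string> words)
    {
        var query = from w in words      // the source collection
                    where w.Length < 5   // filter
                    orderby w            // sort
                    select w;            // projection (here, the element itself)
        return query.ToList();
    }
}
```

Calling ShortWordsSorted with pear, banana, fig, and kiwi yields fig, kiwi, pear.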

Here is the identical LINQ statement written with method syntax. This is also called lambda syntax because these LINQ extension methods take a lambda expression[3] as an argument.
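A sketch of the method-syntax form follows, again using a hypothetical word list and class name of my own choosing. Each query clause maps onto an extension method that takes a lambda expression.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MethodSyntaxDemo
{
    // Returns the words shorter than five characters, sorted alphabetically.
    public static List<string> ShortWordsSorted(IEnumerable<string> words)
    {
        return words
            .Where(w => w.Length < 5)   // corresponds to the where clause
            .OrderBy(w => w)            // corresponds to the orderby clause
            .ToList();                  // a trailing "select w" needs no Select call
    }
}
```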

The two types of syntax available for LINQ queries are exactly equivalent (where they overlap), so you can choose whichever you prefer. The performance question is moot: their performance is the same because the first step of compilation converts expressions written in query syntax into lambda syntax. Depending on the operation, either syntax could produce code that is significantly shorter and clearer, and you can even mix the two syntaxes if you like! The lambda syntax is much richer, particularly in C#, so if you start with query syntax, but then need an operator that is only available in lambda syntax, you can apply that as well. MSDN’s Query Expression Syntax for Standard Query Operators shows the subset of methods that may be invoked in query syntax in both C# and Visual Basic. Another useful reference is Brad Vincent’s dual syntax cross-reference that shows how to write expressions in both query syntax and in lambda syntax side by side. Notice in the code that follows that I have mostly, but not exclusively, used method syntax.

LINQ to Objects

This article focuses on LINQ to Objects for two reasons: ease of use and universality.

In this context, any C# or Visual Basic object that implements the generic IEnumerable<T> interface may be manipulated with LINQ: collections, arrays, lists, dictionaries, or your own user-defined types. This covers a vast spectrum of the data types that involve sets of data (even more than you might think at first glance) because some types that are not IEnumerable<T> are convertible to types that are.

Whichever LINQ provider is most appropriate for your situation, you have to provide additional information about your data so that LINQ may properly understand its structure. Say, for example, you wish to use LINQ to SQL to process data from a database. You need to use the Object Relational Designer in Visual Studio (or the SqlMetal command line utility) to specify the tables and fields you plan to use. Once that is done you can then specify tables and field names, with the full support of strong typing and IntelliSense from Visual Studio. LINQ to Objects still has the same prerequisite but requires no additional work on your part. The reason is obvious: if you have objects you wish to manipulate, their structure is, by definition, already known!

Declarative vs. Imperative Design

To quote MSDN again:

“In a basic sense, LINQ to Objects represents a new approach to collections. In the old way, you had to write complex foreach loops that specified how to retrieve data from a collection. In the LINQ approach, you write declarative code that describes what you want to retrieve.”

This is a remarkable achievement: it adds a layer of abstraction in your code where you can now describe what you want rather than how to go about it. Higher levels of abstraction lead to code that is easier to read and therefore more maintainable. Think of assembly language compared to C#, for example.

Conventional C# code is imperative. You enumerate specific steps for arriving at a goal. With LINQ you can specify what you want to attain without having to specify the detailed steps. The book Essential LINQ, by Charlie Calvert and Dinesh Kulkarni, provides a great discussion of declarative vs. imperative code. The relevant section of the book is even available online at Declarative: Not How, But What.

The Transformation Target

I mentioned that LINQ is a technology that may be incorporated into your environment on an incremental basis. In the remainder of this article I walk through an example where I do just that. The code comes straight out of an application called HostSwitcher that I built and discussed at length in my recent articles, Creating Tray Applications in .NET: A Practical Guide and LINQ Secrets Revealed: Chaining and Debugging. In a nutshell, HostSwitcher lets you re-route entries in your hosts file with a single click on the context menu attached to its icon in the system tray. The application is heavily LINQ-centric. You can download both the source code and a prepackaged installer for the application from the first article. Here I will just show a portion of the code, focusing on the CreateMap method from the HostManager class.

First Pass: Pseudo-Code

The CreateMap method converts the hosts file into a data dictionary. There are just three steps:

(1) Decorate each hosts file line with details of its referenced project and server group, if present, discarding any lines that do not include such references.

Figure 1 Host file lines decorated with project and server groups from their meta-comments

This visualization of the data plainly reveals the syntax introduced to specify projects and server groups embedded within trailing comments in the hosts file.

(2) Create a dictionary entry for each project (unless one already exists), where the dictionary key is the project name and the dictionary value is a list of server groups associated with that project.

Figure 2 Dictionary generated from the hosts file data but sans counts

Here I include each server group just once even if it appears on multiple host lines. The goal of this step is to define the hierarchy between the project groups and the server groups independently of the number of times they occur.

(3) For each server group in each project, decorate it with a count of enabled and disabled lines.

Figure 3 Dictionary with the count fields populated

In this final step, I now turn my attention to the repeated entries for each server group. Looking back at the first step, observe that some host lines actually have two comment markers, one at the start of the line commenting out the entire entry and a second in the middle that redundantly comments out the meta-notation used by HostSwitcher. Thus, by adding or removing a comment marker at the start of a line you can enable or disable the host entry without uncommenting the meta-comment specifying the project and server group.

Second Pass: Imperative Code

The input to the CreateMap method is the contents of the system hosts file stored in a list; the output is a map between project names and server groups:
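The shapes involved can be sketched as follows. The ServerGroup property names (Name, EnabledCount, DisabledCount) are the ones discussed later in this article; the exact signature and the HostManagerSketch class name are my assumptions for illustration, and the body is deliberately left as a stub.

```csharp
using System.Collections.Generic;

public class ServerGroup
{
    public string Name { get; set; }
    public int EnabledCount { get; set; }
    public int DisabledCount { get; set; }
}

public static class HostManagerSketch
{
    // Input: the lines of the hosts file.
    // Output: project name -> the server groups belonging to that project.
    public static Dictionary<string, List<ServerGroup>> CreateMap(List<string> hostsFileLines)
    {
        var projectDict = new Dictionary<string, List<ServerGroup>>();
        // The three steps from the pseudo-code would populate projectDict here.
        return projectDict;
    }
}
```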

Converting the pseudo-code above to imperative code is straightforward. Here again are the three steps:

(1) Decorate each hosts file line with details of its referenced project and server group, if present, discarding any lines that do not include such references.

A regular expression determines whether a line contains a project and server group. Ones that do are added to the activeHostsFileData collection along with the associated project and the server group.
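A minimal sketch of this step appears below. The meta-comment format (a trailing "# [Project:ServerGroup]"), the regular expression, and the TaggedLine container are all assumptions made for illustration; HostSwitcher's actual notation and types may differ.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for one decorated hosts file line.
public class TaggedLine
{
    public string Line { get; set; }
    public string ProjectName { get; set; }
    public string GroupName { get; set; }
}

public static class Step1Imperative
{
    // Assumed meta-comment format: "... # [Project:ServerGroup]".
    private static readonly Regex MetaComment =
        new Regex(@"#\s*\[(?<project>[^:\]]+):(?<group>[^\]]+)\]");

    public static List<TaggedLine> Decorate(IEnumerable<string> hostsFileLines)
    {
        var activeHostsFileData = new List<TaggedLine>();
        foreach (string line in hostsFileLines)
        {
            Match match = MetaComment.Match(line);
            if (match.Success)   // discard lines without a project/server group
            {
                activeHostsFileData.Add(new TaggedLine
                {
                    Line = line,
                    ProjectName = match.Groups["project"].Value,
                    GroupName = match.Groups["group"].Value
                });
            }
        }
        return activeHostsFileData;
    }
}
```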

(2) Create a dictionary entry for each project (unless one already exists), where the dictionary key is the project name and the dictionary value is a list of server groups associated with that project.

Here I take into account that there may be multiple lines in the host file specifying the same server group and only add it to the list if it does not already exist.
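The step just described might look like the sketch below. TaggedLine is a hypothetical container standing in for the decorated lines from step 1, and the class and method names are mine rather than HostSwitcher's.

```csharp
using System.Collections.Generic;

public class ServerGroup
{
    public string Name { get; set; }
    public int EnabledCount { get; set; }
    public int DisabledCount { get; set; }
}

// Hypothetical container for one decorated hosts file line.
public class TaggedLine
{
    public string Line { get; set; }
    public string ProjectName { get; set; }
    public string GroupName { get; set; }
}

public static class Step2Imperative
{
    public static Dictionary<string, List<ServerGroup>> BuildDictionary(
        IEnumerable<TaggedLine> activeHostsFileData)
    {
        var projectDict = new Dictionary<string, List<ServerGroup>>();
        foreach (TaggedLine item in activeHostsFileData)
        {
            // Create the project entry unless one already exists.
            List<ServerGroup> groups;
            if (!projectDict.TryGetValue(item.ProjectName, out groups))
            {
                groups = new List<ServerGroup>();
                projectDict.Add(item.ProjectName, groups);
            }

            // Add the server group only if it is not already in the list.
            bool found = false;
            foreach (ServerGroup existing in groups)
            {
                if (existing.Name == item.GroupName)
                {
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                groups.Add(new ServerGroup { Name = item.GroupName });
            }
        }
        return projectDict;
    }
}
```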

(3) For each server group in each project, decorate it with a count of enabled and a count of disabled lines.

This triply nested loop iterates through the projects then through the server groups in the dictionary created in step 2 in order to update each such server group. The innermost loop iterates through the collection from step 1 checking for whether each line is commented (disabled) or uncommented (enabled).
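A sketch of such a triply nested loop follows. As before, TaggedLine and the method names are hypothetical, and I assume a line counts as disabled when its first non-blank character is a comment marker.

```csharp
using System;
using System.Collections.Generic;

public class ServerGroup
{
    public string Name { get; set; }
    public int EnabledCount { get; set; }
    public int DisabledCount { get; set; }
}

// Hypothetical container for one decorated hosts file line.
public class TaggedLine
{
    public string Line { get; set; }
    public string ProjectName { get; set; }
    public string GroupName { get; set; }
}

public static class Step3Imperative
{
    // Assumption: a leading '#' disables the whole host entry.
    private static bool IsCommented(string line)
    {
        return line.TrimStart().StartsWith("#", StringComparison.Ordinal);
    }

    public static void PopulateCounts(
        Dictionary<string, List<ServerGroup>> projectDict,
        IEnumerable<TaggedLine> activeHostsFileData)
    {
        foreach (string project in projectDict.Keys)
        {
            foreach (ServerGroup serverGroup in projectDict[project])
            {
                foreach (TaggedLine item in activeHostsFileData)
                {
                    if (item.ProjectName == project && item.GroupName == serverGroup.Name)
                    {
                        if (IsCommented(item.Line))
                        {
                            serverGroup.DisabledCount++;
                        }
                        else
                        {
                            serverGroup.EnabledCount++;
                        }
                    }
                }
            }
        }
    }
}
```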

The Journey to LINQ

The code archive accompanying this article includes a series of files with the .linq extension, indicating they are LINQPad files (rather than Visual Studio files). LINQPad is a great sandbox and lightweight IDE for experimenting and exploring your code, making it an ideal accompaniment here. [4]

What Can You Gain From LINQ?

The code above resides in HostSwitcher-initial.linq. There is a succession of file suffixes (A through E) that take the above code from pure imperative code to pure LINQ code. The final, completely LINQ version resides in HostSwitcher-final.linq. Table 2 attempts to show the improvement quantitatively. The second column shows the progress, going from 0% LINQ to 100%. The description column summarizes the changes from the preceding file. The lines of code column is an imprecise measure of code complexity, yet cutting the line count in half certainly supports the conclusion that LINQ produces more concise code in this example. The loop count column is another simple measure of complexity. I use an abbreviated notation to show the number of singly-, doubly-, and triply-nested loops in the file. Again, the general trend scanning down the table is toward lower complexity. The final two columns indicate the number of supporting data classes and methods used; I include those here to show that you sometimes have to take a step back to move forward (i.e. B and C actually show increased complexity by that measure).

Table 2 Code Improvement due to “LINQ-ification”

Subsequent sections provide guidance on how I transformed this code to LINQ step by step.

Conversion Stage A

The most obvious place to start to think about applying LINQ is the triply-nested loop in step 3. Instead of iterating through the collection of tagged lines, start with the collection itself (activeHostsFileData). Then consider what you want to do to it: in this case, restrict it to the lines for the current project and server group then see how many are commented and uncommented. The second part of that is just two further restrictions to the same common base collection. The loop plus three conditionals (certainly a non-trivial chunk of code) is replaced by essentially two straightforward assignment statements. (The third line could be incorporated in the other two but it is pulled out separately to avoid duplication.) This is the code in HostSwitcher-A.linq:
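A sketch of this stage, under the same assumptions as before (a hypothetical TaggedLine container and a leading '#' meaning disabled), might read as follows. The innermost loop and its conditionals give way to one shared restriction plus two counting assignments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ServerGroup
{
    public string Name { get; set; }
    public int EnabledCount { get; set; }
    public int DisabledCount { get; set; }
}

// Hypothetical container for one decorated hosts file line.
public class TaggedLine
{
    public string Line { get; set; }
    public string ProjectName { get; set; }
    public string GroupName { get; set; }
}

public static class StageA
{
    private static bool IsCommented(string line)
    {
        return line.TrimStart().StartsWith("#", StringComparison.Ordinal);
    }

    public static void PopulateCounts(
        Dictionary<string, List<ServerGroup>> projectDict,
        List<TaggedLine> activeHostsFileData)
    {
        foreach (string project in projectDict.Keys)
        {
            foreach (ServerGroup serverGroup in projectDict[project])
            {
                // Common base: lines for the current project and server group.
                var matchingLines = activeHostsFileData.Where(item =>
                    item.ProjectName == project && item.GroupName == serverGroup.Name);

                serverGroup.EnabledCount = matchingLines.Count(item => !IsCommented(item.Line));
                serverGroup.DisabledCount = matchingLines.Count(item => IsCommented(item.Line));
            }
        }
    }
}
```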

Conversion Stage B

Candidate code for converting to LINQ usually involves a loop. But the code you just examined still has two more loops. So how would you apply LINQ conversion to eliminate the remaining two loops? Always start with the collection, which is projectDict in this case. What the two loops are really doing is operating on all the server groups in all the projects. LINQ has a special operator to flatten a two-level organization just like that into a single level. That is, we can use SelectMany to just get all the server groups, obviating the need for the two loops. projectDict.Keys provides a list of all the projects. Each corresponding value is a list of the associated server groups. Thus you can say this…

…to get a flat list of all the server groups. From this collection you next want to apply the same bit of code you just did in the previous section inside the double loop. It is cumbersome to try to massage a block of code inside a LINQ chain so just put the guts of that code into a separate method:

With this Tally method defined, you must invoke it for each item in the server group collection. To do this, you actually need to go outside LINQ to the ForEach method that List<T> provides (Array offers a static counterpart). This method lets you generate side effects, i.e. invoke other methods, easily. There is one minor compatibility issue: since this ForEach is an instance method of List<T> while the lingua franca of LINQ is IEnumerable<T>, you must use LINQ’s ToList operator to convert from one to the other. Now you have all the pieces to generate the fairly simple code to replace all the loops; this code is in HostSwitcher-B.linq:
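The pieces of this stage can be sketched together as shown below. The Tally signature, the TaggedLine container, and the use of the SelectMany overload with a result selector (so that each flattened server group still carries its project name) are my assumptions; the original HostSwitcher-B.linq may arrange this differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ServerGroup
{
    public string Name { get; set; }
    public int EnabledCount { get; set; }
    public int DisabledCount { get; set; }
}

// Hypothetical container for one decorated hosts file line.
public class TaggedLine
{
    public string Line { get; set; }
    public string ProjectName { get; set; }
    public string GroupName { get; set; }
}

public static class StageB
{
    private static bool IsCommented(string line)
    {
        return line.TrimStart().StartsWith("#", StringComparison.Ordinal);
    }

    // Counts enabled and disabled lines for one server group of one project.
    public static void Tally(string project, ServerGroup serverGroup,
                             IEnumerable<TaggedLine> activeHostsFileData)
    {
        var matchingLines = activeHostsFileData.Where(item =>
            item.ProjectName == project && item.GroupName == serverGroup.Name);
        serverGroup.EnabledCount = matchingLines.Count(item => !IsCommented(item.Line));
        serverGroup.DisabledCount = matchingLines.Count(item => IsCommented(item.Line));
    }

    public static void PopulateCounts(
        Dictionary<string, List<ServerGroup>> projectDict,
        IEnumerable<TaggedLine> activeHostsFileData)
    {
        projectDict
            // Flatten project -> groups into one sequence of (project, group) pairs.
            .SelectMany(entry => entry.Value,
                        (entry, serverGroup) => new { Project = entry.Key, Group = serverGroup })
            .ToList()   // ForEach is a List<T> method, not a LINQ operator
            .ForEach(pair => Tally(pair.Project, pair.Group, activeHostsFileData));
    }
}
```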

Conversion Stage C

For the next conversion, continue going upward in the original CreateMap method, now focusing on the inner of the two nested loops in step 2:

The above code checks whether the current server group is already in the list of server groups and, if not, adds it. All of that preparatory code can be replaced by a simple LINQ expression, using the Any operator to do the very same check without having to specify how to do it. This code is in HostSwitcher-C.linq:
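As a sketch (with the same hypothetical TaggedLine container and method names used for illustration throughout), the inner loop and its flag variable collapse into a single Any call:

```csharp
using System.Collections.Generic;
using System.Linq;

public class ServerGroup
{
    public string Name { get; set; }
    public int EnabledCount { get; set; }
    public int DisabledCount { get; set; }
}

// Hypothetical container for one decorated hosts file line.
public class TaggedLine
{
    public string Line { get; set; }
    public string ProjectName { get; set; }
    public string GroupName { get; set; }
}

public static class StageC
{
    public static Dictionary<string, List<ServerGroup>> BuildDictionary(
        IEnumerable<TaggedLine> activeHostsFileData)
    {
        var projectDict = new Dictionary<string, List<ServerGroup>>();
        foreach (TaggedLine item in activeHostsFileData)
        {
            List<ServerGroup> groups;
            if (!projectDict.TryGetValue(item.ProjectName, out groups))
            {
                groups = new List<ServerGroup>();
                projectDict.Add(item.ProjectName, groups);
            }

            // Any states the question ("is it already there?") without the loop.
            if (!groups.Any(existing => existing.Name == item.GroupName))
            {
                groups.Add(new ServerGroup { Name = item.GroupName });
            }
        }
        return projectDict;
    }
}
```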

Conversion Stage D

Continue toward the top of the original CreateMap method to step 1. You can eliminate a loop and a conditional with the most typical of LINQ forms, a from … where … select expression. Recall that the loop iterated through the collection, checked whether each line met certain criteria, then decorated the line and added it to a new collection. With LINQ, again start with the collection. Then just restrict it to the lines of interest and project it into the new collection. The let clause is a handy LINQ mechanism for introducing a temporary variable in the middle of the LINQ chain.
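A sketch of such a query follows. The meta-comment format (a trailing "# [Project:ServerGroup]"), its regular expression, and the TaggedLine container are assumptions for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container for one decorated hosts file line.
public class TaggedLine
{
    public string Line { get; set; }
    public string ProjectName { get; set; }
    public string GroupName { get; set; }
}

public static class StageD
{
    // Assumed meta-comment format: "... # [Project:ServerGroup]".
    private static readonly Regex MetaComment =
        new Regex(@"#\s*\[(?<project>[^:\]]+):(?<group>[^\]]+)\]");

    public static List<TaggedLine> Decorate(IEnumerable<string> hostsFileLines)
    {
        var activeHostsFileData =
            from line in hostsFileLines
            let match = MetaComment.Match(line)   // temporary variable, computed once per line
            where match.Success                   // keep only lines with a meta-comment
            select new TaggedLine
            {
                Line = line,
                ProjectName = match.Groups["project"].Value,
                GroupName = match.Groups["group"].Value
            };
        return activeHostsFileData.ToList();
    }
}
```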

You will find the code for this in HostSwitcher-D.linq. This code also includes a retrograde step: The double loop in step 3, last seen in HostSwitcher-A.linq, is back. I did this primarily because it is a better lead-in to the next stage. But notice the trade-off indicated back in Table 2. By removing LINQ you can eliminate the HostDetail support class (really just a multiple-field container) and the Tally support method. So neither implementation is decidedly a better piece of code.

Conversion Stage E

For this penultimate stage, the goal is to reduce the complexity of having one portion of code to build the dictionary (step 2) and a second portion to populate the enabled/disabled counts within each dictionary entry (step 3). The code below (from HostSwitcher-E.linq) uses a single loop to go through the list of relevant host file lines. But this list (in activeHostsFileData) is different this time. Suppose that instead of each entry in the source list reflecting a single line in the hosts file, it now reflects all the information for a server group, however many lines that entails. Given that, you can then directly create a server group with its EnabledCount and DisabledCount and add that to the dictionary entry for the appropriate project; no need to go back through the data to find the counts and associate them with the right groups.

The other portion of this stage, then, is to create the revised activeHostsFileData collection. Compare the code here to that in stage D. The project and server group definitions have been moved out of the select clause and into temporary variables earlier on. This lets you perform a grouping operation to get the aggregated data required. The final projection (the select expression) now contains all the lines in the group.
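Both portions of this stage can be sketched as below: a grouping query that yields one element per project and server group combination, followed by a single loop that builds the dictionary. The regular expression, the assumed "# [Project:ServerGroup]" notation, and the names are illustrative assumptions, not the contents of HostSwitcher-E.linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class ServerGroup
{
    public string Name { get; set; }
    public int EnabledCount { get; set; }
    public int DisabledCount { get; set; }
}

public static class StageE
{
    // Assumed meta-comment format: "... # [Project:ServerGroup]".
    private static readonly Regex MetaComment =
        new Regex(@"#\s*\[(?<project>[^:\]]+):(?<group>[^\]]+)\]");

    private static bool IsCommented(string line)
    {
        return line.TrimStart().StartsWith("#", StringComparison.Ordinal);
    }

    public static Dictionary<string, List<ServerGroup>> CreateMap(IEnumerable<string> hostsFileLines)
    {
        // One element per (project, server group), carrying all of its lines.
        var activeHostsFileData =
            from line in hostsFileLines
            let match = MetaComment.Match(line)
            where match.Success
            let project = match.Groups["project"].Value
            let serverGroup = match.Groups["group"].Value
            group line by new { project, serverGroup } into g
            select new
            {
                Project = g.Key.project,
                GroupName = g.Key.serverGroup,
                Lines = g.ToList()
            };

        var projectDict = new Dictionary<string, List<ServerGroup>>();
        foreach (var entry in activeHostsFileData)
        {
            List<ServerGroup> groups;
            if (!projectDict.TryGetValue(entry.Project, out groups))
            {
                groups = new List<ServerGroup>();
                projectDict.Add(entry.Project, groups);
            }
            // Counts are available immediately; no second pass over the data.
            groups.Add(new ServerGroup
            {
                Name = entry.GroupName,
                EnabledCount = entry.Lines.Count(l => !IsCommented(l)),
                DisabledCount = entry.Lines.Count(l => IsCommented(l))
            });
        }
        return projectDict;
    }
}
```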

Final Conversion

If you look at the final code (HostSwitcher-final.linq) it is much more challenging to map it back to determine what changed from the previous stage. Rather than say you need to take a leap of faith :-), what I did at this point was start over with a much clearer picture in my head of what steps were involved to transform the lines in the hosts file into decorated dictionary entries. I outline my thought process below.

Start aggressively by stating that you want to do something to produce a dictionary from scratch:

You should know what comes next, namely starting with the collection:

You need to add some information to each line, namely the regular expression match containing information about the project and server group:

With the regular expression in hand it is trivial to restrict the collection to the relevant lines:

Now add on the other bits of information needed, the project and server group, and discard the regular expression match, which is no longer needed, by projecting the elements in the collection to a new form once again:

At this point, you have a collection where each element is relevant and each element has everything needed to create the dictionary. In fact, you have already seen this collection; refer back to Figure 1. The goal is to create the finished dictionary in Figure 3.

The easiest way to create a dictionary is to use LINQ’s native support for it, in the guise of the ToDictionary operator. To do this you need to specify a key and a value for each entry in the dictionary. The key is just the project name, as you have seen. The value is a list of server groups. Here you need to introduce some groups. First, group the input lines by common project and server group:

This results in groups where the key is the unique project and server group combination and the group contents is the list of lines that have that combination. Figure 4, left side, shows the first three of five such groups. This grouping does not quite yield what you need yet, so group again:

That statement groups by projects alone because that is what you need for the dictionary keys. Note the special Key property available for a group that lets you access the project values. There are only two distinct projects in the sample data, thus this results in just two groups. Figure 4, right side, shows the first of these. Observe how the two levels of grouping are arranged.

Figure 4 Intermediate data in the LINQ chain: at left, after the first GroupBy and at right, after the second GroupBy

With these groupings in place, you now have the material needed to pass to the ToDictionary method:

ToDictionary takes two parameters: a lambda expression for the key and another for the value. The key is just the project, which is available in the Key property of the latest group. To generate the value, you reference the group data (remember that this is a collection), so the Select operator is going to generate a list of elements, one for each member of the collection. Each element is a ServerGroup because of the new ServerGroup expression. A ServerGroup, as you have seen, has three properties to specify. The Name comes from the inner group’s Key property. The EnabledCount and DisabledCount come from processing the inner group’s collection.

Load this finished code (HostSwitcher-final.linq) in LINQPad to see how it works. Between each pair of LINQ operators is a Dump() call that is commented out. If you uncomment any or all of those you can see the intermediate data from start to finish. And, for completeness, here is the CreateMap method composed of the one line of LINQ code that you just dissected:
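The chain just dissected can be sketched as a single statement, shown below. This is a reconstruction under stated assumptions rather than the contents of HostSwitcher-final.linq: the "# [Project:ServerGroup]" notation, the regular expression, and the IsCommented helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class ServerGroup
{
    public string Name { get; set; }
    public int EnabledCount { get; set; }
    public int DisabledCount { get; set; }
}

public static class FinalStage
{
    // Assumed meta-comment format: "... # [Project:ServerGroup]".
    private static readonly Regex MetaComment =
        new Regex(@"#\s*\[(?<project>[^:\]]+):(?<group>[^\]]+)\]");

    private static bool IsCommented(string line)
    {
        return line.TrimStart().StartsWith("#", StringComparison.Ordinal);
    }

    public static Dictionary<string, List<ServerGroup>> CreateMap(IEnumerable<string> hostsFileLines)
    {
        return hostsFileLines
            // Attach the regular expression match to each line.
            .Select(line => new { line, match = MetaComment.Match(line) })
            // Keep only the relevant lines.
            .Where(x => x.match.Success)
            // Replace the match with the project and server group it captured.
            .Select(x => new
            {
                x.line,
                project = x.match.Groups["project"].Value,
                serverGroup = x.match.Groups["group"].Value
            })
            // First grouping: lines by project and server group combination.
            .GroupBy(x => new { x.project, x.serverGroup }, x => x.line)
            // Second grouping: those groups by project alone.
            .GroupBy(g => g.Key.project)
            .ToDictionary(
                projectGroup => projectGroup.Key,
                projectGroup => projectGroup
                    .Select(g => new ServerGroup
                    {
                        Name = g.Key.serverGroup,
                        EnabledCount = g.Count(l => !IsCommented(l)),
                        DisabledCount = g.Count(l => IsCommented(l))
                    })
                    .ToList());
    }
}
```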

Tools to Help Learn LINQ

By this point you should have some feel for turning code into LINQ. You have seen that there are many different scenarios to consider and thus LINQ operators to apply; getting a good grasp of them can still take some effort. Here are a couple of tips to give you a boost in that endeavor.

Resharper Converts Your Code to LINQ

Well known and respected for its refactoring capability, code quality analysis, navigational aids, and more, the Visual Studio power tool Resharper also has a special ability with respect to LINQ: it can automatically convert a loop into a LINQ expression! Just like other refactorings and suggested code improvements, conversion to LINQ appears on Resharper’s list of actions in the appropriate context. To find such candidate code, first direct Resharper to analyze your solution or project (Resharper >> Inspect >> Code Issues in Current Project). This will populate a Visual Studio window entitled Inspection Results. Press the Filter button (Figure 5, top) to open the Filter Issues dialog. In the dialog, press the Uncheck All button to deselect all choices then select the two LINQ-specific choices under Language Usage Opportunities (Figure 5, bottom). Close out the dialog and now the Inspection Results window lists all and only the LINQ-convertible loops. As with any find or inspection window in Visual Studio, just double click on any leaf node in the issues tree to open that line in the editor.

Figure 5 Using Resharper to Find Candidate Code for LINQ Conversion

Resharper does one more trick: it can convert an expression using LINQ query syntax to lambda syntax! Unfortunately, these code candidates do not appear in the inspection window. So though you cannot jump to these pieces of code, it is fairly easy to recognize a LINQ query expression, since it looks quite different than other bits of code. Once you land on it, the choice to convert to lambda syntax will appear on the context menu.

There is a tiny Visual Studio solution included in the code archive accompanying this article that gives you an example where you can try using Resharper to convert a plain loop to LINQ, and another where you can convert a LINQ expression written with query syntax into method syntax.

LINQPad Shows the SQL generated by your LINQ

Hearkening back to LINQPad, it has a neat ability when you are working with LINQ to SQL. Once you write a LINQ query and execute it, the output window in LINQPad offers a few buttons to show some alternate output. One is just labeled SQL. Pressing that reveals the SQL code that is generated from your LINQ query. This lets you see how LINQ queries map back to pure SQL queries. (Too bad it cannot go the other direction!)

Conclusion

One of the best ways to learn about a technology is by example; in this case, by reading other people’s code. The chances are that you could typically do this quite effectively on your own, but my goal with this article was to make the experience even more productive for you: to provide not just code to read but a guide on what to take away from it. LINQ is a powerful technology that, as I stated earlier, lends itself well to incremental use. I have presented a cross-section of LINQ operators to stimulate your creative juices; as with most technologies, though, it just scratches the surface. But with this introduction you should have a good foundation for further LINQ exploration.

Footnotes

[1] Comparing learning LINQ to learning C# is unfair: LINQ is just one element of a language that you presumably already know, while learning the C# language encompasses a more fundamental basket of concepts. A more apt comparison might be LINQ vs. WPF. Both are “add ons” to C# but before you can write a line of code in WPF you need to learn a great deal, indeed!

[2] Hence my whimsical title of LINQ Lycanthropy for this article: you get to watch live, before your very eyes, as the code changes from plain ol’ C# code to LINQ!

[3] A lambda expression is simply an anonymous function. Any lambda expression can therefore be rewritten as an equivalent conventional named method with the same parameters and body.
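As a small illustration of that equivalence, the sketch below pairs a lambda with a named method. The IsEven example and the LambdaDemo class are hypothetical, chosen only for brevity.

```csharp
using System;

public static class LambdaDemo
{
    // Lambda form: an anonymous function stored in a delegate.
    public static readonly Func<int, bool> IsEvenLambda = n => n % 2 == 0;

    // Conventional form: a named method with the same parameter and body.
    public static bool IsEven(int n)
    {
        return n % 2 == 0;
    }
}
```

Either form can be passed to a LINQ operator such as Where, since the method name converts to the same delegate type.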

Lambda expressions provide a handy shortcut syntax for simple methods: not only is the method itself reduced but you eliminate a separate method call and method definition.

[4] LINQPad is a free utility created by Joe Albahari. It supports C#, VB, F#, and SQL, and lets you test and experiment with code rapidly, without the overhead of Visual Studio projects. You can execute not just programs but one or more individual statements or even just an expression. My recent article LINQ Secrets Revealed: Chaining and Debugging provides a great introduction to using LINQPad and even bringing its great data visualization capabilities back into Visual Studio.