Using LINQ Lambda Expressions to Design Customizable Generic Components

LINQ makes code easier to write and maintain by abstracting the data source. It provides a uniform way to handle widely diverse data structures within an application. LINQ's Lambda syntax is clever enough even to allow you to create generic building blocks with hooks into which you can inject arbitrary functions. Michael Sorens explains, and demonstrates with examples.

This article continues my occasional meanderings into dissecting and examining what is essentially a single line of code, while at the same time providing a wealth of context to illustrate how and why to apply the techniques from that line of code in a real application[1].

In this case, while working on an application that needed a component for doing file selection by user-specified masks, I came upon a thread at StackOverflow with a very basic implementation. I added my own contribution and spruced it up in the forum thread, but I was not satisfied. So I went further and packaged it into a user control to perform the file masking – a component to drag into Visual Studio and hook up with just a couple of lines of code.

I had a unique requirement, though: besides allowing the user to enter one or more masks (e.g. “User*Items”, “*.sql”, etc.) I also wanted to only select files satisfying additional, arbitrary constraints (in my case, that each file exist in two separate directories).  To allow this control to be completely generic and have additional customizable constraints, I introduced a callback mechanism, which is commonly done by passing an object that implements a particular interface. With LINQ, though, one can do the same thing with just a bit more panache and with a simpler, cleaner result than the more traditional approach.

I’ll start with a brief digression into some key LINQ concepts, then present and discuss my one line of code, which uses a doubly-nested LINQ construct plus the lambda expression callback mechanism. This makes for quite a springboard for discussing how a completely generic implementation becomes highly customized by how you choose to use it. The final third or so of the article will take you from theory to practice, illustrating how to hook up this user control and providing two demo applications to guide you.

A Very Brief Introduction to LINQ

Language Integrated Query, or LINQ, is a Microsoft technology introduced in the .NET 3.5 framework. LINQ lets you “…write queries against strongly typed collections of objects.” That is, it lets you treat a variety of data – not just databases – as first class data sources. LINQ comes in several flavors to allow you to talk to diverse data sources, including databases, XML streams, and even objects in your program (lists, arrays, dictionaries, etc.). LINQ provides several powerful advantages over more traditional approaches: more concise code; filter, group, and ordering capabilities with little coding; portability due to a common query language for all data sources; and strong typing.

A LINQ query operation consists of just three steps: obtain a data source, create a query, and execute the query. This canonical example from Microsoft’s Introduction to LINQ Queries page clearly illustrates the three steps:
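A minimal sketch of those three steps, modelled on the Microsoft example (the class name and the exact output formatting are assumptions), might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntroToLinq
{
    public static void Main()
    {
        // 1. Obtain a data source.
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6 };

        // 2. Create the query. Nothing is evaluated yet.
        IEnumerable<int> evenNumQuery =
            from num in numbers
            where (num % 2) == 0
            select num;

        // 3. Execute the query by enumerating it.
        foreach (int num in evenNumQuery)
        {
            Console.Write("{0} ", num);   // writes the even values: 0 2 4 6
        }
    }
}
```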

In this example, the data source is simply an array of integers, and the query – with its use of keywords like from, where, and select – resembles a SQL query, though the ordering of clauses is different. It is significant to note that defining the query is separate from its execution. Not until the foreach loop does the data source return any results (which in this case is just a list of the even numbers from the array).
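To make that separation between definition and execution concrete, here is a small sketch (the class and method names are my own, purely illustrative). It changes the data source after the query has been defined but before it is enumerated, and the change shows up in the results:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6 };

        // Defining the query does not read the array.
        var evenNumQuery =
            from num in numbers
            where (num % 2) == 0
            select num;

        // Replace the odd value 1 with the even value 8 before executing.
        numbers[1] = 8;

        // Execution happens here, so the query sees the updated array.
        return evenNumQuery.ToList();   // 0, 8, 2, 4, 6
    }

    public static void Main()
    {
        foreach (int num in Run())
        {
            Console.Write("{0} ", num);
        }
    }
}
```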

Examples without Overhead

If you like to try the examples as you encounter them, here is a tip to save you from the overhead of creating a Visual Studio project and all it entails:

  • Download LINQPad, a LINQ and C# sandbox/IDE to evaluate expressions, statements, or entire programs. It includes powerful output visualization, particularly useful with LINQ.
  • Instead of using Console.Write calls, use the LINQPad Dump method.
  • To run an entire program – as with the code in Listing 1 – simply strip off the class brackets and paste everything else into LINQPad. Here is the earlier example trimmed down for LINQPad. Set the Language selector in LINQPad to C# Program (rather than the default C# Expression) and you are immediately ready to execute:

Two Faces of LINQ

The query in the above example is written in a notation called query syntax (I suppose because it’s reminiscent of a true SQL query), but there is a second notational form called lambda syntax. The two forms are exactly equivalent (where they overlap), and performance is also exactly the same because, during compilation, query syntax expressions are converted to lambda syntax internally. However, the lambda syntax is richer, particularly in C#.

See Query Expression Syntax for Standard Query Operators for a Microsoft reference chart that shows the subset of methods that may be invoked in query syntax in both C# and Visual Basic. Think of that as the theoretical reference, while Brad Vincent turns that information into the applied reference with his quick reference chart, showing side-by-side examples of query syntax and lambda syntax for comparable operations. Another useful Microsoft reference page is Query Syntax vs. Method Syntax (method syntax is another name for lambda syntax).

Here is the LINQ query from the previous example shown side-by-side in query syntax and in lambda syntax, matching up elements from one to the other:

Query Syntax

var evenNumQuery =
    from num in numbers
    where (num % 2) == 0
    select num;

Lambda Syntax

var evenNumQuery =
    numbers
    .Where(num => (num % 2) == 0)
    .Select(num => num);

Table 1. A comparison of code in equivalent Query and Lambda Syntax

However, if an identity transformation is all that is needed in the Select() method, it may be safely omitted, shortening the lambda syntax in this example to just this:
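A sketch of the shortened form, wrapped in a small program so that it can be run as-is (the wrapper names are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortLambdaDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6 };

        // Where() alone is enough; an identity Select() would add nothing.
        var evenNumQuery = numbers.Where(num => (num % 2) == 0);

        return evenNumQuery.ToList();   // 0, 2, 4, 6
    }

    public static void Main()
    {
        foreach (int num in Run())
        {
            Console.Write("{0} ", num);
        }
    }
}
```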

When should you use query syntax vs. lambda syntax? It is largely a matter of preference. Lambda syntax is often more concise and so makes for more readable code, but multiple joins are much cleaner with query syntax. Lambda syntax also provides a superset over query syntax, so some tasks require lambda syntax. If you’re feeling adventurous, you can even mix the two syntaxes, but this should be done judiciously (e.g. adding a Single() to the end of an expression using query syntax).
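As an illustration of mixing the two, this sketch writes the filter in query syntax and then appends the Single() extension method, which has no query-syntax keyword. The data and names are assumptions chosen so that exactly one element matches (Single() throws otherwise):

```csharp
using System;
using System.Linq;

public static class MixedSyntaxDemo
{
    public static int Run()
    {
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6 };

        // Query syntax in parentheses, lambda-style method call on the end.
        int onlyMatch =
            (from num in numbers
             where num > 5
             select num)
            .Single();

        return onlyMatch;   // 6
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```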

Lambda Expressions

The second form of LINQ syntax is called lambda syntax because of its use of lambda expressions. Though it sounds intimidating the first time you hear about it, a lambda expression is simply a function written in a peculiar syntax to facilitate writing it as an expression. This allows it to be placed into a larger expression, making it well-suited for functional programming. In the example above, the first lambda expression is:
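That expression is the argument passed to Where(): num => (num % 2) == 0. The sketch below pulls it out on its own by storing it in a variable; the Func<int, bool> type used to hold it is one of the built-in delegate types covered later in this article, and the names IsEven and FirstLambdaDemo are assumptions.

```csharp
using System;

public static class FirstLambdaDemo
{
    // One input (num) on the left of the arrow, one Boolean output on the right.
    public static readonly Func<int, bool> IsEven = num => (num % 2) == 0;

    public static void Main()
    {
        Console.WriteLine(IsEven(4));   // True
        Console.WriteLine(IsEven(7));   // False
    }
}
```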

The equals-greater than token is supposed to visually represent an arrow, and may be read as “goes to”, “maps to”, “becomes” or “such that”; basically, this arrow separates inputs from outputs. Here we have a single input, num, and a single output, the Boolean expression (num % 2) == 0. Lambda expressions may have one or more inputs, but they always have a single output. This lambda expression states that you want a num such that num is even, since only even numbers modulo 2 are equal to zero. You’re using a lambda expression when calling a method that takes a delegate as a parameter, and the Where() and Select() methods used above are two such methods. (They are just a couple of examples of the many LINQ extension methods that take delegates.)

Distilling Delegates Down to Lambda Expressions

The example used so far shows how to supply a lambda expression, so this section will discuss how to consume it. The code sample below shows the use of delegates distilled to its simplest case. First you define a signature, in this case a method that takes an integer argument and returns an integer result, performing some mapping on it. Next, you create a method that conforms to that signature, in this case one that squares the integer value (SquareIt). Then set up an instance of the delegate to point to the SquareIt method so that when you call the delegate you are actually calling the referenced method; passing an input value of 5, for example, yields an output value of 25.
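A minimal sketch of that simplest case follows. SquareIt and the variable name myDelegate come from the text; the delegate type name MapDelegate and the wrapper class are assumptions.

```csharp
using System;

public static class ExplicitDelegateDemo
{
    // 1. The signature: takes an int, returns an int.
    public delegate int MapDelegate(int x);

    // 2. A method that conforms to the signature.
    public static int SquareIt(int x)
    {
        return x * x;
    }

    public static int Run(int input)
    {
        // 3. A delegate instance pointing at SquareIt; calling it calls SquareIt.
        MapDelegate myDelegate = new MapDelegate(SquareIt);
        return myDelegate(input);
    }

    public static void Main()
    {
        Console.WriteLine(Run(5));   // 25
    }
}
```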

With the advent of C# 2.0 you could condense the code by using an anonymous delegate right in place:
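Sketched under the same assumptions as before (MapDelegate is a name of my choosing), the anonymous form does away with the separate SquareIt method:

```csharp
using System;

public static class AnonymousDelegateDemo
{
    public delegate int MapDelegate(int x);

    public static int Run(int input)
    {
        // The method body is written in place; no named method is needed.
        MapDelegate myDelegate = delegate(int x) { return x * x; };
        return myDelegate(input);
    }

    public static void Main()
    {
        Console.WriteLine(Run(5));   // 25
    }
}
```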

And with C# 3.0 you could replace the anonymous delegate with a lambda expression – recall that a lambda expression may be used wherever a delegate may be used – yielding a much more concise and arguably clearer bit of code:
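The same sketch with a lambda expression in place of the anonymous delegate (again, MapDelegate is an assumed name):

```csharp
using System;

public static class LambdaDelegateDemo
{
    public delegate int MapDelegate(int x);

    public static int Run(int input)
    {
        // The compiler infers the parameter and return types from MapDelegate.
        MapDelegate myDelegate = x => x * x;
        return myDelegate(input);
    }

    public static void Main()
    {
        Console.WriteLine(Run(5));   // 25
    }
}
```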

(Thanks to Eric White for his blog entry on lambda expressions, from which I adapted the above syntax progression.)

The code has shrunk considerably, but you can do better still! In this latest rendition, there are two separate chunks of code: the delegate declaration and the delegate usage. A key reason for using anonymous delegates in the first place is to have it all right there, without needing to create any other pieces and then find useful places to store them. The .NET framework thoughtfully provides a set of built-in generic delegates to help you do just that. In .NET 3.5 there is a set of generic delegates for up to four arguments (later framework versions extend this to sixteen), both for functions that return a value (Func<…>) and functions that do not (Action<…>). Here is the previous, concise code snippet made even more concise by using the built-in Func delegate that takes one argument and returns a value:
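With the built-in type, the custom delegate declaration disappears altogether. A sketch (the wrapper class is an assumption):

```csharp
using System;

public static class FuncDelegateDemo
{
    public static int Run(int input)
    {
        // Func<int, int>: one int argument, int return value.
        Func<int, int> myDelegate = x => x * x;
        return myDelegate(input);
    }

    public static void Main()
    {
        Console.WriteLine(Run(5));   // 25
    }
}
```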

In this example, the input is an int and so is the return value, so which is which? By convention, the last element in the list is always the type of the return value. If the delegate returned a string instead, the type declaration above would have been Func<int, string> myDelegate… . Here is a list of the built-in function delegate declarations:
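The shapes of those declarations are summarized in the comments of the sketch below, followed by a few illustrative uses. The field names (Zero, Describe, Add, Print) are assumptions; the delegate types themselves live in the System namespace.

```csharp
using System;

public static class FuncShapesDemo
{
    // Shapes of the built-in function delegates (already declared by the
    // framework, so shown here only as comments):
    //   TResult Func<TResult>()
    //   TResult Func<T, TResult>(T arg)
    //   TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2)
    //   TResult Func<T1, T2, T3, TResult>(T1 arg1, T2 arg2, T3 arg3)
    //   TResult Func<T1, T2, T3, T4, TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)

    // The last type argument is always the return type.
    public static readonly Func<int> Zero = () => 0;
    public static readonly Func<int, string> Describe = n => "n=" + n;
    public static readonly Func<int, int, int> Add = (a, b) => a + b;

    // Action delegates have the same shapes but return nothing.
    public static readonly Action<string> Print = s => Console.WriteLine(s);

    public static void Main()
    {
        Print(Describe(Add(Zero(), 5)));   // n=5
    }
}
```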

The Action delegates are analogous but without return values, and as long as you have no more than four arguments, you can use those to save yourself from the explicit creation of a delegate. See The Built-In Generic Delegate Declarations for more details, and for reference, the Func<T, TResult> delegate (just to pick the one used in the example) is documented here on MSDN.

If you are creating an application or libraries for your own use, I would always recommend using the built-in delegates, but I have found a circumstance where it was better not to use them – in my open-source libraries. These libraries are for general consumption, and to make them as broadly applicable as possible, I decided when .NET 3.5 was very new to compile the libraries to .NET 2.0 using the LINQBridge library[2]. I still think that there is enough corporate inertia behind .NET 2.0, so I continue to support it.

If I had used the built-in delegates in my libraries, then when you came along and wanted to use them, Visual Studio would have required you to add a reference to the LINQBridge.dll in your project, whether you compiled under .NET 2.0 or 3.5. I consider that unacceptable, as it should be transparent to you, the application developer. Defining my own delegates was the way around this; your application will still pull in LINQBridge.dll when it compiles (along with several other DLLs that are dependencies in my libraries) but it will be totally transparent to you.

For further study, you may wish to peruse Anonymous Methods or Lambda Expressions in Microsoft’s C# Programming Guide.

An Interesting Line of Code

Now that you understand lambda expressions and how to use delegates, the one line of code for file masking should be straightforward (Listing 1), and can execute immediately in LINQPad if you want to try it. The declarations at the top include the familiar delegate declaration and definition, and the Main method contains the one line of code to be discussed. The sample concludes with a support method (FitsMask) that returns a Boolean value indicating whether the supplied file name qualifies for inclusion based on the supplied file mask.

Listing 1. File Mask Demo Program
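The sketch below follows the description of the listing given here, but treat it as an approximation rather than the listing itself. The delegate type name RestrictionDelegate, the sample mask, directory and date, the separator strings, and the body of FitsMask are all assumptions; it is also written as an ordinary console program rather than a LINQPad script.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileMaskDemo
{
    // Signature for the optional, caller-supplied restriction.
    public delegate bool RestrictionDelegate(string fileName);

    // Assumed sample values; change them to suit your machine.
    public static RestrictionDelegate RestrictionLambda =
        f => File.GetLastWriteTime(f) > new DateTime(2009, 5, 1);
    public static string SourceDirectory = Path.GetTempPath();
    public static string Mask = "*.sql, *.txt";

    // Assumed set of separators allowed between masks.
    private static readonly string[] Separators = { "\r\n", "\n", ",", "|", " " };

    public static List<string> GetMatches()
    {
        // The "one line": outer query in query syntax, inner Any() in lambda syntax.
        var list =
            from f in Directory.GetFiles(SourceDirectory)
            where Mask.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                      .Any(fileMask => FitsMask(Path.GetFileName(f), fileMask))
                  && (RestrictionLambda == null || RestrictionLambda(f))
            orderby Path.GetFileName(f)
            select Path.GetFileName(f);
        return list.ToList();
    }

    // True if fileName matches the mask, where * stands for any run of
    // characters and ? for any single character (case-insensitive).
    public static bool FitsMask(string fileName, string fileMask)
    {
        string pattern = "^" + Regex.Escape(fileMask)
                                    .Replace(@"\*", ".*")
                                    .Replace(@"\?", ".") + "$";
        return Regex.IsMatch(fileName, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        foreach (string name in GetMatches())
        {
            Console.WriteLine(name);
        }
    }
}
```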

Figure 1 zooms in on the key portion of code, highlighting the doubly-nested LINQ queries for clarity. There are a number of points to notice here:

  • The outer LINQ query uses query syntax while the inner query uses lambda syntax. Mixing them with these simple constructs does not add confusion or complexity. The inner query must use lambda syntax, since the Any extension method does not have an equivalent in query syntax form. The outer query could be written either way and is about as complex in either form.
  • The inner query is the single Any extension method call; the Split in front of it just converts the input (a string stored in Mask) into an IEnumerable<string> collection, allowing you to send in multiple masks separated by any combination of the separator strings given.
  • The Any method takes a delegate argument; here it is receiving a lambda expression that answers the question: does there exist any file mask such that the base file name satisfies the mask?
  • The outer query consists of a simple from…where…orderby…select sequence.
  • The outer query returns a list (sorted by file name) of all files in the specified source directory where the file name matches at least one of the specified file masks and where the file satisfies the arbitrary, user-supplied RestrictionLambda.
  • The user-supplied RestrictionLambda is optional. If it is never defined, then the gatekeeper expression (the check for null) avoids attempting to execute it, thus avoiding a cataclysm of, well, minor proportions.

Figure 1. One Line of Code

This single line of functional programming code returns a list of files from a specified directory matching one or more file masks and satisfying an additional, unspecified constraint, supplied by the calling application at runtime.

C# Injection or Dynamic C#?

You may be familiar with SQL injection, a technique that exploits an oft-found vulnerability, allowing an attacker to sneak in arbitrary SQL statements through application code to examine, or even corrupt, a system. Needless to say, SQL injection is a very bad thing. On the other hand, consider dynamic SQL (also referred to as embedded SQL), a technique where one embeds SQL statements into application code to achieve specific designs. Sounds rather similar to SQL injection, no? If used judiciously, dynamic SQL is a good thing. Used carelessly, it is indeed a well-oiled path to SQL injection.

Lambda expressions are analogous in the C# universe: they let you embed arbitrary C# code into other C# code. Unlike SQL, though, the arbitrary code is supplied at compile time – the user does not have the ability to send a string of code and thus be a potential source of attack. (It works with SQL because SQL commands are still just strings at runtime.) So, with C# you have the good part of injection without the bad. Typically, you have a library routine written in such a way as to accept code injection through lambda expressions, and you have your main application which invokes that library, passing in its particular code fragments to do what it wants done. Let’s get into the details of that process.

Customization with Lambda Expressions

Though the one line of code in Figure 1 contains other interesting aspects of LINQ as already discussed, the focus of this article is the innocuous call to the RestrictionLambda method highlighted in red. Listing 1 has shown this as top-level application code, but in reality the code is in a library containing a general-purpose file masking control (called, not too surprisingly, the FileMaskControl). Without the RestrictionLambda call, the code produces a list of files matching one or more supplied masks. Since the set of masks is determined by the user, the routine already allows some custom behavior in a sense, but within the narrow confines of file name patterns. Now add the RestrictionLambda call, and suddenly the generic code is highly customizable, limited only by what you can frame within the bounds of a lambda expression (which is hardly a limit at all).

Figure 2 illustrates this functionality. Start with the entire contents of some selected directory (frame 1). As you move to the second frame, apply a file mask to filter the original list. The figure shows two masks applied: “Conn*” and “*Data*”. File masks allow you to select only by naming patterns. The lambda expression filter (applied in the third frame) lets you select by arbitrary criteria; in this case with the same date filter you have seen previously.

Figure 2. Visualizing The Two-Stage Filtering Process

The shaded files (and more that are scrolled off the panel) in the leftmost frame survive the file mask filter into the middle frame. The shaded files in the middle successfully pass the added restriction imposed by the lambda expression to end up in the rightmost frame.

You can define RestrictionLambda to be any lambda function that satisfies the signature required by the code; recall that the delegate declaration specifies the signature. The delegate in Listing 1, repeated below, specifies that the lambda expression must take a single input (a file name) and return a Boolean output where true indicates the file meets the specified criteria, and false indicates the file fails to meet the criteria:
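In sketch form, the declaration and one conforming lambda look like this. The type name RestrictionDelegate and the SqlOnly example are assumptions; only the shape (string in, bool out) is taken from the text.

```csharp
using System;

public static class RestrictionSignatureDemo
{
    // One input (a file name), one Boolean output.
    public delegate bool RestrictionDelegate(string fileName);

    // Any lambda that takes a string and returns a bool fits the signature.
    public static readonly RestrictionDelegate SqlOnly =
        f => f.EndsWith(".sql", StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine(SqlOnly("report.SQL"));   // True
        Console.WriteLine(SqlOnly("notes.txt"));    // False
    }
}
```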

In the context of the FileMaskControl, any lambda expression you specify should be relevant to some characteristic of a file. Here are just a few examples of additional restrictions (on top of those imposed by the user-specified file masks that are built into the control). Not surprisingly, they make copious use of File, Path, and FileInfo methods.

Restrict files to those after a certain date

In common parlance: file f, such that the modification time of f is later than some specified date.

Include only files that are in a separate list of master files

Translation: file f, such that a file of the same name also exists in a specified directory.

Ignore files marked as deprecated

Translation: file f, such that its name does not contain the word “deprecated”, ignoring case.

Require files to be unlocked

Translation: file f, where its read-only attribute is not enabled.

Restrict files to those smaller than a certain size

Translation: file f, where its length is less than a specified size.
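The five restrictions above can be sketched as lambda expressions along the following lines. These are illustrative versions written to match the plain-English translations; the delegate type name, the helper method names, and the parameter names are assumptions.

```csharp
using System;
using System.IO;

public static class SampleRestrictions
{
    public delegate bool RestrictionDelegate(string fileName);

    // Files modified after a given date.
    public static RestrictionDelegate NewerThan(DateTime cutoff)
    {
        return f => File.GetLastWriteTime(f) > cutoff;
    }

    // Files whose name also exists in a master directory.
    public static RestrictionDelegate AlsoIn(string masterDirectory)
    {
        return f => File.Exists(Path.Combine(masterDirectory, Path.GetFileName(f)));
    }

    // Files whose name does not contain "deprecated", ignoring case.
    public static readonly RestrictionDelegate NotDeprecated =
        f => Path.GetFileName(f).IndexOf("deprecated", StringComparison.OrdinalIgnoreCase) < 0;

    // Files whose read-only attribute is not set.
    public static readonly RestrictionDelegate NotReadOnly =
        f => !new FileInfo(f).IsReadOnly;

    // Files smaller than a given size in bytes.
    public static RestrictionDelegate SmallerThan(long maxBytes)
    {
        return f => new FileInfo(f).Length < maxBytes;
    }

    public static void Main()
    {
        string temp = Path.GetTempFileName();   // creates an empty temporary file
        try
        {
            RestrictionDelegate[] all =
            {
                NewerThan(new DateTime(2000, 1, 1)),
                AlsoIn(Path.GetDirectoryName(temp)),
                NotDeprecated,
                NotReadOnly,
                SmallerThan(1024)
            };
            foreach (RestrictionDelegate restriction in all)
            {
                Console.WriteLine(restriction(temp));
            }
        }
        finally
        {
            File.Delete(temp);
        }
    }
}
```

Because each one has the same signature, any of them can be assigned to RestrictionLambda without touching the library code.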

Putting Theory into Practice

You’ve now had a complete sample program to experiment with, plus several lambda expressions to inject into the code. But folding the concepts into a practical context or real-world application should help solidify understanding.

The code in Listing 1 comprises the bulk of the FileMaskControl available from my open source libraries[3]. The API for the control is available here. Visually, the control consists of one input and two output components (Figure 3, right-hand side). The input component is a type-in box for file masks bound to the Mask property. The output components include a list box displaying the names of the matched files bound to the FileList property, and a label displaying the count of matched files.

The left side of Figure 3 shows the quite simple class diagram for the control. Besides the constructor, it has a single public method (UpdateFileMatches) that refreshes the output components on demand. Of the seven public properties, five of them have visual correlations as indicated. The remaining two non-visual properties specify the directory to search and the familiar RestrictionLambda lambda expression.

Figure 3. The FileMask Control

The left side shows the details of the control in a class diagram, relating the appropriate parts to the visual representation of the control on the right. Using this control in your own application is quite simple:

  • Download my open source libraries[3] and unzip.
  • In Visual Studio, add the control from the CleanCode.GeneralComponents.dll library to your toolbox in the visual designer (right-click in the toolbox and select Choose Items…).
  • Drag the newly-added FileMaskControl from the toolbox onto the designer surface.
  • Wire up the control in your code as described next.

Hooking it up involves, at a minimum, setting the Mask and SourceDirectory properties. A typical starting value for the file mask is just “*.*” to see all files, and you should set the source directory to your appropriate location. Optionally, assign a lambda expression to the RestrictionLambda property:

Once you have initialized properties, tell the control to reflect your inputs with the UpdateFileMatches method when the control becomes visible (or immediately, if it is on your main form):
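The wiring described in the last two paragraphs can be sketched as follows. To keep the example runnable without the library or a Windows Forms project, it uses a hypothetical stand-in class, not the real FileMaskControl: only the member names Mask, SourceDirectory, RestrictionLambda, FileList, and UpdateFileMatches are taken from the text, while the property types, the single-mask simplification, and the size-based restriction are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in, NOT the real FileMaskControl. It only mimics the
// member names mentioned in the article so the wiring pattern can be run.
public class FileMaskStandIn
{
    public delegate bool RestrictionDelegate(string fileName);

    public string Mask { get; set; }
    public string SourceDirectory { get; set; }
    public RestrictionDelegate RestrictionLambda { get; set; }
    public List<string> FileList { get; private set; }

    public void UpdateFileMatches()
    {
        // Simplification: a single mask handed straight to Directory.GetFiles.
        FileList = Directory.GetFiles(SourceDirectory, Mask)
            .Where(f => RestrictionLambda == null || RestrictionLambda(f))
            .Select(f => Path.GetFileName(f))
            .OrderBy(name => name)
            .ToList();
    }
}

public static class WiringDemo
{
    public static void Main()
    {
        var fileMaskControl = new FileMaskStandIn();

        // Minimum setup: the mask and the directory to search.
        fileMaskControl.Mask = "*.*";
        fileMaskControl.SourceDirectory = Path.GetTempPath();

        // Optional: inject an application-specific restriction.
        fileMaskControl.RestrictionLambda = f => new FileInfo(f).Length < 1024 * 1024;

        // Ask the control to reflect the inputs.
        fileMaskControl.UpdateFileMatches();
        Console.WriteLine(fileMaskControl.FileList.Count + " matching files");
    }
}
```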

Accompanying this article is a Visual Studio solution that provides two simple scenarios to get you started immediately[4]. The first demo (Figure 4) combines a FileMaskControl with a directory selector to let the user set the SourceDirectory property. As it’s designed to show the fewest lines of code needed to wire up the control, this demo does not use the RestrictionLambda filter. Figure 4 shows a typical default starting mask, and because no directory has been selected, the output box on the right shows no matching files yet.

Figure 4. The Basic FileMask Demo – This demo requires just a few lines of code to hook up the directory selector to the file mask control.

The second demo adds in a lambda expression filter, allowing you to dynamically include or exclude at runtime the RestrictionLambda expression that requires files to be later than a user-selectable date (Figure 5). Here a set of multiple masks is present along with a directory, so the output list already shows some results. Additionally, the lambda expression filter is applied by checking the Enabled box and selecting a date. Change the date, or disable it with the checkbox, and the resulting list of files will immediately update – whether you see any difference obviously depends on what is in your directory, though!

Figure 5. The Customized FileMask Demo
This demo lets you interactively include or exclude a lambda expression to restrict files to a certain date range.

You have already seen the particular lambda expression used both in Listing 1 and in the list of sample lambda expressions, but here it is once again in context, showing how it uses the date supplied by the user. The code fragment also takes into account whether the user has enabled or disabled the date picker, and therefore the lambda expression. Setting the delegate to null, as was shown in Listing 1, effectively disables it. Again, that simple assignment is all you need to customize the generic FileMaskControl to your application. Apply the same technique to your own libraries to make customizable, generic components.
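Stripped of the user interface, the logic of that fragment can be sketched like this. The method name BuildRestriction and its parameters are assumptions standing in for the checkbox state and the date picker value; the result would be assigned to the control's RestrictionLambda property.

```csharp
using System;
using System.IO;

public static class DateRestrictionDemo
{
    public delegate bool RestrictionDelegate(string fileName);

    // Returns the date filter when enabled, or null to switch the
    // restriction off (the null check in the query then skips it).
    public static RestrictionDelegate BuildRestriction(bool enabled, DateTime cutoff)
    {
        if (!enabled)
        {
            return null;
        }
        return f => File.GetLastWriteTime(f) > cutoff;
    }

    public static void Main()
    {
        RestrictionDelegate on = BuildRestriction(true, new DateTime(2009, 5, 1));
        RestrictionDelegate off = BuildRestriction(false, new DateTime(2009, 5, 1));

        Console.WriteLine(on != null);    // True
        Console.WriteLine(off == null);   // True
    }
}
```

The lambda captures the cutoff date at the moment it is built, so the restriction needs to be rebuilt and reassigned whenever the user picks a new date.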

For completeness, Listing 2 shows the complete hand-hewn portion of the second demo (i.e. it does not include the parts generated by Visual Studio’s designer). There are just three event handlers for the three supporting visual components (highlighted in the listing): the directory picker, the date/time picker, and the enabled/disabled checkbox. Follow the code from there and you will see the simple code needed to interact with the FileMaskControl.

Listing 2. Application Program to Manipulate the FileMaskControl

Conclusion

LINQ is a great leap forward in code abstraction. By providing a uniform way to handle widely diverse data structures within a program (not to mention diverse external data sources I’ve not even touched upon in this article), LINQ makes code easier to create and, perhaps more importantly, easier to maintain. It also provides tremendous flexibility in what it can do. But more than that, it is even flexible in its own use (is meta-flexible a real word?), offering two quite different notations: query syntax and lambda syntax.

Query syntax has the advantage of looking like a SQL query, and makes some typical SQL-type operations (e.g. joins) simpler to write. On the other hand, lambda syntax allows for richer expressiveness with its larger operator set, as well as access to one of the most powerful aspects of LINQ, lambda expressions. As demonstrated in this article, lambda expressions allow you to inject arbitrary functions into any properly instrumented code. This gives you, as the designer, the capability to design completely generic building blocks with hooks (i.e. callback mechanisms) that can convert your building blocks into extremely specialized components.

The FileMaskControl I’ve described shows just one example of a practical application of this design pattern, illustrating some useful techniques to help you write cleaner, tighter code. Indeed, the amount of code directly related to the customization hooks for the FileMaskControl is a grand total of two lines: one internal line to use the lambda expression, and one external line to set a lambda expression. With the principles I’ve described you should now have a good foundation for using lambda expressions to develop your own library of building blocks.

Footnotes

  1. Previous articles focusing on this ‘one line of code’ approach include Using Three Flavors of LINQ To Populate a TreeView and Using LINQ to Manage File Resources and Context Menus.
  2. LINQ, by default, requires the .NET 3.5 framework. At the time of writing, though, .NET 3.5 is still relatively new and, since upgrading infrastructure in corporate environments can be costly, there are plenty of installations still using .NET 2.0. If your use of LINQ is limited to LINQ-to-Objects (and thus does not need LINQ to SQL, LINQ to XML, etc.), the open source LINQBridge library provides a clever solution that lets you operate with just the .NET 2.0 framework. As explained on the LINQBridge page, LINQ queries do not depend on the 3.5 framework per se; rather, they depend on the presence of certain method signatures. It does not matter where it finds these methods, so the LinqBridge.dll library file may be used to supply them in a 2.0 environment. However, you will need to use Visual Studio 2008 or later; VS2005 is not sufficient. VS2008 provides a multi-targeting feature, allowing you to specify which framework (2.0, 3.0, or 3.5) to compile to. To use LINQBridge, therefore, set the target framework on the Application page of your project properties to 2.0, and then add a reference to the LINQBridge.dll library to your project.
  4. Two demo applications accompany this article, contained within a single Visual Studio solution, providing source code as well as the compiled executables (in each project’s bin/Debug directory). You need only rename each name.executable file to name.exe and you can run them right out of the box.