Giving Clarity to LINQ Queries by Extending Expressions

LINQ expressions can be made much easier to comprehend and modify by using extension methods to create pipes and filters. Ed takes a working LINQ expression and makes stepwise improvements to it to make it clearer.

Overview

LINQ and Entity Framework are both commonly used in the .NET ecosystem, but even well-written applications can have LINQ queries that are difficult to understand. Because LINQ is so flexible, it can be written in ways that fail to communicate the developer’s intent. Well-written LINQ should be so clear as to be self-documenting. To write clear LINQ, it helps to understand the details of a few LINQ components that improve its readability.

We’ll be showing how to use a pipe, filter and rule pattern to make LINQ queries easier to comprehend. We’ll start by taking a look at how we can extend the Where method by creating custom filters that take advantage of IQueryable extension methods. Finally we will take a deep dive into expression trees to understand how they work, and how to manipulate them for maximum reusability.

Where the problem lies

The LINQ API allows several different styles of programming. The API allows us to chain together multiple methods, each of which can take elaborate lambda expressions, but by using this style, we can lose sight of the actual purpose of the code.

One of the most common uses for LINQ is to filter data using the Where method. The Where method takes a lambda expression which is capable of performing almost any number of filtering operations by using multiple logical operators [&& ||].

The following example shows a query requiring that five criteria are met to yield results. The query is more self-obfuscating than self-explanatory, especially without knowing any of the context that the original developer may have had when it was created. If we were asked to modify the query to meet a new business requirement, we could certainly figure out the details given enough time. If the original developer had used a better approach from the beginning, we’d now have an easier task.

Using multiple operators can create complex queries.
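The original listing is not reproduced here, so the following is only a minimal sketch of the kind of query described above. The Post class, its properties, and the parameter names are assumptions made for illustration; the five criteria are combined with && and || inside a single Where call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used for illustration only.
public class Post
{
    public string Author { get; set; } = "";
    public bool IsPublished { get; set; }
    public DateTime PostedOn { get; set; }
}

public static class ComplexQueryExample
{
    // Five criteria packed into one lambda: the intent is hard to see at a glance.
    public static IQueryable<Post> GetFeaturedArticles(
        IQueryable<Post> posts, DateTime today, DateTime cutoffDate,
        string featuredAuthor, DateTime featuredAuthorCutoffDate)
    {
        return posts.Where(post =>
            post.IsPublished && post.PostedOn <= today &&
            (post.PostedOn >= cutoffDate ||
             post.Author == featuredAuthor && post.PostedOn >= featuredAuthorCutoffDate));
    }
}
```

Reading this query requires working out operator precedence and guessing what each date comparison is for, which is exactly the problem the rest of the article addresses.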

Now that we have seen where we can improve, we’ll discuss the Pipe and Filter pattern and how we can apply it using LINQ.

Pipe and Filter pattern

The pipe and filter pattern, in its simplest form, can be defined as “a chain of multiple operations used to get specific results.” It is a common design pattern with many uses. For background, see: http://en.wikipedia.org/wiki/Pipeline_(software)

The pattern name comes from the idea of filtering water from a source as it is being piped to a faucet. In the context of this article we will be taking data from a source [database] and chaining together multiple operations to get a usable subset of the data for another layer in our application.

To construct our filters we will be using IQueryable and the LINQ Where method. As we learn more about the API, we’ll build smaller filter components or “rules”, which will allow for greater readability, reusability, and flexibility.

[Figure: pipes and filters]

Understanding IQueryable and Where

Before we begin writing filters let’s see how the LINQ API handles method chaining and how it acts when multiple Where methods are called.

The Where method extends IQueryable<T>, just as most LINQ methods do. This allows us to call Where on the results of other LINQ methods and on collections exposed as IQueryable<T>. Where also returns IQueryable<T>, thereby allowing additional methods to be called on the results. The Where method also takes a parameter named predicate.

[Figure: IQueryable extensions]

IQueryable does not store the results of a query but instead stores the commands required to build a query. These commands will result in an expression tree. We will look deeper into expression trees later when we take an in-depth look at the predicate parameter of the Where method, but for now we’ll focus on extending IQueryable, since this is a relatively simple task.

When multiple Where methods are chained together, the result is the equivalent of using the AndAlso [&&] operator. This means we can use the following code interchangeably.
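As a minimal sketch of that equivalence (the Post class and method names are assumptions for illustration), the first method below uses one Where with &&, and the second chains two Where calls:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used for illustration only.
public class Post
{
    public bool IsPublished { get; set; }
    public DateTime PostedOn { get; set; }
}

public static class WhereChaining
{
    // One Where call with both conditions joined by &&.
    public static IQueryable<Post> SingleWhere(IQueryable<Post> posts, DateTime today) =>
        posts.Where(post => post.IsPublished && post.PostedOn <= today);

    // Two chained Where calls; a post must pass both to be returned.
    public static IQueryable<Post> ChainedWhere(IQueryable<Post> posts, DateTime today) =>
        posts.Where(post => post.IsPublished)
             .Where(post => post.PostedOn <= today);
}
```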

The first statement is more concise, yet the second offers more flexibility. Both will result in the same query.

Writing a custom filter

We can now begin writing custom filters, using what we learned about IQueryable and the Where method. By creating an extension method that both extends and returns IQueryable<T> we can create our own chainable filter method.

An example filter extension method
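The original listing is missing, so here is a small sketch of the shape such a filter can take. The Post class and the ByAuthor filter are hypothetical names chosen for illustration; the important part is that the method both extends and returns IQueryable<Post>, so it can sit anywhere in a method chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used for illustration only.
public class Post
{
    public string Author { get; set; } = "";
    public bool IsPublished { get; set; }
}

public static class ExampleFilters
{
    // Extends IQueryable<Post> and returns IQueryable<Post>, so it is chainable.
    public static IQueryable<Post> ByAuthor(this IQueryable<Post> posts, string author)
    {
        return posts.Where(post => post.Author == author);
    }
}
```

A call such as posts.ByAuthor("Ed").Where(post => post.IsPublished) then mixes the custom filter with the built-in LINQ methods.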

Let’s look at an expanded version of our example of rather unintelligible LINQ code. We’ll now refactor two methods that return a set of blog posts from a repository.

The first method, GetArticles, returns a filtered set of data which is further reduced by the Skip and Take methods to facilitate paging. The second method, GetFeaturedArticles, applies several filters with multiple rules. The filters and rules combine to define a “featured article”; however, there is very little context telling us how this is accomplished.
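The two methods are not reproduced in this copy of the article. The sketch below shows what they might look like before refactoring; IPostRepository, InMemoryPostRepository, the Post class, and all parameter names are assumptions, and the ordering step is added only because paging needs a stable order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity and repository used for illustration only.
public class Post
{
    public string Author { get; set; } = "";
    public bool IsPublished { get; set; }
    public DateTime PostedOn { get; set; }
}

public interface IPostRepository
{
    IQueryable<Post> GetAll();
}

public class InMemoryPostRepository : IPostRepository
{
    private readonly List<Post> _posts;
    public InMemoryPostRepository(IEnumerable<Post> posts) { _posts = posts.ToList(); }
    public IQueryable<Post> GetAll() => _posts.AsQueryable();
}

public class PostService
{
    private readonly IPostRepository _repository;
    private readonly DateTime _today;

    public PostService(IPostRepository repository, DateTime today)
    {
        _repository = repository;
        _today = today;
    }

    // Filtered set, reduced by Skip and Take for paging.
    public IQueryable<Post> GetArticles(int pageIndex, int pageSize)
    {
        return _repository.GetAll()
            .Where(post => post.IsPublished && post.PostedOn <= _today)
            .OrderByDescending(post => post.PostedOn)
            .Skip(pageIndex * pageSize)
            .Take(pageSize);
    }

    // Several conditions that together define a "featured article".
    public IQueryable<Post> GetFeaturedArticles(
        DateTime cutoffDate, string featuredAuthor, DateTime featuredAuthorCutoffDate)
    {
        return _repository.GetAll()
            .Where(post => post.IsPublished && post.PostedOn <= _today)
            .Where(post => post.PostedOn >= cutoffDate ||
                           post.Author == featuredAuthor &&
                           post.PostedOn >= featuredAuthorCutoffDate);
    }
}
```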

Let’s apply what we have learned so far and refactor the two methods using the pipe and filter pattern.

In the GetArticles method we can see there are two conditions that must be met: only posts that are published and only those that have a PostedOn date less than or equal to today. Since the two conditions are joined using the AndAlso [&&] operator, they can be rewritten as separate Where statements. The code below shows how one could rewrite the filter statements, and it includes a comment to explain their function.
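A sketch of that intermediate step, assuming a hypothetical Post class and passing the source query and today’s date in as parameters so the example stands alone:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used for illustration only.
public class Post
{
    public bool IsPublished { get; set; }
    public DateTime PostedOn { get; set; }
}

public static class ArticleQueries
{
    public static IQueryable<Post> GetArticles(
        IQueryable<Post> posts, DateTime today, int pageIndex, int pageSize)
    {
        return posts
            .Where(post => post.IsPublished)        // Are published
            .Where(post => post.PostedOn <= today)  // Posted on or before today
            .OrderByDescending(post => post.PostedOn)
            .Skip(pageIndex * pageSize)
            .Take(pageSize);
    }
}
```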

Now that we have separated the statements, we’ll write extension method filters that are as easy to read as the comments “Are published” and “Posted on or before today”.

Since our extension methods follow the LINQ API pattern of extending IQueryable and returning IQueryable, we can completely replace the Where statements with our custom filters.
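The sketch below shows one way the two filters and the refactored query could look. The names ArePublished and PostedOnOrBefore mirror the comments described above, but the exact names and signatures in the original project are not shown here, so treat them as assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used for illustration only.
public class Post
{
    public bool IsPublished { get; set; }
    public DateTime PostedOn { get; set; }
}

public static class PostFilters
{
    // "Are published"
    public static IQueryable<Post> ArePublished(this IQueryable<Post> posts) =>
        posts.Where(post => post.IsPublished);

    // "Posted on or before <date>"
    public static IQueryable<Post> PostedOnOrBefore(this IQueryable<Post> posts, DateTime date) =>
        posts.Where(post => post.PostedOn <= date);
}

public static class ArticleQueries
{
    // The Where calls are replaced by filters whose names state the intent.
    public static IQueryable<Post> GetArticles(
        IQueryable<Post> posts, DateTime today, int pageIndex, int pageSize)
    {
        return posts
            .ArePublished()
            .PostedOnOrBefore(today)
            .OrderByDescending(post => post.PostedOn)
            .Skip(pageIndex * pageSize)
            .Take(pageSize);
    }
}
```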

By using this simple pattern our code reads more like human language instead of a set of conditions.

Extension methods are easy and require very little code to implement. However, they work in limited contexts where rules can be combined using the AndAlso [&&] operator. If we continue using extension methods with the example code, we could encapsulate the remaining Where method. The effect of doing this would be less than ideal because the resulting extension method would require several parameters and therefore wouldn’t be likely to improve readability.

To further improve upon our pipes and filters pattern, we’ll need to learn more about the Where method and the predicate parameter.

Predicates, Expressions and Expression Trees

Previously, we looked at the Where method and how it extends and returns IQueryable. Using this convention, we were able to create an extension method filter that is easily readable and has a single responsibility.

The code now requires some additional work due to limiting factors of the Where method. Let’s examine the Where method again and use this information to expand upon our filter pattern. This time we’ll focus on the predicate parameter and learn to apply rules for filtering our data.

If we look at the Where method signature, we’ll see that it takes an Expression<Func<TSource, bool>> parameter named predicate. A predicate in C# is usually defined as “a function that returns a Boolean result”, but in this case there is more taking place. The type is actually an Expression<T> where T is the predicate. This means that we aren’t actually supplying a function but rather an expression of that function. The inner portion of the type, Func<TSource, bool>, tells us that the function being expressed takes a generic type TSource and returns a Boolean value.

It’s important to understand the difference between the expression of a function and a function delegate, that is, between Expression<Func<TSource, bool>> and Func<TSource, bool>. An expression of a function is a complex set of Expression object types that form an expression tree. The expression tree contains parameters, operators, constants and other metadata that can be created, examined, and manipulated with code at runtime.

[Figure: expression tree diagram]

Creating Expression Trees

Expression types do not expose public constructors, so there is no way to “new up” an Expression directly.

We could use the Expression API factory methods to create an expression tree. There are many expression factory methods that we would need to invoke in order to create a single expression tree. Each parameter, constant and operator in a single expression requires its own Expression object; these are then combined using the Expression API to form the final expression tree. These manual methods are very verbose and take significant effort to construct.

http://msdn.microsoft.com/en-us/library/bb397951.aspx
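To give a feel for the verbosity, here is a minimal sketch that builds the equivalent of post => post.PostedOn >= cutoffDate by hand using the factory methods in System.Linq.Expressions. The Post class and the ManualExpression helper are assumptions for illustration.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity used for illustration only.
public class Post
{
    public DateTime PostedOn { get; set; }
}

public static class ManualExpression
{
    // Builds post => post.PostedOn >= cutoffDate one node at a time.
    public static Expression<Func<Post, bool>> PostedOnOrAfter(DateTime cutoffDate)
    {
        // The lambda parameter: "post"
        ParameterExpression post = Expression.Parameter(typeof(Post), "post");

        // The property access: post.PostedOn
        MemberExpression postedOn = Expression.Property(post, nameof(Post.PostedOn));

        // The constant on the right-hand side: cutoffDate
        ConstantExpression cutoff = Expression.Constant(cutoffDate, typeof(DateTime));

        // The operator: post.PostedOn >= cutoffDate
        BinaryExpression comparison = Expression.GreaterThanOrEqual(postedOn, cutoff);

        // Wrap the body and its parameter into a typed lambda expression.
        return Expression.Lambda<Func<Post, bool>>(comparison, post);
    }
}
```

Four separate objects are needed for a single comparison, which is why hand-built trees are best kept for cases where the query must be assembled at runtime.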

Alternatively we can use the compiler to do a majority of the work for us by assigning a lambda expression to an Expression<T>. In fact, we’re already using this syntax with the LINQ API inside the Where parameter.

Creating a lambda expression tree using the compiler.
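A small sketch of the compiler-assisted approach, again with a hypothetical Post class. The same lambda is assigned once to a Func<Post, bool> and once to an Expression<Func<Post, bool>>; only the second can be inspected as a tree.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity used for illustration only.
public class Post
{
    public bool IsPublished { get; set; }
}

public static class CompilerBuiltRules
{
    // A plain delegate: compiled code that can only be invoked.
    public static readonly Func<Post, bool> IsPublishedDelegate =
        post => post.IsPublished;

    // The same lambda assigned to Expression<T>: the compiler emits the
    // factory calls for us and produces an expression tree instead.
    public static readonly Expression<Func<Post, bool>> IsPublishedRule =
        post => post.IsPublished;

    // The tree can be examined at runtime, unlike the delegate.
    public static string Describe(Expression<Func<Post, bool>> rule) =>
        $"{rule.Parameters[0].Name} => {rule.Body.NodeType}";
}
```

Calling CompilerBuiltRules.Describe(CompilerBuiltRules.IsPublishedRule) returns "post => MemberAccess", and calling Compile() on the rule turns the tree back into an invokable delegate.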

Both approaches have their benefits: the lambda syntax is much easier to write and quicker to understand, while the manual approach lets us build and manipulate trees at runtime. We’ll use these strengths and weaknesses to our advantage when building expressions [rules] for our filters.

Working with Expression Trees

Let’s consider how we might work with expression trees in our example problem. If we could replace each condition in the Where method with a function that returns an Expression we could clean up the statement much like we did using extension method filters.

Continuing with the blog posts example, let’s look at the first condition, post.PostedOn >= cutoffDate. Here the post is being checked to see if it was posted on or after the cutoff date. It would clean up our code nicely if we could create a PostedOnOrAfter function to replace the condition. If we continue with the other conditions, it might look something like the example below.

Unfortunately this code will not compile because Expressions cannot be combined using the && and || operators.

Since we have accomplished as much as we can using the lambda syntax we will need to exploit the Expression API. Using the Expression API we can create a utility that will take apart and reassemble multiple Expressions into a single valid Expression.

We will use extension methods again to make the syntax fluent and easy to read and write. This time we’ll create And and Or extension methods that serve as an API for calling our utility. The CombineLambdas method is responsible for working with the Expression API. An ExpressionVisitor, a special class used to traverse expression trees, will be used to rewrite the expressions’ TSource parameter at runtime so that both expressions share one parameter instance. If the parameters are not the same instance, an “out of scope” error will occur at runtime.

The utility code combines nodes of two expression trees and returns a single lambda expression.
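The utility itself is not reproduced here. The following is a minimal sketch of how such a utility could be written, not the code from the PredicateExtensions project: the class layout, the ParameterSwapVisitor helper, and the Post rules are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity used for illustration only.
public class Post
{
    public string Author { get; set; } = "";
    public DateTime PostedOn { get; set; }
}

public static class PredicateExtensions
{
    public static Expression<Func<T, bool>> And<T>(
        this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right) =>
        CombineLambdas(left, right, ExpressionType.AndAlso);

    public static Expression<Func<T, bool>> Or<T>(
        this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right) =>
        CombineLambdas(left, right, ExpressionType.OrElse);

    private static Expression<Func<T, bool>> CombineLambdas<T>(
        Expression<Func<T, bool>> left, Expression<Func<T, bool>> right, ExpressionType type)
    {
        // Rewrite the right-hand body so it uses the left-hand parameter instance.
        var visitor = new ParameterSwapVisitor(right.Parameters[0], left.Parameters[0]);
        Expression rightBody = visitor.Visit(right.Body);

        // Join the two bodies with && or || and wrap them in a new lambda.
        BinaryExpression body = Expression.MakeBinary(type, left.Body, rightBody);
        return Expression.Lambda<Func<T, bool>>(body, left.Parameters);
    }

    private sealed class ParameterSwapVisitor : ExpressionVisitor
    {
        private readonly ParameterExpression _source;
        private readonly ParameterExpression _target;

        public ParameterSwapVisitor(ParameterExpression source, ParameterExpression target)
        {
            _source = source;
            _target = target;
        }

        protected override Expression VisitParameter(ParameterExpression node) =>
            node == _source ? _target : base.VisitParameter(node);
    }
}

// Rules: small functions that each return one expression.
public static class PostRules
{
    public static Expression<Func<Post, bool>> PostedOnOrAfter(DateTime cutoffDate) =>
        post => post.PostedOn >= cutoffDate;

    public static Expression<Func<Post, bool>> WithFeaturedAuthor(string author) =>
        post => post.Author == author;
}
```

With this in place, a filter can be written as posts.Where(PostRules.PostedOnOrAfter(cutoffDate).Or(PostRules.WithFeaturedAuthor(featuredAuthor).And(PostRules.PostedOnOrAfter(featuredAuthorCutoffDate)))), where posts is an IQueryable<Post>.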

With our utility in place, we can finish refactoring. Where we previously tried to combine expressions using operators, we will now simply chain them together using And and Or. Now we have rules that read like spoken language, and we can clearly comprehend the code’s intent.

Putting it together

Now that we have created filters using extension methods and rules using expression trees, let’s give our example code one last refactor. We’ll take a last pass through the code making it as concise as possible.

The example in its current state can still be refactored.

The GetArticles and GetFeaturedArticles methods both begin with the same set of filters. These common filters can be included within a single function along with the call to the repository; it will serve as a starting point for further filtering. This method can be considered a pipe in our design pattern because it is taking data from the source and delivering it to the next operation. We won’t need an extension method here because the method will not be called from inside the method chain.
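A sketch of such a pipe, assuming a hypothetical IPostRepository, an in-memory implementation for demonstration, and the ArePublished and PostedOnOrBefore filters described earlier. GetPublishedPosts is an illustrative name, not necessarily the one used in the original project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity and repository used for illustration only.
public class Post
{
    public bool IsPublished { get; set; }
    public DateTime PostedOn { get; set; }
}

public interface IPostRepository
{
    IQueryable<Post> GetAll();
}

public class InMemoryPostRepository : IPostRepository
{
    private readonly List<Post> _posts;
    public InMemoryPostRepository(IEnumerable<Post> posts) { _posts = posts.ToList(); }
    public IQueryable<Post> GetAll() => _posts.AsQueryable();
}

public static class PostFilters
{
    public static IQueryable<Post> ArePublished(this IQueryable<Post> posts) =>
        posts.Where(post => post.IsPublished);

    public static IQueryable<Post> PostedOnOrBefore(this IQueryable<Post> posts, DateTime date) =>
        posts.Where(post => post.PostedOn <= date);
}

public class PostService
{
    private readonly IPostRepository _repository;
    private readonly DateTime _today;

    public PostService(IPostRepository repository, DateTime today)
    {
        _repository = repository;
        _today = today;
    }

    // The pipe: takes data from the source and hands it to the next operation.
    // It is an ordinary method because it starts the chain rather than joining it.
    public IQueryable<Post> GetPublishedPosts() =>
        _repository.GetAll()
            .ArePublished()
            .PostedOnOrBefore(_today);

    public IQueryable<Post> GetArticles(int pageIndex, int pageSize) =>
        GetPublishedPosts()
            .OrderByDescending(post => post.PostedOn)
            .Skip(pageIndex * pageSize)
            .Take(pageSize);
}
```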

In the GetFeaturedArticles method, the Where statements read well, but the rules could still be more specific. The second set of rules WithFeaturedAuthor(featuredAuthor).And(PostedOnOrAfter(featuredAuthorCutoffDate)) are communicated more clearly as a single requirement so let’s wrap them in a single rule and reduce the statement further.

We’ve completely changed how the code in our example reads. We’ve taken several lambda expressions that were difficult to distinguish and transformed them into a clear human readable syntax. Each filter and rule has a single responsibility, allowing new developers to easily make modifications when requirements change.

Changing requirements

Continuing with the posts example, let’s imagine that we have been given a new requirement. In the past we have only featured a single author’s posts; the new requirement is to allow for any number of “featured authors” to be displayed. The featured authors’ names are provided as a string array which will be used in the query for featured authors.

Using traditional lambda expressions wouldn’t work for this requirement since we have an unknown number of featured authors. We will have to iterate through the array and dynamically build the query by appending multiple Or statements. Because we have separated our rules into single operations and have the ability to chain them using our expression utility, the task will be trivial.

Instead of modifying the FeaturedAuthorPostedOnOrAfter rule, we can reuse it. We will add a FeaturedAuthorsPostedOnOrAfter rule that iterates through an array and appends multiple rules using the Or extension method. To create a dynamic rule we will need a starting point to begin the method chain. For this we can use the Begin helper method of our expression utility.

Begin creates a parameter => false Expression, a valid rule that is discarded once the first rule is appended via And or Or. If no rules are appended, the Expression will execute without error.

Now we can simply iterate, append and return the dynamic criteria.
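The listing for this step is missing, so the sketch below shows one way it could work. It is a compact stand-in for the utility, containing only Begin and Or, and the way the seed rule is detected (a constant body) is an assumption rather than the project’s actual implementation. The Post class and rule names follow the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity used for illustration only.
public class Post
{
    public string Author { get; set; } = "";
    public DateTime PostedOn { get; set; }
}

public static class PredicateExtensions
{
    // Seed rule: parameter => false. It gives the method chain a starting point.
    public static Expression<Func<T, bool>> Begin<T>()
    {
        return parameter => false;
    }

    public static Expression<Func<T, bool>> Or<T>(
        this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
    {
        // Assumption: a constant body marks the seed, which is discarded.
        if (left.Body.NodeType == ExpressionType.Constant) return right;

        var swap = new ParameterSwapVisitor(right.Parameters[0], left.Parameters[0]);
        var body = Expression.OrElse(left.Body, swap.Visit(right.Body));
        return Expression.Lambda<Func<T, bool>>(body, left.Parameters);
    }

    private sealed class ParameterSwapVisitor : ExpressionVisitor
    {
        private readonly ParameterExpression _source;
        private readonly ParameterExpression _target;

        public ParameterSwapVisitor(ParameterExpression source, ParameterExpression target)
        {
            _source = source;
            _target = target;
        }

        protected override Expression VisitParameter(ParameterExpression node) =>
            node == _source ? _target : base.VisitParameter(node);
    }
}

public static class PostRules
{
    // The existing single-author rule, reused unchanged.
    public static Expression<Func<Post, bool>> FeaturedAuthorPostedOnOrAfter(
        string author, DateTime cutoffDate) =>
        post => post.Author == author && post.PostedOn >= cutoffDate;

    // The new rule: iterate, append with Or, and return the dynamic criteria.
    public static Expression<Func<Post, bool>> FeaturedAuthorsPostedOnOrAfter(
        string[] authors, DateTime cutoffDate)
    {
        var criteria = PredicateExtensions.Begin<Post>();
        foreach (var author in authors)
        {
            criteria = criteria.Or(FeaturedAuthorPostedOnOrAfter(author, cutoffDate));
        }
        return criteria;
    }
}
```

When the authors array is empty, the seed rule is returned as-is, so the query runs and simply matches no posts.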

Conclusion

In this article we took a deep dive into IQueryable, Where and Expression. By using what we learned, we were able to implement the pipe and filter design pattern and add rules for filtering data. We started with simple IQueryable extension methods to create filters that improved the readability of our code. We explored expression trees and saw how to manipulate them. Using an Expression utility, we created a flexible API for writing rules. In addition, we gained the ability to create dynamic data queries at runtime.

Using the ideas from the examples given here should be considered on a per-project basis. Some of the techniques may work better in certain circumstances than in others. Experiment with pipes, filters and rules to see which combination works right for the task at hand.

Links

The sample project for this article can be found on GitHub. 
https://github.com/EdCharbeneau/PredicateExtensions

The PredicateExtensions binary
http://www.nuget.org/packages/PredicateExtensions/
… and source …
http://www.nuget.org/packages/PredicateExtensions.Source/
… are also available on NuGet. (This is not production software and should be used merely as a starting point for your own projects or exploration.)