Time for a rethink on SQL CLR?

It is almost seven years since Microsoft announced the sensational news that, with their new SQL Server 2005, you could do data manipulation in Visual Basic or C# as well as SQL. The marketing people got overexcited, stabbing out clichés from their keyboards such as “new era”, and “revolutionary”. However, they had run off barking in a different direction to the technologists, missing the whole value of SQLCLR for providing specialized extensions to SQL Server

Unsurprisingly, the technology was misunderstood and many DBAs have never recovered from their initial feelings of dread, still shuddering at the merest mention of the term “CLR function”. In the opening passage of his recent review of SQL#, a suite of very useful SQL CLR functions for string manipulation, regular expressions, and more, Grant uses terms such as “throwing your arms up” “screaming”, “crying out loud”, “nuts” and “stupid”, which seems a pretty fair assessment of the attitude of many DBAs toward CLR functions in SQL Server.

Several factors have added to its bad reputation. Some of this is misuse, using SQL CLR for functionality that belongs in the application tier, rather than the database. Security issues have compounded the anathema toward CLRs. Much harm was done by poor implementation, mainly on the part of the coder, but also on the part of Microsoft and SQL Server.

The worst problem is the complexity of writing effective CLR routines for high performance and scalability. The key to success is the constant streaming of data, using IEnumerable methods and the SQLDataReader, so that SQL Server never has to process too much data at once. The use, instead, of the ADO.NET dataset, and various custom user collections, can make place huge memory demands on SQL Server, involving heavy multi-page memory allocations in the non-buffer pool region. This issue, especially in 32-bit versions of SQL Server, with access to limited amounts of Virtual Address Space, was the cause of paging problems, and specific errors such as “App Domain marked for unload due to memory pressure”, or “701” errors, where SQL Server fails to allocate a contiguous region of VAS for an operation.

Unfortunately, SQL Server does not necessarily make it easy for the SQL CLR programmer. SQL CLR functions cannot use a SQLDataReader over the context connection, so the efficient IEnumerable-based streaming simply isn’t available and the programmer is forced to use less efficient techniques, or to find ways around the problem, which ultimately involves classifying your assembly as EXTERNAL ACCESS, or possibly even UNSAFE.

If you’re someone like Adam Machanic, you persevere. His blog, and various presentations, detail his journey through writing custom IEnumerable classes to enable streaming, creating multiple threads in the SQL CLR to help deal with issues such as “rude aborts” and evolving all this into a powerful “query parallelizer” component.

If all of this is enough to discourage the rest of us from writing our own SQL CLR components, it still doesn’t mean we should deny ourselves access to the useful functionality and performance that they can provide. Professionally-written and properly-tested, SQLCLR libraries, such as SQL#, may cause us to refine or qualify the general anti-CLR position. Thorough testing by the likes of the author of SQL#, Solomon Rutzky, has shown that for functions that require heavy computational power, SQL CLR routines have obvious performance advantages over T-SQL. The SQL CLR gives us a “turbo-charged cursor”. Any type of calculation that lends itself naturally to cursor-based logic, such as the running total calculation, will run faster as a SQL CLR.

For accessing important CLR libraries such as the Regex functions SQL CLR assemblies are essential. There are a host of specialized requirements where CLR can replace the clunky and antiquated OLE Automation techniques, and there is the added value of allowing new complex data types such as vectors and coordinates. Yes, the technology has its quirks, but CLR libraries are worth having. In fact, unless you have kicked out the Hierarchy functions and geospatial functions, you already probably have CLR libraries installed!

I’d be interested to hear if, and how, you’re currently using SQL CLR, or at least if your attitude towards it is softening.