Regular Expressions

I have just spent the last hour or so trying to get a .NET regular expression to do what I want: match one of several possible keys, followed by an =, followed by a quoted string that may contain escape characters. Not difficult, you would think.
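For illustration, here is a minimal sketch of that kind of pattern, written in Python rather than .NET. The key names and the find_pairs helper are hypothetical (I am not reproducing the real keys here), and it assumes a backslash escapes whatever single character follows it inside the quotes.

```python
import re

# Hypothetical key names, used purely for illustration.
KEYS = ("name", "path", "title")

PATTERN = re.compile(
    # One of the known keys, starting at a word boundary so that
    # "filename" is not mistaken for "name".
    r'\b(?P<key>' + "|".join(re.escape(k) for k in KEYS) + r')'
    # A literal equals sign directly after the key.
    r'='
    # A double-quoted string: any run of non-quote, non-backslash
    # characters, or a backslash followed by any single character.
    r'"(?P<value>(?:[^"\\]|\\.)*)"'
)


def find_pairs(text):
    """Return (key, raw_value) tuples; escape sequences are left undecoded."""
    return [(m.group("key"), m.group("value")) for m in PATTERN.finditer(text)]
```

Even this small example runs into the dialect problem: Python spells a named group (?P<key>...), whereas .NET writes (?<key>...), so the pattern cannot be pasted from one to the other unchanged.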

Unfortunately, it seems that every time you change platform or language you get lumbered with yet another regular expression syntax. Why is this necessary? As far as I can tell, none of the syntaxes around makes it any easier for the programmer to express what they want. They all have alternation, groups (OK, we now have named groups), character classes, quantification and replacement syntax.

Those responsible for implementing regular expression support always seem to think they know better. Why? We have enough dialects already: PCRE, POSIX and seemingly numerous Microsoft variations, not to mention ANTLR, Lex and Yacc (and all their different implementations, which have subtle bugs and discrepancies).

Why didn't Microsoft do something sensible, like allow us to build the FSA from an object model (which could be done quite nicely, allowing for named libraries of known FSAs), or choose a common syntax like PCRE or POSIX, rather than saying we will do roughly the same but

Anyway, enough of a rant and back to the regexes.

As a parting note, if you work with regular expressions quite often then Mastering Regular Expressions (J. Friedl, O'Reilly, 1997) is a great book. Well, at least if you are using POSIX or PCRE.