The unexpected behaviour of DirectoryInfo.GetFiles() with three-letter extensions

There is a documented, but certainly counterintuitive, quirk in the DirectoryInfo.GetFiles() method in .NET. The method returns the files that match a given search pattern. For example, the following code returns all the files on drive Z: (including its subdirectories) that have the exact extension “.foobar”:

using System.IO;

// Trailing backslash: "z:" alone would mean the current directory on drive Z:
DirectoryInfo folder = new DirectoryInfo(@"z:\");

FileInfo[] files = folder.GetFiles("*.foobar",
    SearchOption.AllDirectories);

However, DirectoryInfo.GetFiles() behaves quite differently when the extension in the search pattern is exactly three characters long. Consider the following call:

FileInfo[] files = folder.GetFiles("*.sql", SearchOption.AllDirectories);

As expected, this returns all the files with the extension “.sql”. However, it also returns files with extensions such as “.sql-backup”, “.sqlold” and “.sql~”. Surprisingly, this behaviour is documented in the MSDN documentation for the method. To quote it (http://msdn.microsoft.com/en-us/library/ms143327.aspx):

“The matching behavior of searchPattern when the extension is exactly three characters long is different from when the extension is more than three characters long. A searchPattern of exactly three characters returns files having an extension of three or more characters. A searchPattern of one, two, or more than three characters returns only files having extensions of exactly that length.

The following list shows the behavior of different lengths for the searchPattern parameter:
•    “*.abc” returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
•    “*.abcd” returns only files having an extension of .abcd.
•    “*.abcde” returns only files having an extension of .abcde.
•    “*.abcdef” returns only files having an extension of .abcdef.”

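The reason for this strange behaviour is Windows’ support for the 8.3 file name format: a file with a long name also gets a short 8.3 alias, and the search pattern is matched against both names. For example, a file named “alongfilename.longextension” has the equivalent 8.3 name “ALONGF~1.LON”, so if we filter on the extension “.lon”, the file matches through its 8.3 name. In the same way, “query.sql~” gets a short name ending in “.SQL”, which is why it matches “*.sql”.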
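To make the effect concrete, here is a deliberately simplified model in C#. It is a sketch under stated assumptions, not the actual Win32 matching code: it assumes the volume generates 8.3 names, looks only at extensions, and treats the short extension as the first three characters of the long one (the real algorithm also uppercases names and drops characters that are illegal in 8.3 names). ShortNameModel and its methods are hypothetical names.

```csharp
using System;

// Hypothetical, simplified model: a "*" + extension search pattern is
// tried against the long name AND the 8.3 alias of each file.
// Assumes 8.3 name generation is enabled; considers extensions only.
public static class ShortNameModel
{
    // The 8.3 alias keeps at most three characters of the extension,
    // e.g. ".longextension" -> ".lon". Extensions include the leading dot.
    public static string ShortExtension(string longExtension) =>
        longExtension.Length <= 4 ? longExtension : longExtension.Substring(0, 4);

    // True if a "*" + patternExtension search would match a file with the
    // given long extension, through either its long name or its short alias.
    public static bool MatchesStarDotPattern(string longExtension, string patternExtension) =>
        string.Equals(longExtension, patternExtension, StringComparison.OrdinalIgnoreCase)
        || string.Equals(ShortExtension(longExtension), patternExtension,
                         StringComparison.OrdinalIgnoreCase);
}
```

Under this model a three-character pattern such as “.abc” also matches “.abcd” and “.abcdef” through their short aliases, while a pattern of any other length can only match the long name exactly, which reproduces the list quoted from the documentation.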

This behaviour bit me in a tool I’ve been working on. The tool reads “.sql” files and builds up a database schema from them; this schema can then be compared with live database schemata. The primary motivation for such a tool is to support keeping database schemata in source control. However, there were two scenarios in which the application started to fail. In one case I edited a file with Emacs, which (as expected) left a backup file whose name ended with a ~ character. On another occasion, I used a source control system that stored caching information in the same folder as the SQL scripts, and the cached files had an extension that started with “sql” followed by a timestamp. In both cases the database schema built by my application was inconsistent, because objects were duplicated.

The only solution to this behaviour of DirectoryInfo.GetFiles() seems to be to check each file’s extension explicitly whenever the search pattern uses an extension of exactly three characters. The FileInfo.Extension property returns the full extension of the file, not only the first three characters, so comparing it with the expected extension filters out the spurious matches.
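A minimal sketch of that explicit check follows. ExactExtensionSearch and its methods are hypothetical names, not part of the .NET Framework; the helper assumes the extension is passed with its leading dot (e.g. ".sql") and compares case-insensitively, since Windows file names are case-insensitive.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: runs DirectoryInfo.GetFiles() and then keeps only
// the files whose full extension matches exactly.
public static class ExactExtensionSearch
{
    public static FileInfo[] GetFilesWithExactExtension(
        DirectoryInfo folder, string extension, SearchOption option)
    {
        if (folder == null) throw new ArgumentNullException(nameof(folder));
        if (string.IsNullOrEmpty(extension) || extension[0] != '.')
            throw new ArgumentException("Extension must start with '.'", nameof(extension));

        // GetFiles() may return extra matches such as ".sql~" or ".sql-backup"
        // for a three-character extension, so filter its result again.
        return folder.GetFiles("*" + extension, option)
                     .Where(f => HasExactExtension(f, extension))
                     .ToArray();
    }

    // FileInfo.Extension is the full extension (".sql~"), not just its
    // first three characters, so an exact comparison is reliable.
    public static bool HasExactExtension(FileInfo file, string extension) =>
        string.Equals(file.Extension, extension, StringComparison.OrdinalIgnoreCase);
}
```

For the tool described above, ExactExtensionSearch.GetFilesWithExactExtension(folder, ".sql", SearchOption.AllDirectories) would return the “.sql” scripts and skip the “.sql~” backups and the timestamped cache files. Because the filter runs after GetFiles(), it is also harmless for extensions of other lengths.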