Whenever I teach database topics to a class of beginners, someone inevitably says that Microsoft Excel is a database. Excel is an amazing tool with built-in functionality for financial calculations, statistics, and more, but it’s not a database. It’s great for keeping track of and organizing small amounts of data, but it doesn’t meet the requirements of a robust database system: referential integrity, constraints, and ACID properties.
One of the biggest spreadsheet messes I’ve ever seen was a payroll/HR system for a 40-person firm. Even though it was quite sophisticated for Excel, it eventually reached a breaking point where it would crash almost every time the spreadsheet was opened. I was tasked with designing a database, importing all the data into SQL Server, and creating a front end for it. That was almost 20 years ago. Excel has its uses, but it is not a database.
The latest Excel fiasco happened in the UK as Covid-19 test results were being imported into spreadsheets. The faulty process caused almost 16,000 cases over nine days to be left out of the case figures. The glitch occurred because the process imported data into xls files before uploading it into the central systems. The old xls format has a limit of 65,536 rows per worksheet. This translated to a maximum of about 1,400 cases per spreadsheet, since multiple rows were needed for each case. Once the daily caseload reached this number, the remaining cases were accidentally discarded from the import. Excel xls files were used as an intermediary step and not as a database per se, but this process must not have been thoroughly thought out or tested. The reported solution was to split the data across more spreadsheets to avoid the limit.
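To make the failure mode concrete, here is a minimal Python sketch of the kind of guard that could have caught the problem. It is not the actual import process, whose details were not published; the function names `check_fits_in_xls` and `split_into_batches` are hypothetical. The only hard fact it relies on is that a legacy xls worksheet holds exactly 65,536 rows.

```python
XLS_MAX_ROWS = 65_536  # row limit of a worksheet in the legacy .xls format


def check_fits_in_xls(row_count, header_rows=0):
    """Fail loudly when the data would not fit in one .xls worksheet.

    Hypothetical helper: raises instead of letting rows be dropped silently.
    """
    total = row_count + header_rows
    if total > XLS_MAX_ROWS:
        raise ValueError(
            f"{total} rows exceed the .xls limit of {XLS_MAX_ROWS}; "
            f"{total - XLS_MAX_ROWS} rows would be lost"
        )
    return total


def split_into_batches(rows, batch_size=XLS_MAX_ROWS):
    """Split rows into consecutive batches that each fit within the limit."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


if __name__ == "__main__":
    rows = list(range(70_000))  # stand-in for one day's import rows
    batches = split_into_batches(rows)
    # Reconcile: every input row must appear in exactly one batch.
    assert sum(len(b) for b in batches) == len(rows)
    print([len(b) for b in batches])
```

The important part is the reconciliation step at the end: comparing the number of rows that went in with the number that came out is a cheap test that would flag a truncated import on the first day it happened.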
The UK government says that the positive tests were eventually counted, but contact tracing didn’t happen right away for the missing cases, so many people went about their lives not realizing that they had been exposed and were potentially infecting others. The government dashboards were also missing the cases, so things looked better than they actually were for a few days.
It might be easy to blame Excel or even Microsoft in this case, but human error and the lack of processes, testing, and proper tools are responsible. There is a lot of finger-pointing involved, but it’s also not unusual to find systems in many organizations held together with duct tape and a prayer.