Big Data and the Slough of Despond

“This miry Slough is such a place as cannot be mended; it is the descent whither the scum and filth that attends conviction for sin doth continually run, and therefore is it called the Slough of Despond: for still as the sinner is awakened about his lost condition, there ariseth in his soul many fears, and doubts, and discouraging apprehensions, which all of them get together, and settle in this place; and this is the reason of the badness of this ground”.

John Bunyan, The Pilgrim’s Progress, 1677

For the past thirty-five years, I’ve regularly stared at the promises and announcements of the IT marketing men and wondered, “what is the incredible technical breakthrough that has allowed this to happen?” Every time, it turns out that there has been no giant leap forward in the technology, only blather. It is the way the industry works.

It is a particular relief that the Big Data soufflé has at last collapsed into froth. The idea that an analysis of the background noise and chatter of the interwebs, or of logs of consumer behaviour, can reliably give you commercial insights is as daft as listening to the rushing of the waves in a seashell and imagining the voices of the sirens. The trouble is that, without a knowledge of statistics and a rigorous approach to rejecting any conclusion that could be explained by chance or by other unknown factors, you are a danger to anyone who relies on your data analysis. If you rummage through data merely to fish for ‘insights’, you will always come up with exciting correlations and trends, and most of them are likely to be nothing more than chance.
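To see how readily chance manufactures ‘insights’, here is a minimal Python sketch, offered as an illustration rather than as evidence. It correlates one randomly generated series, standing in for something like weekly sales, against a thousand equally random and entirely unrelated series. The series length of 20, the seed, and the cut-off of 0.444 (roughly the two-tailed 5% critical value of Pearson’s r for 20 observations) are all assumptions chosen for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_spurious_correlation(n_series=1000, n_points=20,
                              threshold=0.444, seed=42):
    """Fish for correlations between one random 'target' series and many
    unrelated random series.

    Returns the strongest correlation found and how many series crossed
    the (assumed) 5% significance threshold purely by chance.
    """
    rng = random.Random(seed)
    # Hypothetical 'business metric': pure Gaussian noise.
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    hits = 0
    for _ in range(n_series):
        # Each candidate 'explanatory factor' is also pure noise.
        noise = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, noise)
        if abs(r) > threshold:
            hits += 1
        if abs(r) > abs(best):
            best = r
    return best, hits


if __name__ == "__main__":
    best, hits = best_spurious_correlation()
    print(f"Strongest correlation with pure noise: r = {best:.2f}")
    print(f"'Significant' correlations found by chance: {hits} of 1000")
```

Since roughly one test in twenty will pass a 5% threshold by luck alone, a thousand comparisons should be expected to yield around fifty ‘significant’ correlations with pure noise. That is why corrections for multiple comparisons, such as Bonferroni’s, exist, and why an exciting correlation found by rummaging is a hypothesis to be tested on fresh data, not a finding.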

I object not to the science but to the marketing hype. The tools are certainly there to help you tease out the factors that relate to any commercial trend. I remember when a ‘rocket scientist’ friend of mine in the City of London worked out that the price of sugar futures contracts correlated directly with the weather in Chicago. He used an Exploratory Factor Analysis technique with a huge bank of data. Sure, heavy rains in Brazil, India and the United States, especially around harvest time, can hike the price of sugar futures, but the most significant correlation in price movement was with the occurrence of weather depressions in Chicago. Why? At the time, Chicago was where the bulk of sugar futures trading took place. The traders merely looked out of the windows and reacted instinctively. The bank that employed my statistician friend used that insight very profitably.

Nothing much has changed in the art of analyzing data to gain marketing insights. It is as hard as it always was, except that it’s now possible to draw entirely the wrong conclusion from data much faster than ever before. As always, the problem is an intellectual, rather than a technical, one. It’s a problem of ensuring the quality of the data, understanding probability and population samples, and resisting the inclination to project your own beliefs onto your findings. Having the technological tools to hand to allow you to see further is no use if you’re looking in the wrong direction.

Here is just a sample of the current debate about Big Data.