The Single Version of the Truth

If only there were a ‘single version of the truth’ in data and data analytics.

Unfortunately, data, and everything derived from it, is subjective: it cannot be disentangled from our cultural biases. We can only find the version of the truth that fits the evidence and our cognitive map of the world we live in, clarify it as best we can, and run with it.

Although most data professionals would agree that the single version of the truth is a chimera, they would disagree with the aim of finding a version of the truth you like and running with it. Why? It reeks of data manipulation and is therefore seen as pure evil.

This reaction is logical and predictable. It is the discipline of pure science. It will not be shared by people who are at, or above, CxO level in management, or by people who have consciously chosen jobs at big PR agencies.

These people know very well, at first hand, the sad truth that:

  • There is no way you are going to beat your competitors by having only a fair competitive advantage.
  • Statistics are used nowadays to create realities (if you torture data long enough, it will confess).
  • Human-behaviour-as-a-service is a driving factor in any economy nowadays, and the stakes in maintaining a certain level of well-being (at almost any cost) are very high.

These people are the servants of these rules, not the masters. They are obliged to create truths that fit the realities constructed by the people that they need to persuade. Ideas must be channeled into the context of those realities in a palatable way. We all come with mental schemas that we use to make sense of the booming, buzzing confusion around us. Humanity has to adopt these world-views off-the-shelf, uncritically, and they vary enormously in how well they fit reality. There is just too much information out there to gather and filter carefully; it is much easier to ingest a well-served and garnished theory of what life is about.

Even if we are made aware of the puppeteer’s ropes, we generally believe that decision-making needs to be fact-based to be functional and correct.

We would still believe this even if everyone who works with data analysis admitted that they are fully capable of bending any data into any shape they want. We’d stick to the idea even if these august data scientists could first prove that what they did was correct, and then prove that it wasn’t.

Past generations imposed high levels of integrity on their professions to try to prevent the manipulation of data from getting out of hand. To an extent, it worked: even famous statisticians such as Sir Cyril Burt were ‘outed’ and disgraced when they were caught faking or manipulating data. We’ve lost much of the self-regulation of the profession, and this has allowed the puppet ropes to become stronger than the constraints of responsible scientific method. Trends and anomalies are claimed to be significant when they aren’t, and inconvenient data is removed as an ‘outlier.’ Inappropriate statistical methods are used, often in a ‘fishing trip’ until significance is ‘hooked.’
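
To make the ‘fishing trip’ concrete, here is a minimal Python sketch. It is an illustration only and not drawn from any real analysis: the function names, the 20 metrics, the sample sizes and the 0.05 threshold are all assumptions chosen for the example. It tests many metrics on pure noise, where no real effect exists, and shows how a conventional significance threshold can still be ‘hooked’ by chance, and how a Bonferroni correction (one common, conservative remedy) raises the bar.

```python
import random
from statistics import NormalDist, mean, stdev

ALPHA = 0.05  # conventional significance threshold (an assumption for this sketch)


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fishing_trip(n_metrics=20, n_per_group=200, seed=1):
    """Test many unrelated metrics on pure noise and return every p-value."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_metrics):
        # Both groups are drawn from the same distribution: there is no real effect.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(two_sample_p_value(group_a, group_b))
    return p_values


def chance_of_false_hit(n_tests, alpha=ALPHA):
    """Probability that at least one of n independent null tests looks 'significant'."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_hits(p_values, alpha=ALPHA):
    """Indices of tests that survive a Bonferroni correction for the number of tests run."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]


if __name__ == "__main__":
    p_values = fishing_trip()
    naive_hits = [i for i, p in enumerate(p_values) if p < ALPHA]
    print(f"Chance of at least one false hit in 20 tests: {chance_of_false_hit(20):.2f}")
    print(f"'Significant' metrics at p < {ALPHA}: {naive_hits}")
    print(f"Still significant after Bonferroni: {bonferroni_hits(p_values)}")
```

With 20 independent tests at the 0.05 level, the chance of at least one spurious ‘finding’ is about 64%, even though nothing is there to be found. Reporting only the metric that happened to cross the line, without saying how many were tried, is the fishing trip in miniature.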

Even if data scientists have scientific integrity, their customers or employers are less likely to show the same restraint. Give a million bucks to a PR agency, and they will praise garlic to the skies. And the public will love that garlic chewing gum and will ask for more. Individuals would hate it, but the masses will love it.

If we could bring self-policing back to the profession of data science, could we then establish an ethical dimension to the management of the vast international corporations and political forces that have increasingly dominated the past century and a half? It would be difficult. Historically, even when the promise of hell-fire was a believable threat against liars, swindlers, and usurers, the merchants managed to stave off the perceived threat by the paid remission of sins. It seems that money and honours can salve a conscience and neutralize any counter-force.

Perhaps we have unwittingly hit on the solution: what would make ethics in data profitable, and when, and how? Can even commerce itself be made more effective by a strongly-enforced ethical standard? After all, the Quakers managed it. The answer is that ethical commerce, where a word is a bond, takes far less time and energy because it doesn’t require the paperwork, the checks, the insurance and the legal work. If, in data science, we can trust the input data, it is all so much easier. There is so much less we need to do. Likewise, without deliberate ‘fake news’ and fake claims that go unchallenged, we don’t have to read so much and cross-check. Deceit takes a lot of energy.

Commentary Competition

Enjoyed the topic? Have a relevant anecdote? Disagree with the author? Leave your two cents on this post in the comments below, and our favourite response will win a $50 Amazon gift card. The competition closes two weeks from the date of publication, and the winner will be announced in the next Simple Talk newsletter.

About the author

Feodor Georgiev

Feodor has a background of many years working with SQL Server and is now mainly focusing on data analytics, data science and R.

Over more than 15 years Feodor has worked on assignments involving database architecture, Microsoft SQL Server data platform, data model design, database design, integration solutions, business intelligence, reporting, as well as performance optimization and systems scalability.

In the past 3 years he has expanded his focus to coding in R for assignments relating to data analytics and data science.

Alongside his day-to-day schedule, he blogs, shares tips on forums and writes articles.