Sacred fact, profane data

There’s 200ml of water in this glass.  Whether the glass is half-full or half-empty is now entirely up for grabs by commentators, depending on how optimistic they are.  But there is consensus around the 200ml.  The value of comment and opinion has always been to interpret facts, to find new ways of looking at them, and to open our minds to different ways of seeing the world.  For comment to have this liberating power, we need to agree on fact.  When we don’t, comment has little utility beyond cheerleading and rabble-rousing.

The promise of the data revolution was manna from heaven for commentators.  Armed with irrefutable fact, we were in a position to comment on areas where we’d barely caught a glimpse of reality.  Facebook could show us what people were really talking about, how many of them, where they lived, what they thought about climate change.  Google Trends could offer us instant access to the world’s concerns and tell us whether privacy really matters.  Clever geolocation could show us the migration of the Rohingya.

News reporters, too, enjoyed the power of this technology.  Data-driven technology predicted election outcomes more accurately than weary survey techniques.  The Guardian was able to show, beyond any doubt, the type and location of casualties caused by 16,000 IED attacks in its Afghanistan War Logs.

As WikiLeaks showed us, holding power to account is much easier when technology enables us to scrape every Freedom of Information request at the touch of a button; when every citizen armed with a mobile phone is, essentially, a new source; when the truth is only a tweet or two away.  Secrets were rapidly becoming a thing of the past.

For commentators, the rewards were richer still.  Visualisation enabled us to comment on the rapidity of climate change with breathtaking power; Twitter feeds let our audience talk back to us at scale, creating a true dialogue; and page views showed us whether our audience was really engaging with us.  We could even A/B test different headlines to see how our audience wanted to read a story, tweaking copy at the touch of a button.

The utopian vision was that comment would rise to power, helping people around the world to understand, interpret, and speak to each other around a single version of the truth.  Together, we would make sense of an increasingly complex world.

Then the bubble burst.

We’ve woken up to the fact that, like all valuable assets, data is subject to theft and misuse.  Industry makes a great deal of money by capturing and using people’s data, but it didn’t take much for that data to be transferred to Cambridge Analytica or, probably, any number of state-sponsored Russian data laboratories.  This has, of course, escalated to the point of regulatory intervention, with the introduction of the GDPR.  Beyond crime, however, there are actually three “design flaws” inherent in all data-led activity, from marketing to journalism to policy development.

First, the algorithms which power data-led activity are necessarily driven by one thing: response.  At the simplest level, I will only see things which the algorithm knows I like.  While this kind of optimisation drives efficiency in business, it creates echo chambers in the public space.  We inhabit ever-tighter social networks in which we only hear the opinions of those we agree with.  These networks have become a breeding ground for “alternative facts”, destroying the ideal that data is able to create a single version of the truth.  For comment and opinion, this is life-threatening: in the social space, we don’t have a truth we agree on and we don’t have the ability to influence people’s perceptions.

The second “design flaw” in digital data is that it is purely behavioural.  We only see what people do, not why they do it.  “Why” is the most powerful word in the commentator’s lexicon.  The facts are there; we seek to make sense of them by understanding why they have come to be, and what might result.  Brexit is a useful case in point.  Data has told us everything about who Brexiteers and Remainers are, but it has given us no insight into their motivations.  We could talk about class, but then we have Jacob Rees-Mogg.  We could talk about the North, but then we have Kent.  To get to motivations, we’ve had to do things the old-fashioned way: listen, understand, interpret.  It took David Goodhart to identify “Somewheres” and “Anywheres”, not an algorithm.

The final “design flaw” in data is that it does the opposite of comment by fixating on the short term.  Data is structured to please people right now.  It’ll give them the Amazon recommendations and Facebook stories they want right now.  It isn’t generally structured to think about what they might want or need in the longer term.  By contrast, commentators generally take a single piece of news and use it as a way of opening up a longer-term discussion.  From a single story about pay, an entire social conversation around the gender pay gap arises.  From a single murder, terms such as “institutional racism” come into existence and continue to be debated twenty-five years later.

Data has not delivered on our ideal of a single version of the truth, which we can then interpret and help people to understand.  We need to be careful about what we’re asking the data to do for us.  If we’re fact-checking or scraping records, we’re on pretty safe ground.  If we’re looking for the insight and human truths on which comment is based, it’s dangerous to rely on data.  It’s back to what has always made us valuable: listening, interpreting, and narrating.