Big data terror

Over the last hundred years, there have been some outstanding novels that capture our dystopian reality. From Aldous Huxley’s Brave New World to George Orwell’s 1984, from Franz Kafka’s The Metamorphosis to Margaret Atwood’s The Handmaid’s Tale, each explored the ruthlessness of the state’s power and the institutions created by that power to perpetuate entrenched interests at the cost of the people. Despite their chilling narratives, these novels, at the end of the day, offered comfort in that they were works of fiction that talked about dark possibilities, which may not be our fate if we are vigilant enough. Somehow, the reader is convinced that these novels are teaching grounds to learn, to dissent, to question, to act and to resist. They are seen as catalysts for the democratic mediation of spaces for human aspirations and desire for freedom. They, at a deeper level, reaffirm the agency of the people against draconian institutional models.

However, Cathy O’Neil’s Weapons of Math Destruction, a non-fiction inquiry into the world of big data, removes even this comfort and makes us realise how people are reduced to bits of data that can be used to expand the profits of corporations, to reduce the state’s spending on affirmative action, and to threaten the idea of democracy.

In India, where the Supreme Court has connived at the government’s defiance of the court’s own interim order in its push to make Aadhaar mandatory for the delivery of subsidies, benefits and other entitlements, this book is more disturbing than any of the dystopian novels. It deals with real big data and arrives at stark conclusions. In our case, where the basic parameters for collecting data keep changing, one shudders to think what the future holds for us.

Cathy O’Neil is one of the finest data experts in the world. She started the Lede Program in Data Journalism at Columbia University. A PhD in Mathematics from Harvard, she worked for years on Wall Street as a data scientist, building models for hedge funds and predicting people’s purchasing power and digital clicks. But she was disillusioned with this kind of mathematics, which refused to see the conditions of people’s lives.

She was dismayed by the arrogance and the authority displayed by the high priests of mathematics and computer science in determining what one was entitled to. She was disturbed by the irony that mathematical models, built on the lofty assumption that numbers would eliminate bias and create a fair system, ended up reinforcing stereotypes. In this book, she explains with rich empirical evidence how the new algorithm-driven models are opaque and incontestable, even when they are wrong.

To understand the inherent weakness in the new math-driven models, we need to look at specific human experiences recounted in the book rather than abstract terms such as probable statistical error, adjustment for margins of error, and various formulae.

Cathy O’Neil became suspicious of the numbers game following the 2008 sub-prime financial crisis in the United States that caused a rise in unemployment and wreaked havoc on the lives of millions. She wrote: “What’s more, thanks to the extraordinary powers that I loved so much, math was able to combine with technology to multiply the chaos and misfortune, adding efficiency and scale to systems that I now recognised as flawed.”

Victim of mathematics

The first case she deals with is that of an excellent schoolteacher in Washington, D.C., who became a victim of mathematics. In 2007, the city’s mayor, Adrian Fenty, wanted to turn around underperforming schools under his remit. He created a new powerful post, Chancellor of Washington’s Schools, to aid him in this mission and appointed an education reformer, Michelle Rhee, to the post.

Rhee developed a teacher assessment tool called IMPACT and decided to fire all the teachers whose scores put them in the bottom 2 per cent. On the surface, this looked like a fair system and many good teachers felt that they had no cause for worry.

But Sarah Wysocki, a fifth-grade teacher with excellent reviews from the school principal and parents, scored abysmally in IMPACT’s value-added modelling, an algorithm-generated scoring method, and was fired. What exactly was the value-added model measuring? It was a mathematical computation developed by a consultancy firm, Mathematica. Its job was to measure the educational progress of the students and then to calculate how much of their advance or decline was due to teachers.

The variables were plenty: the socio-economic background of the student, the effects of learning disabilities and domestic violence, to name a few. Could an algorithm capture human behaviour, performance and potential? “There are so many factors that go into learning and teaching that it would be difficult to measure them all,” said the fired teacher. One of the techniques adopted by the statisticians was to count on large numbers to balance out exceptions and anomalies. “Weapons of math destruction”, Cathy O’Neil argues, often punish individuals who happen to be the exception. IMPACT’s dependence on an algorithm weeded out more good teachers than bad teachers.

College rankings

The next example that Cathy O’Neil examines is that of the ranking of 1,800 colleges and universities in the U.S. by the magazine U.S. News & World Report. The process began in 1983 with opinion surveys. Stanford came out as the top national university and Amherst as the best liberal arts college. But others protested, forcing the magazine to turn to data. The first data-driven ranking was published in 1988. Many felt that the results were sensible. However, a vicious feedback loop soon materialised. A college that fared badly in the ranking lost its reputation, leading good students and good professors to avoid it, and the alumni to cut down on contributions. Its ranking tumbled further.

Flawed assumption

Cathy O’Neil looked at the other flawed assumption that comes with quantifying qualitative attributes. She looked at the case of a website that was looking for a social media maven, the digital world’s term for someone who helps to build and sustain a presence on social media platforms. The hiring manager devised a proxy to evaluate the applicants. She settled on those with the most followers on Twitter. It looked like a fair way to measure social media engagement. But once word leaked that assembling a crowd on Twitter was key to getting the job, candidates did everything to ratchet up their Twitter numbers. Some paid money to a service that populated their feed with thousands of followers, most of them generated by robots. The proxy lost its effectiveness.

Fairness ahead of profit

O’Neil explains how big data processes codify the past in a reductionist manner and how they cannot predict the future. “Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit,” she argues.

She cites an oath drawn up by two financial engineers, Emanuel Derman and Paul Wilmott, which focussed on the possible misuses and misinterpretations of their models:

“I will remember that I didn’t make the world, and it doesn’t satisfy my equations. Though I will use models boldly to estimate value, I will not be overly impressed by mathematics. I will never sacrifice reality for elegance without explaining why I have done so. Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights. I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.”

She feels, however, that this will not suffice because solid values and self-regulation rein in only the scrupulous. Her concluding argument is that society should get a grip on techno-utopia and on unwarranted hope in what algorithms and technology can accomplish. It is hard to disagree when she says: “Before asking them [algorithms and technology] to do better, we have to admit they can’t do everything.”