SPOTLIGHT

‘Digital inequalities will power digital colonialism’: Arvind Narayanan

Published: Mar 09, 2023 10:50 IST - 10 MINS READ

Arvind Narayanan.

Interview with the AI expert and author.

Whether it is OpenAI’s ChatGPT, Google’s Bard, or Meta’s Galactica, chatbots powered by generative artificial intelligence (AI) may, in spite of all the benefits they claim to bring to humanity’s table, help create an unequal world where the poor are left to the mercies of AI while the rich have the best of both worlds, feels AI expert and author Arvind Narayanan. A professor of computer science at Princeton University, Narayanan is a popular commentator on the possibilities and perils of artificial intelligence. He is the co-author of the upcoming book AI Snake Oil (also the name of his popular Substack newsletter). Narayanan speaks to Frontline on ChatGPT, large language models, and the ethical and privacy concerns around the data they use and generate. Edited excerpts:

Why do you term the plausible text that ChatGPT generates “bullshit”?

The philosopher Harry Frankfurt defined bullshit as speech that is intended to persuade without regard for the truth. This is exactly what ChatGPT does. It has been trained to produce text that is plausible, and it is extraordinarily good at it. That means the bot can be very convincing. But it has no way to distinguish truth from falsehood. So it often makes things up. It is often correct, but only as a byproduct of trying to sound convincing. True statements sound more plausible, after all.
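
His point can be made concrete with a toy sketch. The following Python snippet (illustrative only, nothing like OpenAI’s actual system) builds a miniature bigram “language model” and generates text by sampling whichever word most often followed the previous one in its training corpus: plausibility drives every step, and truth never enters the loop.

```python
import random
from collections import Counter, defaultdict

# A tiny training corpus in which a true and a false claim are equally frequent.
corpus = "the moon is made of rock . the moon is made of cheese .".split()

# Count how often each word follows each word (a bigram model).
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    words, weights = zip(*next_counts[prev].items())
    return random.choices(words, weights=weights)[0]

# Generate a sentence. Every step is "plausible" given the training data,
# but nothing anywhere checks whether the resulting claim is true.
word, sentence = "the", ["the"]
while word != ".":
    word = next_word(word)
    sentence.append(word)
print(" ".join(sentence))  # prints "... made of cheese ." as readily as "... made of rock ."
```

Real chatbots replace the bigram table with a very large neural network, but the training objective is of the same family: predict a convincing next token.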

There is a lack of transparency around the datasets used in the creation of products such as ChatGPT. How exactly “open” are these products?

There is a lot that is shrouded from view when it comes to products like ChatGPT. In addition to the datasets, there is a lot of human labour that goes into training them, and companies aren’t transparent about their labour practices. In the case of the Bing chatbot, even the most basic details weren’t released. When the bot turned out to have many disturbing behaviours, experts were left wildly speculating about what might have gone wrong. The lack of transparency hinders our ability to learn from past experiences.

ChatGPT does not attribute or give credit to its sources when reproducing or paraphrasing content. Is this legally and ethically correct? After all, this is derived work. Also, in this scenario, can it claim originality or authenticity of information at all?

Like any AI tool, ChatGPT is only as good as its training data. I can’t speak to the legal aspects, but calling it a derived work is too simplistic from a technical perspective. It can do things like explain how to remove a peanut butter sandwich from a VCR... in biblical verse. That’s not simply a matter of rearranging its source material. In most cases, ChatGPT’s output isn’t based on any single source document, which complicates the problem of attribution. Instead, what I would advocate for is a change to the economic model of generative AI whereby companies share their profits with those who generated the training data. Companies are unlikely to do it on their own, so it will have to happen through taxation. Of course, there is still the extremely complicated problem of how to allocate those rewards to everyone who ever wrote anything online or shared a picture, and how to do this across geographic lines in a way that’s even remotely fair. I don’t know if this is ultimately feasible, but we shouldn’t give up without trying or imagining this possibility.

The knowledge acquisition process behind these kinds of AI products seems pretty opaque to people like us. Is that our problem or is there really a problem?

There is a lot of research that seeks to understand how these AI products exhibit the behaviours that they do. The problem is that this research tends to lag the engineering work of making these models bigger and bigger, and releasing the latest and greatest models. So researchers’ understanding is always a few steps behind the state of the art. Public understanding, of course, is much farther behind.

Stereotyped outputs

Contextual learning, which forms the basis of products like ChatGPT, is known to carry biases inherited from the data used to build it. How do we address this?

ChatGPT has a filter that has been trained to avoid biased or inappropriate responses. In other words, the model still has biases, but it outputs them much less often than it otherwise would. This approach has been surprisingly effective, although there is still a long way to go. On the other hand, with text-to-image generators, it feels like we are still at square one when it comes to biased and stereotyped outputs. For example, the images of women they generate are often sexualised.
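
As a rough illustration of the filtering approach Narayanan describes, here is a minimal sketch in Python. The `generate` and `score_toxicity` callables are hypothetical stand-ins (OpenAI has not published its pipeline); the pattern is simply to score each candidate response with a separately trained classifier and retry or refuse when it fails.

```python
# Minimal sketch of a post-generation safety filter. Both callables passed in
# below are hypothetical stand-ins, not a real API; production systems also
# rely on training-time methods such as learning from human feedback.

REFUSAL = "I can't help with that."
MAX_RETRIES = 3
THRESHOLD = 0.5  # illustrative cut-off; real thresholds are tuned empirically

def safe_reply(prompt, generate, score_toxicity):
    """Return a response scoring below THRESHOLD, else fall back to a refusal."""
    for _ in range(MAX_RETRIES):
        candidate = generate(prompt)
        if score_toxicity(candidate) < THRESHOLD:
            return candidate
    # The underlying model still has biases; the filter only makes them
    # surface less often, which is exactly the caveat noted above.
    return REFUSAL
```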

What could be the labour implications of generative AI or conversational AI in general or services like ChatGPT in particular?

I think generative AI will partly or fully automate some types of jobs, and in other areas, it will enhance the productivity of human workers. It is hard to say exactly which jobs without an in-depth understanding of the domain. Many people claim that AI will replace doctors or lawyers. But this is based on an extremely superficial understanding of what those professionals do. It’s as if the inventor of the typewriter had proclaimed that it would make writers and journalists obsolete, failing to recognise that professional expertise is more than externally visible activity!

ChatGPT is developed and funded by a group that includes for-profit companies. So how open or common can the product be?

There is a spectrum of generative AI products in terms of their openness. Some, like ChatGPT, are controlled by the companies developing them. Others are available for people to download and use on their own computers, but with some restrictions. Yet others are fully open source. Generally, the most capable, cutting-edge products tend to be centralised. Each model has its pros and cons. With a centralised model, the company has the ability to monitor and prohibit malicious uses (how well they’re doing that is a different question). On the other hand, open-source models allow researchers to study the tool and better understand its strengths and limitations.

Many people are comparing this to the advent of the Internet or mobile. How true is this?

It is possible that the economic and social impact of generative AI will be on par with the Internet or mobile devices, but it is too early to predict. Based on their current capabilities, I would say the answer is no, but I also think they will continue quickly acquiring new capabilities for the next few years.

The biases of AI in general are a recognised concern today. Is the law catching up in order to address this? Are there any examples countries like India can emulate?

It is hard to imagine that the law could prohibit certain types of bias, such as a chatbot that parrots racial stereotypes. On the other hand, if a language model is used to screen resumes of job candidates, it is possible that these biases will result in discriminatory hiring algorithms. The good news is that this is already unlawful in most places — and it is not an AI-specific regulation.
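
One way such hiring discrimination is detected in practice is a disparate-impact audit, for example the “four-fifths rule” used by US regulators: if one group’s selection rate falls below 80 per cent of another’s, the screener is flagged for review. A minimal sketch, with invented numbers:

```python
from collections import Counter

# Invented (group, was_shortlisted) decisions from a hypothetical resume screener.
decisions = ([("group_a", True)] * 60 + [("group_a", False)] * 40
             + [("group_b", True)] * 30 + [("group_b", False)] * 70)

shortlisted = Counter(group for group, ok in decisions if ok)
total = Counter(group for group, _ in decisions)
rates = {group: shortlisted[group] / total[group] for group in total}

impact_ratio = min(rates.values()) / max(rates.values())
print(rates)                                # {'group_a': 0.6, 'group_b': 0.3}
print(f"impact ratio: {impact_ratio:.2f}")  # 0.50 < 0.80: flag for review
```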

Big Tech versus governments

All over the world, the debate around AI seems to have been reduced to a battle between Big Tech and governments. Is this the right approach?

There are many ways to rein in the power of Big Tech. Government regulation is probably the most important one, but it’s not the only one. Journalists, researchers, and activists have a huge role to play in raising awareness of Big Tech’s power and calling it out when it is misused. Another avenue is worker power. In recent years, at least in the United States, employees of tech companies have had a surprising degree of success in resisting unethical practices. There have been many whistle-blowers such as Frances Haugen. I think worker power can be even more effective if organised as a collective.

You led the Princeton Web Transparency and Accountability Project which looked into how companies collect and use personal data. Can you share some of your inferences from that project in the wake of the buzz around generative AI and its impact on society at large?

Starting a decade ago, the Princeton Web Transparency and Accountability Project exposed the details of the hidden tracking that happens on our devices and websites. Companies use many devious ways to track our every movement, even linking our online activities with what we do in stores. They compile massive dossiers on each of us and then use machine learning to turn those records into behavioural profiles that can be used to influence us.

There are a few lessons from that research that apply to generative AI. First, machine learning relies on ever-expanding volumes of personal data. ChatGPT is trained on data publicly available on the web, but some other generative models include private data on our phones in their training sources. Second, perhaps unsurprisingly, companies behave very differently when their behaviour is being watched and when it isn’t. That’s one reason why the lack of transparency around generative AI is a big problem. Third, the centralisation of data makes some companies economically powerful and by extension politically powerful. This concentration of power has many downsides.

There are concerns that ChatGPT and similar tools will negatively influence the way students learn, and the First World is already moving towards corrective measures. But don’t you think this could be a major concern in the Third World (poorer countries), where access to AI tools will outpace regulation of their use?

Sam Altman, the CEO of OpenAI (makers of ChatGPT), recently said that one use of chatbots is medical advice for those who can’t afford healthcare. So the people in charge apparently see nothing wrong with a world where the rich have doctors and the poor can diagnose and treat themselves by talking to a bot. Regardless of what any specific people think, this kind of outcome is sadly all too likely. It’s a form of digital colonialism.

Here is another example. The recently released Bing chatbot has been reported to have many disturbing behaviours, such as threatening violence against users. But it emerged that Microsoft had used India as a testing ground for the bot back in November. When Indian users complained of these same problems, no action was taken. But when the issue reached The New York Times, the company made modifications within a few days. There are no easy solutions, but fostering a strong AI industry within the country will certainly help.

Tell us about your upcoming book, AI Snake Oil.

Generative AI has its flaws and limits, but it is a genuinely innovative technology that has many uses. The real AI Snake Oil is quite different. AI is being used by courts to predict who will commit a crime, by hospitals to predict who will fall sick, and by banks to predict who will pay back a loan. It is being used by employers to predict who will do well at a job — supposedly simply by analysing their facial expressions, body language, and manner of speech. This type of AI doesn’t work. It’s an elaborate random number generator. Yet it’s being used to make life-altering decisions about people. And this type of logic is proliferating to every area. Companies are slapping the “AI” label on whatever they’re selling to take advantage of the hype and give it a veneer of infallibility.

My upcoming book, co-authored by Sayash Kapoor, is about deconstructing the hype and helping people figure out which kinds of AI work and which ones don’t. We think everyone will need these skills—either because they might need to make professional decisions about when to use AI, or because they might be at the receiving end of some of these broken AI tools.
