It all started innocuously enough, with Wombo, a text-to-image AI app that was all the rage in 2021. A friend of mine had been struggling with writing a novel, and to cheer her up, I “designed” a cover of her book, to make the goal more real. What the app generated was certainly a passable cover, potentially threatening the livelihood of cover designers and artists.
Still, I was not too worried about my own prospects. The arts were originally considered immune to the intrusions of machine intelligence. As someone who writes scripts, and works with artists to create graphic novels, I know how hard it is to bring about an alchemy of text and image. I had seen the gradual encroachment of machine intelligence into text—from auto-completion to email generators—but I felt that my little corner of the world would always be secure. Now, however, with AI image generators in art and ChatGPT in text, it seems that this immunity is eroding rapidly.
It was this feeling that made me apply for a workshop in Berlin, run by the Sound of Contagion group, for musicians, writers, and illustrators. The Contagion group, supported by the University of Oxford, is an initiative which “explores how AI, music composition, and narrative theory intersect to help us think in more innovative ways about how creativity and technology can be used to mutual benefit.”
During the COVID-19 lockdown, while most of us were binge-watching or learning to make pour-over coffee, the Contagion team was busy “curating the dataset, which is made up of literary texts, from the last 2,500 years of pandemic literature” and then running them “into our algorithm and [mining] the output for narrative probes”. These were then collated into a multi-layered short story which is at times “comprehensible and literary, at others confusing nonsense”. The workshop I applied for, which took place over three days in May 2022, used this as a base to continue the exploration of AI as an art form.
I make my way to central Berlin, where the workshop venue, in the campus of Universität der Künste, is located. It is a beautiful, early summer day, as I walk past the green expanse of the Tiergarten. There is a certain tension in the air though; the Hauptbahnhof had been choked with Ukrainian refugees fleeing the war. But there are no traces of conflict in the graceful campus of the UdK, usually referred to as “uudeekaa.”
We meet in what used to be the library, but there are no books, only empty white shelves, as if they have all been sublimated into the datasphere. Someone has scrawled on a whiteboard, “Has the Singularity already begun?” The participants are a bright young lot, in T-shirts and scruffy sneakers, headphones slung over necks, and toting laptops covered in counter-culture stickers. One of the participants exuberantly shouts, “Where is the coffee?”
Under the guidance of Sara, an artist, we are soon plunged into the arcana of VCLIP & Disco Diffusion, software that uses AI techniques to generate images based on text inputs, called “prompts”. Vast numbers of images are “scraped” from the Internet to form the “training dataset”. To give an idea of the scale, a commonly used dataset is LAION-5B, which has approximately 5.8 billion images. I imagine a thief in a vast library rifling through volumes of art, tearing out illuminated pages and throwing the volumes to the ground. I think of Mallarmé, who once said, “Everything in the world exists in order to end up as a book.”
Once the datasets have been harvested, the AI is trained on them, like a child at the hands of an unforgiving teacher, through a process of nested feedback loops that take it from image to noise and back.
I soon realise that this process is utterly unlike a human approach, viewing, as it does, the world as a kind of possibility space. For instance, my mind is suitably blown when Rob, the leader of the sound team, explains how AI composes music: “Sometimes AI analyses music by categorising every sound according to thousands/millions of parameters. It then creates a multidimensional space, where each parameter is a dimension, so each sound can be found by giving its ‘coordinates’ within this space. You can then instruct it to take a certain route through this space.”
And that is not all. The computer does not even need to “hear” a single chord: “You can also train AI on spectrographic images of music, so it learns how music looks but not necessarily the temporal direction in which it is played. You could feasibly train an AI on these images, then have it realise music backwards, down-to-up, or any other orientation that isn’t left-to-right,” says Rob.
I had read an article on how, in the 17th century, thinkers like Bacon and Leibniz, with assumptions on the basis of their world rapidly unravelling, “used monsters and other marvels as a kind of intellectual hygiene to jolt people out of their assumptions about the natural world.” In the same spirit I decide to generate a grimoire of monsters for this new age. I find out soon that the beast is highly temperamental, with my experiments crashing repeatedly, and the machine telling me, “Your session crashed for an unknown reason.”
As I tap out different prompts, wording and re-wording them to coax the machine, I realise that it is an idiot savant. My initial goal was to have all the art in a consistent Mughal miniature style, but as not enough Mughal or Indian art is present in the training data, and as it regurgitates what has been fed into it, I cannot make any headway. Because of this limitation, AI art has been criticised for producing a “generic average” of anything it has consumed. There are also odd quirks: I soon learn that the machine rarely gets the number of fingers right on a human, or simply melds them together in a fleshy stump.
Around me, demands for faster Wi-Fi, stronger coffee and more processing power resound. I saunter around, looking at the other creations, feeling like a visitor to a bazaar of magicians. Andreas, originally from Mexico, shows me an uncanny automated fake news generator he has built, complete with a replica of the New York Times front page. When I ask him for some gyan on prompts, he explains the concept of lemmatisation, derived from “lemma”, the dictionary form of a word.
He explains it is like peeling an onion, peeling a word down to the meaning at its heart. For instance, it helps the machine understand that “German Shepherd” and the phrase “man’s best friend” most probably refer to a dog, but “hotdog” does not.
In another corner, a programmer has made clips of music “alive”, and made them fight each other and survive through evolutionary impulses—the fittest survivors form a piece of music at the end. We all take turns listening. I politely applaud though I think it sounds exactly like dial-up modems from the 1990s, all firing in symphony.
A willowy English girl sets up an alternate reality travel bureau using Lisbon as the base, with machine-generated excursions into futures that never could be and pasts that can still happen.
I strike up a conversation with Mahir, originally from Turkey, whose specialty is projection mapping. He explains it as a way of projecting video onto surfaces, be it household objects or a façade of a building, and turning them into displays.
In shorts and a ponytail, he is fit like all the arty Berliners, who as a class have eschewed cars and taken up bicycling. Mahir is collaborating with some of the musicians in the group. He gives us a demo. A violinist volunteer from the music school plays a piece. Another program uses that input to generate a continuation, while Mahir uses the music to generate a recursive stream of images and project them onto the wall.
After a few glitches, everything clicks, and a river of sound and colour begins to flow. Due to the projection mapping, we see a mannequin, with lights flowing through it like digital blood. Wow, I say. He modestly shrugs it off. He has a powerful system at home, which means he has access to a massive database of images. He explains that Refik Anadol, a Turkish artist, is a pioneer in these techniques. I later recognise his name in the prompts, like that of a well-known sorcerer invoked in a propitiatory rite.
Breaks in time
For breaks we repair to the courtyard, sitting on the grass in the quadrangle, nursing bottles of Club-Mate, concocted by some Bavarian distiller out of yerba and baking spices.
The quadrangle bears the traces of war—roofless and surrounded by blast-blackened walls overgrown with ivy. I wonder if this, the first post-pandemic summer, is the last pre-nuclear war summer in Europe.
Back in India, I wonder about using what I have learnt in my own “practice”, and soon enough an opportunity arises. “We need references for the background elements,” says graphic artist Harsho Mohan Chattoraj. I am working with him on a graphic novel featuring the adventures of the visionary American writer Howard Phillips Lovecraft (HPL), set in India. Lovecraft created a vast private mythology, a universe filled with colossal uncaring entities where all of humanity is merely a mote of dust floating in eternal night. His brand of cosmic nihilism seems an odd match for AI, but the French writer Michel Houellebecq compares his writing to architecture, which “like that of great cathedrals, like that of Hindu temples, is much more than a three-dimensional mathematical puzzle… it is entirely imbued with an essential dramaturgy that gives its meaning to the edifice.”
Chattoraj’s muscular, realistic style needs references, such as archival photographs or paintings, especially as ours is a period piece, set in the 1930s. One of our scenes has a character marooned on a mysterious island in the Andaman Sea. This seems to be a perfect opportunity to put my newfound skills to use. This need also coincides with the rise of Midjourney, a new app which directly takes text prompts without a cumbersome interface. I want the buildings on this island to resemble the skeletons of monsters: towers that look like skulls and so on.
I log into Midjourney’s Discord server, where the bot issues instructions: I have to go to a “newbie” channel, and then “Type /imagine and then do whatever you want—The bot will send you 4 images in 60 seconds.” I watch an endless wash of prompts cascade down the screen; everyone who has logged in can see what other users are demanding and what the machine serves up.
A user wants a “tiny cute grape plush stuffed toy, anthropomorphic, surrounded by vines, Pixar style eyes” while another seeks “Prussian chancellor Otto von Garfield”. The demands get ever more complex, “Girl looking at her reflection in the mirror as she dissociates, cinematic lighting, 8k ultra fine grained” going on to “Female concept character [young + tom boy + super short curly ginger hair + beautiful face covered in freckles + tank top + baggy jeans + no makeup], direct sunlight, very detailed, maximum texture, artstation, illustration by Raoul Buzzelli + Willie Real + Peter Mohrbacher + Moebius + Alphonse Mucha”.
To me, these prompts have a strange magical cadence, a chant, word made flesh, especially as within seconds of their invocation, perfectly formed images appear. The user can then choose to have upscales or variations on what the machine has spat out. It has been said that knowing the name of something is semantic knowledge, while knowing how to do things is procedural knowledge; magic makes them indistinguishable: to know the name of a thing is to control it. Similarly, these endless streams of incantatory prompts invoke a spirit to straighten the dog’s tail. The critic Erik Hoel envisages a future where art is generated by non-conscious machines with little or no human input, resulting in a situation where “behind the entire aesthetic of our civilization there will be a vast emptiness, a void communicating nothing.”
I key in my prompts, and after the first halting efforts, I find just the right phrases that can summon the ghost in the machine. I want a grand edifice like “the skull of a deep-sea creature, realistic”. Within seconds we see the output, first blurred and then sharply falling into place. I pass on the image to Chattoraj and he immediately incorporates it into a panel, with his own distinctive touches. It is exhilarating to see this interplay between minds, machine and human. “Too good,” he says, adding ruefully, “there goes any chance for us artists.”
Emboldened, I set Lovecraft walking the streets of Calcutta, a wraith amongst wraiths. The city wavers and trembles, and the minars which the machine conjures in some Orientalist fever dream, elongate into tentacles, reaching up to pull the sky down. At the end, a giant cephalopod regards us with a cyclopean gaze.
HPL conceived of a class of horrifying extradimensional beings, “shadows that stride from world to world to sow death and madness.” I subject the Howrah Bridge to an attack from these entities. The computer chooses a sky made of roiling clouds, as if concealing something in its green, seething depths. In doing so, it accurately translates the author’s vision of extradimensional entities which are so hideous that their mere sight would drive the onlooker mad. The machine seems to sense it, and only hints at the shape behind the furiously roiling clouds.
“It feels bad,” Chattoraj says with a laugh, “these are extraordinary images, and the machine does it in a second, we put in so much time and effort.” He points out that the idea of experimentation, happy accidents, taking an artistic chance, is now devalued as the engine spits out variations mindlessly and endlessly.
Jaideep Unudurti is a freelance journalist and graphic novelist.