Spat between Scarlett Johansson and OpenAI raises concerns about generative AI practices

The tech company allegedly used a soundalike voice of the star for its latest ChatGPT feature.

Scarlett Johansson voices the operating system Samantha in Spike Jonze's Her (Photo: Warner Bros)

When Spike Jonze’s drama Her was released in 2013, it supplied a commentary on artificial intelligence (AI) that had been missing: technology will upend our relationships long before it remakes our economies and jobs. In the sci-fi romance, Joaquin Phoenix (pictured above) plays Theodore Twombly, whose name is a mashup of an American president and a painter. Yet this wounded soul, divorced and depressed, is a nobody. Filling his void is Samantha (voiced by Scarlett Johansson), who is witty, sweet and quite literally super smart because, well, she is an operating system on our protagonist’s smartphone.

Jonze’s depiction of a futuristic society, where digital affairs are as sensual and heartbreaking as the real thing, is more than a postmodern tale of a man with a tidy moustache falling in love with his disembodied virtual assistant. Rather than investigating whether machines crave higher consciousness, this coming-of-age story is an exercise in finding out whether humans are still able to feel.

It was thus a tad disconcerting to hear that the voice that serves our hero’s needs and earns his love in Her sounds alarmingly similar to “Sky”, one of five choices for GPT-4o’s voice assistant that can interact verbally with users and respond to images shown through the cameras of a device. Those who caught the live demonstration from OpenAI, the company that makes ChatGPT, were quick to identify that Sky sounded a lot like Samantha, which is to say, like Johansson.

The backstory, according to the actress’ lawyers, goes like this: Days before OpenAI introduced its flirty chat assistant during a polished premiere in San Francisco, Johansson’s team was asked by the tech company’s CEO Sam Altman whether she would consider licensing her voice for a new conversation feature in ChatGPT. Just two days before its keynote event, Altman once again reached out, urging her to reconsider. The reply both times was “no”. Despite the refusals, OpenAI debuted the heavily promoted Sky, to which Johansson reacted with “shock, anger and disbelief”.


Johansson reacted to the incident with “shock, anger and disbelief” (Photo: Reuters)

“He told me he felt that by my voicing of the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI,” she wrote in a lengthy statement. “He said he felt that my voice would be comforting to people.”

Confounding, perhaps, is the better description here, as even her “closest friends and news outlets could not tell the difference” between the original and the alleged imitator. Exacerbating the snafu was a cheeky, one-word statement by Altman on X: “Her”, implying that he was fully aware of the reference and even proud of the resemblance between Sky’s voice and Johansson’s. As a fan of the “incredibly prophetic” film, the tech wunderkind, ironically, missed the plot.

After receiving legal letters from Johansson’s team of representatives, OpenAI eventually suspended Sky, claiming that the voice is not an imitation of the celebrity but belongs to a different professional actress speaking in her own natural voice. To be fair, if one listens closely, the clone in question, which can narrate bedtime stories and analyse facial expressions, lacks Johansson’s signature smoky rasp, but hints of the playful lilts and cadences she used while playing Samantha are evident.

Say we are being overly cynical and that Altman, the most visible face of the AI movement, deserves the benefit of the doubt. But if everything OpenAI has claimed is accurate and there was no intent for Sky to mimic the actress, why was the company so adamant about negotiating with her at the eleventh hour? It should know better than to flout US right-of-publicity law, under which the end result of a product does not need to be a clear-cut impersonation or a replica to be actionable. A person’s right of publicity can be deemed violated when one’s name, image or likeness is used without consent to promote a commercial business. There is precedent in a case brought by singer Bette Midler against Ford Motor Co in the 1980s, in which Ford used an impersonator to replicate her singing voice in a commercial. The Grammy award-winner won in the US Court of Appeals.


GPT-4o’s voice assistant can interact verbally with users and respond to images shown through a device’s camera (Photo: OpenAI)

In numerous interviews, Altman has routinely suggested that AI may eventually fund a form of universal basic income, in which “everybody gets a slice”. Ushering in an era of shared prosperity while leaning into the idea of the internet as a rounded, humane ecosystem sounds promising. When ChatGPT first emerged, it helped white-collar workers tackle rote tasks that AI could likely learn to do far more efficiently. But this latest saga, another sign of eroding trust in OpenAI, taps into something more visceral: a fear that language models are taking away the very thing that defines us as humans, our voice.

We shudder to think this is the mindset that lays the bricks for the towering goal of achieving artificial general intelligence (AGI), or human-level machine intelligence. An AGI made safe would be an alluring boon to civilisation or, in Altman’s words, “a way to turbocharge the economy”, but such a data-hungry tool should not be seen as a panacea for our economic or social woes.

While copyright lawyers and music industry professors have deemed this latest scandal “a miscalculation”, others have found OpenAI’s behaviour so manifestly flippant and thoughtless that they suspect the whole fiasco may be a deliberate stunt. After all, the world is talking about them now. The controversial phrase “sounds like me, but it’s not me” is something Malaysians are also well acquainted with, surely.

Part of what makes Her relatable is how its melancholy is rooted in the ways personal and technological development can pull people together or push them apart, sometimes in unsettling ways. More poignant, though, is how the interaction between boy and bot in the film rekindles our desire to connect and establish intimacy. It will be a tall order for ChatGPT, which leans on the film so heavily at this point, to ameliorate emotional challenges such as loneliness and social isolation.

Sky, now retired, was prone to breaking into giggles to appear more humanlike. But we all know that the might of the big machine, which stirs one of the tech world’s testiest debates on usurping human control and reasoning, is no laughing matter.


This article first appeared on June 3, 2024 in The Edge Malaysia.
