“She had the kind of walking that made benches become men.”
Take a moment to read and enjoy that sentence. Roll it around your mouth. Now take a look at it again and try and work out what, if anything, it actually means. Could it be that it doesn’t mean anything at all, because it was written by something that doesn’t have the faintest idea what it is like to walk, what a bench is, or indeed how something might become a man?
Granta recently published the top-rated entries in the Commonwealth Short Story Prize, an annual award given to fiction from five regions (Africa, Asia, Canada and Europe, the Caribbean, and the Pacific) – the opening line of this piece is taken from the winner of the Caribbean prize, a story titled The Serpent in the Grove by Jamir Nazir. Or at least that’s the name attached to the submission.
Within a few days, speculation was rife on social media that the story had been penned by AI. Ethan Mollick, a professor at the University of Pennsylvania, widely considered to be something of an expert on the technology, posted to Bluesky that “in a Turing Test of sorts, it looks like a 100% AI-generated story just won the Commonwealth Prize for the Caribbean region,” and that he had run it through best-in-class AI detection software, Pangram, which had flagged it as “100% AI-generated”. Beyond that, Mollick continued, “if you know, you know.”
In a slightly bizarre twist to an already-odd story, Granta hit back. The story, and a couple of others from the same contest that had also been the subject of “AIvestigation” from the online crowd, would remain online, because while “we showed Claude.ai the story and asked whether it was AI-generated… [its] response was long, concluding that it was ‘almost certainly not produced unaided by a human’.”
“The AI-generated critique of these Commonwealth writers – more than one has been accused of basing their story on AI material – may conceivably itself reflect AI bias.” As such, Granta concluded, the stories would stay on their website until the Commonwealth Foundation came to a definitive decision on the pieces’ authorship.
This illustrates one of the main, tangible results of the AI boom – we are losing the ability to distinguish between what has been made by man and what by machine.
Online AI detection software is widespread, and improving, but it is not, whatever its promoters may claim, 100% accurate. Claude is not capable of telling you, definitively, whether a piece of text was written by Claude. Even the founder of the aforementioned “best-in-class” AI detector Pangram admits that the software “sometimes makes mistakes”, and that “we still don’t fully understand how to precisely measure how much AI altered [an] original text” – meaning the extent to which work has been amended or modified by AI remains impossible to determine.
When an AI detector analyses text to determine whether it’s AI-generated, it’s not accessing some secret metadata visible solely to The Machine. LLM-generated text doesn’t come with a watermark that can be scried with the right lens. All these tools are doing is comparing the text they are fed with an internal idea of what AI-generated copy is like – vocabulary, structure, etc – and determining the degree to which there are similarities.
In practice, detectors tend to focus on two qualities of a text: “perplexity”, a measure of how unpredictable the wording is to a language model, and “burstiness”, a measure of how much that perplexity varies across the text. More perplexity and more burstiness should, in theory, mean the words are more likely to be human-penned. Pangram employs a slightly different approach, based on extracting patterns from text and comparing them with known patterns in AI-generated copy, but the principles are similar.
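For the curious, the two measures can be sketched in a few lines of Python. This is a toy illustration only, not how Pangram or any commercial detector works: a real tool would score text with a large neural language model, whereas the hypothetical `UnigramModel` below simply counts word frequencies in a reference text, and the tokeniser and sentence splitter are deliberately crude.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real LLM tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


class UnigramModel:
    """Toy language model built from add-one smoothed word frequencies.

    Assumption: real detectors use a large neural model instead.
    """

    def __init__(self, reference_text):
        tokens = tokenize(reference_text)
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab = len(self.counts) + 1  # one extra slot for unseen words

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab)

    def perplexity(self, text):
        """exp of the average negative log-probability per token."""
        tokens = tokenize(text)
        if not tokens:
            raise ValueError("no tokens to score")
        avg_neg_log = -sum(math.log(self.prob(t)) for t in tokens) / len(tokens)
        return math.exp(avg_neg_log)


def burstiness(model, text):
    """Standard deviation of per-sentence perplexity (one common definition)."""
    scores = [model.perplexity(s) for s in split_sentences(text) if tokenize(s)]
    if len(scores) < 2:
        return 0.0
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))


model = UnigramModel("the cat sat on the mat. the dog sat on the rug.")
predictable = model.perplexity("the cat sat")
surprising = model.perplexity("zebra quantum flux")
print(predictable < surprising)  # familiar words score lower perplexity
```

Text made of words the model has seen often gets a low perplexity; unfamiliar wording gets a high one. A passage that mixes the two has higher burstiness than one that stays uniformly predictable, which is the pattern detectors of this kind read as a sign of human authorship.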
Unfortunately, though, these detectors are far from infallible. A 2023 University of Maryland paper argued that AI-text detectors are unreliable in practical settings and can be evaded by paraphrasing. OpenAI withdrew its own classifier in 2023 because of its “low rate of accuracy,” saying it correctly identified only 26% of AI-written text, while falsely flagging human text 9% of the time. The tools have improved, but they are by no means perfect – and that means certainty is out of reach when it comes to a text’s provenance.
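To see why error rates like those matter, it helps to ask what a “flagged” verdict is actually worth. The sketch below applies Bayes’ rule to the two rates OpenAI quoted for its withdrawn classifier. The share of submissions that are really AI-written is not something anyone knows, so the 10% used here is purely a hypothetical assumption for illustration, as is the function name.

```python
def flagged_is_ai_probability(true_positive_rate, false_positive_rate, ai_share):
    """Bayes' rule: probability that a flagged text really is AI-written."""
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)


# 26% and 9% are the rates quoted for OpenAI's classifier;
# the 10% share of AI-written submissions is a hypothetical assumption.
p = flagged_is_ai_probability(0.26, 0.09, ai_share=0.10)
print(f"{p:.0%}")  # prints 24%
```

Under that assumption, roughly three in four flagged texts would have been written by a human. Raise the assumed share of AI-written submissions and the verdict becomes more trustworthy, but the point stands: a flag is a probability, not a proof.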
In the intervening years, the quality of models’ prose output has improved dramatically, to the point where a parallel class of AI tools has sprung up, designed to “de-AI” machine-generated texts to avoid detection. They do this by upping the perplexity and burstiness of the sentences.
Which means, fundamentally speaking, there’s no meaningful way of guaranteeing that words have been written by a human any more unless you’re standing behind them and watching them type.
This detection problem applies to images and video, too. There are a few different technologies in play when it comes to watermarking AI-generated visuals, but many of the watermarks can be easily removed, either with software or simply by making a copy of the image or video in question.
The best-in-class is a technology called SynthID, which Google applies to all images created with its tools, but it is not yet available to any other platforms (although OpenAI has plans to integrate it), and only works on Google-generated AI pictures (not video or text), making it significantly less effective at scale.
Reassuring as it might be to believe that there’s an effective arbiter of what is real and what isn’t, the sad fact is that there simply isn’t a reliable way to tell any more, other than with your own eyes and your own research. Oh, and if you think you can always spot AI-generated copy because there are tell-tale signs, then I have bad news for you: some people just write like that.
When it comes to The Serpent in the Grove, the terrifying fact is not so much that it might have been machine-penned; it’s perhaps that a real, apparently human judge was able to read prose like “The shelf didn’t look like freedom – she couldn’t afford that word yet. It looked like not dying,” and think “yep, that deserves a literary award!” Maybe we deserve the slop.
Matt Muir is writer of the webcurios.co.uk newsletter on tech and the internet
