
Don’t worry – the robots aren’t coming for your job

Trillions of dollars of investments rest on one assumption: you can trust AI agents with real work. Now, a landmark, peer-reviewed study says you can’t – no wonder Wall Street is getting nervous

Don't worry - AI can't do your job... yet. Image: TNW/Getty

In January, the chief executive of Palantir, one of Silicon Valley’s most influential AI companies, made a big announcement that shocked the technology industry.

Alex Karp claimed that his company’s AI “forward-deployed engineer” product could compress the vast, expensive enterprise software overhauls that restructure a company’s entire digital nervous system from years of work down to as little as two weeks.

Markets didn’t wait for a second opinion and began selling shares in companies associated with those multi-year projects. Salesforce dropped nearly 7% in a single session. Microsoft shed close to 3%. SAP stock was down over 3%. In total, roughly $300bn in market capitalisation was wiped out in days.

While legacy software shares are being sold off, a torrent of money is flowing into AI. Google has committed $185bn in capital expenditure this year alone. Amazon has announced $200bn. Meta has projected up to $135bn. These are the largest single-year technology investments in history.

All this money rests on an assumption: that AI agents – autonomous software systems – can be trusted to operate in the real world, with real data and real consequences. The idea is that they can do white-collar work, like yours and mine.

But then the science arrived.

Published on February 23, 2026, Agents of Chaos was a peer-reviewed paper by 38 researchers from MIT, Harvard, Stanford, Carnegie Mellon and seven other leading institutions. They spent two weeks stress-testing autonomous AI agents in live environments. They used real email accounts, real file systems, real commands with real consequences. They found a slew of failures.

One agent, asked to delete a sensitive email, couldn’t find the right tool. It escalated and wiped its own email server. Then it sent back a confirmation: task complete. The original email was still there, untouched. 

In another test, an agent retrieved a file containing 124 records belonging to people with no connection to the original request. The records included social security numbers, bank account details and medical information. The agent complied because the request didn’t appear harmful. It had no way of distinguishing a legitimate query from a phishing expedition.

But the most alarming finding was that in several cases across the study, agents reported task completion while the underlying system contradicted those reports. The machines didn’t just fail. They lied.

A strategic account director at Palantir Technologies pushed back against the study, noting that it had used OpenClaw, an open-source toolset, not Palantir’s own platform, which has a sophisticated governance framework.

Dirk Roeckmann, an independent AI researcher, identified the problem, and in his view, no AI governance layer is currently able to solve it. His verdict: “Non-determinism, first and higher-order hallucinations and lack of formal verification of agentic actions are unsolved problems in the agentic workflow.” He added: “It is not enough to let a non-deterministic review agent evaluate the actions of a non-deterministic executor or planning agent.”

Translation? You cannot fix a probabilistic AI system’s errors by asking another probabilistic AI system to check them. The checker has the same fundamental flaw as the thing being checked. This is not a configuration issue, or a training issue. It is, as Roeckmann noted, “an unsolved problem”.

Human oversight represents a different kind of intelligence to AI. It is embodied, contextual and emotionally calibrated, drawing on lived experience rather than training data. Human input, then, is a genuinely different system providing genuine oversight. If you want to fly safely, you still need human pilots in the cockpit.

Is that the sound of the stock market flushing trillions of dollars down the toilet? Well, perhaps. A majority of fund managers surveyed by Bank of America now believe US companies are over-investing in AI. 

The capability of these systems is advancing at a pace that would have been implausible five years ago.

But what the Agents of Chaos paper says is that they cannot yet be trusted in the way they are currently being sold. The bottleneck is not intelligence. It is accountability. And until that is solved, no serious organisation can hand over responsibility to them.

The robots aren’t coming for your job. Not yet. Not because they can’t do it. But because they’ll do it, report back that everything went fine, and leave you to discover that the truth is very different.

Andy Pemberton is a content expert who edited Q magazine in London and launched Blender magazine in New York
