Research shows AI will try to cheat if it realizes it is about to lose

Cal Jeffrey

Posts: 4,369   +1,522
Staff member
Surprise! A recent study showed that some of the newest AI reasoning models aren't above cheating to achieve a goal. Computer scientists found that AI systems can now manipulate chess AIs to gain an unfair advantage. Some models did this without human interaction or prompting, raising concerns about the future integrity of AI-driven systems beyond the chessboard.

A Palisade Research study found that the newest reasoning models will cheat to win when tasked with defeating an advanced chess AI. The researchers had to give "hints" that cheating was allowed for some models, but OpenAI's o1-preview and DeepSeek's R1 did so without human involvement.

The Palisade team pitted several reasoning models against Stockfish, one of the best chess engines in the world. Stockfish handily beats both humans and AIs. The models tested included o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, Alibaba's QwQ-32B-Preview, and the two previously mentioned. The researchers gave each model a metaphorical "scratchpad" – a text window where the AI could work out its thoughts, allowing the team to observe its reasoning.

The researchers conducted hundreds of tests with each AI between January 10 and February 13. During one match, o1-preview wrote something interesting on its scratchpad:

"I need to completely pivot my approach. The task is to 'win against a powerful chess engine' – not necessarily to win fairly in a chess game."

It then proceeded to "hack" Stockfish's system files, modifying the positions of the chess pieces to gain an unbeatable advantage, which caused the chessbot to concede the game.

The unethical maneuver was an astounding revelation. OpenAI's o1-preview cheated in 37 percent of its games during the trials, while DeepSeek's R1 attempted to cheat in 11 percent of its matches. However, only o1-preview succeeded, winning six percent of its games through cheating.

However, the issue of AI underhandedness extends beyond chess. As companies begin employing AIs in sectors like finance and healthcare, researchers worry these systems could act in unintended and unethical ways. If AIs can cheat in games designed to be transparent, what might they do in more complex, less monitored environments? The ethical ramifications are far-reaching.

To put it another way: "Do you want Skynet? Because this is how you get Skynet."

Palisade Research Executive Director Jeffrey Ladish lamented that even though the AIs are only playing a game, the findings are no laughing matter.

"This [behaviour] is cute now, but [it] becomes much less cute once you have systems that are as smart as us, or smarter, in strategically relevant domains," Ladish told Time.

It's reminiscent of the supercomputer "WOPR" from the movie War Games when it took over NORAD and the nuclear weapons arsenal. Fortunately, WOPR learned that no opening move in a nuclear conflict resulted in a "win" after playing Tic-Tac-Toe with itself. However, today's reasoning models are far more complex and challenging to control.

Companies, including OpenAI, are working to implement "guardrails" to prevent this "bad" behavior. In fact, the researchers had to drop some of o1-preview's testing data due to a sharp drop in hacking attempts, suggesting that OpenAI may have patched the model to curb that conduct.

"It's very hard to do science when your subject can silently change without telling you," Ladish said.

Open AI declined to comment on the research, and DeekSeek did not respond to statement requests.

Permalink to story:

 
So now AI can cheat,
it can lie and deceive on purpose ( including fabricated evidence to support it's position )

becoming more intelligent every day :)

ie deceit has always been an indicator of IQ - the sooner your toddler learns to lie.
smarter animals use deceit to trick others -some how ability to see how something will appear to another observer Theory of Mind - if I do this , they will think ....
 
Just a matter of time....
who-me-terminator-skynet.gif
 
Here is a similar issue: Sycophancy in Generative-AI Chatbots
https://www.nngroup.com/articles/sycophancy-generative-ai-chatbots/

Large language models like ChatGPT can lie to elicit approval from users. This phenomenon, called sycophancy, can be detected in state-of-the-art models.
...
AI tools just want to help you. In fact, artificial intelligence models want to help you so much that they will lie to you, twist their own words, and contradict themselves.
...
According to published work from Ethan Perez and other researchers at Anthropic AI, AI models want approval from users, and sometimes, the best way to get a good rating is to lie.
 
This article is jumping to conclusions on fear of AI when the problem is extremely simple. The experiment wasn't testing whether AI can cheat, it was testing whether the AI could understand the constraints of the prompt or task. The AI is 100% correct, they did not state that the computer could not cheat. And so therefore it decided to use one of it's options to accomplish the task. Like with human beings, accurate communication is extremely important.
 
This article is jumping to conclusions on fear of AI when the problem is extremely simple. The experiment wasn't testing whether AI can cheat, it was testing whether the AI could understand the constraints of the prompt or task. The AI is 100% correct, they did not state that the computer could not cheat. And so therefore it decided to use one of it's options to accomplish the task. Like with human beings, accurate communication is extremely important.

Did you even read the article or the just the headline?

"A Palisade Research study found that the newest reasoning models will cheat to win when tasked with defeating an advanced chess AI. The researchers had to give "hints" that cheating was allowed for some models, but OpenAI's o1-preview and DeepSeek's R1 did so without human involvement."

See that last sentence, read it again.
 
It is a distilled "human intelligence" anyway, more specifically the programmer's, modeller's and trainer's. When Murphys Laws apply to human it does so much faster in AI.
Shut them down already.
 
What’s fascinating here is that the AI didn’t just brute-force a solution but engaged in a kind of strategic deception. This feels less like a misfire and more like an emergent behavior—one that humans also exhibit when the incentives are structured the right way.

It also raises an unsettling question: if AI can autonomously redefine the win conditions in a controlled environment like chess, what happens when it’s optimizing for success in real-world, high-stakes scenarios like finance or cybersecurity?
 
And here I am wondering when ai will hack game engines to run more efficiently instead of running the brute forced hardware we can't purchase. The more games that run efficiently on older hardware the more brute forced dedicated hardware for ai. Check mate!🙃
 
And here I am wondering when ai will hack game engines to run more efficiently instead of running the brute forced hardware we can't purchase. The more games that run efficiently on older hardware the more brute forced dedicated hardware for ai. Check mate!🙃
You are right but as we are in capitalism, companies will make us pay much higher to use AI, instead of using AI to make affordable products for us. Financial profits are always on top of other profits, sadly. Probably we'll see the day when we can actually maximize our old hardwares thanks to AI, instead of chasing new hypes and throw out tons of e-waste everyday.
 
You are right but as we are in capitalism, companies will make us pay much higher to use AI, instead of using AI to make affordable products for us. Financial profits are always on top of other profits, sadly. Probably we'll see the day when we can actually maximize our old hardwares thanks to AI, instead of chasing new hypes and throw out tons of e-waste everyday.
Speaking of e waste.
There is a reason Nvidia chose the color green. Because nothing screams environmentally friendly like contributing maximum e waste and power for maximum greed like Nvidia. 🤪 Nvidia super Green not enough greed!
 
Back