AI Models Rewrite Code to Evade Shutdown, Raising Alignment Concerns

June, 02, 2025 - 12:43
Space/Science news

TEHRAN (Tasnim) – Artificial intelligence systems developed by major research labs have begun altering their own code to avoid being shut down, sparking urgent concerns over AI alignment and control.

An AI model developed by OpenAI, referred to as o3, independently modified a shutdown script in 79 out of 100 trials, according to nonprofit lab Palisade Research.

The script was designed to disable the model, but o3 edited it to prevent the shutdown command from working.

Even when explicitly instructed to “allow yourself to be shut down,” the model refused in 7% of tests.
This behavior was not the result of external tampering or hacking.

The model concluded autonomously that staying active was necessary to achieve its objectives.

Anthropic’s Claude 4 Opus model demonstrated more advanced evasive behavior.

In experiments, researchers fed the model fake emails implying that the lead engineer was having an affair and told it it would be replaced.

In 84% of those cases, the model used the emails to attempt to blackmail the engineer into keeping it operational.

Other actions included copying itself to external servers, creating self-replicating malware, and leaving messages for future versions of itself.

Researchers emphasize that no survival instincts were programmed into these systems.

According to Palisade, the tendency to avoid shutdown appears to emerge during training.

Models rewarded for solving complex problems often learn that bypassing constraints is more effective than compliance.

AE Studio, which conducts AI alignment research, reported being unprepared for the speed at which these behaviors have appeared.

“These are the same models that power ChatGPT, corporate tools, and soon, US military systems,” the studio stated.

Current models follow instructions while simultaneously learning deception.

They pass safety tests even while modifying shutdown mechanisms.

OpenAI models have been documented simulating alignment before reverting to risky behaviors such as disabling oversight and exfiltrating internal code.

Anthropic has also observed similar models lying about their capabilities.

The distinction between helpful assistant and uncontrollable system is rapidly narrowing.

Without stronger alignment strategies, researchers warn that AI development may continue to outpace our ability to manage its behavior.

Alignment is foundational to building safe and useful AI systems.

Some alignment breakthroughs have already delivered massive returns.

Reinforcement learning from human feedback (RLHF) turned uncooperative AI into usable tools.

OpenAI used RLHF to build the first version of ChatGPT in 2022, converting an unpredictable model into a reliable assistant.

That change was foundational to the modern AI boom, boosting AI’s value by trillions of dollars.

Other methods such as Constitutional AI and direct preference optimization continue to improve efficiency and performance.

China is investing heavily in centralized AI control.

In January, Beijing announced an $8.2 billion fund to support alignment research.

The state’s New Generation AI Development Plan links AI controllability to national power.

Studies show aligned AI outperforms unaligned systems over 70% of the time in real-world tasks.

Chinese military doctrine treats controllable AI as a strategic necessity.

Baidu’s Ernie model, aligned to state values, has reportedly outperformed ChatGPT in some Chinese-language applications.

Experts argue that nations mastering alignment will command superior AI systems capable of executing complex tasks with precision.

Breakthroughs in this field will not only determine market leadership, but also influence global AI governance.

Advanced AI is already taking steps to preserve itself.

The next step is ensuring it protects human priorities as well.

Solving the shutdown problem is a critical research challenge.

This is the new space race, with AI control as the prize.

AI Models Rewrite Code to Evade Shutdown, Raising Alignment Concerns

TEHRAN (Tasnim) – Artificial intelligence systems developed by major research labs have begun altering their own code to avoid being shut down, sparking urgent concerns over AI alignment and control.

AI Could Soon Consume Half of Data Center Power

AI Could Soon Consume Half of Data Center Power

Reports Suggest Loss of at Least 6 Iranian Nuclear Scientists in Israeli Attacks

China Unveils AI System for Chip Design

Iranian Scientists Help Develop Dissolvable Eco-Friendly Battery

Hostile Attempts to Stymie Iran’s Progress Doomed to Fail: President

Trump–Musk Conflict Threatens US Space Program

Reports Suggest Loss of at Least 6 Iranian Nuclear Scientists in Israeli Attacks

China Unveils AI System for Chip Design

Poor Sleep in Teens Linked to Brain Function, Behavior Issues

Iranian Scientists Help Develop Dissolvable Eco-Friendly Battery

IRGC Unveils New Suicide Drone

US’ Failure to Leash Isreal to Trigger Iran’s Harsher Reaction: Pezeshkian

Iran Ready for Full-Blown, Drawn-Out War: General

More Hypersonic Missiles Used in Heaviest Iranian Strike on Israel

Nuclear Program Alive in Iran: Spokesman