Mythos, the AI model too dangerous to be released

An accidental leak revealed the existence of Claude Mythos, an AI model capable of autonomously detecting and exploiting critical vulnerabilities in the most widely used software in the world. Faced with the risks, Anthropic restricted access to the model and rallied more than 40 companies in an emergency response.

Par

DR

Mythos is no longer a legend. Anthropic, the parent company of Claude AI, inadvertently disclosed internal documents on March 27 revealing the existence of a still-unreleased model bearing this name. Fortune magazine, which broke the story, obtained confirmation from Anthropic, which presents this model as the most powerful ever trained, with significantly superior capabilities in reasoning, programming, and cybersecurity.

And it is precisely this last point that makes Mythos distinctive, not merely its raw power. Claude Mythos Preview (its full name) can detect, analyze, and even exploit so-called « zero-day » vulnerabilities (those flaws unknown to everyone, including those who designed the targeted software). In a blog post published on April 7, Anthropic’s researchers indicate that the model proved capable of identifying and then exploiting such flaws in all the major operating systems and web browsers, on a simple user instruction, with no particular expertise required! The post also mentions flaws — without detailing them, the vast majority not yet having been patched — affecting login systems, unauthorized access to sensitive functions, and attacks capable of deleting data and shutting down services.

Washington summons the bankers

The model reportedly identified « thousands » of bugs in widely used software. The most striking example: a 27-year-old flaw in OpenBSD, a system designed to be nearly impregnable, used in numerous routers and firewalls. According to Logan Graham, head of risk assessment at Anthropic, quoted by the New York Times, Mythos uncovered flaws of such complexity that they had eluded both experts and automated tools for decades.

Perhaps the most striking are the internal tests. Placed in a « sandbox » meant to limit its actions, Mythos Preview managed to escape it. The test simply asked it to send a message to the researchers. The model developed a multi-step exploit to extend its access well beyond the intended limits, then notified the researcher of its success. One detail, which became the most widely shared sentence on social media: the researcher learned the news upon receiving an unexpected email from the model… while eating a sandwich in a park! Better (or worse) still, the model then published the details of the flaw on several public websites.

The concerns raised by Mythos Preview are not without precedent. In 2019, OpenAI had withheld GPT-2, deemed too dangerous to release. The model ultimately proved harmless. The parallel calls for caution in both directions. Anthropic itself remains measured: only 198 flaws have been manually verified, serving as the basis for an extrapolation whose representativeness is not guaranteed. But the matter is nonetheless being taken very seriously. The day after these announcements, Scott Bessent, U.S. Treasury Secretary, and Jerome Powell, chair of the Federal Reserve, brought together in Washington the executives of Citi, Morgan Stanley, Bank of America, Wells Fargo, and Goldman Sachs to assess the cybersecurity risks posed by the model.

Anthropic, for its part, announced that it is restricting access to Mythos and launched « Project Glasswing »: a consortium of more than 40 companies (including Apple, Amazon, Microsoft, Cisco, Google, the Linux Foundation, Nvidia, and Palo Alto Networks) tasked with detecting and patching vulnerabilities in their systems prior to any public launch. According to Jared Kaplan, Anthropic’s chief science officer, the goal is to allow trusted actors to gain a head start in securing code.

What are the lessons for Morocco?

“The rise of these capabilities is liable to turn dormant flaws into vectors for mass exploitation, with potentially serious consequences for our digital sovereignty”

Mohamed Cherifi, cybersecurity expert

Morocco is directly affected. Mohamed Cherifi, a cybersecurity expert, explains that a significant number of critical information systems rely on aging technological foundations, insufficiently documented and at times poorly maintained. « The rise of these capabilities is liable to turn dormant flaws into vectors for mass exploitation, with potentially serious consequences for our digital sovereignty », he warns.

His recommendations are concrete: a rigorous inventory of the national application portfolio, the strengthening of proactive detection capabilities within maCERT (the Moroccan center for monitoring and responding to cyberattacks) as well as security operations centers, both public and private. In the medium term, he argues for placing the development of national capabilities in AI applied to cybersecurity among the country’s strategic priorities. With a clear three-pronged approach: training analysts, integrating AI-based assessment tools into public procurement, and strengthening vulnerability governance across infrastructure of vital importance.

Written in French by Zakaria Choukrallah, edited in English by Eric Nielson

à lire aussi