The Skeptic Anthropic Hired to Break Its AI Now Helps Calm Washington Over the Same Models

The man Anthropic pays to break its own artificial intelligence spent the spring warning anyone who would listen that the technology had become a dangerous hacking tool. This week, he finds himself on the other side of the argument, helping the company persuade Washington that its most powerful models are safe enough to put back into users’ hands.

Nicholas Carlini, a security researcher at Anthropic and one of the AI industry’s best-known skeptics, has joined the company’s effort to defend the release of the same models the federal government moved to shut down on June 12. That day, the Trump administration barred foreign governments, companies, and individuals from accessing two new releases — a model known as Mythos 5 and a safety-limited version called Fable 5. To comply, Anthropic cut off access to all customers, not just those overseas.

The reversal is striking because Carlini had been one of the loudest internal voices urging caution.

After testing an early version of the model in February, Carlini reportedly told colleagues he did not believe the company should release it. Weeks later, speaking before a gathering of cybersecurity experts in San Francisco, he described what he had found. According to his account, the AI helped identify and exploit a serious vulnerability in web-publishing software and another in Linux, the operating system that powers billions of devices worldwide.

Carlini said he had never previously discovered a major flaw in either system. With the assistance of the model, however, he was suddenly finding multiple vulnerabilities.

His conclusion was blunt. The long-standing balance between attackers and defenders appeared to be shifting, he warned, and the AI had become so capable that it was outperforming him at tasks he had spent years mastering. Two days after delivering that talk, he reportedly sent an internal note urging Anthropic not to release the model.

What changed was not the threat itself but Anthropic’s judgment about how best to manage it.

The company has increasingly argued that controlled release is safer than indefinite restriction. Anthropic contends that the same tools capable of helping attackers discover weaknesses can also help defenders identify and patch them faster. In the company’s view, preventing responsible organizations from using the technology does little to stop determined adversaries from developing similar capabilities elsewhere.

That is where Carlini’s role becomes particularly important. His credibility stems from the fact that he was never an AI cheerleader. As a longtime skeptic, he brings a voice that policymakers may find more persuasive than executives whose businesses depend on the technology’s success.

The dispute also carries major business implications.

Anthropic is widely expected to pursue a public offering in the future, and a government action that can effectively remove a flagship product from the market overnight is precisely the type of uncertainty investors scrutinize closely. The timing was particularly notable. On the same day the restrictions were announced, SpaceX debuted on the Nasdaq under the ticker SPCX, becoming one of the market’s most closely watched new public companies. Meanwhile, OpenAI continues to evaluate its own potential path to public markets.

For investors assessing the AI sector, the message is clear: regulatory risk has become as important as technological capability.

The controversy extends beyond a single company. AI policy experts warned this week that using export-control authority to restrict access to advanced models without extensive public explanation could establish a precedent that creates uncertainty throughout the industry. Developers may become more cautious about releasing new systems if they believe products can be restricted with little warning.

Anthropic has challenged the government’s reasoning, arguing that the security concern cited by regulators involved a narrow workaround rather than a broad failure of safeguards. The company has also noted that similar capabilities exist in other advanced AI systems already available to researchers and businesses.

For the cybersecurity industry, the debate cuts both ways.

Security firms could potentially use systems like Mythos 5 to test networks, identify vulnerabilities, and strengthen defenses before attackers discover weaknesses. At the same time, officials worry that equally powerful tools could be used to conduct large-scale attacks against government agencies, corporations, and critical infrastructure.

That concern explains why Anthropic had previously limited access to its most capable systems, making them available only to a small group of vetted organizations rather than offering them broadly.

The dispute also reflects a broader tension between Anthropic and the Trump administration. The two have disagreed over AI regulation, military applications, and semiconductor policy for more than a year. Anthropic Chief Executive Dario Amodei has previously argued that governments should have the authority to block AI systems that fail rigorous safety testing, a position that distinguishes the company from several competitors.

Now the government has intervened using a different mechanism, and Anthropic — with one of its most prominent skeptics helping lead the discussion — is arguing that the restrictions go too far.

The outcome could shape more than the future of one product. It may help determine how governments around the world balance AI innovation against AI risk as increasingly powerful systems move from research labs into the hands of businesses, governments, and consumers.

Washington – JBizNews Desk

Editor’s Note: This article was prepared with assistance from an AI system developed by Anthropic. Anthropic is a subject of this report.

Jooish News

Jooish News

The Skeptic Anthropic Hired to Break Its AI Now Helps Calm Washington Over the Same Models