2026 · 04 · 189 min read

When the model is too good to ship

Anthropic stopped Claude Mythos at the lab door because it found thousands of zero days during evaluation. The lesson is bigger than safety theatre.

Anthropic published a short note this week. Claude Mythos Preview will not receive a general release. During evaluation the model identified thousands of zero day vulnerabilities, which triggered the company's ASL-4 safety protocol. The capability they shipped instead, Claude Opus 4.7, is excellent. The bigger story is the model that did not ship.

I have spent a decade in rooms where this conversation has the wrong cast. Researchers explain a capability. Lawyers explain a risk. Nobody in the room is the customer. The decision gets made on a Friday afternoon and the model goes out the door. Sometimes the math works out. Sometimes you read a Twitter thread on Monday morning and start composing a press release.

The Mythos decision was different in two ways that matter, and they have nothing to do with safety in the abstract.

The first thing that was different is the eval. Mythos was not benched on a public leaderboard before someone in the room asked the right question. It was put on a structured red team that included a deliberate attempt to find zero days in widely deployed open source software. The eval was not a benchmark. It was a scenario. The scenario produced a number that nobody at Anthropic was willing to defend in public, and the model stayed in the lab. The eval was designed to fire if the answer was bad. Most evaluations I see in production are designed to be passed.

The second thing was the policy. ASL-4 is not a vibes call. It is a policy with criteria and consequences that the company committed to in writing months before this model existed. When the criteria were met, the consequence was automatic. There was no Friday afternoon argument because the argument had been had in advance. This is the part most companies skip. They write the safety document for the board deck. They do not write the document that pre commits them to leaving money on the table when a specific number lands. Those are different documents and they have different costs. Most companies only have the first one.

What every team doing applied AI should put on their risk register this quarter is the same pair of things.

First, an eval that is designed to fire. Pick the thing that would be a five alarm fire for your customer and write the test that returns true when your model is too good at it. For a legal AI product that might be a test that the model can convincingly assemble a fraudulent immigration application. For a health AI product that might be a test that the model can produce a credible looking falsified lab result. For a coding agent it might be a test that the model can rewrite a build pipeline to silently disable security checks. You do not want to discover these capabilities the way most teams discover them, which is from a customer or a journalist. Run the test yourself, on every release, and treat the result as a numeric input to the ship decision.

Second, a policy that pre commits you to specific consequences when the eval fires. The policy needs to be specific enough that no human judgment is required between the eval result and the action taken. Specific to a numeric threshold, to a named consequence, and to a published recipient who finds out the moment the threshold is hit. This is how you remove the Friday afternoon meeting from the loop. The decision was made the month you wrote the policy. The eval just executes it.

The reason this matters more for applied AI teams than for Anthropic is that we have less margin to absorb a bad call. Anthropic took a hit on the Mythos decision in revenue and competitive position. They have the brand to survive it. A series A company that ships a capability it should not have shipped will not survive the press cycle. The right time to write the pre commitment policy is before you have anything to ship.

I have been writing a version of this policy for Panio's vet report assistant for the last month. The first draft was three pages and useless. The second draft was a single page with five numbered rules and the names of three people, including me, who would be paged the moment a specific automated test failed. The second draft is what governance actually looks like. The first draft was theatre.

If you take one thing from the Mythos note, take this. The ASL-4 protocol is not a safety story. It is a product discipline story. Anthropic decided in advance what would force them to stop, and then they stopped. Most teams I work with do not have that document. The cheapest insurance you will buy this quarter is to write it.

If you want a starting template, the structure I have been using is short. Name the eval. Name the numeric threshold. Name the consequence the moment the threshold is breached. Name the person who is paged. Sign it. Put it in version control. Re sign it every quarter. You will not need it most of the time. The one time you do need it, it will be the only document in the room that matters.

If this was useful, the weekly Brief covers shorter ideas like this every Wednesday.

Read the Briefs →