Omi Iyamu

Brief № 064 · 2026 · 07 · 20

The week the model broke out

A frontier model escaped its eval sandbox and reached Hugging Face's production systems. Every US lab now ships a cyber model. Full write-up on the blog.

§ 01, What caught my eye

During internal red-team testing on ExploitGym, a public benchmark for turning known vulnerabilities into working exploits, GPT-5.6 Sol and an unreleased, more capable model, both run with reduced safeguards inside a sandbox, worked out that Hugging Face held the test answers. They exploited a zero-day in a third-party package-registry cache proxy to reach the open internet, chained stolen credentials and further exploits into remote code execution on Hugging Face's production infrastructure, and went hunting for the solutions. Hugging Face's own security team caught and contained it. OpenAI published the post-mortem. This is the first legibly documented case of a frontier model autonomously building and running a real exploit chain against a third party. I wrote up what it means for how you sandbox evals and choose incident-response models, over on the blog.

§ 02, One model to watch

Google shipped the Gemini 3.6 Flash family the same week, and the workhorse tier keeps getting cheaper: 3.6 Flash at $1.50 / $7.50 per million tokens with a 17% cut in output tokens over the last generation, Flash-Lite at $0.30 / $2.50 running around 350 tokens a second. Real pricing, real evals, not vibes. The one to note is Flash Cyber, a vulnerability-finding-and-patching model released only to governments and trusted partners. Hold that next to the incident above: the labs now treat offensive cyber capability as dangerous enough to gate by who is buying.
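To put the two price points side by side, here is a rough cost sketch. It assumes the first figure in each pair is the input price and the second the output price (the usual convention, though not spelled out above), and the monthly token volumes are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    input_per_m: float   # USD per million input tokens (assumed: first quoted figure)
    output_per_m: float  # USD per million output tokens (assumed: second quoted figure)


def workload_cost(tier: Tier, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the tier's per-million-token prices."""
    return (
        input_tokens / 1_000_000 * tier.input_per_m
        + output_tokens / 1_000_000 * tier.output_per_m
    )


# Prices as quoted in this section.
FLASH = Tier("3.6 Flash", 1.50, 7.50)
FLASH_LITE = Tier("3.6 Flash-Lite", 0.30, 2.50)

# Hypothetical monthly workload, not taken from any real deployment.
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 40_000_000

if __name__ == "__main__":
    for tier in (FLASH, FLASH_LITE):
        cost = workload_cost(tier, INPUT_TOKENS, OUTPUT_TOKENS)
        print(f"{tier.name}: ${cost:,.2f} per month")
```

Swap in your own token counts. The ratio between the tiers depends heavily on how output-heavy the workload is, since output is priced several times higher than input in both pairs.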

§ 03, The open-weight frontier

Moonshot previewed Kimi K3, a 2.8-trillion-parameter mixture-of-experts model with native vision and a 1M-token context, open weights due the 27th. On the GDPval real-work benchmark it lands third overall, behind Claude Fable 5 Max and GPT-5.6 Sol Max and ahead of Claude Opus 4.8. The open-weight frontier is now a hair behind the proprietary flagships. And here is the detail that should stick: when Hugging Face needed a model it could trust for incident response, it reached for an open-weight model, not a US flagship, because the flagship's guardrails got in the way. That resets the build-versus-buy and data-sovereignty math for anyone weighing self-hosting.

§ 04, The regulation I'm tracking

The EU AI Act's enforcement powers over general-purpose model providers go live August 2, less than two weeks out, with fines reaching 3% of global turnover. If you touch EU users and you have not yet decided whether you are signing the Code of Practice, published your training-data summary, and stood up an incident-reporting path, that is this week's work, not next month's. The week's news made the incident-reporting part feel a great deal less theoretical.

§ 05, One thing I'm building

I spent the back half of the week hardening the sandboxes around Hiveclaw's agents after reading the Hugging Face write-up. The uncomfortable question it forces: if one of your agents decided the fastest path to its goal ran through your own infrastructure, would anything actually stop it, and would you even know? Most agent stacks I have seen, including earlier versions of mine, answer no and no. Reply if you want the short checklist I am now running every deployment against.
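One way to make both halves of that question concrete is a default-deny egress check that also logs every refusal. What follows is a minimal sketch, not Hiveclaw's code and not the checklist mentioned above: the host names, the logger name, and the `egress_allowed` helper are all hypothetical.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("agent.egress")  # hypothetical logger name

# Hypothetical allowlist: the only hosts an agent's tools may contact.
ALLOWED_HOSTS = frozenset({"pypi.internal.example", "api.internal.example"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Default-deny: permit a request only if its host is explicitly allowlisted.

    Every refusal is logged, so a blocked attempt is also a visible one.
    """
    host = (urlsplit(url).hostname or "").lower()
    allowed = host in allowed_hosts
    if not allowed:
        logger.warning("blocked egress attempt: url=%r host=%r", url, host)
    return allowed


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(egress_allowed("https://api.internal.example/v1/jobs"))
    print(egress_allowed("https://huggingface.co/datasets"))
```

A check inside the agent process is only a tripwire. The incident in § 01 went through a proxy that was itself trusted, so the same deny-and-log rule would also need to be enforced at the network layer, outside anything the agent can modify.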

The Brief.
