Long‑form pieces on AI, leadership, and the slow work of turning research into product. No set cadence, these go up when they're ready.
Anthropic shipped Claude Fable 5, the first publicly available Mythos-class model, paired with a restricted twin called Mythos 5 deployed via Project Glasswing with the US government. Fable 5 ships with a runtime classifier that routes prompts in cybersecurity, biology/chemistry,
Anthropic published a position piece signed by Marina Favaro and Jack Clark arguing that recursive self-improvement is closer than the industry openly discusses, and proposing a verifiable coordinated pause as a tool frontier labs should be willing to deploy. The most-cited numbe
Anthropic expanded Project Glasswing to about 150 additional organizations across more than 15 countries on June 2, widening access to Claude Mythos Preview — its most capable model, held in controlled research preview because of offensive cyber capabilities. New sectors include
Anthropic released Claude Opus 4.8 on May 28, 41 days after Opus 4.7, at the same $5/$25 per million tokens. SWE-bench Verified moved to 88.6, SWE-bench Pro to 69.2, and GDPval-AA to 1890. Fast mode runs at 2.5x output speed and is 3x cheaper than the prior Opus fast tier. The re
Alibaba released Qwen3.7-Max as an agent-first model with a 1M-token context, native support for the Anthropic API protocol (so it drops into a Claude Code harness), and benchmark wins including 92.4 on GPQA Diamond, 41.4 on HLE, and $2.08M of simulated revenue in YC-Bench. It is
Anthropic stopped Claude Mythos at the lab door because it found thousands of zero days during evaluation. The lesson is bigger than safety theatre.
GPT-5.4 scored above the human baseline on OSWorld-V this quarter. The 12 week response for SaaS founders looks the same as the playbook from 2009.
Why most 'AI breakthroughs' never ship, and the 12-week playbook I used at Google Brain to move them from paper to production.
Resumes, demos, and model evals are all lagging indicators. Here's what I screen for instead.
RAG is a technique. If your 'AI strategy' is a vector database, you don't have one.
What I learned authoring Google's company-wide AI/ML privacy framework, and how I'd rewrite it for 2026.