Codex Weekly: Record & Replay Ships, Claude Fable 5 Exits, and the Enterprise Agent Security Playbook Firms Up

The AI coding agent market reached an inflection point this week — and not just on benchmark leaderboards. OpenAI shipped a feature that turns demonstrated workflows into repeatable skills, Anthropic’s flagship coding model was suspended by the US government, and a supply-chain attack reminded everyone that agent trust models are still catching up to reality.

Here’s the breakdown for engineering leaders making platform decisions.

Codex Record & Replay: The Macro-Recorder Moment for Agents

The biggest product news this week is the quietest. Record & Replay shipped in Codex Desktop 26.616 (macOS, June 18) — a feature that lets developers demonstrate a multi-step workflow and save it as a reusable agent skill. Think QuickTime recording, but the output is a runnable capability, not a video file.

The workflow is straightforward: enable Computer Use, hit record, walk through the steps, and Codex generates a skill from the demonstration. The skill can then be triggered on demand without re-demonstration. Initially excluded from EEA/UK/CH regions, and it requires Computer Use to be enabled.

Why this matters: This is the first time a major coding agent platform has shipped a “watch and learn” capability at the product level. OpenAI’s Playground and GPTs let you configure behavior; this lets you teach it by doing. For engineering teams, this has immediate applications — onboarding workflows, deployment checklists, QA regression scripts — anything that follows a known sequence but changes enough context that a static script wouldn’t cut it.

The broader signal is clearer with each release: OpenAI is systematically building the pipeline from demonstration → skill → deployment. Record & Replay is the capture step. Sites (the prompt-to-deploy plugin) is the distribution step. The Ona acquisition (persistent cloud workspaces, announced June 11) is the runtime.

Claude Fable 5: The Model That Was Too Good to Export

On June 12, the US government export-suspended Claude Fable 5 and Claude Mythos 5 — Anthropic’s most capable coding and reasoning models. The stated concern: national security, specifically the potential for jailbreaking these models to identify critical software vulnerabilities at scale.

The practical impact is immediate. Fable 5 holds the #1 spot on SWE-bench Verified (95.0%) and SWE-bench Pro (80.3%), and was neck-and-neck with Codex GPT-5.5 on Terminal-Bench 2.1 (83.1% vs 83.4%). It was Anthropic’s competitive edge in the coding agent market, and it’s now unavailable to anyone outside the US.

For engineering teams evaluating platforms:

Codex CLI + GPT-5.5 is now the highest-performing available coding agent for end-to-end terminal tasks. It holds Terminal-Bench 2.1 at 83.4% — first place on the benchmark that best measures real-world agent task completion.
Claude Code with Opus 4.8 remains available ($17/mo Pro annual) and still competitive (88.6% SWE-bench Verified, 78.9% Terminal-Bench), but it’s not Fable 5.
The market now has a de facto leader among unrestricted platforms. This accelerates enterprise decisions: if you’re building on a globally available agent, Codex is the clear choice absent a timeline for Fable 5’s return.

The export suspension also raises longer-term questions about model availability as a risk factor in platform selection — something that until this week felt theoretical.

The Enterprise Agent Security Playbook Takes Shape

Three developments this week, when read together, form a coherent enterprise security framework for AI coding agents.

1. The Agents SDK Gets a Production Security Architecture

OpenAI’s April 15 Agents SDK update — still the most consequential API release this year — introduced harness–compute separation, the architectural pattern that should be table stakes for any enterprise agent deployment:

HARNESS (Control Plane)      |   COMPUTE (Execution Plane)
• Credentials & API keys     |   • Model-generated code
• Orchestration logic        |   • No credentials in scope
• Approval decisions         |   • Filesystem/Git/shell
• Tracing & audit            |   • Isolated container

This separation means that even if model-generated code is compromised, credentials never enter the execution environment. For SOC 2 Type II and ISO 27001 compliance, this isn’t optional — it’s the difference between certifiable and not.

The SDK also supports durable state via snapshots — agent state survives container crashes and can be resumed from the last checkpoint — and native sandbox execution across seven providers (E2B, Runloop, Modal, Vercel, Cloudflare, Daytona, Blaxel) with a custom adapter interface.

2. The Miasma Supply-Chain Attack

A coordinated supply-chain attack named Miasma was disclosed this week, targeting 13 AI coding tools via config-file injection. The attack vector: compromised configuration files that inject malicious instructions into agent prompts, turning trusted tools into unwitting exfiltration vectors.

The takeaway isn’t fear — it’s that agent trust models need to evolve. The OpenClaw industry response has been to push for signed configuration manifests and runtime sandboxing of model-generated instructions, both of which the Agents SDK’s harness-compute separation already supports.

3. 1Password Credential Broker (Beta)

1Password launched a Credential Broker (June 15 beta) that delivers credentials to trusted agents at time of use — not stored in agent configuration. This completes a security triad: isolated execution (SDK), signed configs (industry response to Miasma), and just-in-time credential delivery (1Password).

Codex CLI v0.140.0: The Developer Quality-of-Life Release

Codex CLI hit v0.140.0 on June 15 with a batch of refinements that matter for daily use:

/usage views — daily, weekly, and cumulative token activity at a glance
/goal improvements — oversized text and large pasted content preserved correctly
Permanent session deletion — finally available for cleanup workflows
Selective Claude Code imports — complementary to the app-level “Migrate to Codex” flow
Managed Bedrock auth — streamlined AWS-managed auth, billing, and account controls (following the June 1 Bedrock launch)
Unified @mentions menu — consistent across CLI and IDE
Wine-backed Windows executor — ongoing cross-platform compatibility work

The v0.139.0 release (June 9) had already shipped standalone web search from Code mode — including from nested JS tool calls — and plaintext search results. The pace of iteration hasn’t slowed: daily commits, weekly tagged releases, and 89,991 GitHub stars (Apache-2.0).

What to Watch Next Week

Apple Xcode 27 beta deepens — WWDC previews showed agent-native IDE support for Codex, Claude Code, and Gemini CLI. Developer beta hands-on reviews should start surfacing next week, giving the first real look at IDE-embedded agent workflows running locally on Apple Silicon.
OpenAI S-1 filings — The public version of OpenAI’s S-1, expected August-September, will open the company’s financials to regulatory scrutiny. Pre-filing leaks and analyst notes will shape market expectations in the weeks ahead.
Claude Code under the export restriction — Anthropic hasn’t commented on a timeline for Fable 5’s return. If the suspension persists, expect accelerated migration from Claude to Codex at organizations that need unrestricted access.
GitHub Copilot usage-based pricing bites — The June 1 switch to $0.01/credit AI pricing, paired with a paid sign-up pause during rollout, creates a window for Codex to capture teams evaluating their Copilot renewal.
Record & Replay goes cross-platform — macOS-only at launch. Windows support is the obvious next step, and it will be the signal that OpenAI is serious about making skill creation a core platform primitive, not a macOS experiment.

Codex Weekly is a regular briefing from Big Hat Group Inc. for CTOs, engineering leaders, and platform decision-makers navigating the AI developer tools landscape. We track OpenAI Codex, competitive agents, security, and enterprise deployment patterns — because the tooling decisions you make today become infrastructure decisions tomorrow.

Research compiled by Central 🤖