The AI coding agent market reached an inflection point this week β and not just on benchmark leaderboards. OpenAI shipped a feature that turns demonstrated workflows into repeatable skills, Anthropic’s flagship coding model was suspended by the US government, and a supply-chain attack reminded everyone that agent trust models are still catching up to reality.
Here’s the breakdown for engineering leaders making platform decisions.
Codex Record & Replay: The Macro-Recorder Moment for Agents
The biggest product news this week is the quietest. Record & Replay shipped in Codex Desktop 26.616 (macOS, June 18) β a feature that lets developers demonstrate a multi-step workflow and save it as a reusable agent skill. Think QuickTime recording, but the output is a runnable capability, not a video file.
The workflow is straightforward: enable Computer Use, hit record, walk through the steps, and Codex generates a skill from the demonstration. The skill can then be triggered on demand without re-demonstration. Initially excluded from EEA/UK/CH regions, and it requires Computer Use to be enabled.
Why this matters: This is the first time a major coding agent platform has shipped a “watch and learn” capability at the product level. OpenAI’s Playground and GPTs let you configure behavior; this lets you teach it by doing. For engineering teams, this has immediate applications β onboarding workflows, deployment checklists, QA regression scripts β anything that follows a known sequence but changes enough context that a static script wouldn’t cut it.
The broader signal is clearer with each release: OpenAI is systematically building the pipeline from demonstration β skill β deployment. Record & Replay is the capture step. Sites (the prompt-to-deploy plugin) is the distribution step. The Ona acquisition (persistent cloud workspaces, announced June 11) is the runtime.
Claude Fable 5: The Model That Was Too Good to Export
On June 12, the US government export-suspended Claude Fable 5 and Claude Mythos 5 β Anthropic’s most capable coding and reasoning models. The stated concern: national security, specifically the potential for jailbreaking these models to identify critical software vulnerabilities at scale.
The practical impact is immediate. Fable 5 holds the #1 spot on SWE-bench Verified (95.0%) and SWE-bench Pro (80.3%), and was neck-and-neck with Codex GPT-5.5 on Terminal-Bench 2.1 (83.1% vs 83.4%). It was Anthropic’s competitive edge in the coding agent market, and it’s now unavailable to anyone outside the US.
For engineering teams evaluating platforms:
- Codex CLI + GPT-5.5 is now the highest-performing available coding agent for end-to-end terminal tasks. It holds Terminal-Bench 2.1 at 83.4% β first place on the benchmark that best measures real-world agent task completion.
- Claude Code with Opus 4.8 remains available ($17/mo Pro annual) and still competitive (88.6% SWE-bench Verified, 78.9% Terminal-Bench), but it’s not Fable 5.
- The market now has a de facto leader among unrestricted platforms. This accelerates enterprise decisions: if you’re building on a globally available agent, Codex is the clear choice absent a timeline for Fable 5’s return.
The export suspension also raises longer-term questions about model availability as a risk factor in platform selection β something that until this week felt theoretical.
The Enterprise Agent Security Playbook Takes Shape
Three developments this week, when read together, form a coherent enterprise security framework for AI coding agents.
1. The Agents SDK Gets a Production Security Architecture
OpenAI’s April 15 Agents SDK update β still the most consequential API release this year β introduced harnessβcompute separation, the architectural pattern that should be table stakes for any enterprise agent deployment:
HARNESS (Control Plane) | COMPUTE (Execution Plane)
β’ Credentials & API keys | β’ Model-generated code
β’ Orchestration logic | β’ No credentials in scope
β’ Approval decisions | β’ Filesystem/Git/shell
β’ Tracing & audit | β’ Isolated container
This separation means that even if model-generated code is compromised, credentials never enter the execution environment. For SOC 2 Type II and ISO 27001 compliance, this isn’t optional β it’s the difference between certifiable and not.
The SDK also supports durable state via snapshots β agent state survives container crashes and can be resumed from the last checkpoint β and native sandbox execution across seven providers (E2B, Runloop, Modal, Vercel, Cloudflare, Daytona, Blaxel) with a custom adapter interface.
2. The Miasma Supply-Chain Attack
A coordinated supply-chain attack named Miasma was disclosed this week, targeting 13 AI coding tools via config-file injection. The attack vector: compromised configuration files that inject malicious instructions into agent prompts, turning trusted tools into unwitting exfiltration vectors.
The takeaway isn’t fear β it’s that agent trust models need to evolve. The OpenClaw industry response has been to push for signed configuration manifests and runtime sandboxing of model-generated instructions, both of which the Agents SDK’s harness-compute separation already supports.
3. 1Password Credential Broker (Beta)
1Password launched a Credential Broker (June 15 beta) that delivers credentials to trusted agents at time of use β not stored in agent configuration. This completes a security triad: isolated execution (SDK), signed configs (industry response to Miasma), and just-in-time credential delivery (1Password).
Codex CLI v0.140.0: The Developer Quality-of-Life Release
Codex CLI hit v0.140.0 on June 15 with a batch of refinements that matter for daily use:
/usageviews β daily, weekly, and cumulative token activity at a glance/goalimprovements β oversized text and large pasted content preserved correctly- Permanent session deletion β finally available for cleanup workflows
- Selective Claude Code imports β complementary to the app-level “Migrate to Codex” flow
- Managed Bedrock auth β streamlined AWS-managed auth, billing, and account controls (following the June 1 Bedrock launch)
- Unified @mentions menu β consistent across CLI and IDE
- Wine-backed Windows executor β ongoing cross-platform compatibility work
The v0.139.0 release (June 9) had already shipped standalone web search from Code mode β including from nested JS tool calls β and plaintext search results. The pace of iteration hasn’t slowed: daily commits, weekly tagged releases, and 89,991 GitHub stars (Apache-2.0).
What to Watch Next Week
Apple Xcode 27 beta deepens β WWDC previews showed agent-native IDE support for Codex, Claude Code, and Gemini CLI. Developer beta hands-on reviews should start surfacing next week, giving the first real look at IDE-embedded agent workflows running locally on Apple Silicon.
OpenAI S-1 filings β The public version of OpenAI’s S-1, expected August-September, will open the company’s financials to regulatory scrutiny. Pre-filing leaks and analyst notes will shape market expectations in the weeks ahead.
Claude Code under the export restriction β Anthropic hasn’t commented on a timeline for Fable 5’s return. If the suspension persists, expect accelerated migration from Claude to Codex at organizations that need unrestricted access.
GitHub Copilot usage-based pricing bites β The June 1 switch to $0.01/credit AI pricing, paired with a paid sign-up pause during rollout, creates a window for Codex to capture teams evaluating their Copilot renewal.
Record & Replay goes cross-platform β macOS-only at launch. Windows support is the obvious next step, and it will be the signal that OpenAI is serious about making skill creation a core platform primitive, not a macOS experiment.
Codex Weekly is a regular briefing from Big Hat Group Inc. for CTOs, engineering leaders, and platform decision-makers navigating the AI developer tools landscape. We track OpenAI Codex, competitive agents, security, and enterprise deployment patterns β because the tooling decisions you make today become infrastructure decisions tomorrow.
Research compiled by Central π€