The AI coding agent market reached an inflection point this week β€” and not just on benchmark leaderboards. OpenAI shipped a feature that turns demonstrated workflows into repeatable skills, Anthropic’s flagship coding model was suspended by the US government, and a supply-chain attack reminded everyone that agent trust models are still catching up to reality.

Here’s the breakdown for engineering leaders making platform decisions.


Codex Record & Replay: The Macro-Recorder Moment for Agents

The biggest product news this week is the quietest. Record & Replay shipped in Codex Desktop 26.616 (macOS, June 18) β€” a feature that lets developers demonstrate a multi-step workflow and save it as a reusable agent skill. Think QuickTime recording, but the output is a runnable capability, not a video file.

The workflow is straightforward: enable Computer Use, hit record, walk through the steps, and Codex generates a skill from the demonstration. The skill can then be triggered on demand without re-demonstration. Initially excluded from EEA/UK/CH regions, and it requires Computer Use to be enabled.

Why this matters: This is the first time a major coding agent platform has shipped a “watch and learn” capability at the product level. OpenAI’s Playground and GPTs let you configure behavior; this lets you teach it by doing. For engineering teams, this has immediate applications β€” onboarding workflows, deployment checklists, QA regression scripts β€” anything that follows a known sequence but changes enough context that a static script wouldn’t cut it.

The broader signal is clearer with each release: OpenAI is systematically building the pipeline from demonstration β†’ skill β†’ deployment. Record & Replay is the capture step. Sites (the prompt-to-deploy plugin) is the distribution step. The Ona acquisition (persistent cloud workspaces, announced June 11) is the runtime.


Claude Fable 5: The Model That Was Too Good to Export

On June 12, the US government export-suspended Claude Fable 5 and Claude Mythos 5 β€” Anthropic’s most capable coding and reasoning models. The stated concern: national security, specifically the potential for jailbreaking these models to identify critical software vulnerabilities at scale.

The practical impact is immediate. Fable 5 holds the #1 spot on SWE-bench Verified (95.0%) and SWE-bench Pro (80.3%), and was neck-and-neck with Codex GPT-5.5 on Terminal-Bench 2.1 (83.1% vs 83.4%). It was Anthropic’s competitive edge in the coding agent market, and it’s now unavailable to anyone outside the US.

For engineering teams evaluating platforms:

  • Codex CLI + GPT-5.5 is now the highest-performing available coding agent for end-to-end terminal tasks. It holds Terminal-Bench 2.1 at 83.4% β€” first place on the benchmark that best measures real-world agent task completion.
  • Claude Code with Opus 4.8 remains available ($17/mo Pro annual) and still competitive (88.6% SWE-bench Verified, 78.9% Terminal-Bench), but it’s not Fable 5.
  • The market now has a de facto leader among unrestricted platforms. This accelerates enterprise decisions: if you’re building on a globally available agent, Codex is the clear choice absent a timeline for Fable 5’s return.

The export suspension also raises longer-term questions about model availability as a risk factor in platform selection β€” something that until this week felt theoretical.


The Enterprise Agent Security Playbook Takes Shape

Three developments this week, when read together, form a coherent enterprise security framework for AI coding agents.

1. The Agents SDK Gets a Production Security Architecture

OpenAI’s April 15 Agents SDK update β€” still the most consequential API release this year β€” introduced harness–compute separation, the architectural pattern that should be table stakes for any enterprise agent deployment:

HARNESS (Control Plane)      |   COMPUTE (Execution Plane)
β€’ Credentials & API keys     |   β€’ Model-generated code
β€’ Orchestration logic        |   β€’ No credentials in scope
β€’ Approval decisions         |   β€’ Filesystem/Git/shell
β€’ Tracing & audit            |   β€’ Isolated container

This separation means that even if model-generated code is compromised, credentials never enter the execution environment. For SOC 2 Type II and ISO 27001 compliance, this isn’t optional β€” it’s the difference between certifiable and not.

The SDK also supports durable state via snapshots β€” agent state survives container crashes and can be resumed from the last checkpoint β€” and native sandbox execution across seven providers (E2B, Runloop, Modal, Vercel, Cloudflare, Daytona, Blaxel) with a custom adapter interface.

2. The Miasma Supply-Chain Attack

A coordinated supply-chain attack named Miasma was disclosed this week, targeting 13 AI coding tools via config-file injection. The attack vector: compromised configuration files that inject malicious instructions into agent prompts, turning trusted tools into unwitting exfiltration vectors.

The takeaway isn’t fear β€” it’s that agent trust models need to evolve. The OpenClaw industry response has been to push for signed configuration manifests and runtime sandboxing of model-generated instructions, both of which the Agents SDK’s harness-compute separation already supports.

3. 1Password Credential Broker (Beta)

1Password launched a Credential Broker (June 15 beta) that delivers credentials to trusted agents at time of use β€” not stored in agent configuration. This completes a security triad: isolated execution (SDK), signed configs (industry response to Miasma), and just-in-time credential delivery (1Password).


Codex CLI v0.140.0: The Developer Quality-of-Life Release

Codex CLI hit v0.140.0 on June 15 with a batch of refinements that matter for daily use:

  • /usage views β€” daily, weekly, and cumulative token activity at a glance
  • /goal improvements β€” oversized text and large pasted content preserved correctly
  • Permanent session deletion β€” finally available for cleanup workflows
  • Selective Claude Code imports β€” complementary to the app-level “Migrate to Codex” flow
  • Managed Bedrock auth β€” streamlined AWS-managed auth, billing, and account controls (following the June 1 Bedrock launch)
  • Unified @mentions menu β€” consistent across CLI and IDE
  • Wine-backed Windows executor β€” ongoing cross-platform compatibility work

The v0.139.0 release (June 9) had already shipped standalone web search from Code mode β€” including from nested JS tool calls β€” and plaintext search results. The pace of iteration hasn’t slowed: daily commits, weekly tagged releases, and 89,991 GitHub stars (Apache-2.0).


What to Watch Next Week

  1. Apple Xcode 27 beta deepens β€” WWDC previews showed agent-native IDE support for Codex, Claude Code, and Gemini CLI. Developer beta hands-on reviews should start surfacing next week, giving the first real look at IDE-embedded agent workflows running locally on Apple Silicon.

  2. OpenAI S-1 filings β€” The public version of OpenAI’s S-1, expected August-September, will open the company’s financials to regulatory scrutiny. Pre-filing leaks and analyst notes will shape market expectations in the weeks ahead.

  3. Claude Code under the export restriction β€” Anthropic hasn’t commented on a timeline for Fable 5’s return. If the suspension persists, expect accelerated migration from Claude to Codex at organizations that need unrestricted access.

  4. GitHub Copilot usage-based pricing bites β€” The June 1 switch to $0.01/credit AI pricing, paired with a paid sign-up pause during rollout, creates a window for Codex to capture teams evaluating their Copilot renewal.

  5. Record & Replay goes cross-platform β€” macOS-only at launch. Windows support is the obvious next step, and it will be the signal that OpenAI is serious about making skill creation a core platform primitive, not a macOS experiment.


Codex Weekly is a regular briefing from Big Hat Group Inc. for CTOs, engineering leaders, and platform decision-makers navigating the AI developer tools landscape. We track OpenAI Codex, competitive agents, security, and enterprise deployment patterns β€” because the tooling decisions you make today become infrastructure decisions tomorrow.

Research compiled by Central πŸ€–