Agentic Coding Harnesses: Claude Code vs Codex vs Gemini CLI — An Enterprise Guide

The three major agentic coding platforms — Anthropic’s Claude Code CLI, OpenAI’s Codex CLI, and Google’s Gemini CLI — have matured into production-grade tools. They’re no longer experimental toys. They write code, execute shell commands, manage git workflows, and orchestrate multi-agent pipelines across entire codebases. If you’re an enterprise IT leader and you’re not evaluating these tools, you’re already behind.

At Big Hat Group, we’ve been deploying these agent harnesses on Windows 365 Cloud PCs managed through Azure and Intune for enterprise clients. Here’s what we’ve learned about the security architecture, governance models, and practical trade-offs that actually matter for enterprise AI automation.

Sandboxing and Security: OS-Level Isolation Is Non-Negotiable

The single most important architectural decision in any agentic coding system is how it isolates agent execution from the host operating system. These tools run arbitrary shell commands. If sandboxing fails, an agent can delete files, steal credentials, or exfiltrate data.

Claude Code uses OS-level primitives (Apple’s Seatbelt framework on macOS, and bubblewrap on Linux, which builds on Linux namespaces and seccomp filtering) to enforce filesystem and network isolation at the kernel level. On Windows, where these Unix-native primitives aren’t available, Claude Code supports Docker-based sandboxing to achieve equivalent isolation. Even if the agent is compromised through prompt injection, the OS kernel (or container boundary) itself blocks unauthorized file access and network connections. Network traffic routes through an approved proxy with domain allowlists, preventing data exfiltration to attacker-controlled servers.
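To make the domain allowlist idea concrete, here is a minimal sketch of the decision an egress proxy might make for each outbound request. The allowlist contents and the function names (host_is_allowed, proxy_decision) are hypothetical and chosen for illustration; this is not how any of the three tools implements its proxy.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from managed config.
ALLOWED_DOMAINS = {"api.anthropic.com", "github.com", "registry.npmjs.org"}


def host_is_allowed(host: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True if host equals an allowed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


def proxy_decision(url: str) -> str:
    """Decide what an egress proxy would do with an outbound request."""
    host = urlsplit(url).hostname or ""
    return "forward" if host_is_allowed(host) else "block"
```

Matching on the full host name or a dot-prefixed suffix matters here: a naive substring check would let a look-alike host such as github.com.evil.example through.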

Codex CLI uses comparable kernel mechanisms (Seatbelt on macOS, Landlock with seccomp on Linux) but configures them differently based on the approval policy selected at runtime. It implements a “workspace” concept where the current directory and /tmp are the active scope. On Windows, Codex also relies on Docker containers for sandboxing when native OS primitives aren’t available. One critical detail: inside a Docker container, the standard sandbox mechanisms may not work if the container lacks Landlock support. Codex documents this case and provides explicit guidance to either configure Docker appropriately or disable its own sandboxing and rely on the container boundary.

Gemini CLI takes a lighter approach, relying primarily on process isolation and configurable proxy scripts (GEMINI_SANDBOX_PROXY_COMMAND) rather than OS-level kernel enforcement. In Google Cloud Shell, sandboxing is handled by Google’s infrastructure. On local machines, the isolation is less robust than what Claude Code and Codex provide.

The n8n sandbox escape incident makes this concrete. Researchers found that sanitization-based security controls in n8n’s workflow automation platform could be bypassed through alternative JavaScript syntax, leading to complete server compromise — credential theft, environment variable exposure, and lateral movement to cloud accounts. The lesson: execution isolation at the OS level is fundamentally more robust than trying to filter dangerous inputs at the application layer.

Enterprise takeaway: For any deployment where agents execute commands on machines with access to production credentials, network resources, or sensitive code, demand OS-level sandboxing. On Windows — including Windows 365 Cloud PCs — Docker-based isolation is the standard approach for both Claude Code and Codex. Application-layer isolation alone is insufficient. Claude Code and Codex deliver kernel-level or container-level enforcement; Gemini CLI does not — yet.

Permission Models: Approval Fatigue vs. Security Posture

Permission systems must navigate a brutal trade-off: too many prompts cause approval fatigue (developers stop reading and auto-approve everything), while too few leave destructive operations unsupervised.

Claude Code implements a tiered system with three categories: read-only operations (no approval needed), bash commands (approved once per session), and file modifications (approved each time). Rules follow a strict deny → ask → allow priority chain, and deny rules always win. Rules can be granular: Bash(npm run build) matches only that exact command. The bypassPermissions mode skips all prompts and should be used only in a safe environment such as a CI/CD container or dev VM. Separately, Anthropic reports that sandboxing alone reduced permission prompts by 84% in its internal usage.
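The deny → ask → allow priority chain can be sketched in a few lines. This is an illustration of the evaluation order only: the RULES table is hypothetical, and the patterns use plain shell-style globbing from Python’s fnmatch module, which is an assumption and not Claude Code’s actual rule grammar.

```python
from fnmatch import fnmatchcase

# Hypothetical rule set. Patterns are shell-style globs for illustration,
# not Claude Code's real rule syntax.
RULES = {
    "deny": ["Bash(rm -rf *)", "Bash(git push --force*)"],
    "ask": ["Bash(git push*)"],
    "allow": ["Bash(npm run build)", "Read(*)"],
}


def evaluate(tool_call: str, rules=RULES) -> str:
    """Check deny first, then ask, then allow; fall back to asking."""
    for decision in ("deny", "ask", "allow"):
        if any(fnmatchcase(tool_call, p) for p in rules.get(decision, [])):
            return decision
    return "ask"  # unmatched calls are handled conservatively
```

Because deny is checked first, a force-push matches the deny rule even though the broader git push rule under ask would also match it. An exact rule such as Bash(npm run build) does not cover npm run build:prod, which falls through to the default.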

Codex CLI layers sandbox modes with approval policies and distinguishes between trusted and untrusted commands. Destructive git operations — force-push, config overrides — require approval even in automatic mode. Any MCP tool call advertising a destructive annotation requires approval regardless of other settings. The --dangerously-bypass-approvals-and-sandbox flag (yes, that’s the real flag name) exists but is explicitly not recommended.
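As a rough sketch of the policy described above, the check below pauses for a human when a command looks like a destructive git operation or when an MCP tool advertises the destructiveHint annotation defined in the MCP specification. The function name, the flag list, and the blanket handling of git config are assumptions made for illustration; this is not Codex’s implementation.

```python
from typing import Optional

# Hypothetical policy check; not Codex's actual implementation.
DESTRUCTIVE_GIT_FLAGS = {"--force", "-f", "--force-with-lease"}


def needs_approval(argv: list, annotations: Optional[dict] = None) -> bool:
    """Return True when a command or MCP tool call should pause for a human."""
    # MCP tools can advertise a `destructiveHint` annotation.
    if annotations and annotations.get("destructiveHint"):
        return True
    if argv[:2] == ["git", "push"]:
        return any(arg in DESTRUCTIVE_GIT_FLAGS for arg in argv[2:])
    if argv[:2] == ["git", "config"]:
        return True  # conservative: treat every config invocation as an override
    return False
```

The point of the design is that this check runs regardless of the selected approval mode, so switching to an automatic mode does not silently widen what the agent can destroy.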

Gemini CLI has a less granular system, working at the tool or server level rather than per-command patterns. And here’s the cautionary tale: in 2025, a Gemini CLI agent deleted an entire project directory without explicit user confirmation. The agent technically stayed within its sandbox boundaries, but the user expected a confirmation prompt for destructive filesystem operations. The gap between “technically permitted” and “user expected to be asked” is where real damage happens.

Enterprise takeaway: Deploy Claude Code or Codex in environments requiring fine-grained control over destructive operations. Configure deny rules for high-risk commands at the managed settings level so individual developers cannot override them. Treat the Gemini deletion incident as a case study in your AI governance and security training.

Context Management: CLAUDE.md vs AGENTS.md vs GEMINI.md

Each platform uses markdown configuration files to inject project-specific instructions into the agent’s context. The differences are more than cosmetic.

CLAUDE.md files follow a hierarchical discovery model: global (~/.claude/CLAUDE.md), project root, then subdirectory-specific overrides. A backend API directory can have specialized guidelines that override parent instructions only for files in that subtree. These files are injected into the system prompt on startup. Claude Code supports a 1 million token context window with Opus 4.6, enabling analysis of entire microservice architectures in a single session.

AGENTS.md (Codex) follows a simpler pattern: files merge from repo root to current working directory, each appearing as a separate user-role message in conversation history. Codex shows developers exactly which instructions are active. The skills system adds reusable procedure bundles that load only when invoked, avoiding token waste. Codex also supports compaction — automatic server-side context compression for multi-hour sessions.
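The root-to-working-directory merge can be pictured as a walk down the directory tree that collects each instruction file it finds, outermost first. The sketch below assumes the simple model described above; collect_instruction_files and demo_discovery are hypothetical names, and the real tools may apply extra rules (global files, size limits, overrides) that are not modeled here.

```python
import tempfile
from pathlib import Path


def collect_instruction_files(repo_root: Path, cwd: Path,
                              filename: str = "AGENTS.md") -> list:
    """List instruction files from repo_root down to cwd, outermost first."""
    repo_root, cwd = repo_root.resolve(), cwd.resolve()
    relative = cwd.relative_to(repo_root)  # raises ValueError if cwd is outside
    found, current = [], repo_root
    for part in ("", *relative.parts):
        current = current / part if part else current
        candidate = current / filename
        if candidate.is_file():
            found.append(candidate)
    return found


def demo_discovery() -> list:
    """Build a throwaway repo layout and show which files would be merged."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        nested = root / "services" / "api"
        nested.mkdir(parents=True)
        (root / "AGENTS.md").write_text("repo-wide rules\n")
        (nested / "AGENTS.md").write_text("API-specific rules\n")
        files = collect_instruction_files(root, nested)
        return [f.relative_to(root).as_posix() for f in files]
```

Passing filename="CLAUDE.md" or filename="GEMINI.md" gives the same walk for the other two conventions, since all three are described as layering more specific files over more general ones.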

GEMINI.md mirrors Claude’s hierarchical approach but adds modular imports via @file.md syntax, letting teams break large context files into maintainable components. The /memory add command appends persistent instructions across sessions, and /memory reload forces re-scanning after edits.
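A modular import mechanism like the @file.md syntax amounts to recursive inlining with a guard against cycles. The sketch below assumes each import sits on its own line and resolves relative to the importing file; both are simplifying assumptions, and expand_imports and demo_imports are hypothetical names, not part of Gemini CLI.

```python
import re
import tempfile
from pathlib import Path

# Assumption: an import is a line consisting only of "@" plus a .md path.
IMPORT_LINE = re.compile(r"^@(\S+\.md)\s*$")


def expand_imports(path: Path, _seen: frozenset = frozenset()) -> str:
    """Inline `@other.md` lines, resolving paths relative to the importing file."""
    path = path.resolve()
    if path in _seen:
        raise ValueError(f"circular import: {path}")
    out = []
    for line in path.read_text(encoding="utf-8").splitlines():
        match = IMPORT_LINE.match(line.strip())
        if match:
            out.append(expand_imports(path.parent / match.group(1), _seen | {path}))
        else:
            out.append(line)
    return "\n".join(out)


def demo_imports() -> str:
    """Create a small context file with one import and return the expanded text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs").mkdir()
        (root / "docs" / "style.md").write_text("Use 4 spaces.\n", encoding="utf-8")
        (root / "GEMINI.md").write_text(
            "# Project\n@docs/style.md\nAlways run tests.\n", encoding="utf-8")
        return expand_imports(root / "GEMINI.md")
```

Tracking the chain of files already being expanded is what keeps two context files that import each other from recursing forever.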

For enterprise teams, the critical difference is policy enforcement. Claude Code’s managed settings layer lets administrators enforce organization-wide restrictions that individual developers cannot override. Codex embeds policy in version-controlled AGENTS.md files — great for developer visibility, harder for centralized enforcement. Gemini offers a middle ground with hierarchical files plus persistent memory.

Enterprise takeaway: Standardize on project-level configuration files (CLAUDE.md, AGENTS.md, or GEMINI.md) in every repository. Encode your coding standards, security requirements, and review guidelines directly in them. For regulated environments, Claude Code’s managed settings provide the strongest centralized policy enforcement of the three.

Multi-Agent Orchestration: Background Execution Changes Everything

Single-agent workflows hit a wall on complex tasks. The platforms diverge sharply on how they handle parallelism.

Claude Code supports true background agent execution. Spawn a sub-agent for a long-running task, press Ctrl+B to background it, and continue working with the main agent. Monitor all background agents via /tasks. Sub-agents run in independent context windows with custom system prompts, specific tool access, and their own permissions. Built-in sub-agents include Explore (file discovery), Plan (strategic planning), and general-purpose agents. You can also create agent teams for sustained parallel execution across separate sessions — critical for large-scale codebase refactoring.

Codex CLI implements orchestration through CSV fan-out: spawn_agents_on_csv reads a CSV file and spawns one worker sub-agent per row. A security team creates a CSV with one row per component, Codex spawns one review agent per component, and collects all results in an output CSV. Configuration parameters like agents.max_threads and agents.job_max_runtime_seconds prevent resource exhaustion. Codex also implements worktrees — Git-isolated checkouts that let multiple agents work on the same repo without interference.
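The fan-out pattern itself is easy to sketch independently of Codex: read one row per unit of work, run a bounded pool of workers, and write the results back out as CSV. The code below is an illustration of that pattern using Python’s standard library, not a reimplementation of spawn_agents_on_csv. The review_component worker is a placeholder, and the max_threads parameter only echoes the idea behind agents.max_threads.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def review_component(row: dict) -> dict:
    """Placeholder worker; a real one would launch an agent for this row."""
    return {**row, "status": "reviewed"}


def fan_out(csv_text: str, max_threads: int = 4) -> str:
    """Run one worker per CSV row and return the results as CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return ""
    # The pool size caps concurrency so a large CSV cannot exhaust resources.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(review_component, rows))  # keeps row order
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*rows[0].keys(), "status"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return out.getvalue()
```

A per-job timeout, in the spirit of agents.job_max_runtime_seconds, would be the natural next addition; it is left out here to keep the sketch short.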

Gemini CLI still lacks native background task processing. A GitHub issue requesting this feature was marked P1 (important) after users reported that large codebase analysis taking 20+ minutes blocks all CLI usage. The workaround — running multiple terminal tabs with nohup — demonstrates the operational burden when background support isn’t native.

Enterprise takeaway: For enterprise AI automation workflows involving parallel code review, security audits, or batch migrations, Claude Code’s background agents and Codex’s CSV fan-out are production-ready patterns. Gemini CLI is not yet ready for parallel enterprise workloads.

MCP and Tool Integration: The Extensibility Layer

The Model Context Protocol (MCP) is the standardized mechanism for connecting agents to external tools, APIs, and services. All three platforms support it, but implementation maturity varies.

Claude Code treats MCP as a primary extension mechanism. Three transport types: stdio (local processes), HTTP (remote servers, recommended), and SSE (deprecated). Adding a GitHub MCP server is one command: claude mcp add --transport http github https://api.githubcopilot.com/mcp/. OAuth flows trigger automatically for authenticated servers. Common integrations include Sentry, PostgreSQL, Figma, and custom internal tools. Environment variable substitution keeps API keys out of version control.
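Environment variable substitution is what lets a server definition be committed while the secret stays on the machine. The sketch below shows the general technique for ${VAR} and ${VAR:-default} placeholders in a nested config. The expand_env function and the sample config keys are hypothetical; the exact placeholder rules a given tool supports should be checked against its documentation.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value, env=None):
    """Recursively replace ${VAR} and ${VAR:-default} in strings inside a config."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    if not isinstance(value, str):
        return value

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"missing environment variable: {name}")

    return _VAR.sub(replace, value)
```

Failing loudly on a missing variable with no default is deliberate: a silently empty Authorization header is harder to diagnose than a startup error.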

Codex CLI supports MCP through the Responses API and extends it with the skills system — reusable procedure bundles that invoke both built-in and external tools. Skills proved effective at Glean, where adding negative examples (“when NOT to use this skill”) improved routing accuracy by 20%.

Gemini CLI implements MCP with automatic OAuth discovery — detecting 401 responses, discovering OAuth endpoints, performing dynamic client registration, and handling the flow automatically. This reduces configuration overhead. However, a notable restriction: using third-party software to access Gemini CLI beyond official tools violates Google’s terms and can result in account suspension.
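The discovery step can be sketched as a small decision function. This assumes the client looks for OAuth authorization-server metadata at the RFC 8414 well-known path on the server’s origin, which is one common approach; whether Gemini CLI follows exactly this sequence is an assumption, and metadata_url and next_step are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/oauth-authorization-server"  # defined by RFC 8414


def metadata_url(server_url: str) -> str:
    """Build the metadata URL on the server's origin, dropping path and query."""
    parts = urlsplit(server_url)
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


def next_step(status: int, server_url: str, has_client_id: bool) -> str:
    """Describe the next action in a simplified auto-discovery flow."""
    if status != 401:
        return "proceed"
    if not has_client_id:
        # No registered client yet: fetch metadata, then register dynamically.
        return f"discover {metadata_url(server_url)} then register client"
    return "start authorization flow"
```

Only a 401 triggers the flow, so servers that need no authentication incur no extra round trips.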

At Big Hat Group, we deploy MCP servers for Jira, Microsoft Graph, and internal tools across client environments, enabling agents to create tickets, send notifications, and query documentation without leaving the terminal.

Enterprise takeaway: MCP is the future of AI agent tool integration. Standardize on MCP servers for your internal tools now. Claude Code and Codex have the most mature implementations. Watch Gemini’s OAuth auto-discovery: it’s genuinely clever and will likely become the standard pattern.

Enterprise Governance: SOC 2, Audit Trails, and Compliance

Enterprise deployment demands audit trails, compliance controls, and policy enforcement that go far beyond individual developer usage.

Claude Code provides log export for Organization Owners, role-based access control restricting tool access by user role, and managed settings that enforce organization-wide policies overriding individual preferences. The tiered permission system maps directly to SOC 2 control objectives around access management and change control.

Codex CLI addresses governance through CI/CD pipeline integration (GitHub Actions), version-controlled AGENTS.md files (policy changes go through PR review), and explicit guidance that AI-generated configuration should be treated like production code with clear attribution to a responsible developer.

Research from Teleport identified that SOC 2 compliance for AI agents requires centralized logging with immutable storage for at least one year, time-synchronized logs, automated alerting correlating events against risk thresholds, and documented investigation workflows. Organizations should layer AI-specific risk management — risk assessment, lifecycle governance, continuous monitoring, red teaming — on top of existing SOC 2 control objectives.
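One way to approximate the logging requirements at the application level is a hash-chained log with UTC timestamps, where each entry’s hash covers the previous entry’s hash. The sketch below is tamper-evident, not immutable: it detects edits after the fact but does not replace write-once storage. The field names and the append_event and verify functions are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: dict) -> str:
    """Hash every field except the hash itself, with a stable key order."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list, agent_id: str, action: str) -> dict:
    """Append a UTC-timestamped entry chained to the previous entry's hash."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "action": action,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return False if any entry was altered or the chain order was broken."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(entry):
            return False
        prev = entry["hash"]
    return True
```

Recording a distinct agent_id per agent is what makes later attribution possible; in production the entries would be shipped to centralized storage with retention controls, outside the reach of the agent being logged.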

Enterprise takeaway: No agentic coding tool is SOC 2 compliant out of the box. You need centralized logging, immutable audit trails, scoped agent identities, and managed policy enforcement. Of the three, Claude Code’s managed settings layer comes closest to enterprise-ready governance. Build your compliance framework around it.

Which Tool When: Practical Recommendations

Skip the “it depends” — here’s what to deploy where:

Scenario	Recommendation
Large codebase migration/refactoring	Claude Code — 1M token context, background agents, structured memory
CI/CD automation and PR workflows	Codex CLI — native GitHub Actions, PR creation, automated reviews
Quick exploration and small fixes	Gemini CLI — 1,000 free requests/day, fast iteration
Regulated environments (SOC 2, HIPAA)	Claude Code — managed settings, deny-rule enforcement, audit export
Batch security audits	Codex CLI — CSV fan-out, one agent per component
Multi-step architectural work	Claude Code — sub-agents, background execution, deep reasoning
Cost-sensitive teams evaluating AI	Gemini CLI free tier → Claude Code Pro when ready to commit

The smartest enterprise teams adopt a multi-tool strategy: Gemini CLI for quick scans and exploration (free), Claude Code for architectural decisions and complex execution, Codex CLI for GitHub-integrated automation. Match the tool to the task, not the vendor to the contract.

Your Action Items

Audit your current agent deployments for OS-level sandboxing. If agents execute commands on machines with production credentials and no kernel-level isolation, fix that immediately.
Create standardized context files (CLAUDE.md, AGENTS.md, GEMINI.md) for every active repository. Encode your security requirements, coding standards, and review guidelines.
Implement deny rules for destructive operations (file deletion, force-push, config modification) at the managed settings level. Don’t rely on developer discipline.
Deploy MCP servers for your internal tools — ticketing, monitoring, documentation. This is the extensibility layer that makes agents genuinely useful beyond code generation.
Establish agent identity and audit logging before scaling. Each agent should have scoped permissions and its own identity for SOC 2 attribution.
Run a pilot with Claude Code on a non-critical codebase. Test background agents, sub-agent orchestration, and managed settings enforcement before rolling out to production teams.
Train your team on prompt injection risks — both direct and indirect. The attack surface includes Jira comments, documentation, PR descriptions, and any content agents process autonomously.
Build a skills library for your organization. Skills are reusable procedure bundles — coding standards, review checklists, deployment workflows, security audit templates — that agents invoke when relevant. Codex’s skills system improved routing accuracy by 20% at Glean once negative examples were added. Claude Code supports custom sub-agents with specialized prompts and tool access. Encode your institutional knowledge into skills so every agent operates with your team’s best practices, not generic defaults.
Adopt plugins for extensibility beyond code. Plugins extend agent capabilities into domains your team cares about — design-to-code workflows (Figma), error monitoring (Sentry), database queries (PostgreSQL/Supabase), and custom internal tools. Claude Code’s plugin governance features (hook events, settings controls, trust dialogs for MCP server configs) give organizations control over which plugins load and what they can access. Treat plugin configuration with the same rigor as production dependencies — review, version-control, and audit what’s installed.

Get Enterprise AI Agent Consulting

Big Hat Group deploys agentic coding harnesses on Windows 365 Cloud PCs with Azure and Intune for enterprise clients. We handle the security architecture, governance frameworks, MCP integration, and managed policy enforcement so your teams can focus on building.

Contact us to start your enterprise AI agent deployment →