Copilot Weekly: Token Efficiency Prep Ships as Usage Billing Nears

With the June 1 usage-based billing transition now less than four weeks away, this week’s Copilot releases are dominated by a single theme: efficiency. VS Code 1.118 ships aggressive token optimization — 93% prompt cache hit rates, agentic search and execution offload tools, and up to 20% token savings — while the broader ecosystem delivers GPT-5.5 GA, deprecation timelines for older models, cloud agent performance improvements, and a candid GitHub availability post-mortem that reveals the company is planning for 30× current capacity driven by agentic workload growth. Here is everything that changed between April 28 and May 5, and what it means for enterprise teams preparing for the post-June Copilot landscape.

VS Code 1.118: Token Efficiency at Scale

The VS Code 1.118 release (April 29) is arguably the most consequential Copilot update this month, because it is explicitly engineered for the economics of the coming usage-based billing model. GitHub is not just adding features — it is redesigning how the agent consumes tokens.

Prompt Caching — 93%+ Hit Rate

Copilot now maintains a cache-stable system prompt and tool definitions across turns, achieving a 93%+ cache hit rate on repeated context. Strategic breakpoints at turn boundaries ensure the cache remains effective without manual intervention. For enterprise teams carrying large custom agents or heavily tool-augmented workflows, this translates directly into fewer billable tokens per session.
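To see why a stable prefix matters, here is a minimal sketch of how a cache hit rate accumulates over a multi-turn session. It is an illustration under simplified assumptions, not Copilot's actual accounting: the function name, the token counts you pass in, and the rule that everything sent on the previous turn is cached on the next one are all hypothetical.

```python
def cache_hit_rate(prefix_tokens: int, turn_tokens: list[int]) -> float:
    """Fraction of input tokens served from cache across a session.

    Simplified model (an assumption, not Copilot's real behavior): the
    system prompt and tool definitions form a stable prefix, a cache
    breakpoint is placed at every turn boundary, and only the newest
    turn's tokens are uncached.
    """
    cached = 0
    total = 0
    history = prefix_tokens  # stable system prompt + tool definitions
    for index, new_tokens in enumerate(turn_tokens):
        request = history + new_tokens
        total += request
        if index > 0:
            cached += history  # everything sent last turn is a cache hit
        history = request  # breakpoint at the turn boundary
    return cached / total if total else 0.0


# Hypothetical session: a 10,000-token prefix and 500 new tokens per turn.
short_session = cache_hit_rate(10_000, [500] * 5)
long_session = cache_hit_rate(10_000, [500] * 20)
```

Under this model the hit rate climbs as the session gets longer, because the first uncached request is amortized over more turns. If the prefix changed between turns, those cached tokens would be lost, which is why keeping the prompt and tool definitions stable matters.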

Tool Search — Up to 20% Token Savings

Instead of loading the full tool catalog on every turn, Copilot now uses a tool search tool (on by default for Claude Sonnet 4.5+ and Opus 4.5+, and rolling out to GPT-5.4/5.5 via the Responses API). It loads only about 30 core tools up front and fetches the rest on demand via embedding search. The result is up to 20% token savings on tool-heavy agent sessions, a meaningful reduction for teams running multi-turn refactors or complex debugging loops.
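The on-demand lookup can be pictured with a small sketch. This is not Copilot's implementation: the tool names and descriptions are hypothetical, and a bag-of-words vector stands in for a real embedding model so the example stays self-contained.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a simple bag-of-words vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tools(query: str, core: list[str], catalog: dict[str, str], k: int = 2) -> list[str]:
    """Return the always-loaded core tools plus the top-k catalog matches."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), name) for name, desc in catalog.items()]
    scored.sort(reverse=True)
    extra = [name for score, name in scored[:k] if score > 0]
    return list(core) + extra


# Hypothetical tool names and descriptions, for illustration only.
CORE = ["read_file", "edit_file"]
CATALOG = {
    "run_sql": "execute a sql query against a database",
    "render_chart": "render a chart image from tabular data",
    "fetch_url": "fetch the contents of a web url",
}

tools = select_tools("run a sql query on the database", CORE, CATALOG, k=1)
```

Only the definitions of the selected tools would be placed in the prompt, so the per-turn cost scales with the handful of tools a task needs rather than with the size of the whole catalog.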

Agentic Search and Execution Offload

Two new small-model subsystems reduce the load on the primary model:

Agentic search tool — a fine-tuned small model handles codebase grep, file search, and semantic search, returning only relevant results to the main agent.
Agentic execution tool — a small model runs terminal commands (capped at 10 calls), filters verbose output, and handles execution decisions independently.

Both reduce main-model token consumption on the noisiest parts of agentic workflows: searching and running things.
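As a rough sketch of the execution offload idea: a helper runs commands under a call budget and hands back only the lines that matter. The class name, the injected run callable, and the keyword filter are all assumptions for illustration. The real tool uses a small model to decide what is relevant, not a keyword match.

```python
class CallBudgetExceeded(RuntimeError):
    """Raised when the helper has used up its allowed command calls."""


class ExecutionOffload:
    def __init__(self, run, max_calls=10, keep=("error", "fail", "warning")):
        self._run = run              # callable: command string -> raw output text
        self._max_calls = max_calls  # the article cites a cap of 10 calls
        self._keep = keep            # keyword filter standing in for a small model
        self.calls = 0

    def execute(self, command: str) -> str:
        if self.calls >= self._max_calls:
            raise CallBudgetExceeded(f"limit of {self._max_calls} calls reached")
        self.calls += 1
        lines = self._run(command).splitlines()
        relevant = [ln for ln in lines if any(k in ln.lower() for k in self._keep)]
        # Fall back to the tail of the output when nothing looks noteworthy.
        return "\n".join(relevant or lines[-3:])


def fake_run(command: str) -> str:
    # Hypothetical command output used in place of a real subprocess.
    return "step 1 ok\nstep 2 ok\nERROR: build failed\nstep 3 ok"


summary = ExecutionOffload(fake_run).execute("make build")
```

The main agent would see only the summary string, so a build log with thousands of lines costs it a few tokens instead of the full transcript.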

WebSockets for OpenAI Models

Persistent WebSocket connections for OpenAI-powered Copilot sessions deliver 12% faster response times and are used automatically when the provider supports them. Separately, Anthropic users can enable cache breakpoints via the github.copilot.chat.anthropic.cacheBreakpoints.lastTwoMessages setting.

Why it matters for enterprise teams: These optimizations are not abstract. They map directly to the AI Credit ledger that starts tracking spend on June 1, and teams running high-volume agent sessions in VS Code will see the most benefit. If you have not yet tested how your custom agents and skills behave with tool search and agentic offload enabled, this week is the time. For a broader perspective on Copilot strategy, read our enterprise AI strategy guide.

Remote Control for Copilot CLI Sessions (Experimental)

A new github.copilot.chat.cli.remote.enabled setting lets developers monitor and steer ongoing Copilot CLI sessions from GitHub.com or the GitHub mobile app. Toggle with /remote on — the session keeps running in the background while you step away from your machine. For distributed teams or scenario-based debugging, this means a teammate can hand off an active CLI session without losing context.

Semantic Indexing: Now for All Workspaces

Semantic indexing, previously restricted to GitHub and Azure DevOps repos, is now available for all workspaces, including local and non-GitHub repositories. Indexes are built automatically via the “Build Codebase Semantic Index” command. For enterprise teams using on-premises or self-hosted git infrastructure, this closes a meaningful gap: Copilot now understands your codebase structure regardless of where the repo lives.

GitHub Text Search Across Repos or Orgs

A new githubTextSearch agent tool enables grep-style (exact match) searches across any GitHub repository or an entire organization, complementing the existing semantic githubRepo tool. This is especially useful for auditing policy compliance, finding configuration patterns across repos, or discovering where a specific function or constant is used across an org. These are tasks that semantic search handles poorly.

MCP Deduplication and Workspace .mcp.json

Workspace-level .mcp.json files are now supported, and MCP servers with the same name are automatically deduplicated — only the most-specific configuration stays enabled by default. This addresses the growing pain of overlapping MCP definitions across repository, workspace, and user-level config files.
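The deduplication rule can be sketched as a precedence lookup keyed on server name. The scope names, their ordering, and the data shape below are assumptions for illustration. The source states only that the most-specific configuration wins, not which scope counts as most specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpServer:
    name: str
    scope: str    # where the definition came from
    command: str  # how the server would be launched


def dedupe(servers, precedence=("workspace", "repository", "user")):
    """Keep one definition per server name, preferring the most specific scope.

    The precedence order is a hypothetical default, most specific first.
    """
    rank = {scope: position for position, scope in enumerate(precedence)}
    winners = {}
    for server in servers:
        current = winners.get(server.name)
        if current is None or rank[server.scope] < rank[current.scope]:
            winners[server.name] = server
    return winners


# Hypothetical overlapping definitions.
configured = [
    McpServer("issues", "user", "issues-server --global"),
    McpServer("issues", "workspace", "issues-server --project"),
    McpServer("docs", "user", "docs-server"),
]
enabled = dedupe(configured)
```

Here the workspace-level "issues" definition would stay enabled and the user-level one would be switched off, while "docs" is untouched because nothing else shares its name.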

Chronicle: Chat Session Insights (Experimental)

Copilot Chat sessions can now be turned into standup reports, tips, and answers about past work via the new Chronicle view in the Chat panel. For engineering managers tracking agent activity, this is a lightweight alternative to pulling logs — Copilot can summarize what happened across a session history without manual note-taking.

Enterprise Policy: Approved Account Organizations

Admins can now restrict AI feature access to specific GitHub organizations they trust, blocking unauthorized account use across the enterprise. For organizations managing multiple GitHub Cloud tenants, this closes a policy enforcement gap — no more relying on seat-counting heuristics.

Bring Your Own Key for Anthropic Models

Enterprise customers can now use their own Anthropic API key for Copilot, enabling direct billing, higher rate limits, and access to Anthropic’s latest models (including Claude Opus 4.7) without consuming Copilot AI Credits.

Model Updates: GPT-5.5 GA, GPT-5.2 Deprecation, and Opus 4.7

GPT-5.5 Generally Available

OpenAI’s GPT-5.5 rolled out across Copilot Pro+, Business, and Enterprise tiers, delivering its strongest performance on complex, multi-step agentic coding tasks. It is available across VS Code, Visual Studio, JetBrains, Copilot CLI, cloud agent, github.com, and GitHub Mobile, and ships with a 7.5× premium request multiplier (promotional pricing). Administrators must enable it via Copilot settings policy. For a full breakdown of what GPT-5.5 means for enterprise development, see our GPT-5 enterprise guide.

GPT-5.2 and GPT-5.2-Codex Deprecation (June 1)

Announced May 1: GPT-5.2 and GPT-5.2-Codex will be deprecated across all Copilot experiences on June 1, 2026 — chat, inline edits, ask/agent modes, and code completions. GPT-5.2-Codex remains in Copilot Code Review. Suggested migration paths: GPT-5.5 for chat and code, GPT-5.3-Codex for code review.

Claude Opus 4.7 Promotional Pricing Ends

Anthropic’s Claude Opus 4.7 promotional 7.5× multiplier ended April 30. The model now sits at a standard 15× multiplier in the model picker for Pro+ subscribers, replacing Opus 4.5 and 4.6.

Copilot Cloud Agent: Faster Starts and Usability Improvements

The Copilot cloud agent continues to narrow the feedback loop. Optimized runner environments built from GitHub Actions custom images deliver over 20% faster startup on top of the 50% improvement shipped in March. Meanwhile, cloud agent sessions are now visible directly from GitHub issues and project boards — project managers can see session status without navigating away from their board.

Visual Studio April Update: Cloud Agent and Debugger Agent

The April 2026 Visual Studio update (April 30) brings deep agent integration to the IDE:

Cloud agent integration — Start cloud agent sessions directly from Visual Studio’s agent picker. The agent creates issues and PRs on remote infrastructure while you keep coding.
Debugger agent — A new agentic issue-to-resolution workflow that reproduces, instruments, diagnoses, and suggests fixes from GitHub/Azure DevOps issues using live runtime execution. Validates fixes against real behavior, not static analysis.
User-level custom agents — Personal agent definitions stored in %USERPROFILE%/.github/agents/ travel with you across projects.
Agent skills discovery — Skills found in .claude/skills/ and .agents/skills/ in addition to .github/skills/.
C++ code editing tools GA — Symbol call hierarchy and class hierarchy now on by default in agent mode.
Chat history panel with titles, message previews, and timestamps.

JetBrains: Inline Agent Mode Public Preview

JetBrains IDEs bring agent capabilities into the inline chat (Shift+Ctrl+I / Shift+Cmd+I) without switching to the chat panel. The update also ships Next Edit Suggestions inline edit previews, far-away edit indicators, and global auto-approve controls.

Copilot Code Review to Consume Actions Minutes

Starting June 1, 2026, each Copilot code review on private repositories will consume GitHub Actions minutes at standard per-minute rates, in addition to AI Credits. Public repository reviews remain free. The change affects Pro, Pro+, Business, and Enterprise plans.
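For teams that want a back-of-the-envelope estimate before the switchover, a sketch like the following may help. Every input is a placeholder to replace with your own figures: the review volume, the runtime per review, the per-minute rate, and the included-minutes allowance are hypothetical, and rounding each review up to a whole minute is an assumption about how minutes are metered.

```python
import math


def monthly_actions_cost(reviews, seconds_per_review, rate_per_minute, included_minutes=0):
    """Estimate the monthly Actions spend attributable to code reviews.

    Assumes each review is rounded up to a whole minute and that any
    included plan minutes are consumed before billing starts.
    """
    minutes = reviews * math.ceil(seconds_per_review / 60)
    billable = max(0, minutes - included_minutes)
    return billable * rate_per_minute


# Placeholder inputs: 100 reviews of 90 seconds each at a made-up rate.
estimate = monthly_actions_cost(reviews=100, seconds_per_review=90, rate_per_minute=0.01)
```

This covers only the Actions minutes. The AI Credits consumed by the same reviews are billed separately and are not modeled here.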

GitHub Availability Post-Mortem: 30× Capacity Planning

GitHub’s April 28 availability update is unusually candid about the scale challenges ahead. The report reveals that agentic development workflows have driven exponential growth since December 2025, and GitHub is now engineering for 30× current capacity. Actions taken include migrating webhooks out of MySQL, redesigning auth/authz flows, isolating critical services, moving performance-sensitive code from Ruby to Go, and beginning a multi-cloud strategy. The priority order is explicit: availability > capacity > new features.

Copilot CLI: Auto Model Selection GA, gh Skill, and MCP Governance

Auto model selection is now GA in Copilot CLI, dynamically routing to GPT-5.4, GPT-5.3-Codex, Sonnet 4.6, or Haiku 4.5 based on plan and policies. Auto model selection carries a 10% discount on the model multiplier — a subtle but meaningful incentive.
gh skill launched in GitHub CLI v2.90.0 for discovering, installing, publishing, and pinning portable agent skills from GitHub repositories. Skills follow the open Agent Skills specification (agentskills.io) and work across Copilot, Claude Code, Cursor, Codex, and Gemini CLI.
Custom MCP registry allowlists in public preview let enterprise admins enforce MCP server allowlists in Copilot CLI, preventing unauthorized server usage in terminal-based workflows.
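The effect of the auto model selection discount noted above can be worked through with simple arithmetic. This sketch assumes the 10% is applied directly to whichever model's multiplier the router picks. The function names and the base multiplier in the example are hypothetical.

```python
def effective_multiplier(base: float, auto_selected: bool, discount: float = 0.10) -> float:
    """Multiplier after the auto-selection discount (assumed to scale the base)."""
    return base * (1 - discount) if auto_selected else base


def premium_requests_used(requests: int, base: float, auto_selected: bool) -> float:
    """Premium requests consumed by a batch of requests at a given multiplier."""
    return requests * effective_multiplier(base, auto_selected)


# Hypothetical: 200 requests against a model with a 1.0x base multiplier.
pinned = premium_requests_used(200, base=1.0, auto_selected=False)
auto = premium_requests_used(200, base=1.0, auto_selected=True)
```

Under these assumptions the saving scales linearly with volume, so it matters most for teams that run many CLI sessions without pinning a model.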

OpenClaw: After Hours at GitHub — June 3

GitHub announced OpenClaw: After Hours at GitHub HQ in San Francisco on June 3, 2026 during Microsoft Build 2026. The event features a fireside chat with OpenClaw creator Peter Steinberger, a maintainer panel, lightning talks, and a happy hour. OpenClaw has surpassed 350,000 GitHub stars.

Platform: Code Scanning API Change, Data Residency, and Rule Insights

The code_scanning_upload field will be removed from the /rate_limit API response on May 19, 2026. Code scanning uploads continue under the standard core rate limit.
Data residency (US and EU) and FedRAMP are now available for Copilot, addressing compliance requirements for regulated industries and government customers.
Rule insights dashboard in repository Settings > Rules provides a visual overview of ruleset evaluation activity — successes, failures, bypasses over time.
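For scripts that read the code_scanning_upload entry from /rate_limit, a defensive lookup along these lines would keep working after the field disappears. It is a sketch, not official guidance: the function name is hypothetical, the payloads are trimmed samples with made-up numbers, and the fallback to the core bucket follows the note above that uploads count against the core limit.

```python
def code_scanning_upload_limit(rate_limit_payload: dict) -> dict:
    """Return the bucket that governs code scanning uploads.

    Uses the dedicated entry while it exists, then falls back to the
    core bucket once the field has been removed.
    """
    resources = rate_limit_payload.get("resources", {})
    return resources.get("code_scanning_upload") or resources.get("core", {})


# Trimmed sample payloads with made-up numbers.
before_removal = {
    "resources": {
        "core": {"limit": 5000, "remaining": 4990},
        "code_scanning_upload": {"limit": 500, "remaining": 499},
    }
}
after_removal = {"resources": {"core": {"limit": 5000, "remaining": 4990}}}

old_bucket = code_scanning_upload_limit(before_removal)
new_bucket = code_scanning_upload_limit(after_removal)
```

Code that indexes the field directly would raise a KeyError after May 19, so switching to a lookup with a fallback ahead of time avoids a surprise in CI tooling.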

What to Watch

Preview bill experience (early May). GitHub is rolling out a preview bill so enterprise admins can see projected costs before the June 1 transition. Watch for it in your Copilot billing settings.
VS Code Agents app expansion. The companion Agents app continues to add capabilities — Claude agent, shared state with VS Code Insiders, web client at insiders.vscode.dev/agents. Expect more agent hosting surfaces in the coming weeks.
Code review Actions minute consumption. If your teams rely on Copilot code review on private repos, now is the time to audit expected Actions minute consumption. The June 1 switchover is less than four weeks away.
GPT-5.2 removal window. Deprecation is June 1, but actual model removal may happen rapidly after that date. Migrate any pinned GPT-5.2 workflows now.
OpenClaw: After Hours. June 3 at GitHub HQ during Microsoft Build. If you are attending Build, this is worth the side trip.

The last week of April marks a pivot point for the Copilot ecosystem. The headline stories — VS Code 1.118’s token efficiency, GPT-5.5 GA, and cloud agent performance — matter individually. But the through-line is the June 1 transition: every release this week is, in some form, preparing for the usage-based billing world. The token optimization in VS Code 1.118 is not just a performance improvement — it is the first visible sign that the product economics are being rearchitected alongside the billing model.

Check back next week for more across the Copilot ecosystem. 🎩