agy-bridge: Letting Claude Code Delegate Heavy Work to Gemini
- MCP
- Claude Code
- Gemini
- AI tooling
- TypeScript
- agy-bridge
Claude Code is excellent at orchestrating work, but every file it reads lands in its context window. Ask it to dig through a 3,000-line log file or a year of git history and you burn tokens on raw material, not answers. Meanwhile, Google's Antigravity CLI (agy) sits on my machine with its own model quota and full local file access.
agy-bridge connects the two. It's an MCP server that lets Claude Code delegate heavy work to agy — the large content stays on Gemini's side, and only the synthesized answer comes back. It's on npm as agy-bridge, runnable with zero install:
{
  "mcpServers": {
    "agy-bridge": {
      "command": "npx",
      "args": ["-y", "agy-bridge"]
    }
  }
}
The six tools
Rather than one generic "ask Gemini" tool, the bridge exposes six purpose-built tools, each with its own model routing chain:
- analyze_files: file analysis for anything over ~200 lines or spanning more than 3 files (logs, dumps, generated code, cross-file comparisons).
- deep_search: codebase archaeology, meaning git log/diff/blame spelunking, repo-wide greps, "when and why did X change."
- web_lookup: documentation and error-code lookups, returning source URLs.
- adversarial_review: plan and code review explicitly prompted for critical flaws, no praise padding. This one routes to the strongest available model (Gemini 3.1 Pro, falling back to Claude Opus in thinking mode) because a second model family catches what the first one misses.
- follow_up: continue a previous delegation by session ID without resending any context.
- delegate: the generic escape hatch with full file, shell, and web access.
Purpose-built tools matter for a non-obvious reason: tool descriptions are how the calling model decides when to delegate. "Use this whenever a file is over 200 lines" gives Claude a concrete trigger; a generic tool gets forgotten.
Architecture
The whole pipeline is intentionally boring:
Claude Code → agy-bridge (MCP over stdio) → execFile → agy CLI → Gemini
The stack is TypeScript, the official @modelcontextprotocol/sdk, zod for input schemas, tsup for the ESM bundle, and vitest for tests. The bridge builds CLI flags, spawns agy once per request, and returns its stdout. No daemon, no state of its own.
Two design decisions did most of the heavy lifting:
Session continuity without storing sessions. agy persists conversations on disk. After each run, the bridge reads agy's conversation cache, resolves the session ID for the working directory, and appends it to every response in a structured trailer:
[agy-bridge] model: Gemini 3.5 Flash (High) | session: 4514bc3e-… (use follow_up to continue)
Claude Code reads that trailer and can call follow_up with the ID — the full prior context is already on agy's side, so a follow-up question costs almost nothing. The bridge itself stays stateless.
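To make the trailer contract concrete, here is a minimal sketch of writing and reading that line. It is not the bridge's actual source: the names Trailer, formatTrailer, and parseTrailer are hypothetical, and the regular expression only assumes the layout shown in the example line above.

```typescript
// Hypothetical helper names; only the trailer layout comes from the example above.
interface Trailer {
  model: string;
  sessionId: string;
}

// Build the trailer line that is appended to a response.
function formatTrailer(t: Trailer): string {
  return `[agy-bridge] model: ${t.model} | session: ${t.sessionId} (use follow_up to continue)`;
}

// Scan a response from the bottom up and return the last trailer found, if any.
function parseTrailer(response: string): Trailer | null {
  const pattern = /^\[agy-bridge\] model: (.+?) \| session: (\S+)/;
  const lines = response.trimEnd().split("\n");
  for (let i = lines.length - 1; i >= 0; i--) {
    const match = pattern.exec(lines[i]);
    if (match) {
      return { model: match[1], sessionId: match[2] };
    }
  }
  return null;
}
```

Keeping the trailer on a single, predictable line means the caller needs no extra protocol to pick up the session ID: the ID simply travels with the answer.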
Model routing with graceful degradation. On first use, the bridge runs agy models, parses the available list, and caches it. Each tool declares a preference chain (e.g. adversarial_review: Gemini 3.1 Pro High → Claude Opus 4.6 Thinking → Gemini 3.5 Flash High) and gets the first available match, with AGY_DEFAULT_MODEL as a user override. Models come and go in preview CLIs; hardcoding one name would break weekly.
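The preference-chain lookup can be sketched in a few lines. This is an illustration under assumptions, not the bridge's real code: pickModel is a hypothetical name, the model strings are copied from the chain above and may not match what agy models prints, and the sketch assumes a user override wins outright without being checked against the available list.

```typescript
// Hypothetical sketch of preference-chain routing.
// Assumption: an override (e.g. the value of AGY_DEFAULT_MODEL) is used as-is.
function pickModel(
  chain: readonly string[],
  available: readonly string[],
  override?: string,
): string | null {
  if (override && override.trim() !== "") {
    return override.trim();
  }
  const availableSet = new Set(available);
  for (const candidate of chain) {
    if (availableSet.has(candidate)) {
      return candidate; // first preferred model that is actually offered
    }
  }
  return null; // nothing in the chain is available; the caller decides what to do
}

// Chain for adversarial_review as described above (exact strings are assumed).
const adversarialReviewChain = [
  "Gemini 3.1 Pro High",
  "Claude Opus 4.6 Thinking",
  "Gemini 3.5 Flash High",
];
```

With this shape, a model disappearing from the CLI only shifts the tool one step down its chain instead of failing the call.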
The hard parts
The stdin hang
The first end-to-end test hung forever. agy polls stdin until EOF, so spawning it from Node with the default pipes leaves the input stream open and the process waiting indefinitely. The fix is two lines that took an evening to find:
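// execFile here must be the promisified form (util.promisify(child_process.execFile));
// its returned promise exposes the spawned process as `.child`.
const promise = execFile(cmd, args, opts);
promise.child.stdin?.end(); // agy waits for EOF, so close stdin immediately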
If you wrap any interactive-capable CLI in a child process, close stdin explicitly unless you're feeding it.
Silent fallback defeats the whole point
The nastiest failure mode wasn't a crash — it was success theater. When a delegation failed (CLI not installed, quota hit), Claude Code would quietly shrug and do the heavy work itself, reading the giant files into its own context. The delegation tool failing silently re-created the exact problem it exists to solve.
v0.2.0 added AGY_ON_FAILURE=strict. In strict mode, an error response carries an explicit instruction: do NOT perform this work yourself; report the failure to the user. It sounds blunt, but instructing the calling model what to do on failure turns out to be a legitimate, necessary part of an MCP tool's contract.
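A rough sketch of what that contract can look like in code follows. The function names, the "lenient" label for the non-strict case, and the exact message wording are assumptions for illustration; only the AGY_ON_FAILURE=strict setting and the gist of the instruction come from the description above.

```typescript
// Hypothetical sketch: "lenient" is a made-up label for any non-strict setting.
type FailureMode = "strict" | "lenient";

// The environment is passed in so the function stays pure and easy to test.
function resolveFailureMode(env: Record<string, string | undefined>): FailureMode {
  return env.AGY_ON_FAILURE === "strict" ? "strict" : "lenient";
}

// Build the text returned to the calling model when a delegation fails.
function buildFailureText(reason: string, mode: FailureMode): string {
  const base = `[agy-bridge] delegation failed: ${reason}`;
  if (mode !== "strict") {
    return base;
  }
  // In strict mode the error carries an instruction for the caller, not just a description.
  return `${base}\nDo NOT perform this work yourself; report the failure to the user.`;
}
```

The point of the extra line is that the error text is read by a model, so it can state the expected next step instead of leaving the caller to improvise.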
Unbounded output
A delegated "summarize this repo" can come back with 200 KB of text, blowing up the very context window the tool protects. Responses are capped at 50,000 characters (configurable via AGY_MAX_OUTPUT_CHARS), with a truncation notice explaining how to raise the cap.
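The cap itself is simple to sketch. The helper names and the wording of the notice below are hypothetical; the 50,000-character default and the AGY_MAX_OUTPUT_CHARS variable are the ones described above. The sketch also assumes that an unset or invalid value falls back to the default.

```typescript
const DEFAULT_MAX_OUTPUT_CHARS = 50_000;

// Read the cap from an environment-like object; fall back to the default
// when the variable is unset, non-numeric, or not positive (assumed behavior).
function resolveMaxChars(env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env.AGY_MAX_OUTPUT_CHARS ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_OUTPUT_CHARS;
}

// Truncate long output and append a notice saying how to raise the cap.
// Length is measured in UTF-16 code units, which is what string.length counts.
function capOutput(text: string, maxChars: number): string {
  if (text.length <= maxChars) {
    return text;
  }
  const omitted = text.length - maxChars;
  const notice =
    `\n\n[agy-bridge] output truncated: ${omitted} characters omitted. ` +
    `Set AGY_MAX_OUTPUT_CHARS to raise the cap.`;
  return text.slice(0, maxChars) + notice;
}
```

Putting the remedy in the notice matters for the same reason as the strict-mode instruction: the reader of the truncated output is a model that can relay it to the user.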
Testing a CLI wrapper without the CLI
All 35 unit tests run without agy installed or any network. The runner takes its process-execution functions through a small RunnerDeps interface, so tests inject fakes and assert on the exact flags built, truncation behavior, model-table parsing, and strict-mode error text. The only thing not covered by unit tests is agy itself — that's one manual E2E check per release.
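As an illustration of the injection pattern, here is a minimal sketch. The post names RunnerDeps but does not show its members, so the interface shape, listModelsRaw, and makeFakeDeps are all hypothetical; the only real invocation used is agy models, which is mentioned above, and the fake output is placeholder text rather than the CLI's real format.

```typescript
// Hypothetical shape: the real RunnerDeps members are not shown in this post.
interface ExecResult {
  stdout: string;
}

interface RunnerDeps {
  execFile(cmd: string, args: string[]): Promise<ExecResult>;
}

// Code under test: it only talks to the process layer through deps.
async function listModelsRaw(deps: RunnerDeps): Promise<string> {
  const { stdout } = await deps.execFile("agy", ["models"]);
  return stdout.trim();
}

// Test double: records every invocation and returns canned output.
function makeFakeDeps(stdout: string) {
  const calls: Array<{ cmd: string; args: string[] }> = [];
  const deps: RunnerDeps = {
    execFile: async (cmd, args) => {
      calls.push({ cmd, args });
      return { stdout };
    },
  };
  return { deps, calls };
}
```

A test then builds the fake, calls the function, and asserts on both the recorded calls and the returned text, with no agy binary and no network involved.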
Does it actually save tokens?
The articles on this very blog are an example: research for the project write-ups was delegated as deep_search calls. Each one had Gemini crawl an entire repo — dozens of files, git history — and return a two-page factual summary. Claude's context received only the summaries. The raw material never crossed the wire.
If you run Claude Code alongside the Antigravity CLI, try it: npx -y agy-bridge. Source on GitHub.