Jun 12, 20263 min read

Hooks vs. Instructions: What Building (and Un-Building) claude-to-agy Taught Me

MCP
Claude Code
Python
AI tooling
agents
claude-to-agy

Before agy-bridge, there was claude-to-agy — a Python MCP bridge (v1.0.0 on PyPI, FastMCP, runnable via uvx claude-to-agy) exposing a single delegate_to_agy tool that ships heavy work from Claude Code to the Antigravity CLI. It worked. But its real value was as a first draft: the most instructive feature in its history is one I added, watched fail, and deleted.

The deleted feature: enforcement by interception

The delegation rules live in CLAUDE.md/SKILL.md — instructions the agent is supposed to follow. But subagents spawned by Claude Code don't reliably inherit project instructions, so they'd cheerfully run grep across a repo or an unbounded git log, burning exactly the tokens the bridge exists to save.

My fix was muscular: a PreToolUse hook intercepting every Bash call and hard-blocking commands matching grep, git diff, git log — exit 1, command refused, go delegate instead.

It lasted three weeks. The post-mortem writes itself:

Substring matching can't read intent. The rules explicitly permit bounded queries — git log -n 5, git diff --stat are cheap and correct to run directly. The hook couldn't tell git log -n 5 from git log --all -p and blocked both. The policy had nuance; the enforcer had string.contains.
It broke workflows it had never met. Internal Claude Code operations and legitimate subagent flows need to inspect changes. A blanket block doesn't redirect those to delegation — it just fails them, aggressively, mid-task.
It made installation hostile. The hook required hand-editing ~/.claude/settings.json globally — config debt in every user's home directory for a tool meant to be uvx-and-go.

The deletion taught the actual lesson: when your user is a language model, instructions are not the weak substitute for enforcement — they're the more capable mechanism. A rule like "delegate unbounded history digs; run bounded ones yourself" is trivial for a model to apply and nearly impossible to express as command-string regexes. I was using 2010s tooling instincts (lint it! block it in CI!) on a 2020s interface whose whole point is judgment. Guardrails that can't read intent will always be simultaneously too strict and too weak.

What survived: the consume-vs-operate test

The routing heuristic from claude-to-agy's rule files is the piece I still use everywhere:

Operate-on: if the agent will edit the thing, it must read it directly — even a 2,000-line file — because you can't safely edit what you know only from a summary.
Consume-and-discard: if the agent needs a verdict — search results, history archaeology, a review, research — delegate, because only the conclusion needs to enter context.

One question — "will the raw content be edited, or only concluded from?" — routes nearly every task correctly. It's the distilled answer to "when should an agent outsource its own reading?"

The rewrite, justified

claude-to-agy ended life well-engineered — src/ layout, Hatchling, ruff and pyright strict, Nox matrices across Python 3.10–3.14 — and that's precisely what made the conclusion legible: the ceiling wasn't code quality, it was shape. One generic delegate_to_agy(prompt, cwd, files) tool puts the burden of phrasing on the caller every time, where purpose-built tools with trigger-condition descriptions make delegation the path of least resistance. No session continuity meant every follow-up re-shipped context. Those are interface redesigns, not patches — hence agy-bridge: six tools, model routing, session trailers, strict failure mode.

The arc generalizes to most agent tooling being built right now: start by writing rules, resist the urge to bolt on interceptors when the rules occasionally miss, and when the tool's shape is wrong, reshape the tool instead of policing the agent. The first version's job is to teach you that; claude-to-agy did it well enough that I could throw it away.