Who this is for
Audience: agent builders and operators who keep adding tools until the agent gets slower, pricier, or weirdly worse.
Promise: a practical way to decide which tools belong in the default catalog, which tools should be hidden until needed, and which tools need a fallback before you touch them.
Every shiny tool you add charges rent in the prompt.
That is the goblin tax.
Tools are useful. Obviously. A capable agent needs ways to read sources, write files, inspect systems, call services, verify artifacts, and leave receipts. The problem starts when the tool catalog becomes a hardware store dragged through a revolving door: every tool brings schema text, instructions, options, edge cases, and another decision the model has to get right.
Then people make the opposite mistake.
They hide tools, slim the catalog, save context, feel clever, and accidentally remove the one tool a skill, subagent, recovery path, or verification step needed.
Congratulations. The agent is cheaper and more helpless.
The rule is not "more tools" or "fewer tools." The rule is tool discipline:
- no tool without a job,
- no hidden tool without a fallback,
- no optimization without a regression test,
- no goblin without a receipt.
Evidence, inference, and advice
Use this resource with the right confidence level.
| Layer | What it means here | Confidence |
|---|---|---|
| Evidence | Prior qualitative research into public agent-builder discussions surfaced two patterns: large visible tool catalogs can add schema overhead, and careless tool hiding can break workflows that expected those tools. | Good qualitative signal; not a universal benchmark. |
| Inference | Tool schemas, descriptions, and selection rules compete with the actual task for attention and context. Hiding them can reduce overhead but may remove required capabilities. | Strong engineering inference; exact impact depends on the agent runtime, model, and tool router. |
| Advice | Treat tool catalogs like production surface area. Give every tool a job, visibility condition, fallback, breakage risk, regression test, and proof receipt. | Practical operating guidance; test it in your own setup. |
This checklist makes no measured token, latency, or cost claims about your system. If you need numbers, instrument your own run.
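One rough way to start instrumenting is to measure how much text each tool definition adds to the prompt. The sketch below is a minimal illustration, assuming your tool definitions are available as JSON-serializable dicts with a `name` key. The function `schema_sizes` and the example tools are hypothetical, and character count is only a proxy for tokens, so use your model's own tokenizer before quoting real numbers.

```python
import json


def schema_sizes(tools):
    """Return (tool name, serialized character count) pairs, largest first.

    Character count is a rough proxy for prompt overhead, not a token count.
    """
    sizes = [(tool["name"], len(json.dumps(tool))) for tool in tools]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Hypothetical tool definitions, for illustration only.
EXAMPLE_TOOLS = [
    {
        "name": "read_file",
        "description": "Read a local file.",
        "parameters": {"path": {"type": "string"}},
    },
    {
        "name": "send_email",
        "description": "Send an email on behalf of the operator.",
        "parameters": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
            "dry_run": {"type": "boolean"},
        },
    },
]

for name, chars in schema_sizes(EXAMPLE_TOOLS):
    print(f"{name}: {chars} characters")
```

Sorting largest first puts the most expensive schemas at the top of the list, which is where a pruning pass should look first.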
The 10-minute tool pruning pass
Do this before you add another plugin, expose another API surface, or hide a tool because a dashboard made you nervous.
- Export or list the tools visible to the agent in its normal working mode.
- Group them by job, not by provider or feature name.
- Mark which jobs are needed on almost every run.
- Mark which jobs are rare, dangerous, expensive, or approval-gated.
- For every hidden or rare tool, write the fallback path before changing the catalog.
- Run a toy regression task that exercises the critical workflows.
- Keep the receipt: tool set used, task attempted, result, failure, and proof artifact.
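The "needed on almost every run" versus "rare" split in the steps above can be estimated from a run log instead of from memory. This is a minimal sketch, assuming you can record which jobs each run actually exercised. The 0.8 cutoff, the function names, and the example log are all assumptions for illustration, not values from any benchmark.

```python
from collections import Counter

# Assumed cutoff for "needed on almost every run"; tune it for your profile.
ALMOST_EVERY_RUN = 0.8


def job_frequency(runs):
    """Map each job to the fraction of runs that used it.

    `runs` is a list of sets, one set of job names per recorded run.
    """
    if not runs:
        return {}
    counts = Counter(job for run in runs for job in run)
    return {job: count / len(runs) for job, count in counts.items()}


def split_by_frequency(runs, threshold=ALMOST_EVERY_RUN):
    """Split jobs into (core, rare) sorted lists using the threshold."""
    freq = job_frequency(runs)
    core = sorted(job for job, f in freq.items() if f >= threshold)
    rare = sorted(job for job, f in freq.items() if f < threshold)
    return core, rare


# Hypothetical run log: which jobs each run actually exercised.
EXAMPLE_RUNS = [
    {"read_files", "search_repo"},
    {"read_files", "run_tests"},
    {"read_files", "search_repo", "send_email"},
    {"read_files", "search_repo"},
]

core, rare = split_by_frequency(EXAMPLE_RUNS)
print("core:", core)
print("rare:", rare)
```

Frequency only answers the "how often" question. Whether a job is dangerous, expensive, or approval-gated still has to be marked by a person who knows the tool.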
If you cannot explain why a tool is visible, it is not a capability. It is clutter with a badge.
Tool-catalog checklist
Copy this table for each tool or tool group. One row per job is usually better than one row per function, unless the function has a distinct risk profile.
| Tool job | Visibility condition | Fallback | Breakage risk | Regression test | Receipt/proof |
|---|---|---|---|---|---|
| Read local/source files | Visible by default for source-grounded work. | Ask for the source or block if the source cannot be accessed. | Agent guesses from memory instead of reading the file. | Give the agent a small source file and require a cited answer. | Source path read, claim map, final artifact. |
| Write durable artifact files | Visible when the task requires a deliverable. | Draft to a safe scratch note, then require a human or separate worker to place it. | Agent reports completion without creating the file. | Require creation, read-back, byte count, and hash. | File path, bytes, hash, read-back evidence. |
| Search files or repo content | Visible for codebase, documentation, and archive tasks. | Ask for a narrower path or use explicit source files. | Agent edits or summarizes the wrong thing because it never found usages. | Search a known symbol, then trace definition and usage. | Search query, matched files, decision note. |
| Run shell/tests/builds | Visible for engineering verification; gated for destructive commands. | Use read-only inspection or block for operator approval. | Hidden test/build tool makes the agent claim unverified success. | Run a harmless test, lint, or validation command and capture output. | Command, exit code, stdout/stderr summary. |
| Browser/computer control | Hidden unless the task requires interactive UI work. | Use direct API, curl, local files, or ask for a screenshot/export. | If hidden, the agent cannot handle login walls, CAPTCHAs, or UI-only workflows; if exposed too broadly, it can make unsafe clicks. | Navigate a harmless page or test fixture; verify no account/payment action occurs. | Screenshot/URL, action log, safety boundary. |
| Messaging/email/social posting | Hidden by default; visible only with explicit send/read scope and approval gates. | Draft copy locally and hand it to a human or route-review worker. | Accidental outreach, account mutation, private data exposure, or posting from the wrong identity. | Dry-run with synthetic destination or no-send mode. | Draft file, approval state, no-send proof. |
| Paid provider/media render | Hidden unless budget, provider, and output contract are explicit. | Use local mock/draft asset or block for budget approval. | Unapproved spend, provider-side mutation, or a low-quality artifact treated as final. | Run metadata/account-status check only, or a no-spend mock path. | Budget note, provider status, artifact hash if render approved. |
| Memory/profile mutation | Hidden unless the task is explicitly about durable memory or profile maintenance. | Write a proposed memory/profile change for review. | Temporary status becomes permanent law; profile bloat gets worse. | Save/retrieve one safe toy fact, or lint a proposed profile diff. | Before/after memory/profile diff, rationale, rollback note. |
| Subagent/delegation | Visible for bounded independent reasoning work, not as a substitute for board tasks. | Create a durable board task for cross-agent handoff. | Hidden delegation breaks workflows that expect specialist review; overuse floods context with unverifiable claims. | Delegate a toy review and verify its returned artifact/claim independently. | Subtask goal, returned handle, parent verification. |
| Scheduler/cron/board control | Hidden unless the task explicitly manages recurring work or durable task routing. | Write a proposed schedule/card spec for review. | Zombie jobs, duplicate cards, silent runs, wrong profile/workdir. | Create/list/update only in a safe test namespace, or validate an existing job read-only. | Job/card ID, schedule, workdir/profile, side-effect note. |
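
If the catalog lives in code or config rather than in a document, the same six columns can be enforced with a small completeness check. This is a minimal sketch: the `CatalogRow` class and `missing_columns` helper are hypothetical names, and the incomplete row is an invented example of a tool someone wants to hide without a plan.

```python
from dataclasses import dataclass, fields


@dataclass
class CatalogRow:
    """One checklist row; field names mirror the table columns."""
    tool_job: str
    visibility_condition: str
    fallback: str
    breakage_risk: str
    regression_test: str
    receipt: str


def missing_columns(row):
    """Return the names of columns left blank or whitespace-only."""
    return [f.name for f in fields(row) if not getattr(row, f.name).strip()]


complete = CatalogRow(
    tool_job="Read local/source files",
    visibility_condition="Visible by default for source-grounded work.",
    fallback="Ask for the source or block if the source cannot be accessed.",
    breakage_risk="Agent guesses from memory instead of reading the file.",
    regression_test="Give the agent a small source file and require a cited answer.",
    receipt="Source path read, claim map, final artifact.",
)

# Hypothetical incomplete row: hidden tool with no fallback or test written yet.
incomplete = CatalogRow(
    tool_job="Paid provider/media render",
    visibility_condition="Hidden unless budget is explicit.",
    fallback="",
    breakage_risk="Unapproved spend.",
    regression_test="",
    receipt="Budget note.",
)

print(missing_columns(complete))    # []
print(missing_columns(incomplete))  # ['fallback', 'regression_test']
```

A check like this only proves the columns are filled in. It cannot tell whether the fallback is any good.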
The visibility decision rule
A tool belongs in the default visible set only if at least one of these is true:
- The agent needs it for most tasks in that profile.
- The task cannot be completed safely without it.
- The tool is part of the verification receipt.
- Hiding it has already broken a real workflow and no safer router exists.
A tool belongs behind a router, skill, or explicit activation condition when:
- It is rare but important.
- It can spend money, mutate accounts, send messages, or touch credentials.
- It adds a large schema surface compared with how often it is used.
- It is only needed by specialist tasks, not the everyday profile.
A tool should be removed or parked when:
- No one can name its current job.
- It duplicates another tool with a clearer interface.
- Its failure mode is riskier than its value.
- It cannot produce a receipt.
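The three lists above can be written down as one function so the decision is repeatable. This is a sketch under stated assumptions. The rule does not say which list wins when a tool matches more than one, so the order here (remove or park first, then default visible, then router as the fallthrough) is an assumption, as are the names `ToolFacts` and `decide_visibility`.

```python
from dataclasses import dataclass


@dataclass
class ToolFacts:
    # Remove-or-park signals.
    has_named_job: bool = True
    duplicates_clearer_tool: bool = False
    risk_exceeds_value: bool = False
    can_produce_receipt: bool = True
    # Default-visible signals.
    needed_for_most_tasks: bool = False
    required_for_safe_completion: bool = False
    part_of_verification_receipt: bool = False
    hiding_broke_workflow_no_router: bool = False


def decide_visibility(facts):
    """Return 'remove_or_park', 'default_visible', or 'behind_router'.

    Precedence is an assumption of this sketch, not part of the rule itself.
    """
    if (
        not facts.has_named_job
        or facts.duplicates_clearer_tool
        or facts.risk_exceeds_value
        or not facts.can_produce_receipt
    ):
        return "remove_or_park"
    if (
        facts.needed_for_most_tasks
        or facts.required_for_safe_completion
        or facts.part_of_verification_receipt
        or facts.hiding_broke_workflow_no_router
    ):
        return "default_visible"
    # Rare, risky, large-schema, or specialist tools land here.
    return "behind_router"


print(decide_visibility(ToolFacts(needed_for_most_tasks=True)))
print(decide_visibility(ToolFacts()))
print(decide_visibility(ToolFacts(has_named_job=False)))
```

Treating the router as the fallthrough means a tool with a job but no claim to the default set is gated rather than exposed, which is the cautious choice.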
The hidden-tool failure test
Before hiding a tool, run a tiny version of the workflows that might need it.
| Workflow | Test question | Pass condition |
|---|---|---|
| Skill loading | Does any loaded skill name this tool or its command path? | The skill still works or names an approved fallback. |
| Subagent handoff | Does any subagent need the tool to verify its own work? | Parent can verify the subagent output without trusting self-report. |
| Recovery path | If the main route fails, is this tool the fallback? | The fallback is still visible or the block condition is explicit. |
| Verification | Is this tool used to prove the artifact exists or works? | Another proof path exists, or the tool stays visible for verification. |
| Safety gate | Does hiding the tool make the agent bypass a safer approved path? | The safer path remains easier than the unsafe shortcut. |
If the test fails, do not call the optimization successful. You did not save context. You amputated a workflow.
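The skill-loading row of the test is the easiest one to automate. The sketch below is a minimal illustration, assuming skills are available as plain instruction text keyed by skill name. `skills_naming_tool`, the skill names, and the tool name `run_shell` are all hypothetical.

```python
import re


def skills_naming_tool(tool_name, skills):
    """Return sorted names of skills whose text mentions tool_name as a whole word.

    `skills` maps a skill name to its instruction text.
    """
    pattern = re.compile(rf"\b{re.escape(tool_name)}\b")
    return sorted(name for name, text in skills.items() if pattern.search(text))


# Hypothetical skills, for illustration only.
EXAMPLE_SKILLS = {
    "release-check": "Run the test suite with run_shell, then record the exit code.",
    "summarize-docs": "Read the source files and cite each claim.",
    "safe-runner": "Prefer run_shell_sandboxed for anything destructive.",
}

print(skills_naming_tool("run_shell", EXAMPLE_SKILLS))  # ['release-check']
```

A text match only finds skills that name the tool explicitly. A skill can depend on a tool without naming it, so a clean scan does not replace actually running the workflow.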
A simple tool receipt
Use this after a pruning change.
| Field | Fill it in |
|---|---|
| Tool or tool group | <name or job> |
| Change | Kept visible, hidden until a condition is met, removed, or routed through a skill. |
| Reason | Why this saves context, reduces risk, or clarifies choice. |
| Expected downside | What might break. Be honest. The goblin already knows. |
| Fallback | What the agent should do when the tool is not visible. |
| Regression test | Small task used to prove the workflow still works. |
| Proof | File, command output, screenshot, log, hash, or reviewed block reason. |
| Approval state | Draft, approved, rolled back, or needs review. |
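
When the proof is a file, the Proof field can be filled from a read-back rather than from the agent's own report. This is a minimal sketch using only the Python standard library. The helper name `file_proof` and the toy artifact are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_proof(path):
    """Read a file back from disk and return its path, byte count, and SHA-256."""
    data = Path(path).read_bytes()  # raises FileNotFoundError if never written
    return {
        "path": str(path),
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Usage: write a toy artifact, then prove it exists by reading it back.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "report.txt"
    artifact.write_bytes(b"tool receipt demo\n")
    proof = file_proof(artifact)
    print(proof["bytes"], proof["sha256"][:12])
```

Because the helper reads the bytes from disk, a run that claims completion without creating the file fails loudly instead of producing a receipt.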
What to do next
Pick one agent profile or workflow. Do not boil the whole swamp.
- Count the visible tools or tool groups.
- Name the five jobs that actually matter for that profile.
- Identify one expensive, risky, or rarely used tool that should not be visible by default.
- Write its fallback and regression test.
- Change nothing until the test exists.
Tool bloat is real. Tool hiding can also break the machine. The grown-up move is not aesthetic minimalism. It is a catalog with jobs, gates, tests, and receipts.
No tool without a job. No optimization without a regression test. No goblin rides free.
Evidence and boundaries
This resource is based on generalized patterns from public agent-builder discussions. It does not quote raw archive material, identify people, use private chat fragments, or imply affiliation, endorsement, partnership, or community membership with any project, platform, company, or community.
This is a checklist, not a benchmark report. Tool schema size, prompt overhead, latency, and cost depend on the runtime, model, provider, router, and task. Measure your own system before making numeric claims.