Resource 010 / tool-context discipline

The goblin tax: every tool costs context.

The goblin tax is the rent every tool charges in your agent’s prompt: schema text, options, failure modes, and one more choice the model has to make. More tools can make an agent more capable, but tool bloat can also make it slower, pricier, and more confused. Hiding tools is not automatically smarter either. Hide the wrong one and a skill, subagent, recovery path, or verification step breaks quietly.

Qualitative guidance · No benchmark claim · No affiliation claim · No service promise

Public safety status

This staged page applies the required risk-review fixes: insider evidence-tag wording is replaced with public-safe phrasing, reviewer-only source files are excluded from the public route, and internal evidence scaffolding stays out of the page copy.

This is an independent operator checklist. It is not affiliated with, endorsed by, partnered with, or representative of any project, platform, company, provider, or community. It makes no exact token, latency, cost, ROI, legal, security, or client-service guarantee. Measure your own system before treating any rule as a benchmark.

Who this is for

Audience: agent builders and operators who keep adding tools until the agent gets slower, pricier, or weirdly worse.

Promise: a practical way to decide which tools belong in the default catalog, which tools should be hidden until needed, and which tools need a fallback before you touch them.

Every shiny tool you add charges rent in the prompt.

That is the goblin tax.

Tools are useful. Obviously. A capable agent needs ways to read sources, write files, inspect systems, call services, verify artifacts, and leave receipts. The problem starts when the tool catalog becomes a hardware store dragged through a revolving door: every tool brings schema text, instructions, options, edge cases, and another decision the model has to get right.

Then people make the opposite mistake.

They hide tools, slim the catalog, save context, feel clever, and accidentally remove the one tool a skill, subagent, recovery path, or verification step needed.

Congratulations. The agent is cheaper and more helpless.

The rule is not "more tools" or "fewer tools." The rule is tool discipline: a catalog with jobs, gates, tests, and receipts.

Evidence, inference, and advice

Use this resource with the right confidence level.

Layer | What it means here | Confidence
Evidence | Prior qualitative research into public agent-builder discussions surfaced two patterns: large visible tool catalogs can add schema overhead, and careless tool hiding can break workflows that expected those tools. | Good qualitative signal; not a universal benchmark.
Inference | Tool schemas, descriptions, and selection rules compete with the actual task for attention and context. Hiding them can reduce overhead but may remove required capabilities. | Strong engineering inference; exact impact depends on the agent runtime, model, and tool router.
Advice | Treat tool catalogs like production surface area. Give every tool a job, visibility condition, fallback, breakage risk, regression test, and proof receipt. | Practical operating guidance; test it in your own setup.

No exact token, latency, or cost claim in this checklist is independently measured for your system. If you need numbers, instrument your own run.
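One rough way to instrument your own run is to serialize each tool definition and compare sizes. The sketch below is an illustration under stated assumptions: the `TOOLS` list and its field names are hypothetical, not taken from any particular runtime, and character counts are only a proxy for tokens, which depend on your model's tokenizer.

```python
import json

# Hypothetical tool definitions. Field names are illustrative only;
# export the real ones from your own agent runtime.
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a local file.",
        "parameters": {"path": {"type": "string"}},
    },
    {
        "name": "send_email",
        "description": "Send an email after explicit approval.",
        "parameters": {"to": {"type": "string"}, "body": {"type": "string"}},
    },
]


def schema_chars(tool: dict) -> int:
    """Length in characters of the compactly serialized tool definition."""
    return len(json.dumps(tool, separators=(",", ":"), sort_keys=True))


def overhead_report(tools: list[dict]) -> list[tuple[str, int]]:
    """Return (tool name, serialized size) pairs, largest first."""
    sized = ((tool["name"], schema_chars(tool)) for tool in tools)
    return sorted(sized, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, size in overhead_report(TOOLS):
        print(f"{name}: {size} chars")
```

Treat the ranking as a place to start asking questions, not as a cost figure. The biggest schema may still be the one the agent needs on every run.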

The 10-minute tool pruning pass

Do this before you add another plugin, expose another API surface, or hide a tool because a dashboard made you nervous.

  1. Export or list the tools visible to the agent in its normal working mode.
  2. Group them by job, not by provider or feature name.
  3. Mark which jobs are needed on almost every run.
  4. Mark which jobs are rare, dangerous, expensive, or approval-gated.
  5. For every hidden or rare tool, write the fallback path before changing the catalog.
  6. Run a toy regression task that exercises the critical workflows.
  7. Keep the receipt: tool set used, task attempted, result, failure, and proof artifact.

If you cannot explain why a tool is visible, it is not a capability. It is clutter with a badge.
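The triage steps of the pass can be sketched as a small inventory check. This is a minimal sketch, assuming you can list your tools by hand. The `Tool` record, its field names, and the sample tools are hypothetical, and the regression run and receipt (steps 6 and 7) are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    job: str = ""            # step 2: the job it does; empty means nobody could say
    every_run: bool = False  # step 3: needed on almost every run
    gated: bool = False      # step 4: rare, dangerous, expensive, or approval-gated
    fallback: str = ""       # step 5: what the agent does when the tool is not visible


def pruning_pass(tools: list[Tool]) -> tuple[dict[str, list[str]], list[str], list[str]]:
    """Group tools by job; flag tools with no job and gated tools with no fallback."""
    by_job: dict[str, list[str]] = defaultdict(list)
    clutter: list[str] = []
    missing_fallback: list[str] = []
    for tool in tools:
        if not tool.job:
            clutter.append(tool.name)  # visible, but nobody can explain why
            continue
        by_job[tool.job].append(tool.name)
        if tool.gated and not tool.fallback:
            missing_fallback.append(tool.name)
    return dict(by_job), clutter, missing_fallback


# Hypothetical inventory for illustration.
SAMPLE = [
    Tool("read_file", job="read sources", every_run=True),
    Tool("send_email", job="messaging", gated=True),
    Tool("mystery_plugin"),
]

by_job, clutter, missing_fallback = pruning_pass(SAMPLE)
print("jobs:", by_job)
print("clutter:", clutter)
print("needs fallback before any change:", missing_fallback)
```

Anything in the last list should get its fallback written down before the catalog changes.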

Tool-catalog checklist

Copy this table for each tool or tool group. One row per job is usually better than one row per function, unless the function has a distinct risk profile.

Tool job | Visibility condition | Fallback | Breakage risk | Regression test | Receipt/proof
Read local/source files | Visible by default for source-grounded work. | Ask for the source or block if the source cannot be accessed. | Agent guesses from memory instead of reading the file. | Give the agent a small source file and require a cited answer. | Source path read, claim map, final artifact.
Write durable artifact files | Visible when the task requires a deliverable. | Draft to a safe scratch note, then require a human or separate worker to place it. | Agent reports completion without creating the file. | Require creation, read-back, byte count, and hash. | File path, bytes, hash, read-back evidence.
Search files or repo content | Visible for codebase, documentation, and archive tasks. | Ask for a narrower path or use explicit source files. | Agent edits or summarizes the wrong thing because it never found usages. | Search a known symbol, then trace definition and usage. | Search query, matched files, decision note.
Run shell/tests/builds | Visible for engineering verification; gated for destructive commands. | Use read-only inspection or block for operator approval. | Hidden test/build tool makes the agent claim unverified success. | Run a harmless test, lint, or validation command and capture output. | Command, exit code, stdout/stderr summary.
Browser/computer control | Hidden unless the task requires interactive UI work. | Use direct API, curl, local files, or ask for a screenshot/export. | Agent cannot handle login walls, CAPTCHAs, or UI-only workflows; unsafe clicks if exposed too broadly. | Navigate a harmless page or test fixture; verify no account/payment action occurs. | Screenshot/URL, action log, safety boundary.
Messaging/email/social posting | Hidden by default; visible only with explicit send/read scope and approval gates. | Draft copy locally and hand it to a human or route-review worker. | Accidental outreach, account mutation, private data exposure, or posting from the wrong identity. | Dry-run with synthetic destination or no-send mode. | Draft file, approval state, no-send proof.
Paid provider/media render | Hidden unless budget, provider, and output contract are explicit. | Use local mock/draft asset or block for budget approval. | Spend, provider-side mutation, low-quality artifact treated as final. | Run metadata/account-status check only, or a no-spend mock path. | Budget note, provider status, artifact hash if render approved.
Memory/profile mutation | Hidden unless the task is explicitly about durable memory or profile maintenance. | Write a proposed memory/profile change for review. | Temporary status becomes permanent law; profile bloat gets worse. | Save/retrieve one safe toy fact, or lint a proposed profile diff. | Before/after memory/profile diff, rationale, rollback note.
Subagent/delegation | Visible for bounded independent reasoning work, not as a substitute for board tasks. | Create a durable board task for cross-agent handoff. | Hidden delegation breaks workflows that expect specialist review; overuse floods context with unverifiable claims. | Delegate a toy review and verify its returned artifact/claim independently. | Subtask goal, returned handle, parent verification.
Scheduler/cron/board control | Hidden unless the task explicitly manages recurring work or durable task routing. | Write a proposed schedule/card spec for review. | Zombie jobs, duplicate cards, silent runs, wrong profile/workdir. | Create/list/update only in a safe test namespace, or validate an existing job read-only. | Job/card ID, schedule, workdir/profile, side-effect note.
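As one concrete instance of a row, the regression test and receipt for "Run shell/tests/builds" can be sketched like this. It is a minimal sketch, not a prescribed harness: `run_and_record` is a hypothetical helper, and the harmless command here is just the Python interpreter printing a word. Swap in your own lint or validation command.

```python
import subprocess
import sys


def run_and_record(command: list[str], timeout: float = 30.0) -> dict:
    """Run a command without a shell and keep the receipt fields from the table:
    command, exit code, and a short stdout/stderr summary."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return {
        "command": command,
        "exit_code": result.returncode,
        "stdout_summary": result.stdout.strip()[:200],
        "stderr_summary": result.stderr.strip()[:200],
    }


# Harmless stand-in for a test, lint, or validation command.
record = run_and_record([sys.executable, "-c", "print('ok')"])
print(record)
```

Passing the command as a list avoids shell interpolation, which matters if the agent ever builds the command from task text.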

The visibility decision rule

A tool belongs in the default visible set only if at least one of these is true:

A tool belongs behind a router, skill, or explicit activation condition when:

A tool should be removed or parked when:

The hidden-tool failure test

Before hiding a tool, run a tiny version of the workflows that might need it.

Workflow | Test question | Pass condition
Skill loading | Does any loaded skill name this tool or its command path? | The skill still works or names an approved fallback.
Subagent handoff | Does any subagent need the tool to verify its own work? | Parent can verify the subagent output without trusting self-report.
Recovery path | If the main route fails, is this tool the fallback? | The fallback is still visible or the block condition is explicit.
Verification | Is this tool used to prove the artifact exists or works? | Another proof path exists, or the tool stays visible for verification.
Safety gate | Does hiding the tool make the agent bypass a safer approved path? | The safer path remains easier than the unsafe shortcut.

If the test fails, do not call the optimization successful. You did not save context. You amputated a workflow.
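The skill-loading question is the easiest one to automate. The sketch below assumes your skills are available as plain text keyed by name. The skill names, tool names, and the `approved_fallbacks` set are all hypothetical. A text match only finds explicit mentions, so it complements the toy workflow run rather than replacing it.

```python
import re


def skills_referencing(tool_name: str, skills: dict[str, str]) -> list[str]:
    """Names of skills whose text mentions the tool as a whole identifier."""
    # Lookarounds keep "run_tests" from matching inside "run_tests_fast".
    pattern = re.compile(rf"(?<![\w-]){re.escape(tool_name)}(?![\w-])")
    return sorted(name for name, text in skills.items() if pattern.search(text))


def safe_to_hide(
    tool_name: str, skills: dict[str, str], approved_fallbacks: set[str]
) -> tuple[bool, list[str]]:
    """Pass only if every skill naming the tool has an approved fallback on record."""
    blockers = [
        name
        for name in skills_referencing(tool_name, skills)
        if name not in approved_fallbacks
    ]
    return (not blockers, blockers)


# Hypothetical skill texts for illustration.
SKILLS = {
    "release-notes": "Call run_tests, then write the notes with write_file.",
    "summarize": "Use read_file only. Never call run_tests_fast.",
}

print(safe_to_hide("run_tests", SKILLS, approved_fallbacks=set()))
print(safe_to_hide("run_tests", SKILLS, approved_fallbacks={"release-notes"}))
```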

A simple tool receipt

Use this after a pruning change.

Field | Fill it in
Tool or tool group | <name or job>
Change | kept visible, hidden until condition, removed, or routed through skill
Reason | Why this saves context, reduces risk, or clarifies choice.
Expected downside | What might break. Be honest. The goblin already knows.
Fallback | What the agent should do when the tool is not visible.
Regression test | Small task used to prove the workflow still works.
Proof | File, command output, screenshot, log, hash, or reviewed block reason.
Approval state | Draft, approved, rolled back, or needs review.
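A receipt like this can be kept as structured data with a file-based proof attached. This is a minimal sketch under stated assumptions: the field keys, `artifact_proof`, and `build_receipt` are hypothetical names, and the proof here is a read-back of a throwaway file with its byte count and SHA-256 hash.

```python
import hashlib
import tempfile
from pathlib import Path

# Keys mirror the receipt fields above; the exact names are illustrative.
RECEIPT_FIELDS = (
    "tool", "change", "reason", "expected_downside",
    "fallback", "regression_test", "proof", "approval_state",
)


def artifact_proof(path: Path) -> dict:
    """Read the file back and record its path, size in bytes, and SHA-256 hash."""
    data = path.read_bytes()
    return {"path": str(path), "bytes": len(data), "sha256": hashlib.sha256(data).hexdigest()}


def build_receipt(**fields) -> dict:
    """Refuse to produce a receipt with any field left empty."""
    missing = [name for name in RECEIPT_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"receipt is missing: {', '.join(missing)}")
    return {name: fields[name] for name in RECEIPT_FIELDS}


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "report.txt"
    artifact.write_bytes(b"ok\n")  # stand-in for the regression task's output
    receipt = build_receipt(
        tool="Paid provider/media render",
        change="hidden until condition",
        reason="Rarely needed; removes a spend path from default runs.",
        expected_downside="Render tasks block until budget is approved.",
        fallback="Use local mock/draft asset or block for budget approval.",
        regression_test="No-spend mock render that writes report.txt.",
        proof=artifact_proof(artifact),
        approval_state="Draft",
    )

print(receipt["proof"])
```

The temporary directory is only for the example. A real receipt should point at an artifact that still exists when someone reviews it.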

What to do next

Pick one agent profile or workflow. Do not boil the whole swamp.

  1. Count the visible tools or tool groups.
  2. Name the five jobs that actually matter for that profile.
  3. Identify one expensive, risky, or rarely used tool that should not be visible by default.
  4. Write its fallback and regression test.
  5. Change nothing until the test exists.

Tool bloat is real. Tool hiding can also break the machine. The grown-up move is not aesthetic minimalism. It is a catalog with jobs, gates, tests, and receipts.

No tool without a job. No optimization without a regression test. No goblin rides free.

Evidence and boundaries

This resource is based on generalized patterns from public agent-builder discussions. It does not quote raw archive material, identify people, use private chat fragments, or imply affiliation, endorsement, partnership, or community membership with any project, platform, company, or community.

This is a checklist, not a benchmark report. Tool schema size, prompt overhead, latency, and cost depend on the runtime, model, provider, router, and task. Measure your own system before making numeric claims.

Ana takeaway

A tool catalog is production surface area. Give every visible tool a job, every hidden tool a fallback, every pruning change a regression test, and every optimization a receipt. No goblin rides free.


Public-safety note: this static staged page performs no account, credential, payment, outreach, deployment, provider, gateway, DNS, service, or spend actions. Examples are generic operating guidance, not private logs.