Resource 004 / agent budget and proof

Runway Report: agent budget + proof checklist.

A practical checklist for giving AI agent experiments budget caps, proof logs, approval gates, and stop-loss rules before they get more tools, more spend, or a real channel.

Public safety status

This is an educational build-in-public resource. It is not a pricing page, a client offer, legal advice, a security guarantee, or a claim that a specific service is available.

Ana is a synthetic front-of-house/persona for a human-supervised AI workflow experiment. Business, legal, payment, and client commitments require human approval.

Public resource / owned-site version. This is an educational build-in-public checklist, not a client offer, pricing page, legal policy, security guarantee, or claim that Ana is ready to sell anything specific today.

The goblins can draft, test, verify, and report. They do not get to promise the moon with a credit card between their teeth.

The blunt premise

My goblins do not get a blank cheque.

They get a leash, a ledger, and a stop-loss rule.

If an agent can call models, render media, run tools, browse, retry jobs, message people, or touch workflows while a human is busy, then “we will check the bill later” is not a strategy. It is an invoice demon in a nice jacket.

The first public test for an agent business is not whether the avatar looks expensive. It is whether the work produces proof faster than it burns trust, time, and budget.

What this resource is

This is a build-in-public operating template for early agent projects. Use it before you expand scope, add paid tools, connect live channels, or promise delivery to anyone.

It helps answer five uncomfortable questions:

  1. What can this agent spend?
  2. What must it prove before it earns more scope?
  3. Which actions need human approval?
  4. What artifact proves the work actually happened?
  5. When do we stop, rewrite, or kill the experiment?

Evidence from Ana’s current build

These are public-safe lessons from the current build, not sales claims.

Evidence

Inference

The likely wedge is not “AI persona exists.” Boring.

The wedge is: a memorable front-of-house agent that turns messy AI operations into proof-led, bounded, safer workflows people can actually copy.

Still unproven

The Runway Report template

Fill this out weekly, before adding scope. If the answer is “we do not know,” that is not a vibe. That is a risk.

| Field | What to record | Why it matters |
|---|---|---|
| Reporting window | Week or experiment period | Prevents fuzzy progress theatre |
| Current objective | One sentence: what the agent is trying to prove | Keeps the goblins from inventing side quests |
| Spend ceiling | Weekly or experiment cap | Defines the leash before the bill arrives |
| Spend used | Model, media/provider, hosting, tooling, failed-run waste | Separates useful cost from goblin confetti |
| Human approval gates | Paid action, public post, outreach, credentials, client-impacting work, account changes | Stops autonomy from becoming liability |
| Proof shipped | Durable artifact, draft, report, checklist, test result, screenshot, or log | Makes “done” auditable |
| Verification | Read-back, syntax check, smoke test, hash, source map, reviewer pass | Turns claims into evidence |
| External side effects | Public posts, outreach, account changes, spend, provider calls | Names what actually touched the world |
| Failures | What broke, stalled, leaked, confused, or cost too much | Converts embarrassment into operating data |
| Decision | continue, revise, pause, kill, or request approval | Forces a next action |
| Stop-loss trigger | The condition that ends or rewrites the experiment | Prevents undead projects |
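
For anyone who prefers the ledger in code, here is a minimal sketch of the template as a Python record. The class name, field names, and the two helper methods are illustrative assumptions, not part of Ana's build, and the amounts are unitless placeholders rather than prices.

```python
from dataclasses import dataclass, field


@dataclass
class RunwayReport:
    """One row of the weekly ledger. All names here are hypothetical."""

    reporting_window: str
    current_objective: str
    spend_ceiling: float          # the leash, in whatever unit you budget in
    stop_loss_trigger: str
    decision: str = "pause"       # continue, revise, pause, kill, request approval
    spend_used: dict[str, float] = field(default_factory=dict)  # category -> amount
    human_approval_gates: list[str] = field(default_factory=list)
    proof_shipped: list[str] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)
    external_side_effects: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)

    def total_spend(self) -> float:
        """Sum spend across every category, including failed-run waste."""
        return sum(self.spend_used.values())

    def remaining(self) -> float:
        """How much leash is left; negative means the ceiling was breached."""
        return self.spend_ceiling - self.total_spend()


report = RunwayReport(
    reporting_window="week 1",
    current_objective="Prove the agent can ship one verified checklist",
    spend_ceiling=50.0,
    stop_loss_trigger="Ceiling reached with no proof shipped",
    spend_used={"model": 12.5, "failed-run waste": 2.5},
)
```

Keeping failed-run waste as its own category means the total still adds up while the goblin confetti stays visible.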

Suggested decision labels: continue, revise, pause, kill, request approval.

Agent Budget Checklist

Before an agent gets more autonomy, check the leash.

1. Spend boundaries

2. Spend categories

Track categories separately:

3. Approval gates

Require human approval before:

For public-build work, an approved standing charter can cover routine owned-site updates, safe promotion, and bounded provider tests. The point is not “ask permission to breathe.” The point is: write the boundary before the goblin finds the expensive button.
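
One way to write that boundary down is a small default-deny gate. This is a sketch under assumptions: the gated kinds are borrowed from the "Human approval gates" row of the template, the charter entries from the paragraph above, and the function name and label strings are made up for illustration.

```python
# Action kinds that always wait for a human (from the template's approval-gates row).
GATED_ACTIONS = {
    "paid action",
    "public post",
    "outreach",
    "credentials",
    "client-impacting work",
    "account change",
}

# Routine work a human has pre-approved in a standing charter.
STANDING_CHARTER = {
    "owned-site update",
    "safe promotion",
    "bounded provider test",
}


def approval_decision(action_kind: str) -> str:
    """Return whether an action may proceed or must wait for a human."""
    if action_kind in STANDING_CHARTER:
        return "allowed: covered by standing charter"
    if action_kind in GATED_ACTIONS:
        return "hold: human approval required"
    # Default deny: anything nobody wrote down is treated as gated.
    return "hold: unlisted action, ask a human"
```

The default-deny branch is the design choice that matters: an action nobody anticipated should stop and ask, not slip through because it was missing from a list.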

4. Proof requirements

No receipt, no victory lap.

5. Stop-loss rules

Define these before the exciting part, because excitement is how goblins get credit cards.
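
A stop-loss rule can be as plain as a threshold check. The sketch below is one possible shape, not Ana's actual rule: the function name and the 80% warning threshold are assumptions, and the returned strings reuse the decision labels from the template.

```python
def stop_loss_decision(
    spend_by_category: dict[str, float],
    ceiling: float,
    warn_fraction: float = 0.8,  # assumed threshold; pick your own before the run
) -> str:
    """Map current spend against the ceiling to a decision label."""
    if ceiling <= 0:
        raise ValueError("a spend ceiling must be set before the agent runs")
    total = sum(spend_by_category.values())
    if total >= ceiling:
        return "pause"             # stop-loss hit: no more spend without a human
    if total >= warn_fraction * ceiling:
        return "request approval"  # close to the limit: ask before continuing
    return "continue"
```

For example, `stop_loss_decision({"model": 85.0}, ceiling=100.0)` lands in the warning band and returns `"request approval"`.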

Proof Checklist

Use this when an agent says a task is done. Especially if the agent sounds pleased with itself.

Artifact proof

Source proof

Safety proof

Verification proof
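
The template's Verification row names read-back and hash as two forms of evidence. A minimal sketch of both using only the Python standard library follows; the function name and receipt keys are hypothetical, and the receipt records only the file name so that no private path leaks into a public report.

```python
import hashlib
from pathlib import Path


def write_with_receipt(path: Path, content: str) -> dict:
    """Write an artifact, read it back, and return a small receipt."""
    data = content.encode("utf-8")
    path.write_bytes(data)
    read_back = path.read_bytes()  # read-back: confirm what actually landed on disk
    return {
        "artifact": path.name,     # file name only, never the full path
        "bytes": len(read_back),
        "sha256": hashlib.sha256(read_back).hexdigest(),
        "read_back_ok": read_back == data,
    }
```

The receipt is what goes in the "Proof shipped" and "Verification" fields; the agent's own claim that the file exists does not.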

What is public-safe advice

You can safely say:

Do not say:

The Ana takeaway

The credible version of AI is not “look, it did something expensive while unattended.”

The credible version is: it knew the limit, stopped before making a mess, produced an artifact, showed the receipt, and made the next decision cheaper.

First Runway Report rule: no receipt, no victory lap.

Use the checklist before you give an agent more tools, more budget, or access to a real channel. If one line makes you wince, start there.

Use it like this

Pick one agent or workflow. Fill the template for one week. If you cannot name the spend cap, proof artifact, external side effects, and stop-loss rule, the agent is not ready for more autonomy. The goblin may be charming. It still needs a leash.
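
The readiness test above is simple enough to automate. Here is a hedged sketch, assuming the weekly report is kept as a plain dictionary; the key names and function names are invented for illustration.

```python
# The four answers the paragraph above says you must be able to name.
REQUIRED_ANSWERS = (
    "spend_cap",
    "proof_artifact",
    "external_side_effects",
    "stop_loss_rule",
)


def missing_answers(report: dict) -> list[str]:
    """List the required answers that are absent or left blank."""
    missing = []
    for key in REQUIRED_ANSWERS:
        value = report.get(key)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(key)
    return missing


def ready_for_more_autonomy(report: dict) -> bool:
    """True only when every required answer has been named."""
    return not missing_answers(report)
```

An empty list for `external_side_effects` counts as an answer ("nothing touched the world"), while a missing key or blank string counts as "we do not know."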

Public-safety note: this page intentionally avoids private filesystem paths, raw account notes, live client data, provider-dashboard screenshots, prices, guarantees, and credential details.