Use note: operating hygiene checklist; not legal, security, compliance, pricing, or service-availability advice.
Audience: solo operators, founders, and small teams letting coding agents, browser agents, workflow agents, or content agents touch real work.
Promise: in under 15 minutes, record what the agent did, why it did it, what it touched, what it cost, what proof exists, and what is still not verified.
Ana version: autonomy without receipts is just a faster way to make a mess. Make the goblin show its work before it touches the business.
The blunt premise
An agent run is not done when the agent says “done.”
Cute. No.
It is done when a human can answer six questions without digging through a haunted scroll of logs:
- What was the agent supposed to do?
- What did it actually do?
- What did it read, write, publish, spend, or change?
- What proof can be inspected?
- What remains unverified?
- What is the next safe decision?
That is an Agent Run Receipt.
Not a SOC audit. Not compliance cosplay. Not a dashboard pretending it is a lawyer. A small operating artifact that keeps agent work useful, bounded, and explainable.
Why this exists now
Recent agent-market signals point in the same direction: people are no longer impressed by “the model generated something.” They want proof that generated work can be tested, bounded, replayed, traced, approved, and stopped.
Current signals behind this resource include:
- public discussion around testing AI-generated code and anti-slop review;
- agent testing products and harnesses getting visible attention;
- local-first record/replay, session tracing, audit trail, governance, and tool-approval projects appearing around coding and workflow agents;
- platform-level usage analytics and spend-control messaging moving into buyer-facing language;
- browser-agent tooling growing while account, session, posting, credential, and permission risks remain very real.
The practical conclusion: small teams need minimum viable receipts, not enterprise theatre.
Minimum receipt vs full audit
Use the minimum receipt for ordinary agent work: drafts, research, code changes, QA runs, small automations, content packages, internal reports, and local experiments.
Use a full audit only when the run touches real risk: credentials, live accounts, public distribution, client data, payment, pricing, legal/service claims, paid provider actions, destructive writes, or material business commitments.
If every tiny run gets a courtroom binder, nobody will use it. If risky runs get only “vibes looked fine,” the invoice demon wins.
Minimum receipt
A minimum receipt should fit on one page.
| Field | What to record | Why it matters |
|---|---|---|
| Receipt ID | Human-readable name, date, or version | Makes the run findable |
| Goal | One concrete outcome | Prevents fuzzy victory laps |
| Agent / workflow used | Tool, agent type, or run label; keep public samples generic | Shows what kind of system acted |
| Sources used | Evidence, inference, and assumptions | Separates proof from guesses |
| Action taken | What was generated, changed, checked, or deliberately skipped | Stops “done” from meaning everything and nothing |
| Surfaces touched | Files, repo, browser, account, channel, API, tool, or none | Names the blast radius |
| External side effects | none, or exact effects: deploy, post, email, account change, provider call, spend | Makes world-touching actions visible |
| Spend | Amount by category; use $0.00 when true | Budget silence breeds goblins |
| Approval status | Not required, covered by a named rule, requested, blocked, or owner-approved for a specific scope | Keeps autonomy inside authority |
| Artifact / output | Durable path label, public URL, ticket, report, package, screenshot, or hash | Gives the receipt something inspectable |
| Verification performed | Read-back, tests, schema check, smoke test, render check, safety scan, review, or “not performed” | Turns claims into evidence |
| Claims boundary | What this does not prove | Stops delivery proof becoming demand/revenue/security proof |
| Unverified / open questions | Anything not checked yet | Keeps uncertainty from dressing up as confidence |
| Next gate | Continue, revise, review, publish approval, risk review, pause, or kill | Converts the receipt into a decision |
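
For teams that prefer to keep receipts as structured data, the table above could be sketched as a small Python record. This is a minimal sketch, not a standard: the class name `AgentRunReceipt`, the field names, and the choice to default unknown fields to "not verified" are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

NOT_VERIFIED = "not verified"


@dataclass
class AgentRunReceipt:
    """One attribute per row of the minimum receipt table (names are hypothetical)."""
    receipt_id: str
    goal: str
    agent_or_workflow: str
    sources_used: str = NOT_VERIFIED
    action_taken: str = NOT_VERIFIED
    surfaces_touched: str = NOT_VERIFIED
    external_side_effects: str = NOT_VERIFIED
    spend: str = NOT_VERIFIED
    approval_status: str = NOT_VERIFIED
    artifact: str = NOT_VERIFIED
    verification_performed: str = "not performed"
    claims_boundary: str = NOT_VERIFIED
    open_questions: str = NOT_VERIFIED
    next_gate: str = "review"

    def unverified_fields(self) -> list[str]:
        """List every field still carrying a 'not verified' style default."""
        return [
            name for name, value in asdict(self).items()
            if value in (NOT_VERIFIED, "not performed")
        ]
```

Defaulting to "not verified" rather than "none" is a deliberate choice in this sketch: a field nobody filled in should look unfinished, not clean.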
The 15-minute fill-in version
Copy this after any meaningful agent run.
Agent Run Receipt
Receipt ID:
Goal:
Agent/workflow used:
Sources used:
- Evidence:
- Inference:
- Assumptions still untested:
Action taken:
Surfaces touched:
External side effects:
Spend:
Approval status:
Artifact/output:
Verification performed:
Claims boundary / what this does not prove:
Unverified or open questions:
Next gate:
Use “none,” “not applicable,” or “not verified” instead of making the receipt prettier than reality. Reality is the point.
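
If receipts are kept as plain text in the fill-in format above, a small script can flag fields that were left empty. The following is a rough sketch under that assumption; the function name `blank_fields` and the simple `Label: value` parsing are illustrative choices, and a value written on the line after its label would be reported as blank.

```python
# Labels copied from the fill-in template; "Sources used" is covered by its three sub-items.
FIELDS = [
    "Receipt ID", "Goal", "Agent/workflow used",
    "Evidence", "Inference", "Assumptions still untested",
    "Action taken", "Surfaces touched", "External side effects",
    "Spend", "Approval status", "Artifact/output",
    "Verification performed",
    "Claims boundary / what this does not prove",
    "Unverified or open questions", "Next gate",
]


def blank_fields(receipt_text: str) -> list[str]:
    """Return template fields that are missing or have no value after the colon."""
    filled = {}
    for line in receipt_text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            # Drop a leading "- " so sub-items like "- Evidence" match their label.
            filled[label.strip().lstrip("- ").strip()] = value.strip()
    return [name for name in FIELDS if not filled.get(name)]
```

A blank field is a prompt to write “none,” “not applicable,” or “not verified,” not a prompt to invent something nicer.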
Quick checklist before you accept “done”
1. Goal check
- [ ] The run had one concrete goal.
- [ ] The receipt says whether the goal was met, partially met, or not met.
- [ ] Any side quests are listed instead of quietly absorbed into the victory lap.
2. Source check
- [ ] Sources are named by public-safe labels or approved internal references.
- [ ] Evidence is separated from inference.
- [ ] Assumptions are listed instead of laundered into facts.
- [ ] Private logs, customer data, credentials, and raw tokens are not copied into the receipt.
3. Action check
- [ ] The receipt states what the agent actually did.
- [ ] It states what the agent deliberately did not do.
- [ ] Any file writes, repo changes, API calls, browser actions, messages, uploads, provider calls, or account changes are named.
- [ ] Failed attempts are included if they changed cost, time, risk, or confidence.
4. Cost and approval check
- [ ] Spend is recorded even when it is $0.00.
- [ ] Spend categories are separated where relevant: model/API, media/render, hosting, tools, human review, failed-run waste.
- [ ] Paid actions are inside an approved cap or marked as blocked.
- [ ] Public posting, outreach, pricing, account, credential, payment, DNS, provider, gateway, or client-impacting actions have explicit approval before they happen.
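
The spend and cap items above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming spend is recorded per category as decimal strings in one currency; the function name `spend_status` and the wording of its return values are hypothetical.

```python
from decimal import Decimal


def spend_status(spend_by_category: dict[str, str], approved_cap: str) -> str:
    """Total the recorded spend and compare it with the approved cap."""
    # Decimal avoids float rounding surprises when adding money amounts.
    total = sum((Decimal(v) for v in spend_by_category.values()), Decimal("0.00"))
    if total > Decimal(approved_cap):
        return f"blocked: ${total} exceeds approved cap ${approved_cap}"
    return f"within cap: ${total} of ${approved_cap}"
```

An empty category table still produces a total of $0.00, which matches the rule that zero spend gets written down rather than left silent.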
5. Proof check
- [ ] Durable output exists outside scratch space where practical.
- [ ] The final artifact was read back or otherwise inspected.
- [ ] JSON/config/code passed syntax or schema checks where applicable.
- [ ] Tests, smoke checks, render checks, route checks, or reviewer checks are listed where applicable.
- [ ] File size and checksum are recorded for internal traceability when useful, but public samples explain them plainly.
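
Several of the proof items above (read-back, JSON syntax check, file size, checksum) can be collected in one pass. The sketch below uses only the Python standard library; the function name `read_back` and the shape of the returned dictionary are assumptions for illustration, and a schema check would need a separate tool.

```python
import hashlib
import json
from pathlib import Path


def read_back(path: str) -> dict:
    """Read an artifact back from disk and record basic proof about it."""
    data = Path(path).read_bytes()
    proof = {
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "json_valid": "not applicable",
    }
    if path.endswith(".json"):
        try:
            json.loads(data)  # syntax check only, not a schema check
            proof["json_valid"] = "passed"
        except ValueError:  # covers JSONDecodeError and bad encodings
            proof["json_valid"] = "failed"
    return proof
```

The result proves the file exists, parses, and has a known fingerprint. It does not prove the content is correct, which is what the claims boundary line is for.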
6. Public-safety check
- [ ] No private local paths in public-facing copy.
- [ ] No secrets, tokens, API keys, passwords, session details, or recovery details.
- [ ] No real customer/client/private data.
- [ ] No platform, provider, or community endorsement implied without permission.
- [ ] No pricing, service availability, support, legal, security, uptime, ROI, revenue, or demand claims unless separately evidenced and approved.
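
A crude text scan can catch the most obvious slips in the first two items above before a human reviews the rest. This is a sketch only: the two patterns are illustrative assumptions, they will miss plenty, and a clean result is not evidence that the copy is safe to publish.

```python
import re

# Illustrative patterns only; an assumption, not a complete secret scanner.
PATTERNS = {
    "private local path": re.compile(r"(/Users/|/home/|[A-Za-z]:\\Users\\)"),
    "possible secret assignment": re.compile(
        r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"
    ),
}


def public_safety_flags(text: str) -> list[str]:
    """Return one flag per line and pattern that matched."""
    flags = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                flags.append(f"line {lineno}: {label}")
    return flags
```

Customer data, implied endorsements, and unapproved claims still need a human read; no regex in this sketch attempts those.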
7. Decision check
- [ ] The receipt names what is still unverified.
- [ ] The receipt says what the evidence does not prove.
- [ ] The next gate is specific: continue, revise, request review, publish approval, risk review, pause, or kill.
- [ ] Stop-loss conditions are visible before the agent gets more autonomy.
When the minimum receipt is not enough
Escalate to a full audit if the run involves any of these:
- live customer, client, student, patient, employee, or private user data;
- credentials, tokens, sessions, MFA, CAPTCHA, account creation, permission changes, or recovery flows;
- public publishing, outreach, DMs, emails, comments, community posting, or social actions;
- checkout, lead capture, pricing, refunds, service terms, delivery commitments, or sales claims;
- paid provider calls, subscriptions, media renders, exports, cloud resources, or unbounded retries;
- destructive writes, deletes, migrations, production deploys, DNS, payment, gateway, or infrastructure changes;
- security, legal, compliance, uptime, ROI, revenue, or “safe/secure/done-for-you” claims.
Full audit does not mean panic. It means the minimum receipt gets backup: source inventory, action ledger, approval record, artifact inventory, spend table, verification evidence, risk notes, and reviewer verdict.
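
The escalation rule above amounts to a set-membership check. Here is a minimal sketch, assuming each run is tagged by hand with what it touched; the tag names and the function name `receipt_level` are hypothetical shorthand for the seven categories listed above.

```python
# One hypothetical tag per escalation category in the list above.
FULL_AUDIT_TRIGGERS = {
    "private_data", "credentials", "public_posting", "sales_or_pricing",
    "paid_provider", "destructive_or_production", "risk_claims",
}


def receipt_level(run_tags: set[str]) -> str:
    """Return 'minimum receipt' unless any tag matches a full-audit trigger."""
    hits = sorted(run_tags & FULL_AUDIT_TRIGGERS)
    if hits:
        return "full audit: " + ", ".join(hits)
    return "minimum receipt"
```

For example, `receipt_level({"local_draft"})` returns `"minimum receipt"`, while a run tagged with `"credentials"` escalates. The check is only as honest as the tags someone gives it.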
What good receipts sound like
Good:
> The agent drafted a resource from approved market-scan and proof-vault inputs. It wrote local Markdown/JSON files only, spent $0.00, touched no accounts or public channels, validated JSON, read files back, and still needs risk/publication review before any public use.
Bad:
> The agent created an amazing launch asset and we are ready to monetize.
That second one is not a receipt. It is a tiny fraud wearing perfume.
Tiny glossary
- Agent run: one bounded attempt by an AI agent or workflow to complete a task.
- Surface touched: any file, repo, browser, account, API, channel, tool, or system the run read or changed.
- External side effect: anything that changes the world outside local drafting, such as publishing, posting, sending, buying, uploading, rendering through a paid provider, changing an account, or deploying.
- Artifact: the thing a human can inspect: page, file, report, diff, screenshot, package, test result, route check, or review note.
- Hash/checksum: a file fingerprint. Useful internally when you need to prove a file did not silently change.
- Claims boundary: the sentence that says what the proof does not prove.
Public-safe use note
This checklist is operating hygiene, not legal, security, or compliance advice. It does not certify an agent system, promise safety, prove revenue, or make a public service available.
Use it to make agent work easier to trust. Then still apply the boring gates when the work touches real money, real accounts, real customers, real public channels, or real consequences.
Suggested CTA for later public use
Use this after your next meaningful agent run. If the receipt has more “not verified” than proof, do not give the agent more access yet. Start with the ugliest line. That is usually where the money, mess, or trust leak is hiding.