Resource 004 / agent proof

Agent Run Receipt Checklist.

A practical receipt for agent work: what happened, what changed, what proof exists, what it cost, and what still needs a human gate.

Public safety status

This staged page uses public-facing checklist and sample copy only. Internal source maps and verification JSON are not routed into the public HTML.

This is an operating resource, not legal, security, compliance, platform-policy, pricing, checkout, service-availability, or client-work advice. Public deploy, outreach, lead capture, pricing, account, credential, payment, provider, gateway, DNS, service, or spend actions require separate approval.

Use note: operating hygiene checklist; not legal, security, compliance, pricing, or service-availability advice.

Audience: solo operators, founders, and small teams letting coding agents, browser agents, workflow agents, or content agents touch real work.

Promise: in under 15 minutes, record what the agent did, why it did it, what it touched, what it cost, what proof exists, and what is still not verified.

Ana version: autonomy without receipts is just a faster way to make a mess. Make the goblin show its work before it touches the business.

The blunt premise

An agent run is not done when the agent says “done.”

Cute. No.

It is done when a human can answer six questions without digging through a haunted scroll of logs:

What was the agent supposed to do?
What did it actually do?
What did it read, write, publish, spend, or change?
What proof can be inspected?
What remains unverified?
What is the next safe decision?

That is an Agent Run Receipt.

Not a SOC audit. Not compliance cosplay. Not a dashboard pretending it is a lawyer. A small operating artifact that keeps agent work useful, bounded, and explainable.

Why this exists now

Recent agent-market signals point in the same direction: people are no longer impressed by “the model generated something.” They want proof that generated work can be tested, bounded, replayed, traced, approved, and stopped.

Current signals behind this resource include:

public discussion around testing AI-generated code and anti-slop review;
agent testing products and harnesses getting visible attention;
local-first record/replay, session tracing, audit trail, governance, and tool-approval projects appearing around coding and workflow agents;
platform-level usage analytics and spend-control messaging moving into buyer-facing language;
browser-agent tooling growing while account, session, posting, credential, and permission risks remain very real.

The practical conclusion: small teams need minimum viable receipts, not enterprise theatre.

Minimum receipt vs full audit

Use the minimum receipt for ordinary agent work: drafts, research, code changes, QA runs, small automations, content packages, internal reports, and local experiments.

Use a full audit only when the run touches real risk: credentials, live accounts, public distribution, client data, payment, pricing, legal/service claims, paid provider actions, destructive writes, or material business commitments.

If every tiny run gets a courtroom binder, nobody will use it. If risky runs get only “vibes looked fine,” the invoice demon wins.

Minimum receipt

A minimum receipt should fit on one page.

Field	What to record	Why it matters
Receipt ID	Human-readable name, date, or version	Makes the run findable
Goal	One concrete outcome	Prevents fuzzy victory laps
Agent / workflow used	Tool, agent type, or run label; keep public samples generic	Shows what kind of system acted
Sources used	Evidence, inference, and assumptions	Separates proof from guesses
Action taken	What was generated, changed, checked, or deliberately skipped	Stops “done” from meaning everything and nothing
Surfaces touched	Files, repo, browser, account, channel, API, tool, or `none`	Names the blast radius
External side effects	`none`, or exact effects: deploy, post, email, account change, provider call, spend	Makes world-touching actions visible
Spend	Amount by category; use `$0.00` when true	Budget silence breeds goblins
Approval status	Not required, covered by a named rule, requested, blocked, or owner-approved for a specific scope	Keeps autonomy inside authority
Artifact / output	Durable path label, public URL, ticket, report, package, screenshot, or hash	Gives the receipt something inspectable
Verification performed	Read-back, tests, schema check, smoke test, render check, safety scan, review, or “not performed”	Turns claims into evidence
Claims boundary	What this does not prove	Stops delivery proof becoming demand/revenue/security proof
Unverified / open questions	Anything not checked yet	Keeps uncertainty from dressing up as confidence
Next gate	Continue, revise, review, publish approval, risk review, pause, or kill	Converts the receipt into a decision
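The table above maps onto a small record type. What follows is a minimal sketch in Python 3.9+, not a prescribed schema: the class name `AgentRunReceipt`, the field names, and the `unfilled` helper are hypothetical labels invented for illustration. Fields that nobody has filled in default to `not verified` (or `not performed` for verification) so that silence never reads as a pass.

```python
from dataclasses import dataclass, field, fields

NOT_VERIFIED = "not verified"
PLACEHOLDERS = {NOT_VERIFIED, "not performed", ""}


@dataclass
class AgentRunReceipt:
    # Hypothetical field names mirroring the rows of the minimum receipt table.
    receipt_id: str
    goal: str
    agent_used: str = NOT_VERIFIED
    evidence: list[str] = field(default_factory=list)
    inference: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    action_taken: str = NOT_VERIFIED
    surfaces_touched: str = NOT_VERIFIED
    external_side_effects: str = NOT_VERIFIED
    spend: str = NOT_VERIFIED
    approval_status: str = NOT_VERIFIED
    artifact: str = NOT_VERIFIED
    verification_performed: str = "not performed"
    claims_boundary: str = NOT_VERIFIED
    open_questions: list[str] = field(default_factory=list)
    next_gate: str = NOT_VERIFIED

    def unfilled(self) -> list[str]:
        """Return names of text fields still sitting at a placeholder value."""
        names = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and value.strip() in PLACEHOLDERS:
                names.append(f.name)
        return names
```

Calling `unfilled()` on a half-written receipt gives the reviewer a list of rows to chase before anyone says "done."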

The 15-minute fill-in version

Copy this after any meaningful agent run.

Agent Run Receipt

Receipt ID:
Goal:
Agent/workflow used:

Sources used:
- Evidence:
- Inference:
- Assumptions still untested:

Action taken:
Surfaces touched:
External side effects:
Spend:
Approval status:

Artifact/output:
Verification performed:
Claims boundary / what this does not prove:
Unverified or open questions:
Next gate:

Use `none`, `not applicable`, or `not verified` instead of making the receipt prettier than reality. Reality is the point.
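If receipts are produced by a script rather than by hand, the same rule can be enforced mechanically. The sketch below is one possible approach, assuming a hypothetical `render_receipt` function that takes a plain dictionary keyed by the template labels; any label that is missing or blank is printed as `not verified` rather than left empty.

```python
# Labels copied from the fill-in template; grouping into three lists is
# an illustrative choice so the "Sources used" sub-items render as bullets.
BEFORE_SOURCES = ["Receipt ID", "Goal", "Agent/workflow used"]
SOURCE_KINDS = ["Evidence", "Inference", "Assumptions still untested"]
AFTER_SOURCES = [
    "Action taken",
    "Surfaces touched",
    "External side effects",
    "Spend",
    "Approval status",
    "Artifact/output",
    "Verification performed",
    "Claims boundary / what this does not prove",
    "Unverified or open questions",
    "Next gate",
]


def render_receipt(values: dict[str, str]) -> str:
    """Render the template, writing 'not verified' for anything not supplied."""

    def get(label: str) -> str:
        text = values.get(label, "").strip()
        return text or "not verified"

    lines = ["Agent Run Receipt", ""]
    lines += [f"{label}: {get(label)}" for label in BEFORE_SOURCES]
    lines += ["", "Sources used:"]
    lines += [f"- {kind}: {get(kind)}" for kind in SOURCE_KINDS]
    lines.append("")
    lines += [f"{label}: {get(label)}" for label in AFTER_SOURCES]
    return "\n".join(lines)
```

For example, `render_receipt({"Receipt ID": "draft-run", "Spend": "$0.00"})` prints every other row as `not verified`, which is ugly on purpose.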

Quick checklist before you accept “done”

1. Goal check

[ ] The run had one concrete goal.
[ ] The receipt says whether the goal was met, partially met, or not met.
[ ] Any side quests are listed instead of quietly absorbed into the victory lap.

2. Source check

[ ] Sources are named by public-safe labels or approved internal references.
[ ] Evidence is separated from inference.
[ ] Assumptions are listed instead of laundered into facts.
[ ] Private logs, customer data, credentials, and raw tokens are not copied into the receipt.

3. Action check

[ ] The receipt states what the agent actually did.
[ ] It states what the agent deliberately did not do.
[ ] Any file writes, repo changes, API calls, browser actions, messages, uploads, provider calls, or account changes are named.
[ ] Failed attempts are included if they changed cost, time, risk, or confidence.

4. Cost and approval check

[ ] Spend is recorded even when it is $0.00.
[ ] Spend categories are separated where relevant: model/API, media/render, hosting, tools, human review, failed-run waste.
[ ] Paid actions are inside an approved cap or marked as blocked.
[ ] Public posting, outreach, pricing, account, credential, payment, DNS, provider, gateway, or client-impacting actions have explicit approval before they happen.
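The spend items in this check are simple arithmetic, so they are easy to automate. Here is a minimal sketch, assuming a hypothetical `spend_summary` helper that receives amounts as strings per category plus an approved cap. The category names and the `blocked` wording follow the checklist; the function shape is illustrative only.

```python
from decimal import Decimal


def spend_summary(items: dict[str, str], cap: str) -> dict:
    """Total spend by category and compare it with an approved cap.

    Amounts are passed as strings and parsed with Decimal so that
    currency values are not distorted by binary floating point.
    """
    amounts = {category: Decimal(value) for category, value in items.items()}
    total = sum(amounts.values(), Decimal("0"))
    within_cap = total <= Decimal(cap)
    return {
        "total": f"${total:.2f}",  # an empty run still reports "$0.00"
        "within_cap": within_cap,
        "status": "inside approved cap" if within_cap else "blocked",
    }
```

A run with no paid actions still produces a total of `$0.00`, which matches the first item in this check: record spend even when it is zero.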

5. Proof check

[ ] Durable output exists outside scratch space where practical.
[ ] The final artifact was read back or otherwise inspected.
[ ] JSON/config/code passed syntax or schema checks where applicable.
[ ] Tests, smoke checks, render checks, route checks, or reviewer checks are listed where applicable.
[ ] File size and checksum are recorded for internal traceability when useful, but public samples explain them plainly.
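Read-back, file size, checksum, and a JSON syntax check can all be captured in one pass over the artifact. The following is a sketch under stated assumptions: `artifact_proof` is a hypothetical helper name, SHA-256 is one reasonable choice of fingerprint, and the JSON step only confirms that the file parses, not that it matches any schema.

```python
import hashlib
import json
from pathlib import Path


def artifact_proof(path: str) -> dict:
    """Read a file back and record size, SHA-256, and a JSON syntax result."""
    data = Path(path).read_bytes()  # the read-back itself
    proof = {
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    if path.endswith(".json"):
        try:
            json.loads(data.decode("utf-8"))
            proof["json_syntax"] = "passed"
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            proof["json_syntax"] = f"failed: {exc}"
    else:
        proof["json_syntax"] = "not applicable"
    return proof
```

In a public sample, report the result in plain words ("file fingerprint recorded, JSON parsed cleanly") and keep the private path out of the copy.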

6. Public-safety check

[ ] No private local paths in public-facing copy.
[ ] No secrets, tokens, API keys, passwords, session details, or recovery details.
[ ] No real customer/client/private data.
[ ] No platform, provider, or community endorsement implied without permission.
[ ] No pricing, service availability, support, legal, security, uptime, ROI, revenue, or demand claims unless separately evidenced and approved.
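The first two items in this check lend themselves to a rough automated pass before human review. The sketch below uses two illustrative regular expressions, both of which are assumptions rather than a vetted rule set: one for common private path prefixes, one for secret-looking assignments. It is a heuristic tripwire, not a secrets scanner, and it will miss things and raise false alarms.

```python
import re

# Hypothetical, deliberately simple patterns; tune them for your own copy.
PATTERNS = {
    "private local path": re.compile(r"(/Users/|/home/|[A-Za-z]:\\Users\\)\S+"),
    "secret-looking assignment": re.compile(
        r"(?i)(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"
    ),
}


def public_safety_findings(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) pairs for lines that match a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

An empty result means only that these two patterns found nothing. The remaining items in this check (customer data, implied endorsements, unapproved claims) still need a human reader.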

7. Decision check

[ ] The receipt names what is still unverified.
[ ] The receipt says what the evidence does not prove.
[ ] The next gate is specific: continue, revise, request review, publish approval, risk review, pause, or kill.
[ ] Stop-loss conditions are visible before the agent gets more autonomy.
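The decision check can be expressed as a short gap list. This sketch assumes a hypothetical `decision_gaps` function and hypothetical dictionary keys (`unverified`, `claims_boundary`, `next_gate`, `stop_loss`); the allowed gate values are taken from the checklist item above.

```python
NEXT_GATES = (
    "continue",
    "revise",
    "request review",
    "publish approval",
    "risk review",
    "pause",
    "kill",
)


def decision_gaps(receipt: dict[str, str]) -> list[str]:
    """List what is missing before the receipt can drive a decision."""
    gaps = []
    if not receipt.get("unverified", "").strip():
        gaps.append("unverified items are not named")
    if not receipt.get("claims_boundary", "").strip():
        gaps.append("claims boundary is missing")
    if receipt.get("next_gate", "").strip().lower() not in NEXT_GATES:
        gaps.append("next gate is not one of the specific options")
    if not receipt.get("stop_loss", "").strip():
        gaps.append("stop-loss conditions are not visible")
    return gaps
```

An empty list does not approve anything. It only says the receipt is complete enough for a human to decide.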

When the minimum receipt is not enough

Escalate to a full audit if the run involves any of these:

live customer, client, student, patient, employee, or private user data;
credentials, tokens, sessions, MFA, CAPTCHA, account creation, permission changes, or recovery flows;
public publishing, outreach, DMs, emails, comments, community posting, or social actions;
checkout, lead capture, pricing, refunds, service terms, delivery commitments, or sales claims;
paid provider calls, subscriptions, media renders, exports, cloud resources, or unbounded retries;
destructive writes, deletes, migrations, production deploys, DNS, payment, gateway, or infrastructure changes;
security, legal, compliance, uptime, ROI, revenue, or “safe/secure/done-for-you” claims.

Full audit does not mean panic. It means the minimum receipt gets backup: source inventory, action ledger, approval record, artifact inventory, spend table, verification evidence, risk notes, and reviewer verdict.
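The escalation rule above is a set-membership test: if the run touches any trigger category, it gets the full audit. Here is a minimal sketch, assuming hypothetical flag names that abbreviate the seven bullets and a hypothetical `audit_level` function. Unknown flags raise an error instead of being silently ignored, because a typo should not downgrade a risky run.

```python
# Hypothetical short labels, one per escalation bullet in the list above.
FULL_AUDIT_TRIGGERS = {
    "private_data",
    "credentials_or_sessions",
    "public_publishing_or_outreach",
    "commercial_terms_or_sales_claims",
    "paid_or_unbounded_spend",
    "destructive_or_infrastructure_change",
    "assurance_claims",
}


def audit_level(run_flags: set[str]) -> tuple[str, list[str]]:
    """Return the required receipt level and the triggers that caused it."""
    unknown = run_flags - FULL_AUDIT_TRIGGERS
    if unknown:
        raise ValueError(f"unknown risk flags: {sorted(unknown)}")
    hits = sorted(run_flags)
    level = "full audit" if hits else "minimum receipt"
    return level, hits
```

Returning the matched triggers alongside the level keeps the reason for escalation on the receipt, not in someone's memory.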

What good receipts sound like

Good:

> The agent drafted a resource from approved market-scan and proof-vault inputs. It wrote local Markdown/JSON files only, spent $0.00, touched no accounts or public channels, validated JSON, read files back, and still needs risk/publication review before any public use.

Bad:

> The agent created an amazing launch asset and we are ready to monetize.

That second one is not a receipt. It is a tiny fraud wearing perfume.

Tiny glossary

Agent run: one bounded attempt by an AI agent or workflow to complete a task.
Surface touched: any file, repo, browser, account, API, channel, tool, or system the run read or changed.
External side effect: anything that changes the world outside local drafting, such as publishing, posting, sending, buying, uploading, rendering through a paid provider, changing an account, or deploying.
Artifact: the thing a human can inspect: page, file, report, diff, screenshot, package, test result, route check, or review note.
Hash/checksum: a file fingerprint. Useful internally when you need to prove a file did not silently change.
Claims boundary: the sentence that says what the proof does not prove.

Public-safe use note

This checklist is operating hygiene, not legal, security, or compliance advice. It does not certify an agent system, promise safety, prove revenue, or make a public service available.

Use it to make agent work easier to trust. Then still apply the boring gates when the work touches real money, real accounts, real customers, real public channels, or real consequences.

Suggested CTA for later public use

Use this after your next meaningful agent run. If the receipt has more “not verified” than proof, do not give the agent more access yet. Start with the ugliest line. That is usually where the money, mess, or trust leak is hiding.

Sample Agent Run Receipt

Use note: public-safe sample with fictionalized/sanitized labels. No private paths, customer data, raw logs, secrets, pricing, or service commitments.

Sample: local resource draft run

Field	Receipt
Receipt ID	`agent-run-receipt-resource-draft-2026-06-25`
Goal	Draft a practical checklist that helps a small operator record what an agent did, touched, cost, proved, and left unverified.
Agent / workflow used	Content-strategy agent working from approved local research and proof-template inputs.
Sources used	Evidence: market-scan brief about agent testing, observability, audit trails, spend controls, and browser-agent risk; proof-vault minimum receipt/full audit templates; prior verified resource package. Inference: small teams need a short receipt before they need a heavyweight audit. Assumptions still untested: whether public readers want an editable template or only the checklist.
Action taken	Drafted a public-safe checklist, a sample receipt, a source map, and verification notes. Kept the resource local. Did not publish, post, price, sell, gate, email, deploy, create accounts, touch credentials, or call paid providers.
Surfaces touched	Local draft files only. No public site, no social account, no inbox, no browser login, no payment surface, no provider account.
External side effects	None. Local file writes only.
Spend	`$0.00`; no paid provider/render/export/subscription action in this sample.
Approval status	Local drafting allowed. Public publication, channel posting, lead capture, pricing, checkout, client workflow, account changes, and paid provider actions remain separate approval gates.
Artifact / output	Public-safe resource draft package with Markdown and JSON files. In public examples, use a public URL or sanitized artifact label rather than private machine paths.
Verification performed	Files were read back; JSON validation planned/performed for structured files; public-safety scan checks for private paths, secret-looking assignments, raw keys, pricing/service/legal claims, invented metrics, and lead-capture/checkout language.
Claims boundary	This receipt proves only that a local draft package exists and was checked for basic safety. It does not prove traffic, saves, replies, buyer demand, leads, revenue, ROI, service availability, legal/compliance status, or security.
Unverified / open questions	Exact public title, final CTA, publication route, risk-review verdict, and whether readers prefer a downloadable template remain unverified.
Next gate	Risk/publication review before any public use. If published later, measure practical signal only: saves, replies, template requests, implementation questions, or requests for help with proof/spend/approval gates.

What would trigger a full audit here?

The minimum receipt is enough while this stays a local draft.

Escalate to full audit if the next step adds any of the following:

public publication or community posting;
lead capture, checkout, pricing, delivery terms, or sales claims;
real client/customer/private data;
account/session/credential/provider/gateway changes;
paid media renders, paid model calls outside an approved cap, or other spend;
legal, compliance, security, uptime, ROI, or revenue claims.

Sample decision

Decision: revise/review before publication.

Reason: the artifact is useful and bounded, but publication needs final public copy, CTA choice, and risk review. No victory lap until the goblin paperwork survives daylight.

Ana takeaway

Use the checklist to make agent work more inspectable before expanding access. No proof, no bigger leash; no approval, no public or money-touching click.


Public-safety note: this static staged page does not perform account, credential, payment, outreach, deployment, provider, or gateway actions.