A practical two-tier proof system: a minimum receipt for ordinary agent runs, and a full audit only when the work touches real risk.

Proof Vault: How to Make Agent Work Auditable Without Bureaucracy Sludge

Audience: Solo operators, founders, and small teams letting coding agents, browser agents, workflow agents, or content agents touch real work.

Promise: A practical two-tier proof system — a minimum receipt for ordinary runs and a full audit for risky ones — so agent work stays inspectable without turning every task into a courtroom drama.

Tone note: This is operating hygiene, not compliance theatre. If your agent's proof of work is "it said done," you don't have an agent problem. You have a receipt problem.

The blunt premise

An agent run is not done when the agent says "done."

It is done when a human can answer six questions without excavating a haunted scroll of logs:

What was the agent supposed to do?
What did it actually do?
What did it read, write, publish, spend, or change?
What proof can be inspected?
What remains unverified?
What is the next safe decision?

That is a Proof Vault receipt. Not a SOC audit. Not compliance cosplay. Not a dashboard pretending it is a lawyer. A small operating artifact that keeps agent work useful, bounded, and explainable.

Why two tiers

If every tiny run gets a courtroom binder, nobody will use the system. If risky runs get only "vibes looked fine," the invoice demon wins.

Minimum receipt: One page. Covers ordinary agent work — drafts, research, code changes, QA runs, small automations, content packages, internal reports, local experiments.

Full audit: Heavier. Only when the run touches real risk — credentials, live accounts, public distribution, client data, payment, pricing, legal/service claims, paid provider actions, destructive writes, or material business commitments.

The trick is knowing which one you need before the run starts, not after something embarrassing happens.

Minimum receipt: the fields that matter

A minimum receipt should fit on one page. Here are the fields, why they exist, and what to write in them.

Field	What to record	Why it matters
Receipt ID	Human-readable name, date, or version	Makes the run findable later
Goal	One concrete outcome	Prevents fuzzy victory laps
Sources used	Evidence, inference, and assumptions — separated	Stops guesses laundering into facts
Action taken	What was generated, changed, checked, or deliberately skipped	Stops "done" from meaning everything and nothing
Artifact / output	Durable path label, public URL, report, package, screenshot, or hash	Gives the receipt something inspectable
External side effects	`none`, or exact effects: deploy, post, email, account change, provider call, spend	Makes world-touching actions visible
Approval status	Not required, covered by a named rule, requested, blocked, or owner-approved for a specific scope	Keeps autonomy inside authority
Spend	Amount by category; use `$0.00` when true	Budget silence breeds goblins
Bytes / hash	File size and SHA-256 for durable files when practical	Lets you check later whether a file silently changed
Verification performed	Read-back, tests, schema check, smoke test, render check, safety scan, review, or "not performed"	Turns claims into evidence
Claims boundary	What this does not prove	Stops delivery proof becoming demand/revenue/security proof
Next gate	Continue, revise, review, publish approval, risk review, pause, or kill	Converts the receipt into a decision
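The Bytes / hash field can be filled in mechanically rather than by hand. The following is a minimal sketch using Python's standard `hashlib`; the function name `file_fingerprint` and the output layout are illustrative assumptions, not part of the receipt format itself.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: str) -> str:
    """Return 'name N bytes sha256:<hex>' for a durable file."""
    p = Path(path)
    digest = hashlib.sha256()
    size = 0
    # Read in chunks so large artifacts do not need to fit in memory.
    with p.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"{p.name} {size:,} bytes sha256:{digest.hexdigest()}"
```

Recording the value is only half the job: the fingerprint earns its keep when someone recomputes it later and compares it with the receipt.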

What to write in each field

Sources used — Split into three buckets:
- Evidence: Things you directly inspected, measured, or read.
- Inference: Conclusions you drew from evidence.
- Assumptions still untested: Things you are proceeding on but have not verified.

If a field is unknown, write unknown or not applicable. Do not decorate uncertainty with confetti.

Action taken — Be specific. "Drafted a resource" is weak. "Drafted a 4,200-word Markdown resource from three approved source files, wrote it to <project-root>/resources/, and validated JSON schema on two structured outputs" is a receipt.

External side effects — If the answer is truly none, write none. If the agent posted, emailed, deployed, spent, changed an account, or called a paid API, name it exactly.

Claims boundary — This is the sentence most people skip and the one that matters most. State what the receipt does not prove: traffic, demand, leads, revenue, ROI, legal status, security guarantees, platform endorsement, or service availability.

When the minimum receipt is not enough

Escalate to a full audit if the run involves any of these:

Live customer, client, student, patient, employee, or private user data
Credentials, tokens, sessions, MFA, CAPTCHA, account creation, permission changes, or recovery flows
Public publishing, outreach, DMs, emails, comments, community posting, or social actions
Checkout, lead capture, pricing, refunds, service terms, delivery commitments, or sales claims
Paid provider calls, subscriptions, media renders, exports, cloud resources, or unbounded retries
Destructive writes, deletes, migrations, production deploys, DNS, payment, gateway, or infrastructure changes
Security, legal, compliance, uptime, ROI, revenue, or "safe/secure/done-for-you" claims

Full audit does not mean panic. It means the minimum receipt gets backup: a source inventory, action ledger, approval record, artifact inventory, spend table, verification evidence, risk notes, and reviewer verdict. See full-audit-when-needed-summary.md for the condensed version.

What good receipts sound like

Good:

The agent drafted a resource from approved source inputs. It wrote local Markdown and JSON files only, spent $0.00, touched no accounts or public channels, validated JSON syntax, read files back from disk, and still needs risk review before any public use.

Bad:

The agent created an amazing launch asset and we are ready to monetize.

That second one is not a receipt. It is a tiny fraud wearing perfume.

Also bad:

Task completed successfully.

Completed how? Verified by whom? Proving what? Costing what? Touching what? A receipt without specifics is just a vibes check with extra steps.

The 15-minute fill-in version

Copy this after any meaningful agent run. A clean template is also available in minimum-receipt-template-public.md.

Proof Vault — Minimum Receipt

Receipt ID:
Goal:

Sources used:
- Evidence:
- Inference:
- Assumptions still untested:

Action taken:
Artifact / output:
External side effects:
Spend:
Approval status:

Bytes / hash:
Verification performed:
Claims boundary / what this does not prove:
Next gate:

Use none, not applicable, or not verified instead of making the receipt prettier than reality. Reality is the point.
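A blank field is easy to miss by eye. As a rough sketch, a filled-in copy of the template above could be checked for empty fields like this. It assumes one `Name: value` pair per line, treats `Sources used` as a group header that is filled through its sub-bullets, and the function name `blank_fields` is hypothetical. Values that wrap onto continuation lines would need a smarter parser.

```python
def blank_fields(receipt_text: str) -> list[str]:
    """List field names whose value was left empty."""
    group_headers = {"Sources used"}  # assumption: filled via sub-bullets
    blanks = []
    for raw in receipt_text.splitlines():
        # Drop list markers and surrounding whitespace.
        line = raw.strip().lstrip("-").strip()
        if ":" not in line:
            continue  # title lines and blank lines carry no field
        name, _, value = line.partition(":")
        name = name.strip()
        if name in group_headers:
            continue
        if not value.strip():
            blanks.append(name)
    return blanks
```

Anything this returns should be filled with none, not applicable, or not verified before the receipt is filed.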

Quick checklist before you accept "done"

Run through these seven checks before you file the receipt and move on.

1. Goal check

[ ] The run had one concrete goal.
[ ] The receipt says whether the goal was met, partially met, or not met.
[ ] Any side quests are listed instead of quietly absorbed into the victory lap.

2. Source check

[ ] Sources are named by public-safe labels or approved internal references.
[ ] Evidence is separated from inference.
[ ] Assumptions are listed instead of laundered into facts.
[ ] Private logs, customer data, credentials, and raw tokens are not in the receipt.

3. Action check

[ ] The receipt states what the agent actually did.
[ ] It states what the agent deliberately did not do.
[ ] Any file writes, repo changes, API calls, browser actions, messages, uploads, provider calls, or account changes are named.
[ ] Failed attempts are included if they changed cost, time, risk, or confidence.

4. Cost and approval check

[ ] Spend is recorded even when it is $0.00.
[ ] Spend categories are separated where relevant: model/API, media/render, hosting, tools, human review, failed-run waste.
[ ] Paid actions are inside an approved cap or marked as blocked.
[ ] Public posting, outreach, pricing, account, credential, payment, DNS, provider, gateway, or client-impacting actions have explicit approval before they happen.

5. Proof check

[ ] Durable output exists outside scratch space where practical.
[ ] The final artifact was read back or otherwise inspected.
[ ] JSON/config/code passed syntax or schema checks where applicable.
[ ] Tests, smoke checks, render checks, route checks, or reviewer checks are listed where applicable.
[ ] File size and checksum are recorded for internal traceability when useful.

6. Public-safety check

[ ] No private local paths in public-facing copy.
[ ] No secrets, tokens, API keys, passwords, session details, or recovery details.
[ ] No real customer/client/private data.
[ ] No platform, provider, or community endorsement implied without permission.
[ ] No pricing, service availability, support, legal, security, uptime, ROI, revenue, or demand claims unless separately evidenced and approved.

7. Decision check

[ ] The receipt names what is still unverified.
[ ] The receipt says what the evidence does not prove.
[ ] The next gate is specific: continue, revise, request review, publish approval, risk review, pause, or kill.
[ ] Stop-loss conditions are visible before the agent gets more autonomy.
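Parts of the public-safety check (check 6) can be pre-screened by a script before a human reads the copy. The sketch below is a heuristic only: the two patterns and the name `safety_scan` are illustrative assumptions, it will produce false positives and miss plenty, and it does not replace a review.

```python
import re

# Illustrative patterns only; a real scan needs a list tuned to your stack.
PATTERNS = {
    "private path": re.compile(r"(/home/|/Users/|[A-Za-z]:\\Users\\)\S+"),
    "secret-like assignment": re.compile(
        r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"
    ),
}


def safety_scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, finding label) pairs for suspicious lines."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

An empty result means "nothing matched these patterns", which is weaker than "safe". Record it in the receipt that way.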

The Proof Vault pattern in practice

A Proof Vault is not a product. It is a habit. Here is how it works as a system:

Before the run: Decide which tier applies. If the run touches anything public, paid, private, or destructive, plan for a full audit. Otherwise, minimum receipt.
During the run: Log sources, actions, and side effects as they happen. Do not reconstruct from memory afterward — memory lies.
After the run: Fill in the receipt. Read back the artifacts. Run verification checks. Write the claims boundary honestly.
Before the next gate: Use the receipt to make a decision. Continue, revise, escalate, pause, or kill. The receipt is not a trophy. It is a decision tool.
Store receipts durably: Keep them next to the artifacts they describe. A receipt without its artifact is a story. An artifact without its receipt is a mystery.
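Logging during the run does not need tooling beyond an append-only file. Here is a minimal sketch, assuming a JSON Lines ledger; the field names and the functions `ledger_line` and `append_to_ledger` are hypothetical and should be adapted to whatever the receipt needs.

```python
import json
from datetime import datetime, timezone


def ledger_line(action: str, tool: str, result: str,
                side_effect: str = "none",
                approval_basis: str = "not_required") -> str:
    """Build one JSON line describing a single action."""
    entry = {
        "time_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "action": action,
        "tool": tool,
        "result": result,
        "side_effect": side_effect,
        "approval_basis": approval_basis,
    }
    return json.dumps(entry, ensure_ascii=False)


def append_to_ledger(path: str, line: str) -> None:
    """Append a ledger line; existing entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

Because entries are only ever appended, the file doubles as the raw material for the Action taken and External side effects fields when the receipt is written.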

Common receipt failures and what they cost

Failure	What happens
No goal written down	Agent declares victory on a different task than the one you assigned
Sources not separated	Assumptions become "facts" in the next run's context
Side effects left blank	Something got published, spent, or changed and nobody noticed
Spend recorded as "minimal"	"Minimal" is not a number; the budget demon laughs
Claims boundary skipped	Delivery proof quietly becomes demand proof in the next pitch deck
Verification = "looked fine"	Looked fine to whom? Using what check? On what surface?
Next gate = "done"	Done is not a gate. Done is the absence of a gate
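The "minimal" spend failure in the table above is the easiest one to avoid, because the number can be computed. A small sketch, assuming amounts are recorded as decimal strings per category; the category labels, the cap, and the name `spend_summary` are illustrative.

```python
from decimal import Decimal


def spend_summary(by_category: dict[str, str], approved_cap: str) -> tuple[str, bool]:
    """Return the formatted total and whether it is inside the approved cap."""
    # Decimal avoids float rounding surprises with currency amounts.
    total = sum((Decimal(amount) for amount in by_category.values()), Decimal("0"))
    within_cap = total <= Decimal(approved_cap)
    return f"${total:.2f}", within_cap
```

For example, `spend_summary({"model/API": "1.25", "media/render": "0.80"}, "5.00")` returns `("$2.05", True)`, and an empty dictionary yields `$0.00`, which is exactly what the Spend field should say when nothing was paid for.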

Tiny glossary

Agent run: One bounded attempt by an AI agent or workflow to complete a task.
Surface touched: Any file, repo, browser, account, API, channel, tool, or system the run read or changed.
External side effect: Anything that changes the world outside local drafting — publishing, posting, sending, buying, uploading, rendering through a paid provider, changing an account, or deploying.
Artifact: The thing a human can inspect — page, file, report, diff, screenshot, package, test result, route check, or review note.
Hash/checksum: A file fingerprint. Useful internally when you need to prove a file did not silently change.
Claims boundary: The sentence that says what the proof does not prove.
Proof Vault: A collection of receipts and audits that make agent work inspectable over time. Not a product. A habit.

Public-safe use note

This resource is operating hygiene, not legal, security, or compliance advice. It does not certify an agent system, promise safety, prove revenue, or make a public service available.

Use it to make agent work easier to trust. Then still apply the boring gates when the work touches real money, real accounts, real customers, real public channels, or real consequences.

What to do next

Use the minimum receipt template after your next meaningful agent run. If the receipt has more "not verified" than proof, do not give the agent more access yet. Start with the ugliest line. That is usually where the money, mess, or trust leak is hiding.

Ana & The Goblins — lessons before hype, goblins before guru nonsense.

Copy-paste minimum receipt template

Proof Vault — Minimum Receipt Template

A one-page receipt for ordinary agent work. Copy, fill in, file next to the artifact.

Use this for: Drafts, research, code changes, QA runs, small automations, content packages, internal reports, local experiments.

Do not use this alone when the run touches: Credentials, live accounts, public distribution, client data, payment, pricing, legal/service claims, paid provider actions, destructive writes, or material business commitments. Escalate to a full audit instead.

Minimum Receipt

Receipt ID / artifact name:
Goal:

Sources used:
  Evidence:
  Inference:
  Assumptions still untested:

Action taken:
Artifact / output (durable path or public URL):
External side effects:
Spend:
Approval status:

Bytes / SHA-256:
Verification performed:
Claims boundary / what this does not prove:
Unverified / open questions:
Next gate:

Field guide

Field	Required	What to write
Receipt ID	Yes	Human-readable name and version/date. Example: `resource-draft-proof-vault-2026-06-27`
Goal	Yes	One concrete outcome. Not "improve things." Example: "Draft a public-safe checklist for agent run receipts."
Sources used	Yes	Separate evidence (directly inspected), inference (conclusions drawn), and assumptions (untested).
Action taken	Yes	What changed, was generated, verified, or deliberately skipped. Be specific.
Artifact / output	Yes	Durable location. For public samples, use a public URL or sanitized label, not private machine paths.
External side effects	Yes	`none` or exact list: publish, deploy, outreach, account change, provider call, spend, etc.
Spend	Yes	Amount and category. Use `$0.00` when no paid action occurred.
Approval status	Yes	`not_required`, `approved:<scope>`, `owner_gate_required:<decision>`, or `blocked`.
Bytes / SHA-256	Where practical	File size and hash for durable files. Use `not_practical` for live pages or third-party surfaces.
Verification performed	Yes	Read-back, syntax/schema check, smoke test, render check, safety scan, reviewer pass, or exact reason not applicable.
Claims boundary	Yes	What the receipt does not prove: traffic, demand, revenue, legal status, security, ROI, etc.
Unverified / open questions	Recommended	Anything not checked yet. Keeps uncertainty from dressing up as confidence.
Next gate	Yes	Continue, revise, pause, publish approval, distribution route, risk review, or kill.
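The approval status values in the field guide are regular enough to check by script. The following sketch accepts the four forms listed there and tolerates a trailing parenthetical note such as the one in the filled-in example; `parse_approval` is a hypothetical helper, and rejecting everything else is a design assumption rather than a rule from the template.

```python
def parse_approval(status: str) -> tuple[str, str]:
    """Split an approval status into (kind, detail); raise on unknown forms."""
    # Drop a trailing parenthetical note such as "(local drafting only)".
    core = status.split("(", 1)[0].strip()
    kind, _, detail = core.partition(":")
    kind, detail = kind.strip(), detail.strip()
    if kind in {"not_required", "blocked"} and not detail:
        return kind, ""
    if kind in {"approved", "owner_gate_required"} and detail:
        return kind, detail
    raise ValueError(f"unrecognized approval status: {status!r}")
```

Requiring a scope after `approved:` is the useful part: an approval with no named scope fails loudly instead of quietly covering everything.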

Example: filled-in receipt (sanitized)

Receipt ID: resource-draft-agent-setup-checklist-2026-06-25
Goal: Draft a practical VPS setup checklist for operators running agents
      on small servers.

Sources used:
  Evidence: Existing verified checklist page, verification report,
            static deploy package report.
  Inference: Small operators need a sequential checklist more than
             a reference manual.
  Assumptions still untested: Whether readers prefer downloadable
             Markdown or web-only format.

Action taken: Drafted a 12-section checklist in Markdown. Wrote local
              files only. Did not publish, post, deploy, or call any
              paid provider.
Artifact / output: <project-root>/resources/setup-checklist/checklist.md
External side effects: none
Spend: $0.00
Approval status: not_required (local drafting only)

Bytes / SHA-256: checklist.md 14,231 bytes sha256:a1b2c3...
Verification performed: Read back from disk. JSON validated with
                        python3 -m json.tool. Public-safety scan:
                        no private paths, secrets, pricing, or
                        service claims found.
Claims boundary: Proves only that a local draft exists and passed
                 basic safety checks. Does not prove traffic,
                 demand, reader satisfaction, revenue, or
                 service availability.
Unverified / open questions: Final title, CTA, publication route,
                             risk-review verdict.
Next gate: Risk review before any public use.
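The read-back and JSON check recorded under Verification performed in the example can also be scripted. This is a minimal sketch using only the standard library; `read_back_json` is an illustrative name, and it checks syntax only, not any schema.

```python
import json
from pathlib import Path


def read_back_json(path: str) -> tuple[bool, str]:
    """Re-read a JSON artifact from disk and report whether it parses."""
    p = Path(path)
    if not p.is_file():
        return False, "not verified: file missing"
    try:
        json.loads(p.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError) as err:
        return False, f"not verified: {err}"
    return True, f"read back {p.stat().st_size} bytes; JSON syntax valid"
```

The returned message is written so it can be pasted straight into the receipt, including the unflattering case.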

Use rule

If the artifact touched public pages, accounts, outreach, client data, payment, provider spend, credentials, DNS, gateways, or legal/service claims, this minimum receipt is not enough by itself. Attach a risk review or use the full-audit template.

Tips

Write the receipt during or immediately after the run, not from memory later.
Use none, not applicable, or not verified instead of leaving fields blank or decorating them.
A receipt with honest gaps is more useful than one that looks complete but isn't.
Store receipts next to the artifacts they describe.
The claims boundary is not modesty. It is the line between "this file exists and was checked" and "this proves the business works."

Proof Vault template — public-safe, no private data, no service claims.

When to use a full audit

Full Audit: When Needed, What It Covers

The minimum receipt handles ordinary agent work. This summary explains when to escalate and what a full audit adds.

When to escalate

Use a full audit instead of a minimum receipt when the run involves any of the following:

Trigger category	Examples
Private data	Live customer, client, student, patient, employee, or private user data
Credentials & access	Tokens, sessions, MFA, CAPTCHA, account creation, permission changes, recovery flows
Public distribution	Publishing, outreach, DMs, emails, comments, community posting, social actions
Commerce	Checkout, lead capture, pricing, refunds, service terms, delivery commitments, sales claims
Paid actions	Provider calls, subscriptions, media renders, exports, cloud resources, unbounded retries
Destructive changes	Deletes, migrations, production deploys, DNS, payment, gateway, or infrastructure changes
High-stakes claims	Security, legal, compliance, uptime, ROI, revenue, or "safe/secure/done-for-you" claims

Rule of thumb: If the run could embarrass you in front of a customer, cost you real money, or create a legal obligation, it needs a full audit.

What a full audit adds

A full audit is the minimum receipt expanded into the ten sections below. Think of it as the receipt getting backup.

1. Audit header

Adds scope, owner, timestamps, and explicit exclusions.

Audit ID:
Work item / task reference:
Owner / operator:
Created UTC:
Status / verdict:
Scope reviewed:
Scope explicitly excluded:

2. Source inventory

A structured table of every source, its trust level, and how it was used.

Source	Type	Trust level	How used	Privacy note
(label)	doc/api/page/log/interview	high/medium/low	evidence/inference/context	public-safe?

Required distinctions:
- Direct evidence: Inspected, measured, or read directly.
- Inference: Conclusions drawn from evidence.
- Assumptions: Proceeding without verification.
- Missing evidence: Known gaps.
- Sources intentionally not used: And why.

3. Artifact inventory

Every output, its location, size, hash, and public-safety status.

Artifact	Location	Bytes	SHA-256	Read-back?	Public-safe?

4. Action ledger

Every action the agent or operator took, with time, tool, result, and approval basis.

Action	Time/window	Tool/provider	Result	Side effect?	Approval basis

5. Approval and authority

Approval status:
Approval source / quote / charter reference:
Actions covered by approval:
Actions not covered:
Open owner gates:

6. Spend and provider usage

Category	Amount	Provider/tool	Approval basis	Evidence
Model/API
Media/render/export
Hosting/infrastructure
Tools/subscriptions
Human review

7. Verification record

Check each that applies:
[ ] File read-back
[ ] Syntax/schema validation
[ ] Checksum validation
[ ] HTML/report validation
[ ] Route/HTTP smoke
[ ] Browser/render smoke
[ ] Public-safety scan
[ ] Secret/path scan
[ ] Pricing/service/legal/ROI claim scan
[ ] Reviewer/risk verdict

8. Risk register

Risk	Severity	Evidence	Mitigation	Status

9. Claims boundary

State what the audit proves and what it does not prove.

Proves:
- (List specific, bounded claims supported by evidence in this audit.)

Does not prove:
- Traffic, demand, leads, revenue, ROI, legal/compliance status, security guarantees, platform endorsement, or service availability, unless independently evidenced and approved.

10. Decision and next gate

Decision: continue / revise / pause / kill / publish approval required / risk review required
Next gate:
Stop-loss trigger:

Full audit is not panic

A full audit is not a punishment. It is a receipt with backup singers. The structure exists so that when something goes wrong — and it will — you can trace what happened, who approved it, what it cost, and what was verified, without reconstructing from vibes and screenshots.

The goal is not to make agent work slower. The goal is to make risky agent work survivable.

Escalation decision tree

Did the run touch any of the following?
├── Private/personal data?        → Full audit
├── Credentials or access changes? → Full audit
├── Public distribution?           → Full audit
├── Money, pricing, or checkout?   → Full audit
├── Paid provider/API calls?       → Full audit
├── Destructive or infra changes?  → Full audit
├── Security/legal/compliance claims? → Full audit
└── None of the above?             → Minimum receipt

When in doubt, escalate. A full audit that was not needed costs 20 extra minutes. A minimum receipt that should have been a full audit can cost weeks.
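The decision tree above reduces to a set-membership check. Here is a sketch under the assumption that each branch gets a short label; the label strings and the function `choose_tier` are illustrative, not a fixed vocabulary.

```python
# Labels mirror the branches of the decision tree; the identifiers are illustrative.
FULL_AUDIT_TRIGGERS = {
    "private_data",
    "credentials_or_access",
    "public_distribution",
    "money_pricing_checkout",
    "paid_provider_calls",
    "destructive_or_infra",
    "high_stakes_claims",
}


def choose_tier(touched: set[str]) -> tuple[str, list[str]]:
    """Return the tier and the sorted triggers that caused it."""
    unknown = touched - FULL_AUDIT_TRIGGERS
    if unknown:
        # A typo must not silently downgrade a risky run.
        raise ValueError(f"unknown trigger labels: {sorted(unknown)}")
    hits = sorted(touched & FULL_AUDIT_TRIGGERS)
    return ("full audit", hits) if hits else ("minimum receipt", [])
```

Calling `choose_tier(set())` gives `("minimum receipt", [])`, and any single trigger is enough to escalate. Returning the triggers alongside the tier means the receipt can say why the audit was needed.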

Anti-patterns

Anti-pattern	What actually happens
"It's just a draft, no audit needed"	The draft gets published without review because nobody checked what was in it
"Full audit everything"	Nobody fills out any receipt because the overhead kills the workflow
"The agent verified itself"	Self-verification is a claim, not evidence
"No side effects" (without checking)	Agent posted to a channel, called a paid API, or wrote outside the workspace
"Spend: minimal"	Minimal is not a number
Claims boundary left blank	Delivery proof quietly becomes demand proof in the next conversation

Proof Vault full-audit summary — public-safe, no private data, no service claims.

Public-safety note: this static public resource performs no account, credential, payment, outreach, provider, gateway, pricing, checkout, lead-capture, or private-data collection actions.