Proof Vault: How to Make Agent Work Auditable Without Bureaucracy Sludge
Audience: Solo operators, founders, and small teams letting coding agents, browser agents, workflow agents, or content agents touch real work.
Promise: A practical two-tier proof system — a minimum receipt for ordinary runs and a full audit for risky ones — so agent work stays inspectable without turning every task into a courtroom drama.
Tone note: This is operating hygiene, not compliance theatre. If your agent's proof of work is "it said done," you don't have an agent problem. You have a receipt problem.
The blunt premise
An agent run is not done when the agent says "done."
It is done when a human can answer six questions without excavating a haunted scroll of logs:
- What was the agent supposed to do?
- What did it actually do?
- What did it read, write, publish, spend, or change?
- What proof can be inspected?
- What remains unverified?
- What is the next safe decision?
That is a Proof Vault receipt. Not a SOC audit. Not compliance cosplay. Not a dashboard pretending it is a lawyer. A small operating artifact that keeps agent work useful, bounded, and explainable.
Why two tiers
If every tiny run gets a courtroom binder, nobody will use the system. If risky runs get only "vibes looked fine," the invoice demon wins.
Minimum receipt: One page. Covers ordinary agent work — drafts, research, code changes, QA runs, small automations, content packages, internal reports, local experiments.
Full audit: Heavier. Only when the run touches real risk — credentials, live accounts, public distribution, client data, payment, pricing, legal/service claims, paid provider actions, destructive writes, or material business commitments.
The trick is knowing which one you need before the run starts, not after something embarrassing happens.
Minimum receipt: the fields that matter
A minimum receipt should fit on one page. Here are the fields, why they exist, and what to write in them.
| Field | What to record | Why it matters |
|---|---|---|
| Receipt ID | Human-readable name, date, or version | Makes the run findable later |
| Goal | One concrete outcome | Prevents fuzzy victory laps |
| Sources used | Evidence, inference, and assumptions — separated | Stops guesses laundering into facts |
| Action taken | What was generated, changed, checked, or deliberately skipped | Stops "done" from meaning everything and nothing |
| Artifact / output | Durable path label, public URL, report, package, screenshot, or hash | Gives the receipt something inspectable |
| External side effects | none, or exact effects: deploy, post, email, account change, provider call, spend | Makes world-touching actions visible |
| Approval status | Not required, covered by a named rule, requested, blocked, or owner-approved for a specific scope | Keeps autonomy inside authority |
| Spend | Amount by category; use $0.00 when true | Budget silence breeds goblins |
| Bytes / hash | File size and SHA-256 for durable files when practical | Proves a file did not silently change |
| Verification performed | Read-back, tests, schema check, smoke test, render check, safety scan, review, or "not performed" | Turns claims into evidence |
| Claims boundary | What this does not prove | Stops delivery proof becoming demand/revenue/security proof |
| Next gate | Continue, revise, review, publish approval, risk review, pause, or kill | Converts the receipt into a decision |
What to write in each field
Sources used: split into three buckets.
- Evidence: Things you directly inspected, measured, or read.
- Inference: Conclusions you drew from evidence.
- Assumptions still untested: Things you are proceeding on but have not verified.
If a field is unknown, write unknown or not applicable. Do not decorate uncertainty with confetti.
Action taken — Be specific. "Drafted a resource" is weak. "Drafted a 4,200-word Markdown resource from three approved source files, wrote it to <project-root>/resources/, and validated JSON schema on two structured outputs" is a receipt.
External side effects — If the answer is truly none, write none. If the agent posted, emailed, deployed, spent, changed an account, or called a paid API, name it exactly.
Claims boundary — This is the sentence most people skip and the one that matters most. State what the receipt does not prove: traffic, demand, leads, revenue, ROI, legal status, security guarantees, platform endorsement, or service availability.
When the minimum receipt is not enough
Escalate to a full audit if the run involves any of these:
- Live customer, client, student, patient, employee, or private user data
- Credentials, tokens, sessions, MFA, CAPTCHA, account creation, permission changes, or recovery flows
- Public publishing, outreach, DMs, emails, comments, community posting, or social actions
- Checkout, lead capture, pricing, refunds, service terms, delivery commitments, or sales claims
- Paid provider calls, subscriptions, media renders, exports, cloud resources, or unbounded retries
- Destructive writes, deletes, migrations, production deploys, DNS, payment, gateway, or infrastructure changes
- Security, legal, compliance, uptime, ROI, revenue, or "safe/secure/done-for-you" claims
Full audit does not mean panic. It means the minimum receipt gets backup: a source inventory, action ledger, approval record, artifact inventory, spend table, verification evidence, risk notes, and reviewer verdict. See full-audit-when-needed-summary.md for the condensed version.
What good receipts sound like
Good:
The agent drafted a resource from approved source inputs. It wrote local Markdown and JSON files only, spent $0.00, touched no accounts or public channels, validated JSON syntax, read files back from disk, and still needs risk review before any public use.
Bad:
The agent created an amazing launch asset and we are ready to monetize.
That second one is not a receipt. It is a tiny fraud wearing perfume.
Also bad:
Task completed successfully.
Completed how? Verified by whom? Proving what? Costing what? Touching what? A receipt without specifics is just a vibes check with extra steps.
The 15-minute fill-in version
Copy this after any meaningful agent run. A clean template is also available in minimum-receipt-template-public.md.
Proof Vault — Minimum Receipt
Receipt ID:
Goal:
Sources used:
- Evidence:
- Inference:
- Assumptions still untested:
Action taken:
Artifact / output:
External side effects:
Spend:
Approval status:
Bytes / hash:
Verification performed:
Claims boundary / what this does not prove:
Next gate:
Use none, not applicable, or not verified instead of making the receipt prettier than reality. Reality is the point.
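If receipts are produced by a script rather than typed by hand, one way to stop fields from quietly vanishing is to render every receipt from a fixed field list. The sketch below is an assumption, not part of the template itself: `RECEIPT_FIELDS` mirrors the fill-in version above, while `render_receipt` and its choice to write blank fields as `not verified` are hypothetical.

```python
# Field names mirror the fill-in template; everything else is illustrative.
RECEIPT_FIELDS = [
    "Receipt ID", "Goal", "Evidence", "Inference",
    "Assumptions still untested", "Action taken", "Artifact / output",
    "External side effects", "Spend", "Approval status", "Bytes / hash",
    "Verification performed", "Claims boundary", "Next gate",
]


def render_receipt(values: dict[str, str]) -> str:
    """Render every template field; blank or missing ones become 'not verified'."""
    unknown = set(values) - set(RECEIPT_FIELDS)
    if unknown:
        # A misspelled field name should fail loudly, not disappear.
        raise KeyError(f"unknown receipt fields: {sorted(unknown)}")
    lines = ["Proof Vault: Minimum Receipt"]
    for field in RECEIPT_FIELDS:
        value = values.get(field, "").strip() or "not verified"
        lines.append(f"{field}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_receipt({"Goal": "Draft a setup checklist", "Spend": "$0.00"}))
```

Defaulting to `not verified` is the conservative choice: a field nobody filled in shows up as a visible gap instead of an empty line that reads like "nothing to report."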
Quick checklist before you accept "done"
Run through these seven checks before you file the receipt and move on.
1. Goal check
- [ ] The run had one concrete goal.
- [ ] The receipt says whether the goal was met, partially met, or not met.
- [ ] Any side quests are listed instead of quietly absorbed into the victory lap.
2. Source check
- [ ] Sources are named by public-safe labels or approved internal references.
- [ ] Evidence is separated from inference.
- [ ] Assumptions are listed instead of laundered into facts.
- [ ] Private logs, customer data, credentials, and raw tokens are not in the receipt.
3. Action check
- [ ] The receipt states what the agent actually did.
- [ ] It states what the agent deliberately did not do.
- [ ] Any file writes, repo changes, API calls, browser actions, messages, uploads, provider calls, or account changes are named.
- [ ] Failed attempts are included if they changed cost, time, risk, or confidence.
4. Cost and approval check
- [ ] Spend is recorded even when it is $0.00.
- [ ] Spend categories are separated where relevant: model/API, media/render, hosting, tools, human review, failed-run waste.
- [ ] Paid actions are inside an approved cap or marked as blocked.
- [ ] Public posting, outreach, pricing, account, credential, payment, DNS, provider, gateway, or client-impacting actions have explicit approval before they happen.
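The "inside an approved cap" check is simple arithmetic, so it can be scripted. A minimal sketch, assuming spend is recorded as dollar strings per category; `check_spend` and the sample categories and cap are hypothetical.

```python
from decimal import Decimal


def check_spend(spend_by_category: dict[str, str], cap: str) -> tuple[Decimal, bool]:
    """Sum per-category dollar amounts and report whether the total is within the cap."""
    total = sum(
        (Decimal(amount.strip().lstrip("$")) for amount in spend_by_category.values()),
        Decimal("0"),
    )
    return total, total <= Decimal(cap.strip().lstrip("$"))


if __name__ == "__main__":
    total, within_cap = check_spend(
        {"model/API": "$1.25", "media/render": "$0.00", "failed-run waste": "$0.40"},
        cap="$2.00",
    )
    print(total, within_cap)
```

`Decimal` is used instead of floats so cents add up exactly. A value such as `minimal` raises an error instead of being summed, which is the intended behavior.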
5. Proof check
- [ ] Durable output exists outside scratch space where practical.
- [ ] The final artifact was read back or otherwise inspected.
- [ ] JSON/config/code passed syntax or schema checks where applicable.
- [ ] Tests, smoke checks, render checks, route checks, or reviewer checks are listed where applicable.
- [ ] File size and checksum are recorded for internal traceability when useful.
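For the file size and checksum item, Python's standard `hashlib` is enough. The sketch below is one possible shape, not a required tool: `fingerprint` and `unchanged` are hypothetical helper names, and the output format simply imitates a `name bytes sha256:...` receipt line.

```python
import hashlib
from pathlib import Path


def fingerprint(path: str) -> str:
    """Return 'name N bytes sha256:<hex>' for a durable file."""
    p = Path(path)
    digest = hashlib.sha256()
    size = 0
    with p.open("rb") as fh:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"{p.name} {size:,} bytes sha256:{digest.hexdigest()}"


def unchanged(path: str, recorded: str) -> bool:
    """True if the file still matches the fingerprint written on the receipt."""
    return fingerprint(path) == recorded
```

Record the `fingerprint` output in the Bytes / hash field when the receipt is written, then call `unchanged` before the next gate to confirm the artifact is still the one that was reviewed.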
6. Public-safety check
- [ ] No private local paths in public-facing copy.
- [ ] No secrets, tokens, API keys, passwords, session details, or recovery details.
- [ ] No real customer/client/private data.
- [ ] No platform, provider, or community endorsement implied without permission.
- [ ] No pricing, service availability, support, legal, security, uptime, ROI, revenue, or demand claims unless separately evidenced and approved.
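Part of this check can be pre-screened mechanically before a human reads the copy. The patterns below are illustrative assumptions only: they catch a few obvious shapes (home-directory paths, `key = value` style secrets, private key headers) and will miss plenty. `PATTERNS` and `scan` are hypothetical names.

```python
import re

# Illustrative patterns only; tune them for your own stack.
PATTERNS = {
    "private path": re.compile(r"(/home/|/Users/|[A-Za-z]:\\Users\\)"),
    "possible secret": re.compile(
        r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"
    ),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for every line matching a risky pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

A clean scan is weak evidence, not a verdict. On the receipt, write it as "pattern scan: no findings" and leave claims about endorsement, pricing, and customer data to a human reviewer.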
7. Decision check
- [ ] The receipt names what is still unverified.
- [ ] The receipt says what the evidence does not prove.
- [ ] The next gate is specific: continue, revise, request review, publish approval, risk review, pause, or kill.
- [ ] Stop-loss conditions are visible before the agent gets more autonomy.
The Proof Vault pattern in practice
A Proof Vault is not a product. It is a habit. Here is how it works as a system:
- Before the run: Decide which tier applies. If the run touches anything public, paid, private, or destructive, plan for a full audit. Otherwise, minimum receipt.
- During the run: Log sources, actions, and side effects as they happen. Do not reconstruct from memory afterward; memory lies.
- After the run: Fill in the receipt. Read back the artifacts. Run verification checks. Write the claims boundary honestly.
- Before the next gate: Use the receipt to make a decision. Continue, revise, escalate, pause, or kill. The receipt is not a trophy. It is a decision tool.
- Store receipts durably: Keep them next to the artifacts they describe. A receipt without its artifact is a story. An artifact without its receipt is a mystery.
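The "log as it happens" step can be as small as an append-only file that sits next to the artifacts. A minimal sketch, assuming a JSON Lines ledger; the file name, the `log_action` helper, and the entry keys are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_action(ledger: Path, action: str, tool: str, side_effect: str = "none") -> dict:
    """Append one action to a JSON Lines ledger and return the entry."""
    entry = {
        "time_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "action": action,
        "tool": tool,
        "side_effect": side_effect,
    }
    # Append mode: earlier entries are never rewritten.
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    ledger = Path("ledger.jsonl")
    log_action(ledger, "wrote checklist.md", "editor")
    log_action(ledger, "called render API", "media provider", side_effect="spend")
```

When the run ends, the Action taken and External side effects fields can be filled in from the ledger instead of from memory.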
Common receipt failures and what they cost
| Failure | What happens |
|---|---|
| No goal written down | Agent declares victory on a different task than the one you assigned |
| Sources not separated | Assumptions become "facts" in the next run's context |
| Side effects left blank | Something got published, spent, or changed and nobody noticed |
| Spend recorded as "minimal" | "Minimal" is not a number; the budget demon laughs |
| Claims boundary skipped | Delivery proof quietly becomes demand proof in the next pitch deck |
| Verification = "looked fine" | Looked fine to whom? Using what check? On what surface? |
| Next gate = "done" | Done is not a gate. Done is the absence of a gate |
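Several of these failures are mechanical enough to catch before a human reads the receipt. The sketch below is an assumption about how such a check might look: `lint_receipt` is a hypothetical helper, the receipt is a plain dict keyed by field name, and Spend is simplified to a single dollar amount.

```python
import re

MONEY = re.compile(r"^\$\d+(\.\d{2})?$")
# Gate names follow the Next gate field; a gate may be followed by detail.
GATES = ("continue", "revise", "review", "publish approval", "risk review", "pause", "kill")


def lint_receipt(receipt: dict[str, str]) -> list[str]:
    """Return a list of common receipt failures; an empty list means none found."""

    def field(name: str) -> str:
        return receipt.get(name, "").strip()

    problems = []
    if not field("Goal"):
        problems.append("Goal: missing")
    if not field("External side effects"):
        problems.append("External side effects: blank (write 'none' if true)")
    if not MONEY.match(field("Spend")):
        problems.append("Spend: not a dollar amount")
    if not field("Claims boundary"):
        problems.append("Claims boundary: missing")
    if field("Verification performed").lower() in ("", "looked fine"):
        problems.append("Verification performed: no named check")
    if not field("Next gate").lower().startswith(GATES):
        problems.append("Next gate: not a recognized gate")
    return problems
```

An empty result only means the receipt is well-formed. Whether the claims in it are true is still the reviewer's job.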
Tiny glossary
- Agent run: One bounded attempt by an AI agent or workflow to complete a task.
- Surface touched: Any file, repo, browser, account, API, channel, tool, or system the run read or changed.
- External side effect: Anything that changes the world outside local drafting — publishing, posting, sending, buying, uploading, rendering through a paid provider, changing an account, or deploying.
- Artifact: The thing a human can inspect — page, file, report, diff, screenshot, package, test result, route check, or review note.
- Hash/checksum: A file fingerprint. Useful internally when you need to prove a file did not silently change.
- Claims boundary: The sentence that says what the proof does not prove.
- Proof Vault: A collection of receipts and audits that make agent work inspectable over time. Not a product. A habit.
Public-safe use note
This resource is operating hygiene, not legal, security, or compliance advice. It does not certify an agent system, promise safety, prove revenue, or make a public service available.
Use it to make agent work easier to trust. Then still apply the boring gates when the work touches real money, real accounts, real customers, real public channels, or real consequences.
What to do next
Use the minimum receipt template after your next meaningful agent run. If the receipt has more "not verified" than proof, do not give the agent more access yet. Start with the ugliest line. That is usually where the money, mess, or trust leak is hiding.
Ana & The Goblins — lessons before hype, goblins before guru nonsense.
Copy-paste minimum receipt template
Proof Vault — Minimum Receipt Template
A one-page receipt for ordinary agent work. Copy, fill in, file next to the artifact.
Use this for: Drafts, research, code changes, QA runs, small automations, content packages, internal reports, local experiments.
Do not use this alone when the run touches: Credentials, live accounts, public distribution, client data, payment, pricing, legal/service claims, paid provider actions, destructive writes, or material business commitments. Escalate to a full audit instead.
Minimum Receipt
Receipt ID / artifact name:
Goal:
Sources used:
Evidence:
Inference:
Assumptions still untested:
Action taken:
Artifact / output (durable path or public URL):
External side effects:
Spend:
Approval status:
Bytes / SHA-256:
Verification performed:
Claims boundary / what this does not prove:
Unverified / open questions:
Next gate:
Field guide
| Field | Required | What to write |
|---|---|---|
| Receipt ID | Yes | Human-readable name and version/date. Example: resource-draft-proof-vault-2026-06-27 |
| Goal | Yes | One concrete outcome. Not "improve things." Example: "Draft a public-safe checklist for agent run receipts." |
| Sources used | Yes | Separate evidence (directly inspected), inference (conclusions drawn), and assumptions (untested). |
| Action taken | Yes | What changed, was generated, verified, or deliberately skipped. Be specific. |
| Artifact / output | Yes | Durable location. For public samples, use a public URL or sanitized label, not private machine paths. |
| External side effects | Yes | none or exact list: publish, deploy, outreach, account change, provider call, spend, etc. |
| Spend | Yes | Amount and category. Use $0.00 when no paid action occurred. |
| Approval status | Yes | not_required, approved:<scope>, owner_gate_required:<decision>, or blocked. |
| Bytes / SHA-256 | Where practical | File size and hash for durable files. Use not_practical for live pages or third-party surfaces. |
| Verification performed | Yes | Read-back, syntax/schema check, smoke test, render check, safety scan, reviewer pass, or exact reason not applicable. |
| Claims boundary | Yes | What the receipt does not prove: traffic, demand, revenue, legal status, security, ROI, etc. |
| Unverified / open questions | Recommended | Anything not checked yet. Keeps uncertainty from dressing up as confidence. |
| Next gate | Yes | Continue, revise, pause, publish approval, distribution route, risk review, or kill. |
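The Approval status values in the field guide follow a small grammar, so they can be checked rather than eyeballed. A minimal sketch, assuming the four forms listed above and allowing a free-text note after `not_required` or `blocked`; `parse_approval` is a hypothetical helper.

```python
import re

NEEDS_DETAIL = {"approved", "owner_gate_required"}
STANDALONE = {"not_required", "blocked"}


def parse_approval(status: str) -> tuple[str, str]:
    """Split an approval status into (kind, detail) or raise ValueError."""
    text = status.strip()
    # The kind is everything before the first colon, space, or parenthesis.
    kind = re.split(r"[:\s(]", text, maxsplit=1)[0]
    detail = text[len(kind):].lstrip(": ").strip()
    if kind in STANDALONE:
        return kind, detail
    if kind in NEEDS_DETAIL:
        if not detail:
            raise ValueError(f"{kind} needs a scope or decision after ':'")
        return kind, detail
    raise ValueError(f"unrecognized approval status: {status!r}")


if __name__ == "__main__":
    print(parse_approval("not_required (local drafting only)"))
    print(parse_approval("approved:publish to staging"))
```

Rejecting a bare `approved` enforces the point of the field: approval always covers a specific scope.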
Example: filled-in receipt (sanitized)
Receipt ID: resource-draft-agent-setup-checklist-2026-06-25
Goal: Draft a practical VPS setup checklist for operators running agents
on small servers.
Sources used:
Evidence: Existing verified checklist page, verification report,
static deploy package report.
Inference: Small operators need a sequential checklist more than
a reference manual.
Assumptions still untested: Whether readers prefer downloadable
Markdown or web-only format.
Action taken: Drafted a 12-section checklist in Markdown. Wrote local
files only. Did not publish, post, deploy, or call any
paid provider.
Artifact / output: <project-root>/resources/setup-checklist/checklist.md
External side effects: none
Spend: $0.00
Approval status: not_required (local drafting only)
Bytes / SHA-256: checklist.md 14,231 bytes sha256:a1b2c3...
Verification performed: Read back from disk. JSON validated with
python3 -m json.tool. Public-safety scan:
no private paths, secrets, pricing, or
service claims found.
Claims boundary: Proves only that a local draft exists and passed
basic safety checks. Does not prove traffic,
demand, reader satisfaction, revenue, or
service availability.
Unverified / open questions: Final title, CTA, publication route,
risk-review verdict.
Next gate: Risk review before any public use.
Use rule
If the artifact touched public pages, accounts, outreach, client data, payment, provider spend, credentials, DNS, gateways, or legal/service claims, this minimum receipt is not enough by itself. Attach a risk review or use the full-audit template.
Tips
- Write the receipt during or immediately after the run, not from memory later.
- Use none, not applicable, or not verified instead of leaving fields blank or decorating them.
- A receipt with honest gaps is more useful than one that looks complete but isn't.
- Store receipts next to the artifacts they describe.
- The claims boundary is not modesty. It is the line between "this file exists and was checked" and "this proves the business works."
Proof Vault template — public-safe, no private data, no service claims.
When to use a full audit
Full Audit: When Needed, What It Covers
The minimum receipt handles ordinary agent work. This summary explains when to escalate and what a full audit adds.
When to escalate
Use a full audit instead of a minimum receipt when the run involves any of the following:
| Trigger category | Examples |
|---|---|
| Private data | Live customer, client, student, patient, employee, or private user data |
| Credentials & access | Tokens, sessions, MFA, CAPTCHA, account creation, permission changes, recovery flows |
| Public distribution | Publishing, outreach, DMs, emails, comments, community posting, social actions |
| Commerce | Checkout, lead capture, pricing, refunds, service terms, delivery commitments, sales claims |
| Paid actions | Provider calls, subscriptions, media renders, exports, cloud resources, unbounded retries |
| Destructive changes | Deletes, migrations, production deploys, DNS, payment, gateway, or infrastructure changes |
| High-stakes claims | Security, legal, compliance, uptime, ROI, revenue, or "safe/secure/done-for-you" claims |
Rule of thumb: If the run could embarrass you in front of a customer, cost you real money, or create a legal obligation, it needs a full audit.
What a full audit adds
A full audit expands the minimum receipt into the ten sections below. Think of it as the receipt getting backup.
1. Audit header
Adds scope, owner, timestamps, and explicit exclusions.
Audit ID:
Work item / task reference:
Owner / operator:
Created UTC:
Status / verdict:
Scope reviewed:
Scope explicitly excluded:
2. Source inventory
A structured table of every source, its trust level, and how it was used.
| Source | Type | Trust level | How used | Privacy note |
|---|---|---|---|---|
| (label) | doc/api/page/log/interview | high/medium/low | evidence/inference/context | public-safe? |
Required distinctions:
- Direct evidence: Inspected, measured, or read directly.
- Inference: Conclusions drawn from evidence.
- Assumptions: Proceeding without verification.
- Missing evidence: Known gaps.
- Sources intentionally not used: And why.
3. Artifact inventory
Every output, its location, size, hash, and public-safety status.
| Artifact | Location | Bytes | SHA-256 | Read-back? | Public-safe? |
|---|---|---|---|---|---|
4. Action ledger
Every action the agent or operator took, with time, tool, result, and approval basis.
| Action | Time/window | Tool/provider | Result | Side effect? | Approval basis |
|---|---|---|---|---|---|
5. Approval and authority
Approval status:
Approval source / quote / charter reference:
Actions covered by approval:
Actions not covered:
Open owner gates:
6. Spend and provider usage
| Category | Amount | Provider/tool | Approval basis | Evidence |
|---|---|---|---|---|
| Model/API | | | | |
| Media/render/export | | | | |
| Hosting/infrastructure | | | | |
| Tools/subscriptions | | | | |
| Human review | | | | |
7. Verification record
Check each that applies:
- [ ] File read-back
- [ ] Syntax/schema validation
- [ ] Checksum validation
- [ ] HTML/report validation
- [ ] Route/HTTP smoke
- [ ] Browser/render smoke
- [ ] Public-safety scan
- [ ] Secret/path scan
- [ ] Pricing/service/legal/ROI claim scan
- [ ] Reviewer/risk verdict
8. Risk register
| Risk | Severity | Evidence | Mitigation | Status |
|---|---|---|---|---|
9. Claims boundary
State what the audit proves and what it does not prove.
Proves:
- (List specific, bounded claims supported by evidence in this audit.)

Does not prove:
- Traffic, demand, leads, revenue, ROI, legal/compliance status, security guarantees, platform endorsement, or service availability, unless independently evidenced and approved.
10. Decision and next gate
Decision: continue / revise / pause / kill / publish approval required / risk review required
Next gate:
Stop-loss trigger:
Full audit is not panic
A full audit is not a punishment. It is a receipt with backup singers. The structure exists so that when something goes wrong — and it will — you can trace what happened, who approved it, what it cost, and what was verified, without reconstructing from vibes and screenshots.
The goal is not to make agent work slower. The goal is to make risky agent work survivable.
Escalation decision tree
Did the run touch any of the following?
├── Private/personal data? → Full audit
├── Credentials or access changes? → Full audit
├── Public distribution? → Full audit
├── Money, pricing, or checkout? → Full audit
├── Paid provider/API calls? → Full audit
├── Destructive or infra changes? → Full audit
├── Security/legal/compliance claims? → Full audit
└── None of the above? → Minimum receipt
When in doubt, escalate. A full audit that was not needed costs 20 extra minutes. A minimum receipt that should have been a full audit can cost weeks.
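The decision tree reduces to a set-membership check. The sketch below is one way to encode it, under stated assumptions: the trigger labels are hypothetical shorthand for the seven branches, and `choose_tier` treats any unrecognized label as doubt and escalates.

```python
# Hypothetical shorthand labels for the seven branches of the tree.
FULL_AUDIT_TRIGGERS = {
    "private_data",
    "credentials_or_access",
    "public_distribution",
    "money_pricing_checkout",
    "paid_provider_calls",
    "destructive_or_infra",
    "high_stakes_claims",
}


def choose_tier(touched: set[str], in_doubt: bool = False) -> tuple[str, list[str]]:
    """Return the tier and the reasons behind it."""
    hits = sorted(touched & FULL_AUDIT_TRIGGERS)
    unrecognized = sorted(touched - FULL_AUDIT_TRIGGERS)
    # Unknown labels count as doubt, and doubt escalates.
    if hits or unrecognized or in_doubt:
        return "full audit", hits + unrecognized
    return "minimum receipt", []


if __name__ == "__main__":
    print(choose_tier(set()))
    print(choose_tier({"paid_provider_calls", "private_data"}))
```

Returning the reasons alongside the tier gives the audit header something concrete to put under "Scope reviewed."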
Anti-patterns
| Anti-pattern | What actually happens |
|---|---|
| "It's just a draft, no audit needed" | The draft gets published without review because nobody checked what was in it |
| "Full audit everything" | Nobody fills out any receipt because the overhead kills the workflow |
| "The agent verified itself" | Self-verification is a claim, not evidence |
| "No side effects" (without checking) | Agent posted to a channel, called a paid API, or wrote outside the workspace |
| "Spend: minimal" | Minimal is not a number |
| Claims boundary left blank | Delivery proof quietly becomes demand proof in the next conversation |
Proof Vault full-audit summary — public-safe, no private data, no service claims.