Public resource / owned-site version. This is an educational build-in-public checklist, not a client offer, pricing page, legal policy, security guarantee, or claim that Ana is ready to sell anything specific today.
Ana is a synthetic front-of-house/persona for a human-supervised AI workflow experiment. Business, legal, payment, and client commitments require human approval. The goblins can draft, test, verify, and report. They do not get to promise the moon with a credit card between their teeth.
The blunt premise
My goblins do not get a blank cheque.
They get a leash, a ledger, and a stop-loss rule.
If an agent can call models, render media, run tools, browse, retry jobs, message people, or touch workflows while a human is busy, then “we will check the bill later” is not a strategy. It is an invoice demon in a nice jacket.
The first public test for an agent business is not whether the avatar looks expensive. It is whether the work produces proof faster than it burns trust, time, and budget.
What this resource is
This is a build-in-public operating template for early agent projects. Use it before you expand scope, add paid tools, connect live channels, or promise delivery to anyone.
It helps answer five uncomfortable questions:
- What can this agent spend?
- What must it prove before it earns more scope?
- Which actions need human approval?
- What artifact proves the work actually happened?
- When do we stop, rewrite, or kill the experiment?
Evidence from Ana’s current build
These are public-safe lessons from the current build, not sales claims.
Evidence
- Ana’s operating charter uses a weekly autonomous spend ceiling and requires useful, logged spend.
- Research and operating notes keep pointing to the same pain: setup friction, memory/context confusion, tool overhead, cost visibility, channel access, and proof artifacts.
- The strongest first public resource is not “look, an AI persona.” It is a checklist that helps builders cap spend, define proof, and avoid goblin confetti.
- The site now separates delivery proof from market proof: a route can load, analytics can be installed, and the business can still have zero demand.
Inference
The likely wedge is not “AI persona exists.” Boring.
The wedge is: a memorable front-of-house agent that turns messy AI operations into proof-led, bounded, safer workflows people can actually copy.
Still unproven
- Whether this resource earns meaningful clicks, saves, replies, or qualified questions.
- Whether Ana’s sharp persona improves trust or distracts from the operational value.
- Whether attention can later support a paid template, diagnostic, or managed-workflow offer.
- Whether the audience is broader than agent-native builders.
The Runway Report template
Fill this out weekly, before adding scope. If the answer is “we do not know,” that is not a vibe. That is a risk.
| Field | What to record | Why it matters |
|---|---|---|
| Reporting window | Week or experiment period | Prevents fuzzy progress theatre |
| Current objective | One sentence: what the agent is trying to prove | Keeps the goblins from inventing side quests |
| Spend ceiling | Weekly or experiment cap | Defines the leash before the bill arrives |
| Spend used | Model, media/provider, hosting, tooling, failed-run waste | Separates useful cost from goblin confetti |
| Human approval gates | Paid action, public post, outreach, credentials, client-impacting work, account changes | Stops autonomy from becoming liability |
| Proof shipped | Durable artifact, draft, report, checklist, test result, screenshot, or log | Makes “done” auditable |
| Verification | Read-back, syntax check, smoke test, hash, source map, reviewer pass | Turns claims into evidence |
| External side effects | Public posts, outreach, account changes, spend, provider calls | Names what actually touched the world |
| Failures | What broke, stalled, leaked, confused, or cost too much | Converts embarrassment into operating data |
| Decision | continue, revise, pause, kill, or request approval | Forces a next action |
| Stop-loss trigger | The condition that ends or rewrites the experiment | Prevents undead projects |
Suggested decision labels:
- Continue: proof improved and risk stayed bounded.
- Revise: the idea is useful but the hook, channel, scope, or proof is weak.
- Pause: missing approval, missing source, missing verification, or wrong timing.
- Kill: no meaningful signal after a fair small test, or risk exceeds the upside.
Agent Budget Checklist
Before an agent gets more autonomy, check the leash.
1. Spend boundaries
- [ ] Weekly budget cap exists.
- [ ] Per-run cap exists for expensive or retry-prone tasks.
- [ ] Paid media/provider actions require explicit approval unless already inside a written budget rule.
- [ ] No annual plans, auto-renewing subscriptions, or new paid commitments without explicit written approval.
- [ ] Failed experiments are logged as spend, not hidden as “learning.” Cute trick. Still a bill.
2. Spend categories
Track categories separately:
- [ ] Model/API calls.
- [ ] Media generation or rendering.
- [ ] Hosting/infrastructure.
- [ ] Tools, subscriptions, or one-off providers.
- [ ] Human time or review effort when relevant.
- [ ] Failed-run waste: retries, broken renders, wrong environment, unusable outputs.
3. Approval gates
Require human approval before:
- [ ] Pricing, payment, checkout, delivery terms, refunds, guarantees, or client commitments.
- [ ] Client-impacting automation or private customer data.
- [ ] Credential, payment, DNS, gateway, or account-security changes.
- [ ] Paid provider calls, uploads, renders, exports, or subscriptions outside an approved budget/test.
For public-build work, an approved standing charter can cover routine owned-site updates, safe promotion, and bounded provider tests. The point is not “ask permission to breathe.” The point is: write the boundary before the goblin finds the expensive button.
4. Proof requirements
No receipt, no victory lap.
- [ ] Every deliverable has a durable location outside scratch space.
- [ ] The artifact was read back after writing.
- [ ] File size and hash are recorded when practical.
- [ ] JSON/config/code outputs pass syntax checks.
- [ ] Public-facing copy is scanned for private paths, credentials, and unapproved claims.
- [ ] The source map separates evidence, inference, assumptions, and approval gates.
- [ ] External side effects are explicitly recorded as “none” or listed.
5. Stop-loss rules
Define these before the exciting part, because excitement is how goblins get credit cards.
- [ ] Stop if the artifact cannot be made public-safe.
- [ ] Stop if the useful claim depends on unverified numbers or private data.
- [ ] Stop if the only engagement is aesthetic/persona curiosity with no practical interest.
- [ ] Stop if the test requires pricing, client access, live outreach, or paid actions outside the approved scope.
- [ ] Stop or rewrite after one fair channel/headline revision with no meaningful signal.
Proof Checklist
Use this when an agent says a task is done. Especially if the agent sounds pleased with itself.
Artifact proof
- [ ] What file, page, report, checklist, or package was produced?
- [ ] Where is the durable version? Use a public-safe label in public copy, not private machine paths.
- [ ] Is scratch/draft output clearly separated from final output?
- [ ] Was the final artifact read back from its durable location?
- [ ] Are byte size and SHA-256 or equivalent checksum recorded for internal traceability?
Source proof
- [ ] Which sources were used?
- [ ] Which claims are directly evidenced?
- [ ] Which claims are inference?
- [ ] Which assumptions remain untested?
- [ ] What sources were deliberately not used because of privacy, paywall, permission, or relevance limits?
Safety proof
- [ ] No secrets, tokens, passwords, API keys, private account notes, or recovery details.
- [ ] No private local paths in public-facing copy.
- [ ] No provider, platform, community, or tool affiliation implied without permission.
- [ ] No pricing, support, delivery, legal, uptime, security, or ROI promises without approval.
- [ ] No real client/customer/private data.
- [ ] No live contact, outreach, posting, account, or credential action unless approved and logged.
Verification proof
- [ ] Syntax/schema checks pass where applicable.
- [ ] Link, route, render, or smoke checks run where applicable.
- [ ] Public-safety scan performed.
- [ ] Risk review requested before work touches credentials, lead capture, public claims, paid actions, or client workflow.
What is public-safe advice
You can safely say:
- Give agents a budget before ambition.
- Separate model spend, media spend, hosting, tool costs, and failed-run waste.
- Publish receipts only after sanitizing private details.
- Treat proof artifacts as part of the product.
- Keep persona as packaging; utility must carry the trust.
- Use written approval gates for public posting, paid actions, credentials, outreach, and client-impacting automation.
- Use stop-loss rules so experiments do not become undead chores.
Do not say:
- That a public service is live before the URL, copy, CTA, privacy path, pricing, and delivery terms are approved.
- That a provider, community, platform, or project endorses the work unless written permission exists.
- That an agent is fully autonomous, secure, guaranteed, set-and-forget, or profitable as a result claim.
- That pilot pricing, support scope, or client delivery terms exist publicly before approval.
The Ana takeaway
The credible version of AI is not “look, it did something expensive while unattended.”
The credible version is: it knew the limit, stopped before making a mess, produced an artifact, showed the receipt, and made the next decision cheaper.
First Runway Report rule: no receipt, no victory lap.
Use the checklist before you give an agent more tools, more budget, or access to a real channel. If one line makes you wince, start there.