Audience: Builders shipping pages or dashboards from agent workflows.
Promise: Explain why "HTML exists" is not enough, and what minimum verification tiers catch before anyone says ready.
Last updated: 2026-06-27
The problem
Agents can generate HTML files that look complete in source but fail in practice:
- File exists but is 0 bytes or truncated
- HTML validates but renders blank in a browser
- Page loads but CSS/JS paths are broken
- Screenshots show only "Loading..." or a visible error message
- Links point to private paths or localhost-only routes
- Secrets or internal paths leak into public output
None of these are caught by "I wrote the file."
The validation tiers
Smoke testing for agent-built pages moves through seven tiers. Each tier catches failures the previous one misses.
Tier 1: File exists and is non-empty
What it catches: Agent claimed success but wrote nothing, crashed mid-write, or wrote to the wrong path.
Check:
test -s /path/to/output/page.html && echo "OK: file exists and non-empty"
Example:
$ test -s /tmp/agent-output/dashboard.html && echo "OK"
OK
$ wc -c /tmp/agent-output/dashboard.html
8472 /tmp/agent-output/dashboard.html
Why it matters: Agents sometimes report success before the write completes, or write to a scratch workspace and claim a durable path. File existence is the minimum proof the artifact landed.
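The shell check above confirms the file is non-empty but not that the write finished. A minimal Python sketch of the same tier follows, with one added heuristic for truncation. The function name `check_file_landed` and the "closing </html> near the end" rule are assumptions for illustration; HTML fragments that legitimately omit the closing tag would need the rule relaxed.

```python
from pathlib import Path


def check_file_landed(path: str, min_bytes: int = 1) -> tuple[bool, str]:
    """Return (ok, reason) for a tier 1 check on a generated HTML file."""
    p = Path(path)
    if not p.is_file():
        return False, "missing: no regular file at this path"
    size = p.stat().st_size
    if size < min_bytes:
        return False, f"too small: {size} bytes"
    # Heuristic only: a complete document normally ends with </html>.
    tail = p.read_bytes()[-256:].decode("utf-8", errors="replace").lower()
    if "</html>" not in tail:
        return False, "possibly truncated: no closing </html> near end of file"
    return True, f"ok: {size} bytes"
```

Calling `check_file_landed("/tmp/agent-output/dashboard.html")` returns a pair such as `(True, "ok: 8472 bytes")`, which can go straight into the completion metadata.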
Tier 2: HTML validates (well-formed)
What it catches: Unclosed tags, broken nesting, missing DOCTYPE, encoding errors that browsers tolerate but tools reject.
Check:
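python3 -c "from html.parser import HTMLParser; HTMLParser().feed(open('/path/to/page.html', encoding='utf-8').read())"
Note: html.parser is lenient by design and does not raise on unclosed or mis-nested tags, so this one-liner mainly catches files that cannot be read or decoded as UTF-8. For structural errors, use xmllint: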
xmllint --html --noout /path/to/page.html
Example:
$ xmllint --html --noout /tmp/agent-output/page.html
/tmp/agent-output/page.html:23: parser error : Opening and ending tag mismatch: div line 15 and body
Why it matters: Invalid HTML can render in one browser but break in another, or fail accessibility tools and crawlers. Validation is cheap; cross-browser testing is expensive.
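Because Python's html.parser accepts malformed markup without complaint, a structural check has to track tags itself. The sketch below is one way to do that; `TagBalanceChecker` and `check_tag_balance` are hypothetical names, and the checker is deliberately stricter than the HTML specification (it flags omitted end tags such as an unclosed `<p>` or `<li>`, which browsers and the spec allow).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Records mismatched end tags while keeping a stack of open tags."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, int]] = []  # (tag name, line opened)
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line = self.getpos()[0]
        if not self.stack:
            self.problems.append(f"line {line}: </{tag}> with no open tag")
        elif self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            open_tag, open_line = self.stack[-1]
            self.problems.append(
                f"line {line}: </{tag}> closes <{open_tag}> opened on line {open_line}"
            )
            # Recover by unwinding to the matching open tag, if there is one.
            if tag in [name for name, _ in self.stack]:
                while self.stack and self.stack.pop()[0] != tag:
                    pass


def check_tag_balance(html: str) -> list[str]:
    """Return a list of human-readable problems; empty means balanced."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    problems = list(checker.problems)
    problems += [f"line {line}: <{tag}> never closed" for tag, line in checker.stack]
    return problems
```

An empty list means every non-void tag was closed in order. Treat hits as prompts to look at the source, not as proof the page is broken.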
Tier 3: Local HTTP 200 (server smoke)
What it catches: File exists but the server returns 404, 500, or redirect loops. Static file paths are wrong, permissions block access, or the server is not running.
Check:
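python3 -m http.server 8888 --directory /path/to/output &
sleep 1  # give the server a moment to bind before requesting
curl -sI http://localhost:8888/page.html | head -1
kill $!  # stop the background server when done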
Example:
$ cd /tmp/agent-output && python3 -m http.server 8888 &
$ curl -sI http://localhost:8888/dashboard.html | head -1
HTTP/1.0 200 OK
Why it matters: A file on disk is not a page until something serves it. Local HTTP smoke proves the file is reachable through a web server, not just the filesystem. This catches permission errors, wrong document roots, and missing index files.
Caveat: Local HTTP 200 does not prove the page is useful. It only proves the server can deliver bytes.
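The same smoke can be run in one process, which avoids a leftover background server and a fixed port. This is a sketch using only the standard library; `local_http_status` is a hypothetical helper name, and binding to port 0 (let the OS choose) is a substitution for the fixed 8888 used above.

```python
import functools
import threading
import urllib.error
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def local_http_status(directory: str, page: str) -> int:
    """Serve `directory` on a free local port and return the HTTP status for `page`."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/{page}"
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                return response.status
        except urllib.error.HTTPError as err:
            return err.code  # e.g. 404 for a wrong path
    finally:
        server.shutdown()
        server.server_close()
```

`local_http_status("/tmp/agent-output", "dashboard.html")` should return 200 when the file is reachable from that document root and 404 when the path is wrong.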
Tier 4: Screenshot non-blank and legible
What it catches: Page loads but renders blank (JS error, missing assets, CORS block), shows only "Loading..." spinners, or displays error messages where content should be.
Check:
chromium-browser --headless --screenshot=/tmp/smoke.png http://localhost:8888/page.html
file /tmp/smoke.png
identify /tmp/smoke.png # ImageMagick
Example:
$ chromium-browser --headless --window-size=1920,1080 --screenshot=/tmp/dashboard.png http://localhost:8888/dashboard.html
$ file /tmp/dashboard.png
/tmp/dashboard.png: PNG image data, 1920 x 1080, 8-bit/color RGBA
$ identify /tmp/dashboard.png | awk '{print $3, $7}'
1920x1080 245K
Manual check: Open the screenshot. Does it show real content, or a blank white page? Are there visible error messages, broken image icons, or layout collapse?
Why it matters: Headless browsers execute JavaScript, load CSS, and render the DOM. A screenshot is evidence that the page actually rendered visible content, not just that its source is syntactically valid. This catches:
- JavaScript runtime errors that halt rendering
- Missing or blocked external resources (fonts, images, scripts)
- CSS that loads but produces invisible or off-screen content
- Broken API calls or CORS failures that leave visible gaps or error states (the console log itself is not in the screenshot and needs a separate capture)
Automated non-blank check:
# Count unique colors; blank pages have very few
convert /tmp/smoke.png -format %k info:
# > 100 colors usually means real content
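The unique-color heuristic can also be expressed in Python once the screenshot's pixels are available as RGB tuples. The sketch below leaves image decoding to whichever library is on hand (that part is an assumption and not shown); `blank_report` is a hypothetical name, and the 100-color threshold is the rule of thumb from the comment above, not a calibrated value.

```python
from collections import Counter
from typing import Iterable


def blank_report(pixels: Iterable[tuple[int, int, int]], color_threshold: int = 100) -> dict:
    """Summarize a screenshot's pixels and flag it as probably blank."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return {"unique_colors": 0, "dominant_share": 1.0, "looks_blank": True}
    # Share of the image covered by the single most common color.
    dominant_share = counts.most_common(1)[0][1] / total
    return {
        "unique_colors": len(counts),
        "dominant_share": round(dominant_share, 4),
        # Same rule of thumb as above: more than ~100 colors usually means content.
        "looks_blank": len(counts) <= color_threshold,
    }
```

The dominant-color share is reported but not used in the verdict; a sparse but valid page can be mostly one background color, so it is better read by a person than enforced automatically.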
Tier 5: Link scan (internal and external)
What it catches: Broken internal links, external links that 404, or links pointing to private/localhost paths that will not work in production.
Check:
# Extract all href values
grep -oP 'href="\K[^"]+' /path/to/page.html | sort -u
# Check internal links
grep -oP 'href="\K[^"]+' page.html | grep -vE '^(https?:|mailto:|tel:|#|//)' | while read -r link; do
test -e "/path/to/site/${link%%[#?]*}" && echo "OK: $link" || echo "BROKEN: $link"
done
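# Check external links (slow; pauses between requests to stay rate-limit friendly)
grep -oP 'href="\K[^"]+' page.html | grep -E '^https?://' | sort -u | while read -r link; do
echo "$link: $(curl -sIL --max-time 10 "$link" | grep '^HTTP/' | tail -1)"
sleep 1
done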
Example:
$ grep -oP 'href="\K[^"]+' /tmp/agent-output/index.html | sort -u
about.html
contact.html
https://example.com/docs
resources/style.css
$ test -f /tmp/agent-output/resources/style.css && echo "OK" || echo "BROKEN"
BROKEN
Why it matters: Agents often generate links based on assumed directory structures. If the agent writes href="/assets/style.css" but the file actually sits next to the page as style.css, the page renders without styling. Link scans catch these structural assumptions before deployment.
Private-path scan:
# Flag links that should not be public
grep -E 'href=".*(localhost|127\.0\.0\.1|\.local|/home/|/Users/|file://)' page.html
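The grep loops only see double-quoted href values and treat every non-http link as a file. A parser-based sketch can sort links into buckets in one pass. `LinkCollector` and `classify_links` are hypothetical names, the private host and path lists mirror the scan above, and relative links are resolved against `site_root` on the assumption that the page sits at the site root.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit

PRIVATE_HOSTS = ("localhost", "127.0.0.1")
PRIVATE_SUFFIXES = (".local", ".internal")
PRIVATE_PATH_PREFIXES = ("/home/", "/Users/")


class LinkCollector(HTMLParser):
    """Collects href and src attribute values from every tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.add(value)


def classify_links(html: str, site_root: str) -> dict[str, list[str]]:
    """Sort each link into ok / broken / external / private / skipped."""
    collector = LinkCollector()
    collector.feed(html)
    result = {"ok": [], "broken": [], "external": [], "private": [], "skipped": []}
    for link in sorted(collector.links):
        parts = urlsplit(link)
        host = (parts.hostname or "").lower()
        if (parts.scheme == "file" or host in PRIVATE_HOSTS
                or host.endswith(PRIVATE_SUFFIXES)
                or parts.path.startswith(PRIVATE_PATH_PREFIXES)):
            result["private"].append(link)
        elif parts.scheme in ("http", "https") or link.startswith("//"):
            result["external"].append(link)  # verify separately over the network
        elif parts.scheme or not parts.path:
            result["skipped"].append(link)  # mailto:, tel:, bare #fragment
        else:
            target = Path(site_root) / parts.path.lstrip("/")
            result["ok" if target.exists() else "broken"].append(link)
    return result
```

Anything in `broken` or `private` is worth resolving before the page is called ready; `external` links still need a network check.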
Tier 6: Private-path and secret scan
What it catches: Agent accidentally included local filesystem paths, API keys, tokens, or internal URLs in the public output.
Check:
# Scan for common secret patterns (-i so that api_key also matches API_KEY)
grep -iE '(api[_-]?key|token|secret|password|AWS_|OPENAI_|sk-)' /path/to/page.html
# Scan for private paths
grep -E '(/home/[a-z]+|/Users/[a-z]+|C:\\Users|file:///)' /path/to/page.html
# Scan for internal URLs
grep -E '(localhost|127\.0\.0\.1|\.local|\.internal)' /path/to/page.html
Example:
$ grep -iE '(api[_-]?key|token|secret)' /tmp/agent-output/config.html
<script>const API_KEY = "sk-abc123...";</script>
Why it matters: Agents sometimes copy environment variables, config files, or debug output into generated pages. These leaks are easy to miss when skimming source but stand out in a grep scan. Expect false positives (for example, the word "token" in ordinary prose) and review each hit. Public deployment with embedded secrets is a security incident.
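The three grep scans can be folded into one pass that reports line numbers and a label per hit. The patterns below are illustrative only, modeled on the expressions above rather than on any vendor's real key format, and `scan_for_leaks` is a hypothetical name. As an added assumption, keyword hits are ignored on lines that use an obvious `YOUR_..._HERE` placeholder.

```python
import re

# Illustrative patterns only; tune them for the secrets your stack actually uses.
PATTERNS = {
    "secret-keyword": re.compile(r"(api[_-]?key|token|secret|password)\s*[:=]", re.IGNORECASE),
    "key-prefix": re.compile(r"\b(sk-[A-Za-z0-9_-]{8,}|AWS_[A-Z_]+|OPENAI_[A-Z_]+)"),
    "private-path": re.compile(r"(/home/[a-z]+|/Users/[A-Za-z]+|C:\\Users|file:///)"),
    "internal-url": re.compile(r"(localhost|127\.0\.0\.1|\.local\b|\.internal\b)"),
}
PLACEHOLDER = re.compile(r"YOUR_[A-Z_]+_HERE")


def scan_for_leaks(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, label, line) for every suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if not pattern.search(line):
                continue
            # Assumption: an obvious placeholder makes a keyword hit harmless.
            if label == "secret-keyword" and PLACEHOLDER.search(line):
                continue
            findings.append((number, label, line.strip()))
    return findings
```

A line can appear more than once if it matches several patterns. The placeholder exemption is a convenience and would hide a real secret that shares a line with a placeholder, so it is worth disabling for high-risk output.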
Safe placeholders:
If the page needs example values, use obvious placeholders:
<script>const API_KEY = "YOUR_API_KEY_HERE";</script>
Not:
<script>const API_KEY = "sk-proj-abc123xyz789";</script>
Tier 7: Public URL smoke (when applicable)
What it catches: Page works locally but fails in production due to CDN caching, HTTPS mixed-content blocks, CORS policy changes, or DNS/routing errors.
Check:
# After deployment, verify public URL
curl -sI https://example.com/page.html | head -1
curl -s https://example.com/page.html | grep -c '<title>'
Example:
$ curl -sI https://www.example.com/resources/guide/ | head -1
HTTP/2 200
$ curl -s https://www.example.com/resources/guide/ | grep -c '<title>'
1
Why it matters: Local verification does not catch production-only failures:
- Mixed content: HTTP resources blocked on HTTPS pages
- CORS: External scripts/fonts blocked by origin policy
- CDN caching: Old version served despite new deployment
- HTTPS certificate: Invalid or expired cert
- DNS: Domain not yet propagated or misconfigured
Public URL smoke is the final gate. If the page is not yet deployed, skip this tier and note "public URL unverified" in the completion metadata.
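Some production-only failures can be anticipated before deployment. Mixed content is one: any subresource requested over plain http:// is a candidate for blocking once the page is served over HTTPS. The sketch below lists such references; `find_mixed_content` is a hypothetical name, and the set of tags treated as subresources is an assumption that covers common cases, not every fetch a browser can make. Ordinary `<a href>` links are navigation and are not flagged.

```python
from html.parser import HTMLParser

# (tag, attribute) pairs that make the browser fetch a subresource.
RESOURCE_ATTRS = {
    ("script", "src"), ("img", "src"), ("iframe", "src"),
    ("audio", "src"), ("video", "src"), ("source", "src"),
}


class MixedContentFinder(HTMLParser):
    """Records subresources that are requested over plain HTTP."""

    def __init__(self) -> None:
        super().__init__()
        self.insecure: list[tuple[int, str, str]] = []  # (line, tag, url)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for name, value in attrs:
            if not value or not value.lower().startswith("http://"):
                continue
            is_stylesheet = (
                tag == "link" and name == "href"
                and "stylesheet" in (attr_map.get("rel") or "").lower()
            )
            if (tag, name) in RESOURCE_ATTRS or is_stylesheet:
                self.insecure.append((self.getpos()[0], tag, value))


def find_mixed_content(html: str) -> list[tuple[int, str, str]]:
    """Return (line, tag, url) for each http:// subresource in the page."""
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure
```

Browsers treat scripts and stylesheets more strictly than images, so a hit on `<img>` may only produce a warning. The list is still a useful prompt to switch those URLs to https:// before the public smoke.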
The minimum smoke test
For most agent-built pages, tiers 1-4 are the minimum:
- File exists and is non-empty
- HTML validates
- Local HTTP returns 200
- Screenshot shows real content
Tiers 5-7 are required when:
- Tier 5 (link scan): Page has navigation, references, or asset links
- Tier 6 (secret scan): Page was generated from config files, environment data, or code repos
- Tier 7 (public URL): Page is deployed or will be deployed without further review
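The rules above are simple enough to encode, which keeps the choice of tiers out of the agent's discretion. A minimal sketch follows; the `PageTraits` fields and the `required_tiers` function are hypothetical names chosen to mirror the three conditions listed.

```python
from dataclasses import dataclass


@dataclass
class PageTraits:
    """Facts about a page that decide which optional tiers apply."""
    has_links: bool = False               # navigation, references, or asset links
    from_sensitive_sources: bool = False  # config files, environment data, code repos
    deployed: bool = False                # live, or going live without further review


def required_tiers(traits: PageTraits) -> list[int]:
    """Tiers 1-4 always apply; 5-7 depend on the page."""
    tiers = [1, 2, 3, 4]
    if traits.has_links:
        tiers.append(5)
    if traits.from_sensitive_sources:
        tiers.append(6)
    if traits.deployed:
        tiers.append(7)
    return tiers
```

For a deployed page with navigation, `required_tiers(PageTraits(has_links=True, deployed=True))` yields `[1, 2, 3, 4, 5, 7]`.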
See minimum-smoke-checklist.md for a copy-paste checklist.
Why agents skip these (and why that is wrong)
Agents skip smoke tests because:
- The file exists, so it must work. (Tier 1 is necessary but not sufficient.)
- I generated valid HTML, so it renders. (Tier 2 does not catch runtime errors.)
- I can read the source, so I know it is correct. (Source review does not catch CSS/JS failures.)
- The user will test it. (The user hired an agent to avoid manual testing.)
Smoke tests are cheap. A 30-second screenshot check catches blank pages, broken layouts, and visible error states. Skipping it means shipping failures nobody has looked at and waiting for the user to notice.
What "ready" means
A page is ready for review when:
- It passes tiers 1-4 minimum
- It passes tier 5 if it has links
- It passes tier 6 if it was generated from sensitive sources
- It passes tier 7 if it is deployed
- The completion metadata records which tiers passed and which were skipped
A page is not ready when:
- The agent says "done" but provides no screenshot
- The file exists but no one checked if it renders
- The page works locally but no one verified the public URL
- The metadata says "verified" but lists no checks
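One way to make "ready" checkable is to have the completion metadata carry a per-tier status and to reject records that claim verification without listing checks. The record shape below (a `tiers` mapping and a `screenshot` path) is an assumed schema for illustration, not a format this resource defines, and `review_readiness` is a hypothetical name.

```python
def review_readiness(record: dict, required: set[int]) -> list[str]:
    """Return the reasons a page is not ready; an empty list means ready."""
    reasons = []
    results = record.get("tiers", {})
    if not results:
        # Catches metadata that says "verified" but lists no checks.
        reasons.append("no checks recorded")
    for tier in sorted(required):
        status = results.get(tier, "not recorded")
        if status != "passed":
            reasons.append(f"tier {tier}: {status}")
    if 4 in required and not record.get("screenshot"):
        reasons.append("no screenshot attached")
    return reasons
```

A record such as `{"tiers": {1: "passed", 2: "passed", 3: "passed", 4: "passed", 7: "skipped"}, "screenshot": "dashboard.png"}` is ready when only tiers 1-4 are required, and reports `tier 7: skipped` once the page counts as deployed.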
Tools and commands reference
| Tier | Tool | Command |
|---|---|---|
| 1 | test, wc | `test -s file.html && wc -c file.html` |
| 2 | xmllint, Python | `xmllint --html --noout file.html` |
| 3 | curl, http.server | `curl -I http://localhost:8888/page.html` |
| 4 | chromium-browser, identify | `chromium-browser --headless --screenshot=out.png URL` |
| 5 | grep, curl | `grep -oP 'href="\K[^"]+' file.html` |
| 6 | grep | `grep -E '(api_key\|token\|secret)' file.html` |
| 7 | curl | `curl -I https://example.com/page.html` |
Source and context
This resource is based on patterns from agent-built static site deployments, verification reports, and smoke-test workflows. It generalizes local verification discipline into a reusable public checklist.
Related resources:
- Agent Run Receipt Checklist (proof logging for agent work)
- Browser-Agent Safety Permission Checklist (permission boundaries for browser automation)
- Deployment Package Checklist for Static Agent Websites (export and deploy hygiene)
No private paths, credentials, or live host details are included in this resource.