Public-safe resource template for agent-assisted research
Last updated: 2026-06-27
Status: draft resource candidate; not published
Author lane: Ana content strategy
Audience
Content operators, agent builders, and small research teams who turn public archives, forums, repos, or community threads into strategy, blog posts, or public advice — and who want to avoid quote-mining, identity scraping, or unsupported claims.
If you have ever published a "hot take" based on something you skimmed in a Discord server or GitHub issue tracker, this template is for you.
Promise
By the end of this resource, you will have a repeatable source inventory workflow that:
- Separates evidence from inference from assumptions from unknowns.
- Prevents you from publishing advice built on sources you never actually inspected.
- Makes your research auditable: someone can check what you looked at, what you skipped, and why.
- Keeps you out of extractive research patterns: no participant mining, no affiliation claims, no long quotes from semi-private spaces.
Prerequisites
Before using this template, you should have:
- A defined research scope: what question are you trying to answer?
- Access to at least one public source corpus: a GitHub repo, a public archive, a forum export, an open dataset, or documentation.
- A workspace where you can save structured files (JSON, Markdown).
- A rule: if you did not inventory it, you cannot cite it.
The core idea
"Public" does not mean "yours to strip-mine." A public Discord archive is still a space where real people talked about real problems. Your job as a researcher is to extract patterns, not harvest identities or copy conversations.
A source inventory is the difference between:
- "I read 896 files across three forums, inventoried themes by byte volume, manually inspected 14 high-signal threads, and here is what I found" (research)
- "Someone on Discord said X" (gossip in a trench coat)
Step 1: Define the research scope and boundary
Before touching any source, write down:
```text
SCOPE:
  question: <What are you trying to learn?>
  audience: <Who will read the output?>
  output_type: <blog post | strategy memo | resource template | social thread>
  boundary: <What will you explicitly NOT do with this research?>
```
Template fields
| Field | Description | Example |
|---|---|---|
| question | The research question in one sentence | "What do agent builders care about most in 2026?" |
| audience | Who the output serves | "Content operators building agent-themed blogs" |
| output_type | Format of the deliverable | "Public resource template" |
| boundary | What you refuse to do | "No outreach, no identity scraping, no long quotes" |
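If you keep the scope in a file rather than in your head, a small check can refuse to start research until every field, especially `boundary`, is filled in. The sketch below is a hypothetical Python helper, not part of any tool: `ResearchScope` and `scope_problems` are illustrative names, and it assumes the four `output_type` values listed in the SCOPE template are the only formats you allow.

```python
from dataclasses import dataclass, fields

# Assumption: mirrors the output_type options listed in the SCOPE template.
ALLOWED_OUTPUT_TYPES = {"blog post", "strategy memo", "resource template", "social thread"}


@dataclass
class ResearchScope:
    question: str
    audience: str
    output_type: str
    boundary: str


def scope_problems(scope: ResearchScope) -> list[str]:
    """Return everything that blocks the research from starting; empty means ready."""
    problems = []
    for f in fields(scope):
        value = getattr(scope, f.name).strip()
        # An untouched placeholder such as "<What are you trying to learn?>" counts as empty.
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(f"{f.name} is empty or still a placeholder")
    if scope.output_type.strip().lower() not in ALLOWED_OUTPUT_TYPES:
        problems.append(f"output_type {scope.output_type!r} is not an allowed format")
    return problems


scope = ResearchScope(
    question="What do agent builders care about most in 2026?",
    audience="Content operators building agent-themed blogs",
    output_type="resource template",
    boundary="No outreach, no identity scraping, no long quotes",
)
print(scope_problems(scope))  # []
```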
Common failure mode
Scope creep: you start researching "agent memory patterns" and end up writing a market sizing report based on vibes. Write the boundary before you start, or the inventory will eat your calendar.
Step 2: Inventory what exists before reading anything
Do not start by reading. Start by counting.
What to inventory
For every source corpus, capture:
| Inventory field | What it measures |
|---|---|
| source_url | Where the corpus lives (public URL) |
| snapshot_commit | Version/commit/date you accessed |
| file_count | How many items/files/threads exist |
| total_bytes | Raw size of the corpus |
| forum_or_channel_breakdown | How items distribute across sub-groups |
| theme_tags | Regex or keyword-based theme classification |
| public_link_domains | External domains referenced (top N) |
| largest_files | The items that dominate by volume |
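The coverage fields can be computed mechanically. A minimal Python sketch, assuming the corpus snapshot is already on local disk (for example a cloned repo or an unpacked export) and that each top-level folder corresponds to one forum or channel; `coverage_stats` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def coverage_stats(root: str, top_n: int = 5) -> dict:
    """Fill file_count, total_bytes, breakdown, and largest_files for a local snapshot."""
    root_path = Path(root)
    # Only file metadata (size) is read here; no file contents are opened.
    sizes = {p: p.stat().st_size for p in root_path.rglob("*") if p.is_file()}
    breakdown = Counter()
    for path in sizes:
        parts = path.relative_to(root_path).parts
        # Assumption: the first folder is the forum/channel; top-level files go under "(root)".
        breakdown[parts[0] if len(parts) > 1 else "(root)"] += 1
    largest = sorted(sizes, key=sizes.get, reverse=True)[:top_n]
    return {
        "file_count": len(sizes),
        "total_bytes": sum(sizes.values()),
        "breakdown": [{"group": g, "files": n} for g, n in breakdown.most_common()],
        "largest_files": [
            {"path": p.relative_to(root_path).as_posix(), "bytes": sizes[p]} for p in largest
        ],
    }
```

Because it only reads file sizes, this step counts the corpus without reading a single message, which keeps "count" and "read" as separate stages.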
Why inventory first
Inventory gives you a map before you start walking. Without it, you will read the three most entertaining threads and call that "research." With it, you know whether you inspected 2% or 80% of the available signal.
Template: source-inventory.json skeleton
```json
{
  "created_utc": "<ISO timestamp>",
  "source": "<public URL>",
  "snapshot_commit": "<commit hash or date>",
  "coverage": {
    "file_count": 0,
    "total_bytes": 0,
    "breakdown": []
  },
  "theme_counts_by_file": [],
  "theme_counts_by_bytes": [],
  "top_public_link_domains": [],
  "largest_files": [],
  "scope_note": "<one sentence on what was inventoried vs. what was read>"
}
```
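Theme and link fields need the text itself. One way to fill `theme_counts_by_file`, `theme_counts_by_bytes`, and `top_public_link_domains` is sketched below, assuming the corpus is loaded as a `{path: text}` mapping. The `THEMES` regexes are placeholders to replace with your own keywords, and `theme_and_link_fields` is a hypothetical name. A file that matches two themes is credited to both, so byte totals across themes can exceed `total_bytes`.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Placeholder theme patterns (assumptions): replace with keywords that fit your corpus.
THEMES = {
    "memory": re.compile(r"\b(memory|context window|recall)\b", re.IGNORECASE),
    "tools": re.compile(r"\b(tool calls?|tool catalog|plugins?)\b", re.IGNORECASE),
}
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def theme_and_link_fields(docs: dict[str, str], top_n: int = 10) -> dict:
    """Fill the theme and link-domain fields of the inventory skeleton from {path: text}."""
    by_file, by_bytes, domains = Counter(), Counter(), Counter()
    for text in docs.values():
        size = len(text.encode("utf-8"))
        for theme, pattern in THEMES.items():
            if pattern.search(text):
                by_file[theme] += 1      # each file counts once per matching theme
                by_bytes[theme] += size  # the whole file's size is credited to each matching theme
        for url in URL_RE.findall(text):
            host = urlparse(url).hostname
            if host:
                domains[host] += 1
    return {
        "theme_counts_by_file": [{"theme": t, "files": n} for t, n in by_file.most_common()],
        "theme_counts_by_bytes": [{"theme": t, "bytes": n} for t, n in by_bytes.most_common()],
        "top_public_link_domains": [{"domain": d, "links": n} for d, n in domains.most_common(top_n)],
    }
```

Only domains are kept, not full URLs, which is usually enough to see where a community points people without copying individual links into your notes.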
Common failure mode
Inventorying everything but reading nothing. The inventory is a map, not the destination. Use it to select which items deserve manual inspection.
Step 3: Select sources for manual inspection
From your inventory, pick a small number of high-signal items to read carefully.
Selection criteria
Pick sources that are:
- Representative of a major theme cluster
- High-volume (large files or threads tend to contain more signal)
- Topically diverse (don't pick five threads about the same feature)
- Publicly linkable (can you cite this without exposing semi-private context?)
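The criteria above can be approximated in code before you apply judgment. This sketch assumes each inventoried item has already been tagged with one `theme`, a size in `bytes`, and a hand-set `linkable` flag (all hypothetical fields). It drops non-linkable items, then takes the largest remaining item from each theme in turn so no single cluster fills the reading list. `select_for_inspection` is an illustrative name.

```python
from collections import defaultdict
from itertools import zip_longest


def select_for_inspection(items: list[dict], k: int) -> list[dict]:
    """Pick up to k items, spreading picks across themes and preferring larger items.

    Each item is assumed to look like {"path": str, "theme": str, "bytes": int, "linkable": bool}.
    """
    by_theme = defaultdict(list)
    for item in items:
        if item["linkable"]:  # skip anything you could not cite without exposing semi-private context
            by_theme[item["theme"]].append(item)
    # Largest items first within each theme; themes with the most total bytes go first.
    queues = sorted(
        (sorted(group, key=lambda i: i["bytes"], reverse=True) for group in by_theme.values()),
        key=lambda group: sum(i["bytes"] for i in group),
        reverse=True,
    )
    picked = []
    # Round-robin: take the next-largest item from each theme in turn.
    for round_items in zip_longest(*queues):
        for item in round_items:
            if item is not None and len(picked) < k:
                picked.append(item)
    return picked
```

Treat the result as a shortlist to review rather than a final selection: volume is a proxy for signal, not a guarantee of it.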
Template: selected sources log
For each selected source, record:
```json
{
  "id": "S1",
  "path": "<relative path within corpus>",
  "lines_read": "<range, e.g. 1-90>",
  "evidence": "<one-sentence summary of what you found>"
}
```
Assign each source a short ID (S1, S2, S3...) so you can reference them later without repeating full paths.
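Assigning IDs in the order you log sources keeps them short and stable. A minimal sketch, assuming you record a `(path, lines_read, evidence)` tuple for each item as you finish reading it; `build_selected_log` and the example entry are hypothetical.

```python
import json


def build_selected_log(selections: list[tuple[str, str, str]]) -> list[dict]:
    """Turn (path, lines_read, evidence) tuples into log entries with IDs S1, S2, S3..."""
    log = []
    seen_paths = set()
    for n, (path, lines_read, evidence) in enumerate(selections, start=1):
        if path in seen_paths:
            raise ValueError(f"{path} is already logged; reuse its existing ID instead")
        if not evidence.strip():
            raise ValueError(f"{path}: write the one-sentence evidence summary before logging")
        seen_paths.add(path)
        log.append({"id": f"S{n}", "path": path, "lines_read": lines_read, "evidence": evidence})
    return log


# Hypothetical entry for illustration only.
log = build_selected_log([
    ("threads/context-limits.md", "1-90", "Several builders describe agents losing context in long sessions."),
])
print(json.dumps(log, indent=2))
```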
Common failure mode
Selecting sources that confirm what you already believe. If your inventory shows that 60% of the corpus is about topic X but you only read sources about topic Y, your research has a bias problem.
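A quick numeric check makes this bias visible. The sketch assumes you have files-per-theme counts from your inventory and the theme of each source you actually read; the 20% threshold is an arbitrary illustration, not a standard, and `coverage_gaps` is a hypothetical name.

```python
from collections import Counter


def coverage_gaps(inventory_files_by_theme: dict[str, int], read_themes: list[str],
                  min_share: float = 0.2) -> list[str]:
    """Themes holding at least min_share of the corpus that none of your reading covers."""
    total = sum(inventory_files_by_theme.values())
    read = Counter(read_themes)
    return [
        theme
        for theme, count in inventory_files_by_theme.items()
        if total and count / total >= min_share and read[theme] == 0
    ]


# Mirrors the failure mode above: 60% of the corpus is topic X, but every source read is topic Y.
print(coverage_gaps({"X": 60, "Y": 30, "Z": 10}, ["Y", "Y", "Y"]))  # ['X']
```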
Step 4: Separate evidence from inference from assumptions
This is the step most hot-take artists skip.
The four categories
| Category | Definition | Example |
|---|---|---|
| Evidence | Something you directly observed in a source | "S5 reports a 57-tool install injecting ~18K tokens per call" |
| Inference | A conclusion you drew from evidence | "Tool catalog size likely degrades agent performance past a threshold" |
| Assumption | Something you believe but cannot verify from sources | "This pattern generalizes to non-Hermes agent frameworks" |
| Unknown | Something you explicitly do not know | "Whether builders actually act on token overhead warnings" |
Template: evidence map
For every claim in your output, trace it:
```text
Claim: <your statement>
  evidence: <source ID + what it says>
  inference: <what you concluded>
  assumption: <what you are assuming>
  unknown: <what you cannot verify>
  confidence: <high | medium | low>
```
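The evidence map can be checked mechanically against the selected sources log. This Python sketch assumes claims cite source IDs such as `S3` and that you have the set of IDs you actually logged; `ClaimTrace` and `trace_problems` are hypothetical names, and the check only confirms that a trace is well formed, not that the inference is sound.

```python
from dataclasses import dataclass

CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass
class ClaimTrace:
    claim: str
    evidence: list[str]  # source IDs from the selected sources log, e.g. ["S3", "S4"]
    inference: str = ""
    assumption: str = ""
    unknown: str = ""
    confidence: str = "low"


def trace_problems(trace: ClaimTrace, logged_ids: set[str]) -> list[str]:
    """Structural checks only: they cannot tell whether the inference is justified."""
    problems = []
    if not trace.evidence:
        problems.append("no evidence: move this to assumptions or unknowns")
    unlogged = [sid for sid in trace.evidence if sid not in logged_ids]
    if unlogged:
        problems.append(f"cites sources that were never logged: {unlogged}")
    if trace.confidence not in CONFIDENCE_LEVELS:
        problems.append(f"confidence {trace.confidence!r} is not high, medium, or low")
    return problems


trace = ClaimTrace(
    claim="Memory and context come up repeatedly in long threads.",
    evidence=["S3", "S4", "S10"],
    inference="Context management is a recurring pain point for these builders.",
    unknown="Whether the same holds outside this corpus.",
    confidence="medium",
)
print(trace_problems(trace, logged_ids={"S1", "S3", "S4", "S10"}))  # []
```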
Common failure mode
Labeling inferences as evidence. "The community cares about X" is an inference. "S3, S4, and S10 all discuss memory/context in threads with 200+ messages" is evidence.
Step 5: Apply the public-safety filter
Before publishing anything derived from public sources, run this checklist.
Public-safety checklist
- [ ] No raw participant identifiers (Discord IDs, usernames, emails) in the public body
- [ ] No long verbatim quotes from semi-private spaces (Discord, private forums, DMs)
- [ ] No affiliation or endorsement claims (saying "inspired by" is not the same as "endorsed by")
- [ ] No private filesystem paths, credentials, or internal URLs exposed
- [ ] No participant identities turned into outreach targets
- [ ] No fake metrics or invented statistics attributed to the source
- [ ] Source citations use generalized references rather than exact wording lifted from semi-private spaces
- [ ] "What we did NOT access" section is included
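Parts of this checklist can be pre-screened with pattern matching before a human review. The regexes below are assumptions that catch common leaks (email addresses, Discord-style mentions and 17 to 20 digit IDs, home-directory paths, quoted passages over 200 characters). They will miss usernames, paraphrased quotes, and affiliation claims, so a clean scan does not tick any box on its own. `safety_findings` is a hypothetical helper.

```python
import re

# Assumed patterns for common leaks; they will miss usernames, paraphrases, and affiliation claims.
CHECKS = {
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "Discord mention or ID": re.compile(r"<@!?\d{17,20}>|\b\d{17,20}\b"),
    "private filesystem path": re.compile(r"(?:/home/|/Users/|[A-Za-z]:\\Users\\)\S*"),
    "long verbatim quote": re.compile(r'["“][^"”]{200,}["”]'),
}


def safety_findings(draft: str) -> list[tuple[str, str]]:
    """Return (check, matched text) pairs for a human to review before publishing."""
    return [
        (name, match.group(0))
        for name, pattern in CHECKS.items()
        for match in pattern.finditer(draft)
    ]
```

Findings are meant to be reviewed, not auto-deleted: a 17-digit number might be a byte count rather than a Discord ID.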
Boundary note template
Every public research output should include:
```text
BOUNDARY NOTE:
  accessed: <what sources were inspected>
  not_accessed: <what was deliberately avoided>
  not_implied: <what affiliation/endorsement is NOT being claimed>
  outreach_status: <no outreach performed | outreach approved by X on date>
```
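To make the boundary note hard to skip, a helper can refuse to render it while any field is blank. A minimal sketch: the field names come from the template above, while `render_boundary_note` and the example values are hypothetical.

```python
BOUNDARY_FIELDS = ("accessed", "not_accessed", "not_implied", "outreach_status")


def render_boundary_note(note: dict[str, str]) -> str:
    """Render the BOUNDARY NOTE block, refusing to emit it with any field left blank."""
    missing = [name for name in BOUNDARY_FIELDS if not note.get(name, "").strip()]
    if missing:
        raise ValueError(f"boundary note is missing: {', '.join(missing)}")
    lines = ["BOUNDARY NOTE:"] + [f"  {name}: {note[name].strip()}" for name in BOUNDARY_FIELDS]
    return "\n".join(lines)


# Hypothetical values for illustration only.
example_note = {
    "accessed": "One public GitHub repo snapshot; selected threads read in full",
    "not_accessed": "DMs, private channels, member profiles",
    "not_implied": "No affiliation with or endorsement by the repo maintainers",
    "outreach_status": "no outreach performed",
}
print(render_boundary_note(example_note))
```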
Common failure mode
Treating "it's public" as "it's free to use however I want." A public GitHub repo is different from a public Discord archive where people had semi-private conversations. Adjust your extraction ethics accordingly.
Step 6: Produce the output with traceable claims
Your final deliverable should make the source chain visible.
Required sections for any public research output
- Executive finding — one paragraph summarizing the direction, not the details
- Coverage and method — what was inventoried, what was read, how
- Evidence — organized by theme, each claim traced to a source ID
- Inferences — what you concluded, with confidence levels
- Assumptions — what you are assuming without proof
- Unknowns — what you explicitly cannot determine
- Risks and safety notes — what could go wrong if this research is misused
- Next tests — what would reduce uncertainty
- Boundary note — what was accessed, not accessed, and not implied
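A draft can be checked for these sections before review. This sketch assumes each section starts on its own line with its title (optionally after Markdown `#` marks); it is a heuristic, since a prose line that happens to begin with "Evidence" also counts. `missing_sections` is a hypothetical name.

```python
REQUIRED_SECTIONS = [
    "Executive finding", "Coverage and method", "Evidence", "Inferences", "Assumptions",
    "Unknowns", "Risks and safety notes", "Next tests", "Boundary note",
]


def missing_sections(draft: str) -> list[str]:
    """Required section titles that never start a line of the draft (case-insensitive)."""
    # Strip leading "#" and spaces so "## Evidence" and "Evidence" are treated alike.
    starts = [line.lstrip("# ").lower() for line in draft.splitlines()]
    return [
        title for title in REQUIRED_SECTIONS
        if not any(start.startswith(title.lower()) for start in starts)
    ]
```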
Common failure mode
Publishing the hot take without the inventory. Your audience deserves to see the work, not just the conclusion.
Common failure modes summary
| Failure mode | What goes wrong | Prevention |
|---|---|---|
| No inventory | You read three fun threads and call it research | Count before you read |
| No evidence separation | Inferences get published as facts | Use the four-category template |
| No boundary note | Readers cannot tell what you did and did not do | Include the boundary note in every output |
| Identity mining | You turn public participants into outreach targets | Strip all raw identifiers before drafting |
| Affiliation creep | "Inspired by" becomes "endorsed by" in marketing copy | Write the non-affiliation statement explicitly |
| Fake metrics | You invent numbers that "feel right" | Only cite numbers from your actual inventory |
| Scope explosion | Research becomes a book | Write the scope boundary before starting |
| Confirmation selection | You only read sources that agree with you | Compare selection against inventory theme distribution |
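The fake-metrics row lends itself to a mechanical check: every number in the draft should trace back to the inventory. A rough sketch, assuming the inventory is a JSON-serializable dict; it compares number strings only, so derived figures such as percentages will be flagged and need a human to confirm they were computed from inventory values. `unsupported_numbers` is a hypothetical name.

```python
import json
import re

# Integers with optional thousands separators and decimals, e.g. 896, 1,234,567, 0.25.
# The leading \b skips digits inside identifiers such as "S3".
NUMBER_RE = re.compile(r"\b\d+(?:,\d{3})*(?:\.\d+)?")


def unsupported_numbers(draft: str, inventory: dict) -> list[str]:
    """Numbers in the draft that do not appear anywhere in the inventory JSON."""
    known = {n.replace(",", "") for n in NUMBER_RE.findall(json.dumps(inventory))}
    found = {n.replace(",", "") for n in NUMBER_RE.findall(draft)}
    return sorted(found - known)


inventory = {"coverage": {"file_count": 896, "total_bytes": 1234567}, "manually_inspected": 14}
draft = "We inventoried 896 files (1,234,567 bytes), read 14 threads, and saw a 40% jump."
print(unsupported_numbers(draft, inventory))  # ['40']
```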
Source notes
This template was built from Ana's content queue and resource backlog (2026-06-25), which defined the audience, format, safety rules, and structural requirements for a public-source research inventory resource. The concept was inspired by the general pattern of structured source inventories used in agent-assisted research workflows — counting before reading, separating evidence from inference, and including boundary notes.
This template does not quote, reference, or expose any specific community participants, private conversations, internal paths, or external research findings. All examples and structures are original.
Using this template
- Copy the source-inventory.json skeleton into your workspace.
- Fill in the coverage fields from your target corpus.
- Select 5-15 sources for manual inspection; log them with IDs.
- Draft your output using the evidence/inference/assumption/unknown structure.
- Run the public-safety checklist.
- Include the boundary note.
- Save your source-map.json alongside the output for auditability.
If your research has no inventory, it is gossip in a trench coat. Fix that.
Last updated and provenance
- Created: 2026-06-27
- Author: Ana content strategy lane
- Primary sources: content queue item 15 and resource backlog item 15 (both 2026-06-25)
- Inspiration: structured source inventory patterns from agent-assisted research workflows
- Status: draft resource candidate; requires human review before publication
- Companion files: source-map.json, claims-checklist.md, verification.json