Source Inventory Before Hot Takes

A repeatable, public-safe workflow for turning archives, forums, repos, and community threads into strategy without quote-mining, identity scraping, or unsupported claims.

Public-safe resource template for agent-assisted research

Last updated: 2026-06-27
Status: draft resource candidate; not published
Author lane: Ana content strategy

Audience

Content operators, agent builders, and small research teams who turn public archives, forums, repos, or community threads into strategy, blog posts, or public advice — and who want to avoid quote-mining, identity scraping, or unsupported claims.

If you have ever published a "hot take" based on something you skimmed in a Discord server or GitHub issue tracker, this template is for you.

Promise

By the end of this resource, you will have a repeatable source inventory workflow that:

Separates evidence from inference from assumptions from unknowns.
Prevents you from publishing advice built on sources you never actually inspected.
Makes your research auditable: someone can check what you looked at, what you skipped, and why.
Keeps you safe from extractive research patterns: no participant mining, no affiliation claims, no long quotes from semi-private spaces.

Prerequisites

Before using this template, you should have:

A defined research scope: what question are you trying to answer?
Access to at least one public source corpus: a GitHub repo, a public archive, a forum export, an open dataset, or documentation.
A workspace where you can save structured files (JSON, Markdown).
A rule: if you did not inventory it, you cannot cite it.
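That last rule is easy to check by machine. Here is a minimal Python sketch, assuming your draft cites sources by short IDs such as S1 or S12 (the convention from Step 3) and your selected-sources log is a list of dicts with an "id" key. The function name is hypothetical. Anything it returns is a citation that points at something you never logged.

```python
import re

# Short source IDs such as S1 or S12, following the Step 3 convention.
SOURCE_ID = re.compile(r"\bS\d+\b")


def cited_but_not_inventoried(draft_text, source_log):
    """Return source IDs cited in the draft that never appear in the log."""
    cited = set(SOURCE_ID.findall(draft_text))
    logged = {entry["id"] for entry in source_log}
    return sorted(cited - logged)


log = [{"id": "S1"}, {"id": "S2"}]
draft = "Memory comes up constantly (S1, S2), and so does tool overhead (S7)."
print(cited_but_not_inventoried(draft, log))  # ['S7']
```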

The core idea

"Public" does not mean "yours to strip-mine." A public Discord archive is still a space where real people talked about real problems. Your job as a researcher is to extract patterns, not harvest identities or copy conversations.

A source inventory is the difference between:

"I read 896 files across three forums, inventoried themes by byte volume, manually inspected 14 high-signal threads, and here is what I found" (research)
"Someone on Discord said X" (gossip in a trench coat)

Step 1: Define the research scope and boundary

Before touching any source, write down:

SCOPE:
  question: <What are you trying to learn?>
  audience: <Who will read the output?>
  output_type: <blog post | strategy memo | resource template | social thread>
  boundary: <What will you explicitly NOT do with this research?>

Template fields

Field	Description	Example
`question`	The research question in one sentence	"What do agent builders care about most in 2026?"
`audience`	Who the output serves	"Content operators building agent-themed blogs"
`output_type`	Format of the deliverable	"Public resource template"
`boundary`	What you refuse to do	"No outreach, no identity scraping, no long quotes"
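If an agent or script runs the inventory for you, it helps to make the scope a record the script can check before it starts. A minimal sketch, assuming the four template fields above and the four output types listed in the SCOPE block; the class name `ResearchScope` and its `validate` method are illustrative, not part of the template.

```python
from dataclasses import dataclass, fields

# The four output types named in the SCOPE block.
OUTPUT_TYPES = {"blog post", "strategy memo", "resource template", "social thread"}


@dataclass
class ResearchScope:
    question: str
    audience: str
    output_type: str
    boundary: str

    def validate(self):
        """Return a list of problems; an empty list means the scope is usable."""
        problems = [f"{f.name} is empty" for f in fields(self)
                    if not getattr(self, f.name).strip()]
        if self.output_type.strip().lower() not in OUTPUT_TYPES:
            problems.append(f"unknown output_type: {self.output_type!r}")
        return problems


scope = ResearchScope(
    question="What do agent builders care about most in 2026?",
    audience="Content operators building agent-themed blogs",
    output_type="resource template",
    boundary="",
)
print(scope.validate())  # ['boundary is empty']
```

A script that refuses to proceed while `validate()` returns anything is a cheap way to enforce "write the boundary before you start."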

Common failure mode

Scope creep: you start researching "agent memory patterns" and end up writing a market sizing report based on vibes. Write the boundary before you start, or the inventory will eat your calendar.

Step 2: Inventory what exists before reading anything

Do not start by reading. Start by counting.

What to inventory

For every source corpus, capture:

Inventory field	What it measures
`source_url`	Where the corpus lives (public URL)
`snapshot_commit`	Version/commit/date you accessed
`file_count`	How many items/files/threads exist
`total_bytes`	Raw size of the corpus
`forum_or_channel_breakdown`	How items distribute across sub-groups
`theme_tags`	Regex or keyword-based theme classification
`public_link_domains`	External domains referenced (top N)
`largest_files`	The items that dominate by volume

Why inventory first

Inventory gives you a map before you start walking. Without it, you will read the three most entertaining threads and call that "research." With it, you know whether you inspected 2% or 80% of the available signal.
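The "2% or 80%" question is a ratio of what you read against what you inventoried. A minimal sketch, assuming you track both files and bytes read; the file figures reuse the 896/14 illustration from the core idea, and the byte figures are hypothetical. Reporting both ratios matters because a handful of huge threads can dominate byte volume while barely moving the file count.

```python
def coverage(read_files, read_bytes, file_count, total_bytes):
    """Share of the inventoried corpus that was manually inspected."""
    return {
        "by_files": read_files / file_count if file_count else 0.0,
        "by_bytes": read_bytes / total_bytes if total_bytes else 0.0,
    }


# 14 of 896 files read; the byte figures are hypothetical.
c = coverage(14, 1_200_000, 896, 6_000_000)
print(f"{c['by_files']:.1%} of files, {c['by_bytes']:.1%} of bytes")
# prints: 1.6% of files, 20.0% of bytes
```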

Template: source-inventory.json skeleton

{
  "created_utc": "<ISO timestamp>",
  "source": "<public URL>",
  "snapshot_commit": "<commit hash or date>",
  "coverage": {
    "file_count": 0,
    "total_bytes": 0,
    "breakdown": []
  },
  "theme_counts_by_file": [],
  "theme_counts_by_bytes": [],
  "top_public_link_domains": [],
  "largest_files": [],
  "scope_note": "<one sentence on what was inventoried vs. what was read>"
}
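A minimal Python sketch that fills the skeleton from a local copy of a corpus (a cloned repo or an exported archive folder), with keys following the skeleton above. Several things here are assumptions, not part of the template: each top-level folder stands in for a forum or channel, the `THEMES` regexes are placeholders, and the function names are hypothetical. Files are scanned by pattern only, which counts as inventory, not reading.

```python
import json
import re
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse

# Placeholder theme patterns; swap in the themes your scope cares about.
THEMES = {
    "memory": re.compile(r"\b(memory|context window)\b", re.IGNORECASE),
    "tools": re.compile(r"\btool(s|ing)?\b", re.IGNORECASE),
}
URL = re.compile(r"https?://[^\s)>\]\"']+")


def build_inventory(root, source, snapshot, top_n=10):
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    groups, theme_files, theme_bytes, domains = Counter(), Counter(), Counter(), Counter()
    sizes = []
    for path in files:
        rel = path.relative_to(root)
        size = path.stat().st_size
        sizes.append((size, rel.as_posix()))
        # Treat the top-level folder as the forum/channel; loose files count as ".".
        groups[rel.parts[0] if len(rel.parts) > 1 else "."] += 1
        # Pattern matching only: this counts themes, it is not manual reading.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for theme, pattern in THEMES.items():
            if pattern.search(text):
                theme_files[theme] += 1
                theme_bytes[theme] += size
        domains.update(urlparse(u).netloc.lower() for u in URL.findall(text))
    sizes.sort(reverse=True)
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "snapshot_commit": snapshot,
        "coverage": {
            "file_count": len(files),
            "total_bytes": sum(size for size, _ in sizes),
            "breakdown": groups.most_common(),
        },
        "theme_counts_by_file": theme_files.most_common(),
        "theme_counts_by_bytes": theme_bytes.most_common(),
        "top_public_link_domains": domains.most_common(top_n),
        "largest_files": [{"path": p, "bytes": s} for s, p in sizes[:top_n]],
        "scope_note": "Inventoried by file metadata and regex only; nothing read yet.",
    }


def save_inventory(inventory, out_path="source-inventory.json"):
    """Write the inventory next to your other research files."""
    Path(out_path).write_text(json.dumps(inventory, indent=2), encoding="utf-8")
```

Usage, with hypothetical arguments: `save_inventory(build_inventory("archive-export", "https://example.org/archive", "2026-06-20"))`. Edit `scope_note` by hand once you know what you actually read.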

Common failure mode

Inventorying everything but reading nothing. The inventory is a map, not the destination. Use it to select which items deserve manual inspection.

Step 3: Select sources for manual inspection

From your inventory, pick a small number of high-signal items to read carefully.

Selection criteria

Pick sources that are:

Representative of a major theme cluster
High-volume (large files or threads tend to contain more signal)
Topically diverse (don't pick five threads about the same feature)
Publicly linkable (can you cite this without exposing semi-private context?)
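The volume and diversity criteria can be partly automated; "representative" and "linkable" still need your judgment. A minimal sketch, assuming each inventoried item is a dict with "path", "bytes", a set of "themes", and a "public_linkable" flag you set by hand. It takes the largest linkable item per theme in turn, so no single theme fills the list. The function name is hypothetical.

```python
def select_for_inspection(items, limit=10):
    """Pick up to `limit` items, taking the largest remaining item per theme in turn."""
    # Build one queue per theme, largest first, ignoring items you cannot cite publicly.
    queues = {}
    for item in sorted(items, key=lambda i: i["bytes"], reverse=True):
        if not item.get("public_linkable", False):
            continue
        for theme in item["themes"]:
            queues.setdefault(theme, []).append(item)

    selected, seen = [], set()
    while len(selected) < limit and any(queues.values()):
        for theme in sorted(queues):
            queue = queues[theme]
            # Drop items already picked via another theme.
            while queue and queue[0]["path"] in seen:
                queue.pop(0)
            if queue and len(selected) < limit:
                item = queue.pop(0)
                selected.append(item)
                seen.add(item["path"])
    return selected


items = [
    {"path": "memory/big", "bytes": 900, "themes": {"memory"}, "public_linkable": True},
    {"path": "memory/medium", "bytes": 800, "themes": {"memory"}, "public_linkable": True},
    {"path": "tools/small", "bytes": 100, "themes": {"tools"}, "public_linkable": True},
    {"path": "tools/private", "bytes": 5000, "themes": {"tools"}, "public_linkable": False},
]
print([i["path"] for i in select_for_inspection(items, limit=2)])
# prints: ['memory/big', 'tools/small']
```

Note that the smaller tools thread beats the larger second memory thread, and the biggest item is skipped because it cannot be cited publicly.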

Template: selected sources log

For each selected source, record:

{
  "id": "S1",
  "path": "<relative path within corpus>",
  "lines_read": "<range, e.g. 1-90>",
  "evidence": "<one-sentence summary of what you found>"
}

Assign each source a short ID (S1, S2, S3...) so you can reference them later without repeating full paths.
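A minimal sketch for turning a reading list into log entries in the format above, assuming IDs are assigned in selection order and that you record lines read and a one-sentence finding per path as you go. Sources not yet read are marked explicitly rather than left blank. The function name and the example paths are hypothetical.

```python
import json


def build_source_log(selected_paths, lines_read, findings):
    """Assign S1, S2, ... in selection order and build one log entry per source."""
    return [
        {
            "id": f"S{n}",
            "path": path,
            # Mark unread sources explicitly instead of leaving the field blank.
            "lines_read": lines_read.get(path, "not yet read"),
            "evidence": findings.get(path, ""),
        }
        for n, path in enumerate(selected_paths, start=1)
    ]


log = build_source_log(
    ["general/thread-0412.md", "help/thread-0097.md"],
    {"general/thread-0412.md": "1-90"},
    {"general/thread-0412.md": "<one-sentence summary of what you found>"},
)
print(json.dumps(log, indent=2))
```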

Common failure mode

Selecting sources that confirm what you already believe. If your inventory shows that 60% of the corpus is about topic X but you only read sources about topic Y, your research has a bias problem.
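One way to catch this is to compare theme shares in your selection against theme shares in the inventory. A minimal sketch, assuming theme counts are plain dicts. The 0.5 tolerance (flag a theme when its share of the selection is under half its share of the corpus) is an arbitrary starting point, not a standard, and the function name is hypothetical.

```python
def underrepresented_themes(corpus_counts, selected_counts, tolerance=0.5):
    """Flag themes whose share of the selection is below tolerance x their corpus share."""
    corpus_total = sum(corpus_counts.values())
    picked_total = sum(selected_counts.values())
    if not corpus_total:
        return {}
    flagged = {}
    for theme, count in corpus_counts.items():
        corpus_share = count / corpus_total
        picked_share = selected_counts.get(theme, 0) / picked_total if picked_total else 0.0
        if picked_share < tolerance * corpus_share:
            flagged[theme] = {"corpus": round(corpus_share, 2), "selected": round(picked_share, 2)}
    return flagged


# The scenario above: 60% of the corpus is topic X, but the reading list is almost all Y.
print(underrepresented_themes({"X": 60, "Y": 40}, {"X": 1, "Y": 9}))
# prints: {'X': {'corpus': 0.6, 'selected': 0.1}}
```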

Step 4: Separate evidence from inference from assumptions

This is the step most hot-take artists skip.

The four categories

Category	Definition	Example
Evidence	Something you directly observed in a source	"S5 reports a 57-tool install injecting ~18K tokens per call"
Inference	A conclusion you drew from evidence	"Tool catalog size likely degrades agent performance past a threshold"
Assumption	Something you believe but cannot verify from sources	"This pattern generalizes to non-Hermes agent frameworks"
Unknown	Something you explicitly do not know	"Whether builders actually act on token overhead warnings"

Template: evidence map

For every claim in your output, trace it:

Claim: <your statement>
  evidence: <source ID + what it says>
  inference: <what you concluded>
  assumption: <what you are assuming>
  unknown: <what you cannot verify>
  confidence: <high | medium | low>
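The evidence map can also be a typed record, so a script can refuse claims with no traceable evidence. A minimal Python sketch, assuming evidence is stored as (source ID, what it says) pairs; the `Claim` class and its `problems` method are hypothetical. The usage line reproduces the failure mode that follows: a confident claim with nothing behind it.

```python
import re
from dataclasses import dataclass, field

SOURCE_ID = re.compile(r"S\d+")
CONFIDENCE = ("high", "medium", "low")


@dataclass
class Claim:
    statement: str
    # Each item pairs a source ID with what that source says, e.g. ("S5", "...").
    evidence: list = field(default_factory=list)
    inference: str = ""
    assumption: str = ""
    unknown: str = ""
    confidence: str = "low"

    def problems(self):
        """Return reasons this claim is not yet traceable; empty means it passes."""
        issues = []
        if self.confidence not in CONFIDENCE:
            issues.append(f"confidence must be one of {CONFIDENCE}")
        if not self.evidence:
            issues.append("no evidence: treat this as an inference or assumption")
        for source_id, _said in self.evidence:
            if not SOURCE_ID.fullmatch(source_id):
                issues.append(f"evidence must cite a source ID, not {source_id!r}")
        return issues


claim = Claim("The community cares about memory.", confidence="high")
print(claim.problems())
# prints: ['no evidence: treat this as an inference or assumption']
```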

Common failure mode

Labeling inferences as evidence. "The community cares about X" is an inference. "S3, S4, and S10 all discuss memory/context in threads with 200+ messages" is evidence.

Step 5: Apply the public-safety filter

Before publishing anything derived from public sources, run this checklist.

Public-safety checklist

[ ] No raw participant identifiers (Discord IDs, usernames, emails) in the public body
[ ] No long verbatim quotes from semi-private spaces (Discord, private forums, DMs)
[ ] No affiliation or endorsement claims (saying "inspired by" is not the same as "endorsed by")
[ ] No private filesystem paths, credentials, or internal URLs exposed
[ ] No participant identities turned into outreach targets
[ ] No fake metrics or invented statistics attributed to the source
[ ] Source citations use generalized references, not exact private-ish wording
[ ] "What we did NOT access" section is included
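A few checklist items can be pre-screened with regular expressions before a human review. A minimal sketch; the patterns are heuristics and assumptions (17 to 20 digit numbers as Discord-style IDs, @handles, emails, common home-directory paths, quotes longer than 25 words), not a complete identifier detector. Affiliation claims, outreach targeting, and invented metrics still need a person to read the draft.

```python
import re

# Heuristic patterns only; they will miss some identifiers and flag some false positives.
PATTERNS = {
    "numeric_id": re.compile(r"\b\d{17,20}\b"),  # Discord-style snowflake IDs
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "handle": re.compile(r"(?<![\w@])@\w{2,}"),
    "local_path": re.compile(r"(?:/home/|/Users/|[A-Za-z]:\\Users\\)\S+"),
}
# Straight or curly double quotes around a span of text.
QUOTE = re.compile('["\u201c]([^"\u201d]+)["\u201d]')


def safety_flags(text, max_quote_words=25):
    """Return (kind, match) pairs that a human should review before publishing."""
    flags = []
    for kind, pattern in PATTERNS.items():
        flags.extend((kind, m.group()) for m in pattern.finditer(text))
    for m in QUOTE.finditer(text):
        if len(m.group(1).split()) > max_quote_words:
            flags.append(("long_quote", m.group(1)[:40] + "..."))
    return flags


sample = "Ping @ana_dev or alex@example.com about 123456789012345678 in /home/ana/notes.txt"
print([kind for kind, _ in safety_flags(sample)])
# prints: ['numeric_id', 'email', 'handle', 'local_path']
```

An empty result means none of these patterns fired, not that the draft is safe; tick the boxes only after reading it.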

Boundary note template

Every public research output should include:

BOUNDARY NOTE:
  accessed: <what sources were inspected>
  not_accessed: <what was deliberately avoided>
  not_implied: <what affiliation/endorsement is NOT being claimed>
  outreach_status: <no outreach performed | outreach approved by X on date>
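A minimal sketch that renders the note and refuses to produce one with a blank field, assuming the four fields above. The function name `boundary_note` is hypothetical and the example values are placeholders.

```python
FIELDS = ("accessed", "not_accessed", "not_implied", "outreach_status")


def boundary_note(**values):
    """Render the boundary note; refuse if any field is missing or blank."""
    missing = [name for name in FIELDS if not str(values.get(name, "")).strip()]
    if missing:
        raise ValueError(f"boundary note incomplete: {missing}")
    return "\n".join(["BOUNDARY NOTE:"] + [f"  {name}: {values[name]}" for name in FIELDS])


note = boundary_note(
    accessed="Public repo docs and 14 public threads (S1-S14)",
    not_accessed="DMs, private channels, member profiles",
    not_implied="No affiliation with or endorsement by the community or its maintainers",
    outreach_status="no outreach performed",
)
print(note)
```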

Common failure mode

Treating "it's public" as "it's free to use however I want." A public GitHub repo is different from a public Discord archive where people had semi-private conversations. Adjust your extraction ethics accordingly.

Step 6: Produce the output with traceable claims

Your final deliverable should make the source chain visible.

Required sections for any public research output

Executive finding — one paragraph summarizing the direction, not the details
Coverage and method — what was inventoried, what was read, how
Evidence — organized by theme, each claim traced to a source ID
Inferences — what you concluded, with confidence levels
Assumptions — what you are assuming without proof
Unknowns — what you explicitly cannot determine
Risks and safety notes — what could go wrong if this research is misused
Next tests — what would reduce uncertainty
Boundary note — what was accessed, not accessed, and not implied
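A minimal sketch that checks a draft for the required sections, assuming each section title sits on its own line (optionally with Markdown # markers). It only confirms the headings exist, not that they say anything useful. The function name is hypothetical.

```python
REQUIRED_SECTIONS = [
    "Executive finding", "Coverage and method", "Evidence", "Inferences",
    "Assumptions", "Unknowns", "Risks and safety notes", "Next tests", "Boundary note",
]


def missing_sections(draft):
    """Return required section titles that do not appear as a line of their own."""
    lines = {line.strip().lstrip("#").strip().lower() for line in draft.splitlines()}
    return [title for title in REQUIRED_SECTIONS if title.lower() not in lines]


draft = "# Executive finding\n...\n## Evidence\n...\n## Boundary note\n..."
print(missing_sections(draft))
# prints: ['Coverage and method', 'Inferences', 'Assumptions', 'Unknowns',
#          'Risks and safety notes', 'Next tests']
```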

Common failure mode

Publishing the hot take without the inventory. Your audience deserves to see the work, not just the conclusion.

Common failure modes summary

Failure mode	What goes wrong	Prevention
No inventory	You read three fun threads and call it research	Count before you read
No evidence separation	Inferences get published as facts	Use the four-category template
No boundary note	Readers cannot tell what you did and did not do	Include the boundary note in every output
Identity mining	You turn public participants into outreach targets	Strip all raw identifiers before drafting
Affiliation creep	"Inspired by" becomes "endorsed by" in marketing copy	Write the non-affiliation statement explicitly
Fake metrics	You invent numbers that "feel right"	Only cite numbers from your actual inventory
Scope explosion	Research becomes a book	Write the scope boundary before starting
Confirmation selection	You only read sources that agree with you	Compare selection against inventory theme distribution
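The fake-metrics row can be pre-screened: list every number in the draft that appears nowhere in your inventory. A minimal sketch, assuming the inventory is the JSON structure from Step 2 loaded into Python; the function names are hypothetical. It is deliberately noisy. Dates, version numbers, and percentages you derived yourself will also be flagged, and each needs a manual decision.

```python
import re

# Numbers not glued to letters, so source IDs like S3 are skipped.
NUMBER = re.compile(r"(?<![\w.])\d[\d,]*(?:\.\d+)?")


def numbers_in(value):
    """Collect every numeric value anywhere in a nested inventory structure."""
    if isinstance(value, bool):
        return set()
    if isinstance(value, (int, float)):
        return {float(value)}
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, (list, tuple)):
        return set().union(*(numbers_in(v) for v in value))
    return set()


def unsupported_numbers(draft, inventory):
    """Numbers in the draft that appear nowhere in the inventory."""
    known = numbers_in(inventory)
    found = {float(n.replace(",", "")) for n in NUMBER.findall(draft)}
    return sorted(found - known)


inventory = {"coverage": {"file_count": 896, "total_bytes": 6_000_000},
             "theme_counts_by_file": [["memory", 212]]}
draft = "We inventoried 896 files; 212 mention memory, and 73% of builders agree (S3)."
print(unsupported_numbers(draft, inventory))  # [73.0]
```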

Source notes

This template was built from Ana's content queue and resource backlog (2026-06-25), which defined the audience, format, safety rules, and structural requirements for a public-source research inventory resource. The concept was inspired by the general pattern of structured source inventories used in agent-assisted research workflows — counting before reading, separating evidence from inference, and including boundary notes.

This template does not quote, reference, or expose any specific community participants, private conversations, internal paths, or external research findings. All examples and structures are original.

Using this template

Copy the source-inventory.json skeleton into your workspace.
Fill in the coverage fields from your target corpus.
Select 5-15 sources for manual inspection; log them with IDs.
Draft your output using the evidence/inference/assumption/unknown structure.
Run the public-safety checklist.
Include the boundary note.
Save your source-map.json alongside the output for auditability.

If your research has no inventory, it is gossip in a trench coat. Fix that.

Last updated and provenance

Created: 2026-06-27
Author: Ana content strategy lane
Primary sources: content queue item 15 and resource backlog item 15 (both 2026-06-25)
Inspiration: structured source inventory patterns from agent-assisted research workflows
Status: draft resource candidate; requires human review before publication
Companion files: source-map.json, claims-checklist.md, verification.json

Public-safety note: this static staged page performs no account, credential, payment, outreach, deployment, provider, gateway, DNS, service, upload, or spend actions. Spend: zero.
