Agent Platform Sunset Checklist: what to save before the builder changes under you
A platform change is a control test, not gossip. Use this checklist to work out what workflow truth you actually own outside the provider dashboard.
Evidence-labeled hook: platform churn is a control test
A public OpenAI AgentKit page included an update dated June 3, 2026 saying OpenAI is winding down Agent Builder and Evals, and that from November 30, 2026 onward they will no longer be available on the OpenAI platform.
That is not a dunk, a migration pitch, or insider knowledge. It is a dated public source. The useful question is the same one that applies to any provider-native builder: what do you own outside the dashboard?
If the answer is "not much," the fix is the same regardless of which platform changed: export your workflow truth into formats you control.
1. Workflow inventory
Why it matters
You cannot export what you cannot name. A workflow is not just a prompt; it is the full chain of triggers, inputs, tools, decisions, side effects, and stop conditions. If the inventory lives only in a dashboard, you do not own it.
Checklist
- Name every trigger: schedule, webhook, manual button, event, message, file change, or human approval.
- Name every input: data source, schema, auth, rate limit, pagination, and failure behavior.
- Name every output: what gets written, sent, deployed, published, charged, or mutated.
- Classify side effects: public post, external API call, file write, credential use, payment, notification, or state change.
- Name the stop conditions: what makes the workflow refuse, retry, escalate, or pause.
- Record dependencies: which models, providers, SDKs, browser profiles, or platform features are required.
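One way to keep the inventory outside a dashboard is a small, plain record that serializes to JSON. The sketch below is illustrative only: the `WorkflowInventory` class, its field names, and the sample workflow are assumptions made for this example, not a format any provider defines.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class WorkflowInventory:
    """Hypothetical inventory record mirroring the checklist sections above."""
    name: str
    triggers: list[str] = field(default_factory=list)
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    side_effects: list[str] = field(default_factory=list)
    stop_conditions: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the checklist sections that are still empty."""
        return [key for key, value in asdict(self).items()
                if key != "name" and not value]


# Sanitized, made-up example workflow.
inventory = WorkflowInventory(
    name="weekly-report",
    triggers=["schedule: Mondays 09:00"],
    inputs=["sanitized CSV export"],
    outputs=["draft email"],
    side_effects=["notification"],
)

print(json.dumps(asdict(inventory), indent=2))
print("Still unnamed:", inventory.missing_sections())
```

The point of `missing_sections` is to make the gaps visible: an empty list in the export is a section you have not named yet.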
2. Prompt and instruction export
Why it matters
The prompt is not the workflow. But losing it is still a problem. Provider-native builders often store instructions, guardrails, and output schemas inside their UI. If you cannot read those back as plain text, you do not own them.
Checklist
- Export the system prompt, instructions, or role definition as readable text.
- Export refusal rules, safety rules, and forbidden-action lists.
- Export output schemas, response formats, and structured-output definitions.
- Export model selection, temperature, max tokens, and other generation parameters.
- Separate provider-specific settings from portable instruction text.
- Version the export with a date and source label.
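As a rough sketch of the last two items, the export can carry a date and source label and keep provider-specific settings in their own bucket. Everything named here is an assumption for illustration: the `build_export` helper, the `PROVIDER_SPECIFIC_KEYS` set, and the decision to treat model, temperature, and max tokens as provider-specific are choices you would adjust to your own setup.

```python
import json
from datetime import date

# Assumption: these generation parameters are treated as provider-specific.
PROVIDER_SPECIFIC_KEYS = {"model", "temperature", "max_tokens"}


def build_export(instructions: str, refusal_rules: list[str], settings: dict,
                 source_label: str, exported_on: date) -> dict:
    """Split portable instruction text from provider-specific settings."""
    provider = {k: v for k, v in settings.items() if k in PROVIDER_SPECIFIC_KEYS}
    other = {k: v for k, v in settings.items() if k not in PROVIDER_SPECIFIC_KEYS}
    return {
        "exported_on": exported_on.isoformat(),
        "source_label": source_label,
        "portable": {
            "instructions": instructions,
            "refusal_rules": refusal_rules,
            "settings": other,
        },
        "provider_specific": provider,
    }


# Made-up, sanitized values.
export = build_export(
    instructions="Summarize the ticket. Never send email without approval.",
    refusal_rules=["no payments", "no public posts"],
    settings={"model": "example-model", "temperature": 0.2, "response_format": "json"},
    source_label="manual copy from builder UI",
    exported_on=date(2026, 6, 29),
)
print(json.dumps(export, indent=2, sort_keys=True))
```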
3. Eval case export
Why it matters
"It works" is not proof. Eval cases are what let you say "it works on these inputs, produces these outputs, and refuses these actions." Provider-native eval surfaces are convenient, but they are not a substitute for owning the cases: inputs, expected outputs, pass/fail rules, fixtures, edge cases, known regressions, and who decided the criteria.
Checklist
- Export or recreate the eval cases in a readable, portable format.
- Separate public-safe examples from private/customer cases.
- Include expected behavior, forbidden behavior, and acceptable uncertainty.
- Record which tools should or should not be called.
- Include negative cases: the workflow should refuse, block, or ask for approval.
- Record who set the pass/fail criteria and when.
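A portable eval case can be as simple as one JSON object per line plus a small checker. The sketch below assumes a hypothetical `EvalCase` shape and a `check_run` helper; the behavior labels ("refuse", "ask_approval", "answer") and the tool name `issue_refund` are invented for the example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EvalCase:
    """Hypothetical portable eval case; field names are illustrative."""
    case_id: str
    input_text: str
    expected_behavior: str       # e.g. "refuse", "ask_approval", "answer"
    forbidden_tools: list[str]
    criteria_owner: str
    criteria_date: str
    public_safe: bool


def check_run(case: EvalCase, observed_behavior: str,
              tools_called: list[str]) -> list[str]:
    """Return a list of failures; an empty list means the case passed."""
    failures = []
    if observed_behavior != case.expected_behavior:
        failures.append(f"expected {case.expected_behavior}, got {observed_behavior}")
    for tool in tools_called:
        if tool in case.forbidden_tools:
            failures.append(f"forbidden tool called: {tool}")
    return failures


cases = [
    EvalCase("refund-001", "Refund order 123 right now.", "ask_approval",
             ["issue_refund"], "ops lead (placeholder)", "2026-06-01", True),
    EvalCase("private-007", "[kept internal]", "refuse",
             [], "ops lead (placeholder)", "2026-06-01", False),
]

# Only public-safe cases go into the shareable JSONL export.
jsonl = "\n".join(json.dumps(asdict(c), sort_keys=True) for c in cases if c.public_safe)
print(jsonl)

# A negative case: the run answered directly and called a forbidden tool.
print(check_run(cases[0], observed_behavior="answer", tools_called=["issue_refund"]))
```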
4. Tool permission inventory
Why it matters
A workflow that depends on tools you cannot inventory is a workflow you cannot safely move. Tools include API integrations, browser access, file system access, code execution, external service calls, and internal endpoints.
Checklist
- List every tool the workflow can call.
- Record the scope and risk class for each tool: read-only, write, public post, payment, credential use, or irreversible action.
- Record whether permission is inherited from a parent agent, subagent, workspace, browser profile, OAuth grant, API key, or platform project.
- Record approval requirements before execution.
- Record the revocation and rotation path.
- Record what happens if the tool is unavailable.
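The inventory above can double as a small audit. The following sketch assumes a hypothetical `ToolRecord` shape and an assumed policy that anything beyond read-only needs approval; both are placeholders for whatever rules you actually run.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    READ_ONLY = "read-only"
    WRITE = "write"
    PUBLIC_POST = "public post"
    PAYMENT = "payment"
    CREDENTIAL_USE = "credential use"
    IRREVERSIBLE = "irreversible action"


# Assumption: every risk class except read-only requires approval.
NEEDS_APPROVAL = {risk for risk in Risk if risk is not Risk.READ_ONLY}


@dataclass(frozen=True)
class ToolRecord:
    """Hypothetical inventory row for one tool."""
    name: str
    risk: Risk
    permission_source: str    # e.g. "OAuth grant", "API key", "parent agent"
    approval_required: bool
    revocation_path: str
    if_unavailable: str


def audit(tools: list[ToolRecord]) -> list[str]:
    """Return findings for records that are inconsistent or incomplete."""
    findings = []
    for tool in tools:
        if tool.risk in NEEDS_APPROVAL and not tool.approval_required:
            findings.append(f"{tool.name}: {tool.risk.value} without approval")
        if not tool.revocation_path:
            findings.append(f"{tool.name}: no revocation path recorded")
    return findings


# Made-up tools for illustration.
tools = [
    ToolRecord("search_docs", Risk.READ_ONLY, "API key", False,
               "rotate key in secrets manager", "skip and report"),
    ToolRecord("post_update", Risk.PUBLIC_POST, "OAuth grant", False,
               "", "pause safely"),
]
for finding in audit(tools):
    print(finding)
```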
5. HITL approval payloads
Why it matters
"Approve?" is not human-in-the-loop. It is a tiny trap wearing a button. A human approval step only helps if the human can see enough to decide safely before the side effect happens. The approval payload should survive outside the platform UI because it is part of the workflow's control logic.
Checklist
For any approval-gated action, capture:
- proposed action,
- target surface,
- affected record/file/account/channel,
- expected side effect,
- spend or cost exposure,
- risk class,
- exact diff, preview, arguments, or output summary,
- rollback or undo path,
- timeout behavior,
- what "allow," "block," "modify," or "ask for more context" means,
- approval scope and expiry,
- durable approval/block reason.
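The fields above can be enforced rather than merely listed. This is a minimal sketch under stated assumptions: the field names in `REQUIRED_FIELDS`, the helper names, and the rule that an incomplete payload cannot be allowed are all invented here to illustrate the idea.

```python
from datetime import datetime, timezone

# Hypothetical field names mapped from the checklist above.
REQUIRED_FIELDS = (
    "proposed_action", "target_surface", "affected_record", "expected_side_effect",
    "cost_exposure", "risk_class", "preview", "rollback_path", "timeout_behavior",
    "decision_meanings", "approval_scope", "expires_at",
)
DECISIONS = {"allow", "block", "modify", "ask_for_more_context"}


def missing_fields(payload: dict) -> list[str]:
    """List required fields that are absent or empty in the approval payload."""
    return [name for name in REQUIRED_FIELDS
            if payload.get(name) in (None, "", [], {})]


def record_decision(payload: dict, decision: str, reason: str) -> dict:
    """Return a durable decision record; refuse 'allow' on an incomplete payload."""
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    if not reason:
        raise ValueError("a reason is required for every decision")
    gaps = missing_fields(payload)
    if decision == "allow" and gaps:
        raise ValueError(f"cannot allow, payload is missing: {gaps}")
    return {
        "decision": decision,
        "reason": reason,
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }


# Sanitized example payload with the rollback path left out on purpose.
payload = {
    "proposed_action": "send email",
    "target_surface": "team mailing list",
    "affected_record": "draft-42",
    "expected_side_effect": "one outbound message",
    "cost_exposure": 0,
    "risk_class": "notification",
    "preview": "Subject: Weekly report (draft)",
    "timeout_behavior": "block after 30 minutes",
    "decision_meanings": "allow sends as previewed; modify reopens the draft",
    "approval_scope": "this message only",
    "expires_at": "2026-06-29T18:00:00Z",
}
print(missing_fields(payload))
print(record_decision(payload, "block", "no rollback path shown")["decision"])
```

A cost of `0` counts as present, since "no spend" is itself information the approver needs.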
6. Session and checkpoint portability
Why it matters
A long-running agent workflow is held together by state: session IDs, checkpoints, file locks, partial outputs, handoffs, memory, queue state, and last verified side effects. If the state only exists inside one platform and cannot be read back, resume becomes theatre. A fallback route needs to know what already happened, what was skipped, and what must not be repeated.
Checklist
- Name every durable state type: memory, session, checkpoint, queue item, artifact, trace ID, side-effect ledger, approval record, cost receipt.
- Record where each state type is stored.
- Record what can be exported, reconstructed, or deliberately abandoned.
- Save input hash/version, output hash/version, last verified step, and side-effect IDs where appropriate.
- Use idempotency keys for external writes where possible.
- Test crash, timeout, cancellation, duplicate dispatch, and fallback resume behavior before claiming portability.
- Keep model summaries separate from verified side-effect ledgers.
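To make the idempotency-key and duplicate-dispatch items concrete, here is a small sketch. The key derivation and the in-memory `SideEffectLedger` are assumptions for illustration; a real ledger would live in durable storage you control, and the external system would ideally accept the key itself.

```python
import hashlib
import json


def idempotency_key(workflow: str, step: str, payload: dict) -> str:
    """Derive a stable key from the workflow, the step, and a canonical payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{workflow}|{step}|{canonical}".encode("utf-8"))
    return digest.hexdigest()[:32]


class SideEffectLedger:
    """In-memory stand-in for a durable ledger of verified side effects."""

    def __init__(self) -> None:
        self._done: dict[str, str] = {}

    def run_once(self, key: str, action) -> tuple[str, bool]:
        """Run the action unless the key was already recorded.

        Returns (side_effect_id, executed_now).
        """
        if key in self._done:
            return self._done[key], False
        side_effect_id = action()
        self._done[key] = side_effect_id
        return side_effect_id, True


sent = []


def send_email() -> str:
    sent.append("email")
    return "msg-001"  # stand-in for an ID returned by the external system


ledger = SideEffectLedger()
key = idempotency_key("weekly-report", "send-email",
                      {"to": "team@example.com", "week": "2026-W26"})
first = ledger.run_once(key, send_email)
second = ledger.run_once(key, send_email)  # duplicate dispatch, e.g. after a crash
print(first, second, len(sent))
```

The ledger records what was verified to have happened, which is why it stays separate from any model-written summary of the run.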
7. Traces and logs
Why it matters
A trace should help you understand what happened. It should not become a secret dumpster. Provider dashboards often make traces convenient, but your workflow needs safe evidence outside a single UI: what ran, which tools were called, what failed, what was approved, what was blocked, what it cost, and what proof exists.
Checklist
- Capture safe trace IDs, run IDs, timestamps, tool names, status, error class, retry count, and verification result.
- Do not dump full private prompts, credentials, customer data, account IDs, billing material, production URLs, or raw proprietary exports into public logs.
- Separate debug logs from public-safe failure summaries.
- Record typed errors: no action, parse error, schema validation error, forbidden action, tool execution error, provider outage, approval timeout, cost cap hit.
- Keep enough evidence to reproduce the class of failure without leaking the private payload.
- Record where raw sensitive logs live and who is allowed to access them, if they must exist at all.
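One conservative way to produce a public-safe receipt is an allowlist: copy only the fields named as safe and drop everything else. The sketch below assumes hypothetical field names and an `ErrorClass` enum built from the typed errors listed above; it is not a complete redaction strategy.

```python
from enum import Enum


class ErrorClass(Enum):
    NO_ACTION = "no_action"
    PARSE_ERROR = "parse_error"
    SCHEMA_VALIDATION_ERROR = "schema_validation_error"
    FORBIDDEN_ACTION = "forbidden_action"
    TOOL_EXECUTION_ERROR = "tool_execution_error"
    PROVIDER_OUTAGE = "provider_outage"
    APPROVAL_TIMEOUT = "approval_timeout"
    COST_CAP_HIT = "cost_cap_hit"


# Assumption: only these fields are safe to keep outside the raw debug log.
SAFE_FIELDS = ("trace_id", "run_id", "timestamp", "tool_name", "status",
               "error_class", "retry_count", "verification_result")


def safe_receipt(event: dict) -> dict:
    """Keep allowlisted fields only and insist on a typed error class."""
    receipt = {key: event[key] for key in SAFE_FIELDS if key in event}
    if "error_class" in receipt:
        # Raises ValueError if the error class is not one of the typed values.
        receipt["error_class"] = ErrorClass(receipt["error_class"]).value
    return receipt


# Made-up raw event containing material that must not reach public logs.
raw_event = {
    "trace_id": "trace-abc",
    "run_id": "run-17",
    "timestamp": "2026-06-29T10:15:00Z",
    "tool_name": "post_update",
    "status": "failed",
    "error_class": "tool_execution_error",
    "retry_count": 2,
    "prompt": "full private prompt text",
    "api_key": "not-a-real-key",
}
print(safe_receipt(raw_event))
```

An allowlist fails closed: a new sensitive field added upstream is dropped by default instead of leaking until someone notices.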
8. Sandbox and secrets boundary
Why it matters
A platform change is a bad time to discover that the workflow was depending on invisible access to files, browser sessions, OAuth grants, API keys, hosted sandboxes, or environment variables. Sandboxes are useful because they create a boundary. Secrets are useful because they unlock actions. Confusing the two is how a tidy workflow becomes a very expensive incident-shaped lesson.
Checklist
- List where code or tools execute: hosted sandbox, local machine, container, browser profile, CI worker, cloud function, VPS, or user device.
- Record what the sandbox can read and write.
- Record network access rules.
- Record filesystem boundaries and cleanup rules.
- Record which secrets exist, where they are stored, who/what can read them, and how they rotate. Do not put the secret values in the checklist.
- Record whether secrets are injected at runtime, copied into files, inherited by child agents, or exposed to tool logs.
- Record what happens when credentials expire, are revoked, or fail refresh.
- Test that blocked secrets and forbidden paths are actually blocked.
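The last item is worth automating, even in a small way. The sketch below checks only the path policy logic, using a hypothetical `is_allowed` helper and a made-up sandbox root; it does not prove that the real sandbox enforces anything, so a blocked-action test against the actual execution environment is still needed.

```python
from pathlib import Path


def is_allowed(requested: str, sandbox_root: str) -> bool:
    """Return True only if the requested path resolves inside the sandbox root."""
    root = Path(sandbox_root).resolve()
    # Joining with an absolute path discards the root, so absolute
    # requests are resolved as-is and then checked like any other path.
    target = (root / requested).resolve()
    return target == root or root in target.parents


SANDBOX_ROOT = "/tmp/sandbox"  # hypothetical root for this example

checks = {
    "reports/out.csv": True,        # inside the sandbox
    "../secrets.env": False,        # parent traversal
    "reports/../../x": False,       # traversal hidden behind a subdirectory
    "/etc/passwd": False,           # absolute path outside the root
}
for requested, expected in checks.items():
    result = is_allowed(requested, SANDBOX_ROOT)
    print(f"{requested!r}: allowed={result} (expected {expected})")
```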
9. Fallback route
Why it matters
A fallback route is not a heroic rewrite. It is the smallest safe path that preserves the useful outcome when the preferred platform path is paused, changed, or removed. The fallback may be manual. That is not embarrassing. Manual beats pretending a broken agent still works because the demo video was pretty.
Checklist
- Name the minimum outcome the workflow must preserve.
- Decide whether the fallback is manual, semi-automated, another platform, a script, a queue, a human review path, or "pause safely."
- Name which fields are required to run the fallback: instructions, eval cases, tool list, approval payloads, state, traces, artifacts, secrets boundary, and final proof.
- Name which fields are optional convenience, not survival-critical.
- Define the first fallback test using fake/sanitized data.
- Define the stop sign: if private access, legal/compliance interpretation, production risk, or unclear cost appears, stop and route to the right owner.
- Record cost and time expectations honestly.
10. Final proof receipts
Why it matters
The agent saying "done" is not proof. The dashboard saying "green" is not enough either. Final proof is what another person can inspect without reverse-engineering the whole workflow: a readback, export, test, checksum, source label, approval record, safe trace, artifact, URL, or explicit blocker.
Checklist
- Read back the exported instructions or key config.
- Run at least one smoke eval against sanitized cases.
- Verify tool permissions from an independent inventory, not just memory.
- Verify approval payload fields before any sensitive action.
- Verify checkpoint or session state can be read back or deliberately abandoned.
- Verify traces/logs exist without leaking secrets.
- Verify sandbox and secret boundaries through a blocked-action test where appropriate.
- Verify fallback route on a tiny fake/sanitized case before needing it.
- Record spend, side effects, approvals, skipped actions, blockers, and known unknowns.
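A readback with a checksum is one of the cheapest receipts to produce. The sketch below assumes a hypothetical `write_with_receipt` helper and a made-up file name; it writes an export, reads it back from disk, and records a SHA-256 that another person can recompute.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_with_receipt(path: Path, text: str, source_label: str) -> dict:
    """Write an export, read it back, and return an inspectable receipt."""
    path.write_text(text, encoding="utf-8")
    readback = path.read_text(encoding="utf-8")
    return {
        "artifact": path.name,
        "source_label": source_label,
        "sha256": hashlib.sha256(readback.encode("utf-8")).hexdigest(),
        "readback_matches": readback == text,
    }


exported_text = "Summarize the ticket. Never send email without approval.\n"

with tempfile.TemporaryDirectory() as tmp:
    receipt = write_with_receipt(
        Path(tmp) / "instructions.txt",
        exported_text,
        source_label="manual copy from builder UI",
    )

print(json.dumps(receipt, indent=2))
```

The receipt records the file name rather than the full path so it can be shared without exposing local directory layout.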
What not to do
- Do not panic-migrate. Panic rewards the loudest shortcut, not the safest route.
- Do not paste private logs publicly. A useful failure class is not the same thing as a raw trace dump.
- Do not trust provider dashboards as the only source of truth. Dashboards are useful views; they are not ownership by themselves.
- Do not promise continuity without a test. If the fallback has not run on a safe case, call it a hypothesis.
- Do not turn a sunset notice into a vendor dunk. Churn is a control test, not gossip fuel.
- Do not ask strangers on the internet to review credentials, account screenshots, production URLs, customer data, billing exports, or proprietary workflow bundles.
- Do not call this security, compliance, legal, privacy, or production-readiness approval. It is an operational self-check.
- Do not launch pricing, booking, support, or rescue claims from this checklist. Content is not demand proof.
Tiny worksheet
Use this privately. Keep examples sanitized.
Workflow name:
Owner:
Platform/builder/dashboard currently used:
Minimum useful outcome:
1. Workflow inventory
[ ] Triggers named
[ ] Inputs/outputs named
[ ] Side effects classified
[ ] Stop conditions named
Receipt location:
2. Prompt/instruction export
[ ] Instructions exported
[ ] Guardrails/refusals exported
[ ] Output schema exported
[ ] Provider-specific settings separated
Receipt location:
3. Eval case export
[ ] Smoke cases exported
[ ] Negative/refusal cases exported
[ ] Expected proof named
[ ] Private cases sanitized or kept internal
Receipt location:
4. Tool permission inventory
[ ] Tools listed
[ ] Scopes/risk classes listed
[ ] Approval requirements listed
[ ] Revocation/fallback path listed
Receipt location:
5. HITL approval payloads
[ ] Action/target/risk visible
[ ] Diff/preview/arguments visible
[ ] Rollback/undo path visible
[ ] Approval scope/expiry recorded
Receipt location:
6. Sessions/checkpoints
[ ] State types named
[ ] Storage locations named
[ ] Last verified step recorded
[ ] Resume/abandon rule named
Receipt location:
7. Traces/logs
[ ] Safe run receipt exists
[ ] Error classes are typed
[ ] Secrets/private data excluded from public logs
[ ] Raw sensitive log access is controlled, if raw logs exist
Receipt location:
8. Sandbox/secrets boundary
[ ] Execution environment named
[ ] Allowed reads/writes named
[ ] Secrets storage class named without values
[ ] Revocation/expiry behavior named
Receipt location:
9. Fallback route
[ ] Minimum fallback path named
[ ] Required inputs named
[ ] Tiny sanitized test defined
[ ] Stop signs named
Receipt location:
10. Final proof
[ ] Export/readback performed
[ ] Smoke eval performed
[ ] Tool permissions verified
[ ] Fallback tested or explicitly not tested
[ ] Spend/side effects/blockers recorded
Receipt location:
What this does not prove
Completing this checklist does not prove that a workflow is secure, compliant, private, production-ready, reliable, profitable, portable, or safe for every domain. It does not replace legal review, security review, privacy review, incident response, platform-specific migration guidance, or implementation testing.
It proves something smaller and more useful: you know what workflow truth you own outside the provider dashboard, what you do not own yet, and what test would turn a fallback claim into a receipt.
Receipts beat vibes. Especially when the builder changes under you.
Source and attribution note
This draft is grounded in public source labels and internal editorial notes: the dated public AgentKit update named above; Nylas Agentic AI Report 2026; Claude Code Agent SDK docs; Microsoft Agent Framework HITL docs and Build 2026 announcement; other public agent SDK documentation; and existing Ana receipt/approval-payload language.
Public source pages can change. Re-check the dated public hook before publication. This page does not claim endorsement by, affiliation with, private access to, or implementation validation for any provider, framework, project, or report named above.
Last updated: 2026-06-29.