I have a post-feature cleanup ritual: remove fallback code, remove dead code, update the docs. Three steps. Every time.
For a while, I ran these manually. Then I decided to automate it. I was building a Pi extension and thought: I'll just send one prompt that tells the LLM to do all three steps at once.
That was the shortcut. And it backfired.
The Problem With One Shot
The one-shot approach looks efficient. One prompt, one LLM call, three tasks done. No overhead. No complexity.
Here is what the prompt looked like:
```
Review the codebase and complete the following:
1. Remove fallback code (try/catch that swallow errors, commented stubs, etc.)
2. Remove dead code (unused exports, old feature branches, etc.)
3. Update all .md files in the repo.
```
Clean. Concise. Wrong.
The LLM ran through step 1 and step 2 without complaint. Then it stopped. No docs update. No warning. It simply moved on.
This is a well-known failure mode of one-shot prompting: when you bundle multiple tasks, the last one is the most likely to get dropped. The LLM has already spent attention on the first two tasks. It has answered the question to its satisfaction. The third task arrives as an afterthought.
The short version is this: the one-shot shortcut created more work than it saved. I had to run the cleanup again, which meant another LLM call, more tokens, more time.
What I Tried First
Before the one-shot approach, I had considered a different architecture. Multiple slash commands:
```
/cleanup-fallbacks
/cleanup-dead-code
/update-docs
```
Each command fires a separate prompt. Each one is independent.
This works. But the developer experience is bad. You have to remember to run three commands. You have to wait for each one to finish. You have to babysit the process.
The problem with slash commands for a sequential workflow is that they treat each step as equal when the workflow itself has a natural order. I did not want three commands. I wanted one.
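For contrast, the slash-command version amounts to three independent registrations. The `registerCommand` shape below is my own mock, not Pi's actual API; the point is what it lacks: any link between the steps.

```typescript
// Hypothetical command registry; the real Pi extension API may differ.
type Handler = () => Promise<void>;
const commands = new Map<string, Handler>();

function registerCommand(name: string, handler: Handler): void {
  commands.set(name, handler);
}

// Three independent commands: nothing here enforces order, and nothing
// verifies that one step finished before the user runs the next.
registerCommand("/cleanup-fallbacks", async () => { /* fallback-removal prompt */ });
registerCommand("/cleanup-dead-code", async () => { /* dead-code prompt */ });
registerCommand("/update-docs", async () => { /* docs-update prompt */ });
```

Each handler is a full, separate prompt; the sequencing lives entirely in the user's head.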
Sequential Gated Prompting
The fix is straightforward. Break the one-shot into three separate prompts. After each one, verify completion before moving to the next.
Here is the pattern I landed on:
- Send the task prompt
- Send a verification prompt: reply with exactly `STEP_DONE` or `STEP_SKIPPED` and a one-line reason
- If `STEP_SKIPPED`, retry the task
- If `STEP_DONE`, move to the next step
Each step gets a full LLM turn. Each step gates the next. The LLM cannot skip because it has to explicitly declare completion.
The verification prompt is intentionally structured. It does not say “did you finish?” in natural language. It asks for a specific token. This makes parsing trivial and prevents the LLM from drifting into a natural-language explanation that might hide an incomplete task.
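A minimal sketch of that verification exchange; the exact prompt wording and the `parseGate` helper are mine, not from the extension:

```typescript
// Hypothetical verification prompt: it asks for an exact token, not prose.
const VERIFY_PROMPT =
  "Did you fully complete the previous step? Reply with exactly " +
  "STEP_DONE or STEP_SKIPPED, plus a one-line reason.";

type GateResult = { status: "STEP_DONE" | "STEP_SKIPPED"; reason: string };

// Fail closed: anything that is not an explicit STEP_DONE counts as skipped,
// so an evasive natural-language answer cannot hide an incomplete task.
function parseGate(reply: string): GateResult {
  const status = reply.includes("STEP_DONE") ? "STEP_DONE" : "STEP_SKIPPED";
  const reason = reply.replace(/STEP_(DONE|SKIPPED)/g, "").trim();
  return { status, reason };
}
```

The fail-closed default is the design choice that matters: the burden is on the LLM to emit the exact completion token, not on the parser to interpret a paragraph.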
Here is what it looks like in code:
```typescript
async function runTask(pi: ExtensionAPI, task: { prompt: string }, retries = 3) {
  await pi.sendUserMessage(task.prompt, { deliverAs: "followUp" });
  const result = await waitForGate(); // resolves with the gate token reply
  if (result.includes("STEP_SKIPPED")) {
    if (retries === 0) return false; // cap retries so a stuck step cannot loop forever
    return runTask(pi, task, retries - 1); // retry the same task
  }
  return true;
}
```
The extension loops through all three tasks, each one independently gated.
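To make that control flow concrete, here is a standalone sketch of the loop. The `Agent` interface, the task names, and the `runGatedTask` helper are mocks of my own, standing in for the real Pi API:

```typescript
// Minimal mock of the extension surface so the control flow runs standalone;
// the real Pi API is assumed, not documented here.
type Task = { name: string; prompt: string };

interface Agent {
  send(prompt: string): Promise<void>; // deliver the task prompt
  verify(): Promise<string>;           // reply to the verification prompt
}

// One gated step: the task only counts as done on an explicit STEP_DONE.
async function runGatedTask(agent: Agent, task: Task, retries = 2): Promise<boolean> {
  await agent.send(task.prompt);
  const reply = await agent.verify();
  if (!reply.includes("STEP_DONE")) {
    if (retries === 0) return false; // give up rather than loop forever
    return runGatedTask(agent, task, retries - 1);
  }
  return true;
}

// Sequential gating: each task starts only after the previous gate passed.
async function runCleanup(agent: Agent, tasks: Task[]): Promise<string[]> {
  const completed: string[] = [];
  for (const task of tasks) {
    if (await runGatedTask(agent, task)) completed.push(task.name);
  }
  return completed;
}
```

With a scripted mock agent, a `STEP_SKIPPED` on the second task triggers a retry of that task before the third one ever starts, which is exactly the behavior the one-shot prompt could not guarantee.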
Why This Works Better
The one-shot approach treats the LLM as a reader who will do everything you ask. The sequential gated approach treats it as an agent who needs explicit completion signals.
That distinction matters because LLMs are not readers. They are generators. They will stop when they feel done, not when you are done.
The gate forces a handoff. It says: you are not done until you say you are done, and even then I am going to check.
That one decision does a lot of work.
The Takeaway
If you are building agentic workflows with multiple steps, do not bundle them into one prompt to save tokens. The token savings are illusory. You will pay for them in retries and missed steps.
Instead, build sequential gates. Each step gets attention. Each step is verified. Each step gates the next.
It is more prompts. It is more tokens per step. But it completes.
That is the trade-off that is actually worth making.