The Shortcut That Backfired: One-Shot Prompting and the Step-Skipping Problem

I have a post-feature cleanup ritual: remove fallback code, remove dead code, update the docs. Three steps. Every time.

For a while, I ran these manually. Then I decided to automate it. I was building a Pi extension and thought: I will just send one prompt that tells the LLM to do all three steps at once.

That was the shortcut. And it backfired.

The Problem With One Shot

The one-shot approach looks efficient. One prompt, one LLM call, three tasks done. No overhead. No complexity.

Here is what the prompt looked like:

Review the codebase and complete the following:
1. Remove fallback code (try/catch blocks that swallow errors, commented stubs, etc.)
2. Remove dead code (unused exports, old feature branches, etc.)
3. Update all .md files in the repo.

Clean. Concise. Wrong.

The LLM ran through step 1 and step 2 without complaint. Then it stopped. No docs update. No warning. It simply moved on.

This is a well-known failure mode of one-shot prompting: when you bundle multiple tasks, the last one is the most likely to get dropped. The LLM has already spent attention on the first two tasks. It has answered the question to its satisfaction. The third task arrives as an afterthought.

The short version is this: the one-shot shortcut created more work than it saved. I had to run the cleanup again, which meant another LLM call, more tokens, more time.

What I Tried First

Before the one-shot approach, I had considered a different architecture. Multiple slash commands:

  • /cleanup-fallbacks
  • /cleanup-dead-code
  • /update-docs

Each command fires a separate prompt. Each one is independent.

This works. But the developer experience is bad. You have to remember to run three commands. You have to wait for each one to finish. You have to babysit the process.

The problem with slash commands for a sequential workflow is that they treat each step as equal when the workflow itself has a natural order. I did not want three commands. I wanted one.

Sequential Gated Prompting

The fix is straightforward. Break the one-shot into three separate prompts. After each one, verify completion before moving to the next.

Here is the pattern I landed on:

  1. Send the task prompt
  2. Send a verification prompt: reply with exactly STEP_DONE or STEP_SKIPPED and a one-line reason
  3. If STEP_SKIPPED, retry the task
  4. If STEP_DONE, move to the next step

Each step gets a full LLM turn. Each step gates the next. The LLM cannot skip because it has to explicitly declare completion.

The verification prompt is intentionally structured. It does not say “did you finish?” in natural language. It asks for a specific token. This makes parsing trivial and prevents the LLM from drifting into a natural-language explanation that might hide an incomplete task.
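Because the reply is constrained to a token, parsing it stays mechanical. Here is a minimal sketch of what that parsing might look like; the function name and the reply shape are my own assumptions, not part of the extension:

```typescript
// Hypothetical parser for the gate reply. The verification prompt asks the
// model to answer with exactly STEP_DONE or STEP_SKIPPED plus a one-line reason.
type GateResult = { done: boolean; reason: string };

function parseGateReply(reply: string): GateResult {
  const firstLine = reply.trim().split("\n")[0];
  if (firstLine.startsWith("STEP_DONE")) {
    return { done: true, reason: firstLine.slice("STEP_DONE".length).trim() };
  }
  if (firstLine.startsWith("STEP_SKIPPED")) {
    return { done: false, reason: firstLine.slice("STEP_SKIPPED".length).trim() };
  }
  // Anything else is treated as not done, which forces a retry.
  return { done: false, reason: "unparseable gate reply" };
}
```

Treating an unparseable reply as "not done" is deliberate: the safe default is to retry, never to assume completion.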

Here is what it looks like in code:

async function runTask(pi: ExtensionAPI, task: { prompt: string }): Promise<boolean> {
  const MAX_RETRIES = 3; // cap retries so a step that never completes cannot loop forever

  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    // Deliver the task prompt as its own turn.
    await pi.sendUserMessage(task.prompt, { deliverAs: "followUp" });

    // Send the verification prompt and wait for the STEP_DONE / STEP_SKIPPED reply.
    const result = await waitForGate();
    if (result.includes("STEP_DONE")) return true;
    // STEP_SKIPPED: fall through and retry the task.
  }
  return false;
}

The extension loops through all three tasks, each one independently gated.
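The driver loop itself is simple. A sketch, with the gated step abstracted as any async function that returns true once its gate passes (the names here are illustrative, not the extension's actual API):

```typescript
// Run each task in order; a step that never gates done halts the pipeline.
type Task = { name: string; prompt: string };

async function runAll(
  tasks: Task[],
  runStep: (task: Task) => Promise<boolean>,
): Promise<string[]> {
  const completed: string[] = [];
  for (const task of tasks) {
    const ok = await runStep(task);
    if (!ok) break; // do not start the next step on an unverified one
    completed.push(task.name);
  }
  return completed;
}
```

Stopping the pipeline on failure matters: updating the docs before the dead code is actually gone would document the wrong state of the repo.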

Why This Works Better

The one-shot approach treats the LLM as a reader who will do everything you ask. The sequential gated approach treats it as an agent who needs explicit completion signals.

That distinction matters because LLMs are not readers. They are generators. They will stop when they feel done, not when you are done.

The gate forces a handoff. It says: you are not done until you say you are done, and even then I am going to check.

That one decision does a lot of work.

The Takeaway

If you are building agentic workflows with multiple steps, do not bundle them into one prompt to save tokens. The token savings are illusory. You will pay for them in retries and missed steps.

Instead, build sequential gates. Each step gets attention. Each step is verified. Each step gates the next.

It is more prompts. It is more tokens per step. But it completes.

That is the trade-off that is actually worth making.