I have a repeatable post-feature cleanup sequence:
- remove fallback code
- remove dead code
- update docs
I wanted this workflow often, so I tried creating a Pi agent for it.
Pi itself initially suggested creating a separate slash command for each step. I did not like that: I did not want to remember and run three different slash commands every time I finished a feature.
So I tried the obvious shortcut instead: one extension command with all three cleanup steps packed into a single prompt.
It backfired.
The model handled the first parts, then quietly failed to give the last step the same attention. The run looked complete. It was not complete.
That is the annoying version of failure in agent workflows. Not a crash. Not a refusal. A plausible-looking partial success.
Why One-Shot Looked Fine
One-shot looked efficient on paper:
- one user message
- one model turn
- one cleanup command
The original shape was simple enough:
Review the codebase and complete the following:
1) Remove fallback code
2) Remove dead code
3) Update all .md files in the repo
Nothing about that prompt looks obviously broken.
That framing matters. The problem was not clarity. The problem was asking one turn to carry multiple state transitions with no explicit handoff between them.
The model could appear to do the work without ever making it legible which steps were actually completed and which ones were implicitly dropped.
What The Extension Does Now
The extension now runs three bounded tasks in sequence:
fallback → dead-code → docs
Each task gets its own model turn.
Each turn also ends with an explicit gate:
Reply with exactly one of:
- STEP_DONE - if you completed the task above
- STEP_SKIPPED - if you skipped or did nothing
Then a brief one-line reason.
That is the important change.
The runtime no longer treats cleanup as one large prompt. It treats it as a series of small controlled transitions.
Why The Gate Matters
The extension is not trying to force every step to produce edits.
It is trying to force every step to produce an outcome.
That is a better contract.
For each task, the assistant must say one of two things:
- STEP_DONE
- STEP_SKIPPED
If the reply does not contain a valid gate result, the extension retries the same task. Up to three attempts.
The Pi extension snippet that finally worked implements sequential gated prompting:
async function runTask(pi, ctx, task, attempt = 1) {
  // Wait until no turn is in flight before sending anything.
  await ctx.waitForIdle();
  // Snapshot the current branch so new entries can be detected later.
  const seenEntryIds = new Set(
    ctx.sessionManager.getBranch().map((entry) => entry.id),
  );
  // Send one bounded instruction for this task only.
  pi.sendUserMessage(buildInstruction(task, attempt));
  await waitForTurnToStart(ctx, seenEntryIds);
  await ctx.waitForIdle();
  // Read only the assistant reply produced by this turn.
  const reply = getNewAssistantReply(ctx, seenEntryIds);
  const result = parseTaskResult(reply);
  if (result === "done" || result === "skipped") {
    return result;
  }
  if (attempt >= MAX_ATTEMPTS) {
    return "failed";
  }
  // Malformed or missing gate result: retry the same task.
  return runTask(pi, ctx, task, attempt + 1);
}
So the retry loop is not “retry until the model edits files.”
It is “retry until the model gives a valid explicit outcome.”
That one decision does a lot of work.
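To make the sequence concrete, here is a hypothetical driver that runs the three tasks through a gated runner like runTask. The runner is passed in as a parameter so the sketch stays self-contained; runCleanup and its parameter names are mine, not the extension's:

```javascript
// Hypothetical driver (not the extension's actual code): run each cleanup
// task in order through a gated runner and collect its explicit outcome.
async function runCleanup(runTask, pi, ctx, tasks = ["fallback", "dead-code", "docs"]) {
  const outcomes = {};
  for (const task of tasks) {
    // Each task resolves to "done", "skipped", or "failed" -- never silence.
    outcomes[task] = await runTask(pi, ctx, task);
  }
  return outcomes;
}
```

The point of the shape is that the loop never advances on an ambiguous reply: every task leaves behind a named outcome you can log or surface in the UI.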
The Real Failure Mode
The original problem was not just that docs were sometimes skipped.
It was that the skip could be silent.
A silent skip is worse than an explicit skip. At least STEP_SKIPPED is observable. You can see it, log it, report it in UI, and decide whether that outcome is acceptable.
The extension now makes that distinction explicit:
- done is valid
- skipped is valid
- malformed or missing gate result is not valid
That gives the system a much cleaner control model.
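A gate parser with exactly that contract fits in a few lines. This is a sketch of how a parseTaskResult like the one in the snippet above might be implemented (my reconstruction; the extension's actual parsing may differ):

```javascript
// Sketch of a gate parser (reconstruction, not the extension's exact code).
// Accepts the reply only if it contains exactly one of the two markers.
function parseTaskResult(reply) {
  const text = (reply ?? "").toUpperCase();
  const done = text.includes("STEP_DONE");
  const skipped = text.includes("STEP_SKIPPED");
  if (done && !skipped) return "done";
  if (skipped && !done) return "skipped";
  return null; // malformed, missing, or ambiguous -> retry the task
}
```

Note that a reply containing both markers is treated as malformed, which matches the contract: one explicit outcome per turn, nothing ambiguous.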
The Shape Of The Runtime
The core loop is small, but it is more disciplined than the original version.
For each task, the extension:
- waits for the session to be idle
- snapshots the existing branch entry IDs
- sends one bounded instruction for the current task
- waits for the turn to start
- waits for the session to return to idle
- reads the new assistant reply only
- parses the gate result
- retries if the gate result is malformed or missing
That structure matters more than the prompt wording by itself.
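The snapshot-then-wait step is what keeps each turn isolated. A plausible shape for waitForTurnToStart, assuming getBranch returns the current session entries (a sketch of the pattern, not the extension's exact code):

```javascript
// Sketch (assumed shape, not the extension's exact code): poll the session
// branch until it contains an entry that was not in the pre-send snapshot.
async function waitForTurnToStart(ctx, seenEntryIds, { intervalMs = 100, timeoutMs = 30000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const branch = ctx.sessionManager.getBranch();
    // Any entry ID outside the snapshot means the new turn has begun.
    if (branch.some((entry) => !seenEntryIds.has(entry.id))) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("turn did not start before timeout");
}
```

Because the snapshot is taken before sending the instruction, the later "read the new assistant reply only" step can filter on the same ID set instead of guessing which message belongs to which task.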
To understand why the extension is reliable, look at what it is actually enforcing. It is not enforcing edits. It is enforcing explicit state.
Why This Wins In Practice
LLMs are generators, not transactional systems.
If you pack multiple cleanup phases into one prompt, the model has to infer sequencing, completion semantics, and stopping conditions all at once. That is where workflow drift starts.
Sequential gated prompting reduces that drift by making each step narrow and observable.
The trade-off is honest:
- more turns
- slightly more tokens
- clearer state transitions
- no silent terminal-step disappearance
And the extension adds one more useful distinction: skipped is not failure.
That matters because sometimes there really is nothing to change in one phase. If docs are already current, STEP_SKIPPED is the correct answer. The system should accept that and move on.
What it should not accept is ambiguity.
Why I Like This Pattern
A lot of agent failures come from pretending natural language alone is a workflow engine. It is not.
If a process has order, bounded phases, and different valid outcomes, you need explicit transitions between those phases.
In this case the valid outcomes are simple:
- done
- skipped
- failed to provide a valid result after retries
That is enough structure to make the automation trustworthy without overbuilding it.
The Takeaway
If your workflow depends on order, do not hide multiple state transitions inside one prompt.
Break the work into bounded steps. Require an explicit outcome for each one. Retry ambiguity, not honest skips.
That is the small shift that made this extension actually reliable.