Step-Runner

A graph-based agent that executes one step of a phased implementation plan. The step protocol from the step-implementation skill is enforced as graph edges rather than prose. Sisyphus is expected to delegate to this agent, which in turn delegates implementation to Coder and independent review to code-reviewer.

It expects a plan repo authored per the plan-authoring skill:

plans/
  steps/NN-<slug>.md    # step plans with frontmatter (step/title/depends_on/status)
  handoffs/NN-<slug>.md # written by this agent, validated by a deterministic gate
  NOTES.md              # rolling durable facts
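
As a rough illustration of the depends_on check that resolve_step performs, the sketch below parses step frontmatter and reports dependencies that are not yet complete. It assumes a `---`-delimited frontmatter block with `key: value` lines and an inline list for depends_on; the README only names the four fields, so the exact syntax, the helper names (`parse_frontmatter`, `unsatisfied_deps`) and the sample plans are hypothetical.

```python
def parse_frontmatter(text):
    """Parse a minimal '---'-delimited frontmatter block into a dict.

    Assumption: plain 'key: value' lines and inline lists like '[1, 2]'.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list; numeric items become ints.
            items = [v.strip() for v in value[1:-1].split(",") if v.strip()]
            meta[key] = [int(v) if v.isdigit() else v for v in items]
        elif value.isdigit():
            meta[key] = int(value)
        else:
            meta[key] = value
    return meta


def unsatisfied_deps(step_meta, all_steps):
    """Return the depends_on entries whose plan is not marked complete."""
    status_by_step = {m["step"]: m.get("status") for m in all_steps}
    return [d for d in step_meta.get("depends_on", [])
            if status_by_step.get(d) != "complete"]


# Hypothetical step plans, for illustration only.
STEP_1 = "---\nstep: 1\ntitle: Schema\ndepends_on: []\nstatus: complete\n---\nBody"
STEP_2 = "---\nstep: 2\ntitle: API\ndepends_on: [1]\nstatus: pending\n---\nBody"
STEP_3 = "---\nstep: 3\ntitle: UI\ndepends_on: [1, 2]\nstatus: pending\n---\nBody"

steps = [parse_frontmatter(t) for t in (STEP_1, STEP_2, STEP_3)]
print(unsatisfied_deps(steps[1], steps))  # step 2 depends only on completed step 1
print(unsatisfied_deps(steps[2], steps))  # step 3 still waits on step 2
```

In this sketch a non-empty result corresponds to the gate_blocked path in the workflow below.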

Workflow

resolve_step (script)         locate plan + previous handoff, check depends_on,
        ↓                     mark plan in-progress   [→ gate_blocked if deps unsatisfied]
orient (llm, read-only)       merge handoff directives + staleness-check the plan
        ↓
route_staleness (script)      major deviation → gate_deviation (approval)
        ↓
implement (agent → coder)     coder runs its own build/test/self-review fix-loop
        ↓
route_coder_result (script)   COMPLETE → verify | REJECTED / FAILED → end
        ↓
verify_format_lint (script)   format BEFORE evidence, then lint
verify_build (script)         step-level build/typecheck
verify_tests (script)         FULL test suite
        ↓                     [failures → fix_loop_gate, back-edge to implement]
edge_case_sweep (llm)         missed edge cases; annotate downstream plans
        ↓                     (Edge cases sections ONLY - scope changes become proposals)
route_sweep (script)          5+ files or architectural boundary → independent_review
independent_review (agent)    code-reviewer; 🔴 findings loop back to implement (bounded)
        ↓
write_handoff (llm)           evidence-backed handoff per handoff-protocol + NOTES.md
check_handoff (script)        deterministic schema gate; marks plan status complete
        ↓
gate_user_review (approval)   HARD STOP - approve, or send revision comments
        ↓                     (revisions loop through implement → verify → handoff again)
end_success / end_blocked / end_rejected / end_failure
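
The two script routers in the diagram above can be pictured as small pure functions. This is a sketch under stated assumptions, not the agent's actual scripts: the function signatures are hypothetical, and sending the no-review path straight to write_handoff is an inference from the diagram.

```python
def route_coder_result(result):
    """Map the coder's reported result to the next node name."""
    routes = {
        "COMPLETE": "verify_format_lint",
        "REJECTED": "end_rejected",
        "FAILED": "end_failure",
    }
    if result not in routes:
        # Assumption: an unknown result is treated as a hard error.
        raise ValueError(f"unexpected coder result: {result!r}")
    return routes[result]


def route_sweep(files_changed, crosses_boundary, skip_review=False):
    """Decide whether the independent reviewer runs before the handoff."""
    if skip_review:  # mirrors STEP_SKIP_REVIEW=1
        return "write_handoff"
    if files_changed >= 5 or crosses_boundary:
        return "independent_review"
    return "write_handoff"


print(route_coder_result("COMPLETE"))
print(route_sweep(files_changed=6, crosses_boundary=False))
print(route_sweep(files_changed=2, crosses_boundary=False))
```

Keeping the routing in deterministic functions like these, rather than in a prompt, is what lets the protocol be enforced as graph edges.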

End nodes emit sentinel outcomes for the caller:

STEP_COMPLETE — step implemented, verified, handoff written, user approved.
STEP_BLOCKED — depends_on unsatisfied and the user declined to proceed.
STEP_REJECTED — user aborted at the deviation gate, or the coder's plan was rejected at its approval gate.
STEP_FAILED — coder failed, the step-level fix budget was exhausted, or the handoff failed validation twice.
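
A caller might pick the outcome out of the agent's final output along these lines. The sketch assumes the sentinel appears as a literal token somewhere in the output text and that the last occurrence wins; neither detail is stated above, and `extract_outcome` is a hypothetical helper.

```python
import re

SENTINELS = ("STEP_COMPLETE", "STEP_BLOCKED", "STEP_REJECTED", "STEP_FAILED")
_PATTERN = re.compile(r"\b(" + "|".join(SENTINELS) + r")\b")


def extract_outcome(output):
    """Return the last sentinel found in the output, or None if absent."""
    matches = _PATTERN.findall(output)
    return matches[-1] if matches else None


# Hypothetical output text, for illustration only.
sample = "Handoff written and validated.\nSTEP_COMPLETE\n"
print(extract_outcome(sample))
print(extract_outcome("no sentinel in this text"))
```

Returning None for a missing sentinel lets the caller treat an unexpected exit separately from the four defined outcomes.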

Usage

# From the project root: run the next in-progress/pending step
coyote -a step-runner "Execute the next step"

# A specific step (also parsed from the prompt: "execute step 3")
coyote -a step-runner --agent-variable step 3 "Execute step 3"

# Plan repo somewhere else
coyote -a step-runner --agent-variable plans_dir docs/plans "Execute the next step"

Invoke from the project root. The coder sub-agent resolves its own project_dir from the invocation directory; overriding project_dir here does not propagate to the spawned coder.

Tuning

graph.yaml initial_state exposes:

max_fix_attempts (default 2) — step-level fix budget (the coder has its own internal budget of 3).
max_review_attempts (default 1) — bounded 🔴-finding fix loops after independent review.
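
One way to read the step-level fix budget is as a bound on how many times a verification failure may loop back to implement. The sketch below illustrates that reading only; how the real graph counts attempts is an assumption, and `run_with_fix_budget` and the stub callables are hypothetical.

```python
def run_with_fix_budget(implement, verify, max_fix_attempts=2):
    """Run implement, then re-run it after each failed verify, up to the budget.

    Returns (outcome, fixes_used).
    """
    implement()
    fixes_used = 0
    while not verify():
        if fixes_used >= max_fix_attempts:
            return "STEP_FAILED", fixes_used
        fixes_used += 1
        implement()
    return "verified", fixes_used


def make_verify(results):
    """Build a stub verify() that replays a fixed sequence of results."""
    it = iter(results)
    return lambda: next(it)


calls = []
outcome = run_with_fix_budget(lambda: calls.append("implement"),
                              make_verify([False, False, True]))
print(outcome, len(calls))

exhausted = run_with_fix_budget(lambda: None,
                                make_verify([False, False, False]))
print(exhausted)
```

Under this reading, the default budget of 2 allows at most three implement passes before the step ends in failure.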

Environment overrides honored by the script nodes:

FORMAT_CMD / LINT_CMD — formatting and linting (otherwise a per-type heuristic formats, and linting defers to the build/check command).
BUILD_CMD / TEST_CMD — skip project-type detection (same as coder).
STEP_AUTOAPPROVE=1 — bypass the deviation gate (non-interactive runs).
STEP_SKIP_REVIEW=1 — never spawn the independent reviewer.

No environment variable bypasses the final user approval gate; that gate is the point of the workflow.
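
The override behaviour described above could look roughly like this inside a script node. It is a minimal sketch: the helper names and the "detected" fallback commands are hypothetical, and treating only the exact value `1` as enabling a flag is an assumption.

```python
import os


def resolve_command(name, detected, env=None):
    """Prefer an environment override such as BUILD_CMD over the detected command."""
    env = os.environ if env is None else env
    override = env.get(name, "").strip()
    return override or detected


def flag_enabled(name, env=None):
    """Treat a flag such as STEP_AUTOAPPROVE as on only when it equals '1'."""
    env = os.environ if env is None else env
    return env.get(name) == "1"


# Hypothetical environment, for illustration only.
fake_env = {"TEST_CMD": "make test", "STEP_AUTOAPPROVE": "1"}
print(resolve_command("TEST_CMD", "detected-test-command", fake_env))
print(resolve_command("BUILD_CMD", "detected-build-command", fake_env))
print(flag_enabled("STEP_AUTOAPPROVE", fake_env))
print(flag_enabled("STEP_SKIP_REVIEW", fake_env))
```

Passing the environment as a parameter keeps the helpers easy to test without modifying the real process environment.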
