94 lines
4.0 KiB
Markdown
94 lines
4.0 KiB
Markdown
# Step-Runner
|
|
|
|
A graph-based agent that executes **one step** of a phased implementation
|
|
plan, with the step protocol from the `step-implementation` skill enforced
|
|
as graph edges rather than prose. Designed to be delegated to by
|
|
**[Sisyphus](../sisyphus/README.md)**; delegates implementation to
|
|
**[Coder](../coder/README.md)** and independent review to
|
|
**[code-reviewer](../code-reviewer/README.md)**.
|
|
|
|
It expects a plan repo authored per the `plan-authoring` skill:
|
|
|
|
```
|
|
plans/
|
|
steps/NN-<slug>.md # step plans with frontmatter (step/title/depends_on/status)
|
|
handoffs/NN-<slug>.md # written by this agent, validated by a deterministic gate
|
|
NOTES.md # rolling durable facts
|
|
```
|
|
|
|
## Workflow
|
|
|
|
```
|
|
resolve_step (script) locate plan + previous handoff, check depends_on,
|
|
↓ mark plan in-progress [→ gate_blocked if deps unsatisfied]
|
|
orient (llm, read-only) merge handoff directives + staleness-check the plan
|
|
↓
|
|
route_staleness (script) major deviation → gate_deviation (approval)
|
|
↓
|
|
implement (agent → coder) coder runs its own build/test/self-review fix-loop
|
|
↓
|
|
route_coder_result (script) COMPLETE → verify | REJECTED / FAILED → end
|
|
↓
|
|
verify_format_lint (script) format BEFORE evidence, then lint
|
|
verify_build (script) step-level build/typecheck
|
|
verify_tests (script) FULL test suite
|
|
↓ [failures → fix_loop_gate, back-edge to implement]
|
|
edge_case_sweep (llm) missed edge cases; annotate downstream plans
|
|
↓ (Edge cases sections ONLY - scope changes become proposals)
|
|
route_sweep (script) 5+ files or architectural boundary → independent_review
|
|
independent_review (agent) code-reviewer; 🔴 findings loop back to implement (bounded)
|
|
↓
|
|
write_handoff (llm) evidence-backed handoff per handoff-protocol + NOTES.md
|
|
check_handoff (script) deterministic schema gate; marks plan status complete
|
|
↓
|
|
gate_user_review (approval) HARD STOP - approve, or send revision comments
|
|
↓ (revisions loop through implement → verify → handoff again)
|
|
end_success / end_blocked / end_rejected / end_failure
|
|
```
|
|
|
|
End nodes emit sentinel outcomes for the caller:
|
|
|
|
- `STEP_COMPLETE` — step implemented, verified, handoff written, user approved.
|
|
- `STEP_BLOCKED` — `depends_on` unsatisfied and the user declined to proceed.
|
|
- `STEP_REJECTED` — user aborted at the deviation gate, or the coder's plan
|
|
was rejected at its approval gate.
|
|
- `STEP_FAILED` — coder failed, the step-level fix budget was exhausted, or
|
|
the handoff failed validation twice.
|
|
|
|
## Usage
|
|
|
|
```sh
|
|
# From the project root: run the next in-progress/pending step
|
|
coyote -a step-runner "Execute the next step"
|
|
|
|
# A specific step (also parsed from the prompt: "execute step 3")
|
|
coyote -a step-runner --agent-variable step 3 "Execute step 3"
|
|
|
|
# Plan repo somewhere else
|
|
coyote -a step-runner --agent-variable plans_dir docs/plans "Execute the next step"
|
|
```
|
|
|
|
**Invoke from the project root.** The coder sub-agent resolves its own
|
|
`project_dir` from the invocation directory; overriding `project_dir` here
|
|
does not propagate to the spawned coder.
|
|
|
|
## Tuning
|
|
|
|
`graph.yaml` `initial_state` exposes:
|
|
|
|
- `max_fix_attempts` (default `2`) — step-level fix budget (the coder has
|
|
its own internal budget of 3).
|
|
- `max_review_attempts` (default `1`) — bounded 🔴-finding fix loops after
|
|
independent review.
|
|
|
|
Environment overrides honored by the script nodes:
|
|
|
|
- `FORMAT_CMD` / `LINT_CMD` — formatting and linting (otherwise a per-type
|
|
heuristic formats, and linting defers to the build/check command).
|
|
- `BUILD_CMD` / `TEST_CMD` — skip project-type detection (same as coder).
|
|
- `STEP_AUTOAPPROVE=1` — bypass the deviation gate (non-interactive runs).
|
|
- `STEP_SKIP_REVIEW=1` — never spawn the independent reviewer.
|
|
|
|
The final user approval gate is never bypassed by an environment variable -
|
|
it is the point of the workflow.
|