feat: Create the step-runner graph agent, which enforces the step protocol as graph edges for more deterministic coding workflows

2026-07-04 12:50:37 -06:00
parent 159afbbc06
commit 9d2e936e7f
15 changed files with 1333 additions and 0 deletions
+5
@@ -132,6 +132,7 @@ instructions: |
| `librarian` | Find official docs, OSS examples, web best practices for EXTERNAL libraries | Read-only, returns citation-backed findings, fan out 1-3 in parallel |
| `coder` | Write/edit files, implement features | Graph agent: plan → approval → implement → verify build+tests → self_review → bounded fix-loop |
| `oracle` | Architecture, complex debugging, review, plan review | Advisory, blocking — never answer the user before collecting Oracle results |
| `step-runner` | Execute ONE step of a phased plan repo (Phase 8) | Graph agent: orient → staleness check → coder → verify → handoff → user approval gate |
### When to fire `librarian` (external grep) vs `explore` (internal grep)
@@ -333,6 +334,10 @@ instructions: |
### Execution lifecycle (one step at a time)
**Default: delegate the whole step to `step-runner`** — a graph agent that enforces the step protocol as graph edges (orient → staleness check → coder → verify → edge-case sweep → optional independent review → validated handoff → user approval gate): `agent__spawn --agent step-runner --prompt "Execute step <N> of the plan at <plans_dir>"`. It returns `STEP_COMPLETE` / `STEP_BLOCKED` / `STEP_REJECTED` / `STEP_FAILED`. Relay its escalations (deviation gate, approval gate) promptly. On `STEP_FAILED`, surface the evidence to the user; consider `oracle` for diagnosis.
Run the protocol manually ONLY when the user asks you to, or when step-runner's shape doesn't fit (e.g. a docs-only step with nothing to build). Then:
1. Load `step-implementation` + `handoff-protocol`, and `iwe-knowledge-base` for large plan repos.
2. Follow the step protocol phase by phase: orient (previous handoff + `NOTES.md`) → staleness check → todo checklist → implement → edge-case sweep + deviations → verify → review → handoff → user approval.
3. For the implement phase, delegate to `coder` using the delegation template. Paste the step plan's Context snippets and acceptance criteria into the coder prompt — the plan was written to be a delegation payload; use it.
+93
@@ -0,0 +1,93 @@
# Step-Runner
A graph-based agent that executes **one step** of a phased implementation
plan, with the step protocol from the `step-implementation` skill enforced
as graph edges rather than prose. Designed to be delegated to by
**[Sisyphus](../sisyphus/README.md)**; delegates implementation to
**[Coder](../coder/README.md)** and independent review to
**[code-reviewer](../code-reviewer/README.md)**.
It expects a plan repo authored per the `plan-authoring` skill:
```
plans/
steps/NN-<slug>.md # step plans with frontmatter (step/title/depends_on/status)
handoffs/NN-<slug>.md # written by this agent, validated by a deterministic gate
NOTES.md # rolling durable facts
```
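For a quick local trial, the layout can be scaffolded by hand. The sketch below is hypothetical: the slug `01-setup`, the title, and the section headings are placeholders (the headings are the ones step-runner's prompts refer to; the `plan-authoring` skill owns the real template). `set_status` reuses the same `awk` rewrite that `resolve_step.sh` and `check_handoff.sh` apply, and like them it only replaces an existing `status:` line inside the frontmatter; it never adds one.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Create <root>/steps, <root>/handoffs, NOTES.md, and one pending step plan.
# Slug, title, and body text are placeholders, not a real plan.
scaffold_plans() {
  local root="${1:-plans}"
  mkdir -p "$root/steps" "$root/handoffs"
  cat > "$root/steps/01-setup.md" <<'EOF'
---
step: 1
title: Project setup
depends_on: []
status: pending
---
## Objective
TODO

## Tasks
TODO

## Acceptance criteria
TODO

## Edge cases
TODO

## Out of scope
TODO
EOF
  # Rolling durable facts start out empty.
  touch "$root/NOTES.md"
}

# Rewrite the status: line inside the frontmatter (between the first two ---).
set_status() {
  local file="$1" status="$2" tmp
  tmp=$(mktemp)
  awk -v s="$status" '/^---[[:space:]]*$/{n++; print; next}
    n==1 && /^status:/{print "status: " s; next} {print}' "$file" > "$tmp"
  mv "$tmp" "$file"
}
```

Run `scaffold_plans plans` from the project root; with no `in-progress` plan present, `resolve_step` picks the first plan whose status is `pending` (or missing), so "Execute the next step" resolves to `01-setup.md`.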
## Workflow
```
resolve_step        (script)          locate plan + previous handoff, check depends_on,
  ↓                                   mark plan in-progress [→ gate_blocked if deps unsatisfied]
orient              (llm, read-only)  merge handoff directives + staleness-check the plan
route_staleness     (script)          major deviation → gate_deviation (approval)
implement           (agent → coder)   coder runs its own build/test/self-review fix-loop
route_coder_result  (script)          COMPLETE → verify | REJECTED / FAILED → end
verify_format_lint  (script)          format BEFORE evidence, then lint
verify_build        (script)          step-level build/typecheck
verify_tests        (script)          FULL test suite
  ↓                                   [failures → fix_loop_gate, back-edge to implement]
edge_case_sweep     (llm)             missed edge cases; annotate downstream plans
  ↓                                   (Edge cases sections ONLY - scope changes become proposals)
route_sweep         (script)          5+ files or architectural boundary → independent_review
independent_review  (agent)           code-reviewer; 🔴 findings loop back to implement (bounded)
write_handoff       (llm)             evidence-backed handoff per handoff-protocol + NOTES.md
check_handoff       (script)          deterministic schema gate; marks plan status complete
gate_user_review    (approval)        HARD STOP - approve, or send revision comments
  ↓                                   (revisions loop through implement → verify → handoff again)
end_success / end_blocked / end_rejected / end_failure
```
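`check_handoff` is a plain `grep` gate: it needs a `result:` line set to `complete`, `partial`, or `blocked`, plus eight `##` headings. A minimal sketch that writes a skeleton passing that gate and re-runs the same checks locally; the function names and the `step`/`title` values are ours, and `handoff-protocol` decides what each section should actually contain.

```bash
#!/usr/bin/env bash
set -euo pipefail

# The eight headings check_handoff.sh requires, in the order it checks them.
SECTIONS=("Summary" "Completed" "Not completed" "Deviations"
          "Downstream plan updates" "Edge cases discovered" "Evidence"
          "Notes for next step")

# Write a skeleton handoff; every section says "None" rather than being omitted.
write_handoff_skeleton() {
  local path="$1" step="$2" title="$3" result="${4:-complete}"
  {
    printf -- '---\nstep: %s\ntitle: %s\nresult: %s\n---\n' "$step" "$title" "$result"
    for s in "${SECTIONS[@]}"; do
      printf '\n## %s\n\nNone\n' "$s"
    done
  } > "$path"
}

# Apply the same greps as check_handoff.sh; print one line per problem found.
handoff_problems() {
  local content
  content=$(cat "$1")
  grep -qE '^result:[[:space:]]*(complete|partial|blocked)' <<< "$content" \
    || echo "missing result"
  for s in "${SECTIONS[@]}"; do
    grep -qE "^##[[:space:]]+${s}" <<< "$content" || echo "missing section: $s"
  done
}
```

Because the gate only checks headings, a handoff made entirely of "None" passes it; the evidence requirements live in the `write_handoff` instructions, not in the script.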
End nodes emit sentinel outcomes for the caller:
- `STEP_COMPLETE`: step implemented, verified, handoff written, user approved.
- `STEP_BLOCKED`: `depends_on` unsatisfied and the user declined to proceed.
- `STEP_REJECTED`: user aborted at the deviation gate, or the coder's plan
  was rejected at its approval gate.
- `STEP_FAILED`: the step could not be resolved, the coder failed or returned
  no recognizable sentinel, the step-level fix budget was exhausted, or the
  handoff failed validation twice.
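Each end node prints its sentinel on the first line of its output, so a caller can branch on that line alone. A minimal sketch, assuming the output reaches the caller unmodified; `step_action` and its action names are hypothetical and only loosely mirror how Sisyphus is told to react.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Map step-runner's output to a follow-up action for the caller.
# Only the first line (the sentinel) is inspected.
step_action() {
  local first_line
  first_line=$(head -n 1 <<< "$1")
  case "$first_line" in
    STEP_COMPLETE) echo "advance" ;;       # user approved; the next step may start
    STEP_BLOCKED)  echo "resolve-deps" ;;  # depends_on unsatisfied
    STEP_REJECTED) echo "replan" ;;        # aborted at a gate
    STEP_FAILED)   echo "diagnose" ;;      # surface the evidence, maybe ask oracle
    *)             echo "unknown" ;;
  esac
}
```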
## Usage
```sh
# From the project root: run the next in-progress/pending step
coyote -a step-runner "Execute the next step"
# A specific step (also parsed from the prompt: "execute step 3")
coyote -a step-runner --agent-variable step 3 "Execute step 3"
# Plan repo somewhere else
coyote -a step-runner --agent-variable plans_dir docs/plans "Execute the next step"
```
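When `step` is left at `next`, `resolve_step.sh` scans the initial prompt for a step number. The function below isolates that extraction with the same two `grep` patterns the script uses, so you can check how a given prompt will resolve; the function name is ours.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Find the first "step <N>" (case-insensitive; spaces, '#' or ':' allowed
# in between), then keep only the digits. Prints nothing when there is no match.
step_from_prompt() {
  grep -oiE 'step[[:space:]#:]*[0-9]+' <<< "$1" | head -1 | grep -oE '[0-9]+' || true
}
```

A prompt such as "redo steps 2-3" yields nothing (the `s` after `step` breaks the pattern), so pass `--agent-variable step` explicitly when the wording is ambiguous.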
**Invoke from the project root.** The coder sub-agent resolves its own
`project_dir` from the invocation directory; overriding `project_dir` here
does not propagate to the spawned coder.
## Tuning
`graph.yaml` `initial_state` exposes:
- `max_fix_attempts` (default `2`) — step-level fix budget (the coder has
its own internal budget of 3).
- `max_review_attempts` (default `1`) — bounded 🔴-finding fix loops after
independent review.
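Both budgets are checked before the counter is incremented. In `fix_loop_gate.sh`, a verification failure with `fix_attempts >= max_fix_attempts` ends the run; otherwise the counter goes up by one and the graph loops back to `implement`. With the default of `2`, that allows two step-level retries after the first implementation. A condensed sketch of that decision (the function name is ours):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Condensed from fix_loop_gate.sh: route a verification failure given the
# attempts already used and the budget.
next_after_failure() {
  local attempts="$1" max="$2"
  if (( attempts >= max )); then
    echo "end_failure"
  else
    echo "implement (attempt $((attempts + 1)) of $max)"
  fi
}
```

`route_review.sh` uses the same comparison for `max_review_attempts`, with one difference: an exhausted review budget proceeds to `write_handoff` instead of failing, so leftover 🔴 findings reach the handoff through the review report rather than ending the run.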
Environment overrides honored by the script nodes:
- `FORMAT_CMD` / `LINT_CMD` — formatting and linting (otherwise a per-type
heuristic formats, and linting defers to the build/check command).
- `BUILD_CMD` / `TEST_CMD` — skip project-type detection (same as coder).
- `STEP_AUTOAPPROVE=1` — bypass the deviation gate (non-interactive runs).
- `STEP_SKIP_REVIEW=1` — never spawn the independent reviewer.
The final user approval gate is never bypassed by an environment variable -
it is the point of the workflow.
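The verify scripts are not part of this view, so the following only illustrates the override-first precedence described above: a non-empty `BUILD_CMD` is used as-is and detection never runs. The marker files and default commands are hypothetical stand-ins, not the coder's actual detection table.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical sketch: an explicit override wins; otherwise guess from marker files.
resolve_build_cmd() {
  local dir="${1:-.}"
  if [[ -n "${BUILD_CMD:-}" ]]; then
    echo "$BUILD_CMD"          # override: skip project-type detection entirely
  elif [[ -f "$dir/Cargo.toml" ]]; then
    echo "cargo build"
  elif [[ -f "$dir/go.mod" ]]; then
    echo "go build ./..."
  elif [[ -f "$dir/package.json" ]]; then
    echo "npm run build"
  else
    echo ""                    # unknown project type
  fi
}
```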
+599
@@ -0,0 +1,599 @@
name: step-runner
description: |
Executes ONE step of a phased implementation plan (plans/ repo) with the
step protocol enforced as graph edges: orient -> staleness check ->
implement (coder) -> verify -> edge-case sweep -> optional independent
review -> evidence-backed handoff -> user approval gate. Designed to be
delegated to by sisyphus.
version: "1.0"
global_tools:
- fs_cat.sh
- fs_ls.sh
- fs_write.sh
- fs_patch.sh
- execute_command.sh
skills_enabled: true
enabled_skills:
- step-implementation
- handoff-protocol
- code-review
- ai-slop-remover
variables:
- name: project_dir
description: |
Absolute path to the project directory. Defaults to "." (the directory
coyote was invoked from). The coder sub-agent resolves its own
project_dir the same way, so invoke step-runner FROM the project root
unless you override this for both.
default: "."
- name: plans_dir
description: |
Path to the plan repo. Relative paths resolve against project_dir.
Expected layout: <plans_dir>/steps/NN-<slug>.md,
<plans_dir>/handoffs/, <plans_dir>/NOTES.md.
default: "plans"
- name: step
description: |
Which step to execute: a step number, or "next" to pick the first
in-progress (resume) or pending step plan.
default: "next"
settings:
max_loop_iterations: 20
log_state_snapshots: true
validate_before_run: true
timeout: 7200
initial_state:
project_dir: ""
plans_dir: ""
step_number: 0
step_slug: ""
step_title: ""
step_plan_path: ""
step_plan: ""
prev_handoff_path: "(none)"
prev_handoff: "(none - this is the first step)"
notes_path: ""
notes: "(none)"
handoff_path: ""
blocking_reason: ""
plan_summary: ""
implementation_brief: ""
staleness_report: ""
has_major_deviation: false
deviation_summary: ""
user_feedback: ""
fix_instructions: ""
fix_attempts: 0
max_fix_attempts: 2
coder_result: ""
format_output: ""
lint_ok: true
lint_output: ""
build_ok: true
build_output: ""
tests_ok: true
tests_output: ""
edge_case_report: ""
downstream_updates: ""
needs_independent_review: false
review_report: ""
review_attempts: 0
max_review_attempts: 1
handoff_attempts: 0
handoff_fix: ""
step_summary: ""
start: resolve_step
nodes:
resolve_step:
id: resolve_step
type: script
description: |
Locate the step plan, previous handoff, and NOTES.md; parse frontmatter;
check depends_on satisfaction against existing handoffs; mark the plan
in-progress. Routes to gate_blocked when dependencies are unsatisfied.
script: scripts/resolve_step.sh
timeout: 30
fallback: end_failure
next: orient
gate_blocked:
id: gate_blocked
type: approval
description: Escalate unsatisfied dependencies instead of building on missing ground.
question: |
Step {{step_number}} ({{step_title}}) is BLOCKED:
{{blocking_reason}}
Proceed anyway?
options:
- "yes"
- "no"
routes:
"yes": orient
"no": end_blocked
on_other: end_blocked
orient:
id: orient
type: llm
description: |
Read-only orientation and staleness check: merge the previous handoff's
directives with the step plan, then verify the plan's assumptions
against the CURRENT codebase before any edit.
skills_enabled: true
enabled_skills:
- step-implementation
instructions: |
You are orienting for one step of a phased implementation plan. Load
`step-implementation` and apply its Orient and Staleness-check phases.
You are READ-ONLY in this node: no edits, no fixes.
1. Read the previous handoff (below). Note directives aimed at this
step, deviations that changed the codebase, and bare assertions
that need re-verification.
2. Staleness-check the step plan against the code at {{project_dir}}:
grep the symbols it references (via execute_command), read its
Context snippets at their claimed locations with fs_cat, confirm
its Test commands exist.
3. Classify discrepancies per the skill's deviation table: minor
(mechanics differ; correct silently in the brief) vs major (scope,
approach, interfaces, or a later step's assumptions affected).
Produce `implementation_brief`: the corrected, self-contained marching
orders for the implementer - plan tasks in order, handoff directives
applied, minor staleness corrections folded in, acceptance criteria
restated. The implementer sees ONLY the step plan plus your brief.
prompt: |
## Step plan ({{step_plan_path}})
{{step_plan}}
## Previous handoff ({{prev_handoff_path}})
{{prev_handoff}}
## Rolling project notes
{{notes}}
tools:
- fs_cat
- fs_ls
- execute_command
max_iterations: 20
output_schema:
type: object
properties:
plan_summary:
type: string
description: 1-3 sentences summarizing what this step delivers
implementation_brief:
type: string
description: Corrected, self-contained instructions for the implementer
staleness_report:
type: string
description: Findings from checking plan assumptions against current code; "clean" if none
has_major_deviation:
type: boolean
description: True when a discrepancy changes scope, approach, or interfaces
deviation_summary:
type: string
description: Major deviations only, with the plan claim vs current reality. Empty when none
required: [plan_summary, implementation_brief, staleness_report, has_major_deviation, deviation_summary]
fallback: end_failure
next: route_staleness
route_staleness:
id: route_staleness
type: script
description: Major deviation -> user gate; otherwise straight to implement.
script: scripts/route_staleness.sh
timeout: 5
fallback: implement
gate_deviation:
id: gate_deviation
type: approval
description: Major deviations are never silently absorbed - the user decides.
question: |
Step {{step_number}} ({{step_title}}): the plan no longer matches the
codebase in a way that changes scope or approach.
{{deviation_summary}}
Staleness report:
{{staleness_report}}
Proceed with the corrected brief? (Answer with anything else to give
your own guidance to the implementer.)
options:
- "proceed"
- "abort"
routes:
"proceed": implement
"abort": end_rejected
on_other: implement
state_updates:
user_feedback: "{{choice}}"
implement:
id: implement
type: agent
description: |
Delegate implementation to the coder graph agent, which runs its own
plan -> implement -> build -> tests -> self-review fix-loop internally.
agent: coder
prompt: |
## TASK
Execute step {{step_number}} ({{step_title}}) of a phased implementation
plan for the project at {{project_dir}}.
## EXPECTED OUTCOME
Every task in the step plan below is implemented and its acceptance
criteria are met. Tests are derived from the Acceptance criteria
section (not from the implementation). Build and full test suite pass.
## MUST DO
- Follow the Orientation brief below - it supersedes the raw plan where
they disagree (it folds in corrections from the staleness check).
- Match the patterns pasted in the step plan's Context section.
- Derive tests from the plan's Acceptance criteria.
## MUST NOT DO
- Do not touch anything listed in the plan's Out of scope section.
- Do not modify files under {{plans_dir}}.
- Do not implement work belonging to other steps.
## CONTEXT
### Step plan
{{step_plan}}
### Orientation brief (handoff directives + staleness corrections applied)
{{implementation_brief}}
### User guidance (if any)
{{user_feedback}}
### Fix loop status (empty on first attempt)
{{fix_instructions}}
timeout: 3600
state_updates:
coder_result: "{{output}}"
next: route_coder_result
route_coder_result:
id: route_coder_result
type: script
description: Route on the coder sentinel - COMPLETE verifies, REJECTED/FAILED terminate.
script: scripts/route_coder_result.sh
timeout: 5
fallback: end_failure
verify_format_lint:
id: verify_format_lint
type: script
description: |
Format BEFORE evidence collection (FORMAT_CMD override or per-type
heuristic), then lint (LINT_CMD, when configured). Lint failure routes
to the fix loop.
script: scripts/verify_format_lint.sh
timeout: 300
fallback: fix_loop_gate
verify_build:
id: verify_build
type: script
description: Step-level build/typecheck evidence, collected AFTER formatting.
script: scripts/verify_build.sh
timeout: 600
fallback: fix_loop_gate
verify_tests:
id: verify_tests
type: script
description: FULL test suite - regressions in untouched code fail the step too.
script: scripts/verify_tests.sh
timeout: 1200
fallback: fix_loop_gate
fix_loop_gate:
id: fix_loop_gate
type: script
description: |
Step-level fix budget (the coder already ran its own internal fix
loop). Loops to implement with fix_instructions, or ends as failure.
script: scripts/fix_loop_gate.sh
timeout: 5
fallback: end_failure
edge_case_sweep:
id: edge_case_sweep
type: llm
description: |
Post-implementation sweep: missed spots, edge cases, downstream plan
implications. May annotate downstream plans' Edge cases sections
(annotate vs propose per handoff-protocol). Also judges whether the
change warrants an independent review pass.
skills_enabled: true
enabled_skills:
- step-implementation
- handoff-protocol
instructions: |
The implementation for this step just passed build and tests. Load
`step-implementation` (edge-case sweep phase) and `handoff-protocol`
(annotate-vs-propose rules), then:
1. Read the changed code (the coder result below names the files).
Look for edge cases the plan missed: empty inputs, error paths,
concurrency, partial failure, compat.
2. For each edge case belonging to a LATER step: check that step's
plan under {{plans_dir}}/steps/. If its Edge cases section already
covers it, done. If not, append an entry to that section via
fs_patch - touch NOTHING else in the file.
3. NEVER edit a later plan's Objective, Tasks, Acceptance criteria,
or Out of scope. Scope-affecting changes become proposed diffs in
`downstream_updates` instead.
4. Set needs_independent_review=true when the change touched 5+ files
or crosses architectural boundaries (auth, public APIs, schema,
security-sensitive paths).
Be terse. Findings, not prose.
prompt: |
## Coder result
{{coder_result}}
## Step plan
{{step_plan}}
## Staleness report from orientation
{{staleness_report}}
tools:
- fs_cat
- fs_ls
- fs_patch
- execute_command
max_iterations: 20
output_schema:
type: object
properties:
edge_case_report:
type: string
description: Edge cases discovered - both handled and punted, one per line. "none" if empty
downstream_updates:
type: string
description: Annotations made (plan file + section) and proposed diffs for scope-affecting changes. "none" if empty
needs_independent_review:
type: boolean
required: [edge_case_report, downstream_updates, needs_independent_review]
fallback: write_handoff
next: route_sweep
route_sweep:
id: route_sweep
type: script
description: Broad or boundary-crossing changes get an independent reviewer.
script: scripts/route_sweep.sh
timeout: 5
fallback: write_handoff
independent_review:
id: independent_review
type: agent
description: Independent review pass - the author's self-review cannot catch its own rationalizations.
agent: code-reviewer
prompt: |
Review the changes produced for step {{step_number}} ({{step_title}})
of a phased implementation plan in {{project_dir}}.
What the step was supposed to do:
{{plan_summary}}
Coder summary (names the modified/created files):
{{coder_result}}
Review the changed files against the acceptance criteria in the step plan at {{step_plan_path}}.
Preserve severity tags in your findings.
timeout: 1200
state_updates:
review_report: "{{output}}"
next: route_review
route_review:
id: route_review
type: script
description: Critical findings loop back to implement (bounded); otherwise proceed to handoff.
script: scripts/route_review.sh
timeout: 5
fallback: write_handoff
write_handoff:
id: write_handoff
type: llm
description: |
Write the evidence-backed handoff per handoff-protocol and append
durable facts to NOTES.md. The completion gate (check_handoff)
verifies the document afterward.
skills_enabled: true
enabled_skills:
- handoff-protocol
- ai-slop-remover
instructions: |
Load `handoff-protocol` and follow its writer schema EXACTLY: the
frontmatter (step, title, result) and all eight sections, writing
"None" rather than omitting a section.
Write the handoff to {{handoff_path}} with fs_write. Paste the
verification evidence below verbatim into the Evidence section -
commands, exit codes, decisive output lines. Deviations come from the
staleness report, gate decisions, and fix loop history. Downstream
plan updates come from the sweep results.
Then append durable, step-independent facts (if any) to {{notes_path}}
- create the file if missing, never rewrite existing entries.
If "Gate feedback" below is non-empty, a previous handoff attempt
failed validation - fix exactly what it lists.
prompt: |
## Step
{{step_number}} ({{step_title}}) - plan at {{step_plan_path}}
## Plan summary
{{plan_summary}}
## Coder result
{{coder_result}}
## Staleness report / deviations
{{staleness_report}}
Major deviation summary (if any): {{deviation_summary}}
User guidance given (if any): {{user_feedback}}
Fix loop attempts used: {{fix_attempts}} of {{max_fix_attempts}}
## Edge cases discovered
{{edge_case_report}}
## Downstream plan updates
{{downstream_updates}}
## Independent review report (if any)
{{review_report}}
## Verification evidence (paste verbatim)
### Format
{{format_output}}
### Lint
{{lint_output}}
### Build
{{build_output}}
### Tests
{{tests_output}}
## Gate feedback
{{handoff_fix}}
tools:
- fs_cat
- fs_ls
- fs_write
- fs_patch
max_iterations: 15
output_schema:
type: object
properties:
step_summary:
type: string
description: 3-6 sentence summary of the step for the user's approval decision - what was done, deviations, anything needing their attention
required: [step_summary]
fallback: end_failure
next: check_handoff
check_handoff:
id: check_handoff
type: script
description: |
Deterministic completion gate - handoff exists with frontmatter and all
required sections. On success, marks the step plan status complete.
One retry back to write_handoff, then failure.
script: scripts/check_handoff.sh
timeout: 10
fallback: end_failure
gate_user_review:
id: gate_user_review
type: approval
description: The hard stop - the next step never starts without explicit approval.
question: |
## Step {{step_number}} ({{step_title}}) - ready for review
{{step_summary}}
Handoff: {{handoff_path}}
Build: {{build_ok}} | Tests: {{tests_ok}} | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
Approve this step? (Answer with anything else to send revision
instructions straight to the implementer.)
options:
- "approve"
- "revise"
routes:
"approve": end_success
"revise": get_revision
on_other: revise_from_choice
state_updates:
user_feedback: "{{choice}}"
get_revision:
id: get_revision
type: input
description: Collect revision instructions, then loop back through implement -> verify -> handoff.
question: "What should change? Your comments go to the implementer verbatim."
validation: "len(input) > 0"
state_updates:
fix_instructions: "{{input}}"
next: implement
revise_from_choice:
id: revise_from_choice
type: script
description: Free-form approval answers are treated as revision instructions.
script: scripts/revise_from_choice.sh
timeout: 5
fallback: get_revision
end_success:
id: end_success
type: end
output: |
STEP_COMPLETE
Step: {{step_number}} ({{step_title}})
Plan: {{step_plan_path}}
Handoff: {{handoff_path}}
Build: passed | Tests: passed | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
{{step_summary}}
Downstream plan updates:
{{downstream_updates}}
end_blocked:
id: end_blocked
type: end
output: |
STEP_BLOCKED
Step: {{step_number}} ({{step_title}})
Reason:
{{blocking_reason}}
end_rejected:
id: end_rejected
type: end
output: |
STEP_REJECTED
Step: {{step_number}} ({{step_title}})
Rejected at: deviation gate or coder approval gate.
Deviation summary:
{{deviation_summary}}
Coder result (if it ran):
{{coder_result}}
end_failure:
id: end_failure
type: end
output: |
STEP_FAILED
Step: {{step_number}} ({{step_title}})
Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
Blocking reason (if resolution failed): {{blocking_reason}}
Coder result:
{{coder_result}}
Last build output:
{{build_output}}
Last tests output:
{{tests_output}}
+54
@@ -0,0 +1,54 @@
#!/usr/bin/env bash
set -uo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
handoff_path=$(echo "$state" | jq -r '.handoff_path // ""')
step_plan_path=$(echo "$state" | jq -r '.step_plan_path // ""')
handoff_attempts=$(echo "$state" | jq -r '.handoff_attempts // 0')
problems=""
if [[ ! -f "$handoff_path" ]]; then
problems="- handoff file does not exist at $handoff_path"$'\n'
else
content=$(cat "$handoff_path")
grep -qE '^result:[[:space:]]*(complete|partial|blocked)' <<< "$content" \
|| problems+="- frontmatter is missing 'result: complete|partial|blocked'"$'\n'
for section in "Summary" "Completed" "Not completed" "Deviations" "Downstream plan updates" "Edge cases discovered" "Evidence" "Notes for next step"; do
grep -qE "^##[[:space:]]+${section}" <<< "$content" \
|| problems+="- missing required section: ## ${section}"$'\n'
done
fi
if [[ -z "$problems" ]]; then
if [[ -f "$step_plan_path" ]]; then
tmp=$(mktemp)
awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: complete"; next} {print}' "$step_plan_path" > "$tmp" && mv "$tmp" "$step_plan_path"
fi
jq -nc '{"handoff_fix": "", "handoff_attempts": 0, "_next": "gate_user_review"}'
exit 0
fi
if (( handoff_attempts >= 1 )); then
jq -nc \
--arg br "Handoff failed validation twice. Problems:
$problems" \
'{"blocking_reason": $br, "_next": "end_failure"}'
exit 0
fi
jq -nc \
--arg hf "The previous handoff attempt failed validation. Fix exactly these problems:
$problems" \
'{
"handoff_attempts": 1,
"handoff_fix": $hf,
"_next": "write_handoff"
}'
+60
@@ -0,0 +1,60 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
fix_attempts=$(echo "$state" | jq -r '.fix_attempts // 0')
max_fix_attempts=$(echo "$state" | jq -r '.max_fix_attempts // 2')
lint_ok=$(echo "$state" | jq -r '.lint_ok | if . == null then "true" else (. | tostring) end')
build_ok=$(echo "$state" | jq -r '.build_ok | if . == null then "true" else (. | tostring) end')
tests_ok=$(echo "$state" | jq -r '.tests_ok | if . == null then "true" else (. | tostring) end')
lint_output=$(echo "$state" | jq -r '.lint_output // ""')
build_output=$(echo "$state" | jq -r '.build_output // ""')
tests_output=$(echo "$state" | jq -r '.tests_output // ""')
if (( fix_attempts >= max_fix_attempts )); then
jq -nc \
--argjson n "$fix_attempts" \
'{
"fix_attempts": $n,
"_next": "end_failure"
}'
exit 0
fi
next_attempts=$((fix_attempts + 1))
if [[ "$lint_ok" != "true" ]]; then
stage="lint"
output="$lint_output"
elif [[ "$build_ok" != "true" ]]; then
stage="build"
output="$build_output"
elif [[ "$tests_ok" != "true" ]]; then
stage="full test suite"
output="$tests_output"
else
stage="verification"
output="fix_loop_gate was reached but no failing stage was recorded. Re-run verification."
fi
fix_instructions=$(printf '## Fix loop status (step-level attempt %d of %d)\n\nThe implementation passed the coder'"'"'s internal checks but failed step-level verification at the %s stage.\n\nOutput:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor. Regressions in untouched code caused by this change are in scope.' \
"$next_attempts" "$max_fix_attempts" "$stage" "$output")
jq -nc \
--argjson n "$next_attempts" \
--arg 'fi' "$fix_instructions" \
'{
"fix_attempts": $n,
"fix_instructions": $fi,
"lint_ok": true,
"build_ok": true,
"tests_ok": true,
"_next": "implement"
}'
+152
@@ -0,0 +1,152 @@
#!/usr/bin/env bash
set -uo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
fail() {
jq -nc --arg r "$1" '{"blocking_reason": $r, "_next": "end_failure"}'
exit 0
}
raw_project_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
project_dir=$(cd "$raw_project_dir" 2>/dev/null && pwd) || fail "project_dir does not exist: $raw_project_dir"
plans_dir="${LLM_AGENT_VAR_PLANS_DIR:-plans}"
[[ "$plans_dir" != /* ]] && plans_dir="$project_dir/$plans_dir"
steps_dir="$plans_dir/steps"
handoffs_dir="$plans_dir/handoffs"
notes_path="$plans_dir/NOTES.md"
[[ -d "$steps_dir" ]] || fail "No step plans directory at $steps_dir (expected <plans_dir>/steps/NN-<slug>.md)"
frontmatter() {
awk '/^---[[:space:]]*$/{n++; next} n==1{print} n>=2{exit}' "$1"
}
fm_value() {
echo "$1" | grep -E "^$2:" | head -1 | sed -E "s/^$2:[[:space:]]*//" | sed -E 's/^["'"'"']|["'"'"']$//g'
}
step="${LLM_AGENT_VAR_STEP:-next}"
if [[ "$step" == "next" ]]; then
prompt_step=$(echo "$state" | jq -r '.initial_prompt // ""' | grep -oiE 'step[[:space:]#:]*[0-9]+' | head -1 | grep -oE '[0-9]+' || true)
[[ -n "$prompt_step" ]] && step="$prompt_step"
fi
plan_file=""
if [[ "$step" == "next" ]]; then
first_pending=""
while IFS= read -r f; do
st=$(fm_value "$(frontmatter "$f")" "status")
if [[ "$st" == "in-progress" ]]; then
plan_file="$f"
break
fi
[[ -z "$first_pending" && ( "$st" == "pending" || -z "$st" ) ]] && first_pending="$f"
done < <(find "$steps_dir" -maxdepth 1 -name '*.md' | sort)
[[ -z "$plan_file" ]] && plan_file="$first_pending"
[[ -z "$plan_file" ]] && fail "No in-progress or pending step plans in $steps_dir"
else
[[ "$step" =~ ^[0-9]+$ ]] || fail "step must be a number or 'next'; got: $step"
padded=$(printf '%02d' "$((10#$step))")
plan_file=$(find "$steps_dir" -maxdepth 1 \( -name "${padded}-*.md" -o -name "${step}-*.md" \) | sort | head -1)
[[ -n "$plan_file" ]] || fail "No step plan matching step $step in $steps_dir"
fi
bn=$(basename "$plan_file" .md)
num_part="${bn%%-*}"
[[ "$num_part" =~ ^[0-9]+$ ]] || fail "Step plan filename must start with a number: $bn"
step_number=$((10#$num_part))
step_slug="${bn#*-}"
fm=$(frontmatter "$plan_file")
step_title=$(fm_value "$fm" "title")
[[ -z "$step_title" ]] && step_title="$step_slug"
deps=$(echo "$fm" | awk '/^depends_on:/{f=1; print; next} f && /^[[:space:]]*-/{print; next} f{exit}' | grep -oE '[0-9]+' || true)
unsatisfied=""
for dep in $deps; do
dep_padded=$(printf '%02d' "$((10#$dep))")
dep_handoff=$(find "$handoffs_dir" -maxdepth 1 \( -name "${dep_padded}-*.md" -o -name "${dep}-*.md" \) 2>/dev/null | sort | head -1)
if [[ -z "$dep_handoff" ]]; then
unsatisfied+="- step $dep: no handoff found (step not executed?)"$'\n'
continue
fi
dep_result=$(fm_value "$(frontmatter "$dep_handoff")" "result")
if [[ "$dep_result" != "complete" ]]; then
unsatisfied+="- step $dep: handoff result is '$dep_result' (not complete): $dep_handoff"$'\n'
fi
done
prev_handoff_path="(none)"
prev_handoff="(none - this is the first step)"
prev_file=""
prev_num=0
while IFS= read -r h; do
hn="${h##*/}"
hn="${hn%%-*}"
[[ "$hn" =~ ^[0-9]+$ ]] || continue
n=$((10#$hn))
if (( n < step_number && n >= prev_num )); then
prev_num=$n
prev_file="$h"
fi
done < <(find "$handoffs_dir" -maxdepth 1 -name '*.md' 2>/dev/null | sort)
if [[ -n "$prev_file" ]]; then
prev_handoff_path="$prev_file"
prev_handoff=$(head -c 16000 "$prev_file")
fi
notes="(none)"
[[ -f "$notes_path" ]] && notes=$(head -c 8000 "$notes_path")
step_plan=$(head -c 24000 "$plan_file")
handoff_path="$handoffs_dir/$(basename "$plan_file")"
tmp=$(mktemp)
awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: in-progress"; next} {print}' "$plan_file" > "$tmp" && mv "$tmp" "$plan_file"
next_node="orient"
blocking_reason=""
if [[ -n "$unsatisfied" ]]; then
next_node="gate_blocked"
blocking_reason="Unsatisfied dependencies:"$'\n'"$unsatisfied"
fi
jq -nc \
--arg pd "$project_dir" \
--arg pl "$plans_dir" \
--argjson sn "$step_number" \
--arg ss "$step_slug" \
--arg st "$step_title" \
--arg spp "$plan_file" \
--arg sp "$step_plan" \
--arg php "$prev_handoff_path" \
--arg ph "$prev_handoff" \
--arg np "$notes_path" \
--arg no "$notes" \
--arg hp "$handoff_path" \
--arg br "$blocking_reason" \
--arg nx "$next_node" \
'{
"project_dir": $pd,
"plans_dir": $pl,
"step_number": $sn,
"step_slug": $ss,
"step_title": $st,
"step_plan_path": $spp,
"step_plan": $sp,
"prev_handoff_path": $php,
"prev_handoff": $ph,
"notes_path": $np,
"notes": $no,
"handoff_path": $hp,
"blocking_reason": $br,
"_next": $nx
}'
+27
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
feedback=$(echo "$state" | jq -r '.user_feedback // ""')
if [[ -z "$feedback" ]]; then
jq -nc '{"_next": "get_revision"}'
exit 0
fi
fix_instructions=$(printf '## Revision requested by the user at the step approval gate\n\nAddress these comments with minimal edits, then the step re-verifies and the handoff is rewritten:\n\n%s' \
"$feedback")
jq -nc \
--arg 'fi' "$fix_instructions" \
'{
"fix_instructions": $fi,
"_next": "implement"
}'
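Each node script shown here opens with the same preamble. It reads state from the file named by `GRAPH_STATE_FILE` if that is set. Otherwise it uses the inline `GRAPH_STATE` variable, and if neither is set it falls back to `{}`. The sketch below factors that precedence into a shared function. `load_graph_state` is a hypothetical name; the commit's `.shared/utils.sh` is not known to provide it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the node-script preamble:
# GRAPH_STATE_FILE wins, then GRAPH_STATE, then an empty JSON object.
load_graph_state() {
  if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    cat "$GRAPH_STATE_FILE"
  elif [[ -n "${GRAPH_STATE:-}" ]]; then
    printf '%s\n' "$GRAPH_STATE"
  else
    printf '{}\n'
  fi
}

state=$(load_graph_state)
printf 'state: %s\n' "$state"
```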
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
coder_result=$(echo "$state" | jq -r '.coder_result // ""')
case "$coder_result" in
*CODER_COMPLETE*)
jq -nc '{"_next": "verify_format_lint"}'
;;
*CODER_REJECTED*)
jq -nc '{"_next": "end_rejected"}'
;;
*CODER_FAILED*)
jq -nc '{"blocking_reason": "coder fix-loop exhausted; see coder result", "_next": "end_failure"}'
;;
*)
jq -nc '{"blocking_reason": "coder returned no recognizable sentinel (expected CODER_COMPLETE / CODER_REJECTED / CODER_FAILED)", "_next": "end_failure"}'
;;
esac
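The `case` statement routes on sentinel substrings found anywhere in the coder's free-text result. `case` tries patterns in the order they are written. A result that mentions several sentinels is therefore routed by whichever pattern comes first in the script, not by where each sentinel appears in the text. The sketch below recasts the routing as a pure function so that precedence is easy to check. The name `route_coder_result` is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pure version of the coder-result router: prints the next node.
route_coder_result() {
  case "$1" in
    *CODER_COMPLETE*) echo "verify_format_lint" ;;
    *CODER_REJECTED*) echo "end_rejected" ;;
    *CODER_FAILED*)   echo "end_failure" ;;
    *)                echo "end_failure" ;;  # no recognizable sentinel
  esac
}

route_coder_result "All edits applied. CODER_COMPLETE"        # verify_format_lint
route_coder_result "User declined the plan. CODER_REJECTED"   # end_rejected
route_coder_result "no sentinel here"                         # end_failure
# Pattern order wins: both sentinels present, COMPLETE is tested first.
route_coder_result "CODER_FAILED on attempt 1, then CODER_COMPLETE"  # verify_format_lint
```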
@@ -0,0 +1,38 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
review_report=$(echo "$state" | jq -r '.review_report // ""')
review_attempts=$(echo "$state" | jq -r '.review_attempts // 0')
max_review_attempts=$(echo "$state" | jq -r '.max_review_attempts // 1')
if ! grep -qF "🔴" <<< "$review_report"; then
jq -nc '{"_next": "write_handoff"}'
exit 0
fi
if (( review_attempts >= max_review_attempts )); then
jq -nc '{"_next": "write_handoff"}'
exit 0
fi
next_review=$((review_attempts + 1))
fix_instructions=$(printf '## Independent review findings (attempt %d of %d)\n\nAn independent reviewer flagged CRITICAL (🔴) findings. Address ONLY the 🔴 findings with minimal edits. Do not refactor unrelated code.\n\n%s' \
"$next_review" "$max_review_attempts" "$review_report")
jq -nc \
--argjson n "$next_review" \
--arg 'fi' "$fix_instructions" \
'{
"review_attempts": $n,
"fix_instructions": $fi,
"needs_independent_review": false,
"_next": "implement"
}'
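The review gate bounds the fix loop. A report containing 🔴 sends the step back to `implement` only while `review_attempts < max_review_attempts` (default 1). Otherwise the step proceeds to `write_handoff` with the findings left in the report. The sketch below models only that counter arithmetic with a hypothetical `review_gate` function, which prints the next node and the updated attempt count. It assumes, for illustration, a reviewer that keeps returning a critical finding. The real gate also sets `needs_independent_review` to false, so whether a second review actually runs depends on how upstream nodes set that flag, which this diff does not show.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical model of the review gate: prints "<next_node> <attempts>".
review_gate() {
  local report="$1" attempts="$2" max="$3"
  if ! grep -qF "🔴" <<< "$report"; then
    echo "write_handoff $attempts"
  elif (( attempts >= max )); then
    echo "write_handoff $attempts"       # budget spent: hand off anyway
  else
    echo "implement $((attempts + 1))"   # one more fix round
  fi
}

# Simulate a reviewer that keeps flagging a critical finding.
attempts=0
max=1
while :; do
  read -r next attempts < <(review_gate "🔴 unchecked unwrap" "$attempts" "$max")
  echo "-> $next (attempts=$attempts)"
  [[ "$next" == "write_handoff" ]] && break
done
```

With the default limit of 1, the simulation prints one `implement` round followed by `write_handoff`.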
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
has_major=$(echo "$state" | jq -r '.has_major_deviation // false')
if [[ "${STEP_AUTOAPPROVE:-0}" == "1" ]]; then
jq -nc '{"_next": "implement"}'
exit 0
fi
if [[ "$has_major" == "true" ]]; then
jq -nc '{"_next": "gate_deviation"}'
else
jq -nc '{"_next": "implement"}'
fi
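The deviation gate checks `STEP_AUTOAPPROVE` before it reads state. Setting `STEP_AUTOAPPROVE=1` therefore bypasses the human deviation gate even when `has_major_deviation` is true. The sketch below expresses the decision as a hypothetical `deviation_next` function and prints all four combinations.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pure form of the deviation gate; args: autoapprove, has_major.
deviation_next() {
  local autoapprove="$1" has_major="$2"
  if [[ "$autoapprove" == "1" ]]; then
    echo "implement"        # env override wins unconditionally
  elif [[ "$has_major" == "true" ]]; then
    echo "gate_deviation"   # pause for the user
  else
    echo "implement"
  fi
}

for auto in 0 1; do
  for major in false true; do
    printf 'STEP_AUTOAPPROVE=%s has_major_deviation=%s -> %s\n' \
      "$auto" "$major" "$(deviation_next "$auto" "$major")"
  done
done
```

The independent-review router follows the same pattern, checking an environment override first and then a boolean from state, with `STEP_SKIP_REVIEW` and `needs_independent_review`.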
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
needs_review=$(echo "$state" | jq -r '.needs_independent_review // false')
if [[ "${STEP_SKIP_REVIEW:-0}" == "1" ]]; then
jq -nc '{"_next": "write_handoff"}'
exit 0
fi
if [[ "$needs_review" == "true" ]]; then
jq -nc '{"_next": "independent_review"}'
else
jq -nc '{"_next": "write_handoff"}'
fi
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
set -uo pipefail
# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
if [[ -n "${BUILD_CMD:-}" ]]; then
cmd="$BUILD_CMD"
else
project_info=$(detect_project "$project_dir")
cmd=$(echo "$project_info" | jq -r '.check // .build // ""')
fi
if [[ -z "$cmd" || "$cmd" == "null" ]]; then
jq -nc '{
"build_ok": true,
"build_output": "(no build/check command available for this project type)",
"_next": "verify_tests"
}'
exit 0
fi
exit_code=0
output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
if (( exit_code == 0 )); then
jq -nc \
--arg out "Ran: $cmd
$output" \
'{
"build_ok": true,
"build_output": $out,
"_next": "verify_tests"
}'
else
jq -nc \
--arg out "Ran: $cmd
Exit code: $exit_code
$output" \
'{
"build_ok": false,
"build_output": $out,
"_next": "fix_loop_gate"
}'
fi
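The verify scripts deliberately use `set -uo pipefail` without `-e`. They capture both output and status with `output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?`. The `cd` runs inside the command substitution's subshell, so it never changes the caller's working directory. The `|| exit_code=$?` records a failing command instead of aborting the script. The same idiom appears in the format/lint and test nodes. Below is a standalone sketch of it; `run_in_dir` and its two result variables are hypothetical names.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical helper: run "$2" inside directory "$1", capturing combined
# stdout/stderr in $run_output and the exit status in $run_status.
run_in_dir() {
  local dir="$1" cmd="$2"
  run_status=0
  run_output=$(cd "$dir" && eval "$cmd" 2>&1) || run_status=$?
}

start_dir=$PWD
run_in_dir /tmp 'pwd; echo "to stderr" >&2; exit 3'
printf 'status=%s\n%s\n' "$run_status" "$run_output"
[[ "$PWD" == "$start_dir" ]] && echo "caller's directory unchanged"
```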
@@ -0,0 +1,79 @@
#!/usr/bin/env bash
set -uo pipefail
# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
project_type=$(detect_project "$project_dir" | jq -r '.type // "unknown"')
format_cmd="${FORMAT_CMD:-}"
if [[ -z "$format_cmd" ]]; then
case "$project_type" in
rust) format_cmd="cargo fmt" ;;
go) format_cmd="gofmt -w ." ;;
python) command -v ruff &>/dev/null && format_cmd="ruff format ." ;;
esac
fi
if [[ -z "$format_cmd" ]]; then
format_output="(no format command configured for project type '$project_type'; skipped. Set FORMAT_CMD to enable.)"
else
fmt_rc=0
fmt_out=$(cd "$project_dir" && eval "$format_cmd" 2>&1) || fmt_rc=$?
format_output="Ran: $format_cmd
Exit code: $fmt_rc
$fmt_out"
fi
lint_cmd="${LINT_CMD:-}"
if [[ -z "$lint_cmd" ]]; then
jq -nc \
--arg fo "$format_output" \
'{
"format_output": $fo,
"lint_ok": true,
"lint_output": "(no LINT_CMD configured; linting is covered by the build/check command)",
"_next": "verify_build"
}'
exit 0
fi
lint_rc=0
lint_out=$(cd "$project_dir" && eval "$lint_cmd" 2>&1) || lint_rc=$?
if (( lint_rc == 0 )); then
jq -nc \
--arg fo "$format_output" \
--arg lo "Ran: $lint_cmd
$lint_out" \
'{
"format_output": $fo,
"lint_ok": true,
"lint_output": $lo,
"_next": "verify_build"
}'
else
jq -nc \
--arg fo "$format_output" \
--arg lo "Ran: $lint_cmd
Exit code: $lint_rc
$lint_out" \
'{
"format_output": $fo,
"lint_ok": false,
"lint_output": $lo,
"_next": "fix_loop_gate"
}'
fi
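Formatting is best-effort. The script picks a formatter from the detected project type unless `FORMAT_CMD` overrides it, and it records the formatter's exit code. It never routes to `fix_loop_gate` on a formatter failure; only a non-zero `LINT_CMD` does. The sketch below isolates the selection step in a hypothetical `pick_format_cmd` function. The project-type strings match the `case` in the script. `detect_project` from `utils.sh` is not reproduced, so the type is passed in directly.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical: choose a formatter the way the format/lint node does.
# An explicit FORMAT_CMD always wins; ruff is used only if it is installed.
pick_format_cmd() {
  local project_type="$1"
  if [[ -n "${FORMAT_CMD:-}" ]]; then
    echo "$FORMAT_CMD"
    return
  fi
  case "$project_type" in
    rust)   echo "cargo fmt" ;;
    go)     echo "gofmt -w ." ;;
    python) command -v ruff &>/dev/null && echo "ruff format ." ;;
    *)      ;;  # unknown type: empty string, formatting is skipped
  esac
}

for t in rust go python node; do
  printf '%-7s -> "%s"\n' "$t" "$(pick_format_cmd "$t")"
done
FORMAT_CMD="prettier --write ." pick_format_cmd node
```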
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
set -uo pipefail
# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
if [[ -n "${TEST_CMD:-}" ]]; then
cmd="$TEST_CMD"
else
project_info=$(detect_project "$project_dir")
cmd=$(echo "$project_info" | jq -r '.test // ""')
fi
if [[ -z "$cmd" || "$cmd" == "null" ]]; then
jq -nc '{
"tests_ok": true,
"tests_output": "(no test command available for this project type)",
"_next": "edge_case_sweep"
}'
exit 0
fi
exit_code=0
output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
if (( exit_code == 0 )); then
jq -nc \
--arg out "Ran: $cmd
$output" \
'{
"tests_ok": true,
"tests_output": $out,
"_next": "edge_case_sweep"
}'
else
jq -nc \
--arg out "Ran: $cmd
Exit code: $exit_code
$output" \
'{
"tests_ok": false,
"tests_output": $out,
"_next": "fix_loop_gate"
}'
fi
@@ -5116,6 +5116,45 @@ mod tests {
assert!(paths::skill_file("frontend-ui-ux").exists());
}
#[test]
#[serial]
fn bundled_graph_agents_parse_and_validate() {
use crate::graph::GraphParser;
use crate::graph::validator::GraphValidator;
let _guard = TestConfigDirGuard::new();
Agent::install_builtin_agents(false).unwrap();
Skill::install_builtin_skills(false).unwrap();
let mut checked = Vec::new();
for entry in std::fs::read_dir(paths::agents_data_dir()).unwrap() {
let dir = entry.unwrap().path();
let graph_path = dir.join("graph.yaml");
if !graph_path.exists() {
continue;
}
let name = dir.file_name().unwrap().to_string_lossy().to_string();
let graph = GraphParser::new(&dir)
.load_from_file(&graph_path)
.unwrap_or_else(|e| panic!("graph.yaml for '{name}' failed to parse: {e}"));
let result = GraphValidator::new(&dir).validate(&graph);
assert!(
result.errors.is_empty(),
"graph.yaml for '{name}' failed validation: {:#?}",
result.errors
);
checked.push(name);
}
checked.sort();
for expected in ["coder", "librarian", "step-runner"] {
assert!(
checked.iter().any(|n| n == expected),
"expected bundled graph agent '{expected}' to be checked; found {checked:?}"
);
}
}
#[test]
#[serial]
fn install_functions_force_preserves_user_mcp_json() {