feat: Created the step-runner graph agent for more deterministic coding workflows to produce even more reliable and higher-quality results
@@ -132,6 +132,7 @@ instructions: |
| `librarian` | Find official docs, OSS examples, web best practices for EXTERNAL libraries | Read-only, returns citation-backed findings, fan out 1-3 in parallel |
| `coder` | Write/edit files, implement features | Graph agent: plan → approval → implement → verify build+tests → self_review → bounded fix-loop |
| `oracle` | Architecture, complex debugging, review, plan review | Advisory, blocking — never answer the user before collecting Oracle results |
| `step-runner` | Execute ONE step of a phased plan repo (Phase 8) | Graph agent: orient → staleness check → coder → verify → handoff → user approval gate |

### When to fire `librarian` (external grep) vs `explore` (internal grep)

@@ -333,6 +334,10 @@ instructions: |
### Execution lifecycle (one step at a time)

**Default: delegate the whole step to `step-runner`** — a graph agent that enforces the step protocol as graph edges (orient → staleness check → coder → verify → edge-case sweep → optional independent review → validated handoff → user approval gate): `agent__spawn --agent step-runner --prompt "Execute step <N> of the plan at <plans_dir>"`. It returns `STEP_COMPLETE` / `STEP_BLOCKED` / `STEP_REJECTED` / `STEP_FAILED`. Relay its escalations (deviation gate, approval gate) promptly. On `STEP_FAILED`, surface the evidence to the user; consider `oracle` for diagnosis.

Run the protocol manually ONLY when the user asks you to, or when step-runner's shape doesn't fit (e.g. a docs-only step with nothing to build). Then:

1. Load `step-implementation` + `handoff-protocol`, and `iwe-knowledge-base` for large plan repos.
2. Follow the step protocol phase by phase: orient (previous handoff + `NOTES.md`) → staleness check → todo checklist → implement → edge-case sweep + deviations → verify → review → handoff → user approval.
3. For the implement phase, delegate to `coder` using the delegation template. Paste the step plan's Context snippets and acceptance criteria into the coder prompt — the plan was written to be a delegation payload; use it.
@@ -0,0 +1,93 @@
# Step-Runner

A graph-based agent that executes **one step** of a phased implementation
plan, with the step protocol from the `step-implementation` skill enforced
as graph edges rather than prose. Designed to be delegated to by
**[Sisyphus](../sisyphus/README.md)**; delegates implementation to
**[Coder](../coder/README.md)** and independent review to
**[code-reviewer](../code-reviewer/README.md)**.

It expects a plan repo authored per the `plan-authoring` skill:

```
plans/
  steps/NN-<slug>.md      # step plans with frontmatter (step/title/depends_on/status)
  handoffs/NN-<slug>.md   # written by this agent, validated by a deterministic gate
  NOTES.md                # rolling durable facts
```

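The layout above relies on each step plan carrying `step`, `title`, `depends_on`, and `status` in its frontmatter. As a minimal sketch of how a script could read one of those fields, assuming the frontmatter is a plain `key: value` block delimited by `---` lines: the helper name `frontmatter_value` is hypothetical and is not part of the step-runner scripts.

```sh
# Hypothetical helper (not part of step-runner): print the value of one
# key from the FIRST frontmatter block of a step plan.
# Assumes simple "key: value" lines between two "---" delimiters.
frontmatter_value() {
  key="$1"
  file="$2"
  awk -v key="$key" '
    /^---[ \t]*$/ { n++; next }            # count frontmatter delimiters
    n >= 2 { exit }                        # past the first block: stop
    n == 1 && index($0, key ":") == 1 {    # line starts with "key:"
      sub(/^[^:]*:[ \t]*/, "")             # drop "key:" and leading blanks
      print
      exit
    }
  ' "$file"
}

# Usage (path is illustrative):
#   frontmatter_value status plans/steps/03-add-parser.md
```

Only the first block is scanned, so a `status:` line in the plan body is never mistaken for frontmatter.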
## Workflow

```
resolve_step         (script)          locate plan + previous handoff, check depends_on,
   ↓                                   mark plan in-progress  [→ gate_blocked if deps unsatisfied]
orient               (llm, read-only)  merge handoff directives + staleness-check the plan
   ↓
route_staleness      (script)          major deviation → gate_deviation (approval)
   ↓
implement            (agent → coder)   coder runs its own build/test/self-review fix-loop
   ↓
route_coder_result   (script)          COMPLETE → verify | REJECTED / FAILED → end
   ↓
verify_format_lint   (script)          format BEFORE evidence, then lint
verify_build         (script)          step-level build/typecheck
verify_tests         (script)          FULL test suite
   ↓                                   [failures → fix_loop_gate, back-edge to implement]
edge_case_sweep      (llm)             missed edge cases; annotate downstream plans
   ↓                                   (Edge cases sections ONLY - scope changes become proposals)
route_sweep          (script)          5+ files or architectural boundary → independent_review
independent_review   (agent)           code-reviewer; 🔴 findings loop back to implement (bounded)
   ↓
write_handoff        (llm)             evidence-backed handoff per handoff-protocol + NOTES.md
check_handoff        (script)          deterministic schema gate; marks plan status complete
   ↓
gate_user_review     (approval)        HARD STOP - approve, or send revision comments
   ↓                                   (revisions loop through implement → verify → handoff again)
end_success / end_blocked / end_rejected / end_failure
```

End nodes emit sentinel outcomes for the caller:

- `STEP_COMPLETE` — step implemented, verified, handoff written, user approved.
- `STEP_BLOCKED` — `depends_on` unsatisfied and the user declined to proceed.
- `STEP_REJECTED` — user aborted at the deviation gate, or the coder's plan
  was rejected at its approval gate.
- `STEP_FAILED` — coder failed, the step-level fix budget was exhausted, or
  the handoff failed validation twice.

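A caller that captures step-runner's output can branch on the sentinel. The sketch below assumes the sentinel appears at the start of a line in the captured output, as the end-node `output` templates suggest. The function name `step_outcome` and the `UNKNOWN` fallback are hypothetical, not part of the agent.

```sh
# Hypothetical caller-side helper: read step-runner output on stdin and
# print the first sentinel found at the start of a line, or UNKNOWN.
step_outcome() {
  while IFS= read -r line; do
    case "$line" in
      STEP_COMPLETE*|STEP_BLOCKED*|STEP_REJECTED*|STEP_FAILED*)
        # Keep only the leading run of uppercase letters and underscores.
        printf '%s\n' "${line%%[!A-Z_]*}"
        return 0
        ;;
    esac
  done
  printf 'UNKNOWN\n'
  return 1
}

# Usage: branch on the outcome of a captured run ($captured is illustrative).
#   case "$(printf '%s\n' "$captured" | step_outcome)" in
#     STEP_COMPLETE) echo "approved; the next step may start" ;;
#     STEP_FAILED)   echo "surface the evidence to the user" ;;
#     *)             echo "stopped before completion" ;;
#   esac
```

Treating a missing sentinel as its own case keeps a truncated or crashed run from being read as success.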
## Usage

```sh
# From the project root: run the next in-progress/pending step
coyote -a step-runner "Execute the next step"

# A specific step (also parsed from the prompt: "execute step 3")
coyote -a step-runner --agent-variable step 3 "Execute step 3"

# Plan repo somewhere else
coyote -a step-runner --agent-variable plans_dir docs/plans "Execute the next step"
```

**Invoke from the project root.** The coder sub-agent resolves its own
`project_dir` from the invocation directory; overriding `project_dir` here
does not propagate to the spawned coder.

## Tuning

`graph.yaml` `initial_state` exposes:

- `max_fix_attempts` (default `2`) — step-level fix budget (the coder has
  its own internal budget of 3).
- `max_review_attempts` (default `1`) — bounded 🔴-finding fix loops after
  independent review.

Environment overrides honored by the script nodes:

- `FORMAT_CMD` / `LINT_CMD` — formatting and linting (otherwise a per-type
  heuristic formats, and linting defers to the build/check command).
- `BUILD_CMD` / `TEST_CMD` — skip project-type detection (same as coder).
- `STEP_AUTOAPPROVE=1` — bypass the deviation gate (non-interactive runs).
- `STEP_SKIP_REVIEW=1` — never spawn the independent reviewer.

The final user approval gate is never bypassed by an environment variable -
it is the point of the workflow.
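The override-then-detect behavior described for `BUILD_CMD` can be pictured with a small sketch. This is not the code in `verify_build.sh`, which is not shown here. The function name `resolve_build_cmd` and the marker-file table below are assumptions chosen purely for illustration.

```sh
# Illustrative sketch only: an explicit BUILD_CMD wins; otherwise fall back
# to a guess based on marker files. The marker-to-command mapping is an
# assumption, not the agent's real detection logic.
resolve_build_cmd() {
  project_dir="${1:-.}"
  if [ -n "${BUILD_CMD:-}" ]; then
    printf '%s\n' "$BUILD_CMD"            # override: skip detection entirely
  elif [ -f "$project_dir/Cargo.toml" ]; then
    printf '%s\n' "cargo build"
  elif [ -f "$project_dir/package.json" ]; then
    printf '%s\n' "npm run build"
  elif [ -f "$project_dir/go.mod" ]; then
    printf '%s\n' "go build ./..."
  else
    return 1                              # unknown project type
  fi
}

# Usage:
#   BUILD_CMD="make build" resolve_build_cmd .    # prints: make build
```

Checking the override first means a misdetected project type can always be corrected from the environment without editing the graph.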
@@ -0,0 +1,599 @@
name: step-runner
description: |
  Executes ONE step of a phased implementation plan (plans/ repo) with the
  step protocol enforced as graph edges: orient -> staleness check ->
  implement (coder) -> verify -> edge-case sweep -> optional independent
  review -> evidence-backed handoff -> user approval gate. Designed to be
  delegated to by sisyphus.
version: "1.0"

global_tools:
  - fs_cat.sh
  - fs_ls.sh
  - fs_write.sh
  - fs_patch.sh
  - execute_command.sh

skills_enabled: true
enabled_skills:
  - step-implementation
  - handoff-protocol
  - code-review
  - ai-slop-remover

variables:
  - name: project_dir
    description: |
      Absolute path to the project directory. Defaults to "." (the directory
      coyote was invoked from). The coder sub-agent resolves its own
      project_dir the same way, so invoke step-runner FROM the project root
      unless you override this for both.
    default: "."
  - name: plans_dir
    description: |
      Path to the plan repo. Relative paths resolve against project_dir.
      Expected layout: <plans_dir>/steps/NN-<slug>.md,
      <plans_dir>/handoffs/, <plans_dir>/NOTES.md.
    default: "plans"
  - name: step
    description: |
      Which step to execute: a step number, or "next" to pick the first
      in-progress (resume) or pending step plan.
    default: "next"

settings:
  max_loop_iterations: 20
  log_state_snapshots: true
  validate_before_run: true
  timeout: 7200

initial_state:
  project_dir: ""
  plans_dir: ""
  step_number: 0
  step_slug: ""
  step_title: ""
  step_plan_path: ""
  step_plan: ""
  prev_handoff_path: "(none)"
  prev_handoff: "(none - this is the first step)"
  notes_path: ""
  notes: "(none)"
  handoff_path: ""
  blocking_reason: ""
  plan_summary: ""
  implementation_brief: ""
  staleness_report: ""
  has_major_deviation: false
  deviation_summary: ""
  user_feedback: ""
  fix_instructions: ""
  fix_attempts: 0
  max_fix_attempts: 2
  coder_result: ""
  format_output: ""
  lint_ok: true
  lint_output: ""
  build_ok: true
  build_output: ""
  tests_ok: true
  tests_output: ""
  edge_case_report: ""
  downstream_updates: ""
  needs_independent_review: false
  review_report: ""
  review_attempts: 0
  max_review_attempts: 1
  handoff_attempts: 0
  handoff_fix: ""
  step_summary: ""

start: resolve_step

nodes:
  resolve_step:
    id: resolve_step
    type: script
    description: |
      Locate the step plan, previous handoff, and NOTES.md; parse frontmatter;
      check depends_on satisfaction against existing handoffs; mark the plan
      in-progress. Routes to gate_blocked when dependencies are unsatisfied.
    script: scripts/resolve_step.sh
    timeout: 30
    fallback: end_failure
    next: orient

  gate_blocked:
    id: gate_blocked
    type: approval
    description: Escalate unsatisfied dependencies instead of building on missing ground.
    question: |
      Step {{step_number}} ({{step_title}}) is BLOCKED:

      {{blocking_reason}}

      Proceed anyway?
    options:
      - "yes"
      - "no"
    routes:
      "yes": orient
      "no": end_blocked
    on_other: end_blocked

  orient:
    id: orient
    type: llm
    description: |
      Read-only orientation and staleness check: merge the previous handoff's
      directives with the step plan, then verify the plan's assumptions
      against the CURRENT codebase before any edit.
    skills_enabled: true
    enabled_skills:
      - step-implementation
    instructions: |
      You are orienting for one step of a phased implementation plan. Load
      `step-implementation` and apply its Orient and Staleness-check phases.
      You are READ-ONLY in this node: no edits, no fixes.

      1. Read the previous handoff (below). Note directives aimed at this
         step, deviations that changed the codebase, and bare assertions
         that need re-verification.
      2. Staleness-check the step plan against the code at {{project_dir}}:
         grep the symbols it references (via execute_command), read its
         Context snippets at their claimed locations with fs_cat, confirm
         its Test commands exist.
      3. Classify discrepancies per the skill's deviation table: minor
         (mechanics differ; correct silently in the brief) vs major (scope,
         approach, interfaces, or a later step's assumptions affected).

      Produce `implementation_brief`: the corrected, self-contained marching
      orders for the implementer - plan tasks in order, handoff directives
      applied, minor staleness corrections folded in, acceptance criteria
      restated. The implementer sees ONLY the step plan plus your brief.
    prompt: |
      ## Step plan ({{step_plan_path}})
      {{step_plan}}

      ## Previous handoff ({{prev_handoff_path}})
      {{prev_handoff}}

      ## Rolling project notes
      {{notes}}
    tools:
      - fs_cat
      - fs_ls
      - execute_command
    max_iterations: 20
    output_schema:
      type: object
      properties:
        plan_summary:
          type: string
          description: 1-3 sentences summarizing what this step delivers
        implementation_brief:
          type: string
          description: Corrected, self-contained instructions for the implementer
        staleness_report:
          type: string
          description: Findings from checking plan assumptions against current code; "clean" if none
        has_major_deviation:
          type: boolean
          description: True when a discrepancy changes scope, approach, or interfaces
        deviation_summary:
          type: string
          description: Major deviations only, with the plan claim vs current reality. Empty when none
      required: [plan_summary, implementation_brief, staleness_report, has_major_deviation, deviation_summary]
    fallback: end_failure
    next: route_staleness

  route_staleness:
    id: route_staleness
    type: script
    description: Major deviation -> user gate; otherwise straight to implement.
    script: scripts/route_staleness.sh
    timeout: 5
    fallback: implement

  gate_deviation:
    id: gate_deviation
    type: approval
    description: Major deviations are never silently absorbed - the user decides.
    question: |
      Step {{step_number}} ({{step_title}}): the plan no longer matches the
      codebase in a way that changes scope or approach.

      {{deviation_summary}}

      Staleness report:
      {{staleness_report}}

      Proceed with the corrected brief? (Answer with anything else to give
      your own guidance to the implementer.)
    options:
      - "proceed"
      - "abort"
    routes:
      "proceed": implement
      "abort": end_rejected
    on_other: implement
    state_updates:
      user_feedback: "{{choice}}"

  implement:
    id: implement
    type: agent
    description: |
      Delegate implementation to the coder graph agent, which runs its own
      plan -> implement -> build -> tests -> self-review fix-loop internally.
    agent: coder
    prompt: |
      ## TASK
      Execute step {{step_number}} ({{step_title}}) of a phased implementation
      plan for the project at {{project_dir}}.

      ## EXPECTED OUTCOME
      Every task in the step plan below is implemented and its acceptance
      criteria are met. Tests are derived from the Acceptance criteria
      section (not from the implementation). Build and full test suite pass.

      ## MUST DO
      - Follow the Orientation brief below - it supersedes the raw plan where
        they disagree (it folds in corrections from the staleness check).
      - Match the patterns pasted in the step plan's Context section.
      - Derive tests from the plan's Acceptance criteria.

      ## MUST NOT DO
      - Do not touch anything listed in the plan's Out of scope section.
      - Do not modify files under {{plans_dir}}.
      - Do not implement work belonging to other steps.

      ## CONTEXT
      ### Step plan
      {{step_plan}}

      ### Orientation brief (handoff directives + staleness corrections applied)
      {{implementation_brief}}

      ### User guidance (if any)
      {{user_feedback}}

      ### Fix loop status (empty on first attempt)
      {{fix_instructions}}
    timeout: 3600
    state_updates:
      coder_result: "{{output}}"
    next: route_coder_result

  route_coder_result:
    id: route_coder_result
    type: script
    description: Route on the coder sentinel - COMPLETE verifies, REJECTED/FAILED terminate.
    script: scripts/route_coder_result.sh
    timeout: 5
    fallback: end_failure

  verify_format_lint:
    id: verify_format_lint
    type: script
    description: |
      Format BEFORE evidence collection (FORMAT_CMD override or per-type
      heuristic), then lint (LINT_CMD, when configured). Lint failure routes
      to the fix loop.
    script: scripts/verify_format_lint.sh
    timeout: 300
    fallback: fix_loop_gate

  verify_build:
    id: verify_build
    type: script
    description: Step-level build/typecheck evidence, collected AFTER formatting.
    script: scripts/verify_build.sh
    timeout: 600
    fallback: fix_loop_gate

  verify_tests:
    id: verify_tests
    type: script
    description: FULL test suite - regressions in untouched code fail the step too.
    script: scripts/verify_tests.sh
    timeout: 1200
    fallback: fix_loop_gate

  fix_loop_gate:
    id: fix_loop_gate
    type: script
    description: |
      Step-level fix budget (the coder already ran its own internal fix
      loop). Loops to implement with fix_instructions, or ends as failure.
    script: scripts/fix_loop_gate.sh
    timeout: 5
    fallback: end_failure

  edge_case_sweep:
    id: edge_case_sweep
    type: llm
    description: |
      Post-implementation sweep: missed spots, edge cases, downstream plan
      implications. May annotate downstream plans' Edge cases sections
      (annotate vs propose per handoff-protocol). Also judges whether the
      change warrants an independent review pass.
    skills_enabled: true
    enabled_skills:
      - step-implementation
      - handoff-protocol
    instructions: |
      The implementation for this step just passed build and tests. Load
      `step-implementation` (edge-case sweep phase) and `handoff-protocol`
      (annotate-vs-propose rules), then:

      1. Read the changed code (the coder result below names the files).
         Look for edge cases the plan missed: empty inputs, error paths,
         concurrency, partial failure, compat.
      2. For each edge case belonging to a LATER step: check that step's
         plan under {{plans_dir}}/steps/. If its Edge cases section already
         covers it, done. If not, append an entry to that section via
         fs_patch - touch NOTHING else in the file.
      3. NEVER edit a later plan's Objective, Tasks, Acceptance criteria,
         or Out of scope. Scope-affecting changes become proposed diffs in
         `downstream_updates` instead.
      4. Set needs_independent_review=true when the change touched 5+ files
         or crosses architectural boundaries (auth, public APIs, schema,
         security-sensitive paths).

      Be terse. Findings, not prose.
    prompt: |
      ## Coder result
      {{coder_result}}

      ## Step plan
      {{step_plan}}

      ## Staleness report from orientation
      {{staleness_report}}
    tools:
      - fs_cat
      - fs_ls
      - fs_patch
      - execute_command
    max_iterations: 20
    output_schema:
      type: object
      properties:
        edge_case_report:
          type: string
          description: Edge cases discovered - both handled and punted, one per line. "none" if empty
        downstream_updates:
          type: string
          description: Annotations made (plan file + section) and proposed diffs for scope-affecting changes. "none" if empty
        needs_independent_review:
          type: boolean
      required: [edge_case_report, downstream_updates, needs_independent_review]
    fallback: write_handoff
    next: route_sweep

  route_sweep:
    id: route_sweep
    type: script
    description: Broad or boundary-crossing changes get an independent reviewer.
    script: scripts/route_sweep.sh
    timeout: 5
    fallback: write_handoff

  independent_review:
    id: independent_review
    type: agent
    description: Independent review pass - the author's self-review cannot catch its own rationalizations.
    agent: code-reviewer
    prompt: |
      Review the changes produced for step {{step_number}} ({{step_title}})
      of a phased implementation plan in {{project_dir}}.

      What the step was supposed to do:
      {{plan_summary}}

      Coder summary (names the modified/created files):
      {{coder_result}}

      Review the changed files against the step plan's acceptance criteria.
      Preserve severity tags in your findings.
    timeout: 1200
    state_updates:
      review_report: "{{output}}"
    next: route_review

  route_review:
    id: route_review
    type: script
    description: Critical findings loop back to implement (bounded); otherwise proceed to handoff.
    script: scripts/route_review.sh
    timeout: 5
    fallback: write_handoff

  write_handoff:
    id: write_handoff
    type: llm
    description: |
      Write the evidence-backed handoff per handoff-protocol and append
      durable facts to NOTES.md. The completion gate (check_handoff)
      verifies the document afterward.
    skills_enabled: true
    enabled_skills:
      - handoff-protocol
      - ai-slop-remover
    instructions: |
      Load `handoff-protocol` and follow its writer schema EXACTLY: the
      frontmatter (step, title, result) and all eight sections, writing
      "None" rather than omitting a section.

      Write the handoff to {{handoff_path}} with fs_write. Paste the
      verification evidence below verbatim into the Evidence section -
      commands, exit codes, decisive output lines. Deviations come from the
      staleness report, gate decisions, and fix loop history. Downstream
      plan updates come from the sweep results.

      Then append durable, step-independent facts (if any) to {{notes_path}}
      - create the file if missing, never rewrite existing entries.

      If "Gate feedback" below is non-empty, a previous handoff attempt
      failed validation - fix exactly what it lists.
    prompt: |
      ## Step
      {{step_number}} ({{step_title}}) - plan at {{step_plan_path}}

      ## Plan summary
      {{plan_summary}}

      ## Coder result
      {{coder_result}}

      ## Staleness report / deviations
      {{staleness_report}}

      Major deviation summary (if any): {{deviation_summary}}
      User guidance given (if any): {{user_feedback}}
      Fix loop attempts used: {{fix_attempts}} of {{max_fix_attempts}}

      ## Edge cases discovered
      {{edge_case_report}}

      ## Downstream plan updates
      {{downstream_updates}}

      ## Independent review report (if any)
      {{review_report}}

      ## Verification evidence (paste verbatim)
      ### Format
      {{format_output}}
      ### Lint
      {{lint_output}}
      ### Build
      {{build_output}}
      ### Tests
      {{tests_output}}

      ## Gate feedback
      {{handoff_fix}}
    tools:
      - fs_cat
      - fs_ls
      - fs_write
      - fs_patch
    max_iterations: 15
    output_schema:
      type: object
      properties:
        step_summary:
          type: string
          description: 3-6 sentence summary of the step for the user's approval decision - what was done, deviations, anything needing their attention
      required: [step_summary]
    fallback: end_failure
    next: check_handoff

  check_handoff:
    id: check_handoff
    type: script
    description: |
      Deterministic completion gate - handoff exists with frontmatter and all
      required sections. On success, marks the step plan status complete.
      One retry back to write_handoff, then failure.
    script: scripts/check_handoff.sh
    timeout: 10
    fallback: end_failure

  gate_user_review:
    id: gate_user_review
    type: approval
    description: The hard stop - the next step never starts without explicit approval.
    question: |
      ## Step {{step_number}} ({{step_title}}) - ready for review

      {{step_summary}}

      Handoff: {{handoff_path}}
      Build: {{build_ok}} | Tests: {{tests_ok}} | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}

      Approve this step? (Answer with anything else to send revision
      instructions straight to the implementer.)
    options:
      - "approve"
      - "revise"
    routes:
      "approve": end_success
      "revise": get_revision
    on_other: revise_from_choice
    state_updates:
      user_feedback: "{{choice}}"

  get_revision:
    id: get_revision
    type: input
    description: Collect revision instructions, then loop back through implement -> verify -> handoff.
    question: "What should change? Your comments go to the implementer verbatim."
    validation: "len(input) > 0"
    state_updates:
      fix_instructions: "{{input}}"
    next: implement

  revise_from_choice:
    id: revise_from_choice
    type: script
    description: Free-form approval answers are treated as revision instructions.
    script: scripts/revise_from_choice.sh
    timeout: 5
    fallback: get_revision

  end_success:
    id: end_success
    type: end
    output: |
      STEP_COMPLETE
      Step: {{step_number}} ({{step_title}})
      Plan: {{step_plan_path}}
      Handoff: {{handoff_path}}
      Build: passed | Tests: passed | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}

      {{step_summary}}

      Downstream plan updates:
      {{downstream_updates}}

  end_blocked:
    id: end_blocked
    type: end
    output: |
      STEP_BLOCKED
      Step: {{step_number}} ({{step_title}})
      Reason:
      {{blocking_reason}}

  end_rejected:
    id: end_rejected
    type: end
    output: |
      STEP_REJECTED
      Step: {{step_number}} ({{step_title}})
      Rejected at: deviation gate or coder approval gate.
      Deviation summary:
      {{deviation_summary}}
      Coder result (if it ran):
      {{coder_result}}

  end_failure:
    id: end_failure
    type: end
    output: |
      STEP_FAILED
      Step: {{step_number}} ({{step_title}})
      Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
      Blocking reason (if resolution failed): {{blocking_reason}}

      Coder result:
      {{coder_result}}

      Last build output:
      {{build_output}}

      Last tests output:
      {{tests_output}}
@@ -0,0 +1,54 @@
#!/usr/bin/env bash
set -uo pipefail

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

handoff_path=$(echo "$state" | jq -r '.handoff_path // ""')
step_plan_path=$(echo "$state" | jq -r '.step_plan_path // ""')
handoff_attempts=$(echo "$state" | jq -r '.handoff_attempts // 0')

problems=""

if [[ ! -f "$handoff_path" ]]; then
    problems="- handoff file does not exist at $handoff_path"$'\n'
else
    content=$(cat "$handoff_path")
    grep -qE '^result:[[:space:]]*(complete|partial|blocked)' <<< "$content" \
        || problems+="- frontmatter is missing 'result: complete|partial|blocked'"$'\n'
    for section in "Summary" "Completed" "Not completed" "Deviations" "Downstream plan updates" "Edge cases discovered" "Evidence" "Notes for next step"; do
        grep -qE "^##[[:space:]]+${section}" <<< "$content" \
            || problems+="- missing required section: ## ${section}"$'\n'
    done
fi

if [[ -z "$problems" ]]; then
    if [[ -f "$step_plan_path" ]]; then
        tmp=$(mktemp)
        awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: complete"; next} {print}' "$step_plan_path" > "$tmp" && mv "$tmp" "$step_plan_path"
    fi
    jq -nc '{"handoff_fix": "", "_next": "gate_user_review"}'
    exit 0
fi

if (( handoff_attempts >= 1 )); then
    jq -nc \
        --arg br "Handoff failed validation twice. Problems:
$problems" \
        '{"blocking_reason": $br, "_next": "end_failure"}'
    exit 0
fi

jq -nc \
    --arg hf "The previous handoff attempt failed validation. Fix exactly these problems:
$problems" \
    '{
        "handoff_attempts": 1,
        "handoff_fix": $hf,
        "_next": "write_handoff"
    }'
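The two `grep` checks in the handoff validator above can be tried outside the graph runtime. The sketch below wraps them in a hypothetical `validate_handoff` function; the function name and the sample file are assumptions for illustration and are not part of this commit. It prints one line per problem and nothing when the handoff is valid.

```bash
#!/usr/bin/env bash
# Sketch only: validate_handoff is a hypothetical wrapper around the checks
# used by the handoff validator. It prints one "- ..." line per problem.
validate_handoff() {
    local path="$1" content problems="" section
    if [[ ! -f "$path" ]]; then
        printf -- '- handoff file does not exist at %s\n' "$path"
        return 0
    fi
    content=$(cat "$path")
    # The frontmatter must carry one of the three accepted result values.
    grep -qE '^result:[[:space:]]*(complete|partial|blocked)' <<< "$content" \
        || problems+="- frontmatter is missing 'result: complete|partial|blocked'"$'\n'
    # Every required level-2 heading must be present.
    for section in "Summary" "Completed" "Not completed" "Deviations" \
                   "Downstream plan updates" "Edge cases discovered" \
                   "Evidence" "Notes for next step"; do
        grep -qE "^##[[:space:]]+${section}" <<< "$content" \
            || problems+="- missing required section: ## ${section}"$'\n'
    done
    printf '%s' "$problems"
}

# Usage: a handoff with a result line but only two of the eight sections.
sample=$(mktemp)
printf '%s\n' '---' 'result: partial' '---' '## Summary' '## Evidence' > "$sample"
validate_handoff "$sample"
rm -f "$sample"
```

For the sample file this should list the six sections that are absent and no frontmatter problem, which is the text the validator would pass back as `handoff_fix` on the first failed attempt.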
+60
@@ -0,0 +1,60 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

fix_attempts=$(echo "$state" | jq -r '.fix_attempts // 0')
max_fix_attempts=$(echo "$state" | jq -r '.max_fix_attempts // 2')
lint_ok=$(echo "$state" | jq -r '.lint_ok | if . == null then "true" else (. | tostring) end')
build_ok=$(echo "$state" | jq -r '.build_ok | if . == null then "true" else (. | tostring) end')
tests_ok=$(echo "$state" | jq -r '.tests_ok | if . == null then "true" else (. | tostring) end')
lint_output=$(echo "$state" | jq -r '.lint_output // ""')
build_output=$(echo "$state" | jq -r '.build_output // ""')
tests_output=$(echo "$state" | jq -r '.tests_output // ""')

if (( fix_attempts >= max_fix_attempts )); then
    jq -nc \
        --argjson n "$fix_attempts" \
        '{
            "fix_attempts": $n,
            "_next": "end_failure"
        }'
    exit 0
fi

next_attempts=$((fix_attempts + 1))

if [[ "$lint_ok" != "true" ]]; then
    stage="lint"
    output="$lint_output"
elif [[ "$build_ok" != "true" ]]; then
    stage="build"
    output="$build_output"
elif [[ "$tests_ok" != "true" ]]; then
    stage="full test suite"
    output="$tests_output"
else
    stage="verification"
    output="fix_loop_gate was reached but no failing stage was recorded. Re-run verification."
fi

fix_instructions=$(printf '## Fix loop status (step-level attempt %d of %d)\n\nThe implementation passed the coder'"'"'s internal checks but failed step-level verification at the %s stage.\n\nOutput:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor. Regressions in untouched code caused by this change are in scope.' \
    "$next_attempts" "$max_fix_attempts" "$stage" "$output")

jq -nc \
    --argjson n "$next_attempts" \
    --arg 'fi' "$fix_instructions" \
    '{
        "fix_attempts": $n,
        "fix_instructions": $fi,
        "lint_ok": true,
        "build_ok": true,
        "tests_ok": true,
        "_next": "implement"
    }'
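The fix-loop gate above makes two decisions: whether any attempts remain, and which failing stage to report back to the coder. A minimal sketch of both follows. The function names `failing_stage` and `next_node` are hypothetical, and the sketch prints plain strings instead of emitting JSON through `jq`.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring the gate's two decisions.

# Prints the first failing stage in priority order: lint, build, tests.
failing_stage() {
    local lint_ok="$1" build_ok="$2" tests_ok="$3"
    if [[ "$lint_ok" != "true" ]]; then
        echo "lint"
    elif [[ "$build_ok" != "true" ]]; then
        echo "build"
    elif [[ "$tests_ok" != "true" ]]; then
        echo "full test suite"
    else
        echo "verification"
    fi
}

# Prints the next node: retry while attempts remain, otherwise give up.
next_node() {
    local fix_attempts="$1" max_fix_attempts="$2"
    if (( fix_attempts >= max_fix_attempts )); then
        echo "end_failure"
    else
        echo "implement"
    fi
}

# Usage: build and tests both failed, so the earlier stage (build) is reported.
failing_stage true false false
next_node 0 2
next_node 2 2
```

Because the gate resets `lint_ok`, `build_ok`, and `tests_ok` to `true` before routing back to `implement`, a stale failure flag from an earlier pass should not be reported again on the next visit.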
+152
@@ -0,0 +1,152 @@
#!/usr/bin/env bash
set -uo pipefail

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

fail() {
    jq -nc --arg r "$1" '{"blocking_reason": $r, "_next": "end_failure"}'
    exit 0
}

project_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
project_dir=$(cd "$project_dir" 2>/dev/null && pwd) || fail "project_dir does not exist: $project_dir"

plans_dir="${LLM_AGENT_VAR_PLANS_DIR:-plans}"
[[ "$plans_dir" != /* ]] && plans_dir="$project_dir/$plans_dir"
steps_dir="$plans_dir/steps"
handoffs_dir="$plans_dir/handoffs"
notes_path="$plans_dir/NOTES.md"

[[ -d "$steps_dir" ]] || fail "No step plans directory at $steps_dir (expected <plans_dir>/steps/NN-<slug>.md)"

frontmatter() {
    awk '/^---[[:space:]]*$/{n++; next} n==1{print} n>=2{exit}' "$1"
}

fm_value() {
    echo "$1" | grep -E "^$2:" | head -1 | sed -E "s/^$2:[[:space:]]*//" | sed -E 's/^["'"'"']|["'"'"']$//g'
}

step="${LLM_AGENT_VAR_STEP:-next}"
if [[ "$step" == "next" ]]; then
    prompt_step=$(echo "$state" | jq -r '.initial_prompt // ""' | grep -oiE 'step[[:space:]#:]*[0-9]+' | head -1 | grep -oE '[0-9]+' || true)
    [[ -n "$prompt_step" ]] && step="$prompt_step"
fi

plan_file=""
if [[ "$step" == "next" ]]; then
    first_pending=""
    while IFS= read -r f; do
        st=$(fm_value "$(frontmatter "$f")" "status")
        if [[ "$st" == "in-progress" ]]; then
            plan_file="$f"
            break
        fi
        [[ -z "$first_pending" && ( "$st" == "pending" || -z "$st" ) ]] && first_pending="$f"
    done < <(find "$steps_dir" -maxdepth 1 -name '*.md' | sort)
    [[ -z "$plan_file" ]] && plan_file="$first_pending"
    [[ -z "$plan_file" ]] && fail "No in-progress or pending step plans in $steps_dir"
else
    [[ "$step" =~ ^[0-9]+$ ]] || fail "step must be a number or 'next'; got: $step"
    padded=$(printf '%02d' "$((10#$step))")
    plan_file=$(find "$steps_dir" -maxdepth 1 \( -name "${padded}-*.md" -o -name "${step}-*.md" \) | sort | head -1)
    [[ -n "$plan_file" ]] || fail "No step plan matching step $step in $steps_dir"
fi

bn=$(basename "$plan_file" .md)
num_part="${bn%%-*}"
[[ "$num_part" =~ ^[0-9]+$ ]] || fail "Step plan filename must start with a number: $bn"
step_number=$((10#$num_part))
step_slug="${bn#*-}"

fm=$(frontmatter "$plan_file")
step_title=$(fm_value "$fm" "title")
[[ -z "$step_title" ]] && step_title="$step_slug"

deps=$(echo "$fm" | awk '/^depends_on:/{f=1; print; next} f && /^[[:space:]]*-/{print; next} f{exit}' | grep -oE '[0-9]+' || true)
unsatisfied=""
for dep in $deps; do
    dep_padded=$(printf '%02d' "$((10#$dep))")
    dep_handoff=$(find "$handoffs_dir" -maxdepth 1 \( -name "${dep_padded}-*.md" -o -name "${dep}-*.md" \) 2>/dev/null | sort | head -1)
    if [[ -z "$dep_handoff" ]]; then
        unsatisfied+="- step $dep: no handoff found (step not executed?)"$'\n'
        continue
    fi
    dep_result=$(fm_value "$(frontmatter "$dep_handoff")" "result")
    if [[ "$dep_result" != "complete" ]]; then
        unsatisfied+="- step $dep: handoff result is '$dep_result' (not complete): $dep_handoff"$'\n'
    fi
done

prev_handoff_path="(none)"
prev_handoff="(none - this is the first step)"
prev_file=""
prev_num=0
while IFS= read -r h; do
    hn="${h##*/}"
    hn="${hn%%-*}"
    [[ "$hn" =~ ^[0-9]+$ ]] || continue
    n=$((10#$hn))
    if (( n < step_number && n >= prev_num )); then
        prev_num=$n
        prev_file="$h"
    fi
done < <(find "$handoffs_dir" -maxdepth 1 -name '*.md' 2>/dev/null | sort)
if [[ -n "$prev_file" ]]; then
    prev_handoff_path="$prev_file"
    prev_handoff=$(head -c 16000 "$prev_file")
fi

notes="(none)"
[[ -f "$notes_path" ]] && notes=$(head -c 8000 "$notes_path")

step_plan=$(head -c 24000 "$plan_file")
handoff_path="$handoffs_dir/$(basename "$plan_file")"

tmp=$(mktemp)
awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: in-progress"; next} {print}' "$plan_file" > "$tmp" && mv "$tmp" "$plan_file"

next_node="orient"
blocking_reason=""
if [[ -n "$unsatisfied" ]]; then
    next_node="gate_blocked"
    blocking_reason="Unsatisfied dependencies:"$'\n'"$unsatisfied"
fi

jq -nc \
    --arg pd "$project_dir" \
    --arg pl "$plans_dir" \
    --argjson sn "$step_number" \
    --arg ss "$step_slug" \
    --arg st "$step_title" \
    --arg spp "$plan_file" \
    --arg sp "$step_plan" \
    --arg php "$prev_handoff_path" \
    --arg ph "$prev_handoff" \
    --arg np "$notes_path" \
    --arg no "$notes" \
    --arg hp "$handoff_path" \
    --arg br "$blocking_reason" \
    --arg nx "$next_node" \
    '{
        "project_dir": $pd,
        "plans_dir": $pl,
        "step_number": $sn,
        "step_slug": $ss,
        "step_title": $st,
        "step_plan_path": $spp,
        "step_plan": $sp,
        "prev_handoff_path": $php,
        "prev_handoff": $ph,
        "notes_path": $np,
        "notes": $no,
        "handoff_path": $hp,
        "blocking_reason": $br,
        "_next": $nx
    }'
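The frontmatter handling in the script above relies on counting `---` delimiters in `awk`, so that only keys between the first two delimiters are read or rewritten. The sketch below exercises that behaviour on a throwaway plan file. `frontmatter` and `fm_value` are copied from the script; `set_status` is a hypothetical wrapper around the script's inline `awk` rewrite, and the sample plan content is made up for illustration.

```bash
#!/usr/bin/env bash
# Prints the lines between the first two '---' delimiters.
frontmatter() {
    awk '/^---[[:space:]]*$/{n++; next} n==1{print} n>=2{exit}' "$1"
}

# Prints the value of key $2 from frontmatter text $1, stripping one pair of quotes.
fm_value() {
    echo "$1" | grep -E "^$2:" | head -1 | sed -E "s/^$2:[[:space:]]*//" | sed -E 's/^["'"'"']|["'"'"']$//g'
}

# Hypothetical wrapper: rewrites the status key inside the frontmatter only.
set_status() {
    local file="$1" new="$2" tmp
    tmp=$(mktemp)
    awk -v s="$new" 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next}
        n==1 && /^status:/{print "status: " s; next} {print}' "$file" > "$tmp" \
        && mv "$tmp" "$file"
}

# Usage on a made-up plan file whose body also contains a "status:" line.
plan=$(mktemp)
cat > "$plan" <<'EOF'
---
title: "Add retry logic"
status: pending
---
## Notes
status: this body line must not change
EOF

fm=$(frontmatter "$plan")
fm_value "$fm" title
fm_value "$fm" status
set_status "$plan" in-progress
fm_value "$(frontmatter "$plan")" status

# The 10# prefix forces base 10, so a zero-padded "08" is not parsed as octal.
bn="08-add-retry-logic"
echo "$((10#${bn%%-*}))"
rm -f "$plan"
```

One property worth noting: if a plan has no `status:` key in its frontmatter, the `awk` rewrite leaves the file unchanged rather than adding the key.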
+27
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

feedback=$(echo "$state" | jq -r '.user_feedback // ""')

if [[ -z "$feedback" ]]; then
    jq -nc '{"_next": "get_revision"}'
    exit 0
fi

fix_instructions=$(printf '## Revision requested by the user at the step approval gate\n\nAddress these comments with minimal edits, then the step re-verifies and the handoff is rewritten:\n\n%s' \
    "$feedback")

jq -nc \
    --arg 'fi' "$fix_instructions" \
    '{
        "fix_instructions": $fi,
        "_next": "implement"
    }'
+27
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

coder_result=$(echo "$state" | jq -r '.coder_result // ""')

case "$coder_result" in
    *CODER_COMPLETE*)
        jq -nc '{"_next": "verify_format_lint"}'
        ;;
    *CODER_REJECTED*)
        jq -nc '{"_next": "end_rejected"}'
        ;;
    *CODER_FAILED*)
        jq -nc '{"blocking_reason": "coder fix-loop exhausted; see coder result", "_next": "end_failure"}'
        ;;
    *)
        jq -nc '{"blocking_reason": "coder returned no recognizable sentinel (expected CODER_COMPLETE / CODER_REJECTED / CODER_FAILED)", "_next": "end_failure"}'
        ;;
esac
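The router above matches sentinels as substrings with `case` glob patterns, so the first pattern that matches anywhere in the coder's output decides the route. A minimal sketch of that matching follows; `route_coder_result` is a hypothetical name, and it prints the node name directly rather than building JSON with `jq`.

```bash
#!/usr/bin/env bash
# Sketch only: route_coder_result is a hypothetical stand-in for the router.
# Patterns are tried top to bottom, and the first match wins.
route_coder_result() {
    case "$1" in
        *CODER_COMPLETE*) echo "verify_format_lint" ;;
        *CODER_REJECTED*) echo "end_rejected" ;;
        *CODER_FAILED*)   echo "end_failure" ;;
        *)                echo "end_failure" ;;  # no recognizable sentinel
    esac
}

# Usage: sentinel embedded in surrounding text, then a result with none.
route_coder_result "All edits applied. CODER_COMPLETE"
route_coder_result "the coder crashed before reporting"
```

Since matching is positional by pattern order rather than by where the sentinel appears in the text, a result that happened to contain both `CODER_COMPLETE` and `CODER_FAILED` would take the `CODER_COMPLETE` branch.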
+38
@@ -0,0 +1,38 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

review_report=$(echo "$state" | jq -r '.review_report // ""')
review_attempts=$(echo "$state" | jq -r '.review_attempts // 0')
max_review_attempts=$(echo "$state" | jq -r '.max_review_attempts // 1')

if ! grep -qF "🔴" <<< "$review_report"; then
    jq -nc '{"_next": "write_handoff"}'
    exit 0
fi

if (( review_attempts >= max_review_attempts )); then
    jq -nc '{"_next": "write_handoff"}'
    exit 0
fi

next_review=$((review_attempts + 1))
fix_instructions=$(printf '## Independent review findings (attempt %d of %d)\n\nAn independent reviewer flagged CRITICAL (🔴) findings. Address ONLY the 🔴 findings with minimal edits. Do not refactor unrelated code.\n\n%s' \
    "$next_review" "$max_review_attempts" "$review_report")

jq -nc \
    --argjson n "$next_review" \
    --arg 'fi' "$fix_instructions" \
    '{
        "review_attempts": $n,
        "fix_instructions": $fi,
        "needs_independent_review": false,
        "_next": "implement"
    }'
+23
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

has_major=$(echo "$state" | jq -r '.has_major_deviation // false')

if [[ "${STEP_AUTOAPPROVE:-0}" == "1" ]]; then
    jq -nc '{"_next": "implement"}'
    exit 0
fi

if [[ "$has_major" == "true" ]]; then
    jq -nc '{"_next": "gate_deviation"}'
else
    jq -nc '{"_next": "implement"}'
fi
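In the deviation router above, the `STEP_AUTOAPPROVE` environment variable is checked before the graph state, so it overrides a recorded major deviation. The sketch below shows that precedence with a hypothetical `route_deviation` function that takes the `has_major_deviation` value as an argument and prints the node name instead of emitting JSON.

```bash
#!/usr/bin/env bash
# Sketch only: route_deviation is a hypothetical stand-in for the router.
route_deviation() {
    local has_major="$1"
    # The environment override is consulted first and short-circuits the gate.
    if [[ "${STEP_AUTOAPPROVE:-0}" == "1" ]]; then
        echo "implement"
        return 0
    fi
    if [[ "$has_major" == "true" ]]; then
        echo "gate_deviation"
    else
        echo "implement"
    fi
}

# Usage: the same major deviation, with and without the override.
unset STEP_AUTOAPPROVE
route_deviation true
STEP_AUTOAPPROVE=1 route_deviation true
```

The review router that follows in this commit uses the same shape, with `STEP_SKIP_REVIEW` taking precedence over `needs_independent_review`.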
+23
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

needs_review=$(echo "$state" | jq -r '.needs_independent_review // false')

if [[ "${STEP_SKIP_REVIEW:-0}" == "1" ]]; then
    jq -nc '{"_next": "write_handoff"}'
    exit 0
fi

if [[ "$needs_review" == "true" ]]; then
    jq -nc '{"_next": "independent_review"}'
else
    jq -nc '{"_next": "write_handoff"}'
fi
+57
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
set -uo pipefail

# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

project_dir=$(echo "$state" | jq -r '.project_dir // "."')

if [[ -n "${BUILD_CMD:-}" ]]; then
    cmd="$BUILD_CMD"
else
    project_info=$(detect_project "$project_dir")
    cmd=$(echo "$project_info" | jq -r '.check // .build // ""')
fi

if [[ -z "$cmd" || "$cmd" == "null" ]]; then
    jq -nc '{
        "build_ok": true,
        "build_output": "(no build/check command available for this project type)",
        "_next": "verify_tests"
    }'
    exit 0
fi

exit_code=0
output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?

if (( exit_code == 0 )); then
    jq -nc \
        --arg out "Ran: $cmd

$output" \
        '{
            "build_ok": true,
            "build_output": $out,
            "_next": "verify_tests"
        }'
else
    jq -nc \
        --arg out "Ran: $cmd
Exit code: $exit_code

$output" \
        '{
            "build_ok": false,
            "build_output": $out,
            "_next": "fix_loop_gate"
        }'
fi
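The build verifier above captures both the combined output and the exit code of an arbitrary command using the `output=$(...) || exit_code=$?` idiom. A minimal sketch of that idiom follows. The `run_stage` function is hypothetical, and it prints a plain-text report rather than the JSON object the script emits.

```bash
#!/usr/bin/env bash
# Sketch only: run_stage is a hypothetical helper. It runs command string $2
# inside directory $1, prints a report, and returns the command's exit code.
run_stage() {
    local dir="$1" cmd="$2" rc=0 out
    # A failing command substitution triggers the || branch, which records
    # its status in rc instead of letting the failure propagate.
    out=$(cd "$dir" && eval "$cmd" 2>&1) || rc=$?
    printf 'Ran: %s\n' "$cmd"
    (( rc == 0 )) || printf 'Exit code: %d\n' "$rc"
    printf '\n%s\n' "$out"
    return "$rc"
}

# Usage: a command that prints something and then fails with status 3.
run_stage . 'echo compiling; exit 3' || echo "stage failed with exit code $?"
```

Declaring `out` with `local` on a separate line from the assignment matters here: `local out=$(...)` would report the status of `local` itself and hide the command's exit code.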
+79
@@ -0,0 +1,79 @@
#!/usr/bin/env bash
set -uo pipefail

# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

project_dir=$(echo "$state" | jq -r '.project_dir // "."')
project_type=$(detect_project "$project_dir" | jq -r '.type // "unknown"')

format_cmd="${FORMAT_CMD:-}"
if [[ -z "$format_cmd" ]]; then
    case "$project_type" in
        rust) format_cmd="cargo fmt" ;;
        go) format_cmd="gofmt -w ." ;;
        python) command -v ruff &>/dev/null && format_cmd="ruff format ." ;;
    esac
fi

if [[ -z "$format_cmd" ]]; then
    format_output="(no format command configured for project type '$project_type'; skipped. Set FORMAT_CMD to enable.)"
else
    fmt_rc=0
    fmt_out=$(cd "$project_dir" && eval "$format_cmd" 2>&1) || fmt_rc=$?
    format_output="Ran: $format_cmd
Exit code: $fmt_rc

$fmt_out"
fi

lint_cmd="${LINT_CMD:-}"
if [[ -z "$lint_cmd" ]]; then
    jq -nc \
        --arg fo "$format_output" \
        '{
            "format_output": $fo,
            "lint_ok": true,
            "lint_output": "(no LINT_CMD configured; linting is covered by the build/check command)",
            "_next": "verify_build"
        }'
    exit 0
fi

lint_rc=0
lint_out=$(cd "$project_dir" && eval "$lint_cmd" 2>&1) || lint_rc=$?

if (( lint_rc == 0 )); then
    jq -nc \
        --arg fo "$format_output" \
        --arg lo "Ran: $lint_cmd

$lint_out" \
        '{
            "format_output": $fo,
            "lint_ok": true,
            "lint_output": $lo,
            "_next": "verify_build"
        }'
else
    jq -nc \
        --arg fo "$format_output" \
        --arg lo "Ran: $lint_cmd
Exit code: $lint_rc

$lint_out" \
        '{
            "format_output": $fo,
            "lint_ok": false,
            "lint_output": $lo,
            "_next": "fix_loop_gate"
        }'
fi
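The format step above picks its command in two tiers: an explicit `FORMAT_CMD` wins, otherwise a per-language default is chosen from the detected project type, and an empty result means formatting is skipped. The sketch below isolates that selection in a hypothetical `pick_format_cmd` function; the project type is passed as an argument here because `detect_project` lives in a shared helper that is not shown in this diff.

```bash
#!/usr/bin/env bash
# Sketch only: pick_format_cmd is a hypothetical extraction of the selection
# logic. It prints the chosen command, or an empty line when there is none.
pick_format_cmd() {
    local project_type="$1" format_cmd="${FORMAT_CMD:-}"
    if [[ -z "$format_cmd" ]]; then
        case "$project_type" in
            rust) format_cmd="cargo fmt" ;;
            go) format_cmd="gofmt -w ." ;;
            # ruff is only used when it is actually installed.
            python) command -v ruff &>/dev/null && format_cmd="ruff format ." ;;
        esac
    fi
    echo "$format_cmd"
}

# Usage: language default, unknown type, then an explicit override.
unset FORMAT_CMD
pick_format_cmd rust
pick_format_cmd unknown
FORMAT_CMD="prettier --write ." pick_format_cmd rust
```

Note that in the script the formatter's exit code is recorded in `format_output` but does not affect routing; only the lint result decides between `verify_build` and `fix_loop_gate`.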
+57
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
set -uo pipefail

# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"

if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
    state="$GRAPH_STATE"
else
    state='{}'
fi

project_dir=$(echo "$state" | jq -r '.project_dir // "."')

if [[ -n "${TEST_CMD:-}" ]]; then
    cmd="$TEST_CMD"
else
    project_info=$(detect_project "$project_dir")
    cmd=$(echo "$project_info" | jq -r '.test // ""')
fi

if [[ -z "$cmd" || "$cmd" == "null" ]]; then
    jq -nc '{
        "tests_ok": true,
        "tests_output": "(no test command available for this project type)",
        "_next": "edge_case_sweep"
    }'
    exit 0
fi

exit_code=0
output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?

if (( exit_code == 0 )); then
    jq -nc \
        --arg out "Ran: $cmd

$output" \
        '{
            "tests_ok": true,
            "tests_output": $out,
            "_next": "edge_case_sweep"
        }'
else
    jq -nc \
        --arg out "Ran: $cmd
Exit code: $exit_code

$output" \
        '{
            "tests_ok": false,
            "tests_output": $out,
            "_next": "fix_loop_gate"
        }'
fi
@@ -5116,6 +5116,45 @@ mod tests {
        assert!(paths::skill_file("frontend-ui-ux").exists());
    }

    #[test]
    #[serial]
    fn bundled_graph_agents_parse_and_validate() {
        use crate::graph::GraphParser;
        use crate::graph::validator::GraphValidator;

        let _guard = TestConfigDirGuard::new();

        Agent::install_builtin_agents(false).unwrap();
        Skill::install_builtin_skills(false).unwrap();

        let mut checked = Vec::new();
        for entry in std::fs::read_dir(paths::agents_data_dir()).unwrap() {
            let dir = entry.unwrap().path();
            let graph_path = dir.join("graph.yaml");
            if !graph_path.exists() {
                continue;
            }
            let name = dir.file_name().unwrap().to_string_lossy().to_string();
            let graph = GraphParser::new(&dir)
                .load_from_file(&graph_path)
                .unwrap_or_else(|e| panic!("graph.yaml for '{name}' failed to parse: {e}"));
            let result = GraphValidator::new(&dir).validate(&graph);
            assert!(
                result.errors.is_empty(),
                "graph.yaml for '{name}' failed validation: {:#?}",
                result.errors
            );
            checked.push(name);
        }
        checked.sort();
        for expected in ["coder", "librarian", "step-runner"] {
            assert!(
                checked.iter().any(|n| n == expected),
                "expected bundled graph agent '{expected}' to be checked; found {checked:?}"
            );
        }
    }

    #[test]
    #[serial]
    fn install_functions_force_preserves_user_mcp_json() {