feat: Created the step-runner graph agent for more deterministic coding workflows to produce even more reliable and higher-quality results
This commit is contained in:
@@ -132,6 +132,7 @@ instructions: |
|
||||
| `librarian` | Find official docs, OSS examples, web best practices for EXTERNAL libraries | Read-only, returns citation-backed findings, fan out 1-3 in parallel |
|
||||
| `coder` | Write/edit files, implement features | Graph agent: plan → approval → implement → verify build+tests → self_review → bounded fix-loop |
|
||||
| `oracle` | Architecture, complex debugging, review, plan review | Advisory, blocking — never answer the user before collecting Oracle results |
|
||||
| `step-runner` | Execute ONE step of a phased plan repo (Phase 8) | Graph agent: orient → staleness check → coder → verify → handoff → user approval gate |
|
||||
|
||||
### When to fire `librarian` (external grep) vs `explore` (internal grep)
|
||||
|
||||
@@ -333,6 +334,10 @@ instructions: |
|
||||
|
||||
### Execution lifecycle (one step at a time)
|
||||
|
||||
**Default: delegate the whole step to `step-runner`** — a graph agent that enforces the step protocol as graph edges (orient → staleness check → coder → verify → edge-case sweep → optional independent review → validated handoff → user approval gate): `agent__spawn --agent step-runner --prompt "Execute step <N> of the plan at <plans_dir>"`. It returns `STEP_COMPLETE` / `STEP_BLOCKED` / `STEP_REJECTED` / `STEP_FAILED`. Relay its escalations (deviation gate, approval gate) promptly. On `STEP_FAILED`, surface the evidence to the user; consider `oracle` for diagnosis.
|
||||
|
||||
Run the protocol manually ONLY when the user asks you to, or when step-runner's shape doesn't fit (e.g. a docs-only step with nothing to build). Then:
|
||||
|
||||
1. Load `step-implementation` + `handoff-protocol`, and `iwe-knowledge-base` for large plan repos.
|
||||
2. Follow the step protocol phase by phase: orient (previous handoff + `NOTES.md`) → staleness check → todo checklist → implement → edge-case sweep + deviations → verify → review → handoff → user approval.
|
||||
3. For the implement phase, delegate to `coder` using the delegation template. Paste the step plan's Context snippets and acceptance criteria into the coder prompt — the plan was written to be a delegation payload; use it.
|
||||
|
||||
@@ -0,0 +1,93 @@
|
||||
# Step-Runner
|
||||
|
||||
A graph-based agent that executes **one step** of a phased implementation
|
||||
plan, with the step protocol from the `step-implementation` skill enforced
|
||||
as graph edges rather than prose. Designed to be delegated to by
|
||||
**[Sisyphus](../sisyphus/README.md)**; delegates implementation to
|
||||
**[Coder](../coder/README.md)** and independent review to
|
||||
**[code-reviewer](../code-reviewer/README.md)**.
|
||||
|
||||
It expects a plan repo authored per the `plan-authoring` skill:
|
||||
|
||||
```
|
||||
plans/
|
||||
steps/NN-<slug>.md # step plans with frontmatter (step/title/depends_on/status)
|
||||
handoffs/NN-<slug>.md # written by this agent, validated by a deterministic gate
|
||||
NOTES.md # rolling durable facts
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
```
|
||||
resolve_step (script) locate plan + previous handoff, check depends_on,
|
||||
↓ mark plan in-progress [→ gate_blocked if deps unsatisfied]
|
||||
orient (llm, read-only) merge handoff directives + staleness-check the plan
|
||||
↓
|
||||
route_staleness (script) major deviation → gate_deviation (approval)
|
||||
↓
|
||||
implement (agent → coder) coder runs its own build/test/self-review fix-loop
|
||||
↓
|
||||
route_coder_result (script) COMPLETE → verify | REJECTED / FAILED → end
|
||||
↓
|
||||
verify_format_lint (script) format BEFORE evidence, then lint
|
||||
verify_build (script) step-level build/typecheck
|
||||
verify_tests (script) FULL test suite
|
||||
↓ [failures → fix_loop_gate, back-edge to implement]
|
||||
edge_case_sweep (llm) missed edge cases; annotate downstream plans
|
||||
↓ (Edge cases sections ONLY - scope changes become proposals)
|
||||
route_sweep (script) 5+ files or architectural boundary → independent_review
|
||||
independent_review (agent) code-reviewer; 🔴 findings loop back to implement (bounded)
|
||||
↓
|
||||
write_handoff (llm) evidence-backed handoff per handoff-protocol + NOTES.md
|
||||
check_handoff (script) deterministic schema gate; marks plan status complete
|
||||
↓
|
||||
gate_user_review (approval) HARD STOP - approve, or send revision comments
|
||||
↓ (revisions loop through implement → verify → handoff again)
|
||||
end_success / end_blocked / end_rejected / end_failure
|
||||
```
|
||||
|
||||
End nodes emit sentinel outcomes for the caller:
|
||||
|
||||
- `STEP_COMPLETE` — step implemented, verified, handoff written, user approved.
|
||||
- `STEP_BLOCKED` — `depends_on` unsatisfied and the user declined to proceed.
|
||||
- `STEP_REJECTED` — user aborted at the deviation gate, or the coder's plan
|
||||
was rejected at its approval gate.
|
||||
- `STEP_FAILED` — coder failed, the step-level fix budget was exhausted, or
|
||||
the handoff failed validation twice.
|
||||
|
||||
## Usage
|
||||
|
||||
```sh
|
||||
# From the project root: run the next in-progress/pending step
|
||||
coyote -a step-runner "Execute the next step"
|
||||
|
||||
# A specific step (also parsed from the prompt: "execute step 3")
|
||||
coyote -a step-runner --agent-variable step 3 "Execute step 3"
|
||||
|
||||
# Plan repo somewhere else
|
||||
coyote -a step-runner --agent-variable plans_dir docs/plans "Execute the next step"
|
||||
```
|
||||
|
||||
**Invoke from the project root.** The coder sub-agent resolves its own
|
||||
`project_dir` from the invocation directory; overriding `project_dir` here
|
||||
does not propagate to the spawned coder.
|
||||
|
||||
## Tuning
|
||||
|
||||
`graph.yaml` `initial_state` exposes:
|
||||
|
||||
- `max_fix_attempts` (default `2`) — step-level fix budget (the coder has
|
||||
its own internal budget of 3).
|
||||
- `max_review_attempts` (default `1`) — bounded 🔴-finding fix loops after
|
||||
independent review.
|
||||
|
||||
Environment overrides honored by the script nodes:
|
||||
|
||||
- `FORMAT_CMD` / `LINT_CMD` — formatting and linting (otherwise a per-type
|
||||
heuristic formats, and linting defers to the build/check command).
|
||||
- `BUILD_CMD` / `TEST_CMD` — skip project-type detection (same as coder).
|
||||
- `STEP_AUTOAPPROVE=1` — bypass the deviation gate (non-interactive runs).
|
||||
- `STEP_SKIP_REVIEW=1` — never spawn the independent reviewer.
|
||||
|
||||
The final user approval gate is never bypassed by an environment variable -
|
||||
it is the point of the workflow.
|
||||
@@ -0,0 +1,599 @@
|
||||
name: step-runner
|
||||
description: |
|
||||
Executes ONE step of a phased implementation plan (plans/ repo) with the
|
||||
step protocol enforced as graph edges: orient -> staleness check ->
|
||||
implement (coder) -> verify -> edge-case sweep -> optional independent
|
||||
review -> evidence-backed handoff -> user approval gate. Designed to be
|
||||
delegated to by sisyphus.
|
||||
version: "1.0"
|
||||
|
||||
global_tools:
|
||||
- fs_cat.sh
|
||||
- fs_ls.sh
|
||||
- fs_write.sh
|
||||
- fs_patch.sh
|
||||
- execute_command.sh
|
||||
|
||||
skills_enabled: true
|
||||
enabled_skills:
|
||||
- step-implementation
|
||||
- handoff-protocol
|
||||
- code-review
|
||||
- ai-slop-remover
|
||||
|
||||
variables:
|
||||
- name: project_dir
|
||||
description: |
|
||||
Absolute path to the project directory. Defaults to "." (the directory
|
||||
coyote was invoked from). The coder sub-agent resolves its own
|
||||
project_dir the same way, so invoke step-runner FROM the project root
|
||||
unless you override this for both.
|
||||
default: "."
|
||||
- name: plans_dir
|
||||
description: |
|
||||
Path to the plan repo. Relative paths resolve against project_dir.
|
||||
Expected layout: <plans_dir>/steps/NN-<slug>.md,
|
||||
<plans_dir>/handoffs/, <plans_dir>/NOTES.md.
|
||||
default: "plans"
|
||||
- name: step
|
||||
description: |
|
||||
Which step to execute: a step number, or "next" to pick the first
|
||||
in-progress (resume) or pending step plan.
|
||||
default: "next"
|
||||
|
||||
settings:
|
||||
max_loop_iterations: 20
|
||||
log_state_snapshots: true
|
||||
validate_before_run: true
|
||||
timeout: 7200
|
||||
|
||||
initial_state:
|
||||
project_dir: ""
|
||||
plans_dir: ""
|
||||
step_number: 0
|
||||
step_slug: ""
|
||||
step_title: ""
|
||||
step_plan_path: ""
|
||||
step_plan: ""
|
||||
prev_handoff_path: "(none)"
|
||||
prev_handoff: "(none - this is the first step)"
|
||||
notes_path: ""
|
||||
notes: "(none)"
|
||||
handoff_path: ""
|
||||
blocking_reason: ""
|
||||
plan_summary: ""
|
||||
implementation_brief: ""
|
||||
staleness_report: ""
|
||||
has_major_deviation: false
|
||||
deviation_summary: ""
|
||||
user_feedback: ""
|
||||
fix_instructions: ""
|
||||
fix_attempts: 0
|
||||
max_fix_attempts: 2
|
||||
coder_result: ""
|
||||
format_output: ""
|
||||
lint_ok: true
|
||||
lint_output: ""
|
||||
build_ok: true
|
||||
build_output: ""
|
||||
tests_ok: true
|
||||
tests_output: ""
|
||||
edge_case_report: ""
|
||||
downstream_updates: ""
|
||||
needs_independent_review: false
|
||||
review_report: ""
|
||||
review_attempts: 0
|
||||
max_review_attempts: 1
|
||||
handoff_attempts: 0
|
||||
handoff_fix: ""
|
||||
step_summary: ""
|
||||
|
||||
start: resolve_step
|
||||
|
||||
nodes:
|
||||
resolve_step:
|
||||
id: resolve_step
|
||||
type: script
|
||||
description: |
|
||||
Locate the step plan, previous handoff, and NOTES.md; parse frontmatter;
|
||||
check depends_on satisfaction against existing handoffs; mark the plan
|
||||
in-progress. Routes to gate_blocked when dependencies are unsatisfied.
|
||||
script: scripts/resolve_step.sh
|
||||
timeout: 30
|
||||
fallback: end_failure
|
||||
next: orient
|
||||
|
||||
gate_blocked:
|
||||
id: gate_blocked
|
||||
type: approval
|
||||
description: Escalate unsatisfied dependencies instead of building on missing ground.
|
||||
question: |
|
||||
Step {{step_number}} ({{step_title}}) is BLOCKED:
|
||||
|
||||
{{blocking_reason}}
|
||||
|
||||
Proceed anyway?
|
||||
options:
|
||||
- "yes"
|
||||
- "no"
|
||||
routes:
|
||||
"yes": orient
|
||||
"no": end_blocked
|
||||
on_other: end_blocked
|
||||
|
||||
orient:
|
||||
id: orient
|
||||
type: llm
|
||||
description: |
|
||||
Read-only orientation and staleness check: merge the previous handoff's
|
||||
directives with the step plan, then verify the plan's assumptions
|
||||
against the CURRENT codebase before any edit.
|
||||
skills_enabled: true
|
||||
enabled_skills:
|
||||
- step-implementation
|
||||
instructions: |
|
||||
You are orienting for one step of a phased implementation plan. Load
|
||||
`step-implementation` and apply its Orient and Staleness-check phases.
|
||||
You are READ-ONLY in this node: no edits, no fixes.
|
||||
|
||||
1. Read the previous handoff (below). Note directives aimed at this
|
||||
step, deviations that changed the codebase, and bare assertions
|
||||
that need re-verification.
|
||||
2. Staleness-check the step plan against the code at {{project_dir}}:
|
||||
grep the symbols it references (via execute_command), read its
|
||||
Context snippets at their claimed locations with fs_cat, confirm
|
||||
its Test commands exist.
|
||||
3. Classify discrepancies per the skill's deviation table: minor
|
||||
(mechanics differ; correct silently in the brief) vs major (scope,
|
||||
approach, interfaces, or a later step's assumptions affected).
|
||||
|
||||
Produce `implementation_brief`: the corrected, self-contained marching
|
||||
orders for the implementer - plan tasks in order, handoff directives
|
||||
applied, minor staleness corrections folded in, acceptance criteria
|
||||
restated. The implementer sees ONLY the step plan plus your brief.
|
||||
prompt: |
|
||||
## Step plan ({{step_plan_path}})
|
||||
{{step_plan}}
|
||||
|
||||
## Previous handoff ({{prev_handoff_path}})
|
||||
{{prev_handoff}}
|
||||
|
||||
## Rolling project notes
|
||||
{{notes}}
|
||||
tools:
|
||||
- fs_cat
|
||||
- fs_ls
|
||||
- execute_command
|
||||
max_iterations: 20
|
||||
output_schema:
|
||||
type: object
|
||||
properties:
|
||||
plan_summary:
|
||||
type: string
|
||||
description: 1-3 sentences summarizing what this step delivers
|
||||
implementation_brief:
|
||||
type: string
|
||||
description: Corrected, self-contained instructions for the implementer
|
||||
staleness_report:
|
||||
type: string
|
||||
description: Findings from checking plan assumptions against current code; "clean" if none
|
||||
has_major_deviation:
|
||||
type: boolean
|
||||
description: True when a discrepancy changes scope, approach, or interfaces
|
||||
deviation_summary:
|
||||
type: string
|
||||
description: Major deviations only, with the plan claim vs current reality. Empty when none
|
||||
required: [plan_summary, implementation_brief, staleness_report, has_major_deviation, deviation_summary]
|
||||
fallback: end_failure
|
||||
next: route_staleness
|
||||
|
||||
route_staleness:
|
||||
id: route_staleness
|
||||
type: script
|
||||
description: Major deviation -> user gate; otherwise straight to implement.
|
||||
script: scripts/route_staleness.sh
|
||||
timeout: 5
|
||||
fallback: implement
|
||||
|
||||
gate_deviation:
|
||||
id: gate_deviation
|
||||
type: approval
|
||||
description: Major deviations are never silently absorbed - the user decides.
|
||||
question: |
|
||||
Step {{step_number}} ({{step_title}}): the plan no longer matches the
|
||||
codebase in a way that changes scope or approach.
|
||||
|
||||
{{deviation_summary}}
|
||||
|
||||
Staleness report:
|
||||
{{staleness_report}}
|
||||
|
||||
Proceed with the corrected brief? (Answer with anything else to give
|
||||
your own guidance to the implementer.)
|
||||
options:
|
||||
- "proceed"
|
||||
- "abort"
|
||||
routes:
|
||||
"proceed": implement
|
||||
"abort": end_rejected
|
||||
on_other: implement
|
||||
state_updates:
|
||||
user_feedback: "{{choice}}"
|
||||
|
||||
implement:
|
||||
id: implement
|
||||
type: agent
|
||||
description: |
|
||||
Delegate implementation to the coder graph agent, which runs its own
|
||||
plan -> implement -> build -> tests -> self-review fix-loop internally.
|
||||
agent: coder
|
||||
prompt: |
|
||||
## TASK
|
||||
Execute step {{step_number}} ({{step_title}}) of a phased implementation
|
||||
plan for the project at {{project_dir}}.
|
||||
|
||||
## EXPECTED OUTCOME
|
||||
Every task in the step plan below is implemented and its acceptance
|
||||
criteria are met. Tests are derived from the Acceptance criteria
|
||||
section (not from the implementation). Build and full test suite pass.
|
||||
|
||||
## MUST DO
|
||||
- Follow the Orientation brief below - it supersedes the raw plan where
|
||||
they disagree (it folds in corrections from the staleness check).
|
||||
- Match the patterns pasted in the step plan's Context section.
|
||||
- Derive tests from the plan's Acceptance criteria.
|
||||
|
||||
## MUST NOT DO
|
||||
- Do not touch anything listed in the plan's Out of scope section.
|
||||
- Do not modify files under {{plans_dir}}.
|
||||
- Do not implement work belonging to other steps.
|
||||
|
||||
## CONTEXT
|
||||
### Step plan
|
||||
{{step_plan}}
|
||||
|
||||
### Orientation brief (handoff directives + staleness corrections applied)
|
||||
{{implementation_brief}}
|
||||
|
||||
### User guidance (if any)
|
||||
{{user_feedback}}
|
||||
|
||||
### Fix loop status (empty on first attempt)
|
||||
{{fix_instructions}}
|
||||
timeout: 3600
|
||||
state_updates:
|
||||
coder_result: "{{output}}"
|
||||
next: route_coder_result
|
||||
|
||||
route_coder_result:
|
||||
id: route_coder_result
|
||||
type: script
|
||||
description: Route on the coder sentinel - COMPLETE verifies, REJECTED/FAILED terminate.
|
||||
script: scripts/route_coder_result.sh
|
||||
timeout: 5
|
||||
fallback: end_failure
|
||||
|
||||
verify_format_lint:
|
||||
id: verify_format_lint
|
||||
type: script
|
||||
description: |
|
||||
Format BEFORE evidence collection (FORMAT_CMD override or per-type
|
||||
heuristic), then lint (LINT_CMD, when configured). Lint failure routes
|
||||
to the fix loop.
|
||||
script: scripts/verify_format_lint.sh
|
||||
timeout: 300
|
||||
fallback: fix_loop_gate
|
||||
|
||||
verify_build:
|
||||
id: verify_build
|
||||
type: script
|
||||
description: Step-level build/typecheck evidence, collected AFTER formatting.
|
||||
script: scripts/verify_build.sh
|
||||
timeout: 600
|
||||
fallback: fix_loop_gate
|
||||
|
||||
verify_tests:
|
||||
id: verify_tests
|
||||
type: script
|
||||
description: FULL test suite - regressions in untouched code fail the step too.
|
||||
script: scripts/verify_tests.sh
|
||||
timeout: 1200
|
||||
fallback: fix_loop_gate
|
||||
|
||||
fix_loop_gate:
|
||||
id: fix_loop_gate
|
||||
type: script
|
||||
description: |
|
||||
Step-level fix budget (the coder already ran its own internal fix
|
||||
loop). Loops to implement with fix_instructions, or ends as failure.
|
||||
script: scripts/fix_loop_gate.sh
|
||||
timeout: 5
|
||||
fallback: end_failure
|
||||
|
||||
edge_case_sweep:
|
||||
id: edge_case_sweep
|
||||
type: llm
|
||||
description: |
|
||||
Post-implementation sweep: missed spots, edge cases, downstream plan
|
||||
implications. May annotate downstream plans' Edge cases sections
|
||||
(annotate vs propose per handoff-protocol). Also judges whether the
|
||||
change warrants an independent review pass.
|
||||
skills_enabled: true
|
||||
enabled_skills:
|
||||
- step-implementation
|
||||
- handoff-protocol
|
||||
instructions: |
|
||||
The implementation for this step just passed build and tests. Load
|
||||
`step-implementation` (edge-case sweep phase) and `handoff-protocol`
|
||||
(annotate-vs-propose rules), then:
|
||||
|
||||
1. Read the changed code (the coder result below names the files).
|
||||
Look for edge cases the plan missed: empty inputs, error paths,
|
||||
concurrency, partial failure, compat.
|
||||
2. For each edge case belonging to a LATER step: check that step's
|
||||
plan under {{plans_dir}}/steps/. If its Edge cases section already
|
||||
covers it, done. If not, append an entry to that section via
|
||||
fs_patch - touch NOTHING else in the file.
|
||||
3. NEVER edit a later plan's Objective, Tasks, Acceptance criteria,
|
||||
or Out of scope. Scope-affecting changes become proposed diffs in
|
||||
`downstream_updates` instead.
|
||||
4. Set needs_independent_review=true when the change touched 5+ files
|
||||
or crosses architectural boundaries (auth, public APIs, schema,
|
||||
security-sensitive paths).
|
||||
|
||||
Be terse. Findings, not prose.
|
||||
prompt: |
|
||||
## Coder result
|
||||
{{coder_result}}
|
||||
|
||||
## Step plan
|
||||
{{step_plan}}
|
||||
|
||||
## Staleness report from orientation
|
||||
{{staleness_report}}
|
||||
tools:
|
||||
- fs_cat
|
||||
- fs_ls
|
||||
- fs_patch
|
||||
- execute_command
|
||||
max_iterations: 20
|
||||
output_schema:
|
||||
type: object
|
||||
properties:
|
||||
edge_case_report:
|
||||
type: string
|
||||
description: Edge cases discovered - both handled and punted, one per line. "none" if empty
|
||||
downstream_updates:
|
||||
type: string
|
||||
description: Annotations made (plan file + section) and proposed diffs for scope-affecting changes. "none" if empty
|
||||
needs_independent_review:
|
||||
type: boolean
|
||||
required: [edge_case_report, downstream_updates, needs_independent_review]
|
||||
fallback: write_handoff
|
||||
next: route_sweep
|
||||
|
||||
route_sweep:
|
||||
id: route_sweep
|
||||
type: script
|
||||
description: Broad or boundary-crossing changes get an independent reviewer.
|
||||
script: scripts/route_sweep.sh
|
||||
timeout: 5
|
||||
fallback: write_handoff
|
||||
|
||||
independent_review:
|
||||
id: independent_review
|
||||
type: agent
|
||||
description: Independent review pass - the author's self-review cannot catch its own rationalizations.
|
||||
agent: code-reviewer
|
||||
prompt: |
|
||||
Review the changes produced for step {{step_number}} ({{step_title}})
|
||||
of a phased implementation plan in {{project_dir}}.
|
||||
|
||||
What the step was supposed to do:
|
||||
{{plan_summary}}
|
||||
|
||||
Coder summary (names the modified/created files):
|
||||
{{coder_result}}
|
||||
|
||||
Review the changed files against the step plan's acceptance criteria.
|
||||
Preserve severity tags in your findings.
|
||||
timeout: 1200
|
||||
state_updates:
|
||||
review_report: "{{output}}"
|
||||
next: route_review
|
||||
|
||||
route_review:
|
||||
id: route_review
|
||||
type: script
|
||||
description: Critical findings loop back to implement (bounded); otherwise proceed to handoff.
|
||||
script: scripts/route_review.sh
|
||||
timeout: 5
|
||||
fallback: write_handoff
|
||||
|
||||
write_handoff:
|
||||
id: write_handoff
|
||||
type: llm
|
||||
description: |
|
||||
Write the evidence-backed handoff per handoff-protocol and append
|
||||
durable facts to NOTES.md. The completion gate (check_handoff)
|
||||
verifies the document afterward.
|
||||
skills_enabled: true
|
||||
enabled_skills:
|
||||
- handoff-protocol
|
||||
- ai-slop-remover
|
||||
instructions: |
|
||||
Load `handoff-protocol` and follow its writer schema EXACTLY: the
|
||||
frontmatter (step, title, result) and all eight sections, writing
|
||||
"None" rather than omitting a section.
|
||||
|
||||
Write the handoff to {{handoff_path}} with fs_write. Paste the
|
||||
verification evidence below verbatim into the Evidence section -
|
||||
commands, exit codes, decisive output lines. Deviations come from the
|
||||
staleness report, gate decisions, and fix loop history. Downstream
|
||||
plan updates come from the sweep results.
|
||||
|
||||
Then append durable, step-independent facts (if any) to {{notes_path}}
|
||||
- create the file if missing, never rewrite existing entries.
|
||||
|
||||
If "Gate feedback" below is non-empty, a previous handoff attempt
|
||||
failed validation - fix exactly what it lists.
|
||||
prompt: |
|
||||
## Step
|
||||
{{step_number}} ({{step_title}}) - plan at {{step_plan_path}}
|
||||
|
||||
## Plan summary
|
||||
{{plan_summary}}
|
||||
|
||||
## Coder result
|
||||
{{coder_result}}
|
||||
|
||||
## Staleness report / deviations
|
||||
{{staleness_report}}
|
||||
|
||||
Major deviation summary (if any): {{deviation_summary}}
|
||||
User guidance given (if any): {{user_feedback}}
|
||||
Fix loop attempts used: {{fix_attempts}} of {{max_fix_attempts}}
|
||||
|
||||
## Edge cases discovered
|
||||
{{edge_case_report}}
|
||||
|
||||
## Downstream plan updates
|
||||
{{downstream_updates}}
|
||||
|
||||
## Independent review report (if any)
|
||||
{{review_report}}
|
||||
|
||||
## Verification evidence (paste verbatim)
|
||||
### Format
|
||||
{{format_output}}
|
||||
### Lint
|
||||
{{lint_output}}
|
||||
### Build
|
||||
{{build_output}}
|
||||
### Tests
|
||||
{{tests_output}}
|
||||
|
||||
## Gate feedback
|
||||
{{handoff_fix}}
|
||||
tools:
|
||||
- fs_cat
|
||||
- fs_ls
|
||||
- fs_write
|
||||
- fs_patch
|
||||
max_iterations: 15
|
||||
output_schema:
|
||||
type: object
|
||||
properties:
|
||||
step_summary:
|
||||
type: string
|
||||
description: 3-6 sentence summary of the step for the user's approval decision - what was done, deviations, anything needing their attention
|
||||
required: [step_summary]
|
||||
fallback: end_failure
|
||||
next: check_handoff
|
||||
|
||||
check_handoff:
|
||||
id: check_handoff
|
||||
type: script
|
||||
description: |
|
||||
Deterministic completion gate - handoff exists with frontmatter and all
|
||||
required sections. On success, marks the step plan status complete.
|
||||
One retry back to write_handoff, then failure.
|
||||
script: scripts/check_handoff.sh
|
||||
timeout: 10
|
||||
fallback: end_failure
|
||||
|
||||
gate_user_review:
|
||||
id: gate_user_review
|
||||
type: approval
|
||||
description: The hard stop - the next step never starts without explicit approval.
|
||||
question: |
|
||||
## Step {{step_number}} ({{step_title}}) - ready for review
|
||||
|
||||
{{step_summary}}
|
||||
|
||||
Handoff: {{handoff_path}}
|
||||
Build: {{build_ok}} | Tests: {{tests_ok}} | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
|
||||
|
||||
Approve this step? (Answer with anything else to send revision
|
||||
instructions straight to the implementer.)
|
||||
options:
|
||||
- "approve"
|
||||
- "revise"
|
||||
routes:
|
||||
"approve": end_success
|
||||
"revise": get_revision
|
||||
on_other: revise_from_choice
|
||||
state_updates:
|
||||
user_feedback: "{{choice}}"
|
||||
|
||||
get_revision:
|
||||
id: get_revision
|
||||
type: input
|
||||
description: Collect revision instructions, then loop back through implement -> verify -> handoff.
|
||||
question: "What should change? Your comments go to the implementer verbatim."
|
||||
validation: "len(input) > 0"
|
||||
state_updates:
|
||||
fix_instructions: "{{input}}"
|
||||
next: implement
|
||||
|
||||
revise_from_choice:
|
||||
id: revise_from_choice
|
||||
type: script
|
||||
description: Free-form approval answers are treated as revision instructions.
|
||||
script: scripts/revise_from_choice.sh
|
||||
timeout: 5
|
||||
fallback: get_revision
|
||||
|
||||
end_success:
|
||||
id: end_success
|
||||
type: end
|
||||
output: |
|
||||
STEP_COMPLETE
|
||||
Step: {{step_number}} ({{step_title}})
|
||||
Plan: {{step_plan_path}}
|
||||
Handoff: {{handoff_path}}
|
||||
Build: passed | Tests: passed | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
|
||||
|
||||
{{step_summary}}
|
||||
|
||||
Downstream plan updates:
|
||||
{{downstream_updates}}
|
||||
|
||||
end_blocked:
|
||||
id: end_blocked
|
||||
type: end
|
||||
output: |
|
||||
STEP_BLOCKED
|
||||
Step: {{step_number}} ({{step_title}})
|
||||
Reason:
|
||||
{{blocking_reason}}
|
||||
|
||||
end_rejected:
|
||||
id: end_rejected
|
||||
type: end
|
||||
output: |
|
||||
STEP_REJECTED
|
||||
Step: {{step_number}} ({{step_title}})
|
||||
Rejected at: deviation gate or coder approval gate.
|
||||
Deviation summary:
|
||||
{{deviation_summary}}
|
||||
Coder result (if it ran):
|
||||
{{coder_result}}
|
||||
|
||||
end_failure:
|
||||
id: end_failure
|
||||
type: end
|
||||
output: |
|
||||
STEP_FAILED
|
||||
Step: {{step_number}} ({{step_title}})
|
||||
Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
|
||||
Blocking reason (if resolution failed): {{blocking_reason}}
|
||||
|
||||
Coder result:
|
||||
{{coder_result}}
|
||||
|
||||
Last build output:
|
||||
{{build_output}}
|
||||
|
||||
Last tests output:
|
||||
{{tests_output}}
|
||||
+54
@@ -0,0 +1,54 @@
|
||||
#!/usr/bin/env bash
|
||||
set -uo pipefail
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
handoff_path=$(echo "$state" | jq -r '.handoff_path // ""')
|
||||
step_plan_path=$(echo "$state" | jq -r '.step_plan_path // ""')
|
||||
handoff_attempts=$(echo "$state" | jq -r '.handoff_attempts // 0')
|
||||
|
||||
problems=""
|
||||
|
||||
if [[ ! -f "$handoff_path" ]]; then
|
||||
problems="- handoff file does not exist at $handoff_path"$'\n'
|
||||
else
|
||||
content=$(cat "$handoff_path")
|
||||
grep -qE '^result:[[:space:]]*(complete|partial|blocked)' <<< "$content" \
|
||||
|| problems+="- frontmatter is missing 'result: complete|partial|blocked'"$'\n'
|
||||
for section in "Summary" "Completed" "Not completed" "Deviations" "Downstream plan updates" "Edge cases discovered" "Evidence" "Notes for next step"; do
|
||||
grep -qE "^##[[:space:]]+${section}" <<< "$content" \
|
||||
|| problems+="- missing required section: ## ${section}"$'\n'
|
||||
done
|
||||
fi
|
||||
|
||||
if [[ -z "$problems" ]]; then
|
||||
if [[ -f "$step_plan_path" ]]; then
|
||||
tmp=$(mktemp)
|
||||
awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: complete"; next} {print}' "$step_plan_path" > "$tmp" && mv "$tmp" "$step_plan_path"
|
||||
fi
|
||||
jq -nc '{"handoff_fix": "", "_next": "gate_user_review"}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
if (( handoff_attempts >= 1 )); then
|
||||
jq -nc \
|
||||
--arg br "Handoff failed validation twice. Problems:
|
||||
$problems" \
|
||||
'{"blocking_reason": $br, "_next": "end_failure"}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
jq -nc \
|
||||
--arg hf "The previous handoff attempt failed validation. Fix exactly these problems:
|
||||
$problems" \
|
||||
'{
|
||||
"handoff_attempts": 1,
|
||||
"handoff_fix": $hf,
|
||||
"_next": "write_handoff"
|
||||
}'
|
||||
+60
@@ -0,0 +1,60 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
fix_attempts=$(echo "$state" | jq -r '.fix_attempts // 0')
|
||||
max_fix_attempts=$(echo "$state" | jq -r '.max_fix_attempts // 2')
|
||||
lint_ok=$(echo "$state" | jq -r '.lint_ok | if . == null then "true" else (. | tostring) end')
|
||||
build_ok=$(echo "$state" | jq -r '.build_ok | if . == null then "true" else (. | tostring) end')
|
||||
tests_ok=$(echo "$state" | jq -r '.tests_ok | if . == null then "true" else (. | tostring) end')
|
||||
lint_output=$(echo "$state" | jq -r '.lint_output // ""')
|
||||
build_output=$(echo "$state" | jq -r '.build_output // ""')
|
||||
tests_output=$(echo "$state" | jq -r '.tests_output // ""')
|
||||
|
||||
if (( fix_attempts >= max_fix_attempts )); then
|
||||
jq -nc \
|
||||
--argjson n "$fix_attempts" \
|
||||
'{
|
||||
"fix_attempts": $n,
|
||||
"_next": "end_failure"
|
||||
}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
next_attempts=$((fix_attempts + 1))
|
||||
|
||||
if [[ "$lint_ok" != "true" ]]; then
|
||||
stage="lint"
|
||||
output="$lint_output"
|
||||
elif [[ "$build_ok" != "true" ]]; then
|
||||
stage="build"
|
||||
output="$build_output"
|
||||
elif [[ "$tests_ok" != "true" ]]; then
|
||||
stage="full test suite"
|
||||
output="$tests_output"
|
||||
else
|
||||
stage="verification"
|
||||
output="fix_loop_gate was reached but no failing stage was recorded. Re-run verification."
|
||||
fi
|
||||
|
||||
fix_instructions=$(printf '## Fix loop status (step-level attempt %d of %d)\n\nThe implementation passed the coder'"'"'s internal checks but failed step-level verification at the %s stage.\n\nOutput:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor. Regressions in untouched code caused by this change are in scope.' \
|
||||
"$next_attempts" "$max_fix_attempts" "$stage" "$output")
|
||||
|
||||
jq -nc \
|
||||
--argjson n "$next_attempts" \
|
||||
--arg 'fi' "$fix_instructions" \
|
||||
'{
|
||||
"fix_attempts": $n,
|
||||
"fix_instructions": $fi,
|
||||
"lint_ok": true,
|
||||
"build_ok": true,
|
||||
"tests_ok": true,
|
||||
"_next": "implement"
|
||||
}'
|
||||
+152
@@ -0,0 +1,152 @@
|
||||
#!/usr/bin/env bash
|
||||
set -uo pipefail
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
fail() {
|
||||
jq -nc --arg r "$1" '{"blocking_reason": $r, "_next": "end_failure"}'
|
||||
exit 0
|
||||
}
|
||||
|
||||
project_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
|
||||
project_dir=$(cd "$project_dir" 2>/dev/null && pwd) || fail "project_dir does not exist: $project_dir"
|
||||
|
||||
plans_dir="${LLM_AGENT_VAR_PLANS_DIR:-plans}"
|
||||
[[ "$plans_dir" != /* ]] && plans_dir="$project_dir/$plans_dir"
|
||||
steps_dir="$plans_dir/steps"
|
||||
handoffs_dir="$plans_dir/handoffs"
|
||||
notes_path="$plans_dir/NOTES.md"
|
||||
|
||||
[[ -d "$steps_dir" ]] || fail "No step plans directory at $steps_dir (expected <plans_dir>/steps/NN-<slug>.md)"
|
||||
|
||||
frontmatter() {
|
||||
awk '/^---[[:space:]]*$/{n++; next} n==1{print} n>=2{exit}' "$1"
|
||||
}
|
||||
|
||||
fm_value() {
|
||||
echo "$1" | grep -E "^$2:" | head -1 | sed -E "s/^$2:[[:space:]]*//" | sed -E 's/^["'"'"']|["'"'"']$//g'
|
||||
}
|
||||
|
||||
step="${LLM_AGENT_VAR_STEP:-next}"
|
||||
if [[ "$step" == "next" ]]; then
|
||||
prompt_step=$(echo "$state" | jq -r '.initial_prompt // ""' | grep -oiE 'step[[:space:]#:]*[0-9]+' | head -1 | grep -oE '[0-9]+' || true)
|
||||
[[ -n "$prompt_step" ]] && step="$prompt_step"
|
||||
fi
|
||||
|
||||
plan_file=""
|
||||
if [[ "$step" == "next" ]]; then
|
||||
first_pending=""
|
||||
while IFS= read -r f; do
|
||||
st=$(fm_value "$(frontmatter "$f")" "status")
|
||||
if [[ "$st" == "in-progress" ]]; then
|
||||
plan_file="$f"
|
||||
break
|
||||
fi
|
||||
[[ -z "$first_pending" && ( "$st" == "pending" || -z "$st" ) ]] && first_pending="$f"
|
||||
done < <(find "$steps_dir" -maxdepth 1 -name '*.md' | sort)
|
||||
[[ -z "$plan_file" ]] && plan_file="$first_pending"
|
||||
[[ -z "$plan_file" ]] && fail "No in-progress or pending step plans in $steps_dir"
|
||||
else
|
||||
[[ "$step" =~ ^[0-9]+$ ]] || fail "step must be a number or 'next'; got: $step"
|
||||
padded=$(printf '%02d' "$((10#$step))")
|
||||
plan_file=$(find "$steps_dir" -maxdepth 1 \( -name "${padded}-*.md" -o -name "${step}-*.md" \) | sort | head -1)
|
||||
[[ -n "$plan_file" ]] || fail "No step plan matching step $step in $steps_dir"
|
||||
fi
|
||||
|
||||
bn=$(basename "$plan_file" .md)
|
||||
num_part="${bn%%-*}"
|
||||
[[ "$num_part" =~ ^[0-9]+$ ]] || fail "Step plan filename must start with a number: $bn"
|
||||
step_number=$((10#$num_part))
|
||||
step_slug="${bn#*-}"
|
||||
|
||||
fm=$(frontmatter "$plan_file")
|
||||
step_title=$(fm_value "$fm" "title")
|
||||
[[ -z "$step_title" ]] && step_title="$step_slug"
|
||||
|
||||
deps=$(echo "$fm" | awk '/^depends_on:/{f=1; print; next} f && /^[[:space:]]*-/{print; next} f{exit}' | grep -oE '[0-9]+' || true)
|
||||
unsatisfied=""
|
||||
for dep in $deps; do
|
||||
dep_padded=$(printf '%02d' "$((10#$dep))")
|
||||
dep_handoff=$(find "$handoffs_dir" -maxdepth 1 \( -name "${dep_padded}-*.md" -o -name "${dep}-*.md" \) 2>/dev/null | sort | head -1)
|
||||
if [[ -z "$dep_handoff" ]]; then
|
||||
unsatisfied+="- step $dep: no handoff found (step not executed?)"$'\n'
|
||||
continue
|
||||
fi
|
||||
dep_result=$(fm_value "$(frontmatter "$dep_handoff")" "result")
|
||||
if [[ "$dep_result" != "complete" ]]; then
|
||||
unsatisfied+="- step $dep: handoff result is '$dep_result' (not complete): $dep_handoff"$'\n'
|
||||
fi
|
||||
done
|
||||
|
||||
prev_handoff_path="(none)"
|
||||
prev_handoff="(none - this is the first step)"
|
||||
prev_file=""
|
||||
prev_num=0
|
||||
while IFS= read -r h; do
|
||||
hn="${h##*/}"
|
||||
hn="${hn%%-*}"
|
||||
[[ "$hn" =~ ^[0-9]+$ ]] || continue
|
||||
n=$((10#$hn))
|
||||
if (( n < step_number && n >= prev_num )); then
|
||||
prev_num=$n
|
||||
prev_file="$h"
|
||||
fi
|
||||
done < <(find "$handoffs_dir" -maxdepth 1 -name '*.md' 2>/dev/null | sort)
|
||||
if [[ -n "$prev_file" ]]; then
|
||||
prev_handoff_path="$prev_file"
|
||||
prev_handoff=$(head -c 16000 "$prev_file")
|
||||
fi
|
||||
|
||||
notes="(none)"
|
||||
[[ -f "$notes_path" ]] && notes=$(head -c 8000 "$notes_path")
|
||||
|
||||
step_plan=$(head -c 24000 "$plan_file")
|
||||
handoff_path="$handoffs_dir/$(basename "$plan_file")"
|
||||
|
||||
tmp=$(mktemp)
|
||||
awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: in-progress"; next} {print}' "$plan_file" > "$tmp" && mv "$tmp" "$plan_file"
|
||||
|
||||
next_node="orient"
|
||||
blocking_reason=""
|
||||
if [[ -n "$unsatisfied" ]]; then
|
||||
next_node="gate_blocked"
|
||||
blocking_reason="Unsatisfied dependencies:"$'\n'"$unsatisfied"
|
||||
fi
|
||||
|
||||
jq -nc \
|
||||
--arg pd "$project_dir" \
|
||||
--arg pl "$plans_dir" \
|
||||
--argjson sn "$step_number" \
|
||||
--arg ss "$step_slug" \
|
||||
--arg st "$step_title" \
|
||||
--arg spp "$plan_file" \
|
||||
--arg sp "$step_plan" \
|
||||
--arg php "$prev_handoff_path" \
|
||||
--arg ph "$prev_handoff" \
|
||||
--arg np "$notes_path" \
|
||||
--arg no "$notes" \
|
||||
--arg hp "$handoff_path" \
|
||||
--arg br "$blocking_reason" \
|
||||
--arg nx "$next_node" \
|
||||
'{
|
||||
"project_dir": $pd,
|
||||
"plans_dir": $pl,
|
||||
"step_number": $sn,
|
||||
"step_slug": $ss,
|
||||
"step_title": $st,
|
||||
"step_plan_path": $spp,
|
||||
"step_plan": $sp,
|
||||
"prev_handoff_path": $php,
|
||||
"prev_handoff": $ph,
|
||||
"notes_path": $np,
|
||||
"notes": $no,
|
||||
"handoff_path": $hp,
|
||||
"blocking_reason": $br,
|
||||
"_next": $nx
|
||||
}'
|
||||
+27
@@ -0,0 +1,27 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
feedback=$(echo "$state" | jq -r '.user_feedback // ""')
|
||||
|
||||
if [[ -z "$feedback" ]]; then
|
||||
jq -nc '{"_next": "get_revision"}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
fix_instructions=$(printf '## Revision requested by the user at the step approval gate\n\nAddress these comments with minimal edits, then the step re-verifies and the handoff is rewritten:\n\n%s' \
|
||||
"$feedback")
|
||||
|
||||
jq -nc \
|
||||
--arg 'fi' "$fix_instructions" \
|
||||
'{
|
||||
"fix_instructions": $fi,
|
||||
"_next": "implement"
|
||||
}'
|
||||
+27
@@ -0,0 +1,27 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
coder_result=$(echo "$state" | jq -r '.coder_result // ""')
|
||||
|
||||
case "$coder_result" in
|
||||
*CODER_COMPLETE*)
|
||||
jq -nc '{"_next": "verify_format_lint"}'
|
||||
;;
|
||||
*CODER_REJECTED*)
|
||||
jq -nc '{"_next": "end_rejected"}'
|
||||
;;
|
||||
*CODER_FAILED*)
|
||||
jq -nc '{"blocking_reason": "coder fix-loop exhausted; see coder result", "_next": "end_failure"}'
|
||||
;;
|
||||
*)
|
||||
jq -nc '{"blocking_reason": "coder returned no recognizable sentinel (expected CODER_COMPLETE / CODER_REJECTED / CODER_FAILED)", "_next": "end_failure"}'
|
||||
;;
|
||||
esac
|
||||
+38
@@ -0,0 +1,38 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
review_report=$(echo "$state" | jq -r '.review_report // ""')
|
||||
review_attempts=$(echo "$state" | jq -r '.review_attempts // 0')
|
||||
max_review_attempts=$(echo "$state" | jq -r '.max_review_attempts // 1')
|
||||
|
||||
if ! grep -qF "🔴" <<< "$review_report"; then
|
||||
jq -nc '{"_next": "write_handoff"}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
if (( review_attempts >= max_review_attempts )); then
|
||||
jq -nc '{"_next": "write_handoff"}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
next_review=$((review_attempts + 1))
|
||||
fix_instructions=$(printf '## Independent review findings (attempt %d of %d)\n\nAn independent reviewer flagged CRITICAL (🔴) findings. Address ONLY the 🔴 findings with minimal edits. Do not refactor unrelated code.\n\n%s' \
|
||||
"$next_review" "$max_review_attempts" "$review_report")
|
||||
|
||||
jq -nc \
|
||||
--argjson n "$next_review" \
|
||||
--arg 'fi' "$fix_instructions" \
|
||||
'{
|
||||
"review_attempts": $n,
|
||||
"fix_instructions": $fi,
|
||||
"needs_independent_review": false,
|
||||
"_next": "implement"
|
||||
}'
|
||||
+23
@@ -0,0 +1,23 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
has_major=$(echo "$state" | jq -r '.has_major_deviation // false')
|
||||
|
||||
if [[ "${STEP_AUTOAPPROVE:-0}" == "1" ]]; then
|
||||
jq -nc '{"_next": "implement"}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
if [[ "$has_major" == "true" ]]; then
|
||||
jq -nc '{"_next": "gate_deviation"}'
|
||||
else
|
||||
jq -nc '{"_next": "implement"}'
|
||||
fi
|
||||
+23
@@ -0,0 +1,23 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
needs_review=$(echo "$state" | jq -r '.needs_independent_review // false')
|
||||
|
||||
if [[ "${STEP_SKIP_REVIEW:-0}" == "1" ]]; then
|
||||
jq -nc '{"_next": "write_handoff"}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
if [[ "$needs_review" == "true" ]]; then
|
||||
jq -nc '{"_next": "independent_review"}'
|
||||
else
|
||||
jq -nc '{"_next": "write_handoff"}'
|
||||
fi
|
||||
+57
@@ -0,0 +1,57 @@
|
||||
#!/usr/bin/env bash
|
||||
set -uo pipefail
|
||||
|
||||
# shellcheck disable=SC1091
|
||||
source "$(dirname "$0")/../../.shared/utils.sh"
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
|
||||
|
||||
if [[ -n "${BUILD_CMD:-}" ]]; then
|
||||
cmd="$BUILD_CMD"
|
||||
else
|
||||
project_info=$(detect_project "$project_dir")
|
||||
cmd=$(echo "$project_info" | jq -r '.check // .build // ""')
|
||||
fi
|
||||
|
||||
if [[ -z "$cmd" || "$cmd" == "null" ]]; then
|
||||
jq -nc '{
|
||||
"build_ok": true,
|
||||
"build_output": "(no build/check command available for this project type)",
|
||||
"_next": "verify_tests"
|
||||
}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
exit_code=0
|
||||
output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
|
||||
|
||||
if (( exit_code == 0 )); then
|
||||
jq -nc \
|
||||
--arg out "Ran: $cmd
|
||||
|
||||
$output" \
|
||||
'{
|
||||
"build_ok": true,
|
||||
"build_output": $out,
|
||||
"_next": "verify_tests"
|
||||
}'
|
||||
else
|
||||
jq -nc \
|
||||
--arg out "Ran: $cmd
|
||||
Exit code: $exit_code
|
||||
|
||||
$output" \
|
||||
'{
|
||||
"build_ok": false,
|
||||
"build_output": $out,
|
||||
"_next": "fix_loop_gate"
|
||||
}'
|
||||
fi
|
||||
+79
@@ -0,0 +1,79 @@
|
||||
#!/usr/bin/env bash
|
||||
set -uo pipefail
|
||||
|
||||
# shellcheck disable=SC1091
|
||||
source "$(dirname "$0")/../../.shared/utils.sh"
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
|
||||
project_type=$(detect_project "$project_dir" | jq -r '.type // "unknown"')
|
||||
|
||||
format_cmd="${FORMAT_CMD:-}"
|
||||
if [[ -z "$format_cmd" ]]; then
|
||||
case "$project_type" in
|
||||
rust) format_cmd="cargo fmt" ;;
|
||||
go) format_cmd="gofmt -w ." ;;
|
||||
python) command -v ruff &>/dev/null && format_cmd="ruff format ." ;;
|
||||
esac
|
||||
fi
|
||||
|
||||
if [[ -z "$format_cmd" ]]; then
|
||||
format_output="(no format command configured for project type '$project_type'; skipped. Set FORMAT_CMD to enable.)"
|
||||
else
|
||||
fmt_rc=0
|
||||
fmt_out=$(cd "$project_dir" && eval "$format_cmd" 2>&1) || fmt_rc=$?
|
||||
format_output="Ran: $format_cmd
|
||||
Exit code: $fmt_rc
|
||||
|
||||
$fmt_out"
|
||||
fi
|
||||
|
||||
lint_cmd="${LINT_CMD:-}"
|
||||
if [[ -z "$lint_cmd" ]]; then
|
||||
jq -nc \
|
||||
--arg fo "$format_output" \
|
||||
'{
|
||||
"format_output": $fo,
|
||||
"lint_ok": true,
|
||||
"lint_output": "(no LINT_CMD configured; linting is covered by the build/check command)",
|
||||
"_next": "verify_build"
|
||||
}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
lint_rc=0
|
||||
lint_out=$(cd "$project_dir" && eval "$lint_cmd" 2>&1) || lint_rc=$?
|
||||
|
||||
if (( lint_rc == 0 )); then
|
||||
jq -nc \
|
||||
--arg fo "$format_output" \
|
||||
--arg lo "Ran: $lint_cmd
|
||||
|
||||
$lint_out" \
|
||||
'{
|
||||
"format_output": $fo,
|
||||
"lint_ok": true,
|
||||
"lint_output": $lo,
|
||||
"_next": "verify_build"
|
||||
}'
|
||||
else
|
||||
jq -nc \
|
||||
--arg fo "$format_output" \
|
||||
--arg lo "Ran: $lint_cmd
|
||||
Exit code: $lint_rc
|
||||
|
||||
$lint_out" \
|
||||
'{
|
||||
"format_output": $fo,
|
||||
"lint_ok": false,
|
||||
"lint_output": $lo,
|
||||
"_next": "fix_loop_gate"
|
||||
}'
|
||||
fi
|
||||
+57
@@ -0,0 +1,57 @@
|
||||
#!/usr/bin/env bash
|
||||
set -uo pipefail
|
||||
|
||||
# shellcheck disable=SC1091
|
||||
source "$(dirname "$0")/../../.shared/utils.sh"
|
||||
|
||||
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
|
||||
state=$(cat "$GRAPH_STATE_FILE")
|
||||
elif [[ -n "${GRAPH_STATE:-}" ]]; then
|
||||
state="$GRAPH_STATE"
|
||||
else
|
||||
state='{}'
|
||||
fi
|
||||
|
||||
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
|
||||
|
||||
if [[ -n "${TEST_CMD:-}" ]]; then
|
||||
cmd="$TEST_CMD"
|
||||
else
|
||||
project_info=$(detect_project "$project_dir")
|
||||
cmd=$(echo "$project_info" | jq -r '.test // ""')
|
||||
fi
|
||||
|
||||
if [[ -z "$cmd" || "$cmd" == "null" ]]; then
|
||||
jq -nc '{
|
||||
"tests_ok": true,
|
||||
"tests_output": "(no test command available for this project type)",
|
||||
"_next": "edge_case_sweep"
|
||||
}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
exit_code=0
|
||||
output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
|
||||
|
||||
if (( exit_code == 0 )); then
|
||||
jq -nc \
|
||||
--arg out "Ran: $cmd
|
||||
|
||||
$output" \
|
||||
'{
|
||||
"tests_ok": true,
|
||||
"tests_output": $out,
|
||||
"_next": "edge_case_sweep"
|
||||
}'
|
||||
else
|
||||
jq -nc \
|
||||
--arg out "Ran: $cmd
|
||||
Exit code: $exit_code
|
||||
|
||||
$output" \
|
||||
'{
|
||||
"tests_ok": false,
|
||||
"tests_output": $out,
|
||||
"_next": "fix_loop_gate"
|
||||
}'
|
||||
fi
|
||||
@@ -5116,6 +5116,45 @@ mod tests {
|
||||
assert!(paths::skill_file("frontend-ui-ux").exists());
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[serial]
|
||||
fn bundled_graph_agents_parse_and_validate() {
|
||||
use crate::graph::GraphParser;
|
||||
use crate::graph::validator::GraphValidator;
|
||||
|
||||
let _guard = TestConfigDirGuard::new();
|
||||
|
||||
Agent::install_builtin_agents(false).unwrap();
|
||||
Skill::install_builtin_skills(false).unwrap();
|
||||
|
||||
let mut checked = Vec::new();
|
||||
for entry in std::fs::read_dir(paths::agents_data_dir()).unwrap() {
|
||||
let dir = entry.unwrap().path();
|
||||
let graph_path = dir.join("graph.yaml");
|
||||
if !graph_path.exists() {
|
||||
continue;
|
||||
}
|
||||
let name = dir.file_name().unwrap().to_string_lossy().to_string();
|
||||
let graph = GraphParser::new(&dir)
|
||||
.load_from_file(&graph_path)
|
||||
.unwrap_or_else(|e| panic!("graph.yaml for '{name}' failed to parse: {e}"));
|
||||
let result = GraphValidator::new(&dir).validate(&graph);
|
||||
assert!(
|
||||
result.errors.is_empty(),
|
||||
"graph.yaml for '{name}' failed validation: {:#?}",
|
||||
result.errors
|
||||
);
|
||||
checked.push(name);
|
||||
}
|
||||
checked.sort();
|
||||
for expected in ["coder", "librarian", "step-runner"] {
|
||||
assert!(
|
||||
checked.iter().any(|n| n == expected),
|
||||
"expected bundled graph agent '{expected}' to be checked; found {checked:?}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[serial]
|
||||
fn install_functions_force_preserves_user_mcp_json() {
|
||||
|
||||
Reference in New Issue
Block a user