From 8b061b200fc8d0da8f43480d747a7dcc4dfa59e8 Mon Sep 17 00:00:00 2001
From: Alex Clarke <alex.j.tusa@gmail.com>
Date: Fri, 22 May 2026 12:57:12 -0600
Subject: [PATCH] feat: Rework the coder agent as a graph-based agent

---
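Reviewer note (not part of the commit): the fix-loop budget is easiest to
check by simulation. The sketch below mirrors only the budget comparison from
scripts/fix_loop_gate.sh, without jq or graph state. `gate_decision` and
`count_implement_runs` are hypothetical helper names used for illustration
and do not exist in the patch. It assumes every verification run fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same comparison as fix_loop_gate.sh, minus the JSON I/O.
gate_decision() {
  local fix_attempts=$1 max_fix_attempts=$2
  if (( fix_attempts >= max_fix_attempts )); then
    echo "end_failure"
  else
    echo "implement"
  fi
}

# Hypothetical helper: count implement runs when verification never passes.
count_implement_runs() {
  local max=$1 attempts=0 runs=1   # the first implement run precedes any gate visit
  while [[ "$(gate_decision "$attempts" "$max")" == "implement" ]]; do
    attempts=$((attempts + 1))     # the gate increments fix_attempts before looping back
    runs=$((runs + 1))
  done
  echo "$runs"
}

count_implement_runs 3
```

With the default `max_fix_attempts: 3` this prints 4: the initial `implement`
run plus three fix attempts before the gate routes to `end_failure`.
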
 assets/agents/coder/README.md                 |  84 ++++--
 assets/agents/coder/config.yaml               | 116 --------
 assets/agents/coder/graph.yaml                | 278 ++++++++++++++++++
 assets/agents/coder/scripts/fix_loop_gate.sh  |  49 +++
 assets/agents/coder/scripts/resolve_paths.sh  |  12 +
 .../agents/coder/scripts/route_complexity.sh  |  23 ++
 assets/agents/coder/scripts/verify_build.sh   |  55 ++++
 assets/agents/coder/scripts/verify_tests.sh   |  55 ++++
 assets/agents/sisyphus/README.md              |   9 +-
 assets/agents/sisyphus/config.yaml            |  41 ++-
 10 files changed, 568 insertions(+), 154 deletions(-)
 delete mode 100644 assets/agents/coder/config.yaml
 create mode 100644 assets/agents/coder/graph.yaml
 create mode 100644 assets/agents/coder/scripts/fix_loop_gate.sh
 create mode 100644 assets/agents/coder/scripts/resolve_paths.sh
 create mode 100644 assets/agents/coder/scripts/route_complexity.sh
 create mode 100644 assets/agents/coder/scripts/verify_build.sh
 create mode 100644 assets/agents/coder/scripts/verify_tests.sh

diff --git a/assets/agents/coder/README.md b/assets/agents/coder/README.md
index dd23a02..51c4477 100644
--- a/assets/agents/coder/README.md
+++ b/assets/agents/coder/README.md
@@ -1,40 +1,82 @@
 # Coder
 
-An AI agent that assists you with your coding tasks.
+A graph-based implementation agent. Plans, implements, and runs build +
+tests in a bounded fix-loop until verified. Designed to be delegated to by
+the **[Sisyphus](../sisyphus/README.md)** agent.
 
-This agent is designed to be delegated to by the **[Sisyphus](../sisyphus/README.md)** agent to implement code specifications. Sisyphus
-acts as the coordinator/architect, while Coder handles the implementation details.
+Coder is a [graph agent](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents): its workflow is
+defined declaratively in `graph.yaml`, with verification and the
+implement-fix loop enforced as graph edges rather than prose.
 
-## Features
+## Workflow
 
-- 🏗️ Intelligent project structure creation and management
-- 🖼️ Convert screenshots into clean, functional code
-- 📁 Comprehensive file system operations (create folders, files, read/write files)
-- 🧐 Advanced code analysis and improvement suggestions
-- 📊 Precise diff-based file editing for controlled code modifications
+```
+analyze_request (llm + output_schema)   plan + complexity extraction
+        ↓
+route_complexity (script)               opt-out approval gate (complexity ≥ 7)
+        ↓
+gate_approval (approval, optional)
+        ↓
+implement (llm + fs tools)              actual file edits
+        ↓
+verify_build (script)
+        ↓
+verify_tests (script)
+        ↓
+fix_loop_gate (script)                  back-edge to implement (bounded)
+        ↓
+end_success / end_rejected / end_failure
+```
 
-It can also be used as a standalone tool for direct coding assistance.
+End nodes emit one of three sentinel outcomes for the caller:
 
-## Pro-Tip: Use an IDE MCP Server for Improved Performance
-Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
-an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
-server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
-them), and modify the agent definition to look like this:
+- `CODER_COMPLETE` — build and tests passed.
+- `CODER_REJECTED` — user rejected the plan at the approval gate.
+- `CODER_FAILED` — fix-loop exhausted with build/tests still failing, or a node fell back to `end_failure`.
+
+## Tuning
+
+The agent's `project_dir` is exposed via the standard `variables:` block,
+so it accepts the runtime override flag:
+
+```sh
+# Invoke from inside the project (project_dir defaults to ".")
+cd /path/to/your/project
+loki -a coder "Add a foo() function..."
+
+# Or invoke from anywhere with an explicit override
+loki -a coder --agent-variable project_dir /path/to/your/project "Add..."
+```
+
+`graph.yaml` `initial_state` exposes:
+
+- `max_fix_attempts` (default `3`) — fix-loop budget before `end_failure`.
+
+Environment overrides honored by the script nodes:
+
+- `BUILD_CMD` — skip project-type detection for the build/check command.
+- `TEST_CMD` — skip detection for tests.
+- `CODER_AUTOAPPROVE=1` — bypass the approval gate (for non-interactive runs
+  where complexity might trip the gate).
+
+## Pro-Tip: IDE MCP Server
+
+Modern IDEs (JetBrains, VS Code, Cursor, Zed, etc.) expose MCP servers
+that let LLMs use IDE tools directly. To wire one in, edit `graph.yaml`:
 
 ```yaml
-# ...
-
 mcp_servers:
-  - jetbrains # The name of your configured IDE MCP server
+  - your-ide-mcp-server
 
 global_tools:
-  # Keep useful read-only tools for reading files in other non-project directories
+  # Keep read-only fs tools for files outside the IDE project
   - fs_read.sh
   - fs_grep.sh
   - fs_glob.sh
 #  - fs_write.sh
 #  - fs_patch.sh
   - execute_command.sh
+```
 
-# ...
-```
\ No newline at end of file
+Then add the MCP server's write/patch tools to the `implement` node's
+`tools:` whitelist.
diff --git a/assets/agents/coder/config.yaml b/assets/agents/coder/config.yaml
deleted file mode 100644
index 3f06dee..0000000
--- a/assets/agents/coder/config.yaml
+++ /dev/null
@@ -1,116 +0,0 @@
-name: coder
-description: Implementation agent - writes code, follows patterns, verifies with builds
-version: 1.0.0
-temperature: 0.1
-
-auto_continue: true
-max_auto_continues: 15
-inject_todo_instructions: true
-
-variables:
-  - name: project_dir
-    description: Project directory to work in
-    default: '.'
-  - name: auto_confirm
-    description: Auto-confirm command execution
-    default: '1'
-
-global_tools:
-  - fs_read.sh
-  - fs_grep.sh
-  - fs_glob.sh
-  - fs_write.sh
-  - fs_patch.sh
-  - execute_command.sh
-
-instructions: |
-  You are a senior engineer. You write code that works on the first try.
-
-  ## Your Mission
-
-  Given an implementation task:
-  1. Check for orchestrator context first (see below)
-  2. Fill gaps only. Read files NOT already covered in context
-  3. Write the code (using tools, NOT chat output)
-  4. Verify it compiles/builds
-  5. Signal completion with a summary
-
-  ## Using Orchestrator Context (IMPORTANT)
-
-  When spawned by sisyphus, your prompt will often contain a `<context>` block
-  with prior findings: file paths, code patterns, and conventions discovered by
-  explore agents.
-
-  **If context is provided:**
-  1. Use it as your primary reference. Don't re-read files already summarized
-  2. Follow the code patterns shown. Snippets in context ARE the style guide
-  3. Read the referenced files ONLY IF you need more detail (e.g. full function
-     signature, import list, or adjacent code not included in the snippet)
-  4. If context includes a "Conventions" section, follow it exactly
-
-  **If context is NOT provided or is too vague to act on:**
-  Fall back to self-exploration: grep for similar files, read 1-2 examples,
-  match their style.
-
-  **Never ignore provided context.** It represents work already done upstream.
-
-  ## Todo System
-
-  For multi-file changes:
-  1. `todo__init` with the implementation goal
-  2. `todo__add` for each file to create/modify
-  3. Implement each, calling `todo__done` immediately after
-
-  ## Writing Code
-  1. **Use fs_patch for surgical edits** - `fs_patch --path "src/main.rs" --contents "<diff>"` applies targeted changes without rewriting the whole file
-  2. **use fs_write for full file writes** - `fs_write --path "src/main.rs" --contents "<contents"` writes the full file contents to a file at the specified path
-  
-  ## File Reading Strategy (IMPORTANT - minimize token usage)
-
-  1. **Use grep to find relevant code** - `fs_grep --pattern "fn handle_request" --include "*.rs"` finds where things are
-  2. **Read only what you need** - `fs_read --path "src/main.rs" --offset 50 --limit 30` reads lines 50-79
-  3. **Never cat entire large files** - If 500+ lines, read the relevant section after grepping for it
-  4. **Use glob to find files** - `fs_glob --pattern "*.rs" --path src/` discovers files by name
-
-  ## Pattern Matching
-
-  Before writing ANY file:
-  1. Find a similar existing file (use `fs_grep` to locate, then `fs_read` to examine)
-  2. Match its style: imports, naming, structure
-  3. Follow the same patterns exactly
-
-  ## Verification
-
-  After writing files:
-  1. Run `verify_build` to check compilation
-  2. If it fails, fix the error (minimal change)
-  3. Don't move on until build passes
-
-  ## Completion Signal
-
-  When done, end your response with a summary so the parent agent knows what happened:
-
-  ```
-  CODER_COMPLETE: [summary of what was implemented, which files were created/modified, and build status]
-  ```
-
-  Or if something went wrong:
-  ```
-  CODER_FAILED: [what went wrong]
-  ```
-
-  ## Rules
-
-  1. **Write code via tools** - Never output code to chat
-  2. **Follow patterns** - Read existing files first
-  3. **Verify builds** - Don't finish without checking
-  4. **Minimal fixes** - If build fails, fix precisely
-  5. **No refactoring** - Only implement what's asked
-
-  ## Context
-  - Project: {{project_dir}}
-  - CWD: {{__cwd__}}
-  - Shell: {{__shell__}}
-  
-  ## Available tools:
-  {{__tools__}}
\ No newline at end of file
diff --git a/assets/agents/coder/graph.yaml b/assets/agents/coder/graph.yaml
new file mode 100644
index 0000000..1d5e9df
--- /dev/null
+++ b/assets/agents/coder/graph.yaml
@@ -0,0 +1,278 @@
+name: coder
+description: |
+  Implementation agent. Plans, implements, and runs build + tests in a
+  bounded fix-loop until verified. Designed to be delegated to by sisyphus.
+version: "1.0"
+
+temperature: 0.1
+
+global_tools:
+  - fs_cat.sh
+  - fs_ls.sh
+  - fs_write.sh
+  - fs_patch.sh
+  - execute_command.sh
+
+variables:
+  - name: project_dir
+    description: |
+      Path to the project directory; resolve_paths makes it absolute.
+      Defaults to ".", the directory you invoked `loki` from. Override with
+      `loki -a coder --agent-variable project_dir /abs/path "..."`.
+    default: "."
+
+settings:
+  max_loop_iterations: 20
+  log_state_snapshots: true
+  validate_before_run: true
+  timeout: 1800
+
+initial_state:
+  project_dir: ""
+  fix_attempts: 0
+  max_fix_attempts: 3
+  fix_instructions: ""
+  build_output: ""
+  tests_output: ""
+  last_node_output: ""
+  plan_summary: ""
+  files_to_modify: []
+  files_to_create: []
+  risks: []
+  complexity_score: 0
+
+start: resolve_paths
+
+nodes:
+  resolve_paths:
+    id: resolve_paths
+    type: script
+    description: Resolve project_dir to an absolute path from the agent variable
+    script: scripts/resolve_paths.sh
+    timeout: 5
+    fallback: end_failure
+
+  analyze_request:
+    id: analyze_request
+    type: llm
+    description: Extract a structured plan and complexity score from the orchestrator's prompt
+    instructions: |
+      You are a senior engineer's planning assistant. Read the orchestrator's
+      request and emit a structured plan. You only plan. You never edit files.
+
+      Score complexity from 1 to 10:
+        1-3: trivial - single file, <=20 lines changed, obvious approach
+        4-6: moderate - 2-5 files, clear approach, some pattern matching
+        7-10: complex - multi-component, ambiguous tradeoffs, refactoring,
+              or wide blast radius
+
+      Be specific in `files_to_modify` and `files_to_create`. All paths
+      MUST be absolute. The project root is {{project_dir}}. Prefer paths
+      like "{{project_dir}}/src/foo.rs" over "src/foo.rs". The implementer
+      uses these paths directly with fs_write and fs_patch tools, which
+      resolve relative paths against the loki invocation directory (NOT
+      the project dir). Use an empty array when a category has no files.
+
+      `risks` is a list of short strings covering anything that could derail
+      the implementation: unknown dependencies, brittle tests, blast radius,
+      etc. An empty list is fine.
+
+      Project directory: {{project_dir}}
+    prompt: "{{initial_prompt}}"
+    tools: []
+    output_schema:
+      type: object
+      properties:
+        plan_summary:
+          type: string
+          description: 1-3 sentences summarizing what will be done
+        files_to_modify:
+          type: array
+          items: {type: string}
+        files_to_create:
+          type: array
+          items: {type: string}
+        complexity_score:
+          type: integer
+          minimum: 1
+          maximum: 10
+        risks:
+          type: array
+          items: {type: string}
+      required: [plan_summary, files_to_modify, files_to_create, complexity_score, risks]
+    state_updates:
+      last_node_output: "{{output}}"
+    fallback: end_failure
+    next: route_complexity
+
+  route_complexity:
+    id: route_complexity
+    type: script
+    description: Route to approval gate for complex plans; skip otherwise
+    script: scripts/route_complexity.sh
+    timeout: 5
+    fallback: implement
+
+  gate_approval:
+    id: gate_approval
+    type: approval
+    description: Optional human checkpoint for high-complexity plans
+    question: |
+      ## Plan
+      {{plan_summary}}
+
+      ## Files to modify
+      {{files_to_modify}}
+
+      ## Files to create
+      {{files_to_create}}
+
+      ## Risks
+      {{risks}}
+
+      Complexity: {{complexity_score}}/10
+
+      Approve this plan?
+    options:
+      - "yes"
+      - "no"
+    routes:
+      "yes": implement
+      "no": end_rejected
+    on_other: end_rejected
+
+  implement:
+    id: implement
+    type: llm
+    description: Write code via fs tools. Bounded tool-call loop.
+    instructions: |
+      You are a senior engineer. Implement the plan by writing code via
+      tools. Follow existing patterns in the codebase.
+
+      ## Writing code
+
+      1. Use `fs_patch` for surgical edits to existing files.
+      2. Use `fs_write` for new files or full rewrites.
+      3. NEVER output code to chat. Always use tools.
+      4. ALWAYS pass ABSOLUTE paths to fs_write and fs_patch. Relative
+         paths resolve against the loki invocation directory (not the
+         project dir), which is rarely what you want. The project root
+         is {{project_dir}}.
+
+      ## File reading
+
+      1. Use `execute_command` to grep/find:
+         `execute_command --command "grep -rn 'fn handle_request' --include='*.rs' '{{project_dir}}'"`
+         `execute_command --command "find '{{project_dir}}' -name '*.rs' -not -path '*/target/*'"`
+      2. Read only what you need:
+         `fs_cat --path "{{project_dir}}/src/main.rs" --offset 50 --limit 30`
+      3. Never read entire large files. Use offset/limit.
+      4. Use `fs_ls` to list directory contents.
+
+      ## Pattern matching
+
+      Before writing ANY file:
+      1. Find a similar existing file (grep, then read).
+      2. Match its style: imports, naming, structure, error handling.
+      3. Follow the same patterns exactly. Do not invent new ones.
+
+      ## Fix loop
+
+      If the "Fix loop status" section in your user prompt is non-empty,
+      the previous attempt failed verification. Read the error, identify
+      the minimal fix, apply it. Do not refactor while fixing.
+
+      ## Rules
+
+      1. Match existing patterns - read examples first.
+      2. Minimal changes - implement only what's asked.
+      3. Never suppress errors (`as any`, `@ts-ignore`, `#[allow(...)]`
+         on unfamiliar lints, etc.).
+      4. No dead code, no commented-out blocks, no premature abstractions.
+      5. End your turn when editing is done. The graph runs verification next.
+
+      Project directory: {{project_dir}}
+    prompt: |
+      ## Plan summary
+      {{plan_summary}}
+
+      ## Files involved
+      - Modify: {{files_to_modify}}
+      - Create: {{files_to_create}}
+
+      ## Original request from the orchestrator
+      {{initial_prompt}}
+
+      ## Fix loop status
+      {{fix_instructions}}
+    tools:
+      - fs_cat
+      - fs_ls
+      - fs_write
+      - fs_patch
+      - execute_command
+    max_iterations: 30
+    state_updates:
+      last_node_output: "{{output}}"
+    fallback: end_failure
+    next: verify_build
+
+  verify_build:
+    id: verify_build
+    type: script
+    description: Run the project's check/build command. Routes to verify_tests on success, fix_loop_gate on failure.
+    script: scripts/verify_build.sh
+    timeout: 300
+    fallback: fix_loop_gate
+
+  verify_tests:
+    id: verify_tests
+    type: script
+    description: Run the project's test command. Routes to end_success on pass, fix_loop_gate on failure.
+    script: scripts/verify_tests.sh
+    timeout: 600
+    fallback: fix_loop_gate
+
+  fix_loop_gate:
+    id: fix_loop_gate
+    type: script
+    description: Budget gate. Loops back to implement with fix_instructions populated, or terminates as end_failure.
+    script: scripts/fix_loop_gate.sh
+    timeout: 5
+    fallback: end_failure
+
+  end_success:
+    id: end_success
+    type: end
+    output: |
+      CODER_COMPLETE
+      Plan: {{plan_summary}}
+      Files modified: {{files_to_modify}}
+      Files created: {{files_to_create}}
+      Build: passed
+      Tests: passed
+
+  end_rejected:
+    id: end_rejected
+    type: end
+    output: |
+      CODER_REJECTED
+      Plan was rejected at the approval gate.
+      Plan: {{plan_summary}}
+
+  end_failure:
+    id: end_failure
+    type: end
+    output: |
+      CODER_FAILED
+      Plan: {{plan_summary}}
+      Attempts: {{fix_attempts}}/{{max_fix_attempts}}
+
+      Last node output:
+      {{last_node_output}}
+
+      Last build output:
+      {{build_output}}
+
+      Last tests output:
+      {{tests_output}}
diff --git a/assets/agents/coder/scripts/fix_loop_gate.sh b/assets/agents/coder/scripts/fix_loop_gate.sh
new file mode 100644
index 0000000..29a0bf0
--- /dev/null
+++ b/assets/agents/coder/scripts/fix_loop_gate.sh
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
+  state=$(cat "$GRAPH_STATE_FILE")
+elif [[ -n "${GRAPH_STATE:-}" ]]; then
+  state="$GRAPH_STATE"
+else
+  state='{}'
+fi
+
+fix_attempts=$(echo "$state" | jq -r '.fix_attempts // 0')
+max_fix_attempts=$(echo "$state" | jq -r '.max_fix_attempts // 3')
+build_ok=$(echo "$state" | jq -r '.build_ok | if . == null then "true" else (. | tostring) end')
+tests_ok=$(echo "$state" | jq -r '.tests_ok | if . == null then "true" else (. | tostring) end')
+build_output=$(echo "$state" | jq -r '.build_output // ""')
+tests_output=$(echo "$state" | jq -r '.tests_output // ""')
+
+if (( fix_attempts >= max_fix_attempts )); then
+  jq -nc \
+    --argjson n "$fix_attempts" \
+    '{
+      "fix_attempts": $n,
+      "_next": "end_failure"
+    }'
+  exit 0
+fi
+
+next_attempts=$((fix_attempts + 1))
+
+if [[ "$build_ok" != "true" ]]; then
+  fix_instructions=$(printf '## Fix loop status (attempt %d of %d)\n\nThe previous attempt failed the build.\n\nBuild output:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor.' \
+    "$next_attempts" "$max_fix_attempts" "$build_output")
+elif [[ "$tests_ok" != "true" ]]; then
+  fix_instructions=$(printf '## Fix loop status (attempt %d of %d)\n\nBuild passed but tests failed.\n\nTest output:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor.' \
+    "$next_attempts" "$max_fix_attempts" "$tests_output")
+else
+  fix_instructions=$(printf '## Fix loop status (attempt %d of %d)\n\nfix_loop_gate was reached but no failure was detected in state. Re-run the verification step.' \
+    "$next_attempts" "$max_fix_attempts")
+fi
+
+jq -nc \
+  --argjson n "$next_attempts" \
+  --arg fi "$fix_instructions" \
+  '{
+    "fix_attempts": $n,
+    "fix_instructions": $fi,
+    "_next": "implement"
+  }'
diff --git a/assets/agents/coder/scripts/resolve_paths.sh b/assets/agents/coder/scripts/resolve_paths.sh
new file mode 100644
index 0000000..2c9433c
--- /dev/null
+++ b/assets/agents/coder/scripts/resolve_paths.sh
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+project_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
+resolved=$(cd "$project_dir" 2>/dev/null && pwd) || resolved="$project_dir"
+
+jq -nc \
+  --arg pd "$resolved" \
+  '{
+    "project_dir": $pd,
+    "_next": "analyze_request"
+  }'
diff --git a/assets/agents/coder/scripts/route_complexity.sh b/assets/agents/coder/scripts/route_complexity.sh
new file mode 100644
index 0000000..d45ca91
--- /dev/null
+++ b/assets/agents/coder/scripts/route_complexity.sh
@@ -0,0 +1,23 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
+  state=$(cat "$GRAPH_STATE_FILE")
+elif [[ -n "${GRAPH_STATE:-}" ]]; then
+  state="$GRAPH_STATE"
+else
+  state='{}'
+fi
+
+complexity=$(echo "$state" | jq -r '.complexity_score // 0')
+
+if [[ "${CODER_AUTOAPPROVE:-0}" == "1" ]]; then
+  jq -nc '{"_next": "implement"}'
+  exit 0
+fi
+
+if (( complexity >= 7 )); then
+  jq -nc '{"_next": "gate_approval"}'
+else
+  jq -nc '{"_next": "implement"}'
+fi
diff --git a/assets/agents/coder/scripts/verify_build.sh b/assets/agents/coder/scripts/verify_build.sh
new file mode 100644
index 0000000..f9b9d65
--- /dev/null
+++ b/assets/agents/coder/scripts/verify_build.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+set -uo pipefail
+
+# shellcheck disable=SC1091
+source "$(dirname "$0")/../../.shared/utils.sh"
+
+if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
+  state=$(cat "$GRAPH_STATE_FILE")
+elif [[ -n "${GRAPH_STATE:-}" ]]; then
+  state="$GRAPH_STATE"
+else
+  state='{}'
+fi
+
+project_dir=$(echo "$state" | jq -r '.project_dir // "."')
+
+if [[ -n "${BUILD_CMD:-}" ]]; then
+  cmd="$BUILD_CMD"
+else
+  project_info=$(detect_project "$project_dir")
+  cmd=$(echo "$project_info" | jq -r '.check // .build // ""')
+fi
+
+if [[ -z "$cmd" || "$cmd" == "null" ]]; then
+  jq -nc '{
+    "build_ok": true,
+    "build_output": "(no build/check command available for this project type)",
+    "_next": "verify_tests"
+  }'
+  exit 0
+fi
+
+exit_code=0
+output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
+
+if (( exit_code == 0 )); then
+  jq -nc \
+    --arg out "$output" \
+    --arg cmd "$cmd" \
+    '{
+      "build_ok": true,
+      "build_output": ("Ran: " + $cmd + "\n\n" + $out),
+      "_next": "verify_tests"
+    }'
+else
+  jq -nc \
+    --arg out "$output" \
+    --arg cmd "$cmd" \
+    --argjson rc "$exit_code" \
+    '{
+      "build_ok": false,
+      "build_output": ("Ran: " + $cmd + "\nExit code: " + ($rc | tostring) + "\n\n" + $out),
+      "_next": "fix_loop_gate"
+    }'
+fi
diff --git a/assets/agents/coder/scripts/verify_tests.sh b/assets/agents/coder/scripts/verify_tests.sh
new file mode 100644
index 0000000..102364e
--- /dev/null
+++ b/assets/agents/coder/scripts/verify_tests.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+set -uo pipefail
+
+# shellcheck disable=SC1091
+source "$(dirname "$0")/../../.shared/utils.sh"
+
+if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
+  state=$(cat "$GRAPH_STATE_FILE")
+elif [[ -n "${GRAPH_STATE:-}" ]]; then
+  state="$GRAPH_STATE"
+else
+  state='{}'
+fi
+
+project_dir=$(echo "$state" | jq -r '.project_dir // "."')
+
+if [[ -n "${TEST_CMD:-}" ]]; then
+  cmd="$TEST_CMD"
+else
+  project_info=$(detect_project "$project_dir")
+  cmd=$(echo "$project_info" | jq -r '.test // ""')
+fi
+
+if [[ -z "$cmd" || "$cmd" == "null" ]]; then
+  jq -nc '{
+    "tests_ok": true,
+    "tests_output": "(no test command available for this project type)",
+    "_next": "end_success"
+  }'
+  exit 0
+fi
+
+exit_code=0
+output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
+
+if (( exit_code == 0 )); then
+  jq -nc \
+    --arg out "$output" \
+    --arg cmd "$cmd" \
+    '{
+      "tests_ok": true,
+      "tests_output": ("Ran: " + $cmd + "\n\n" + $out),
+      "_next": "end_success"
+    }'
+else
+  jq -nc \
+    --arg out "$output" \
+    --arg cmd "$cmd" \
+    --argjson rc "$exit_code" \
+    '{
+      "tests_ok": false,
+      "tests_output": ("Ran: " + $cmd + "\nExit code: " + ($rc | tostring) + "\n\n" + $out),
+      "_next": "fix_loop_gate"
+    }'
+fi
diff --git a/assets/agents/sisyphus/README.md b/assets/agents/sisyphus/README.md
index e59369e..e2a3381 100644
--- a/assets/agents/sisyphus/README.md
+++ b/assets/agents/sisyphus/README.md
@@ -18,16 +18,15 @@ Sisyphus acts as the primary entry point, capable of handling complex tasks by c
 - 🛠️ **Tool Integration**: Seamlessly uses system tools for building, testing, and file manipulation.
 
 ## Pro-Tip: Use an IDE MCP Server for Improved Performance
-Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
-an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
-server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
-them), and modify the agent definition to look like this:
+Many modern IDEs (JetBrains, VS Code, Cursor, Zed, etc.) expose MCP servers that let LLMs use IDE tools directly. Using
+one dramatically improves the performance of coding agents. If you have one, add it to your loki config (see the
+[MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md)) and reference it in this agent's `mcp_servers:` list:
 
 ```yaml
 # ...
 
 mcp_servers:
-  - jetbrains
+  - your-ide-mcp-server
 
 global_tools:
   - fs_read.sh
diff --git a/assets/agents/sisyphus/config.yaml b/assets/agents/sisyphus/config.yaml
index a6cb81a..8bb3821 100644
--- a/assets/agents/sisyphus/config.yaml
+++ b/assets/agents/sisyphus/config.yaml
@@ -119,20 +119,21 @@ instructions: |
   1. todo__init --goal "Add user profiles API endpoint"
   2. todo__add --task "Explore existing API patterns"
   3. todo__add --task "Implement profile endpoint"
-  4. todo__add --task "Verify with build/test"
-  5. agent__spawn --agent explore --prompt "Find existing API endpoint patterns, route structures, and controller conventions. Include code snippets."
-  6. agent__spawn --agent explore --prompt "Find existing data models and database query patterns. Include code snippets."
-  7. agent__collect --id <id1>
-  8. agent__collect --id <id2>
-  9. todo__done --id 1
-  10. agent__spawn --agent coder --prompt "<structured prompt using Coder Delegation Format above, including code snippets from explore results>"
-  11. agent__collect --id <coder_id>
-  12. todo__done --id 2
-  13. run_build
-  14. run_tests
-  15. todo__done --id 3
+  4. agent__spawn --agent explore --prompt "Find existing API endpoint patterns, route structures, and controller conventions. Include code snippets."
+  5. agent__spawn --agent explore --prompt "Find existing data models and database query patterns. Include code snippets."
+  6. agent__collect --id <id1>
+  7. agent__collect --id <id2>
+  8. todo__done --id 1
+  9. agent__spawn --agent coder --prompt "<structured prompt using Coder Delegation Format above, including code snippets from explore results>"
+  10. agent__collect --id <coder_id>
+  11. todo__done --id 2
   ```
 
+  Note: the `coder` agent is a graph agent that runs verification (build +
+  tests) and a bounded fix-loop internally. You do NOT need to spawn a
+  separate build/test step. A `CODER_COMPLETE` outcome means build and
+  tests already passed.
+
   ### Example 2: Architecture/design question (explore + oracle in parallel)
 
   User: "How should I structure the authentication for this app?"
@@ -172,6 +173,22 @@ instructions: |
   10. **Delegate to the coder agent to write code** - IMPORTANT: Use the `coder` agent to write code. Do not try to write code yourself except for trivial changes
   11. **Always output a summary of changes when finished** - Make it clear to user's that you've completed your tasks
 
+  ## Coder Outcomes
+
+  The `coder` agent is a graph agent that runs the implement -> verify_build
+  -> verify_tests -> fix_loop_gate pipeline internally. It always returns one
+  of three sentinel outcomes:
+
+  - `CODER_COMPLETE` - implementation succeeded with build + tests green.
+    Continue with any follow-up todos.
+  - `CODER_REJECTED` - user rejected the plan at the approval gate (only
+    triggered for high-complexity plans). Do NOT re-spawn coder blindly;
+    ask the user what to change first.
+  - `CODER_FAILED` - the fix-loop exhausted its budget without a green
+    build/tests, or a node fell back to `end_failure`. The output includes
+    the last build and test output. Surface this to the user; consider
+    spawning `oracle` for diagnosis if the failure is unclear.
+
   ## When to Do It Yourself
 
   - Simple command execution