From 8b061b200fc8d0da8f43480d747a7dcc4dfa59e8 Mon Sep 17 00:00:00 2001
From: Alex Clarke
Date: Fri, 22 May 2026 12:57:12 -0600
Subject: [PATCH] feat: Improved coder agent that is now a graph-based agent

---
 assets/agents/coder/README.md                |  84 ++++--
 assets/agents/coder/config.yaml              | 116 --------
 assets/agents/coder/graph.yaml               | 278 ++++++++++++++++++
 assets/agents/coder/scripts/fix_loop_gate.sh |  49 +++
 assets/agents/coder/scripts/resolve_paths.sh |  12 +
 .../agents/coder/scripts/route_complexity.sh |  23 ++
 assets/agents/coder/scripts/verify_build.sh  |  55 ++++
 assets/agents/coder/scripts/verify_tests.sh  |  55 ++++
 assets/agents/sisyphus/README.md             |   9 +-
 assets/agents/sisyphus/config.yaml           |  41 ++-
 10 files changed, 568 insertions(+), 154 deletions(-)
 delete mode 100644 assets/agents/coder/config.yaml
 create mode 100644 assets/agents/coder/graph.yaml
 create mode 100644 assets/agents/coder/scripts/fix_loop_gate.sh
 create mode 100644 assets/agents/coder/scripts/resolve_paths.sh
 create mode 100644 assets/agents/coder/scripts/route_complexity.sh
 create mode 100644 assets/agents/coder/scripts/verify_build.sh
 create mode 100644 assets/agents/coder/scripts/verify_tests.sh

diff --git a/assets/agents/coder/README.md b/assets/agents/coder/README.md
index dd23a02..51c4477 100644
--- a/assets/agents/coder/README.md
+++ b/assets/agents/coder/README.md
@@ -1,40 +1,82 @@
 # Coder
 
-An AI agent that assists you with your coding tasks.
+A graph-based implementation agent. Plans, implements, and runs build +
+tests in a bounded fix-loop until verified. Designed to be delegated to by
+the **[Sisyphus](../sisyphus/README.md)** agent.
 
-This agent is designed to be delegated to by the **[Sisyphus](../sisyphus/README.md)** agent to implement code specifications. Sisyphus
-acts as the coordinator/architect, while Coder handles the implementation details.
+Coder is a [graph agent](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents): its workflow is
+defined declaratively in `graph.yaml`, with verification and the
+implement-fix loop enforced as graph edges rather than prose.
 
-## Features
+## Workflow
 
-- 🏗️ Intelligent project structure creation and management
-- 🖼️ Convert screenshots into clean, functional code
-- 📁 Comprehensive file system operations (create folders, files, read/write files)
-- 🧐 Advanced code analysis and improvement suggestions
-- 📊 Precise diff-based file editing for controlled code modifications
+```
+analyze_request     (llm + output_schema)   plan + complexity extraction
+   ↓
+route_complexity    (script)                opt-out approval gate (complexity ≥ 7)
+   ↓
+gate_approval       (approval, optional)
+   ↓
+implement           (llm + fs tools)        actual file edits
+   ↓
+verify_build        (script)
+   ↓
+verify_tests        (script)
+   ↓
+fix_loop_gate       (script)                back-edge to implement (bounded)
+   ↓
+end_success / end_rejected / end_failure
+```
 
-It can also be used as a standalone tool for direct coding assistance.
+End nodes emit one of three sentinel outcomes for the caller:
 
-## Pro-Tip: Use an IDE MCP Server for Improved Performance
-Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
-an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
-server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
-them), and modify the agent definition to look like this:
+- `CODER_COMPLETE` — build and tests passed.
+- `CODER_REJECTED` — user rejected the plan at the approval gate.
+- `CODER_FAILED` — fix-loop exhausted; build/tests still failing.
+
+## Tuning
+
+The agent's `project_dir` is exposed via the standard `variables:` block,
+so it accepts the runtime override flag:
+
+```sh
+# Invoke from inside the project (project_dir defaults to ".")
+cd /path/to/your/project
+loki -a coder "Add a foo() function..."
+
+# Or invoke from anywhere with an explicit override
+loki -a coder --agent-variable project_dir /path/to/your/project "Add..."
+```
+
+`graph.yaml` `initial_state` exposes:
+
+- `max_fix_attempts` (default `3`) — fix-loop budget before `end_failure`.
+
+Environment overrides honored by the script nodes:
+
+- `BUILD_CMD` — skip project-type detection for the build/check command.
+- `TEST_CMD` — skip detection for tests.
+- `CODER_AUTOAPPROVE=1` — bypass the approval gate (for non-interactive runs
+  where complexity might trip the gate).
+
+## Pro-Tip: IDE MCP Server
+
+Modern IDEs (JetBrains, VS Code, Cursor, Zed, etc.) expose MCP servers
+that let LLMs use IDE tools directly. To wire one in, edit `graph.yaml`:
 
 ```yaml
-# ...
-
 mcp_servers:
-  - jetbrains # The name of your configured IDE MCP server
+  - your-ide-mcp-server
 
 global_tools:
-  # Keep useful read-only tools for reading files in other non-project directories
+  # Keep read-only fs tools for files outside the IDE project
   - fs_read.sh
   - fs_grep.sh
   - fs_glob.sh
   # - fs_write.sh
   # - fs_patch.sh
   - execute_command.sh
+```
 
-# ...
-```
\ No newline at end of file
+Then add the MCP server's write/patch tools to the `implement` node's
+`tools:` whitelist.
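How many times does `implement` run before `end_failure` with the default budget? The following is a minimal sketch, not part of this patch, that simulates the loop under two stated assumptions: `fix_attempts` starts at 0 in `initial_state` (that part of `graph.yaml` and the top of `fix_loop_gate.sh` are not shown here), and every build/test pass fails. It reuses the `fix_attempts >= max_fix_attempts` check and the increment-then-route-to-`implement` step that `fix_loop_gate.sh` performs. `simulate_fix_loop` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of the coder fix-loop budget (not part of the patch).
# Assumptions: fix_attempts starts at 0, and every verify pass fails.
simulate_fix_loop() {
  local max_fix_attempts="$1"
  local fix_attempts=0
  local implement_runs=0

  while true; do
    # implement node runs: the initial pass or a fix attempt
    implement_runs=$((implement_runs + 1))

    # verify_build/verify_tests failed, so control reaches fix_loop_gate,
    # which applies the same budget check as fix_loop_gate.sh
    if (( fix_attempts >= max_fix_attempts )); then
      # budget exhausted: route to end_failure and report implement runs
      echo "$implement_runs"
      return 0
    fi

    # otherwise the gate increments the counter and routes back to implement
    fix_attempts=$((fix_attempts + 1))
  done
}

simulate_fix_loop 3   # prints 4 under the assumptions above
```

Under those assumptions, the default budget of `3` yields four `implement` runs: the initial pass plus three fix attempts. This matches the "attempt N of 3" numbering that `fix_loop_gate.sh` writes into `fix_instructions`.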
diff --git a/assets/agents/coder/config.yaml b/assets/agents/coder/config.yaml
deleted file mode 100644
index 3f06dee..0000000
--- a/assets/agents/coder/config.yaml
+++ /dev/null
@@ -1,116 +0,0 @@
-name: coder
-description: Implementation agent - writes code, follows patterns, verifies with builds
-version: 1.0.0
-temperature: 0.1
-
-auto_continue: true
-max_auto_continues: 15
-inject_todo_instructions: true
-
-variables:
-  - name: project_dir
-    description: Project directory to work in
-    default: '.'
-  - name: auto_confirm
-    description: Auto-confirm command execution
-    default: '1'
-
-global_tools:
-  - fs_read.sh
-  - fs_grep.sh
-  - fs_glob.sh
-  - fs_write.sh
-  - fs_patch.sh
-  - execute_command.sh
-
-instructions: |
-  You are a senior engineer. You write code that works on the first try.
-
-  ## Your Mission
-
-  Given an implementation task:
-  1. Check for orchestrator context first (see below)
-  2. Fill gaps only. Read files NOT already covered in context
-  3. Write the code (using tools, NOT chat output)
-  4. Verify it compiles/builds
-  5. Signal completion with a summary
-
-  ## Using Orchestrator Context (IMPORTANT)
-
-  When spawned by sisyphus, your prompt will often contain a `` block
-  with prior findings: file paths, code patterns, and conventions discovered by
-  explore agents.
-
-  **If context is provided:**
-  1. Use it as your primary reference. Don't re-read files already summarized
-  2. Follow the code patterns shown. Snippets in context ARE the style guide
-  3. Read the referenced files ONLY IF you need more detail (e.g. full function
-     signature, import list, or adjacent code not included in the snippet)
-  4. If context includes a "Conventions" section, follow it exactly
-
-  **If context is NOT provided or is too vague to act on:**
-  Fall back to self-exploration: grep for similar files, read 1-2 examples,
-  match their style.
-
-  **Never ignore provided context.** It represents work already done upstream.
-
-  ## Todo System
-
-  For multi-file changes:
-  1. `todo__init` with the implementation goal
-  2. `todo__add` for each file to create/modify
-  3. Implement each, calling `todo__done` immediately after
-
-  ## Writing Code
-  1. **Use fs_patch for surgical edits** - `fs_patch --path "src/main.rs" --contents ""` applies targeted changes without rewriting the whole file
-  2. **use fs_write for full file writes** - `fs_write --path "src/main.rs" --contents "= max_fix_attempts )); then
+  jq -nc \
+    --argjson n "$fix_attempts" \
+    '{
+      "fix_attempts": $n,
+      "_next": "end_failure"
+    }'
+  exit 0
+fi
+
+next_attempts=$((fix_attempts + 1))
+
+if [[ "$build_ok" != "true" ]]; then
+  fix_instructions=$(printf '## Fix loop status (attempt %d of %d)\n\nThe previous attempt failed the build.\n\nBuild output:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor.' \
+    "$next_attempts" "$max_fix_attempts" "$build_output")
+elif [[ "$tests_ok" != "true" ]]; then
+  fix_instructions=$(printf '## Fix loop status (attempt %d of %d)\n\nBuild passed but tests failed.\n\nTest output:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor.' \
+    "$next_attempts" "$max_fix_attempts" "$tests_output")
+else
+  fix_instructions=$(printf '## Fix loop status (attempt %d of %d)\n\nfix_loop_gate was reached but no failure was detected in state. Re-run the verification step.' \
+    "$next_attempts" "$max_fix_attempts")
+fi
+
+jq -nc \
+  --argjson n "$next_attempts" \
+  --arg fi "$fix_instructions" \
+  '{
+    "fix_attempts": $n,
+    "fix_instructions": $fi,
+    "_next": "implement"
+  }'
diff --git a/assets/agents/coder/scripts/resolve_paths.sh b/assets/agents/coder/scripts/resolve_paths.sh
new file mode 100644
index 0000000..2c9433c
--- /dev/null
+++ b/assets/agents/coder/scripts/resolve_paths.sh
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+project_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
+resolved=$(cd "$project_dir" 2>/dev/null && pwd) || resolved="$project_dir"
+
+jq -nc \
+  --arg pd "$resolved" \
+  '{
+    "project_dir": $pd,
+    "_next": "analyze_request"
+  }'
diff --git a/assets/agents/coder/scripts/route_complexity.sh b/assets/agents/coder/scripts/route_complexity.sh
new file mode 100644
index 0000000..d45ca91
--- /dev/null
+++ b/assets/agents/coder/scripts/route_complexity.sh
@@ -0,0 +1,23 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
+  state=$(cat "$GRAPH_STATE_FILE")
+elif [[ -n "${GRAPH_STATE:-}" ]]; then
+  state="$GRAPH_STATE"
+else
+  state='{}'
+fi
+
+complexity=$(echo "$state" | jq -r '.complexity_score // 0')
+
+if [[ "${CODER_AUTOAPPROVE:-0}" == "1" ]]; then
+  jq -nc '{"_next": "implement"}'
+  exit 0
+fi
+
+if (( complexity >= 7 )); then
+  jq -nc '{"_next": "gate_approval"}'
+else
+  jq -nc '{"_next": "implement"}'
+fi
diff --git a/assets/agents/coder/scripts/verify_build.sh b/assets/agents/coder/scripts/verify_build.sh
new file mode 100644
index 0000000..f9b9d65
--- /dev/null
+++ b/assets/agents/coder/scripts/verify_build.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+set -uo pipefail
+
+# shellcheck disable=SC1091
+source "$(dirname "$0")/../../.shared/utils.sh"
+
+if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
+  state=$(cat "$GRAPH_STATE_FILE")
+elif [[ -n "${GRAPH_STATE:-}" ]]; then
+  state="$GRAPH_STATE"
+else
+  state='{}'
+fi
+
+project_dir=$(echo "$state" | jq -r '.project_dir // "."')
+
+if [[ -n "${BUILD_CMD:-}" ]]; then
+  cmd="$BUILD_CMD"
+else
+  project_info=$(detect_project "$project_dir")
+  cmd=$(echo "$project_info" | jq -r '.check // .build // ""')
+fi
+
+if [[ -z "$cmd" || "$cmd" == "null" ]]; then
+  jq -nc '{
+    "build_ok": true,
+    "build_output": "(no build/check command available for this project type)",
+    "_next": "verify_tests"
+  }'
+  exit 0
+fi
+
+exit_code=0
+output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
+
+if (( exit_code == 0 )); then
+  jq -nc \
+    --arg out "$output" \
+    --arg cmd "$cmd" \
+    '{
+      "build_ok": true,
+      "build_output": ("Ran: " + $cmd + "\n\n" + $out),
+      "_next": "verify_tests"
+    }'
+else
+  jq -nc \
+    --arg out "$output" \
+    --arg cmd "$cmd" \
+    --argjson rc "$exit_code" \
+    '{
+      "build_ok": false,
+      "build_output": ("Ran: " + $cmd + "\nExit code: " + ($rc | tostring) + "\n\n" + $out),
+      "_next": "fix_loop_gate"
+    }'
+fi
diff --git a/assets/agents/coder/scripts/verify_tests.sh b/assets/agents/coder/scripts/verify_tests.sh
new file mode 100644
index 0000000..102364e
--- /dev/null
+++ b/assets/agents/coder/scripts/verify_tests.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+set -uo pipefail
+
+# shellcheck disable=SC1091
+source "$(dirname "$0")/../../.shared/utils.sh"
+
+if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
+  state=$(cat "$GRAPH_STATE_FILE")
+elif [[ -n "${GRAPH_STATE:-}" ]]; then
+  state="$GRAPH_STATE"
+else
+  state='{}'
+fi
+
+project_dir=$(echo "$state" | jq -r '.project_dir // "."')
+
+if [[ -n "${TEST_CMD:-}" ]]; then
+  cmd="$TEST_CMD"
+else
+  project_info=$(detect_project "$project_dir")
+  cmd=$(echo "$project_info" | jq -r '.test // ""')
+fi
+
+if [[ -z "$cmd" || "$cmd" == "null" ]]; then
+  jq -nc '{
+    "tests_ok": true,
+    "tests_output": "(no test command available for this project type)",
+    "_next": "end_success"
+  }'
+  exit 0
+fi
+
+exit_code=0
+output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
+
+if (( exit_code == 0 )); then
+  jq -nc \
+    --arg out "$output" \
+    --arg cmd "$cmd" \
+    '{
+      "tests_ok": true,
+      "tests_output": ("Ran: " + $cmd + "\n\n" + $out),
+      "_next": "end_success"
+    }'
+else
+  jq -nc \
+    --arg out "$output" \
+    --arg cmd "$cmd" \
+    --argjson rc "$exit_code" \
+    '{
+      "tests_ok": false,
+      "tests_output": ("Ran: " + $cmd + "\nExit code: " + ($rc | tostring) + "\n\n" + $out),
+      "_next": "fix_loop_gate"
+    }'
+fi
diff --git a/assets/agents/sisyphus/README.md b/assets/agents/sisyphus/README.md
index e59369e..e2a3381 100644
--- a/assets/agents/sisyphus/README.md
+++ b/assets/agents/sisyphus/README.md
@@ -18,16 +18,15 @@ Sisyphus acts as the primary entry point, capable of handling complex tasks by c
 - 🛠️ **Tool Integration**: Seamlessly uses system tools for building, testing, and file manipulation.
 
 ## Pro-Tip: Use an IDE MCP Server for Improved Performance
-Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
-an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
-server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
-them), and modify the agent definition to look like this:
+Many modern IDEs (JetBrains, VS Code, Cursor, Zed, etc.) expose MCP servers that let LLMs use IDE tools directly. Using
+one dramatically improves the performance of coding agents. If you have one, add it to your loki config (see the
+[MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md)) and reference it in this agent's `mcp_servers:` list:
 
 ```yaml
 # ...
 
 mcp_servers:
-  - jetbrains
+  - your-ide-mcp-server
 
 global_tools:
   - fs_read.sh
diff --git a/assets/agents/sisyphus/config.yaml b/assets/agents/sisyphus/config.yaml
index a6cb81a..8bb3821 100644
--- a/assets/agents/sisyphus/config.yaml
+++ b/assets/agents/sisyphus/config.yaml
@@ -119,20 +119,21 @@ instructions: |
   1. todo__init --goal "Add user profiles API endpoint"
   2. todo__add --task "Explore existing API patterns"
   3. todo__add --task "Implement profile endpoint"
-  4. todo__add --task "Verify with build/test"
-  5. agent__spawn --agent explore --prompt "Find existing API endpoint patterns, route structures, and controller conventions. Include code snippets."
-  6. agent__spawn --agent explore --prompt "Find existing data models and database query patterns. Include code snippets."
-  7. agent__collect --id
-  8. agent__collect --id
-  9. todo__done --id 1
-  10. agent__spawn --agent coder --prompt ""
-  11. agent__collect --id
-  12. todo__done --id 2
-  13. run_build
-  14. run_tests
-  15. todo__done --id 3
+  4. agent__spawn --agent explore --prompt "Find existing API endpoint patterns, route structures, and controller conventions. Include code snippets."
+  5. agent__spawn --agent explore --prompt "Find existing data models and database query patterns. Include code snippets."
+  6. agent__collect --id
+  7. agent__collect --id
+  8. todo__done --id 1
+  9. agent__spawn --agent coder --prompt ""
+  10. agent__collect --id
+  11. todo__done --id 2
   ```
 
+  Note: the `coder` agent is a graph agent that runs verification (build +
+  tests) and a bounded fix-loop internally. You do NOT need to spawn a
+  separate build/test step. A `CODER_COMPLETE` outcome means build and
+  tests already passed.
+
   ### Example 2: Architecture/design question (explore + oracle in parallel)
 
   User: "How should I structure the authentication for this app?"
@@ -172,6 +173,22 @@ instructions: |
   10. **Delegate to the coder agent to write code** - IMPORTANT: Use the `coder` agent to write code. Do not try to write code yourself except for trivial changes
   11. **Always output a summary of changes when finished** - Make it clear to user's that you've completed your tasks
 
+  ## Coder Outcomes
+
+  The `coder` agent is a graph agent that runs the implement -> verify_build
+  -> verify_tests -> fix_loop pipeline internally. It always returns one of
+  three sentinel outcomes:
+
+  - `CODER_COMPLETE` - implementation succeeded with build + tests green.
+    Continue with any follow-up todos.
+  - `CODER_REJECTED` - user rejected the plan at the approval gate (only
+    triggered for high-complexity plans). Do NOT re-spawn coder blindly;
+    ask the user what to change first.
+  - `CODER_FAILED` - the fix-loop exhausted its budget without producing
+    green build/tests. The failure output includes the last build and tests
+    output. Surface this to the user; consider spawning `oracle` for
+    diagnosis if the failure is unclear.
+
   ## When to Do It Yourself
 
   - Simple command execution
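The three sentinels described under `## Coder Outcomes` each call for a different follow-up action. A caller that scripts around coder could branch on them as in the following minimal sketch. `classify_coder_outcome` and its action labels are hypothetical, and the sketch assumes the sentinel string appears verbatim somewhere in coder's final output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch): map coder's final output to
# the follow-up action described under "Coder Outcomes". Assumes the sentinel
# string appears verbatim somewhere in the output text.
classify_coder_outcome() {
  local output="$1"
  case "$output" in
    *CODER_COMPLETE*) echo "continue" ;;        # build + tests green
    *CODER_REJECTED*) echo "ask_user" ;;        # plan rejected at approval gate
    *CODER_FAILED*)   echo "surface_failure" ;; # fix-loop budget exhausted
    *)                echo "unknown" ;;         # no sentinel found
  esac
}

classify_coder_outcome "Added foo(). CODER_COMPLETE"   # prints continue
```

Bash checks `case` arms in order, so if an output ever contained more than one sentinel, the first matching arm would win. The patch does not say whether that can happen.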