feat: Improved coder agent that is now a graph-based agent

2026-05-22 12:57:12 -06:00
parent 5370637274
commit dacccbfcf7
10 changed files with 568 additions and 154 deletions
@@ -18,16 +18,15 @@ Sisyphus acts as the primary entry point, capable of handling complex tasks by c
 - 🛠️ **Tool Integration**: Seamlessly uses system tools for building, testing, and file manipulation.

 ## Pro-Tip: Use an IDE MCP Server for Improved Performance
-Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
-an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
-server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
-them), and modify the agent definition to look like this:
+Many modern IDEs (JetBrains, VS Code, Cursor, Zed, etc.) expose MCP servers that let LLMs use IDE tools directly. Using
+one dramatically improves the performance of coding agents. If you have one, add it to your loki config (see the
+[MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md)) and reference it in this agent's `mcp_servers:` list:

 ```yaml
 # ...

 mcp_servers:
-  - jetbrains
+  - your-ide-mcp-server

 global_tools:
  - fs_read.sh
@@ -119,20 +119,21 @@ instructions: |
  1. todo__init --goal "Add user profiles API endpoint"
  2. todo__add --task "Explore existing API patterns"
  3. todo__add --task "Implement profile endpoint"
-  4. todo__add --task "Verify with build/test"
-  5. agent__spawn --agent explore --prompt "Find existing API endpoint patterns, route structures, and controller conventions. Include code snippets."
-  6. agent__spawn --agent explore --prompt "Find existing data models and database query patterns. Include code snippets."
-  7. agent__collect --id <id1>
-  8. agent__collect --id <id2>
-  9. todo__done --id 1
-  10. agent__spawn --agent coder --prompt "<structured prompt using Coder Delegation Format above, including code snippets from explore results>"
-  11. agent__collect --id <coder_id>
-  12. todo__done --id 2
-  13. run_build
-  14. run_tests
-  15. todo__done --id 3
+  4. agent__spawn --agent explore --prompt "Find existing API endpoint patterns, route structures, and controller conventions. Include code snippets."
+  5. agent__spawn --agent explore --prompt "Find existing data models and database query patterns. Include code snippets."
+  6. agent__collect --id <id1>
+  7. agent__collect --id <id2>
+  8. todo__done --id 1
+  9. agent__spawn --agent coder --prompt "<structured prompt using Coder Delegation Format above, including code snippets from explore results>"
+  10. agent__collect --id <coder_id>
+  11. todo__done --id 2
  ```

+  Note: the `coder` agent is a graph agent that runs verification (build +
+  tests) and a bounded fix-loop internally. You do NOT need to spawn a
+  separate build/test step. A `CODER_COMPLETE` outcome means build and
+  tests already passed.
+
  ### Example 2: Architecture/design question (explore + oracle in parallel)

  User: "How should I structure the authentication for this app?"
@@ -172,6 +173,22 @@ instructions: |
  10. **Delegate to the coder agent to write code** - IMPORTANT: Use the `coder` agent to write code. Do not try to write code yourself except for trivial changes
  11. **Always output a summary of changes when finished** - Make it clear to the user that you've completed your tasks

+  ## Coder Outcomes
+
+  The `coder` agent is a graph agent that runs the implement -> verify_build
+  -> verify_tests -> fix_loop pipeline internally. It always returns one of
+  three sentinel outcomes:
+
+  - `CODER_COMPLETE` - implementation succeeded with build + tests green.
+    Continue with any follow-up todos.
+  - `CODER_REJECTED` - user rejected the plan at the approval gate (only
+    triggered for high-complexity plans). Do NOT re-spawn coder blindly;
+    ask the user what to change first.
+  - `CODER_FAILED` - the fix-loop exhausted its budget without producing
+    a green build and tests. The failure output includes the last build
+    and test output. Surface this to the user; consider spawning `oracle`
+    for diagnosis if the failure is unclear.
+
  ## When to Do It Yourself

  - Simple command execution
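
The three sentinel outcomes listed under "Coder Outcomes" map naturally onto a small dispatch. The following is a minimal Python sketch and is not part of this commit. It assumes the coder's result arrives as plain text containing exactly one sentinel string; `CoderOutcome`, `parse_outcome`, and `next_action` are hypothetical names used only for illustration.

```python
from enum import Enum


class CoderOutcome(Enum):
    """The three sentinel strings named in the Coder Outcomes section."""
    COMPLETE = "CODER_COMPLETE"
    REJECTED = "CODER_REJECTED"
    FAILED = "CODER_FAILED"


def parse_outcome(coder_output: str) -> CoderOutcome:
    """Find the sentinel in the coder's output.

    Assumption: the output is plain text and contains exactly one sentinel.
    """
    found = [o for o in CoderOutcome if o.value in coder_output]
    if len(found) != 1:
        raise ValueError(f"expected exactly one sentinel, found {len(found)}")
    return found[0]


def next_action(outcome: CoderOutcome) -> str:
    """Return the follow-up the instructions prescribe for each outcome."""
    if outcome is CoderOutcome.COMPLETE:
        # Build and tests already passed inside the coder graph.
        return "mark the todo done and continue with follow-up todos"
    if outcome is CoderOutcome.REJECTED:
        # The user rejected the plan at the approval gate.
        return "ask the user what to change before re-spawning coder"
    # CODER_FAILED: the fix-loop ran out of budget.
    return "surface the build/test output to the user; consider spawning oracle"


if __name__ == "__main__":
    sample = "Implemented profile endpoint.\nCODER_COMPLETE"
    print(next_action(parse_outcome(sample)))
```

Raising on a missing or ambiguous sentinel keeps an unexpected coder result from being silently treated as success.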