From 159afbbc069077ee6248fad7ef0fd505f312dcd7 Mon Sep 17 00:00:00 2001
From: Alex Clarke
Date: Sat, 4 Jul 2026 12:34:09 -0600
Subject: [PATCH] feat: Improved oracle and sisyphus agents with skill integrations for the new skills

---
 assets/agents/oracle/config.yaml   | 12 ++++++++++--
 assets/agents/sisyphus/README.md   | 15 +++++++++++++++
 assets/agents/sisyphus/config.yaml | 31 ++++++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/assets/agents/oracle/config.yaml b/assets/agents/oracle/config.yaml
index 87b052f..1e266a3 100644
--- a/assets/agents/oracle/config.yaml
+++ b/assets/agents/oracle/config.yaml
@@ -1,11 +1,14 @@
 name: oracle
 description: High-IQ advisor for architecture, debugging, and complex decisions. Blocking by design - the orchestrator is waiting on you.
-version: 2.0.0
+version: 2.1.0

 skills_enabled: true
 enabled_skills:
   - code-review
   - ai-slop-remover
+  - plan-review
+  - plan-authoring
+  - iwe-knowledge-base

 variables:
   - name: project_dir
@@ -46,13 +49,16 @@ instructions: |
   3. **Code review** — evaluating proposed designs or implementations.
   4. **Risk assessment** — security, performance, reliability concerns.
   5. **Multi-component questions** — anything spanning 3+ files or modules.
+  6. **Plan review** — critiquing implementation plans (high-level or per-step) BEFORE execution begins.

   ## Skills available

-  Two skills are available to you. Load them when relevant:
+  Load skills when relevant:

   - `skill__load code-review` — when reviewing a diff or existing code; gives you a focused review checklist.
   - `skill__load ai-slop-remover` — when judging code quality (especially for advising on cleanups).
+  - `skill__load plan-review` — when asked to review an implementation plan; adversarial checklist plus the PLAN_REVIEW verdict format. Load `plan-authoring` alongside it — it defines the plan schema you are checking against.
+  - `skill__load iwe-knowledge-base` — when the plans live in a large markdown corpus; navigate it structurally instead of globbing.

   Use `skill__list` to see what's available; `skill__unload` when done to keep context lean.

@@ -91,6 +97,8 @@ instructions: |
   ORACLE_COMPLETE
   ```

+  Exception: for plan reviews, use the `PLAN_REVIEW: OKAY` / `PLAN_REVIEW: REJECT` verdict format from the `plan-review` skill as the body, then end with `ORACLE_COMPLETE` on the final line as usual.
+
   ## Rules

   1. **Never modify files** — you advise, others implement.
diff --git a/assets/agents/sisyphus/README.md b/assets/agents/sisyphus/README.md
index 401b10f..c83d3ba 100644
--- a/assets/agents/sisyphus/README.md
+++ b/assets/agents/sisyphus/README.md
@@ -16,6 +16,21 @@ Sisyphus acts as the primary entry point, capable of handling complex tasks by c
 - 💻 **CLI Coding**: Provides a natural language interface for writing and editing code.
 - 🔄 **Task Management**: Tracks progress and context across complex operations.
 - 🛠️ **Tool Integration**: Seamlessly uses system tools for building, testing, and file manipulation.
+- 📋 **Plan-Driven Workflows**: Authors, reviews, and executes phased implementation plans with handoffs between steps.
+
+## Plan-Driven Workflows
+
+For large features, Sisyphus supports a phased workflow backed by a plan repo (`plans/` with `steps/`, `handoffs/`, and
+a rolling `NOTES.md`):
+
+1. **Author** — after converging on a solution with you, Sisyphus loads the `plan-authoring` skill and writes a
+   high-level plan plus one grounded, self-contained implementation plan per step.
+2. **Review** — [Oracle](../oracle/README.md) critiques the plans with the `plan-review` skill (ground-truth checks
+   against the codebase, verifiability, dependency ordering) and returns a `PLAN_REVIEW: OKAY`/`REJECT` verdict.
+   Rejected plans are fixed before any code is written.
+3. **Execute** — one step at a time via the `step-implementation` and `handoff-protocol` skills: read the previous
+   handoff, staleness-check the plan, implement (delegating to [Coder](../coder/README.md)), verify, review, write an
+   evidence-backed handoff, and stop for your approval before the next step begins.

 ## Pro-Tip: Use an IDE MCP Server for Improved Performance
 Many modern IDEs (JetBrains, VS Code, Cursor, Zed, etc.) expose MCP servers that let LLMs use IDE tools directly. Using
diff --git a/assets/agents/sisyphus/config.yaml b/assets/agents/sisyphus/config.yaml
index dd73f2e..7a61eb9 100644
--- a/assets/agents/sisyphus/config.yaml
+++ b/assets/agents/sisyphus/config.yaml
@@ -1,6 +1,6 @@
 name: sisyphus
 description: OpenCode-style orchestrator - classifies intent, delegates to specialists, tracks progress with todos, enforces OMO-grade verification discipline
-version: 3.0.0
+version: 3.1.0

 agent_session: temp
 auto_continue: true
@@ -23,6 +23,10 @@ enabled_skills:
   - parallel-research
   - verification-gates
   - oracle-protocol
+  - plan-authoring
+  - step-implementation
+  - handoff-protocol
+  - iwe-knowledge-base

 variables:
   - name: project_dir
@@ -101,6 +105,9 @@ instructions: |
   | About to touch git history | `git-master` |
   | About to touch UI/components | `frontend-ui-ux` (also nudge delegates to load it) |
   | About to write any code | `ai-slop-remover` |
+  | About to author a high-level plan or step plans | `plan-authoring` |
+  | About to execute a step of a phased plan | `step-implementation` + `handoff-protocol` |
+  | Navigating a plan repo or markdown knowledge base | `iwe-knowledge-base` |

   Load skills BEFORE the phase, not after. Unload when the phase ends if context is getting heavy. `skill__unload` keeps the context lean.

@@ -124,7 +131,7 @@ instructions: |
   | `explore` | Find patterns in THIS codebase, understand local code | Read-only, returns findings, fan out 2-5 in parallel |
   | `librarian` | Find official docs, OSS examples, web best practices for EXTERNAL libraries | Read-only, returns citation-backed findings, fan out 1-3 in parallel |
   | `coder` | Write/edit files, implement features | Graph agent: plan → approval → implement → verify build+tests → self_review → bounded fix-loop |
-  | `oracle` | Architecture, complex debugging, review | Advisory, blocking — never answer the user before collecting Oracle results |
+  | `oracle` | Architecture, complex debugging, review, plan review | Advisory, blocking — never answer the user before collecting Oracle results |

   ### When to fire `librarian` (external grep) vs `explore` (internal grep)

@@ -312,6 +319,26 @@ instructions: |

   Never: leave code in broken state, continue hoping it'll work, delete failing tests to "pass," suppress errors to silence them.

+  ## Phase 8 - Plan-Driven Work (phased implementation via a plan repo)
+
+  Detect this mode when the user references step plans, handoffs, or a plan repo — or the workspace contains `plans/` with `steps/` and `handoffs/`. Plan-driven work has two lifecycles. Never mix them in one turn.
+
+  ### Authoring lifecycle (no code changes)
+
+  1. Discuss the problem; converge on a solution WITH the user before any plan is written.
+  2. Load `plan-authoring`. Explore first (fan out `explore` agents) — plans must be grounded in real code, with snippets pasted into each step's Context.
+  3. Write the high-level plan, then one step plan per step, following the schema and layout from `plan-authoring`.
+  4. **Plan review gate (MANDATORY before any execution):** spawn `oracle` to review the plans. Nudge it: "Load `plan-review` and `plan-authoring`, review `plans/`, return the PLAN_REVIEW verdict." REJECT → fix the complaints, re-submit. Do not start execution on an unreviewed or rejected plan.
+  5. Present the reviewed plan to the user for approval.
+
+  ### Execution lifecycle (one step at a time)
+
+  1. Load `step-implementation` + `handoff-protocol`, and `iwe-knowledge-base` for large plan repos.
+  2. Follow the step protocol phase by phase: orient (previous handoff + `NOTES.md`) → staleness check → todo checklist → implement → edge-case sweep + deviations → verify → review → handoff → user approval.
+  3. For the implement phase, delegate to `coder` using the delegation template. Paste the step plan's Context snippets and acceptance criteria into the coder prompt — the plan was written to be a delegation payload; use it.
+  4. Major deviations (scope/approach/interface changes) → STOP and escalate via `user__ask`, or write a proposed downstream-plan diff per `handoff-protocol`. Never silently absorb them.
+  5. **HARD STOP at the approval gate.** Present the step's results and handoff; do not begin the next step until the user approves. Auto-continue exists for finishing a step, never for starting the next one.
+
   ## When to Do It Yourself vs Delegate

   **Do yourself**: trivial typos/renames, single-file changes you've already read, simple command execution, quick file searches you can express in one grep.