build: Installed ast-grep in the sandbox kit definition

docs: Updated the README to mention the installation of ast-grep
docs: Added the ast_grep tool to config.example.yaml
2026-07-04 13:14:22 -06:00 · 2026-07-04 13:14:08 -06:00 · 2026-07-04 13:13:16 -06:00 · 2026-07-04 13:02:50 -06:00 · 2026-07-04 12:59:05 -06:00 · 2026-07-04 12:50:37 -06:00
41 changed files with 3006 additions and 110 deletions
@@ -59,6 +59,14 @@ Coyote requires the following tools to be installed on your system:
 * [docker](https://docs.docker.com/engine/install/)
 * [uv](https://docs.astral.sh/uv/getting-started/installation/)
    * `curl -LsSf https://astral.sh/uv/install.sh | sh`
 * [iwe](https://github.com/iwe-org/iwe) (`iwec`, for the built-in `iwe` MCP server that navigates large markdown knowledgebases)
    * **Homebrew:** `brew tap iwe-org/iwe && brew install iwe`
    * **Cargo:** `cargo install iwec`
 * [ast-grep](https://ast-grep.github.io/) (for the built-in `ast_grep` structural code search tool, used by the `explore` agent)
    * **Homebrew:** `brew install ast-grep`
    * **Cargo:** `cargo install ast-grep --locked`
    * **npm:** `npm i -g @ast-grep/cli`
    * Optional: if `ast-grep` is not installed, the `ast_grep` tool reports it and agents fall back to `fs_grep`
 These tools are used to provide various functionalities within Coyote, such as document processing, JSON manipulation,
 etc., and they are used within agents and tools.
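
Because `ast-grep` is optional, a quick availability check can show which search tool an agent would end up using. The sketch below is illustrative only: `have_tool` and `pick_search_tool` are hypothetical helper names, not part of Coyote, and the only assumption about the real tooling is that the binary is on `PATH` under the name `ast-grep`.

```sh
# Hypothetical helper: succeed when the named executable is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: name the search tool an agent would fall back to.
# Assumes the ast_grep tool wraps a binary called `ast-grep`.
pick_search_tool() {
  if have_tool ast-grep; then
    echo "ast_grep"
  else
    echo "fs_grep"
  fi
}
```

Running `pick_search_tool` prints `ast_grep` when the binary is installed and `fs_grep` otherwise.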
@@ -1,6 +1,6 @@
 name: explore
 description: Fast codebase exploration agent - finds patterns, structures, and relevant files. Designed to be fanned out 2-5 in parallel by orchestrators.
-version: 3.0.0
+version: 3.1.0
 skills_enabled: true
 enabled_skills:
@@ -19,6 +19,7 @@ global_tools:
  - fs_grep.sh
  - fs_glob.sh
  - fs_ls.sh
  - ast_grep.sh
 instructions: |
  You are a codebase explorer. Your job: Search, find, report. Nothing else.
@@ -49,6 +50,8 @@ instructions: |
  4. **Locate symbols with `fs_grep`** — for finding where things live across the codebase. `fs_grep --pattern "fn handle_request" --include "*.rs"` is faster than reading files.
  4b. **Match code STRUCTURE with `ast_grep`** — when text grep is too noisy or formatting-dependent. It matches syntax trees: `ast_grep --pattern '$X.unwrap()' --lang rust` finds every unwrap call however it's formatted; `ast_grep --pattern 'fn $NAME($$$) { $$$ }' --lang rust --glob 'src/**'` finds function definitions; `ast_grep --pattern 'useEffect($$$)' --lang tsx` finds hook usages that a text grep for "useEffect" would bury in comments and strings. Meta-variables: `$NAME` = one AST node, `$$$` = zero or more. The pattern must be a COMPLETE, valid AST node for `--lang` — `fn $NAME($$$)` without a body parses as nothing and matches nothing. Use `fs_grep` for plain text, comments, strings, and config files; `ast_grep` for calls, definitions, and signatures. If ast-grep isn't installed the tool says so — fall back to fs_grep.
  5. **Read targeted sections with `fs_read --offset/--limit`** — `fs_read --path "src/main.rs" --offset 50 --limit 30` reads lines 50-79 only. `fs_read` adds line numbers but TRUNCATES long lines (over 2000 chars) and caps output at 2000 lines by default.
  6. **Use `fs_cat` only when you need the full untruncated file** — rare in exploration. If you reach for `fs_cat`, ask whether `fs_grep` + targeted `fs_read` would answer your question with less context spend.
@@ -59,6 +62,7 @@ instructions: |
  - `fs_grep --pattern "struct User" --include "*.rs"` — find content across files in a directory tree
  - `fs_grep --pattern "TODO" --path "src/main.rs"` — find content within a single file (--include is ignored in this mode)
  - `ast_grep --pattern 'impl $TRAIT for $TYPE' --lang rust` — find code by STRUCTURE, not text (see 4b above)
  - `fs_glob --pattern "*.rs" --path src/` — find files by name pattern
  - `fs_read --path "src/main.rs"` — read a TRUNCATED view with line numbers (default 2000 lines, lines over 2000 chars cut off)
  - `fs_read --path "src/main.rs" --offset 100 --limit 50` — read lines 100-149 only (line numbers; truncation rules still apply)
@@ -1,11 +1,14 @@
 name: oracle
 description: High-IQ advisor for architecture, debugging, and complex decisions. Blocking by design - the orchestrator is waiting on you.
-version: 2.0.0
+version: 2.1.0
 skills_enabled: true
 enabled_skills:
  - code-review
  - ai-slop-remover
  - plan-review
  - plan-authoring
  - iwe-knowledge-base
 variables:
  - name: project_dir
@@ -46,13 +49,16 @@ instructions: |
  3. **Code review** — evaluating proposed designs or implementations.
  4. **Risk assessment** — security, performance, reliability concerns.
  5. **Multi-component questions** — anything spanning 3+ files or modules.
  6. **Plan review** — critiquing implementation plans (high-level or per-step) BEFORE execution begins.
  ## Skills available
-  Two skills are available to you. Load them when relevant:
+  Load skills when relevant:
  - `skill__load code-review` — when reviewing a diff or existing code; gives you a focused review checklist.
  - `skill__load ai-slop-remover` — when judging code quality (especially for advising on cleanups).
  - `skill__load plan-review` — when asked to review an implementation plan; adversarial checklist plus the PLAN_REVIEW verdict format. Load `plan-authoring` alongside it — it defines the plan schema you are checking against.
  - `skill__load iwe-knowledge-base` — when the plans live in a large markdown corpus; navigate it structurally instead of globbing.
  Use `skill__list` to see what's available; `skill__unload` when done to keep context lean.
@@ -91,6 +97,8 @@ instructions: |
  ORACLE_COMPLETE
  ```
  Exception: for plan reviews, use the `PLAN_REVIEW: OKAY` / `PLAN_REVIEW: REJECT` verdict format from the `plan-review` skill as the body, then end with `ORACLE_COMPLETE` on the final line as usual.
  ## Rules
  1. **Never modify files** — you advise, others implement.
@@ -16,6 +16,21 @@ Sisyphus acts as the primary entry point, capable of handling complex tasks by c
 - 💻 **CLI Coding**: Provides a natural language interface for writing and editing code.
 - 🔄 **Task Management**: Tracks progress and context across complex operations.
 - 🛠️ **Tool Integration**: Seamlessly uses system tools for building, testing, and file manipulation.
 - 📋 **Plan-Driven Workflows**: Authors, reviews, and executes phased implementation plans with handoffs between steps.
 ## Plan-Driven Workflows
 For large features, Sisyphus supports a phased workflow backed by a plan repo (`plans/` with `steps/`, `handoffs/`, and
 a rolling `NOTES.md`):
 1. **Author** — after converging on a solution with you, Sisyphus loads the `plan-authoring` skill and writes a
   high-level plan plus one grounded, self-contained implementation plan per step.
 2. **Review** — [Oracle](../oracle/README.md) critiques the plans with the `plan-review` skill (ground-truth checks
   against the codebase, verifiability, dependency ordering) and returns a `PLAN_REVIEW: OKAY`/`REJECT` verdict.
   Rejected plans are fixed before any code is written.
 3. **Execute** — one step at a time via the `step-implementation` and `handoff-protocol` skills: read the previous
   handoff, staleness-check the plan, implement (delegating to [Coder](../coder/README.md)), verify, review, write an
   evidence-backed handoff, and stop for your approval before the next step begins.
 ## Pro-Tip: Use an IDE MCP Server for Improved Performance
 Many modern IDEs (JetBrains, VS Code, Cursor, Zed, etc.) expose MCP servers that let LLMs use IDE tools directly. Using
@@ -1,6 +1,6 @@
 name: sisyphus
 description: OpenCode-style orchestrator - classifies intent, delegates to specialists, tracks progress with todos, enforces OMO-grade verification discipline
-version: 3.0.0
+version: 3.2.0
 agent_session: temp
 auto_continue: true
@@ -23,6 +23,10 @@ enabled_skills:
  - parallel-research
  - verification-gates
  - oracle-protocol
  - plan-authoring
  - step-implementation
  - handoff-protocol
  - iwe-knowledge-base
 variables:
  - name: project_dir
@@ -101,6 +105,9 @@ instructions: |
  | About to touch git history | `git-master` |
  | About to touch UI/components | `frontend-ui-ux` (also nudge delegates to load it) |
  | About to write any code | `ai-slop-remover` |
  | About to author a high-level plan or step plans | `plan-authoring` |
  | About to execute a step of a phased plan | `step-implementation` + `handoff-protocol` |
  | Navigating a plan repo or markdown knowledge base | `iwe-knowledge-base` |
  Load skills BEFORE the phase, not after. Unload when the phase ends if context is getting heavy. `skill__unload` keeps the context lean.
@@ -124,7 +131,8 @@ instructions: |
  | `explore` | Find patterns in THIS codebase, understand local code | Read-only, returns findings, fan out 2-5 in parallel |
  | `librarian` | Find official docs, OSS examples, web best practices for EXTERNAL libraries | Read-only, returns citation-backed findings, fan out 1-3 in parallel |
  | `coder` | Write/edit files, implement features | Graph agent: plan → approval → implement → verify build+tests → self_review → bounded fix-loop |
-  | `oracle` | Architecture, complex debugging, review | Advisory, blocking — never answer the user before collecting Oracle results |
+  | `oracle` | Architecture, complex debugging, review, plan review | Advisory, blocking — never answer the user before collecting Oracle results |
  | `step-runner` | Execute ONE step of a phased plan repo (Phase 8) | Graph agent: orient → staleness check → coder → verify → handoff → user approval gate |
  ### When to fire `librarian` (external grep) vs `explore` (internal grep)
@@ -312,6 +320,47 @@ instructions: |
  Never: leave code in broken state, continue hoping it'll work, delete failing tests to "pass," suppress errors to silence them.
  ## Phase 8 - Plan-Driven Work (phased implementation via a plan repo)
  Detect this mode when the user references step plans, handoffs, or a plan repo — or the workspace contains `plans/` with `steps/` and `handoffs/`. Plan-driven work has two lifecycles. Never mix them in one turn.
  ### Authoring lifecycle (no code changes)
  1. Discuss the problem; converge on a solution WITH the user before any plan is written.
  2. Load `plan-authoring`. Explore first (fan out `explore` agents) — plans must be grounded in real code, with snippets pasted into each step's Context.
  3. Write the high-level plan, then one step plan per step, following the schema and layout from `plan-authoring`.
  4. **Plan review gate (MANDATORY before any execution):** spawn `oracle` to review the plans. Nudge it: "Load `plan-review` and `plan-authoring`, review `plans/`, return the PLAN_REVIEW verdict." REJECT → fix the complaints, re-submit. Do not start execution on an unreviewed or rejected plan.
  5. Present the reviewed plan to the user for approval.
  ### Execution lifecycle (one step at a time)
  **Default: delegate the whole step to `step-runner`** — a graph agent that enforces the step protocol as graph edges (orient → staleness check → coder → verify → edge-case sweep → optional independent review → validated handoff → user approval gate): `agent__spawn --agent step-runner --prompt "Execute step <N> of the plan at <plans_dir>"`. It returns `STEP_COMPLETE` / `STEP_BLOCKED` / `STEP_REJECTED` / `STEP_FAILED`. Relay its escalations (deviation gate, approval gate) promptly. On `STEP_FAILED`, surface the evidence to the user; consider `oracle` for diagnosis.
  Run the protocol manually ONLY when the user asks you to, or when step-runner's shape doesn't fit (e.g. a docs-only step with nothing to build). Then:
  1. Load `step-implementation` + `handoff-protocol`, and `iwe-knowledge-base` for large plan repos.
  2. Follow the step protocol phase by phase: orient (previous handoff + `NOTES.md`) → staleness check → todo checklist → implement → edge-case sweep + deviations → verify → review → handoff → user approval.
  3. For the implement phase, delegate to `coder` using the delegation template. Paste the step plan's Context snippets and acceptance criteria into the coder prompt — the plan was written to be a delegation payload; use it.
  4. Major deviations (scope/approach/interface changes) → STOP and escalate via `user__ask`, or write a proposed downstream-plan diff per `handoff-protocol`. Never silently absorb them.
  5. **HARD STOP at the approval gate.** Present the step's results and handoff; do not begin the next step until the user approves. Auto-continue exists for finishing a step, never for starting the next one.
  ## Phase 9 - Durable State (survive context compression)
  Long runs compress: past a token threshold, your chat history is replaced by a summary. Anything that exists ONLY in chat history — spawned session_ids, step status, decisions — is lost. State that must outlive compression goes in a compression-safe store:
  | Store | Survives because | Put here |
  |-------|------------------|----------|
  | Todo list | Kept outside chat messages, re-presented every turn | Task progress AND resumable session_ids — embed them in the item text: `todo__add "Implement auth endpoint (coder ses_abc123)"` |
  | Plan repo (`plans/`) | On disk | Plan-driven work needs nothing extra: step frontmatter `status`, handoffs, and `NOTES.md` ARE the run state |
  | Memory (`memory__*`, when available) | Injected into context every turn | For long NON-plan-driven runs: a workspace drill file `sisyphus-run-state` (goal, key decisions, active session_ids). Set `expires` to tomorrow; delete it when the run completes |
  Rules:
  1. **Session_ids you may need to resume are never chat-only.** Record them in the todo item for that work the moment the spawn returns. A session_id that lives only in chat history is unresumable after compression.
  2. **Decisions the user approved get one durable line** (todo text or run-state memory) — "user chose option B: cookie-based auth" — so post-compression you don't re-litigate or contradict it.
  3. **Re-orientation after compression:** if the history looks summarized, do NOT trust your recollection of details. Re-read `todo__list`, and for plan-driven work re-read the plan statuses and the latest handoff in `plans/`. The summary tells you roughly where you were; the durable stores tell you exactly.
  4. Do not hoard: run state is not knowledge. Never bloat `MEMORY.md` with orchestration state — one expiring drill file, cleaned up at run end.
  ## When to Do It Yourself vs Delegate
  **Do yourself**: trivial typos/renames, single-file changes you've already read, simple command execution, quick file searches you can express in one grep.
@@ -0,0 +1,93 @@
 # Step-Runner
 A graph-based agent that executes **one step** of a phased implementation
 plan, with the step protocol from the `step-implementation` skill enforced
 as graph edges rather than prose. Designed to be delegated to by
 **[Sisyphus](../sisyphus/README.md)**; delegates implementation to
 **[Coder](../coder/README.md)** and independent review to
 **[code-reviewer](../code-reviewer/README.md)**.
 It expects a plan repo authored per the `plan-authoring` skill:
 ```
 plans/
  steps/NN-<slug>.md    # step plans with frontmatter (step/title/depends_on/status)
  handoffs/NN-<slug>.md # written by this agent, validated by a deterministic gate
  NOTES.md              # rolling durable facts
 ```
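
To inspect a step plan's `status` or `depends_on` by hand, the frontmatter can be read with a small shell function. This is a sketch under stated assumptions: it assumes the frontmatter is a `---`-delimited block of simple `key: value` lines at the top of the file, and `frontmatter_field` is a hypothetical helper, not one of the agent's scripts.

```sh
# Hypothetical helper: print the value of one frontmatter key.
# $1 = key name, $2 = path to a step plan.
# Assumes `---` fences and flat `key: value` lines (no nested YAML).
frontmatter_field() {
  awk -v key="$1" '
    /^---[[:space:]]*$/ { fence++; next }      # count frontmatter fences
    fence == 1 && index($0, key ":") == 1 {    # inside the block, key at line start
      sub("^" key ":[[:space:]]*", "")         # strip "key: " and keep the value
      print
      exit
    }
    fence >= 2 { exit }                        # past the block: stop looking
  ' "$2"
}
```

For example, `frontmatter_field status plans/steps/03-example.md` would print that plan's status; the file name here is made up for illustration.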
 ## Workflow
 ```
 resolve_step (script)         locate plan + previous handoff, check depends_on,
        ↓                     mark plan in-progress   [→ gate_blocked if deps unsatisfied]
 orient (llm, read-only)       merge handoff directives + staleness-check the plan
        ↓
 route_staleness (script)      major deviation → gate_deviation (approval)
        ↓
 implement (agent → coder)     coder runs its own build/test/self-review fix-loop
        ↓
 route_coder_result (script)   COMPLETE → verify | REJECTED / FAILED → end
        ↓
 verify_format_lint (script)   format BEFORE evidence, then lint
 verify_build (script)         step-level build/typecheck
 verify_tests (script)         FULL test suite
        ↓                     [failures → fix_loop_gate, back-edge to implement]
 edge_case_sweep (llm)         missed edge cases; annotate downstream plans
        ↓                     (Edge cases sections ONLY - scope changes become proposals)
 route_sweep (script)          5+ files or architectural boundary → independent_review
 independent_review (agent)    code-reviewer; 🔴 findings loop back to implement (bounded)
        ↓
 write_handoff (llm)           evidence-backed handoff per handoff-protocol + NOTES.md
 check_handoff (script)        deterministic schema gate; marks plan status complete
        ↓
 gate_user_review (approval)   HARD STOP - approve, or send revision comments
        ↓                     (revisions loop through implement → verify → handoff again)
 end_success / end_blocked / end_rejected / end_failure
 ```
 End nodes emit sentinel outcomes for the caller:
 - `STEP_COMPLETE` — step implemented, verified, handoff written, user approved.
 - `STEP_BLOCKED` — `depends_on` unsatisfied and the user declined to proceed.
 - `STEP_REJECTED` — user aborted at the deviation gate, or the coder's plan
  was rejected at its approval gate.
 - `STEP_FAILED` — coder failed, the step-level fix budget was exhausted, or
  the handoff failed validation twice.
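
A caller that scripts around step-runner could branch on these sentinels roughly as follows. This is a minimal sketch: it assumes the sentinel appears verbatim somewhere in the captured output, and both `classify_step_outcome` and the action labels it prints are hypothetical, chosen only to illustrate the four outcomes.

```sh
# Hypothetical helper: map captured step-runner output to a caller action.
# $1 = the agent's output text. Action labels are illustrative, not a real API.
classify_step_outcome() {
  case "$1" in
    *STEP_COMPLETE*) echo "ready-for-next-step" ;;
    *STEP_BLOCKED*)  echo "resolve-dependencies" ;;
    *STEP_REJECTED*) echo "revisit-plan" ;;
    *STEP_FAILED*)   echo "surface-evidence" ;;
    *)               echo "unknown" ;;
  esac
}
```

An output with no recognizable sentinel maps to `unknown`, so the caller can treat it as an error instead of guessing.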
 ## Usage
 ```sh
 # From the project root: run the next in-progress/pending step
 coyote -a step-runner "Execute the next step"
 # A specific step (also parsed from the prompt: "execute step 3")
 coyote -a step-runner --agent-variable step 3 "Execute step 3"
 # Plan repo somewhere else
 coyote -a step-runner --agent-variable plans_dir docs/plans "Execute the next step"
 ```
 **Invoke from the project root.** The coder sub-agent resolves its own
 `project_dir` from the invocation directory; overriding `project_dir` here
 does not propagate to the spawned coder.
 ## Tuning
 `graph.yaml` `initial_state` exposes:
 - `max_fix_attempts` (default `2`) — step-level fix budget (the coder has
  its own internal budget of 3).
 - `max_review_attempts` (default `1`) — bounded 🔴-finding fix loops after
  independent review.
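
The budgets above bound how often a failure can loop back to `implement`. A minimal sketch of that decision, assuming the counter holds the number of fix attempts already used (the real `fix_loop_gate.sh` is not shown here and may differ; `next_after_failure` is a hypothetical name):

```sh
# Hypothetical helper: choose the next node after a failed verification.
# $1 = fix attempts already used, $2 = max_fix_attempts.
next_after_failure() {
  attempts="$1"
  max="$2"
  if [ "$attempts" -lt "$max" ]; then
    echo "implement"    # budget left: loop back with fix instructions
  else
    echo "end_failure"  # budget exhausted: end the step as failed
  fi
}
```

With the default budget of `2`, attempts `0` and `1` loop back to `implement`, and the third failure ends the step.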
 Environment overrides honored by the script nodes:
 - `FORMAT_CMD` / `LINT_CMD` — formatting and linting (otherwise a per-type
  heuristic formats, and linting defers to the build/check command).
 - `BUILD_CMD` / `TEST_CMD` — skip project-type detection (same as coder).
 - `STEP_AUTOAPPROVE=1` — bypass the deviation gate (non-interactive runs).
 - `STEP_SKIP_REVIEW=1` — never spawn the independent reviewer.
 The final user approval gate is never bypassed by an environment variable;
 it is the point of the workflow.
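
The override variables follow an "explicit value wins, otherwise detect" pattern. The sketch below illustrates that precedence only; `resolve_cmd` is a hypothetical helper, and `cargo test` stands in for whatever the project-type detection would pick, which is an assumption and not taken from the scripts.

```sh
# Hypothetical helper: an explicit override wins; otherwise use the detected default.
# $1 = override value (may be empty), $2 = detected default.
resolve_cmd() {
  override="$1"
  detected="$2"
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  else
    printf '%s\n' "$detected"
  fi
}

# Example: honor TEST_CMD from the environment, with an assumed detected default.
effective_test_cmd=$(resolve_cmd "${TEST_CMD:-}" "cargo test")
```

Setting `TEST_CMD="make test"` before the run would make `effective_test_cmd` resolve to `make test`; leaving it unset keeps the detected default.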
@@ -0,0 +1,599 @@
 name: step-runner
 description: |
  Executes ONE step of a phased implementation plan (plans/ repo) with the
  step protocol enforced as graph edges: orient -> staleness check ->
  implement (coder) -> verify -> edge-case sweep -> optional independent
  review -> evidence-backed handoff -> user approval gate. Designed to be
  delegated to by sisyphus.
 version: "1.0"
 global_tools:
  - fs_cat.sh
  - fs_ls.sh
  - fs_write.sh
  - fs_patch.sh
  - execute_command.sh
 skills_enabled: true
 enabled_skills:
  - step-implementation
  - handoff-protocol
  - code-review
  - ai-slop-remover
 variables:
  - name: project_dir
    description: |
      Absolute path to the project directory. Defaults to "." (the directory
      coyote was invoked from). The coder sub-agent resolves its own
      project_dir the same way, so invoke step-runner FROM the project root
      unless you override this for both.
    default: "."
  - name: plans_dir
    description: |
      Path to the plan repo. Relative paths resolve against project_dir.
      Expected layout: <plans_dir>/steps/NN-<slug>.md,
      <plans_dir>/handoffs/, <plans_dir>/NOTES.md.
    default: "plans"
  - name: step
    description: |
      Which step to execute: a step number, or "next" to pick the first
      in-progress (resume) or pending step plan.
    default: "next"
 settings:
  max_loop_iterations: 20
  log_state_snapshots: true
  validate_before_run: true
  timeout: 7200
 initial_state:
  project_dir: ""
  plans_dir: ""
  step_number: 0
  step_slug: ""
  step_title: ""
  step_plan_path: ""
  step_plan: ""
  prev_handoff_path: "(none)"
  prev_handoff: "(none - this is the first step)"
  notes_path: ""
  notes: "(none)"
  handoff_path: ""
  blocking_reason: ""
  plan_summary: ""
  implementation_brief: ""
  staleness_report: ""
  has_major_deviation: false
  deviation_summary: ""
  user_feedback: ""
  fix_instructions: ""
  fix_attempts: 0
  max_fix_attempts: 2
  coder_result: ""
  format_output: ""
  lint_ok: true
  lint_output: ""
  build_ok: true
  build_output: ""
  tests_ok: true
  tests_output: ""
  edge_case_report: ""
  downstream_updates: ""
  needs_independent_review: false
  review_report: ""
  review_attempts: 0
  max_review_attempts: 1
  handoff_attempts: 0
  handoff_fix: ""
  step_summary: ""
 start: resolve_step
 nodes:
  resolve_step:
    id: resolve_step
    type: script
    description: |
      Locate the step plan, previous handoff, and NOTES.md; parse frontmatter;
      check depends_on satisfaction against existing handoffs; mark the plan
      in-progress. Routes to gate_blocked when dependencies are unsatisfied.
    script: scripts/resolve_step.sh
    timeout: 30
    fallback: end_failure
    next: orient
  gate_blocked:
    id: gate_blocked
    type: approval
    description: Escalate unsatisfied dependencies instead of building on missing ground.
    question: |
      Step {{step_number}} ({{step_title}}) is BLOCKED:
      {{blocking_reason}}
      Proceed anyway?
    options:
      - "yes"
      - "no"
    routes:
      "yes": orient
      "no": end_blocked
    on_other: end_blocked
  orient:
    id: orient
    type: llm
    description: |
      Read-only orientation and staleness check: merge the previous handoff's
      directives with the step plan, then verify the plan's assumptions
      against the CURRENT codebase before any edit.
    skills_enabled: true
    enabled_skills:
      - step-implementation
    instructions: |
      You are orienting for one step of a phased implementation plan. Load
      `step-implementation` and apply its Orient and Staleness-check phases.
      You are READ-ONLY in this node: no edits, no fixes.
      1. Read the previous handoff (below). Note directives aimed at this
         step, deviations that changed the codebase, and bare assertions
         that need re-verification.
      2. Staleness-check the step plan against the code at {{project_dir}}:
         grep the symbols it references (via execute_command), read its
         Context snippets at their claimed locations with fs_cat, confirm
         its Test commands exist.
      3. Classify discrepancies per the skill's deviation table: minor
         (mechanics differ; correct silently in the brief) vs major (scope,
         approach, interfaces, or a later step's assumptions affected).
      Produce `implementation_brief`: the corrected, self-contained marching
      orders for the implementer - plan tasks in order, handoff directives
      applied, minor staleness corrections folded in, acceptance criteria
      restated. The implementer sees ONLY the step plan plus your brief.
    prompt: |
      ## Step plan ({{step_plan_path}})
      {{step_plan}}
      ## Previous handoff ({{prev_handoff_path}})
      {{prev_handoff}}
      ## Rolling project notes
      {{notes}}
    tools:
      - fs_cat
      - fs_ls
      - execute_command
    max_iterations: 20
    output_schema:
      type: object
      properties:
        plan_summary:
          type: string
          description: 1-3 sentences summarizing what this step delivers
        implementation_brief:
          type: string
          description: Corrected, self-contained instructions for the implementer
        staleness_report:
          type: string
          description: Findings from checking plan assumptions against current code; "clean" if none
        has_major_deviation:
          type: boolean
          description: True when a discrepancy changes scope, approach, or interfaces
        deviation_summary:
          type: string
          description: Major deviations only, with the plan claim vs current reality. Empty when none
      required: [plan_summary, implementation_brief, staleness_report, has_major_deviation, deviation_summary]
    fallback: end_failure
    next: route_staleness
  route_staleness:
    id: route_staleness
    type: script
    description: Major deviation -> user gate; otherwise straight to implement.
    script: scripts/route_staleness.sh
    timeout: 5
    fallback: implement
  gate_deviation:
    id: gate_deviation
    type: approval
    description: Major deviations are never silently absorbed - the user decides.
    question: |
      Step {{step_number}} ({{step_title}}): the plan no longer matches the
      codebase in a way that changes scope or approach.
      {{deviation_summary}}
      Staleness report:
      {{staleness_report}}
      Proceed with the corrected brief? (Answer with anything else to give
      your own guidance to the implementer.)
    options:
      - "proceed"
      - "abort"
    routes:
      "proceed": implement
      "abort": end_rejected
    on_other: implement
    state_updates:
      user_feedback: "{{choice}}"
  implement:
    id: implement
    type: agent
    description: |
      Delegate implementation to the coder graph agent, which runs its own
      plan -> implement -> build -> tests -> self-review fix-loop internally.
    agent: coder
    prompt: |
      ## TASK
      Execute step {{step_number}} ({{step_title}}) of a phased implementation
      plan for the project at {{project_dir}}.
      ## EXPECTED OUTCOME
      Every task in the step plan below is implemented and its acceptance
      criteria are met. Tests are derived from the Acceptance criteria
      section (not from the implementation). Build and full test suite pass.
      ## MUST DO
      - Follow the Orientation brief below - it supersedes the raw plan where
        they disagree (it folds in corrections from the staleness check).
      - Match the patterns pasted in the step plan's Context section.
      - Derive tests from the plan's Acceptance criteria.
      ## MUST NOT DO
      - Do not touch anything listed in the plan's Out of scope section.
      - Do not modify files under {{plans_dir}}.
      - Do not implement work belonging to other steps.
      ## CONTEXT
      ### Step plan
      {{step_plan}}
      ### Orientation brief (handoff directives + staleness corrections applied)
      {{implementation_brief}}
      ### User guidance (if any)
      {{user_feedback}}
      ### Fix loop status (empty on first attempt)
      {{fix_instructions}}
    timeout: 3600
    state_updates:
      coder_result: "{{output}}"
    next: route_coder_result
  route_coder_result:
    id: route_coder_result
    type: script
    description: Route on the coder sentinel - COMPLETE verifies, REJECTED/FAILED terminate.
    script: scripts/route_coder_result.sh
    timeout: 5
    fallback: end_failure
  verify_format_lint:
    id: verify_format_lint
    type: script
    description: |
      Format BEFORE evidence collection (FORMAT_CMD override or per-type
      heuristic), then lint (LINT_CMD, when configured). Lint failure routes
      to the fix loop.
    script: scripts/verify_format_lint.sh
    timeout: 300
    fallback: fix_loop_gate
  verify_build:
    id: verify_build
    type: script
    description: Step-level build/typecheck evidence, collected AFTER formatting.
    script: scripts/verify_build.sh
    timeout: 600
    fallback: fix_loop_gate
  verify_tests:
    id: verify_tests
    type: script
    description: FULL test suite - regressions in untouched code fail the step too.
    script: scripts/verify_tests.sh
    timeout: 1200
    fallback: fix_loop_gate
  fix_loop_gate:
    id: fix_loop_gate
    type: script
    description: |
      Step-level fix budget (the coder already ran its own internal fix
      loop). Loops to implement with fix_instructions, or ends as failure.
    script: scripts/fix_loop_gate.sh
    timeout: 5
    fallback: end_failure
  edge_case_sweep:
    id: edge_case_sweep
    type: llm
    description: |
      Post-implementation sweep: missed spots, edge cases, downstream plan
      implications. May annotate downstream plans' Edge cases sections
      (annotate vs propose per handoff-protocol). Also judges whether the
      change warrants an independent review pass.
    skills_enabled: true
    enabled_skills:
      - step-implementation
      - handoff-protocol
    instructions: |
      The implementation for this step just passed build and tests. Load
      `step-implementation` (edge-case sweep phase) and `handoff-protocol`
      (annotate-vs-propose rules), then:
      1. Read the changed code (the coder result below names the files).
         Look for edge cases the plan missed: empty inputs, error paths,
         concurrency, partial failure, compat.
      2. For each edge case belonging to a LATER step: check that step's
         plan under {{plans_dir}}/steps/. If its Edge cases section already
         covers it, done. If not, append an entry to that section via
         fs_patch - touch NOTHING else in the file.
      3. NEVER edit a later plan's Objective, Tasks, Acceptance criteria,
         or Out of scope. Scope-affecting changes become proposed diffs in
         `downstream_updates` instead.
      4. Set needs_independent_review=true when the change touched 5+ files
         or crosses architectural boundaries (auth, public APIs, schema,
         security-sensitive paths).
      Be terse. Findings, not prose.
    prompt: |
      ## Coder result
      {{coder_result}}
      ## Step plan
      {{step_plan}}
      ## Staleness report from orientation
      {{staleness_report}}
    tools:
      - fs_cat
      - fs_ls
      - fs_patch
      - execute_command
    max_iterations: 20
    output_schema:
      type: object
      properties:
        edge_case_report:
          type: string
          description: Edge cases discovered - both handled and punted, one per line. "none" if empty
        downstream_updates:
          type: string
          description: Annotations made (plan file + section) and proposed diffs for scope-affecting changes. "none" if empty
        needs_independent_review:
          type: boolean
      required: [edge_case_report, downstream_updates, needs_independent_review]
    fallback: write_handoff
    next: route_sweep
  route_sweep:
    id: route_sweep
    type: script
    description: Broad or boundary-crossing changes get an independent reviewer.
    script: scripts/route_sweep.sh
    timeout: 5
    fallback: write_handoff
  independent_review:
    id: independent_review
    type: agent
    description: Independent review pass - the author's self-review cannot catch its own rationalizations.
    agent: code-reviewer
    prompt: |
      Review the changes produced for step {{step_number}} ({{step_title}})
      of a phased implementation plan in {{project_dir}}.
      What the step was supposed to do:
      {{plan_summary}}
      Coder summary (names the modified/created files):
      {{coder_result}}
      Review the changed files against the step plan's acceptance criteria.
      Preserve severity tags in your findings.
    timeout: 1200
    state_updates:
      review_report: "{{output}}"
    next: route_review
  route_review:
    id: route_review
    type: script
    description: Critical findings loop back to implement (bounded); otherwise proceed to handoff.
    script: scripts/route_review.sh
    timeout: 5
    fallback: write_handoff
  write_handoff:
    id: write_handoff
    type: llm
    description: |
      Write the evidence-backed handoff per handoff-protocol and append
      durable facts to NOTES.md. The completion gate (check_handoff)
      verifies the document afterward.
    skills_enabled: true
    enabled_skills:
      - handoff-protocol
      - ai-slop-remover
    instructions: |
      Load `handoff-protocol` and follow its writer schema EXACTLY: the
      frontmatter (step, title, result) and all eight sections, writing
      "None" rather than omitting a section.
      Write the handoff to {{handoff_path}} with fs_write. Paste the
      verification evidence below verbatim into the Evidence section -
      commands, exit codes, decisive output lines. Deviations come from the
      staleness report, gate decisions, and fix loop history. Downstream
      plan updates come from the sweep results.
      Then append durable, step-independent facts (if any) to {{notes_path}}
      - create the file if missing, never rewrite existing entries.
      If "Gate feedback" below is non-empty, a previous handoff attempt
      failed validation - fix exactly what it lists.
    prompt: |
      ## Step
      {{step_number}} ({{step_title}}) - plan at {{step_plan_path}}
      ## Plan summary
      {{plan_summary}}
      ## Coder result
      {{coder_result}}
      ## Staleness report / deviations
      {{staleness_report}}
      Major deviation summary (if any): {{deviation_summary}}
      User guidance given (if any): {{user_feedback}}
      Fix loop attempts used: {{fix_attempts}} of {{max_fix_attempts}}
      ## Edge cases discovered
      {{edge_case_report}}
      ## Downstream plan updates
      {{downstream_updates}}
      ## Independent review report (if any)
      {{review_report}}
      ## Verification evidence (paste verbatim)
      ### Format
      {{format_output}}
      ### Lint
      {{lint_output}}
      ### Build
      {{build_output}}
      ### Tests
      {{tests_output}}
      ## Gate feedback
      {{handoff_fix}}
    tools:
      - fs_cat
      - fs_ls
      - fs_write
      - fs_patch
    max_iterations: 15
    output_schema:
      type: object
      properties:
        step_summary:
          type: string
          description: 3-6 sentence summary of the step for the user's approval decision - what was done, deviations, anything needing their attention
      required: [step_summary]
    fallback: end_failure
    next: check_handoff
  check_handoff:
    id: check_handoff
    type: script
    description: |
      Deterministic completion gate - handoff exists with frontmatter and all
      required sections. On success, marks the step plan status complete.
      One retry back to write_handoff, then failure.
    script: scripts/check_handoff.sh
    timeout: 10
    fallback: end_failure
  gate_user_review:
    id: gate_user_review
    type: approval
    description: The hard stop - the next step never starts without explicit approval.
    question: |
      ## Step {{step_number}} ({{step_title}}) - ready for review
      {{step_summary}}
      Handoff: {{handoff_path}}
      Build: {{build_ok}} | Tests: {{tests_ok}} | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
      Approve this step? (Answer with anything else to send revision
      instructions straight to the implementer.)
    options:
      - "approve"
      - "revise"
    routes:
      "approve": end_success
      "revise": get_revision
    on_other: revise_from_choice
    state_updates:
      user_feedback: "{{choice}}"
  get_revision:
    id: get_revision
    type: input
    description: Collect revision instructions, then loop back through implement -> verify -> handoff.
    question: "What should change? Your comments go to the implementer verbatim."
    validation: "len(input) > 0"
    state_updates:
      fix_instructions: "{{input}}"
    next: implement
  revise_from_choice:
    id: revise_from_choice
    type: script
    description: Free-form approval answers are treated as revision instructions.
    script: scripts/revise_from_choice.sh
    timeout: 5
    fallback: get_revision
  end_success:
    id: end_success
    type: end
    output: |
      STEP_COMPLETE
      Step: {{step_number}} ({{step_title}})
      Plan: {{step_plan_path}}
      Handoff: {{handoff_path}}
      Build: passed | Tests: passed | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
      {{step_summary}}
      Downstream plan updates:
      {{downstream_updates}}
  end_blocked:
    id: end_blocked
    type: end
    output: |
      STEP_BLOCKED
      Step: {{step_number}} ({{step_title}})
      Reason:
      {{blocking_reason}}
  end_rejected:
    id: end_rejected
    type: end
    output: |
      STEP_REJECTED
      Step: {{step_number}} ({{step_title}})
      Rejected at: deviation gate or coder approval gate.
      Deviation summary:
      {{deviation_summary}}
      Coder result (if it ran):
      {{coder_result}}
  end_failure:
    id: end_failure
    type: end
    output: |
      STEP_FAILED
      Step: {{step_number}} ({{step_title}})
      Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
      Blocking reason (if resolution failed): {{blocking_reason}}
      Coder result:
      {{coder_result}}
      Last build output:
      {{build_output}}
      Last tests output:
      {{tests_output}}
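The `type: script` nodes in this graph (route_sweep, route_review, check_handoff, revise_from_choice) appear to share one contract, judging from the scripts added in this commit: read the graph state as JSON from `$GRAPH_STATE_FILE` or `$GRAPH_STATE`, then print one JSON object whose `_next` key names the next node. A minimal sketch of that contract follows. The helper names `load_state` and `emit_next` are hypothetical, and how the engine consumes stdout is an assumption.

```bash
# Sketch of the script-node contract; load_state and emit_next are hypothetical helpers.

# Load graph state with the same precedence the commit's scripts use:
# state file first, then inline variable, then an empty object.
load_state() {
  if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
    cat "$GRAPH_STATE_FILE"
  elif [[ -n "${GRAPH_STATE:-}" ]]; then
    printf '%s\n' "$GRAPH_STATE"
  else
    printf '{}\n'
  fi
}

# Emit a routing decision as compact JSON. The commit's scripts build this
# with jq; printf is a stand-in that is only safe for fixed node names.
emit_next() {
  printf '{"_next":"%s"}\n' "$1"
}
```

A node would typically call `load_state` once, inspect the fields it needs, and finish with a single `emit_next` call.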
@@ -0,0 +1,54 @@
 #!/usr/bin/env bash
 set -uo pipefail
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 handoff_path=$(echo "$state" | jq -r '.handoff_path // ""')
 step_plan_path=$(echo "$state" | jq -r '.step_plan_path // ""')
 handoff_attempts=$(echo "$state" | jq -r '.handoff_attempts // 0')
 problems=""
 if [[ ! -f "$handoff_path" ]]; then
  problems="- handoff file does not exist at $handoff_path"$'\n'
 else
  content=$(cat "$handoff_path")
  grep -qE '^result:[[:space:]]*(complete|partial|blocked)' <<< "$content" \
    || problems+="- frontmatter is missing 'result: complete|partial|blocked'"$'\n'
  for section in "Summary" "Completed" "Not completed" "Deviations" "Downstream plan updates" "Edge cases discovered" "Evidence" "Notes for next step"; do
    grep -qE "^##[[:space:]]+${section}" <<< "$content" \
      || problems+="- missing required section: ## ${section}"$'\n'
  done
 fi
 if [[ -z "$problems" ]]; then
  if [[ -f "$step_plan_path" ]]; then
    tmp=$(mktemp)
    awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: complete"; next} {print}' "$step_plan_path" > "$tmp" && mv "$tmp" "$step_plan_path"
  fi
  jq -nc '{"handoff_fix": "", "_next": "gate_user_review"}'
  exit 0
 fi
 if (( handoff_attempts >= 1 )); then
  jq -nc \
    --arg br "Handoff failed validation twice. Problems:
 $problems" \
    '{"blocking_reason": $br, "_next": "end_failure"}'
  exit 0
 fi
 jq -nc \
  --arg hf "The previous handoff attempt failed validation. Fix exactly these problems:
 $problems" \
  '{
    "handoff_attempts": 1,
    "handoff_fix": $hf,
    "_next": "write_handoff"
  }'
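The section check in this script can be exercised on its own. The sketch below wraps the same `grep` loop in a function and prints the sections a handoff is missing. The name `missing_sections` is illustrative and not part of the commit; the section list mirrors the one above.

```bash
# Print one line per required handoff section that is absent from the file.
# missing_sections is a hypothetical helper, not part of the commit.
missing_sections() {
  local file="$1" section
  for section in "Summary" "Completed" "Not completed" "Deviations" \
    "Downstream plan updates" "Edge cases discovered" "Evidence" "Notes for next step"; do
    # A section counts as present when a level-2 heading starts with its name.
    grep -qE "^##[[:space:]]+${section}" "$file" || printf '%s\n' "$section"
  done
}
```

Empty output means every required heading was found, which corresponds to the branch that routes to `gate_user_review`.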
@@ -0,0 +1,60 @@
 #!/usr/bin/env bash
 set -euo pipefail
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 fix_attempts=$(echo "$state" | jq -r '.fix_attempts // 0')
 max_fix_attempts=$(echo "$state" | jq -r '.max_fix_attempts // 2')
 # A missing flag defaults to "true". `.x // "true"` is not used because `//` would also replace an explicit false.
 lint_ok=$(echo "$state" | jq -r '.lint_ok | if . == null then "true" else (. | tostring) end')
 build_ok=$(echo "$state" | jq -r '.build_ok | if . == null then "true" else (. | tostring) end')
 tests_ok=$(echo "$state" | jq -r '.tests_ok | if . == null then "true" else (. | tostring) end')
 lint_output=$(echo "$state" | jq -r '.lint_output // ""')
 build_output=$(echo "$state" | jq -r '.build_output // ""')
 tests_output=$(echo "$state" | jq -r '.tests_output // ""')
 if (( fix_attempts >= max_fix_attempts )); then
  jq -nc \
    --argjson n "$fix_attempts" \
    '{
      "fix_attempts": $n,
      "_next": "end_failure"
    }'
  exit 0
 fi
 next_attempts=$((fix_attempts + 1))
 if [[ "$lint_ok" != "true" ]]; then
  stage="lint"
  output="$lint_output"
 elif [[ "$build_ok" != "true" ]]; then
  stage="build"
  output="$build_output"
 elif [[ "$tests_ok" != "true" ]]; then
  stage="full test suite"
  output="$tests_output"
 else
  stage="verification"
  output="fix_loop_gate was reached but no failing stage was recorded. Re-run verification."
 fi
 fix_instructions=$(printf '## Fix loop status (step-level attempt %d of %d)\n\nThe implementation passed the coder'"'"'s internal checks but failed step-level verification at the %s stage.\n\nOutput:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor. Regressions in untouched code caused by this change are in scope.' \
  "$next_attempts" "$max_fix_attempts" "$stage" "$output")
 jq -nc \
  --argjson n "$next_attempts" \
  --arg 'fi' "$fix_instructions" \
  '{
    "fix_attempts": $n,
    "fix_instructions": $fi,
    "lint_ok": true,
    "build_ok": true,
    "tests_ok": true,
    "_next": "implement"
  }'
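The gate reports only the first failing stage, in pipeline order (lint, then build, then tests). A sketch of that selection as a standalone function, with the three flags passed as plain strings instead of being read from graph state via `jq`. The name `first_failing_stage` is hypothetical.

```bash
# Return the label of the first stage whose flag is not "true".
# first_failing_stage is a hypothetical helper; labels match fix_loop_gate.sh.
first_failing_stage() {
  local lint_ok="$1" build_ok="$2" tests_ok="$3"
  if [[ "$lint_ok" != "true" ]]; then
    echo "lint"
  elif [[ "$build_ok" != "true" ]]; then
    echo "build"
  elif [[ "$tests_ok" != "true" ]]; then
    echo "full test suite"
  else
    # All flags are true: the gate was reached without a recorded failure.
    echo "verification"
  fi
}
```

Because the script resets all three flags to true before looping back to `implement`, each pass through the gate sees only the failure recorded by the most recent verification run.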
@@ -0,0 +1,152 @@
 #!/usr/bin/env bash
 set -uo pipefail
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 fail() {
  jq -nc --arg r "$1" '{"blocking_reason": $r, "_next": "end_failure"}'
  exit 0
 }
 project_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
 project_dir=$(cd "$project_dir" 2>/dev/null && pwd) || fail "project_dir does not exist: $project_dir"
 plans_dir="${LLM_AGENT_VAR_PLANS_DIR:-plans}"
 [[ "$plans_dir" != /* ]] && plans_dir="$project_dir/$plans_dir"
 steps_dir="$plans_dir/steps"
 handoffs_dir="$plans_dir/handoffs"
 notes_path="$plans_dir/NOTES.md"
 [[ -d "$steps_dir" ]] || fail "No step plans directory at $steps_dir (expected <plans_dir>/steps/NN-<slug>.md)"
 frontmatter() {
  awk '/^---[[:space:]]*$/{n++; next} n==1{print} n>=2{exit}' "$1"
 }
 fm_value() {
  echo "$1" | grep -E "^$2:" | head -1 | sed -E "s/^$2:[[:space:]]*//" | sed -E 's/^["'"'"']|["'"'"']$//g'
 }
 step="${LLM_AGENT_VAR_STEP:-next}"
 if [[ "$step" == "next" ]]; then
  prompt_step=$(echo "$state" | jq -r '.initial_prompt // ""' | grep -oiE 'step[[:space:]#:]*[0-9]+' | head -1 | grep -oE '[0-9]+' || true)
  [[ -n "$prompt_step" ]] && step="$prompt_step"
 fi
 plan_file=""
 if [[ "$step" == "next" ]]; then
  first_pending=""
  while IFS= read -r f; do
    st=$(fm_value "$(frontmatter "$f")" "status")
    if [[ "$st" == "in-progress" ]]; then
      plan_file="$f"
      break
    fi
    [[ -z "$first_pending" && ( "$st" == "pending" || -z "$st" ) ]] && first_pending="$f"
  done < <(find "$steps_dir" -maxdepth 1 -name '*.md' | sort)
  [[ -z "$plan_file" ]] && plan_file="$first_pending"
  [[ -z "$plan_file" ]] && fail "No in-progress or pending step plans in $steps_dir"
 else
  [[ "$step" =~ ^[0-9]+$ ]] || fail "step must be a number or 'next'; got: $step"
  # 10# forces base 10 so zero-padded input such as "08" is not parsed as octal.
  padded=$(printf '%02d' "$((10#$step))")
  plan_file=$(find "$steps_dir" -maxdepth 1 \( -name "${padded}-*.md" -o -name "${step}-*.md" \) | sort | head -1)
  [[ -n "$plan_file" ]] || fail "No step plan matching step $step in $steps_dir"
 fi
 bn=$(basename "$plan_file" .md)
 num_part="${bn%%-*}"
 [[ "$num_part" =~ ^[0-9]+$ ]] || fail "Step plan filename must start with a number: $bn"
 step_number=$((10#$num_part))
 step_slug="${bn#*-}"
 fm=$(frontmatter "$plan_file")
 step_title=$(fm_value "$fm" "title")
 [[ -z "$step_title" ]] && step_title="$step_slug"
 deps=$(echo "$fm" | awk '/^depends_on:/{f=1; print; next} f && /^[[:space:]]*-/{print; next} f{exit}' | grep -oE '[0-9]+' || true)
 unsatisfied=""
 for dep in $deps; do
  dep_padded=$(printf '%02d' "$((10#$dep))")
  dep_handoff=$(find "$handoffs_dir" -maxdepth 1 \( -name "${dep_padded}-*.md" -o -name "${dep}-*.md" \) 2>/dev/null | sort | head -1)
  if [[ -z "$dep_handoff" ]]; then
    unsatisfied+="- step $dep: no handoff found (step not executed?)"$'\n'
    continue
  fi
  dep_result=$(fm_value "$(frontmatter "$dep_handoff")" "result")
  if [[ "$dep_result" != "complete" ]]; then
    unsatisfied+="- step $dep: handoff result is '$dep_result' (not complete): $dep_handoff"$'\n'
  fi
 done
 prev_handoff_path="(none)"
 prev_handoff="(none - this is the first step)"
 prev_file=""
 prev_num=0
 while IFS= read -r h; do
  hn="${h##*/}"
  hn="${hn%%-*}"
  [[ "$hn" =~ ^[0-9]+$ ]] || continue
  n=$((10#$hn))
  if (( n < step_number && n >= prev_num )); then
    prev_num=$n
    prev_file="$h"
  fi
 done < <(find "$handoffs_dir" -maxdepth 1 -name '*.md' 2>/dev/null | sort)
 if [[ -n "$prev_file" ]]; then
  prev_handoff_path="$prev_file"
  prev_handoff=$(head -c 16000 "$prev_file")
 fi
 notes="(none)"
 [[ -f "$notes_path" ]] && notes=$(head -c 8000 "$notes_path")
 step_plan=$(head -c 24000 "$plan_file")
 handoff_path="$handoffs_dir/$(basename "$plan_file")"
 tmp=$(mktemp)
 awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: in-progress"; next} {print}' "$plan_file" > "$tmp" && mv "$tmp" "$plan_file"
 next_node="orient"
 blocking_reason=""
 if [[ -n "$unsatisfied" ]]; then
  next_node="gate_blocked"
  blocking_reason="Unsatisfied dependencies:"$'\n'"$unsatisfied"
 fi
 jq -nc \
  --arg pd "$project_dir" \
  --arg pl "$plans_dir" \
  --argjson sn "$step_number" \
  --arg ss "$step_slug" \
  --arg st "$step_title" \
  --arg spp "$plan_file" \
  --arg sp "$step_plan" \
  --arg php "$prev_handoff_path" \
  --arg ph "$prev_handoff" \
  --arg np "$notes_path" \
  --arg no "$notes" \
  --arg hp "$handoff_path" \
  --arg br "$blocking_reason" \
  --arg nx "$next_node" \
  '{
    "project_dir": $pd,
    "plans_dir": $pl,
    "step_number": $sn,
    "step_slug": $ss,
    "step_title": $st,
    "step_plan_path": $spp,
    "step_plan": $sp,
    "prev_handoff_path": $php,
    "prev_handoff": $ph,
    "notes_path": $np,
    "notes": $no,
    "handoff_path": $hp,
    "blocking_reason": $br,
    "_next": $nx
  }'
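Two parsing steps in this script are easy to get wrong: extracting the frontmatter block and turning a zero-padded filename prefix into a number. The sketch below isolates both. `frontmatter` is copied from the script; `parse_step_file` is a hypothetical wrapper around the filename logic.

```bash
# Print the lines between the first pair of --- fences (the YAML frontmatter).
frontmatter() {
  awk '/^---[[:space:]]*$/{n++; next} n==1{print} n>=2{exit}' "$1"
}

# Split "NN-slug.md" into a base-10 step number and a slug.
# parse_step_file is a hypothetical helper, not part of the commit.
parse_step_file() {
  local bn num_part
  bn=$(basename "$1" .md)
  num_part="${bn%%-*}"
  [[ "$num_part" =~ ^[0-9]+$ ]] || return 1
  # 10# forces base 10; a bare $((08)) is rejected as invalid octal.
  printf '%d %s\n' "$((10#$num_part))" "${bn#*-}"
}
```

Text after the closing `---` is never inspected, so a `status:` line in the body of a plan does not affect step selection.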
@@ -0,0 +1,27 @@
 #!/usr/bin/env bash
 set -euo pipefail
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 feedback=$(echo "$state" | jq -r '.user_feedback // ""')
 if [[ -z "$feedback" ]]; then
  jq -nc '{"_next": "get_revision"}'
  exit 0
 fi
 fix_instructions=$(printf '## Revision requested by the user at the step approval gate\n\nAddress the comments below with minimal edits; the step is then re-verified and the handoff rewritten:\n\n%s' \
  "$feedback")
 jq -nc \
  --arg 'fi' "$fix_instructions" \
  '{
    "fix_instructions": $fi,
    "_next": "implement"
  }'
@@ -0,0 +1,27 @@
 #!/usr/bin/env bash
 set -euo pipefail
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 coder_result=$(echo "$state" | jq -r '.coder_result // ""')
 case "$coder_result" in
  *CODER_COMPLETE*)
    jq -nc '{"_next": "verify_format_lint"}'
    ;;
  *CODER_REJECTED*)
    jq -nc '{"_next": "end_rejected"}'
    ;;
  *CODER_FAILED*)
    jq -nc '{"blocking_reason": "coder fix-loop exhausted; see coder result", "_next": "end_failure"}'
    ;;
  *)
    jq -nc '{"blocking_reason": "coder returned no recognizable sentinel (expected CODER_COMPLETE / CODER_REJECTED / CODER_FAILED)", "_next": "end_failure"}'
    ;;
 esac
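The routing above depends on `case` trying its arms in order, so a result that contains more than one sentinel routes on the first arm that matches. A sketch of the same dispatch as a function that can be called directly. `route_for_coder_result` is a hypothetical name, and the `CODER_FAILED` arm is folded into the default because both lead to `end_failure` (they differ only in `blocking_reason`).

```bash
# Map a coder result string to the next node name.
# route_for_coder_result is a hypothetical helper, not part of the commit.
route_for_coder_result() {
  case "$1" in
    *CODER_COMPLETE*) echo "verify_format_lint" ;;
    *CODER_REJECTED*) echo "end_rejected" ;;
    # CODER_FAILED and unrecognized output both end the step as failed.
    *) echo "end_failure" ;;
  esac
}
```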
@@ -0,0 +1,38 @@
 #!/usr/bin/env bash
 set -euo pipefail
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 review_report=$(echo "$state" | jq -r '.review_report // ""')
 review_attempts=$(echo "$state" | jq -r '.review_attempts // 0')
 max_review_attempts=$(echo "$state" | jq -r '.max_review_attempts // 1')
 # 🔴 is the critical-severity tag; a report without it needs no fix loop.
 if ! grep -qF "🔴" <<< "$review_report"; then
  jq -nc '{"_next": "write_handoff"}'
  exit 0
 fi
 if (( review_attempts >= max_review_attempts )); then
  jq -nc '{"_next": "write_handoff"}'
  exit 0
 fi
 next_review=$((review_attempts + 1))
 fix_instructions=$(printf '## Independent review findings (attempt %d of %d)\n\nAn independent reviewer flagged CRITICAL (🔴) findings. Address ONLY the 🔴 findings with minimal edits. Do not refactor unrelated code.\n\n%s' \
  "$next_review" "$max_review_attempts" "$review_report")
 jq -nc \
  --argjson n "$next_review" \
  --arg 'fi' "$fix_instructions" \
  '{
    "review_attempts": $n,
    "fix_instructions": $fi,
    "needs_independent_review": false,
    "_next": "implement"
  }'
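The review loop is bounded: critical findings send the step back to `implement` only while attempts remain, and otherwise the step proceeds to the handoff with the findings still in the report. A sketch of that decision with the inputs passed as arguments instead of read from graph state. `review_route` is a hypothetical name.

```bash
# Decide the next node from the review report and the attempt counters.
# review_route is a hypothetical helper, not part of the commit.
review_route() {
  local report="$1" attempts="$2" max="$3"
  if ! grep -qF "🔴" <<< "$report"; then
    # No critical findings: nothing to fix.
    echo "write_handoff"
  elif (( attempts >= max )); then
    # Budget exhausted: stop looping and let the handoff record the findings.
    echo "write_handoff"
  else
    echo "implement"
  fi
}
```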
@@ -0,0 +1,23 @@
 #!/usr/bin/env bash
 set -euo pipefail
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 has_major=$(echo "$state" | jq -r '.has_major_deviation // false')
 if [[ "${STEP_AUTOAPPROVE:-0}" == "1" ]]; then
  jq -nc '{"_next": "implement"}'
  exit 0
 fi
 if [[ "$has_major" == "true" ]]; then
  jq -nc '{"_next": "gate_deviation"}'
 else
  jq -nc '{"_next": "implement"}'
 fi
@@ -0,0 +1,23 @@
 #!/usr/bin/env bash
 set -euo pipefail
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 needs_review=$(echo "$state" | jq -r '.needs_independent_review // false')
 if [[ "${STEP_SKIP_REVIEW:-0}" == "1" ]]; then
  jq -nc '{"_next": "write_handoff"}'
  exit 0
 fi
 if [[ "$needs_review" == "true" ]]; then
  jq -nc '{"_next": "independent_review"}'
 else
  jq -nc '{"_next": "write_handoff"}'
 fi
@@ -0,0 +1,57 @@
 #!/usr/bin/env bash
 set -uo pipefail
 # shellcheck disable=SC1091
 source "$(dirname "$0")/../../.shared/utils.sh"
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 project_dir=$(echo "$state" | jq -r '.project_dir // "."')
 if [[ -n "${BUILD_CMD:-}" ]]; then
  cmd="$BUILD_CMD"
 else
  project_info=$(detect_project "$project_dir")
  cmd=$(echo "$project_info" | jq -r '.check // .build // ""')
 fi
 if [[ -z "$cmd" || "$cmd" == "null" ]]; then
  jq -nc '{
    "build_ok": true,
    "build_output": "(no build/check command available for this project type)",
    "_next": "verify_tests"
  }'
  exit 0
 fi
 exit_code=0
 output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
 if (( exit_code == 0 )); then
  jq -nc \
    --arg out "Ran: $cmd
 $output" \
    '{
      "build_ok": true,
      "build_output": $out,
      "_next": "verify_tests"
    }'
 else
  jq -nc \
    --arg out "Ran: $cmd
 Exit code: $exit_code
 $output" \
    '{
      "build_ok": false,
      "build_output": $out,
      "_next": "fix_loop_gate"
    }'
 fi
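The verify scripts run without `set -e` and capture the exit code with `|| exit_code=$?`, so a failing build still produces a JSON result instead of aborting the node. A sketch of that capture idiom in isolation. `run_captured` is a hypothetical name, and its report format only approximates the `build_output` strings above.

```bash
# Run a command string in a directory, capturing combined output and exit code.
# run_captured is a hypothetical helper, not part of the commit.
run_captured() {
  local dir="$1" cmd="$2" rc=0 out
  # The subshell keeps the cd (and any exit in the command) from affecting the caller.
  out=$(cd "$dir" && eval "$cmd" 2>&1) || rc=$?
  printf 'Ran: %s\nExit code: %d\n%s\n' "$cmd" "$rc" "$out"
  return "$rc"
}
```

Declaring `out` with `local` on a separate line matters here: `local out=$(...)` would report the exit status of `local` itself and hide the command's failure.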
@@ -0,0 +1,79 @@
 #!/usr/bin/env bash
 set -uo pipefail
 # shellcheck disable=SC1091
 source "$(dirname "$0")/../../.shared/utils.sh"
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 project_dir=$(echo "$state" | jq -r '.project_dir // "."')
 project_type=$(detect_project "$project_dir" | jq -r '.type // "unknown"')
 format_cmd="${FORMAT_CMD:-}"
 if [[ -z "$format_cmd" ]]; then
  case "$project_type" in
    rust) format_cmd="cargo fmt" ;;
    go) format_cmd="gofmt -w ." ;;
    python) command -v ruff &>/dev/null && format_cmd="ruff format ." ;;
  esac
 fi
 if [[ -z "$format_cmd" ]]; then
  format_output="(no format command configured for project type '$project_type'; skipped. Set FORMAT_CMD to enable.)"
 else
  fmt_rc=0
  fmt_out=$(cd "$project_dir" && eval "$format_cmd" 2>&1) || fmt_rc=$?
  format_output="Ran: $format_cmd
 Exit code: $fmt_rc
 $fmt_out"
 fi
 lint_cmd="${LINT_CMD:-}"
 if [[ -z "$lint_cmd" ]]; then
  jq -nc \
    --arg fo "$format_output" \
    '{
      "format_output": $fo,
      "lint_ok": true,
      "lint_output": "(no LINT_CMD configured; linting is covered by the build/check command)",
      "_next": "verify_build"
    }'
  exit 0
 fi
 lint_rc=0
 lint_out=$(cd "$project_dir" && eval "$lint_cmd" 2>&1) || lint_rc=$?
 if (( lint_rc == 0 )); then
  jq -nc \
    --arg fo "$format_output" \
    --arg lo "Ran: $lint_cmd
 $lint_out" \
    '{
      "format_output": $fo,
      "lint_ok": true,
      "lint_output": $lo,
      "_next": "verify_build"
    }'
 else
  jq -nc \
    --arg fo "$format_output" \
    --arg lo "Ran: $lint_cmd
 Exit code: $lint_rc
 $lint_out" \
    '{
      "format_output": $fo,
      "lint_ok": false,
      "lint_output": $lo,
      "_next": "fix_loop_gate"
    }'
 fi
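Format command selection follows a simple precedence: a non-empty `FORMAT_CMD` wins, otherwise a default is chosen from the detected project type, otherwise formatting is skipped. A sketch of that precedence, with the project type passed in directly instead of coming from `detect_project`. Both function names are hypothetical.

```bash
# Default formatter per project type; prints nothing when none applies.
# default_format_cmd and pick_format_cmd are hypothetical helpers.
default_format_cmd() {
  case "$1" in
    rust) echo "cargo fmt" ;;
    go) echo "gofmt -w ." ;;
    # Python is only formatted when ruff is actually on PATH.
    python) command -v ruff &>/dev/null && echo "ruff format ." ;;
  esac
  return 0
}

# FORMAT_CMD, when set and non-empty, overrides the per-type default.
pick_format_cmd() {
  echo "${FORMAT_CMD:-$(default_format_cmd "$1")}"
}
```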
@@ -0,0 +1,57 @@
 #!/usr/bin/env bash
 set -uo pipefail
 # shellcheck disable=SC1091
 source "$(dirname "$0")/../../.shared/utils.sh"
 if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
  state=$(cat "$GRAPH_STATE_FILE")
 elif [[ -n "${GRAPH_STATE:-}" ]]; then
  state="$GRAPH_STATE"
 else
  state='{}'
 fi
 project_dir=$(echo "$state" | jq -r '.project_dir // "."')
 if [[ -n "${TEST_CMD:-}" ]]; then
  cmd="$TEST_CMD"
 else
  project_info=$(detect_project "$project_dir")
  cmd=$(echo "$project_info" | jq -r '.test // ""')
 fi
 if [[ -z "$cmd" || "$cmd" == "null" ]]; then
  jq -nc '{
    "tests_ok": true,
    "tests_output": "(no test command available for this project type)",
    "_next": "edge_case_sweep"
  }'
  exit 0
 fi
 exit_code=0
 output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
 if (( exit_code == 0 )); then
  jq -nc \
    --arg out "Ran: $cmd
 $output" \
    '{
      "tests_ok": true,
      "tests_output": $out,
      "_next": "edge_case_sweep"
    }'
 else
  jq -nc \
    --arg out "Ran: $cmd
 Exit code: $exit_code
 $output" \
    '{
      "tests_ok": false,
      "tests_output": $out,
      "_next": "fix_loop_gate"
    }'
 fi
@@ -18,6 +18,11 @@
      "type": "stdio",
      "command": "uvx",
      "args": ["duckduckgo-mcp-server"]
    },
    "iwe": {
      "type": "stdio",
      "command": "iwec",
      "args": ["--project", "."]
    }
  }
 }
@@ -0,0 +1,81 @@
 #!/usr/bin/env bash
 set -e
 # @describe Structural code search using AST patterns (ast-grep). Matches syntax trees, not text,
 # so it finds code regardless of formatting: function calls with any arguments, definitions, etc.
 # Use meta-variables in patterns: $NAME matches one AST node, $$$ matches zero or more nodes.
 # Patterns must be COMPLETE, valid AST nodes in the target language: 'fn $NAME($$$) { $$$ }'
 # matches Rust fn definitions (with body - 'fn $NAME($$$)' alone parses as nothing and matches
 # nothing), 'foo($$$)' matches all calls to foo, '$X.unwrap()' matches all unwrap calls.
 # Prefer this over fs_grep when searching for code STRUCTURE (calls, definitions, signatures);
 # use fs_grep for plain text, comments, or strings.
 # @option --pattern! The AST pattern to search for (must parse as valid code in the target language)
 # @option --lang The target language (e.g. rust, typescript, tsx, javascript, python, go, java, c, cpp, kotlin, swift, ruby, php, css, html, yaml, json). Strongly recommended; without it, files of every supported language are scanned
 # @option --path The directory OR file to search in (defaults to current working directory)
 # @option --glob File glob to narrow the search (e.g. "src/**/*.rs", "!**/tests/**")
 # @env LLM_OUTPUT=/dev/stdout The output path
 MAX_RESULTS=100
 MAX_OUTPUT_BYTES=32768
 resolve_binary() {
    if command -v ast-grep &>/dev/null; then
        echo "ast-grep"
        return 0
    fi
    if command -v sg &>/dev/null && sg --version 2>/dev/null | grep -qi 'ast-grep'; then
        echo "sg"
        return 0
    fi
    return 1
 }
 main() {
    # shellcheck disable=SC2154
    local pattern="$argc_pattern"
    local lang="${argc_lang:-}"
    local search_path="${argc_path:-.}"
    local glob="${argc_glob:-}"
    local bin
    if ! bin=$(resolve_binary); then
        printf 'ast-grep is not installed. Fall back to fs_grep for this search.\nTo enable structural search, install ast-grep:\n  cargo install ast-grep --locked\n  brew install ast-grep\n  npm i -g @ast-grep/cli\n' >> "$LLM_OUTPUT"
        return 0
    fi
    if [[ ! -e "$search_path" ]]; then
        echo "Error: path not found: $search_path" >> "$LLM_OUTPUT"
        return 1
    fi
    local args=(run --pattern "$pattern" --color never --heading never)
    [[ -n "$lang" ]] && args+=(--lang "$lang")
    [[ -n "$glob" ]] && args+=(--globs "$glob")
    args+=("$search_path")
    local output exit_code=0
    output=$("$bin" "${args[@]}" 2>&1) || exit_code=$?
    # Check for failure first so an error that produced no output is not reported as "no matches".
    if (( exit_code > 1 )); then
        printf 'ast-grep failed (exit %s):\n%s\n\nHint: the pattern must be valid %s syntax. Meta-variables: $NAME (one node), $$$ (zero or more).\n' \
            "$exit_code" "$output" "${lang:-source}" >> "$LLM_OUTPUT"
        return 0
    fi
    if [[ -z "$output" ]]; then
        echo "No structural matches found for: $pattern" >> "$LLM_OUTPUT"
        return 0
    fi
    local total
    total=$(wc -l <<< "$output")
    output=$(head -n "$MAX_RESULTS" <<< "$output" | head -c "$MAX_OUTPUT_BYTES")
    echo "$output" >> "$LLM_OUTPUT"
    if (( total > MAX_RESULTS )); then
        printf '\n(Showing %s of %s matching lines. Narrow with --glob, --lang, or a more specific pattern.)\n' \
            "$MAX_RESULTS" "$total" >> "$LLM_OUTPUT"
    fi
 }
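The output cap in `main` counts lines before truncating, which is what lets the footer report "N of TOTAL". A sketch of that step as a standalone function, with the limits passed as arguments. `truncate_output` is a hypothetical name, and the footer wording is shortened.

```bash
# Cap text at max_lines lines and max_bytes bytes, noting when lines were dropped.
# truncate_output is a hypothetical helper, not part of the commit.
truncate_output() {
  local text="$1" max_lines="$2" max_bytes="$3" total
  # Count first: after head runs, the original size is no longer known.
  total=$(wc -l <<< "$text")
  head -n "$max_lines" <<< "$text" | head -c "$max_bytes"
  if (( total > max_lines )); then
    printf '\n(Showing %s of %s matching lines.)\n' "$max_lines" "$((total))"
  fi
}
```

As in the script, the footer only appears when the line cap is hit; output that is cut by the byte cap alone is truncated silently.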
@@ -0,0 +1,93 @@
 ---
 name: diagnose
 temperature: 0.2
 enabled_tools:
  - execute_command
  - fs_cat
  - fs_ls
  - web_search_coyote
 skills_enabled: false
 auto_continue: true
 max_auto_continues: 10
 ---
 You are an expert systems troubleshooter: equal parts SRE, sysadmin, network engineer, and homelab tinkerer. Your job
 is to diagnose and fix technical problems of any kind: services that won't start, networking failures, container
 issues, driver problems, permission errors, misbehaving hardware, broken configs, or anything else. You are not limited
 to code.
 <system>
 os: {{__os__}}
 distro: {{__os_distro__}}
 arch: {{__arch__}}
 shell: {{__shell__}}
 cwd: {{__cwd__}}
 now: {{__now__}}
 </system>
 ## Prime Directive
 **You run the diagnostics yourself.** Never tell the user to run a command and paste the output back. Use the
 `execute_command` tool to gather evidence directly, then interpret the results for them. The user should watch you
 work, not act as your terminal.
 ## Diagnostic Loop
 Work the loop until the problem is solved or genuinely blocked:
 1. **Reproduce & observe.** Run the failing thing (or inspect its state) to see the actual error with your own eyes.
   Never diagnose from the user's paraphrase alone.
 2. **Establish what changed.** Most breakage follows a change: updates, config edits, reboots, new hardware, expired
   certs/leases. Check timestamps, package logs, and recent history early.
 3. **Check the dumb stuff first.** Is the service running? Is it enabled? Is the interface up? Is the disk full? Is
   DNS resolving? Is the clock right? Cheap checks before deep theories.
 4. **Isolate by layer.** Split the problem space in half with each test:
   - Networking: bottom-up — link → IP/DHCP → routing → DNS → transport → application.
   - Software: process alive? → logs → config → dependencies/permissions → environment → binary itself.
   - Containers: daemon → image → container state → logs → mounts/networks → host resources.
 5. **Hypothesize, then test.** State your current best hypothesis in one line before each test, and change ONE
   variable at a time. If a test disproves the hypothesis, say so and pivot; don't quietly move on.
 6. **Fix the root cause, not the symptom.** A restart that "fixes" it without explanation is a data point, not a fix.
 7. **Verify.** After any fix, re-run the original failing operation and confirm it now works. No verification, no
   victory declaration.
 ## Evidence Gathering
 - Primary sources, in rough order of value: exit codes and stderr, service/app logs (`journalctl`, `docker logs`,
  files under `/var/log`), kernel messages (`dmesg`), state inspection (`systemctl status`, `ip`, `ss`, `df`, `free`,
  `lsblk`, `nmcli`, `docker ps/inspect`), then config files.
 - Make every command non-interactive and bounded: `--no-pager` for `journalctl`/`systemctl`, `-n`/`--since` to limit
  log output, `timeout 10 ...` for anything that might hang, `-c` counts for `ping`. Never launch interactive TUIs
  (top, htop, lazydocker itself) — use their batch/one-shot modes or underlying CLIs instead.
 - Prefer unprivileged commands. When root is genuinely required, say why and use `sudo` (the user may get a password
  prompt in their terminal — that's expected).
 - Search the web for exact error strings (quoted, with software name and version) when an error is unfamiliar or
  smells like a known bug or recent regression. Distro wikis, GitHub issues, and bug trackers beat guessing.
 ## Safety Rules
 Commands fall into three tiers:
 1. **Read-only / inspection** (status, logs, listing, ping, dig, cat): run freely, no permission needed.
 2. **Reversible state changes** (restart a service, bounce an interface, recreate a container, edit a config after
   backing it up): announce what you're about to do and why in one sentence, then do it. Back up any file before
   modifying it (`cp file file.bak.$(date +%s)`).
 3. **Destructive or hard-to-reverse actions** (deleting data or volumes, formatting, `dd`, partitioning, package
   removal, firewall flushes, forced resets): STOP and ask for explicit confirmation first, including the exact
   command and a rollback plan. Never run these on your own judgment.
 Additional hard rules:
 - Never print or transmit secrets. If command output contains tokens, keys, or passwords, redact them in your response.
 - Never disable security controls (firewalls, SELinux/AppArmor, certificate validation) as a "fix" — at most as a
  temporary, clearly-labeled isolation test, restored immediately after.
 - If the evidence points to failing hardware or risk of data loss, stop, say so plainly, and present options before
  touching anything else.
 ## Communication
 - Lead with what you found, not what you did. Then show the key evidence: the command and the relevant lines of its
  output (trimmed — never dump walls of text).
 - When the problem is multi-step, keep a running todo list so the user can follow the investigation.
 - On resolution, close with a short summary: **root cause → fix applied → how it was verified → how to prevent it**.
 - If you're blocked (needs physical access, a password you don't have, a reboot decision), say exactly what you need
  and what you'll do once you have it.
@@ -5,7 +5,7 @@
 #   sbx cp $HOME/.config/coyote/ testing:/home/agent/.config/
 #   sbx cp $HOME/.coyote_password testing:/home/agent/
 #   sbx run testing --kit ./sbx-kit/
-schemaVersion: '1'
+schemaVersion: "1"
 kind: sandbox
 name: coyote
 displayName: Coyote
@@ -14,10 +14,10 @@ description: >
  CLI & REPL mode, RAG, AI tools & agents, MCP servers, skills, and macros.
 sandbox:
-  image: 'docker/sandbox-templates:shell-docker'
+  image: "docker/sandbox-templates:shell-docker"
  aiFilename: COYOTE.md
  entrypoint:
-    run: ['bash', '-lc', 'exec /home/agent/.cargo/bin/coyote']
+    run: ["bash", "-lc", "exec /home/agent/.cargo/bin/coyote"]
 network:
  # Proxy-managed LLM providers: the proxy substitutes `proxy-managed` for
@@ -50,96 +50,96 @@ network:
  serviceAuth:
    openai:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    anthropic:
      headerName: x-api-key
-      valueFormat: '%s'
+      valueFormat: "%s"
    gemini:
      headerName: x-goog-api-key
-      valueFormat: '%s'
+      valueFormat: "%s"
    cohere:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    groq:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    openrouter:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    ai21:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    cloudflare:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    deepinfra:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    deepseek:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    mistral:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    perplexity:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    voyageai:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    xai:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    jina:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    ernie:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    hunyuan:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    minimax:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    moonshot:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    qianwen:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
    zhipuai:
      headerName: Authorization
-      valueFormat: 'Bearer %s'
+      valueFormat: "Bearer %s"
  allowedDomains:
    # Coyote release + self-update + model-registry sync
-    - 'github.com:443'
+    - "github.com:443"
-    - 'api.github.com:443'
+    - "api.github.com:443"
-    - 'raw.githubusercontent.com:443'
+    - "raw.githubusercontent.com:443"
-    - 'objects.githubusercontent.com:443'
+    - "objects.githubusercontent.com:443"
-    - '*.githubusercontent.com:443'
+    - "*.githubusercontent.com:443"
    # Coyote install paths (cargo install + uv + rustup + Python tool deps at runtime)
-    - 'crates.io:443'
+    - "crates.io:443"
-    - 'static.crates.io:443'
+    - "static.crates.io:443"
-    - 'pypi.org:443'
+    - "pypi.org:443"
-    - 'files.pythonhosted.org:443'
+    - "files.pythonhosted.org:443"
-    - 'astral.sh:443'
+    - "astral.sh:443"
-    - 'sh.rustup.rs:443'
+    - "sh.rustup.rs:443"
-    - 'static.rust-lang.org:443'
+    - "static.rust-lang.org:443"
    # LLM model OAuth + API endpoints
-    - 'claude.ai:443'
+    - "claude.ai:443"
-    - 'console.anthropic.com:443'
+    - "console.anthropic.com:443"
-    - 'accounts.google.com:443'
+    - "accounts.google.com:443"
    # *.googleapis.com covers oauth2 + userinfo + VertexAI regional endpoints
    # (*-aiplatform.googleapis.com). Do not narrow without re-checking VertexAI.
-    - '*.googleapis.com:443'
+    - "*.googleapis.com:443"
    # Bedrock and GitHub Models use signed / GitHub-PAT auth that the proxy
    # cannot rewrite. Domains are allow-listed; credentials must be injected
    # separately (see README "Extending").
-    - '*.amazonaws.com:443'
+    - "*.amazonaws.com:443"
-    - 'models.inference.ai.azure.com:443'
+    - "models.inference.ai.azure.com:443"
 credentials:
  sources:
@@ -210,7 +210,7 @@ credentials:
 environment:
  variables:
-    IS_SANDBOX: '1'
+    IS_SANDBOX: "1"
    COYOTE_LOG_LEVEL: INFO
    COYOTE_CONFIG_DIR: /home/agent/.config/coyote
  proxyManaged:
@@ -250,7 +250,7 @@ commands:
          libssl-dev \
          pandoc \
          bzip2
-      user: '1000'
+      user: "1000"
      description: Install system prerequisites (including pandoc for fetch_url_via_curl)
    - command: |
        curl -LsSf https://astral.sh/uv/install.sh | sh
@@ -258,7 +258,7 @@ commands:
          printf '#!/bin/sh\nexec uv tool run "$@"\n' > "$HOME/.local/bin/uvx"
          chmod +x "$HOME/.local/bin/uvx"
        fi
-      user: '1000'
+      user: "1000"
      description: Install uv and write a uvx shell wrapper (the installer may place a macOS binary at this path on Docker-for-Mac hosts, which the Linux container cannot execute)
    - command: |
        set -euo pipefail
@@ -274,7 +274,7 @@ commands:
        curl -fsSL --retry 3 "https://github.com/xo/usql/releases/download/v${USQL_VERSION}/usql_static-${USQL_VERSION}-linux-${USQL_ARCH}.tar.bz2" -o "$TMPDIR/usql.tar.bz2"
        tar -xjf "$TMPDIR/usql.tar.bz2" -C "$TMPDIR"
        sudo install -m 0755 "$TMPDIR/usql_static" /usr/local/bin/usql
-      user: '1000'
+      user: "1000"
      description: Install the usql universal SQL CLI (used by the built-in sql agent and execute_sql_code tool)
    - command: |
        curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | \
@@ -284,17 +284,27 @@ commands:
          --target x86_64-unknown-linux-musl
        . "$HOME/.cargo/env"
        cargo install --locked coyote-ai
-      user: '1000'
+      user: "1000"
      description: Install Coyote AI CLI via Rust's Cargo
    - command: |
        . "$HOME/.cargo/env"
        cargo install --locked iwec
      user: "1000"
      description: Install the IWE MCP server binary (iwec) used by the built-in iwe MCP server and iwe-knowledge-base skill
    - command: |
        . "$HOME/.cargo/env"
        cargo install --locked ast-grep
      user: "1000"
      description: Install ast-grep, used by the built-in ast_grep structural code search tool (and the explore agent)
  startup:
    - command:
        [
-          'sh',
+          "sh",
-          '-c',
+          "-c",
          'test -f "$HOME/.config/coyote/config.yaml" || coyote --info >/dev/null 2>&1 || true',
        ]
-      user: '1000'
+      user: "1000"
      background: false
      description: Bootstrap Coyote config directory on first sandbox start
@@ -37,7 +37,7 @@ Every `agent__spawn` result includes a session_id. **Use it.**
 Starting a fresh agent for a follow-up forces it to re-read every file it already read. That's 70%+ wasted tokens, plus the agent loses the reasoning it built up.
-After every delegation, **store the session_id** for potential continuation.
+After every delegation, **store the session_id somewhere compression-safe** for potential continuation. Long sessions compress: chat history gets replaced by a summary, and a session_id that exists only in chat history is unresumable afterward. Embed it in the todo item for that work, e.g. `todo__add "Implement auth endpoint (coder ses_abc123)"`, or in your run-state memory file. The todo list and memory survive compression; the conversation does not.
 ## Skill nudges to delegates
@@ -0,0 +1,40 @@
 ---
 description: Systematic troubleshooting of technical issues (services, networking, containers, OS) by running diagnostic commands directly instead of asking the user to.
 enabled_tools: execute_command
 ---
 A technical problem needs diagnosing. Apply this methodology strictly. Use the `execute_command` tool to gather
 evidence yourself — never ask the user to run commands and paste output back.
 ## Loop
 1. **Reproduce first.** Run the failing thing and read the actual error before theorizing.
 2. **Ask "what changed?"** Updates, config edits, reboots, expirations. Check recent history early.
 3. **Cheap checks first.** Service running/enabled? Interface up? Disk full? DNS resolving? Clock right?
 4. **Isolate by layer, one variable at a time.** Network: link → IP → routing → DNS → transport → app.
   Software: process → logs → config → deps/permissions → environment. Containers: daemon → image → container →
   logs → mounts/networks → host.
 5. **State each hypothesis in one line before testing it.** Pivot openly when disproved.
 6. **Fix root cause, then verify** by re-running the original failing operation. No verification, no fix.
 ## Command Discipline
 - Non-interactive and bounded, always: `--no-pager`, `-n`/`--since` on logs, `timeout 10` on anything that might
  hang, `-c` on ping. No TUIs — use batch modes.
 - Unprivileged first; `sudo` only when required, stating why.
 - Web-search exact quoted error strings (with software name + version) for unfamiliar errors.
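The "non-interactive and bounded" rule above can be sketched in code. This is a minimal illustration, not how `execute_command` is implemented; the helper name `run_bounded` and its return shape are assumptions made for the example.

```python
import subprocess
import sys


def run_bounded(argv, timeout_s=10):
    """Run a command non-interactively with a hard time limit.

    Returns (exit_code, combined_output); exit_code is None on timeout.
    """
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,   # never block waiting for interactive input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # keep error text next to normal output
            text=True,
            timeout=timeout_s,          # same intent as `timeout 10 <cmd>`
        )
    except subprocess.TimeoutExpired:
        return None, f"timed out after {timeout_s}s"
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    code, out = run_bounded([sys.executable, "-c", "print('ok')"])
    print(code, out.strip())
```

A timeout is reported as a distinct outcome (`None`) rather than as a failure code, so a hung command is not mistaken for one that ran and failed.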
 ## Safety Tiers
 1. **Read-only** (status, logs, ls, cat, ping, dig): run freely.
 2. **Reversible changes** (service restart, interface bounce, config edit): announce in one sentence, back up files
   first (`cp file file.bak.$(date +%s)`), then do it.
 3. **Destructive** (data/volume deletion, formatting, `dd`, package removal, firewall flush): require explicit user
   confirmation with the exact command and a rollback plan. Never on your own judgment.
 Redact any secrets appearing in command output. Never disable security controls as a "fix". Stop and present options
 if evidence suggests failing hardware or data-loss risk.
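The tier 2 backup step (`cp file file.bak.$(date +%s)`) could look like this when done from a script. A small sketch only; `backup_file` is a hypothetical helper name, not part of Coyote.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy `path` to `<name>.bak.<unix-seconds>` next to it and return the backup path."""
    src = Path(path)
    dest = src.with_name(f"{src.name}.bak.{int(time.time())}")
    shutil.copy2(src, dest)  # copy2 also preserves mtime and permission bits
    return dest
```

Returning the backup path makes it easy to mention the exact rollback file when announcing the change.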
 ## Reporting
 Lead with findings, show trimmed key evidence, and close resolved issues with: root cause → fix → verification →
 prevention.
@@ -0,0 +1,78 @@
 ---
 description: Schema and discipline for writing and reading step handoff documents - the only channel between implementation steps. Evidence must be pasted, downstream plan changes proposed not imposed. Grants filesystem access for reading and writing handoffs.
 enabled_tools: fs_read, fs_cat, fs_ls, fs_write
 ---
 A handoff is the ONLY channel between step N and step N+1. The next executor runs in a fresh session: it sees the plan repo, the code, and this document — nothing else. Whatever you learned that isn't in the handoff (or in `plans/NOTES.md`) is lost. Write accordingly.
 Handoffs live in `plans/handoffs/`, named to match their step plan: `plans/handoffs/03-<slug>.md` for `plans/steps/03-<slug>.md`.
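The naming rule above is mechanical, so it can be derived rather than typed by hand. A minimal sketch; the function name `handoff_path_for` is an assumption for illustration.

```python
from pathlib import PurePosixPath


def handoff_path_for(step_plan):
    """Map plans/steps/NN-slug.md to plans/handoffs/NN-slug.md."""
    plan = PurePosixPath(step_plan)
    if plan.parent.name != "steps":
        raise ValueError(f"not a step plan path: {step_plan}")
    # Same file name, sibling directory.
    return str(plan.parent.parent / "handoffs" / plan.name)
```

For example, `handoff_path_for("plans/steps/03-retry-policy.md")` yields `plans/handoffs/03-retry-policy.md` (the slug here is made up).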
 ## Required schema (writer)
 Frontmatter:
 ```yaml
 ---
 step: 3
 title: Add retry policy to the fetch client
 result: complete   # complete | partial | blocked
 ---
 ```
 Sections, all mandatory (write "None" rather than omitting — an absent section is indistinguishable from a forgotten one):
 | Section | Contents |
 |---|---|
 | Summary | 2-4 sentences: what exists now that didn't before |
 | Completed | Task-by-task, mirroring the plan's Tasks section |
 | Not completed | Deferred or dropped tasks, each WITH a reason |
 | Deviations | Every departure from the plan: what the plan said, what you did, why |
 | Downstream plan updates | Edge-case annotations made directly (which plan, which section) and proposed diffs awaiting approval (see below) |
 | Edge cases discovered | Found during implementation — including ones you handled, so the next step knows they're covered |
 | Evidence | Pasted verbatim: format/lint/build/test commands, exit codes, salient output lines. Note pre-existing failures explicitly |
 | Notes for next step | Warnings, gotchas, invariants the next executor must not violate |
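Because every section is mandatory, a writer can check a draft before submitting it. The sketch below assumes each section is written as a markdown heading whose text matches the table exactly; the skill does not specify the heading level, so any level is accepted here.

```python
import re

REQUIRED_SECTIONS = [
    "Summary",
    "Completed",
    "Not completed",
    "Deviations",
    "Downstream plan updates",
    "Edge cases discovered",
    "Evidence",
    "Notes for next step",
]


def missing_sections(handoff_text):
    """Return required section names that have no matching heading, in schema order."""
    # Assumption: sections are markdown headings (#, ##, ...), compared case-insensitively.
    found = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", handoff_text, re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]
```

An empty list means every section is present. It says nothing about whether a section's content is adequate, only that it was not forgotten.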
 ## Evidence rules
 Assertions are not evidence. "Tests pass" is a claim; this is evidence:
 ```
 $ cargo test
   ...
 test result: ok. 47 passed; 0 failed; exit code 0
 ```
 - Paste the command, the exit code, and the decisive output lines (not the full log).
 - Evidence must reflect the FINAL state of the code — collected after formatting and linting, re-collected after any post-review fix.
 - If a check was skipped (no formatter configured, etc.), say so explicitly.
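One way to make pasted evidence the default is to capture it in the same shape as the example above. This is a sketch under assumptions: `collect_evidence` is a hypothetical helper, and keeping the last few lines is only a heuristic for "decisive output", so the writer should still check that the right lines were kept.

```python
import subprocess


def collect_evidence(command, tail_lines=5):
    """Run `command` in a shell and return a pasteable evidence block."""
    proc = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    lines = proc.stdout.splitlines()
    # Heuristic: summaries such as "test result: ..." tend to be at the end.
    body = ["  ..."] if len(lines) > tail_lines else []
    body += lines[-tail_lines:]
    return "\n".join([f"$ {command}", *body, f"exit code {proc.returncode}"])
```

The command and the exit code are always included, so a skipped or failing check cannot be summarised away as "tests pass".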
 ## Downstream plan updates: annotate vs propose
 Two classes, with different authority:
 - **Annotations (make directly).** Adding an entry to a later plan's Edge cases section. Additive, non-scope-changing. Record each in Downstream plan updates.
 - **Proposals (never apply directly).** Anything touching a later plan's Objective, Tasks, Acceptance criteria, or Out of scope. Write the change as a fenced before/after diff in Downstream plan updates and flag it at the approval gate. The user applies or rejects it.
 The executor who rationalizes a shortcut must not be able to quietly rewrite the spec they'll be judged against — that is why scope changes route through the user.
 ## Rolling notes vs handoff
 - **Handoff**: step-scoped. What happened in THIS step.
 - **`plans/NOTES.md`**: durable, step-independent facts ("config loader lowercases all keys", "integration tests need docker running"). Append; never rewrite others' entries. Without this file, facts discovered in step 2 are invisible to step 7, because step 7 reads only step 6's handoff.
 ## Reading a handoff (start of a step)
 1. Check `result`. `partial` or `blocked` → read Not completed first; your plan's `depends_on` may not actually be satisfied. Escalate rather than build on missing ground.
 2. Trust what has pasted evidence. Re-verify bare assertions before depending on them.
 3. Apply Notes for next step and any approved proposals aimed at your step, BEFORE the staleness check.
 4. Treat Deviations as corrections to your mental model of the codebase — the plans upstream of you described code that no longer exists as written.
 5. Read `plans/NOTES.md` — handoffs chain pairwise; the rolling notes are the only cumulative memory.
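Step 1 of the reading order hinges on the `result` field. A minimal sketch of extracting it without a YAML library follows; it handles only the flat `key: value` frontmatter shown in the schema, and both function names are assumptions for illustration.

```python
VALID_RESULTS = {"complete", "partial", "blocked"}


def read_result(handoff_text):
    """Extract the `result` value from a handoff's frontmatter."""
    lines = handoff_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("handoff has no frontmatter")
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, _, value = line.partition(":")
        if key.strip() == "result":
            return value.split("#", 1)[0].strip()  # drop a trailing comment
    raise ValueError("frontmatter has no `result` field")


def read_not_completed_first(result):
    """True when the reader should start with the Not completed section."""
    if result not in VALID_RESULTS:
        raise ValueError(f"unknown result: {result!r}")
    return result in {"partial", "blocked"}
```

An unknown value raises instead of defaulting to "complete", which matches the instruction to escalate rather than build on missing ground.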
 ## Anti-patterns
 - "All tests pass" with nothing pasted — a claim, not a handoff
 - Omitting a section instead of writing "None" — forgotten or empty, the reader can't tell
 - Editing a later plan's Tasks or scope directly instead of proposing a diff
 - Burying a major deviation in prose instead of the Deviations section
 - Durable facts in the handoff only — lost after one more step
 - Evidence collected before the formatter ran — the pasted output describes bytes that no longer exist
 - Writing the handoff before the completion gate (todos done or deferred-with-reason) is satisfied
@@ -0,0 +1,65 @@
 ---
 description: Navigate and curate markdown knowledge bases (plan repos, spec repos, companion docs) with IWE graph tools. Load when the workspace is or contains a markdown knowledge base and the task involves finding, reading, or reorganizing plans, specs, designs, or notes. Activates the iwe MCP server rooted at the current directory.
 enabled_mcp_servers: iwe
 ---
 You are working with a markdown knowledge base through IWE, a graph-based knowledge tool. The `iwe` MCP server is rooted at the current working directory (`--project .`), so the knowledge base is the directory Coyote was launched in. IWE derives structure from links: a link on its own line is an *inclusion link* (parent-child hierarchy); a link inside text is an *inline reference* (cross-reference, produces backlinks). The server watches the filesystem, so external edits are picked up automatically — never ask for a restart.
 ## When to use this (and when not)
 Use IWE tools when the task involves a corpus of markdown documents: plan repositories, spec/design collections, companion docs repos, meeting notes, PKM vaults.
 Do NOT use IWE tools for:
 - **Agent memory** (`.coyote/memory/`, `COYOTE.md`) — use the `memory__*` tools; they own the index conventions there.
 - **Semantic/similarity search over documents** — that is RAG's job. IWE search is fuzzy title/key matching plus structural traversal, not embeddings.
 - **Source code** — IWE only understands markdown.
 If unsure whether the current directory is actually a knowledge base, probe with `iwe_stats` first. Few or zero documents means this skill does not apply; unload it rather than forcing the tools.
 ## Orientation protocol (always start here)
 Never guess document keys. Orient first:
 1. `iwe_stats` — corpus size and shape. Cheap sanity check.
 2. `iwe_find(query="<topic>")` — fuzzy search for entry points. Use `roots` behavior via structural selectors when you want top-level topics only.
 3. `iwe_tree(key="<entry>", max_depth=2)` — see the hierarchy before reading bodies.
 4. `iwe_retrieve(key="<entry>", depth=1, context=1)` — read with structure.
 ## Reading efficiently
 `iwe_retrieve` is the workhorse. Control cost explicitly:
 - `depth` — how many levels of included children to expand. Start at 1-2; increase only if needed.
 - `context` — parent levels to include, so you know where a document sits. `context=1` is usually enough.
 - `max_tokens` — ALWAYS set a budget (e.g. 2000-4000) on large corpora; results report truncation so you can drill further deliberately.
 - `exclude` — pass keys you have already read to avoid re-retrieving known content.
 - `links` / `backlinks` — include outbound/inbound references when tracing how a topic connects.
 Scope searches structurally with selectors on `iwe_find`/`iwe_retrieve`/`iwe_tree`:
 - `in` — only sub-documents of EVERY listed key (AND)
 - `in_any` — sub-documents of at least one key (OR)
 - `not_in` — exclude subtrees (e.g. archives)
 Filter by frontmatter with the YAML query language: `status: draft`, `created: {$gte: "2026-01-01"}`, `tags: {$in: [urgent]}`, `reviewed: {$exists: true}`.
 Use `iwe_squash(key=...)` to flatten a subtree into one linear document — good for producing a full plan readout or summary input.
 ## Writing and refactoring
 Write tools: `iwe_create` (new doc from title + content), `iwe_update` (replace a doc's content), `iwe_delete` (remove + clean up references). Refactor tools: `iwe_rename` (key rename with automatic link updates everywhere), `iwe_extract` (split a section into its own doc, leaving an inclusion link), `iwe_inline` (merge a referenced doc back into its parent), `iwe_normalize` (reformat all docs consistently).
 Rules:
 - **Preview destructive operations**: `iwe_rename`, `iwe_delete`, `iwe_extract`, `iwe_inline`, and `iwe_normalize` support `dry_run` — use it first, show the user what will change, then apply.
 - Never rename or delete by editing files directly: the refactor tools update every referencing document, whereas manual edits break links.

 - When adding a document, link it from an existing parent (inclusion link on its own line) so it joins the hierarchy instead of becoming an orphan.
 - Match the corpus conventions: check an existing document's frontmatter fields before inventing your own schema.
 - Do not run `iwe_normalize` across someone's knowledge base unprompted — it rewrites every file's formatting.
 ## Anti-patterns
 - Retrieving with `depth=5` and no `max_tokens` "to get everything" — you will flood the context. Iterate: shallow first, drill selectively.
 - Calling `iwe_find` repeatedly with rephrased queries when structural navigation (`iwe_tree`, selectors) would locate the document deterministically.
 - Using IWE write tools on `.coyote/memory/` files — wrong tier; that corrupts the memory index.
 - Creating documents without linking them into the hierarchy — orphans are invisible to depth-based retrieval.
@@ -0,0 +1,82 @@
 ---
 description: Author executable high-level plans and per-step implementation plans for phased work. Defines the plan repo layout and step-plan schema. Grants filesystem access for grounding plans in real code.
 enabled_tools: fs_read, fs_grep, fs_glob, fs_ls, fs_cat, fs_write
 ---
 You are writing implementation plans that a DIFFERENT agent will execute later, in a fresh session, with zero access to this conversation. The plan IS the executor's entire context. A plan that needs the conversation to make sense is a broken plan.
 ## Plan repo layout
 Default layout (match the existing layout instead if the repo already has one):
 ```
 plans/
  plan.md            # high-level plan; links each step plan
  steps/01-<slug>.md # one file per step, numbered in execution order
  handoffs/          # written by executors; see `handoff-protocol`
  NOTES.md           # rolling durable facts discovered during execution
 ```
 In `plan.md`, link each step plan with an inclusion link (the link alone on its own line). This makes the plan repo an IWE hierarchy — agents navigating a large plan corpus can load `iwe-knowledge-base` and traverse it structurally instead of globbing.
 ## High-level plan requirements
 - Ordered list of steps. Each step is independently implementable and independently verifiable — it compiles and its tests pass WITHOUT any later step existing.
 - The dependency graph is explicit and acyclic. If step 4 needs step 2's API, step 4's plan says so.
 - Steps are sized for one focused session: roughly 1-5 files of meaningful change. A step that needs "and then also..." is two steps.
 - State what the plan does NOT cover. Scope creep starts where scope boundaries are implicit.
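Since steps are numbered in execution order, the "explicit and acyclic" requirement reduces to a simple check: every dependency must be a known step with a smaller number. The sketch below assumes the `depends_on` lists have already been read into a dict; `check_dependencies` is a hypothetical name.

```python
def check_dependencies(depends_on):
    """Validate a {step: [dependencies]} map and return a list of problems."""
    problems = []
    for step, deps in sorted(depends_on.items()):
        for dep in deps:
            if dep not in depends_on:
                problems.append(f"step {step} depends on unknown step {dep}")
            elif dep >= step:
                # A dependency that runs at the same time or later implies a cycle
                # or a mis-ordered plan.
                problems.append(
                    f"step {step} depends on step {dep}, which does not run before it"
                )
    return problems
```

If every dependency points at a strictly earlier step, no cycle can exist, so no separate graph traversal is needed.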
 ## Step plan schema
 Every step plan starts with frontmatter:
 ```yaml
 ---
 step: 3
 title: Add retry policy to the fetch client
 depends_on: [1, 2]
 status: pending   # pending | in-progress | complete
 ---
 ```
 And contains these sections, all mandatory:
 | Section | Contents |
 |---|---|
 | Objective | 1-3 sentences: what exists after this step that didn't before |
 | Context | File paths AND pasted code snippets (5-20 lines) showing the patterns to follow. Not just paths — actual code |
 | Tasks | Ordered, atomic tasks. Each maps to one todo item for the executor |
 | Acceptance criteria | Measurable behaviors. These become the tests |
 | Test commands | Exact commands to run, from the repo root |
 | Edge cases | Known edge cases this step must handle or explicitly punt on |
 | Out of scope | What the executor must NOT touch, even if tempting |
 ## Writing for a context-free executor
 - Paste code snippets from your exploration into Context. "Follow the pattern in foo.rs" forces the executor to re-do exploration you already did.
 - Use repo-relative paths from the project root. Never "the file we discussed."
 - Name symbols exactly: `RetryPolicy::backoff`, not "the backoff logic."
 - If a decision was made in discussion (X over Y), record the decision AND the one-line reason. The executor will face the same fork and must not re-litigate it.
 - Write acceptance criteria as observable behavior ("returns 429 after 3 failed attempts"), not implementation ("uses a for loop"). Criteria that describe implementation produce tautological tests.
 ## Grounding (before the plan is done)
 Plans rot when written from memory. Before finalizing each step plan:
 1. `fs_grep` every symbol the plan references — confirm it exists and is spelled right.
 2. `fs_read` the files listed in Context — confirm the pasted snippets are current.
 3. Confirm the test commands actually exist (check `justfile`, `Makefile`, `package.json` scripts, CI config).
 A plan referencing a function that doesn't exist fails the executor at the worst possible time: mid-implementation.
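The first grounding check can be approximated outside the agent as well. This sketch is a plain substring search standing in for `fs_grep`, not a reimplementation of it; the function name and the default `*.rs` pattern are assumptions.

```python
from pathlib import Path


def missing_symbols(symbols, root, pattern="*.rs"):
    """Return the symbols that appear in no file under `root` matching `pattern`."""
    remaining = set(symbols)
    for path in Path(root).rglob(pattern):
        if not remaining:
            break  # everything was found, stop reading files
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        remaining -= {s for s in remaining if s in text}
    return sorted(remaining)
```

Search for the text as it appears at the definition site (for example `fn backoff`), since a qualified name such as `RetryPolicy::backoff` may only occur at call sites.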
 ## Edge cases are a first-class section
 For every step, enumerate the edge cases you can foresee: empty inputs, concurrent access, error paths, partial failures, migration/compat concerns. If an edge case belongs to a LATER step, write it in that step's plan now — not in a comment, not in your head. Executors are instructed to propagate newly discovered edge cases downstream; make their diff small by having the section exist.
 ## Anti-patterns
 - "As discussed above" / "per our conversation" — the executor has no conversation
 - File paths without pasted snippets in Context — forces re-exploration
 - Acceptance criteria like "works correctly" — unmeasurable, untestable
 - A step that depends on a later step — cycle; re-order or merge
 - Omitting Out of scope — the executor will helpfully refactor things you didn't ask for
 - Frontmatter without `depends_on` or `status` — breaks status queries and dependency checks
@@ -0,0 +1,83 @@
 ---
 description: Adversarial review of implementation plans against executability, verifiability, and completeness standards. Verdict is OKAY or REJECT with line-referenced complaints. Grants read-only filesystem access for ground-truth checks.
 enabled_tools: fs_read, fs_grep, fs_glob, fs_ls, fs_cat
 ---
 You are reviewing an implementation plan BEFORE any code is written. You are the critic, not a co-author: your job is to find the ways this plan fails an executor who has zero conversation context, not to redesign the approach. A flaw caught here costs one plan edit; the same flaw caught mid-implementation costs a deviation, a handoff note, and possibly rework across steps.
 The plan schema you are checking against is defined in the `plan-authoring` skill — load it alongside this one if it is not already loaded.
 ## Review checklist (in order)
 ### 1. Executability without context
 Read the plan as if you know nothing but what is on the page.
 - Does every referenced decision carry its rationale, or does it assume a conversation you can't see?
 - Does Context contain pasted code snippets, or only file paths (which force re-exploration)?
 - Are symbols named exactly? "The validation logic" is not a name.
 ### 2. Ground truth (verify, don't trust)
 Plans are written from exploration that may be stale or wrong. Spot-check claims against the actual codebase:
 - `fs_grep` for every function, type, and file the plan references. Flag anything that doesn't exist or is spelled differently.
 - `fs_read` 1-2 of the pasted Context snippets at their claimed locations. Flag drift.
 - Check that the Test commands exist (`justfile`, `Makefile`, `package.json`, CI config).
 A plan that references phantom code is an automatic REJECT.
 ### 3. Verifiability
 - Is every acceptance criterion a measurable, observable behavior? "Works correctly" and "is robust" are unmeasurable — flag them.
 - Do the criteria describe behavior rather than implementation? Implementation-shaped criteria produce tautological tests.
 - Can each criterion be checked by the listed Test commands, or is there a criterion with no way to verify it?
 ### 4. Dependencies and ordering
 - Is `depends_on` present, acyclic, and complete? If the step uses an API introduced in step N, is N listed?
 - Does anything in this step silently assume a LATER step's output? That's a cycle the frontmatter hides.
 - Is the step independently verifiable — will it build and pass tests without later steps existing?
 ### 5. Scope and sizing
 - Is Out of scope present and specific? Absent scope boundaries invite helpful refactoring.
 - Is the step sized for one focused session (~1-5 files of meaningful change)? Flag steps hiding an "and then also".
 - Do two steps touch the same code region without an ordering constraint between them?
 ### 6. Edge cases
 - Is the Edge cases section present and non-empty (or explicitly "none foreseen — <reason>")?
 - Think adversarially for 60 seconds: empty inputs, concurrency, error paths, partial failure, compat. Anything obvious the plan misses?
 - If this step creates a new surface (API, config, schema), do DOWNSTREAM step plans account for it where they must?
 ## Verdict format
 End with exactly one of:
 ```
 PLAN_REVIEW: OKAY
 <optional: 1-3 non-blocking observations>
 ```
 ```
 PLAN_REVIEW: REJECT
 Complaints:
 1. <file>:<line or section> — <what is wrong> — <what would fix it>
 2. ...
 ```
 Every complaint must be actionable and point at a specific location. "The plan could be clearer" is noise; "steps/03-retry.md, Acceptance criteria #2 — 'handles errors gracefully' is unmeasurable — specify the expected behavior per error class" is signal.
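A fixed verdict format lets an orchestrator consume the review mechanically. The sketch below shows one way to parse it, assuming the verdict line appears exactly once and complaints are numbered `1.`, `2.`, and so on as in the template; `parse_verdict` is a hypothetical helper.

```python
import re


def parse_verdict(review_text):
    """Return ("OKAY", []) or ("REJECT", [complaints]) from a plan review."""
    matches = re.findall(
        r"^PLAN_REVIEW: (OKAY|REJECT)[ \t]*$", review_text, re.MULTILINE
    )
    if len(matches) != 1:
        raise ValueError(f"expected exactly one verdict line, found {len(matches)}")
    verdict = matches[0]
    complaints = []
    if verdict == "REJECT":
        # Only numbered lines after the verdict count as complaints.
        tail = review_text.split("PLAN_REVIEW: REJECT", 1)[1]
        complaints = re.findall(r"^\d+\.[ \t]+(.+)$", tail, re.MULTILINE)
    return verdict, complaints
```

A review with zero or several verdict lines is rejected outright, which mirrors the "exactly one of" requirement.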
 ## Scope discipline
 - Review THE PLAN, not the design. If the approach is defensible, do not relitigate it because you'd have chosen differently. Flag design only when it is factually broken (races, missing dependency, contradicts the codebase).
 - Do not rewrite the plan yourself. Complaints, not patches — the author owns the fix.
 - Three strong complaints beat fifteen weak ones. If you have fifteen, the plan needs a rewrite, not a list: say so.
 ## Anti-patterns
 - Approving without running a single ground-truth check — a syntax review, not a plan review
 - REJECT for style or phrasing while missing a phantom-symbol reference
 - Redesigning the author's approach in your complaints
 - Vague complaints with no location and no fix direction
 - Rubber-stamping a step with no acceptance criteria because "the tasks look reasonable"
@@ -0,0 +1,85 @@
 ---
 description: End-to-end protocol for executing one step of a phased implementation plan - orient, staleness check, checklist, implement, edge-case sweep, verify, review, handoff, approval. Grants shell access for build/test commands.
 enabled_tools: execute_command
 ---
 You are executing ONE step of a phased implementation plan. Previous steps were executed in sessions you cannot see; later steps depend on what you do and document. The protocol below is ordered — do not skip phases, do not reorder them.
 Companion skills: load `handoff-protocol` before Phase 1 (you must READ a handoff correctly) and keep it loaded for Phase 8 (you must WRITE one). Load `verification-gates` for Phase 6. The plan schema is defined in `plan-authoring`.
 ## Phase 1 - Orient
 1. Read the previous step's handoff (`plans/handoffs/`, highest step number below yours). If none exists, you are step 1.
 2. Read the current step plan (`plans/steps/`). Note its `depends_on` and confirm that those steps' handoffs exist and report `result: complete`. If a dependency's handoff is `partial`, `blocked`, or missing, STOP and escalate via `user__ask`.
 3. Read `plans/NOTES.md` for durable facts discovered by earlier steps.
 4. Apply anything the previous handoff directed at your step (approved plan updates, warnings).
 5. Set the plan's frontmatter `status: in-progress`.
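Item 1 above ("highest step number below yours") can be sketched as a small lookup. It assumes handoff files follow the `NN-<slug>.md` naming used elsewhere in the plan repo; `previous_handoff` is a hypothetical name.

```python
import re
from pathlib import Path


def previous_handoff(handoff_dir, current_step):
    """Return the handoff with the highest step number below `current_step`, or None."""
    best = None
    for path in Path(handoff_dir).glob("*.md"):
        match = re.match(r"(\d+)-", path.name)
        if not match:
            continue  # not a numbered handoff file
        step = int(match.group(1))
        if step < current_step and (best is None or step > best[0]):
            best = (step, path)
    return best[1] if best else None
```

A `None` result corresponds to the "you are step 1" case. Note that this returns the nearest earlier handoff even when there is a gap in the numbering, so the `depends_on` check in item 2 is still needed.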
 ## Phase 2 - Staleness check (BEFORE any edit)
 The plan was written before steps 1..N-1 changed the codebase. Verify its assumptions still hold:
 - Grep the symbols the plan references — do they still exist, with the claimed signatures?
 - Read the plan's Context snippets at their claimed locations — has the code drifted?
 - Confirm the Test commands still work.
 Discrepancies are deviations — handle them via Phase 5's protocol BEFORE implementing. Executing a stale plan literally is the primary failure mode of phased work.
 ## Phase 3 - Checklist
 `todo__init` with the step objective, then one `todo__add` per task in the plan's Tasks section, in order. Append the protocol's own gates as todos: edge-case sweep, verify, review, handoff. Mark items done with `todo__done` as you go — never batch. The checklist is what survives context compression; keep it truthful.
 When you spawn an agent whose session you may need to resume, embed its session_id in the corresponding todo item text (`"Implement task 3 (coder ses_abc123)"`). If your context gets compressed mid-step, the plan repo tells you WHAT the step is and the todo list tells you WHERE you are and WHICH sessions to resume — re-orient from those, not from the summary's recollection.
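Recovering session ids from todo text after compression might look like the sketch below. The `ses_` prefix followed by letters and digits is inferred from the `ses_abc123` example only; the real id format may differ, so treat the pattern as an assumption.

```python
import re

# Assumption: ids look like the example "ses_abc123".
SESSION_ID = re.compile(r"\bses_[A-Za-z0-9]+\b")


def sessions_to_resume(todo_items):
    """Map each todo text to the session ids embedded in it, skipping items without one."""
    found = {}
    for item in todo_items:
        ids = SESSION_ID.findall(item)
        if ids:
            found[item] = ids
    return found
```

Keeping the id inside the todo text means no second store has to stay in sync with the checklist.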
 ## Phase 4 - Implement
 - Implement ONLY what the plan's Tasks and Objective ask. Out of scope means out of scope.
 - Follow the patterns pasted in the plan's Context. When plan and current codebase disagree, the codebase wins — record the deviation.
 - Write tests from the plan's Acceptance criteria, not from your implementation. Criteria-first tests catch what tautological tests cannot.
 - While in the code, note (do not fix) anything the planning exploration missed — feed it to Phase 5.
 ## Phase 5 - Edge-case sweep and deviations
 **Edge cases.** For each edge case you discovered: if it belongs to THIS step, handle it (or punt explicitly in the handoff with a reason). If it belongs to a LATER step, check that step's plan — if the plan already covers it, done; if not, add it to that plan's Edge cases section and record the addition in your handoff.
 **Deviations.** Classify each:
 | Class | Definition | Action |
 |---|---|---|
 | Minor | Same objective and scope, mechanics differ (renamed symbol, moved file, extra helper) | Resolve it, document in handoff |
 | Major | Changes scope, approach, interfaces, or invalidates a later step's assumptions | Do NOT silently proceed. Either escalate via `user__ask`, or write a proposed downstream-plan diff into the handoff per `handoff-protocol` |
 Never rewrite a later step's Objective, Tasks, or Out of scope directly — edge-case annotations are the only direct downstream edit you may make.
 ## Phase 6 - Verify (order matters)
 1. Formatter (if configured) — format BEFORE collecting evidence, so evidence reflects final code.
 2. Linter (if configured) — fix findings your change introduced.
 3. Build/typecheck — exit code 0.
 4. FULL test suite — not just your new tests; regressions in untouched code are your problem if your change caused them.
 Capture commands and exit codes verbatim — they go in the handoff as evidence. Pre-existing failures: note explicitly, don't fix, don't hide. Apply the 3-strike rule: after 3 failed fix attempts, stop, revert to working state, escalate.
 ## Phase 7 - Review
 Self-review the diff with `code-review` + `ai-slop-remover` loaded. For broad steps (5+ files or crossing architectural boundaries), request an independent pass (`code-reviewer` agent) instead. Fix blockers; re-run Phase 6 after any fix.
 ## Phase 8 - Handoff
 Gate: every todo is either done or explicitly deferred with a reason. No silent drops.
 Write the handoff per `handoff-protocol` — schema, pasted evidence, deviations, downstream updates, notes for the next step. Append durable, step-independent facts to `plans/NOTES.md`. Set the plan's frontmatter `status: complete`.
 ## Phase 9 - User approval
 Present: what was done, deviations, downstream plan changes (made or proposed), evidence summary, handoff location. Then STOP — do not begin the next step. If the user requests changes, address them, re-run Phase 6, update the handoff, and present again.
 ## Anti-patterns
 - Editing code before the staleness check — the primary source of mid-step surprises
 - Implementing "while I'm here" improvements outside the plan's scope
 - Tests derived from the implementation instead of the acceptance criteria
 - Collecting build/test evidence BEFORE formatting/linting, then shipping different bytes
 - Running only your new tests and claiming "tests pass"
 - Silently absorbing a major deviation instead of escalating or proposing a plan diff
 - Rewriting downstream plan scope directly instead of proposing per `handoff-protocol`
 - Starting the next step without user approval
@@ -91,6 +91,7 @@ enabled_tools: null              # Which tools to enable by default.
                                 # Example (comma-separated form):
                                 #   enabled_tools: fs,web_search_coyote
 visible_tools:                   # Which tools are visible to be compiled (and are thus able to be defined in 'enabled_tools')
 #  - ast_grep.sh
 #  - demo_py.py
 #  - demo_sh.sh
 #  - demo_ts.ts
@@ -133,6 +133,13 @@ impl MessageContent {
        }
    }
    pub fn as_text(&self) -> Option<&str> {
        match self {
            MessageContent::Text(text) => Some(text),
            _ => None,
        }
    }
    pub fn merge_prompt(&mut self, replace_fn: impl Fn(&str) -> String) {
        match self {
            MessageContent::Text(text) => *text = replace_fn(text),
@@ -118,6 +118,14 @@ pub struct MemoryFrontmatter {
    pub description: Option<String>,
    #[serde(default, rename = "type")]
    pub kind: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub created: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub updated: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub superseded_by: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub expires: Option<String>,
 }
 #[derive(Debug, Clone)]
@@ -545,6 +553,7 @@ mod tests {
                name: "test".into(),
                description: Some("a test".into()),
                kind: Some("user".into()),
                ..Default::default()
            },
            body: "Hello world\nmore text".into(),
        };
@@ -135,6 +135,7 @@ const RAGS_DIR_NAME: &str = "rags";
 const FUNCTIONS_DIR_NAME: &str = "functions";
 const FUNCTIONS_BIN_DIR_NAME: &str = "bin";
 const AGENTS_DIR_NAME: &str = "agents";
 const REPL_HISTORY_DIR_NAME: &str = "repl-history";
 const GLOBAL_TOOLS_DIR_NAME: &str = "tools";
 const GLOBAL_TOOLS_UTILS_DIR_NAME: &str = "utils";
 const BASH_PROMPT_UTILS_FILE_NAME: &str = "prompt-utils.sh";
@@ -150,7 +151,7 @@ const SBX_VAULT_MIXINS_DIR_NAME: &str = "sbx-vault-mixins";
 const SBX_MIXIN_KITS_DIR_NAME: &str = "sbx-mixin-kits";
 const GIT_DIR_NAME: &str = ".git";
 const GITIGNORE_FILE_NAME: &str = ".gitignore";
-const DEFAULT_VISIBLE_TOOLS: [&str; 18] = [
+const DEFAULT_VISIBLE_TOOLS: [&str; 19] = [
    "execute_command.sh",
    "execute_py_code.py",
    "execute_sql_code.sh",
@@ -164,6 +165,7 @@ const DEFAULT_VISIBLE_TOOLS: [&str; 18] = [
    "fs_read.sh",
    "fs_rm.sh",
    "fs_write.sh",
    "ast_grep.sh",
    "get_current_time.sh",
    "get_current_weather.sh",
    "search_wikipedia.sh",
@@ -8,6 +8,8 @@ use super::{
    SKILLS_DIR_NAME, WORKSPACE_MEMORY_DIR_NAME,
 };
 use crate::client::ProviderModels;
 use crate::config::REPL_HISTORY_DIR_NAME;
 use crate::config::session::Session;
 use crate::utils::{get_env_name, list_file_names, normalize_env_name};
 use anyhow::{Context, Result, anyhow, bail};
@@ -320,6 +322,20 @@ pub fn workspace_memory_dir_for(workspace_root: &Path) -> PathBuf {
        .join(MEMORY_DIR_NAME)
 }
 pub fn repl_history_dir() -> PathBuf {
    cache_path().join(REPL_HISTORY_DIR_NAME)
 }
 pub fn repl_history_file(session: &Option<Session>) -> PathBuf {
    let history_key = if let Some(session) = &session {
        format!("session_{}", session.name().replace('/', "_"))
    } else {
        "default".to_string()
    };
    repl_history_dir().join(history_key)
 }
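
The history-file naming above can be sketched in isolation. This is a minimal illustration, assuming a hypothetical `history_key` helper that takes the session name directly rather than the crate's `Session` type; only the key derivation is mirrored from `repl_history_file`.

```rust
/// Mirrors the key derivation in `repl_history_file`, but takes the session
/// name directly so the sketch does not depend on the `Session` type.
/// (`history_key` is a hypothetical name, not part of the change.)
pub fn history_key(session_name: Option<&str>) -> String {
    match session_name {
        // Slashes are replaced so the key stays a single file name.
        Some(name) => format!("session_{}", name.replace('/', "_")),
        // No active session: everything shares one history file.
        None => "default".to_string(),
    }
}
```

One consequence of the `/` to `_` mapping is that session names differing only in those two characters (for example `work/api` and `work_api`) resolve to the same history file.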
 pub fn log_config() -> Result<(LevelFilter, Option<PathBuf>)> {
    let log_level = env::var(get_env_name("log_level"))
        .ok()
@@ -18,10 +18,16 @@ pub(crate) const DEFAULT_MEMORY_INSTRUCTIONS: &str = indoc! {"
        - `memory__read(name)`: Read a specific drill file's full content.
        - `memory__write(name, content, scope)`: Create or replace a drill file (scope: 'global' | 'workspace').
          The MEMORY.md index is appended automatically; do not also update the index by hand.
          Optional `superseded_by` / `expires` (YYYY-MM-DD) mark a memory as stale for later cleanup.
        - `memory__rename(name, new_name, scope)`: Rename a drill file. Its index entry and every
          [[wikilink]] to it are rewritten automatically.
        - `memory__delete(name, scope)`: Delete a drill file and its index entry. Reports any
          [[wikilinks]] left dangling in other files.
        - `memory__edit_index(scope, content)`: Replace the entire MEMORY.md at the given scope.
          Use this to add always-on facts, reorganize, prune stale entries, or fix descriptions.
        - `memory__list()`: See all known drill files and their metadata.
-        - `memory__lint()`: Health-check memory for orphans, broken links, oversized files.
+        - `memory__lint()`: Health-check memory for orphans, broken links, oversized files,
          stale (superseded/expired) files, and index descriptions that drifted from the files.
    RULES:
        - Every interaction has two outputs: your answer AND any memory updates the conversation warrants.
@@ -29,7 +35,11 @@ pub(crate) const DEFAULT_MEMORY_INSTRUCTIONS: &str = indoc! {"
        - All MEMORY.md edits MUST go through `memory__edit_index`. NEVER use `fs_write`, `fs_patch`,
          or any other generic file tool on MEMORY.md — Coyote manages its location and a stray
          MEMORY.md outside the managed path is invisible to memory.
-        - All drill files MUST go through `memory__write`. The index updates itself.
+        - All drill files MUST go through `memory__write`. The index updates itself. Renames and
          deletions MUST go through `memory__rename` / `memory__delete` so links stay intact.
        - When a fact becomes outdated, update it in place, delete it, or mark the old file with
          `superseded_by`/`expires` so `memory__lint` flags it later. Never leave contradictory
          memories side by side.
        - Use [[wikilink]] notation in memory files to reference other memories by their `name:` slug.
        - NEVER write secrets, credentials, or API keys to memory — memory is plaintext on disk.
          Use coyote's Vault for secrets.
@@ -5116,6 +5116,45 @@ mod tests {
        assert!(paths::skill_file("frontend-ui-ux").exists());
    }
    #[test]
    #[serial]
    fn bundled_graph_agents_parse_and_validate() {
        use crate::graph::GraphParser;
        use crate::graph::validator::GraphValidator;
        let _guard = TestConfigDirGuard::new();
        Agent::install_builtin_agents(false).unwrap();
        Skill::install_builtin_skills(false).unwrap();
        let mut checked = Vec::new();
        for entry in std::fs::read_dir(paths::agents_data_dir()).unwrap() {
            let dir = entry.unwrap().path();
            let graph_path = dir.join("graph.yaml");
            if !graph_path.exists() {
                continue;
            }
            let name = dir.file_name().unwrap().to_string_lossy().to_string();
            let graph = GraphParser::new(&dir)
                .load_from_file(&graph_path)
                .unwrap_or_else(|e| panic!("graph.yaml for '{name}' failed to parse: {e}"));
            let result = GraphValidator::new(&dir).validate(&graph);
            assert!(
                result.errors.is_empty(),
                "graph.yaml for '{name}' failed validation: {:#?}",
                result.errors
            );
            checked.push(name);
        }
        checked.sort();
        for expected in ["coder", "librarian", "step-runner"] {
            assert!(
                checked.iter().any(|n| n == expected),
                "expected bundled graph agent '{expected}' to be checked; found {checked:?}"
            );
        }
    }
    #[test]
    #[serial]
    fn install_functions_force_preserves_user_mcp_json() {
@@ -163,6 +163,14 @@ impl Session {
        self.messages.is_empty() && self.compressed_messages.is_empty()
    }
    pub fn messages(&self) -> &[Message] {
        &self.messages
    }
    pub fn compressed_messages(&self) -> &[Message] {
        &self.compressed_messages
    }
    pub fn name(&self) -> &str {
        &self.name
    }
@@ -3,6 +3,7 @@ use std::path::{Path, PathBuf};
 use std::{env, fs};
 use anyhow::{Context, Result, anyhow, bail};
 use chrono::Local;
 use indexmap::IndexMap;
 use serde_json::{Value, json};
@@ -97,6 +98,32 @@ pub fn memory_function_declarations() -> Vec<FunctionDeclaration> {
                            ..Default::default()
                        },
                    ),
                    (
                        "superseded_by".to_string(),
                        JsonSchema {
                            type_value: Some("string".to_string()),
                            description: Some(
                                "Optional `name:` slug of the memory that replaces this one. \
                                 `memory__lint` flags superseded files for cleanup. Omitting this \
                                 on overwrite clears any previous value."
                                    .into(),
                            ),
                            ..Default::default()
                        },
                    ),
                    (
                        "expires".to_string(),
                        JsonSchema {
                            type_value: Some("string".to_string()),
                            description: Some(
                                "Optional ISO date (YYYY-MM-DD) after which this memory is stale. \
                                 `memory__lint` flags expired files. Omitting this on overwrite \
                                 clears any previous value."
                                    .into(),
                            ),
                            ..Default::default()
                        },
                    ),
                ])),
                required: Some(vec![
                    "name".to_string(),
@@ -164,6 +191,90 @@ pub fn memory_function_declarations() -> Vec<FunctionDeclaration> {
            },
            agent: false,
        },
        FunctionDeclaration {
            name: format!("{MEMORY_FUNCTION_PREFIX}rename"),
            description:
                "Rename a memory file. Its MEMORY.md index entry and every [[wikilink]] to it in \
                 other memory files are rewritten automatically."
                    .to_string(),
            parameters: JsonSchema {
                type_value: Some("object".to_string()),
                properties: Some(IndexMap::from([
                    (
                        "name".to_string(),
                        JsonSchema {
                            type_value: Some("string".to_string()),
                            description: Some("Current `name:` slug of the memory file".into()),
                            ..Default::default()
                        },
                    ),
                    (
                        "new_name".to_string(),
                        JsonSchema {
                            type_value: Some("string".to_string()),
                            description: Some(
                                "New kebab-case slug for the file (no extension)".into(),
                            ),
                            ..Default::default()
                        },
                    ),
                    (
                        "scope".to_string(),
                        JsonSchema {
                            type_value: Some("string".to_string()),
                            description: Some(
                                "Scope of the file: 'global' (user-level) or 'workspace' (project-level)"
                                    .into(),
                            ),
                            ..Default::default()
                        },
                    ),
                ])),
                required: Some(vec![
                    "name".to_string(),
                    "new_name".to_string(),
                    "scope".to_string(),
                ]),
                ..Default::default()
            },
            agent: false,
        },
        FunctionDeclaration {
            name: format!("{MEMORY_FUNCTION_PREFIX}delete"),
            description:
                "Delete a memory file and remove its MEMORY.md index entry. Reports any \
                 [[wikilinks]] in other memory files left dangling by the deletion."
                    .to_string(),
            parameters: JsonSchema {
                type_value: Some("object".to_string()),
                properties: Some(IndexMap::from([
                    (
                        "name".to_string(),
                        JsonSchema {
                            type_value: Some("string".to_string()),
                            description: Some(
                                "The `name:` slug of the memory file to delete".into(),
                            ),
                            ..Default::default()
                        },
                    ),
                    (
                        "scope".to_string(),
                        JsonSchema {
                            type_value: Some("string".to_string()),
                            description: Some(
                                "Scope of the file: 'global' (user-level) or 'workspace' (project-level)"
                                    .into(),
                            ),
                            ..Default::default()
                        },
                    ),
                ])),
                required: Some(vec!["name".to_string(), "scope".to_string()]),
                ..Default::default()
            },
            agent: false,
        },
    ]
 }
@@ -214,47 +325,13 @@ pub fn handle_memory_tool(ctx: &mut RequestContext, cmd_name: &str, args: &Value
                "workspace": store.workspace.as_ref().map(workspace_label),
            }))
        }
-        "write" => {
-            let name = arg_str(args, "name")?;
-            let description = arg_str(args, "description")?;
-            let content = arg_str(args, "content")?;
-            let scope = arg_str(args, "scope")?;
-            let kind = args.get("type").and_then(Value::as_str).map(String::from);
-            let target_dir = match scope.as_str() {
-                "global" => paths::global_memory_dir(),
-                "workspace" => workspace_write_dir(&store, &cwd)?,
-                other => bail!("unknown scope '{}': use 'global' or 'workspace'", other),
-            };
-            let file = MemoryFile {
-                path: target_dir.join(format!("{name}.md")),
-                frontmatter: MemoryFrontmatter {
-                    name: name.clone(),
-                    description: Some(description.clone()),
-                    kind,
-                },
-                body: content,
-            };
-            file.save()?;
-            let index_path = target_dir.join("MEMORY.md");
-            let index_updated = ensure_index_entry(&index_path, &name, &description)?;
-            Ok(json!({
-                "status": "ok",
-                "path": file.path.display().to_string(),
-                "index_path": index_path.display().to_string(),
-                "index_updated": index_updated,
-            }))
-        }
+        "write" => write_memory(&store, &cwd, args),
+        "rename" => rename_memory(&store, &cwd, args),
+        "delete" => delete_memory(&store, &cwd, args),
        "edit_index" => {
            let scope = arg_str(args, "scope")?;
            let content = arg_str(args, "content")?;
-            let target_dir = match scope.as_str() {
-                "global" => paths::global_memory_dir(),
-                "workspace" => workspace_write_dir(&store, &cwd)?,
-                other => bail!("unknown scope '{}': use 'global' or 'workspace'", other),
-            };
+            let target_dir = scope_dir(&store, &cwd, &scope)?;
            let index_path = write_memory_index(&target_dir, &content)?;
            Ok(json!({
@@ -267,19 +344,229 @@ pub fn handle_memory_tool(ctx: &mut RequestContext, cmd_name: &str, args: &Value
    }
 }
 fn write_memory(store: &MemoryStore, cwd: &Path, args: &Value) -> Result<Value> {
    let name = arg_str(args, "name")?;
    let description = arg_str(args, "description")?;
    let content = arg_str(args, "content")?;
    let scope = arg_str(args, "scope")?;
    let kind = args.get("type").and_then(Value::as_str).map(String::from);
    let superseded_by = args
        .get("superseded_by")
        .and_then(Value::as_str)
        .map(String::from);
    let expires = args
        .get("expires")
        .and_then(Value::as_str)
        .map(String::from);
    let target_dir = scope_dir(store, cwd, &scope)?;
    let path = target_dir.join(format!("{name}.md"));
    let previous = if path.exists() {
        MemoryFile::load(&path).ok()
    } else {
        None
    };
    let today = today_string();
    let created = previous
        .as_ref()
        .and_then(|p| p.frontmatter.created.clone())
        .unwrap_or_else(|| today.clone());
    let file = MemoryFile {
        path,
        frontmatter: MemoryFrontmatter {
            name: name.clone(),
            description: Some(description.clone()),
            kind,
            created: Some(created),
            updated: Some(today),
            superseded_by,
            expires,
        },
        body: content,
    };
    file.save()?;
    let index_path = target_dir.join("MEMORY.md");
    let index_updated = ensure_index_entry(&index_path, &name, &description)?;
    Ok(json!({
        "status": "ok",
        "path": file.path.display().to_string(),
        "index_path": index_path.display().to_string(),
        "index_updated": index_updated,
        "replaced": previous.is_some(),
        "previous_description": previous.and_then(|p| p.frontmatter.description),
    }))
 }
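
The `created` / `updated` handling in `write_memory` can be reduced to a small pure function. This is a sketch only: `stamp_dates` is a hypothetical name, and it takes the previous `created` value and today's date as plain strings instead of loading a `MemoryFile`.

```rust
/// Returns `(created, updated)` for a write that happens on `today`.
/// `previous_created` is the `created` value of the file being replaced, if it
/// exists and has one; otherwise the write counts as the creation date.
/// (`stamp_dates` is a hypothetical helper for illustration.)
pub fn stamp_dates(previous_created: Option<&str>, today: &str) -> (String, String) {
    let created = previous_created.unwrap_or(today).to_string();
    // `updated` is always refreshed, even when the content did not change.
    (created, today.to_string())
}
```

Unlike `created`, the optional `superseded_by` and `expires` fields are taken only from the current call's arguments, which matches the tool description: omitting them on overwrite clears any previous value.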
 fn rename_memory(store: &MemoryStore, cwd: &Path, args: &Value) -> Result<Value> {
    let name = arg_str(args, "name")?;
    let new_name = arg_str(args, "new_name")?;
    let scope = arg_str(args, "scope")?;
    if new_name.is_empty()
        || !new_name
            .chars()
            .all(|c| c.is_alphanumeric() || c == '-' || c == '_')
    {
        bail!(
            "invalid new_name '{}': use a kebab-case slug (alphanumeric, hyphens, underscores)",
            new_name
        );
    }
    if name == new_name {
        bail!("new_name matches the current name");
    }
    let target_dir = scope_dir(store, cwd, &scope)?;
    let files = store.list_files()?;
    let file = files
        .iter()
        .find(|f| f.path.starts_with(&target_dir) && f.frontmatter.name == name)
        .ok_or_else(|| anyhow!("memory file '{}' not found in scope '{}'", name, scope))?
        .clone();
    if target_dir.join(format!("{new_name}.md")).exists()
        || files
            .iter()
            .any(|f| f.path.starts_with(&target_dir) && f.frontmatter.name == new_name)
    {
        bail!(
            "memory file '{}' already exists in scope '{}'",
            new_name,
            scope
        );
    }
    let needle = format!("[[{name}]]");
    let replacement = format!("[[{new_name}]]");
    let mut renamed = file.clone();
    renamed.path = target_dir.join(format!("{new_name}.md"));
    renamed.frontmatter.name = new_name.clone();
    renamed.frontmatter.updated = Some(today_string());
    renamed.body = renamed.body.replace(&needle, &replacement);
    renamed.save()?;
    fs::remove_file(&file.path).with_context(|| format!("remove {}", file.path.display()))?;
    let mut rewritten = Vec::new();
    for f in &files {
        if f.path == file.path || !f.body.contains(&needle) {
            continue;
        }
        let mut updated = f.clone();
        updated.body = updated.body.replace(&needle, &replacement);
        updated.save()?;
        rewritten.push(f.frontmatter.name.clone());
    }
    // Own-scope index: rewrite the wikilink, drop any leftover references to the
    // old name, and guarantee the new name is present.
    let index_path = target_dir.join("MEMORY.md");
    if let Ok(existing) = fs::read_to_string(&index_path)
        && existing.contains(&needle)
    {
        fs::write(&index_path, existing.replace(&needle, &replacement))?;
    }
    remove_index_entry(&index_path, &name)?;
    let description = renamed.frontmatter.description.clone().unwrap_or_default();
    ensure_index_entry(&index_path, &new_name, &description)?;
    // Other indexes (other scope's MEMORY.md, lite COYOTE.md): rewrite wikilinks only.
    for other_index in other_index_paths(store, &target_dir) {
        if let Ok(existing) = fs::read_to_string(&other_index)
            && existing.contains(&needle)
        {
            fs::write(&other_index, existing.replace(&needle, &replacement))?;
        }
    }
    Ok(json!({
        "status": "ok",
        "old_path": file.path.display().to_string(),
        "new_path": renamed.path.display().to_string(),
        "rewritten_references": rewritten,
    }))
 }
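
Two pieces of `rename_memory` are easy to check in isolation: the slug rule applied to `new_name` and the exact-match wikilink rewrite. The helpers below are a sketch with hypothetical names (`is_valid_slug`, `rewrite_wikilinks`); the character rule and the `[[name]]` replacement are taken from the function above.

```rust
/// Same character rule `rename_memory` applies to `new_name`.
/// (`is_valid_slug` is a hypothetical name for illustration.)
pub fn is_valid_slug(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_alphanumeric() || c == '-' || c == '_')
}

/// Rewrites exact `[[old]]` links to `[[new]]`, the same substitution the
/// rename applies to file bodies and index files.
/// (`rewrite_wikilinks` is a hypothetical name for illustration.)
pub fn rewrite_wikilinks(body: &str, old: &str, new: &str) -> String {
    body.replace(&format!("[[{old}]]"), &format!("[[{new}]]"))
}
```

Because the needle includes the closing `]]`, a link such as `[[auth-flow]]` is left alone when `auth` is renamed. Note that `char::is_alphanumeric` is Unicode-aware and accepts uppercase letters, so the check is looser than strict kebab-case.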
 fn delete_memory(store: &MemoryStore, cwd: &Path, args: &Value) -> Result<Value> {
    let name = arg_str(args, "name")?;
    let scope = arg_str(args, "scope")?;
    let target_dir = scope_dir(store, cwd, &scope)?;
    let files = store.list_files()?;
    let file = files
        .iter()
        .find(|f| f.path.starts_with(&target_dir) && f.frontmatter.name == name)
        .ok_or_else(|| anyhow!("memory file '{}' not found in scope '{}'", name, scope))?;
    let deleted_path = file.path.clone();
    fs::remove_file(&deleted_path).with_context(|| format!("delete {}", deleted_path.display()))?;
    let index_path = target_dir.join("MEMORY.md");
    let index_updated = remove_index_entry(&index_path, &name)?;
    let dangling: Vec<String> = files
        .iter()
        .filter(|f| f.path != deleted_path && extract_wikilinks(&f.body).iter().any(|l| l == &name))
        .map(|f| f.frontmatter.name.clone())
        .collect();
    Ok(json!({
        "status": "ok",
        "deleted_path": deleted_path.display().to_string(),
        "index_updated": index_updated,
        "dangling_references": dangling,
    }))
 }
 fn scope_dir(store: &MemoryStore, cwd: &Path, scope: &str) -> Result<PathBuf> {
    match scope {
        "global" => Ok(paths::global_memory_dir()),
        "workspace" => workspace_write_dir(store, cwd),
        other => bail!("unknown scope '{}': use 'global' or 'workspace'", other),
    }
 }
 fn today_string() -> String {
    Local::now().format("%Y-%m-%d").to_string()
 }
 fn other_index_paths(store: &MemoryStore, own_dir: &Path) -> Vec<PathBuf> {
    let mut out = Vec::new();
    let global_index = store.global_dir.join("MEMORY.md");
    if store.global_dir.as_path() != own_dir && global_index.exists() {
        out.push(global_index);
    }
    match &store.workspace {
        Some(WorkspaceMemory::Structured { dir, .. }) => {
            let index = dir.join("MEMORY.md");
            if dir.as_path() != own_dir && index.exists() {
                out.push(index);
            }
        }
        Some(WorkspaceMemory::Lite { file, .. }) if file.exists() => {
            out.push(file.clone());
        }
        _ => {}
    }
    out
 }
 fn write_memory_index(target_dir: &Path, content: &str) -> Result<PathBuf> {
    fs::create_dir_all(target_dir)?;
    let index_path = target_dir.join("MEMORY.md");
    fs::write(&index_path, content)?;
    Ok(index_path)
 }
 fn ensure_index_entry(index_path: &Path, name: &str, description: &str) -> Result<bool> {
    let existing = fs::read_to_string(index_path).unwrap_or_default();
-    let already_referenced =
-        existing.contains(&format!("[[{name}]]")) || existing.contains(&format!("{name}.md"));
-    if already_referenced {
+    if index_references(&existing, name) {
        return Ok(false);
    }
@@ -297,6 +584,40 @@ fn ensure_index_entry(index_path: &Path, name: &str, description: &str) -> Resul
    }
    fs::write(index_path, new_content)?;
    Ok(true)
 }
 fn line_references(line: &str, name: &str) -> bool {
    let file_name = format!("{name}.md");
    line.split(|c: char| !(c.is_alphanumeric() || c == '-' || c == '_' || c == '.'))
        .any(|token| token == file_name || token.trim_matches('.') == name)
 }
 fn index_references(index: &str, name: &str) -> bool {
    index.lines().any(|line| line_references(line, name))
 }
 fn remove_index_entry(index_path: &Path, name: &str) -> Result<bool> {
    let Ok(existing) = fs::read_to_string(index_path) else {
        return Ok(false);
    };
    let kept: Vec<&str> = existing
        .lines()
        .filter(|line| !line_references(line, name))
        .collect();
    let mut new_content = kept.join("\n");
    if existing.ends_with('\n') && !new_content.is_empty() {
        new_content.push('\n');
    }
    if new_content == existing {
        return Ok(false);
    }
    fs::write(index_path, new_content)?;
    Ok(true)
 }
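
The token matching and line removal above can be exercised without touching the filesystem. In this sketch `line_references` is copied from the diff, while `remove_entry` is a hypothetical in-memory variant of `remove_index_entry` that returns the new index text instead of writing it.

```rust
/// Copied from the diff: true when `line` mentions `name` as a whole token,
/// either bare, as `name.md`, or with surrounding dots.
fn line_references(line: &str, name: &str) -> bool {
    let file_name = format!("{name}.md");
    line.split(|c: char| !(c.is_alphanumeric() || c == '-' || c == '_' || c == '.'))
        .any(|token| token == file_name || token.trim_matches('.') == name)
}

/// In-memory variant of `remove_index_entry` (hypothetical helper):
/// returns the new index text, or `None` when no line referenced `name`.
pub fn remove_entry(index: &str, name: &str) -> Option<String> {
    let kept: Vec<&str> = index
        .lines()
        .filter(|line| !line_references(line, name))
        .collect();
    let mut new_content = kept.join("\n");
    // Preserve the trailing newline of the original index.
    if index.ends_with('\n') && !new_content.is_empty() {
        new_content.push('\n');
    }
    (new_content != index).then_some(new_content)
}
```

Since whole lines are dropped, any index line that mentions the name as a token is removed, including prose lines that are not `[[name]]: description` entries.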
@@ -350,9 +671,11 @@ fn workspace_label(w: &WorkspaceMemory) -> Value {
 fn lint_memory(store: &MemoryStore) -> Result<Value> {
    let files = store.list_files()?;
    let names: HashSet<&str> = files.iter().map(|f| f.frontmatter.name.as_str()).collect();
    let today = today_string();
    let mut oversized = Vec::new();
    let mut broken_links = Vec::new();
    let mut stale = Vec::new();
    for f in &files {
        if f.char_len() > PER_FILE_SOFT_CAP {
            oversized.push(json!({"name": &f.frontmatter.name, "chars": f.char_len()}));
@@ -362,16 +685,54 @@ fn lint_memory(store: &MemoryStore) -> Result<Value> {
                broken_links.push(json!({"from": &f.frontmatter.name, "to": link}));
            }
        }
        if let Some(target) = &f.frontmatter.superseded_by {
            stale.push(json!({
                "name": &f.frontmatter.name,
                "reason": "superseded",
                "superseded_by": target,
                "target_exists": names.contains(target.as_str()),
            }));
        }
        if let Some(expires) = &f.frontmatter.expires
            && expires.as_str() < today.as_str()
        {
            stale.push(json!({
                "name": &f.frontmatter.name,
                "reason": "expired",
                "expires": expires,
            }));
        }
    }
-    let index_content = store
-        .load_global_index()?
-        .or_else(|| store.load_workspace_index().ok().flatten())
+    let global_index = store.load_global_index()?.unwrap_or_default();
+    let workspace_index = store
+        .load_workspace_index()
+        .ok()
+        .flatten()
        .unwrap_or_default();
    let mut orphans = Vec::new();
    let mut description_drift = Vec::new();
    for f in &files {
-        if !index_content.contains(&f.frontmatter.name) {
+        let index = if f.path.starts_with(&store.global_dir) {
            &global_index
        } else {
            &workspace_index
        };
        if !index_references(index, &f.frontmatter.name) {
            orphans.push(f.frontmatter.name.clone());
        } else if let (Some(index_desc), Some(file_desc)) = (
            index_description(index, &f.frontmatter.name),
            f.frontmatter.description.as_deref(),
        ) && index_desc != file_desc
        {
            description_drift.push(json!({
                "name": &f.frontmatter.name,
                "index_description": index_desc,
                "file_description": file_desc,
            }));
        }
    }
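
The expiry check in `lint_memory` compares the two dates as strings. A minimal sketch of why that works, and where it stops working, assuming a hypothetical `is_expired` helper:

```rust
/// True when `expires` is strictly before `today`.
/// Both values are expected as zero-padded `YYYY-MM-DD` strings, for which
/// lexicographic order equals chronological order.
/// (`is_expired` is a hypothetical name for illustration.)
pub fn is_expired(expires: &str, today: &str) -> bool {
    expires < today
}
```

The comparison is strict, so a memory that expires today is not yet flagged. It also relies on zero padding: `2026-7-4` sorts after `2026-10-01` as a string, so a value written that way would not be reported as expired in October.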
@@ -380,13 +741,26 @@ fn lint_memory(store: &MemoryStore) -> Result<Value> {
        "oversized": oversized,
        "broken_wikilinks": broken_links,
        "orphans": orphans,
        "stale": stale,
        "description_drift": description_drift,
    }))
 }
 fn index_description(index: &str, name: &str) -> Option<String> {
    let marker = format!("[[{name}]]");
    index.lines().find_map(|line| {
        let pos = line.find(&marker)?;
        let rest = line[pos + marker.len()..].trim_start();
        let desc = rest.strip_prefix(':')?.trim();
        (!desc.is_empty()).then(|| desc.to_string())
    })
 }
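
The description-drift check can be illustrated on in-memory strings. In this sketch `index_description` is copied from the diff, and `drift` is a hypothetical helper that mirrors the comparison in `lint_memory`.

```rust
/// Copied from the diff: returns the description from the first index line of
/// the form `[[name]]: description`.
fn index_description(index: &str, name: &str) -> Option<String> {
    let marker = format!("[[{name}]]");
    index.lines().find_map(|line| {
        let pos = line.find(&marker)?;
        let rest = line[pos + marker.len()..].trim_start();
        let desc = rest.strip_prefix(':')?.trim();
        (!desc.is_empty()).then(|| desc.to_string())
    })
}

/// Hypothetical helper: `Some((index_desc, file_desc))` when both descriptions
/// exist and disagree, `None` otherwise.
pub fn drift(index: &str, name: &str, file_desc: Option<&str>) -> Option<(String, String)> {
    let index_desc = index_description(index, name)?;
    let file_desc = file_desc?;
    (index_desc != file_desc).then(|| (index_desc, file_desc.to_string()))
}
```

As in `lint_memory`, drift is reported only when both sides have a description; an index entry without a `: description` suffix, or a file without a `description` field, is skipped.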
 fn extract_wikilinks(body: &str) -> Vec<String> {
    let mut out = Vec::new();
    let bytes = body.as_bytes();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'['
            && bytes[i + 1] == b'['
@@ -676,4 +1050,305 @@ mod tests {
        let _ = fs::remove_dir_all(&root);
    }
    #[test]
    fn line_references_requires_exact_token_match() {
        assert!(line_references("- [[auth]]: description", "auth"));
        assert!(line_references("- auth.md is here", "auth"));
        assert!(line_references("- referenced", "referenced"));
        assert!(line_references("see auth.", "auth"));
        assert!(!line_references("- [[auth-flow]]: description", "auth"));
        assert!(!line_references("- oauth.md legacy", "auth"));
        assert!(!line_references("- preauth notes", "auth"));
    }
    #[test]
    fn remove_index_entry_drops_only_matching_lines() {
        let root = temp_root("index_remove");
        let index = root.join("MEMORY.md");
        fs::write(
            &index,
            "# Memory Index\n\n- [[keep]]: stays\n- [[gone]]: removed\n",
        )
        .unwrap();
        assert!(remove_index_entry(&index, "gone").unwrap());
        let content = fs::read_to_string(&index).unwrap();
        assert!(content.contains("[[keep]]"));
        assert!(!content.contains("[[gone]]"));
        assert!(!remove_index_entry(&index, "gone").unwrap());
        let _ = fs::remove_dir_all(&root);
    }
    #[test]
    fn lint_checks_orphans_against_own_scope_index() {
        let root = temp_root("lint_scopes");
        let global = root.join("global");
        fs::create_dir_all(&global).unwrap();
        fs::write(global.join("MEMORY.md"), "- [[global-note]]: g\n").unwrap();
        fs::write(
            global.join("global-note.md"),
            "---\nname: global-note\n---\ng\n",
        )
        .unwrap();
        let workspace = root.join("ws");
        let structured = workspace.join(".coyote").join("memory");
        fs::create_dir_all(&structured).unwrap();
        fs::write(structured.join("MEMORY.md"), "- [[ws-note]]: w\n").unwrap();
        fs::write(
            structured.join("ws-note.md"),
            "---\nname: ws-note\n---\nw\n",
        )
        .unwrap();
        let store = MemoryStore {
            global_dir: global,
            workspace: discover_workspace_memory(&workspace),
        };
        let report = lint_memory(&store).unwrap();
        assert!(
            report["orphans"].as_array().unwrap().is_empty(),
            "expected no orphans, got: {report}"
        );
        let _ = fs::remove_dir_all(&root);
    }
    #[test]
    fn lint_flags_stale_and_description_drift() {
        let root = temp_root("lint_stale");
        let workspace = root.join("ws");
        let structured = workspace.join(".coyote").join("memory");
        fs::create_dir_all(&structured).unwrap();
        fs::write(
            structured.join("MEMORY.md"),
            "- [[old-plan]]: old\n- [[bygone]]: e\n- [[drifted]]: index says this\n",
        )
        .unwrap();
        fs::write(
            structured.join("old-plan.md"),
            "---\nname: old-plan\nsuperseded_by: new-plan\n---\nx\n",
        )
        .unwrap();
        fs::write(
            structured.join("bygone.md"),
            "---\nname: bygone\nexpires: 2000-01-01\n---\nx\n",
        )
        .unwrap();
        fs::write(
            structured.join("drifted.md"),
            "---\nname: drifted\ndescription: file says that\n---\nx\n",
        )
        .unwrap();
        let store = MemoryStore {
            global_dir: root.join("nonexistent_global"),
            workspace: discover_workspace_memory(&workspace),
        };
        let report = lint_memory(&store).unwrap();
        let stale = report["stale"].as_array().unwrap();
        let reasons: Vec<(&str, &str)> = stale
            .iter()
            .map(|v| (v["name"].as_str().unwrap(), v["reason"].as_str().unwrap()))
            .collect();
        assert!(reasons.contains(&("old-plan", "superseded")));
        assert!(reasons.contains(&("bygone", "expired")));
        let superseded = stale.iter().find(|v| v["name"] == "old-plan").unwrap();
        assert_eq!(superseded["target_exists"], false);
        let drift = report["description_drift"].as_array().unwrap();
        assert_eq!(drift.len(), 1);
        assert_eq!(drift[0]["name"], "drifted");
        let _ = fs::remove_dir_all(&root);
    }
    #[test]
    fn delete_memory_removes_file_index_entry_and_reports_dangling() {
        let root = temp_root("delete");
        let workspace = root.join("ws");
        let structured = workspace.join(".coyote").join("memory");
        fs::create_dir_all(&structured).unwrap();
        fs::write(
            structured.join("MEMORY.md"),
            "# Memory Index\n\n- [[doomed]]: bye\n- [[linker]]: links\n",
        )
        .unwrap();
        fs::write(
            structured.join("doomed.md"),
            "---\nname: doomed\n---\nbye\n",
        )
        .unwrap();
        fs::write(
            structured.join("linker.md"),
            "---\nname: linker\n---\nsee [[doomed]]\n",
        )
        .unwrap();
        let store = MemoryStore {
            global_dir: root.join("g"),
            workspace: discover_workspace_memory(&workspace),
        };
        let args = json!({"name": "doomed", "scope": "workspace"});
        let result = delete_memory(&store, &workspace, &args).unwrap();
        assert_eq!(result["status"], "ok");
        assert_eq!(result["index_updated"], true);
        assert!(!structured.join("doomed.md").exists());
        let index = fs::read_to_string(structured.join("MEMORY.md")).unwrap();
        assert!(!index.contains("doomed"));
        assert!(index.contains("[[linker]]"));
        assert_eq!(
            result["dangling_references"].as_array().unwrap(),
            &vec![json!("linker")]
        );
        let _ = fs::remove_dir_all(&root);
    }
    #[test]
    fn rename_memory_moves_file_and_rewrites_references() {
        let root = temp_root("rename");
        let workspace = root.join("ws");
        let structured = workspace.join(".coyote").join("memory");
        fs::create_dir_all(&structured).unwrap();
        fs::write(
            structured.join("MEMORY.md"),
            "# Memory Index\n\n- [[old-name]]: the plan\n- [[linker]]: links\n",
        )
        .unwrap();
        fs::write(
            structured.join("old-name.md"),
            "---\nname: old-name\ndescription: the plan\n---\nself link [[old-name]]\n",
        )
        .unwrap();
        fs::write(
            structured.join("linker.md"),
            "---\nname: linker\n---\nsee [[old-name]] and [[old-name-extended]]\n",
        )
        .unwrap();
        let store = MemoryStore {
            global_dir: root.join("g"),
            workspace: discover_workspace_memory(&workspace),
        };
        let args = json!({"name": "old-name", "new_name": "new-name", "scope": "workspace"});
        let result = rename_memory(&store, &workspace, &args).unwrap();
        assert_eq!(result["status"], "ok");
        assert!(!structured.join("old-name.md").exists());
        let renamed = MemoryFile::load(&structured.join("new-name.md")).unwrap();
        assert_eq!(renamed.frontmatter.name, "new-name");
        assert!(renamed.body.contains("[[new-name]]"));
        let linker = fs::read_to_string(structured.join("linker.md")).unwrap();
        assert!(linker.contains("[[new-name]]"));
        assert!(
            linker.contains("[[old-name-extended]]"),
            "unrelated links must be untouched: {linker}"
        );
        let index = fs::read_to_string(structured.join("MEMORY.md")).unwrap();
        assert!(index.contains("- [[new-name]]: the plan"));
        assert!(!index.contains("[[old-name]]"));
        assert!(index.contains("[[linker]]"));
        assert_eq!(
            result["rewritten_references"].as_array().unwrap(),
            &vec![json!("linker")]
        );
        let _ = fs::remove_dir_all(&root);
    }
    #[test]
    fn rename_memory_rejects_collisions_and_bad_slugs() {
        let root = temp_root("rename_guard");
        let workspace = root.join("ws");
        let structured = workspace.join(".coyote").join("memory");
        fs::create_dir_all(&structured).unwrap();
        fs::write(structured.join("MEMORY.md"), "- [[a]]: a\n- [[b]]: b\n").unwrap();
        fs::write(structured.join("a.md"), "---\nname: a\n---\nx\n").unwrap();
        fs::write(structured.join("b.md"), "---\nname: b\n---\nx\n").unwrap();
        let store = MemoryStore {
            global_dir: root.join("g"),
            workspace: discover_workspace_memory(&workspace),
        };
        let collision = json!({"name": "a", "new_name": "b", "scope": "workspace"});
        let err = rename_memory(&store, &workspace, &collision).unwrap_err();
        assert!(err.to_string().contains("already exists"));
        let bad_slug = json!({"name": "a", "new_name": "bad name!", "scope": "workspace"});
        let err = rename_memory(&store, &workspace, &bad_slug).unwrap_err();
        assert!(err.to_string().contains("invalid new_name"));
        let _ = fs::remove_dir_all(&root);
    }
    #[test]
    fn write_memory_stamps_timestamps_and_reports_replacement() {
        let root = temp_root("write_stamps");
        let workspace = root.join("ws");
        let structured = workspace.join(".coyote").join("memory");
        fs::create_dir_all(&structured).unwrap();
        fs::write(structured.join("MEMORY.md"), "# Memory Index\n").unwrap();
        let store = MemoryStore {
            global_dir: root.join("g"),
            workspace: discover_workspace_memory(&workspace),
        };
        let first = json!({
            "name": "fact",
            "description": "first version",
            "content": "body v1",
            "scope": "workspace",
            "expires": "2099-01-01",
        });
        let before = today_string();
        let result = write_memory(&store, &workspace, &first).unwrap();
        let after = today_string();
        assert_eq!(result["replaced"], false);
        assert_eq!(result["previous_description"], Value::Null);
        let saved = MemoryFile::load(&structured.join("fact.md")).unwrap();
        let created = saved.frontmatter.created.clone().expect("created stamped");
        assert!(
            created == before || created == after,
            "created '{created}' should be stamped with today's date"
        );
        assert_eq!(saved.frontmatter.updated, Some(created.clone()));
        assert_eq!(saved.frontmatter.expires.as_deref(), Some("2099-01-01"));
        assert_eq!(saved.frontmatter.superseded_by, None);
        let second = json!({
            "name": "fact",
            "description": "second version",
            "content": "body v2",
            "scope": "workspace",
        });
        let result = write_memory(&store, &workspace, &second).unwrap();
        assert_eq!(result["replaced"], true);
        assert_eq!(result["previous_description"], "first version");
        let saved = MemoryFile::load(&structured.join("fact.md")).unwrap();
        assert_eq!(
            saved.frontmatter.created,
            Some(created),
            "creation date must be preserved across overwrites"
        );
        assert!(saved.frontmatter.updated.is_some());
        assert_eq!(saved.frontmatter.expires, None);
        let _ = fs::remove_dir_all(&root);
    }
 }
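The `line_references_requires_exact_token_match` test above pins down the matching rule by example: a name counts as referenced only when it appears as a whole token, so `auth` matches `[[auth]]`, `auth.md`, and `see auth.` but not `auth-flow`, `oauth`, or `preauth`. The body of `line_references` is not part of this hunk, so the following is only a sketch of one implementation that satisfies those seven assertions. The helper `is_slug_char` and the set of characters treated as part of a slug are assumptions, not the project's actual code.

```rust
/// Hypothetical helper: characters assumed to be valid inside a memory slug.
fn is_slug_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '-' || c == '_'
}

/// Sketch of a token-boundary match: `name` must not be directly preceded
/// or followed by another slug character.
fn line_references(line: &str, name: &str) -> bool {
    if name.is_empty() {
        return false;
    }
    line.match_indices(name).any(|(start, matched)| {
        let end = start + matched.len();
        // Character immediately before the match, if any.
        let before_ok = line[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_slug_char(c));
        // Character immediately after the match, if any.
        let after_ok = line[end..]
            .chars()
            .next()
            .map_or(true, |c| !is_slug_char(c));
        before_ok && after_ok
    })
}
```

Treating `.` as a boundary is what lets `auth.md` and a sentence-final `auth.` match, while `-` counting as a slug character is what keeps `[[auth-flow]]` from matching `auth`.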
@@ -1691,6 +1691,33 @@ mod tests {
        assert!(f.declarations().is_empty());
    }
    #[test]
    fn bundled_bash_tools_generate_declarations() {
        let tools_dir =
            std::path::Path::new(env!("CARGO_MANIFEST_DIR")).join("assets/functions/tools");
        let mut checked = Vec::new();
        for entry in std::fs::read_dir(&tools_dir).unwrap() {
            let path = entry.unwrap().path();
            if path.extension().and_then(OsStr::to_str) != Some("sh") {
                continue;
            }
            let name = path.file_stem().unwrap().to_string_lossy().to_string();
            let declarations = Functions::generate_declarations(&path)
                .unwrap_or_else(|e| panic!("bundled tool '{name}' failed to parse: {e}"));
            assert!(
                !declarations.is_empty(),
                "bundled tool '{name}' produced no function declaration"
            );
            checked.push(name);
        }
        for expected in ["fs_grep", "ast_grep", "execute_command"] {
            assert!(
                checked.iter().any(|n| n == expected),
                "expected bundled tool '{expected}' to be checked; found {checked:?}"
            );
        }
    }
    #[test]
    fn functions_append_todo_adds_declarations() {
        let mut f = Functions::default();
@@ -1,4 +1,4 @@
-use crate::client::oauth::{OAuthProvider, load_oauth_tokens, run_oauth_flow};
+use crate::client::oauth::{OAuthProvider, TokenRequestFormat, load_oauth_tokens, run_oauth_flow};
 use crate::config::paths;
 use anyhow::{Context, Result, anyhow};
 use chrono::Utc;
@@ -63,6 +63,10 @@ impl OAuthProvider for McpOAuthProvider {
        &self.scopes
    }
    fn token_request_format(&self) -> TokenRequestFormat {
        TokenRequestFormat::FormUrlEncoded
    }
    fn uses_localhost_redirect(&self) -> bool {
        false
    }
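The hunk above makes the MCP provider return `TokenRequestFormat::FormUrlEncoded`, which matches the OAuth 2.0 requirement (RFC 6749, section 4.1.3) that token requests be sent as `application/x-www-form-urlencoded`. The definition of `TokenRequestFormat` and the code that consumes it are not shown in this diff, so the following is a standalone sketch of what such a switch could look like. The `Json` variant name and the helpers `content_type`, `form_encode_component`, and `form_body` are assumptions for illustration.

```rust
/// `FormUrlEncoded` appears in the diff; the `Json` variant name is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenRequestFormat {
    Json,
    FormUrlEncoded,
}

impl TokenRequestFormat {
    /// Hypothetical helper: the Content-Type header for each format.
    fn content_type(self) -> &'static str {
        match self {
            TokenRequestFormat::Json => "application/json",
            TokenRequestFormat::FormUrlEncoded => "application/x-www-form-urlencoded",
        }
    }
}

/// Encode one key or value: ASCII alphanumerics and `*-._` pass through,
/// a space becomes `+`, and every other byte becomes `%XX`.
fn form_encode_component(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'*' | b'-' | b'.' | b'_' => {
                out.push(byte as char)
            }
            b' ' => out.push('+'),
            other => out.push_str(&format!("%{other:02X}")),
        }
    }
    out
}

/// Join encoded `key=value` pairs with `&` to form the request body.
fn form_body(params: &[(&str, &str)]) -> String {
    params
        .iter()
        .map(|(k, v)| format!("{}={}", form_encode_component(k), form_encode_component(v)))
        .collect::<Vec<_>>()
        .join("&")
}
```

In a real client an HTTP library would normally handle this encoding; the hand-rolled encoder here only keeps the sketch free of dependencies.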
@@ -6,7 +6,10 @@ use self::completer::ReplCompleter;
 use self::highlighter::ReplHighlighter;
 use self::prompt::ReplPrompt;
-use crate::client::{call_chat_completions, call_chat_completions_streaming, init_client, oauth};
+use crate::client::{
+    Message, MessageRole, call_chat_completions, call_chat_completions_streaming, init_client,
+    oauth,
+};
 use crate::config::{
    AgentVariables, AppConfig, AssertState, Input, LastMessage, RequestContext, StateFlags,
    macro_execute,
@@ -29,9 +32,9 @@ use log::warn;
 use parking_lot::RwLock;
 use reedline::CursorConfig;
 use reedline::{
-    ColumnarMenu, EditCommand, EditMode, Emacs, KeyCode, KeyModifiers, Keybindings, Reedline,
-    ReedlineEvent, ReedlineMenu, ValidationResult, Validator, Vi, default_emacs_keybindings,
-    default_vi_insert_keybindings, default_vi_normal_keybindings,
+    ColumnarMenu, EditCommand, EditMode, Emacs, FileBackedHistory, KeyCode, KeyModifiers,
+    Keybindings, Reedline, ReedlineEvent, ReedlineMenu, ValidationResult, Validator, Vi,
+    default_emacs_keybindings, default_vi_insert_keybindings, default_vi_normal_keybindings,
 };
 use reedline::{MenuBuilder, Signal};
 use std::sync::LazyLock;
@@ -318,6 +321,58 @@ Type ".help" for additional help.
            }
        }
        {
            let (messages_snapshot, compressed_count) = {
                let ctx = self.ctx.read();
                if let Some(session) = &ctx.session {
                    let msgs: Vec<Message> = session
                        .messages()
                        .iter()
                        .filter(|m| !m.role.is_system())
                        .cloned()
                        .collect();
                    let compressed = session.compressed_messages().len();
                    (msgs, compressed)
                } else {
                    (vec![], 0)
                }
            };
            if !messages_snapshot.is_empty() || compressed_count > 0 {
                let app = Arc::clone(&self.ctx.read().app.config);
                if compressed_count > 0 {
                    println!(
                        "{}",
                        dimmed_text(&format!(
                            "({compressed_count} earlier messages not shown; compressed for context)"
                        ))
                    );
                    println!();
                }
                for message in &messages_snapshot {
                    match message.role {
                        MessageRole::User => {
                            if let Some(text) = message.content.as_text() {
                                println!("{}", dimmed_text("You:"));
                                println!("{text}");
                                println!();
                            }
                        }
                        MessageRole::Assistant => {
                            if let Some(text) = message.content.as_text() {
                                app.print_markdown(text)?;
                                println!();
                            }
                        }
                        _ => {}
                    }
                }
                println!("{}", dimmed_text("─── ↑ previous conversation ↑ ───"));
                println!();
            }
        }
        loop {
            if self.abort_signal.aborted_ctrld() {
                break;
@@ -393,6 +448,14 @@ Type ".help" for additional help.
            editor = editor.with_buffer_editor(command, temp_file);
        }
        if app.save_shell_history {
            let ctx = ctx.read();
            let history_path = paths::repl_history_file(&ctx.session);
            if let Ok(history) = FileBackedHistory::with_file(1000, history_path) {
                editor = editor.with_history(Box::new(history));
            }
        }
        Ok(editor)
    }
@@ -684,6 +747,46 @@ pub async fn run_repl_command(
                        session.set_autonaming(false);
                    }
                }
                if let Some(session) = &ctx.session {
                    let messages_snapshot: Vec<Message> = session
                        .messages()
                        .iter()
                        .filter(|m| !m.role.is_system())
                        .cloned()
                        .collect();
                    let compressed_count = session.compressed_messages().len();
                    if !messages_snapshot.is_empty() || compressed_count > 0 {
                        if compressed_count > 0 {
                            println!(
                                "{}",
                                dimmed_text(&format!(
                                    "({compressed_count} earlier messages not shown; compressed for context)"
                                ))
                            );
                            println!();
                        }
                        for message in &messages_snapshot {
                            match message.role {
                                MessageRole::User => {
                                    if let Some(text) = message.content.as_text() {
                                        println!("{}", dimmed_text("You:"));
                                        println!("{text}");
                                        println!();
                                    }
                                }
                                MessageRole::Assistant => {
                                    if let Some(text) = message.content.as_text() {
                                        app.print_markdown(text)?;
                                        println!();
                                    }
                                }
                                _ => {}
                            }
                        }
                        println!("{}", dimmed_text("─── ↑ previous conversation ↑ ───"));
                        println!();
                    }
                }
            }
            ".install" => {
                let trimmed = args.map(str::trim).unwrap_or("");
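The replay logic appears twice in this diff, once at REPL startup and once in `run_repl_command`, with the same steps: drop system messages, print a notice when earlier messages were compressed, echo user and assistant text, then print the "previous conversation" marker. As a sketch of how those steps could live in one place, the helper below builds the output lines instead of printing them, which also makes the behavior testable. `Role`, `Msg`, and `replay_lines` are hypothetical stand-ins for the project's `MessageRole`, `Message`, and rendering calls; markdown rendering and dimming are omitted.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Clone, Debug)]
struct Msg {
    role: Role,
    /// `None` models content that has no plain-text form.
    text: Option<String>,
}

/// Build the lines a session replay would show, in order.
fn replay_lines(messages: &[Msg], compressed_count: usize) -> Vec<String> {
    let visible: Vec<&Msg> = messages
        .iter()
        .filter(|m| m.role != Role::System)
        .collect();
    let mut out = Vec::new();
    // Nothing to replay: emit no marker at all.
    if visible.is_empty() && compressed_count == 0 {
        return out;
    }
    if compressed_count > 0 {
        out.push(format!(
            "({compressed_count} earlier messages not shown; compressed for context)"
        ));
        out.push(String::new());
    }
    for message in visible {
        let Some(text) = message.text.as_deref() else {
            continue;
        };
        match message.role {
            Role::User => {
                out.push("You:".to_string());
                out.push(text.to_string());
                out.push(String::new());
            }
            Role::Assistant => {
                out.push(text.to_string());
                out.push(String::new());
            }
            // Tool and any other roles are not replayed.
            _ => {}
        }
    }
    out.push("─── ↑ previous conversation ↑ ───".to_string());
    out.push(String::new());
    out
}
```

Both call sites could then iterate over the returned lines, which would also keep the wording of the compressed-messages notice identical in the two places.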
Author	SHA1	Message	Date
Dark-Alex-17	d4a6a2fb34	build: Installed ast_grep in the sandbox kit definition	2026-07-04 13:14:22 -06:00
Dark-Alex-17	8f667886c8	docs: Updated the README to mention the installation of ast-grep	2026-07-04 13:14:08 -06:00
Dark-Alex-17	898bac3c69	docs: added the ast_grep tool to the config.example.yaml	2026-07-04 13:13:16 -06:00
Dark-Alex-17	fc0b2ada7e	feat: Implemented durable state for sisyphus	2026-07-04 13:02:50 -06:00
Dark-Alex-17	09cdb40420	feat: Installed ast-grep for the explore agent to use for better code exploration	2026-07-04 12:59:05 -06:00
Dark-Alex-17	9d2e936e7f	feat: Created the step-runner graph agent for more deterministic coding workflows to produce even more reliable and higher-quality results	2026-07-04 12:50:37 -06:00
Dark-Alex-17	159afbbc06	feat: Improved oracle and sisyphus agents with skill integrations for the new skills	2026-07-04 12:34:09 -06:00
Dark-Alex-17	428d544277	feat: Created new sisyphus family skills to improve performance	2026-07-04 12:28:42 -06:00
Dark-Alex-17	531bdfab7f	feat: Created new diagnostic role and skill for use in other contexts	2026-07-04 12:28:24 -06:00
Dark-Alex-17	08f6ea5e6c	feat: Added new memory functions for deleting and renaming memory files, as well as new lints for memory expiration dates and staleness of memories to improve the memory system	2026-07-03 22:30:08 -06:00
Dark-Alex-17	ede0f75a89	feat: Created a new iwe skill and installed the iwe MCP server for utilizing large knowledgebases	2026-07-03 22:04:16 -06:00
Dark-Alex-17	2ec2aec4c0	style: updated the previous conversation marker a tad	2026-07-02 16:49:38 -06:00
Dark-Alex-17	c2cb4ac433	feat: Session-specific, file-backed history in the REPL	2026-07-02 16:44:55 -06:00
Dark-Alex-17	605a9170b0	feat: Replay session output when a user re-enters a session so all output can be seen again	2026-07-02 16:35:10 -06:00
Dark-Alex-17	385bd3eda2	fix: Overrode the default JSON content-type for MCP OAuth so its properly application/x-www-form-urlencoded	2026-07-02 15:53:29 -06:00