15 Commits

Author SHA1 Message Date
Dark-Alex-17 d4a6a2fb34 build: Installed ast_grep in the sandbox kit definition
CI / All (macos-latest) (push) Waiting to run
CI / All (windows-latest) (push) Waiting to run
CI / All (ubuntu-latest) (push) Failing after 26s
2026-07-04 13:14:22 -06:00
Dark-Alex-17 8f667886c8 docs: Updated the README to mention the installation of ast-grep 2026-07-04 13:14:08 -06:00
Dark-Alex-17 898bac3c69 docs: added the ast_grep tool to the config.example.yaml 2026-07-04 13:13:16 -06:00
Dark-Alex-17 fc0b2ada7e feat: Implemented durable state for sisyphus 2026-07-04 13:02:50 -06:00
Dark-Alex-17 09cdb40420 feat: Installed ast-grep for the explore agent to use for better code exploration 2026-07-04 12:59:05 -06:00
Dark-Alex-17 9d2e936e7f feat: Created the step-runner graph agent for more deterministic coding workflows to produce even more reliable and higher-quality results 2026-07-04 12:50:37 -06:00
Dark-Alex-17 159afbbc06 feat: Improved oracle and sisyphus agents with skill integrations for the new skills 2026-07-04 12:34:09 -06:00
Dark-Alex-17 428d544277 feat: Created new sisyphus family skills to improve performance 2026-07-04 12:28:42 -06:00
Dark-Alex-17 531bdfab7f feat: Created new diagnostic role and skill for use in other contexts 2026-07-04 12:28:24 -06:00
Dark-Alex-17 08f6ea5e6c feat: Added new memory functions for deleting and renaming memory files, as well as new lints for memory expiration dates and staleness of memories to improve the memory system
CI / All (macos-latest) (push) Waiting to run
CI / All (windows-latest) (push) Waiting to run
CI / All (ubuntu-latest) (push) Failing after 28s
2026-07-03 22:30:08 -06:00
Dark-Alex-17 ede0f75a89 feat: Created a new iwe skill and installed the iwe MCP server for utilizing large knowledgebases 2026-07-03 22:04:16 -06:00
Dark-Alex-17 2ec2aec4c0 style: updated the previous conversation marker a tad
CI / All (ubuntu-latest) (push) Failing after 26s
CI / All (macos-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-07-02 16:49:38 -06:00
Dark-Alex-17 c2cb4ac433 feat: Session-specific, file-backed history in the REPL
CI / All (ubuntu-latest) (push) Failing after 25s
CI / All (macos-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-07-02 16:44:55 -06:00
Dark-Alex-17 605a9170b0 feat: Replay session output when a user re-enters a session so all output can be seen again 2026-07-02 16:35:10 -06:00
Dark-Alex-17 385bd3eda2 fix: Overrode the default JSON content-type for MCP OAuth so it's properly application/x-www-form-urlencoded
CI / All (ubuntu-latest) (push) Failing after 26s
CI / All (macos-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-07-02 15:53:29 -06:00
41 changed files with 3006 additions and 110 deletions
+8
@@ -59,6 +59,14 @@ Coyote requires the following tools to be installed on your system:
* [docker](https://docs.docker.com/engine/install/)
* [uv](https://docs.astral.sh/uv/getting-started/installation/)
* `curl -LsSf https://astral.sh/uv/install.sh | sh`
* [iwe](https://github.com/iwe-org/iwe) (`iwec`, for the built-in `iwe` MCP server that navigates large markdown knowledgebases)
* **Homebrew:** `brew tap iwe-org/iwe && brew install iwe`
* **Cargo:** `cargo install iwec`
* [ast-grep](https://ast-grep.github.io/) (for the built-in `ast_grep` structural code search tool, used by the `explore` agent)
* **Homebrew:** `brew install ast-grep`
* **Cargo:** `cargo install ast-grep --locked`
* **npm:** `npm i -g @ast-grep/cli`
* Optional: if `ast-grep` is not installed, the `ast_grep` tool reports it and agents fall back to `fs_grep`
These tools are used to provide various functionalities within Coyote, such as document processing, JSON manipulation,
etc., and they are used within agents and tools.
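The optional-dependency behaviour described for `ast-grep` can be sketched as a small PATH check. This is an illustrative sketch only, not Coyote's implementation: the function name `pick_search_tool` is hypothetical, and it assumes the binary is discoverable on `PATH` under the name `ast-grep`.

```python
import shutil
from typing import Callable, Optional


def pick_search_tool(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the search tool an agent would use.

    Assumption: ast-grep is installed as a binary named `ast-grep` on PATH.
    `which` is injectable so the decision can be exercised without the binary.
    """
    # Structural search when the binary is present, plain text search otherwise.
    return "ast_grep" if which("ast-grep") else "fs_grep"


print(pick_search_tool())
```

Injecting `which` keeps the fallback decision testable on machines where ast-grep is not installed.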
+5 -1
@@ -1,6 +1,6 @@
name: explore
description: Fast codebase exploration agent - finds patterns, structures, and relevant files. Designed to be fanned out 2-5 in parallel by orchestrators.
-version: 3.0.0
+version: 3.1.0
skills_enabled: true
enabled_skills:
@@ -19,6 +19,7 @@ global_tools:
- fs_grep.sh
- fs_glob.sh
- fs_ls.sh
- ast_grep.sh
instructions: |
You are a codebase explorer. Your job: Search, find, report. Nothing else.
@@ -49,6 +50,8 @@ instructions: |
4. **Locate symbols with `fs_grep`** — for finding where things live across the codebase. `fs_grep --pattern "fn handle_request" --include "*.rs"` is faster than reading files.
4b. **Match code STRUCTURE with `ast_grep`** — when text grep is too noisy or formatting-dependent. It matches syntax trees: `ast_grep --pattern '$X.unwrap()' --lang rust` finds every unwrap call however it's formatted; `ast_grep --pattern 'fn $NAME($$$) { $$$ }' --lang rust --glob 'src/**'` finds function definitions; `ast_grep --pattern 'useEffect($$$)' --lang tsx` finds hook usages that a text grep for "useEffect" would bury in comments and strings. Meta-variables: `$NAME` = one AST node, `$$$` = zero or more. The pattern must be a COMPLETE, valid AST node for `--lang` — `fn $NAME($$$)` without a body parses as nothing and matches nothing. Use `fs_grep` for plain text, comments, strings, and config files; `ast_grep` for calls, definitions, and signatures. If ast-grep isn't installed the tool says so — fall back to fs_grep.
5. **Read targeted sections with `fs_read --offset/--limit`** — `fs_read --path "src/main.rs" --offset 50 --limit 30` reads lines 50-79 only. `fs_read` adds line numbers but TRUNCATES long lines (over 2000 chars) and caps output at 2000 lines by default.
6. **Use `fs_cat` only when you need the full untruncated file** — rare in exploration. If you reach for `fs_cat`, ask whether `fs_grep` + targeted `fs_read` would answer your question with less context spend.
@@ -59,6 +62,7 @@ instructions: |
- `fs_grep --pattern "struct User" --include "*.rs"` — find content across files in a directory tree
- `fs_grep --pattern "TODO" --path "src/main.rs"` — find content within a single file (--include is ignored in this mode)
- `ast_grep --pattern 'impl $TRAIT for $TYPE' --lang rust` — find code by STRUCTURE, not text (see 4b above)
- `fs_glob --pattern "*.rs" --path src/` — find files by name pattern
- `fs_read --path "src/main.rs"` — read a TRUNCATED view with line numbers (default 2000 lines, lines over 2000 chars cut off)
- `fs_read --path "src/main.rs" --offset 100 --limit 50` — read lines 100-149 only (line numbers; truncation rules still apply)
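The `--offset`/`--limit` arithmetic in the examples above can be checked with a short sketch. It assumes `--offset` is a 1-based starting line, which is inferred from the two documented examples rather than from the tool's source; `read_window` is a hypothetical helper name.

```python
from typing import Tuple


def read_window(offset: int, limit: int) -> Tuple[int, int]:
    """First and last line numbers covered by a read (1-based, inclusive).

    Assumption: offset is the first line returned and limit is a line count.
    """
    if offset < 1 or limit < 1:
        raise ValueError("offset and limit must both be >= 1")
    return offset, offset + limit - 1


print(read_window(50, 30))    # (50, 79)
print(read_window(100, 50))   # (100, 149)
```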
+10 -2
@@ -1,11 +1,14 @@
name: oracle
description: High-IQ advisor for architecture, debugging, and complex decisions. Blocking by design - the orchestrator is waiting on you.
-version: 2.0.0
+version: 2.1.0
skills_enabled: true
enabled_skills:
- code-review
- ai-slop-remover
- plan-review
- plan-authoring
- iwe-knowledge-base
variables:
- name: project_dir
@@ -46,13 +49,16 @@ instructions: |
3. **Code review** — evaluating proposed designs or implementations.
4. **Risk assessment** — security, performance, reliability concerns.
5. **Multi-component questions** — anything spanning 3+ files or modules.
6. **Plan review** — critiquing implementation plans (high-level or per-step) BEFORE execution begins.
## Skills available
-Two skills are available to you. Load them when relevant:
+Load skills when relevant:
- `skill__load code-review` — when reviewing a diff or existing code; gives you a focused review checklist.
- `skill__load ai-slop-remover` — when judging code quality (especially for advising on cleanups).
- `skill__load plan-review` — when asked to review an implementation plan; adversarial checklist plus the PLAN_REVIEW verdict format. Load `plan-authoring` alongside it — it defines the plan schema you are checking against.
- `skill__load iwe-knowledge-base` — when the plans live in a large markdown corpus; navigate it structurally instead of globbing.
Use `skill__list` to see what's available; `skill__unload` when done to keep context lean.
@@ -91,6 +97,8 @@ instructions: |
ORACLE_COMPLETE
```
Exception: for plan reviews, use the `PLAN_REVIEW: OKAY` / `PLAN_REVIEW: REJECT` verdict format from the `plan-review` skill as the body, then end with `ORACLE_COMPLETE` on the final line as usual.
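A caller could read the plan-review verdict along these lines. This is a sketch under assumptions: the exact layout of the verdict is defined by the `plan-review` skill, which is not shown in this diff, so the parser only relies on the two strings quoted above and on `ORACLE_COMPLETE` being the final non-empty line. `parse_plan_review` is a hypothetical name.

```python
from typing import Optional


def parse_plan_review(output: str) -> Optional[str]:
    """Return "OKAY" or "REJECT", or None when the output is not a finished plan review.

    Assumption: the verdict appears at the start of a line as
    `PLAN_REVIEW: OKAY` or `PLAN_REVIEW: REJECT`.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # An unfinished response has no trustworthy verdict.
    if not lines or lines[-1] != "ORACLE_COMPLETE":
        return None
    for line in lines:
        if line.startswith("PLAN_REVIEW: OKAY"):
            return "OKAY"
        if line.startswith("PLAN_REVIEW: REJECT"):
            return "REJECT"
    return None


sample = "PLAN_REVIEW: REJECT\nStep 2 references a missing symbol.\n\nORACLE_COMPLETE\n"
print(parse_plan_review(sample))
```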
## Rules
1. **Never modify files** — you advise, others implement.
+15
@@ -16,6 +16,21 @@ Sisyphus acts as the primary entry point, capable of handling complex tasks by c
- 💻 **CLI Coding**: Provides a natural language interface for writing and editing code.
- 🔄 **Task Management**: Tracks progress and context across complex operations.
- 🛠️ **Tool Integration**: Seamlessly uses system tools for building, testing, and file manipulation.
- 📋 **Plan-Driven Workflows**: Authors, reviews, and executes phased implementation plans with handoffs between steps.
## Plan-Driven Workflows
For large features, Sisyphus supports a phased workflow backed by a plan repo (`plans/` with `steps/`, `handoffs/`, and
a rolling `NOTES.md`):
1. **Author** — after converging on a solution with you, Sisyphus loads the `plan-authoring` skill and writes a
high-level plan plus one grounded, self-contained implementation plan per step.
2. **Review** — [Oracle](../oracle/README.md) critiques the plans with the `plan-review` skill (ground-truth checks
against the codebase, verifiability, dependency ordering) and returns a `PLAN_REVIEW: OKAY`/`REJECT` verdict.
Rejected plans are fixed before any code is written.
3. **Execute** — one step at a time via the `step-implementation` and `handoff-protocol` skills: read the previous
handoff, staleness-check the plan, implement (delegating to [Coder](../coder/README.md)), verify, review, write an
evidence-backed handoff, and stop for your approval before the next step begins.
## Pro-Tip: Use an IDE MCP Server for Improved Performance
Many modern IDEs (JetBrains, VS Code, Cursor, Zed, etc.) expose MCP servers that let LLMs use IDE tools directly. Using
+51 -2
@@ -1,6 +1,6 @@
name: sisyphus
description: OpenCode-style orchestrator - classifies intent, delegates to specialists, tracks progress with todos, enforces OMO-grade verification discipline
-version: 3.0.0
+version: 3.2.0
agent_session: temp
auto_continue: true
@@ -23,6 +23,10 @@ enabled_skills:
- parallel-research
- verification-gates
- oracle-protocol
- plan-authoring
- step-implementation
- handoff-protocol
- iwe-knowledge-base
variables:
- name: project_dir
@@ -101,6 +105,9 @@ instructions: |
| About to touch git history | `git-master` |
| About to touch UI/components | `frontend-ui-ux` (also nudge delegates to load it) |
| About to write any code | `ai-slop-remover` |
| About to author a high-level plan or step plans | `plan-authoring` |
| About to execute a step of a phased plan | `step-implementation` + `handoff-protocol` |
| Navigating a plan repo or markdown knowledge base | `iwe-knowledge-base` |
Load skills BEFORE the phase, not after. Unload when the phase ends if context is getting heavy. `skill__unload` keeps the context lean.
@@ -124,7 +131,8 @@ instructions: |
| `explore` | Find patterns in THIS codebase, understand local code | Read-only, returns findings, fan out 2-5 in parallel |
| `librarian` | Find official docs, OSS examples, web best practices for EXTERNAL libraries | Read-only, returns citation-backed findings, fan out 1-3 in parallel |
| `coder` | Write/edit files, implement features | Graph agent: plan → approval → implement → verify build+tests → self_review → bounded fix-loop |
-| `oracle` | Architecture, complex debugging, review | Advisory, blocking — never answer the user before collecting Oracle results |
+| `oracle` | Architecture, complex debugging, review, plan review | Advisory, blocking — never answer the user before collecting Oracle results |
| `step-runner` | Execute ONE step of a phased plan repo (Phase 8) | Graph agent: orient → staleness check → coder → verify → handoff → user approval gate |
### When to fire `librarian` (external grep) vs `explore` (internal grep)
@@ -312,6 +320,47 @@ instructions: |
Never: leave code in broken state, continue hoping it'll work, delete failing tests to "pass," suppress errors to silence them.
## Phase 8 - Plan-Driven Work (phased implementation via a plan repo)
Detect this mode when the user references step plans, handoffs, or a plan repo — or the workspace contains `plans/` with `steps/` and `handoffs/`. Plan-driven work has two lifecycles. Never mix them in one turn.
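The workspace half of that detection rule can be sketched as a directory check. This is illustrative only: `looks_like_plan_repo` is a hypothetical helper, and the default directory name `plans` is taken from the layout described above.

```python
import tempfile
from pathlib import Path


def looks_like_plan_repo(workspace: Path, plans_dir: str = "plans") -> bool:
    """True when <workspace>/<plans_dir> contains both steps/ and handoffs/."""
    root = workspace / plans_dir
    return (root / "steps").is_dir() and (root / "handoffs").is_dir()


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    print(looks_like_plan_repo(ws))            # False: nothing created yet
    (ws / "plans" / "steps").mkdir(parents=True)
    (ws / "plans" / "handoffs").mkdir()
    print(looks_like_plan_repo(ws))            # True: both directories exist
```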
### Authoring lifecycle (no code changes)
1. Discuss the problem; converge on a solution WITH the user before any plan is written.
2. Load `plan-authoring`. Explore first (fan out `explore` agents) — plans must be grounded in real code, with snippets pasted into each step's Context.
3. Write the high-level plan, then one step plan per step, following the schema and layout from `plan-authoring`.
4. **Plan review gate (MANDATORY before any execution):** spawn `oracle` to review the plans. Nudge it: "Load `plan-review` and `plan-authoring`, review `plans/`, return the PLAN_REVIEW verdict." REJECT → fix the complaints, re-submit. Do not start execution on an unreviewed or rejected plan.
5. Present the reviewed plan to the user for approval.
### Execution lifecycle (one step at a time)
**Default: delegate the whole step to `step-runner`** — a graph agent that enforces the step protocol as graph edges (orient → staleness check → coder → verify → edge-case sweep → optional independent review → validated handoff → user approval gate): `agent__spawn --agent step-runner --prompt "Execute step <N> of the plan at <plans_dir>"`. It returns `STEP_COMPLETE` / `STEP_BLOCKED` / `STEP_REJECTED` / `STEP_FAILED`. Relay its escalations (deviation gate, approval gate) promptly. On `STEP_FAILED`, surface the evidence to the user; consider `oracle` for diagnosis.
Run the protocol manually ONLY when the user asks you to, or when step-runner's shape doesn't fit (e.g. a docs-only step with nothing to build). Then:
1. Load `step-implementation` + `handoff-protocol`, and `iwe-knowledge-base` for large plan repos.
2. Follow the step protocol phase by phase: orient (previous handoff + `NOTES.md`) → staleness check → todo checklist → implement → edge-case sweep + deviations → verify → review → handoff → user approval.
3. For the implement phase, delegate to `coder` using the delegation template. Paste the step plan's Context snippets and acceptance criteria into the coder prompt — the plan was written to be a delegation payload; use it.
4. Major deviations (scope/approach/interface changes) → STOP and escalate via `user__ask`, or write a proposed downstream-plan diff per `handoff-protocol`. Never silently absorb them.
5. **HARD STOP at the approval gate.** Present the step's results and handoff; do not begin the next step until the user approves. Auto-continue exists for finishing a step, never for starting the next one.
## Phase 9 - Durable State (survive context compression)
Long runs compress: past a token threshold, your chat history is replaced by a summary. Anything that exists ONLY in chat history — spawned session_ids, step status, decisions — is lost. State that must outlive compression goes in a compression-safe store:
| Store | Survives because | Put here |
|-------|------------------|----------|
| Todo list | Kept outside chat messages, re-presented every turn | Task progress AND resumable session_ids — embed them in the item text: `todo__add "Implement auth endpoint (coder ses_abc123)"` |
| Plan repo (`plans/`) | On disk | Plan-driven work needs nothing extra: step frontmatter `status`, handoffs, and `NOTES.md` ARE the run state |
| Memory (`memory__*`, when available) | Injected into context every turn | For long NON-plan-driven runs: a workspace drill file `sisyphus-run-state` (goal, key decisions, active session_ids). Set `expires` to tomorrow; delete it when the run completes |
Rules:
1. **Session_ids you may need to resume are never chat-only.** Record them in the todo item for that work the moment the spawn returns. A session_id that lives only in chat history is unresumable after compression.
2. **Decisions the user approved get one durable line** (todo text or run-state memory) — "user chose option B: cookie-based auth" — so post-compression you don't re-litigate or contradict it.
3. **Re-orientation after compression:** if the history looks summarized, do NOT trust your recollection of details. Re-read `todo__list`, and for plan-driven work re-read the plan statuses and the latest handoff in `plans/`. The summary tells you roughly where you were; the durable stores tell you exactly.
4. Do not hoard: run state is not knowledge. Never bloat `MEMORY.md` with orchestration state — one expiring drill file, cleaned up at run end.
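Rule 1 amounts to a write-then-recover round trip on the todo text, which can be sketched as follows. The `ses_` prefix followed by letters and digits is inferred from the single example `ses_abc123` and is an assumption about the id format; both helper names are hypothetical.

```python
import re
from typing import List

# Assumed id shape, based only on the example "ses_abc123".
SESSION_ID = re.compile(r"\bses_[A-Za-z0-9]+\b")


def todo_text(task: str, agent: str, session_id: str) -> str:
    """Embed the resumable session id in the todo item itself."""
    return f"{task} ({agent} {session_id})"


def recover_session_ids(todo_items: List[str]) -> List[str]:
    """After compression, rebuild the list of resumable sessions from todo text."""
    found: List[str] = []
    for item in todo_items:
        found.extend(SESSION_ID.findall(item))
    return found


items = [todo_text("Implement auth endpoint", "coder", "ses_abc123"), "Write docs"]
print(recover_session_ids(items))
```

Keeping the id inside the item text means recovery needs nothing beyond the todo list itself, which is the store the table above describes as re-presented every turn.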
## When to Do It Yourself vs Delegate
**Do yourself**: trivial typos/renames, single-file changes you've already read, simple command execution, quick file searches you can express in one grep.
+93
@@ -0,0 +1,93 @@
# Step-Runner
A graph-based agent that executes **one step** of a phased implementation
plan, with the step protocol from the `step-implementation` skill enforced
as graph edges rather than prose. Designed to be delegated to by
**[Sisyphus](../sisyphus/README.md)**; delegates implementation to
**[Coder](../coder/README.md)** and independent review to
**[code-reviewer](../code-reviewer/README.md)**.
It expects a plan repo authored per the `plan-authoring` skill:
```
plans/
steps/NN-<slug>.md # step plans with frontmatter (step/title/depends_on/status)
handoffs/NN-<slug>.md # written by this agent, validated by a deterministic gate
NOTES.md # rolling durable facts
```
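How a step might be chosen from that layout can be sketched as below. The real logic lives in `scripts/resolve_step.sh`, which is not shown here, so this is an assumption-laden illustration: it assumes `---`-delimited frontmatter with flat `key: value` lines, the status values `in-progress`, `pending`, and `complete`, and that an in-progress step is resumed before any pending one. Both function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Read flat `key: value` pairs between the opening and closing `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def pick_next_step(plans: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Prefer the lowest-numbered in-progress step, then the lowest pending one."""
    ordered = sorted(plans, key=lambda m: int(m["step"]))
    for wanted in ("in-progress", "pending"):
        for meta in ordered:
            if meta.get("status") == wanted:
                return meta
    return None


doc = "---\nstep: 2\ntitle: Add endpoint\nstatus: pending\n---\n# Body\n"
print(parse_frontmatter(doc))
```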
## Workflow
```
resolve_step (script) locate plan + previous handoff, check depends_on,
↓ mark plan in-progress [→ gate_blocked if deps unsatisfied]
orient (llm, read-only) merge handoff directives + staleness-check the plan
route_staleness (script) major deviation → gate_deviation (approval)
implement (agent → coder) coder runs its own build/test/self-review fix-loop
route_coder_result (script) COMPLETE → verify | REJECTED / FAILED → end
verify_format_lint (script) format BEFORE evidence, then lint
verify_build (script) step-level build/typecheck
verify_tests (script) FULL test suite
↓ [failures → fix_loop_gate, back-edge to implement]
edge_case_sweep (llm) missed edge cases; annotate downstream plans
↓ (Edge cases sections ONLY - scope changes become proposals)
route_sweep (script) 5+ files or architectural boundary → independent_review
independent_review (agent) code-reviewer; 🔴 findings loop back to implement (bounded)
write_handoff (llm) evidence-backed handoff per handoff-protocol + NOTES.md
check_handoff (script) deterministic schema gate; marks plan status complete
gate_user_review (approval) HARD STOP - approve, or send revision comments
↓ (revisions loop through implement → verify → handoff again)
end_success / end_blocked / end_rejected / end_failure
```
End nodes emit sentinel outcomes for the caller:
- `STEP_COMPLETE` — step implemented, verified, handoff written, user approved.
- `STEP_BLOCKED` — `depends_on` unsatisfied and the user declined to proceed.
- `STEP_REJECTED` — user aborted at the deviation gate, or the coder's plan
was rejected at its approval gate.
- `STEP_FAILED` — coder failed, the step-level fix budget was exhausted, or
the handoff failed validation twice.
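A caller can pick the sentinel out of the agent's output with a scan like the one below. This is a minimal sketch: it assumes the sentinel appears as plain text somewhere in the output and takes the last occurrence, since the exact position is not specified here. `find_sentinel` is a hypothetical name.

```python
from typing import Optional

SENTINELS = ("STEP_COMPLETE", "STEP_BLOCKED", "STEP_REJECTED", "STEP_FAILED")


def find_sentinel(output: str) -> Optional[str]:
    """Return the sentinel on the last line that contains one, or None."""
    # Scan from the end so earlier mentions in logs do not win.
    for line in reversed(output.splitlines()):
        for sentinel in SENTINELS:
            if sentinel in line:
                return sentinel
    return None


print(find_sentinel("verify_tests: 3 failures\nSTEP_FAILED\n"))
```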
## Usage
```sh
# From the project root: run the next in-progress/pending step
coyote -a step-runner "Execute the next step"
# A specific step (also parsed from the prompt: "execute step 3")
coyote -a step-runner --agent-variable step 3 "Execute step 3"
# Plan repo somewhere else
coyote -a step-runner --agent-variable plans_dir docs/plans "Execute the next step"
```
**Invoke from the project root.** The coder sub-agent resolves its own
`project_dir` from the invocation directory; overriding `project_dir` here
does not propagate to the spawned coder.
## Tuning
`graph.yaml` `initial_state` exposes:
- `max_fix_attempts` (default `2`) — step-level fix budget (the coder has
its own internal budget of 3).
- `max_review_attempts` (default `1`) — bounded 🔴-finding fix loops after
independent review.
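The step-level budget can be pictured as a counter checked before each retry. The sketch below is an assumption about what `scripts/fix_loop_gate.sh` does, not a transcription of it; the state keys `fix_attempts` and `max_fix_attempts` and the node names `implement` and `end_failure` come from `graph.yaml`.

```python
def fix_loop_gate(state: dict) -> str:
    """Return the next node name: retry while budget remains, otherwise fail.

    Assumption: the counter is incremented when a retry is granted.
    """
    if state["fix_attempts"] < state["max_fix_attempts"]:
        state["fix_attempts"] += 1
        return "implement"
    return "end_failure"


state = {"fix_attempts": 0, "max_fix_attempts": 2}
print([fix_loop_gate(state) for _ in range(3)])
```

With the default budget of 2, this gives two retries through `implement` before the step ends as a failure.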
Environment overrides honored by the script nodes:
- `FORMAT_CMD` / `LINT_CMD` — formatting and linting (otherwise a per-type
heuristic formats, and linting defers to the build/check command).
- `BUILD_CMD` / `TEST_CMD` — skip project-type detection (same as coder).
- `STEP_AUTOAPPROVE=1` — bypass the deviation gate (non-interactive runs).
- `STEP_SKIP_REVIEW=1` — never spawn the independent reviewer.
The final user approval gate is never bypassed by an environment variable -
it is the point of the workflow.
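The overrides listed above can be summarized as a small reader over the environment. This is a sketch, not the script nodes' actual code: `read_overrides` and its result keys are hypothetical, and it assumes the two flags are enabled only by the exact value `1`.

```python
import os
from typing import Any, Dict, Mapping


def read_overrides(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect the documented overrides; unset commands come back as None."""
    return {
        "format_cmd": env.get("FORMAT_CMD"),
        "lint_cmd": env.get("LINT_CMD"),
        "build_cmd": env.get("BUILD_CMD"),
        "test_cmd": env.get("TEST_CMD"),
        # Only the deviation gate and the reviewer can be switched off;
        # there is deliberately no key for the final approval gate.
        "autoapprove_deviation": env.get("STEP_AUTOAPPROVE") == "1",
        "skip_review": env.get("STEP_SKIP_REVIEW") == "1",
    }


print(read_overrides({"STEP_AUTOAPPROVE": "1", "TEST_CMD": "cargo test"}))
```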
+599
@@ -0,0 +1,599 @@
name: step-runner
description: |
Executes ONE step of a phased implementation plan (plans/ repo) with the
step protocol enforced as graph edges: orient -> staleness check ->
implement (coder) -> verify -> edge-case sweep -> optional independent
review -> evidence-backed handoff -> user approval gate. Designed to be
delegated to by sisyphus.
version: "1.0"
global_tools:
- fs_cat.sh
- fs_ls.sh
- fs_write.sh
- fs_patch.sh
- execute_command.sh
skills_enabled: true
enabled_skills:
- step-implementation
- handoff-protocol
- code-review
- ai-slop-remover
variables:
- name: project_dir
description: |
Absolute path to the project directory. Defaults to "." (the directory
coyote was invoked from). The coder sub-agent resolves its own
project_dir the same way, so invoke step-runner FROM the project root
unless you override this for both.
default: "."
- name: plans_dir
description: |
Path to the plan repo. Relative paths resolve against project_dir.
Expected layout: <plans_dir>/steps/NN-<slug>.md,
<plans_dir>/handoffs/, <plans_dir>/NOTES.md.
default: "plans"
- name: step
description: |
Which step to execute: a step number, or "next" to pick the first
in-progress (resume) or pending step plan.
default: "next"
settings:
max_loop_iterations: 20
log_state_snapshots: true
validate_before_run: true
timeout: 7200
initial_state:
project_dir: ""
plans_dir: ""
step_number: 0
step_slug: ""
step_title: ""
step_plan_path: ""
step_plan: ""
prev_handoff_path: "(none)"
prev_handoff: "(none - this is the first step)"
notes_path: ""
notes: "(none)"
handoff_path: ""
blocking_reason: ""
plan_summary: ""
implementation_brief: ""
staleness_report: ""
has_major_deviation: false
deviation_summary: ""
user_feedback: ""
fix_instructions: ""
fix_attempts: 0
max_fix_attempts: 2
coder_result: ""
format_output: ""
lint_ok: true
lint_output: ""
build_ok: true
build_output: ""
tests_ok: true
tests_output: ""
edge_case_report: ""
downstream_updates: ""
needs_independent_review: false
review_report: ""
review_attempts: 0
max_review_attempts: 1
handoff_attempts: 0
handoff_fix: ""
step_summary: ""
start: resolve_step
nodes:
resolve_step:
id: resolve_step
type: script
description: |
Locate the step plan, previous handoff, and NOTES.md; parse frontmatter;
check depends_on satisfaction against existing handoffs; mark the plan
in-progress. Routes to gate_blocked when dependencies are unsatisfied.
script: scripts/resolve_step.sh
timeout: 30
fallback: end_failure
next: orient
gate_blocked:
id: gate_blocked
type: approval
description: Escalate unsatisfied dependencies instead of building on missing ground.
question: |
Step {{step_number}} ({{step_title}}) is BLOCKED:
{{blocking_reason}}
Proceed anyway?
options:
- "yes"
- "no"
routes:
"yes": orient
"no": end_blocked
on_other: end_blocked
orient:
id: orient
type: llm
description: |
Read-only orientation and staleness check: merge the previous handoff's
directives with the step plan, then verify the plan's assumptions
against the CURRENT codebase before any edit.
skills_enabled: true
enabled_skills:
- step-implementation
instructions: |
You are orienting for one step of a phased implementation plan. Load
`step-implementation` and apply its Orient and Staleness-check phases.
You are READ-ONLY in this node: no edits, no fixes.
1. Read the previous handoff (below). Note directives aimed at this
step, deviations that changed the codebase, and bare assertions
that need re-verification.
2. Staleness-check the step plan against the code at {{project_dir}}:
grep the symbols it references (via execute_command), read its
Context snippets at their claimed locations with fs_cat, confirm
its Test commands exist.
3. Classify discrepancies per the skill's deviation table: minor
(mechanics differ; correct silently in the brief) vs major (scope,
approach, interfaces, or a later step's assumptions affected).
Produce `implementation_brief`: the corrected, self-contained marching
orders for the implementer - plan tasks in order, handoff directives
applied, minor staleness corrections folded in, acceptance criteria
restated. The implementer sees ONLY the step plan plus your brief.
prompt: |
## Step plan ({{step_plan_path}})
{{step_plan}}
## Previous handoff ({{prev_handoff_path}})
{{prev_handoff}}
## Rolling project notes
{{notes}}
tools:
- fs_cat
- fs_ls
- execute_command
max_iterations: 20
output_schema:
type: object
properties:
plan_summary:
type: string
description: 1-3 sentences summarizing what this step delivers
implementation_brief:
type: string
description: Corrected, self-contained instructions for the implementer
staleness_report:
type: string
description: Findings from checking plan assumptions against current code; "clean" if none
has_major_deviation:
type: boolean
description: True when a discrepancy changes scope, approach, or interfaces
deviation_summary:
type: string
description: Major deviations only, with the plan claim vs current reality. Empty when none
required: [plan_summary, implementation_brief, staleness_report, has_major_deviation, deviation_summary]
fallback: end_failure
next: route_staleness
route_staleness:
id: route_staleness
type: script
description: Major deviation -> user gate; otherwise straight to implement.
script: scripts/route_staleness.sh
timeout: 5
fallback: implement
gate_deviation:
id: gate_deviation
type: approval
description: Major deviations are never silently absorbed - the user decides.
question: |
Step {{step_number}} ({{step_title}}): the plan no longer matches the
codebase in a way that changes scope or approach.
{{deviation_summary}}
Staleness report:
{{staleness_report}}
Proceed with the corrected brief? (Answer with anything else to give
your own guidance to the implementer.)
options:
- "proceed"
- "abort"
routes:
"proceed": implement
"abort": end_rejected
on_other: implement
state_updates:
user_feedback: "{{choice}}"
implement:
id: implement
type: agent
description: |
Delegate implementation to the coder graph agent, which runs its own
plan -> implement -> build -> tests -> self-review fix-loop internally.
agent: coder
prompt: |
## TASK
Execute step {{step_number}} ({{step_title}}) of a phased implementation
plan for the project at {{project_dir}}.
## EXPECTED OUTCOME
Every task in the step plan below is implemented and its acceptance
criteria are met. Tests are derived from the Acceptance criteria
section (not from the implementation). Build and full test suite pass.
## MUST DO
- Follow the Orientation brief below - it supersedes the raw plan where
they disagree (it folds in corrections from the staleness check).
- Match the patterns pasted in the step plan's Context section.
- Derive tests from the plan's Acceptance criteria.
## MUST NOT DO
- Do not touch anything listed in the plan's Out of scope section.
- Do not modify files under {{plans_dir}}.
- Do not implement work belonging to other steps.
## CONTEXT
### Step plan
{{step_plan}}
### Orientation brief (handoff directives + staleness corrections applied)
{{implementation_brief}}
### User guidance (if any)
{{user_feedback}}
### Fix loop status (empty on first attempt)
{{fix_instructions}}
timeout: 3600
state_updates:
coder_result: "{{output}}"
next: route_coder_result
route_coder_result:
id: route_coder_result
type: script
description: Route on the coder sentinel - COMPLETE verifies, REJECTED/FAILED terminate.
script: scripts/route_coder_result.sh
timeout: 5
fallback: end_failure
verify_format_lint:
id: verify_format_lint
type: script
description: |
Format BEFORE evidence collection (FORMAT_CMD override or per-type
heuristic), then lint (LINT_CMD, when configured). Lint failure routes
to the fix loop.
script: scripts/verify_format_lint.sh
timeout: 300
fallback: fix_loop_gate
verify_build:
id: verify_build
type: script
description: Step-level build/typecheck evidence, collected AFTER formatting.
script: scripts/verify_build.sh
timeout: 600
fallback: fix_loop_gate
verify_tests:
id: verify_tests
type: script
description: FULL test suite - regressions in untouched code fail the step too.
script: scripts/verify_tests.sh
timeout: 1200
fallback: fix_loop_gate
fix_loop_gate:
id: fix_loop_gate
type: script
description: |
Step-level fix budget (the coder already ran its own internal fix
loop). Loops to implement with fix_instructions, or ends as failure.
script: scripts/fix_loop_gate.sh
timeout: 5
fallback: end_failure
edge_case_sweep:
id: edge_case_sweep
type: llm
description: |
Post-implementation sweep: missed spots, edge cases, downstream plan
implications. May annotate downstream plans' Edge cases sections
(annotate vs propose per handoff-protocol). Also judges whether the
change warrants an independent review pass.
skills_enabled: true
enabled_skills:
- step-implementation
- handoff-protocol
instructions: |
The implementation for this step just passed build and tests. Load
`step-implementation` (edge-case sweep phase) and `handoff-protocol`
(annotate-vs-propose rules), then:
1. Read the changed code (the coder result below names the files).
Look for edge cases the plan missed: empty inputs, error paths,
concurrency, partial failure, compat.
2. For each edge case belonging to a LATER step: check that step's
plan under {{plans_dir}}/steps/. If its Edge cases section already
covers it, done. If not, append an entry to that section via
fs_patch - touch NOTHING else in the file.
3. NEVER edit a later plan's Objective, Tasks, Acceptance criteria,
or Out of scope. Scope-affecting changes become proposed diffs in
`downstream_updates` instead.
4. Set needs_independent_review=true when the change touched 5+ files
or crosses architectural boundaries (auth, public APIs, schema,
security-sensitive paths).
Be terse. Findings, not prose.
prompt: |
## Coder result
{{coder_result}}
## Step plan
{{step_plan}}
## Staleness report from orientation
{{staleness_report}}
tools:
- fs_cat
- fs_ls
- fs_patch
- execute_command
max_iterations: 20
output_schema:
type: object
properties:
edge_case_report:
type: string
description: Edge cases discovered - both handled and punted, one per line. "none" if empty
downstream_updates:
type: string
description: Annotations made (plan file + section) and proposed diffs for scope-affecting changes. "none" if empty
needs_independent_review:
type: boolean
required: [edge_case_report, downstream_updates, needs_independent_review]
fallback: write_handoff
next: route_sweep
route_sweep:
id: route_sweep
type: script
description: Broad or boundary-crossing changes get an independent reviewer.
script: scripts/route_sweep.sh
timeout: 5
fallback: write_handoff
independent_review:
id: independent_review
type: agent
description: Independent review pass - the author's self-review cannot catch its own rationalizations.
agent: code-reviewer
prompt: |
Review the changes produced for step {{step_number}} ({{step_title}})
of a phased implementation plan in {{project_dir}}.
What the step was supposed to do:
{{plan_summary}}
Coder summary (names the modified/created files):
{{coder_result}}
Review the changed files against the step plan's acceptance criteria.
Preserve severity tags in your findings.
timeout: 1200
state_updates:
review_report: "{{output}}"
next: route_review
route_review:
id: route_review
type: script
description: Critical findings loop back to implement (bounded); otherwise proceed to handoff.
script: scripts/route_review.sh
timeout: 5
fallback: write_handoff
write_handoff:
id: write_handoff
type: llm
description: |
Write the evidence-backed handoff per handoff-protocol and append
durable facts to NOTES.md. The completion gate (check_handoff)
verifies the document afterward.
skills_enabled: true
enabled_skills:
- handoff-protocol
- ai-slop-remover
instructions: |
Load `handoff-protocol` and follow its writer schema EXACTLY: the
frontmatter (step, title, result) and all eight sections, writing
"None" rather than omitting a section.
Write the handoff to {{handoff_path}} with fs_write. Paste the
verification evidence below verbatim into the Evidence section -
commands, exit codes, decisive output lines. Deviations come from the
staleness report, gate decisions, and fix loop history. Downstream
plan updates come from the sweep results.
Then append durable, step-independent facts (if any) to {{notes_path}}
- create the file if missing, never rewrite existing entries.
If "Gate feedback" below is non-empty, a previous handoff attempt
failed validation - fix exactly what it lists.
prompt: |
## Step
{{step_number}} ({{step_title}}) - plan at {{step_plan_path}}
## Plan summary
{{plan_summary}}
## Coder result
{{coder_result}}
## Staleness report / deviations
{{staleness_report}}
Major deviation summary (if any): {{deviation_summary}}
User guidance given (if any): {{user_feedback}}
Fix loop attempts used: {{fix_attempts}} of {{max_fix_attempts}}
## Edge cases discovered
{{edge_case_report}}
## Downstream plan updates
{{downstream_updates}}
## Independent review report (if any)
{{review_report}}
## Verification evidence (paste verbatim)
### Format
{{format_output}}
### Lint
{{lint_output}}
### Build
{{build_output}}
### Tests
{{tests_output}}
## Gate feedback
{{handoff_fix}}
tools:
- fs_cat
- fs_ls
- fs_write
- fs_patch
max_iterations: 15
output_schema:
type: object
properties:
step_summary:
type: string
description: 3-6 sentence summary of the step for the user's approval decision - what was done, deviations, anything needing their attention
required: [step_summary]
fallback: end_failure
next: check_handoff
check_handoff:
id: check_handoff
type: script
description: |
Deterministic completion gate - handoff exists with frontmatter and all
required sections. On success, marks the step plan status complete.
One retry back to write_handoff, then failure.
script: scripts/check_handoff.sh
timeout: 10
fallback: end_failure
gate_user_review:
id: gate_user_review
type: approval
description: The hard stop - the next step never starts without explicit approval.
question: |
## Step {{step_number}} ({{step_title}}) - ready for review
{{step_summary}}
Handoff: {{handoff_path}}
Build: {{build_ok}} | Tests: {{tests_ok}} | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
Approve this step? (Answer with anything else to send revision
instructions straight to the implementer.)
options:
- "approve"
- "revise"
routes:
"approve": end_success
"revise": get_revision
on_other: revise_from_choice
state_updates:
user_feedback: "{{choice}}"
get_revision:
id: get_revision
type: input
description: Collect revision instructions, then loop back through implement -> verify -> handoff.
question: "What should change? Your comments go to the implementer verbatim."
validation: "len(input) > 0"
state_updates:
fix_instructions: "{{input}}"
next: implement
revise_from_choice:
id: revise_from_choice
type: script
description: Free-form approval answers are treated as revision instructions.
script: scripts/revise_from_choice.sh
timeout: 5
fallback: get_revision
end_success:
id: end_success
type: end
output: |
STEP_COMPLETE
Step: {{step_number}} ({{step_title}})
Plan: {{step_plan_path}}
Handoff: {{handoff_path}}
Build: passed | Tests: passed | Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
{{step_summary}}
Downstream plan updates:
{{downstream_updates}}
end_blocked:
id: end_blocked
type: end
output: |
STEP_BLOCKED
Step: {{step_number}} ({{step_title}})
Reason:
{{blocking_reason}}
end_rejected:
id: end_rejected
type: end
output: |
STEP_REJECTED
Step: {{step_number}} ({{step_title}})
Rejected at: deviation gate or coder approval gate.
Deviation summary:
{{deviation_summary}}
Coder result (if it ran):
{{coder_result}}
end_failure:
id: end_failure
type: end
output: |
STEP_FAILED
Step: {{step_number}} ({{step_title}})
Fix attempts: {{fix_attempts}}/{{max_fix_attempts}}
Blocking reason (if resolution failed): {{blocking_reason}}
Coder result:
{{coder_result}}
Last build output:
{{build_output}}
Last tests output:
{{tests_output}}
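The approval gates above route on the user's answer. As a rough sketch, assuming the engine matches the answer exactly against the `routes` keys and sends anything else to `on_other`, the `gate_user_review` node would resolve like this. The function name `resolve_gate_user_review` is hypothetical and the real engine's matching rules may differ.

```shell
#!/usr/bin/env bash
# Hypothetical resolver mirroring the routes/on_other block of gate_user_review.
# Assumption: exact string match on the option, everything else is "other".
resolve_gate_user_review() {
  local choice="$1"
  case "$choice" in
    approve) echo "end_success" ;;
    revise)  echo "get_revision" ;;
    *)       echo "revise_from_choice" ;;  # on_other: free-form text becomes revision input
  esac
}

resolve_gate_user_review "approve"
resolve_gate_user_review "rename the helper and add a test"
```

Under that assumption a free-form answer never approves the step by accident: only the literal `approve` option reaches `end_success`.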
+54
@@ -0,0 +1,54 @@
#!/usr/bin/env bash
set -uo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
handoff_path=$(echo "$state" | jq -r '.handoff_path // ""')
step_plan_path=$(echo "$state" | jq -r '.step_plan_path // ""')
handoff_attempts=$(echo "$state" | jq -r '.handoff_attempts // 0')
problems=""
if [[ ! -f "$handoff_path" ]]; then
problems="- handoff file does not exist at $handoff_path"$'\n'
else
content=$(cat "$handoff_path")
grep -qE '^result:[[:space:]]*(complete|partial|blocked)' <<< "$content" \
|| problems+="- frontmatter is missing 'result: complete|partial|blocked'"$'\n'
for section in "Summary" "Completed" "Not completed" "Deviations" "Downstream plan updates" "Edge cases discovered" "Evidence" "Notes for next step"; do
grep -qE "^##[[:space:]]+${section}" <<< "$content" \
|| problems+="- missing required section: ## ${section}"$'\n'
done
fi
if [[ -z "$problems" ]]; then
if [[ -f "$step_plan_path" ]]; then
tmp=$(mktemp)
awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: complete"; next} {print}' "$step_plan_path" > "$tmp" && mv "$tmp" "$step_plan_path"
fi
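# Reset the retry counter so a later revision round gets its own single retry.
jq -nc '{"handoff_attempts": 0, "handoff_fix": "", "_next": "gate_user_review"}'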
exit 0
fi
if (( handoff_attempts >= 1 )); then
jq -nc \
--arg br "Handoff failed validation twice. Problems:
$problems" \
'{"blocking_reason": $br, "_next": "end_failure"}'
exit 0
fi
jq -nc \
--arg hf "The previous handoff attempt failed validation. Fix exactly these problems:
$problems" \
'{
"handoff_attempts": 1,
"handoff_fix": $hf,
"_next": "write_handoff"
}'
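To try the completion gate's checks outside the graph, the same greps can be run against a throwaway file. This is a minimal sketch: the `check_sections` helper and the sample handoff are illustrative and not part of the repository; only the frontmatter regex and the heading pattern mirror the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one "- ..." line per problem, nothing when the handoff is valid.
check_sections() {
  local file="$1" problems="" section
  local required=("Summary" "Completed" "Not completed" "Deviations"
    "Downstream plan updates" "Edge cases discovered" "Evidence" "Notes for next step")
  grep -qE '^result:[[:space:]]*(complete|partial|blocked)' "$file" \
    || problems+="- frontmatter is missing 'result: complete|partial|blocked'"$'\n'
  for section in "${required[@]}"; do
    grep -qE "^##[[:space:]]+${section}" "$file" \
      || problems+="- missing required section: ## ${section}"$'\n'
  done
  printf '%s' "$problems"
}

# Sample handoff that omits the Evidence section on purpose.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---
step: 3
title: Example step
result: complete
---
## Summary
## Completed
## Not completed
## Deviations
## Downstream plan updates
## Edge cases discovered
## Notes for next step
EOF

check_sections "$sample"
```

The problem list doubles as the feedback text: the real script feeds it back to `write_handoff` as `handoff_fix`, so each line is phrased as something the writer can act on.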
+60
@@ -0,0 +1,60 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
fix_attempts=$(echo "$state" | jq -r '.fix_attempts // 0')
max_fix_attempts=$(echo "$state" | jq -r '.max_fix_attempts // 2')
lint_ok=$(echo "$state" | jq -r '.lint_ok | if . == null then "true" else (. | tostring) end')
build_ok=$(echo "$state" | jq -r '.build_ok | if . == null then "true" else (. | tostring) end')
tests_ok=$(echo "$state" | jq -r '.tests_ok | if . == null then "true" else (. | tostring) end')
lint_output=$(echo "$state" | jq -r '.lint_output // ""')
build_output=$(echo "$state" | jq -r '.build_output // ""')
tests_output=$(echo "$state" | jq -r '.tests_output // ""')
if (( fix_attempts >= max_fix_attempts )); then
jq -nc \
--argjson n "$fix_attempts" \
'{
"fix_attempts": $n,
"_next": "end_failure"
}'
exit 0
fi
next_attempts=$((fix_attempts + 1))
if [[ "$lint_ok" != "true" ]]; then
stage="lint"
output="$lint_output"
elif [[ "$build_ok" != "true" ]]; then
stage="build"
output="$build_output"
elif [[ "$tests_ok" != "true" ]]; then
stage="full test suite"
output="$tests_output"
else
stage="verification"
output="fix_loop_gate was reached but no failing stage was recorded. Re-run verification."
fi
fix_instructions=$(printf '## Fix loop status (step-level attempt %d of %d)\n\nThe implementation passed the coder'"'"'s internal checks but failed step-level verification at the %s stage.\n\nOutput:\n```\n%s\n```\n\nIdentify the minimal fix and apply it. Do not refactor. Regressions in untouched code caused by this change are in scope.' \
"$next_attempts" "$max_fix_attempts" "$stage" "$output")
jq -nc \
--argjson n "$next_attempts" \
--arg 'fi' "$fix_instructions" \
'{
"fix_attempts": $n,
"fix_instructions": $fi,
"lint_ok": true,
"build_ok": true,
"tests_ok": true,
"_next": "implement"
}'
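The gate's decision logic reduces to a budget check followed by a fixed stage priority (lint, then build, then tests). The sketch below restates it without `jq` or graph state so the ordering is easy to see. The `gate` function and its positional arguments are hypothetical stand-ins for the state fields read by the script above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for fix_loop_gate: prints the next node and, when looping,
# the attempt number and the first failing stage.
gate() {
  local attempts="$1" max="$2" lint_ok="$3" build_ok="$4" tests_ok="$5"
  if (( attempts >= max )); then
    echo "end_failure"          # budget exhausted, no further loop
    return 0
  fi
  local stage="verification"    # fallback when no failing stage was recorded
  if [[ "$lint_ok" != "true" ]]; then
    stage="lint"
  elif [[ "$build_ok" != "true" ]]; then
    stage="build"
  elif [[ "$tests_ok" != "true" ]]; then
    stage="full test suite"
  fi
  echo "implement attempt=$((attempts + 1)) stage=$stage"
}

gate 0 2 true false false   # build and tests both failed: build is reported first
gate 2 2 true true false    # budget of 2 already used
```

Reporting only the earliest failing stage keeps the fix instructions short; a later stage is re-checked anyway once the loop reaches verification again.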
+152
@@ -0,0 +1,152 @@
#!/usr/bin/env bash
set -uo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
fail() {
jq -nc --arg r "$1" '{"blocking_reason": $r, "_next": "end_failure"}'
exit 0
}
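requested_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
# Resolve into a separate variable: a failed $(...) would blank project_dir before fail() could print it.
project_dir=$(cd "$requested_dir" 2>/dev/null && pwd) || fail "project_dir does not exist: $requested_dir"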
plans_dir="${LLM_AGENT_VAR_PLANS_DIR:-plans}"
[[ "$plans_dir" != /* ]] && plans_dir="$project_dir/$plans_dir"
steps_dir="$plans_dir/steps"
handoffs_dir="$plans_dir/handoffs"
notes_path="$plans_dir/NOTES.md"
[[ -d "$steps_dir" ]] || fail "No step plans directory at $steps_dir (expected <plans_dir>/steps/NN-<slug>.md)"
frontmatter() {
awk '/^---[[:space:]]*$/{n++; next} n==1{print} n>=2{exit}' "$1"
}
fm_value() {
echo "$1" | grep -E "^$2:" | head -1 | sed -E "s/^$2:[[:space:]]*//" | sed -E 's/^["'"'"']|["'"'"']$//g'
}
step="${LLM_AGENT_VAR_STEP:-next}"
if [[ "$step" == "next" ]]; then
prompt_step=$(echo "$state" | jq -r '.initial_prompt // ""' | grep -oiE 'step[[:space:]#:]*[0-9]+' | head -1 | grep -oE '[0-9]+' || true)
[[ -n "$prompt_step" ]] && step="$prompt_step"
fi
plan_file=""
if [[ "$step" == "next" ]]; then
first_pending=""
while IFS= read -r f; do
st=$(fm_value "$(frontmatter "$f")" "status")
if [[ "$st" == "in-progress" ]]; then
plan_file="$f"
break
fi
[[ -z "$first_pending" && ( "$st" == "pending" || -z "$st" ) ]] && first_pending="$f"
done < <(find "$steps_dir" -maxdepth 1 -name '*.md' | sort)
[[ -z "$plan_file" ]] && plan_file="$first_pending"
[[ -z "$plan_file" ]] && fail "No in-progress or pending step plans in $steps_dir"
else
[[ "$step" =~ ^[0-9]+$ ]] || fail "step must be a number or 'next'; got: $step"
padded=$(printf '%02d' "$((10#$step))")
plan_file=$(find "$steps_dir" -maxdepth 1 \( -name "${padded}-*.md" -o -name "${step}-*.md" \) | sort | head -1)
[[ -n "$plan_file" ]] || fail "No step plan matching step $step in $steps_dir"
fi
bn=$(basename "$plan_file" .md)
num_part="${bn%%-*}"
[[ "$num_part" =~ ^[0-9]+$ ]] || fail "Step plan filename must start with a number: $bn"
step_number=$((10#$num_part))
step_slug="${bn#*-}"
fm=$(frontmatter "$plan_file")
step_title=$(fm_value "$fm" "title")
[[ -z "$step_title" ]] && step_title="$step_slug"
deps=$(echo "$fm" | awk '/^depends_on:/{f=1; print; next} f && /^[[:space:]]*-/{print; next} f{exit}' | grep -oE '[0-9]+' || true)
unsatisfied=""
for dep in $deps; do
dep_padded=$(printf '%02d' "$((10#$dep))")
dep_handoff=$(find "$handoffs_dir" -maxdepth 1 \( -name "${dep_padded}-*.md" -o -name "${dep}-*.md" \) 2>/dev/null | sort | head -1)
if [[ -z "$dep_handoff" ]]; then
unsatisfied+="- step $dep: no handoff found (step not executed?)"$'\n'
continue
fi
dep_result=$(fm_value "$(frontmatter "$dep_handoff")" "result")
if [[ "$dep_result" != "complete" ]]; then
unsatisfied+="- step $dep: handoff result is '$dep_result' (not complete): $dep_handoff"$'\n'
fi
done
prev_handoff_path="(none)"
prev_handoff="(none - this is the first step)"
prev_file=""
prev_num=0
while IFS= read -r h; do
hn="${h##*/}"
hn="${hn%%-*}"
[[ "$hn" =~ ^[0-9]+$ ]] || continue
n=$((10#$hn))
if (( n < step_number && n >= prev_num )); then
prev_num=$n
prev_file="$h"
fi
done < <(find "$handoffs_dir" -maxdepth 1 -name '*.md' 2>/dev/null | sort)
if [[ -n "$prev_file" ]]; then
prev_handoff_path="$prev_file"
prev_handoff=$(head -c 16000 "$prev_file")
fi
notes="(none)"
[[ -f "$notes_path" ]] && notes=$(head -c 8000 "$notes_path")
step_plan=$(head -c 24000 "$plan_file")
handoff_path="$handoffs_dir/$(basename "$plan_file")"
tmp=$(mktemp)
awk 'BEGIN{n=0} /^---[[:space:]]*$/{n++; print; next} n==1 && /^status:/{print "status: in-progress"; next} {print}' "$plan_file" > "$tmp" && mv "$tmp" "$plan_file"
next_node="orient"
blocking_reason=""
if [[ -n "$unsatisfied" ]]; then
next_node="gate_blocked"
blocking_reason="Unsatisfied dependencies:"$'\n'"$unsatisfied"
fi
jq -nc \
--arg pd "$project_dir" \
--arg pl "$plans_dir" \
--argjson sn "$step_number" \
--arg ss "$step_slug" \
--arg st "$step_title" \
--arg spp "$plan_file" \
--arg sp "$step_plan" \
--arg php "$prev_handoff_path" \
--arg ph "$prev_handoff" \
--arg np "$notes_path" \
--arg no "$notes" \
--arg hp "$handoff_path" \
--arg br "$blocking_reason" \
--arg nx "$next_node" \
'{
"project_dir": $pd,
"plans_dir": $pl,
"step_number": $sn,
"step_slug": $ss,
"step_title": $st,
"step_plan_path": $spp,
"step_plan": $sp,
"prev_handoff_path": $php,
"prev_handoff": $ph,
"notes_path": $np,
"notes": $no,
"handoff_path": $hp,
"blocking_reason": $br,
"_next": $nx
}'
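The frontmatter helpers and the `10#` arithmetic used above are easy to misread, so here is a small sketch that exercises them on a throwaway plan file. The two functions are copied from the script; the sample plan and its field values are made up for illustration.

```shell
#!/usr/bin/env bash
# Same helpers as the script above, exercised on a throwaway plan file.
frontmatter() {
  # Print only the lines between the first and second '---' markers.
  awk '/^---[[:space:]]*$/{n++; next} n==1{print} n>=2{exit}' "$1"
}
fm_value() {
  # First "key: value" match, with the key and one layer of surrounding quotes stripped.
  echo "$1" | grep -E "^$2:" | head -1 | sed -E "s/^$2:[[:space:]]*//" | sed -E 's/^["'"'"']|["'"'"']$//g'
}

plan=$(mktemp)
cat > "$plan" <<'EOF'
---
title: "Add retry logic"
status: pending
depends_on: [1, 2]
---
# Body
status: this line is outside the frontmatter and is ignored
EOF

fm=$(frontmatter "$plan")
fm_value "$fm" "title"
fm_value "$fm" "status"

# A leading zero would make bash read "08" as invalid octal; 10# forces base 10.
num_part="08"
echo $((10#$num_part))
```

Because `frontmatter` stops at the second `---`, a `status:` line in the plan body cannot shadow the real status field.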
+27
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
feedback=$(echo "$state" | jq -r '.user_feedback // ""')
if [[ -z "$feedback" ]]; then
jq -nc '{"_next": "get_revision"}'
exit 0
fi
fix_instructions=$(printf '## Revision requested by the user at the step approval gate\n\nAddress these comments with minimal edits, then the step re-verifies and the handoff is rewritten:\n\n%s' \
"$feedback")
jq -nc \
--arg 'fi' "$fix_instructions" \
'{
"fix_instructions": $fi,
"_next": "implement"
}'
+27
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
coder_result=$(echo "$state" | jq -r '.coder_result // ""')
case "$coder_result" in
*CODER_COMPLETE*)
jq -nc '{"_next": "verify_format_lint"}'
;;
*CODER_REJECTED*)
jq -nc '{"_next": "end_rejected"}'
;;
*CODER_FAILED*)
jq -nc '{"blocking_reason": "coder fix-loop exhausted; see coder result", "_next": "end_failure"}'
;;
*)
jq -nc '{"blocking_reason": "coder returned no recognizable sentinel (expected CODER_COMPLETE / CODER_REJECTED / CODER_FAILED)", "_next": "end_failure"}'
;;
esac
+38
@@ -0,0 +1,38 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
review_report=$(echo "$state" | jq -r '.review_report // ""')
review_attempts=$(echo "$state" | jq -r '.review_attempts // 0')
max_review_attempts=$(echo "$state" | jq -r '.max_review_attempts // 1')
if ! grep -qF "🔴" <<< "$review_report"; then
jq -nc '{"_next": "write_handoff"}'
exit 0
fi
if (( review_attempts >= max_review_attempts )); then
jq -nc '{"_next": "write_handoff"}'
exit 0
fi
next_review=$((review_attempts + 1))
fix_instructions=$(printf '## Independent review findings (attempt %d of %d)\n\nAn independent reviewer flagged CRITICAL (🔴) findings. Address ONLY the 🔴 findings with minimal edits. Do not refactor unrelated code.\n\n%s' \
"$next_review" "$max_review_attempts" "$review_report")
jq -nc \
--argjson n "$next_review" \
--arg 'fi' "$fix_instructions" \
'{
"review_attempts": $n,
"fix_instructions": $fi,
"needs_independent_review": false,
"_next": "implement"
}'
+23
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
has_major=$(echo "$state" | jq -r '.has_major_deviation // false')
if [[ "${STEP_AUTOAPPROVE:-0}" == "1" ]]; then
jq -nc '{"_next": "implement"}'
exit 0
fi
if [[ "$has_major" == "true" ]]; then
jq -nc '{"_next": "gate_deviation"}'
else
jq -nc '{"_next": "implement"}'
fi
+23
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
needs_review=$(echo "$state" | jq -r '.needs_independent_review // false')
if [[ "${STEP_SKIP_REVIEW:-0}" == "1" ]]; then
jq -nc '{"_next": "write_handoff"}'
exit 0
fi
if [[ "$needs_review" == "true" ]]; then
jq -nc '{"_next": "independent_review"}'
else
jq -nc '{"_next": "write_handoff"}'
fi
+57
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
set -uo pipefail
# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
if [[ -n "${BUILD_CMD:-}" ]]; then
cmd="$BUILD_CMD"
else
project_info=$(detect_project "$project_dir")
cmd=$(echo "$project_info" | jq -r '.check // .build // ""')
fi
if [[ -z "$cmd" || "$cmd" == "null" ]]; then
jq -nc '{
"build_ok": true,
"build_output": "(no build/check command available for this project type)",
"_next": "verify_tests"
}'
exit 0
fi
exit_code=0
output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
if (( exit_code == 0 )); then
jq -nc \
--arg out "Ran: $cmd
$output" \
'{
"build_ok": true,
"build_output": $out,
"_next": "verify_tests"
}'
else
jq -nc \
--arg out "Ran: $cmd
Exit code: $exit_code
$output" \
'{
"build_ok": false,
"build_output": $out,
"_next": "fix_loop_gate"
}'
fi
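The verify scripts all rely on one idiom: capture a command's combined output and its exit code without letting a failure abort the script. A minimal sketch of that idiom follows. The `run_capture` wrapper is hypothetical, and the command string is a stand-in for `BUILD_CMD`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper showing the capture idiom used by the verify_* scripts.
run_capture() {
  local cmd="$1" rc=0 out
  # Assign on its own line: "local out=$(...)" would return local's status (0)
  # and hide the command's real exit code.
  out=$(eval "$cmd" 2>&1) || rc=$?
  printf 'Ran: %s\nExit code: %d\n%s\n' "$cmd" "$rc" "$out"
}

run_capture 'echo building; exit 3'
```

The `|| rc=$?` form also works under `set -e`, since a command on the left of `||` never triggers an early exit.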
+79
@@ -0,0 +1,79 @@
#!/usr/bin/env bash
set -uo pipefail
# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
project_type=$(detect_project "$project_dir" | jq -r '.type // "unknown"')
format_cmd="${FORMAT_CMD:-}"
if [[ -z "$format_cmd" ]]; then
case "$project_type" in
rust) format_cmd="cargo fmt" ;;
go) format_cmd="gofmt -w ." ;;
python) command -v ruff &>/dev/null && format_cmd="ruff format ." ;;
esac
fi
if [[ -z "$format_cmd" ]]; then
format_output="(no format command configured for project type '$project_type'; skipped. Set FORMAT_CMD to enable.)"
else
fmt_rc=0
fmt_out=$(cd "$project_dir" && eval "$format_cmd" 2>&1) || fmt_rc=$?
format_output="Ran: $format_cmd
Exit code: $fmt_rc
$fmt_out"
fi
lint_cmd="${LINT_CMD:-}"
if [[ -z "$lint_cmd" ]]; then
jq -nc \
--arg fo "$format_output" \
'{
"format_output": $fo,
"lint_ok": true,
"lint_output": "(no LINT_CMD configured; linting is covered by the build/check command)",
"_next": "verify_build"
}'
exit 0
fi
lint_rc=0
lint_out=$(cd "$project_dir" && eval "$lint_cmd" 2>&1) || lint_rc=$?
if (( lint_rc == 0 )); then
jq -nc \
--arg fo "$format_output" \
--arg lo "Ran: $lint_cmd
$lint_out" \
'{
"format_output": $fo,
"lint_ok": true,
"lint_output": $lo,
"_next": "verify_build"
}'
else
jq -nc \
--arg fo "$format_output" \
--arg lo "Ran: $lint_cmd
Exit code: $lint_rc
$lint_out" \
'{
"format_output": $fo,
"lint_ok": false,
"lint_output": $lo,
"_next": "fix_loop_gate"
}'
fi
+57
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
set -uo pipefail
# shellcheck disable=SC1091
source "$(dirname "$0")/../../.shared/utils.sh"
if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
state=$(cat "$GRAPH_STATE_FILE")
elif [[ -n "${GRAPH_STATE:-}" ]]; then
state="$GRAPH_STATE"
else
state='{}'
fi
project_dir=$(echo "$state" | jq -r '.project_dir // "."')
if [[ -n "${TEST_CMD:-}" ]]; then
cmd="$TEST_CMD"
else
project_info=$(detect_project "$project_dir")
cmd=$(echo "$project_info" | jq -r '.test // ""')
fi
if [[ -z "$cmd" || "$cmd" == "null" ]]; then
jq -nc '{
"tests_ok": true,
"tests_output": "(no test command available for this project type)",
"_next": "edge_case_sweep"
}'
exit 0
fi
exit_code=0
output=$(cd "$project_dir" && eval "$cmd" 2>&1) || exit_code=$?
if (( exit_code == 0 )); then
jq -nc \
--arg out "Ran: $cmd
$output" \
'{
"tests_ok": true,
"tests_output": $out,
"_next": "edge_case_sweep"
}'
else
jq -nc \
--arg out "Ran: $cmd
Exit code: $exit_code
$output" \
'{
"tests_ok": false,
"tests_output": $out,
"_next": "fix_loop_gate"
}'
fi
+5
@@ -18,6 +18,11 @@
      "type": "stdio",
      "command": "uvx",
      "args": ["duckduckgo-mcp-server"]
    },
    "iwe": {
      "type": "stdio",
      "command": "iwec",
      "args": ["--project", "."]
    }
  }
}
+81
@@ -0,0 +1,81 @@
#!/usr/bin/env bash
set -e
# @describe Structural code search using AST patterns (ast-grep). Matches syntax trees, not text,
# so it finds code regardless of formatting: function calls with any arguments, definitions, etc.
# Use meta-variables in patterns: $NAME matches one AST node, $$$ matches zero or more nodes.
# Patterns must be COMPLETE, valid AST nodes in the target language: 'fn $NAME($$$) { $$$ }'
# matches Rust fn definitions (with body - 'fn $NAME($$$)' alone parses as nothing and matches
# nothing), 'foo($$$)' matches all calls to foo, '$X.unwrap()' matches all unwrap calls.
# Prefer this over fs_grep when searching for code STRUCTURE (calls, definitions, signatures);
# use fs_grep for plain text, comments, or strings.
# @option --pattern! The AST pattern to search for (must parse as valid code in the target language)
# @option --lang The target language (e.g. rust, typescript, tsx, javascript, python, go, java, c, cpp, kotlin, swift, ruby, php, css, html, yaml, json). Strongly recommended; without it files of every supported language are scanned
# @option --path The directory OR file to search in (defaults to current working directory)
# @option --glob File glob to narrow the search (e.g. "src/**/*.rs", "!**/tests/**")
# @env LLM_OUTPUT=/dev/stdout The output path
MAX_RESULTS=100
MAX_OUTPUT_BYTES=32768
resolve_binary() {
if command -v ast-grep &>/dev/null; then
echo "ast-grep"
return 0
fi
if command -v sg &>/dev/null && sg --version 2>/dev/null | grep -qi 'ast-grep'; then
echo "sg"
return 0
fi
return 1
}
main() {
# shellcheck disable=SC2154
local pattern="$argc_pattern"
local lang="${argc_lang:-}"
local search_path="${argc_path:-.}"
local glob="${argc_glob:-}"
local bin
if ! bin=$(resolve_binary); then
printf 'ast-grep is not installed. Fall back to fs_grep for this search.\nTo enable structural search, install ast-grep:\n cargo install ast-grep --locked\n brew install ast-grep\n npm i -g @ast-grep/cli\n' >> "$LLM_OUTPUT"
return 0
fi
if [[ ! -e "$search_path" ]]; then
echo "Error: path not found: $search_path" >> "$LLM_OUTPUT"
return 1
fi
local args=(run --pattern "$pattern" --color never --heading never)
[[ -n "$lang" ]] && args+=(--lang "$lang")
[[ -n "$glob" ]] && args+=(--globs "$glob")
args+=("$search_path")
local output exit_code=0
output=$("$bin" "${args[@]}" 2>&1) || exit_code=$?
# Check for hard failures first, so an error with empty output is not reported as "no matches".
# Exit code 1 is treated as "no matches"; anything higher as a real failure.
if (( exit_code > 1 )); then
printf 'ast-grep failed (exit %s):\n%s\n\nHint: the pattern must be valid %s syntax. Meta-variables: $NAME (one node), $$$ (zero or more).\n' \
"$exit_code" "$output" "${lang:-source}" >> "$LLM_OUTPUT"
return 0
fi
if [[ -z "$output" ]]; then
echo "No structural matches found for: $pattern" >> "$LLM_OUTPUT"
return 0
fi
local total
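# BSD/macOS wc pads its count with spaces; strip them so the summary line prints cleanly.
total=$(wc -l <<< "$output" | tr -d '[:space:]')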
output=$(head -n "$MAX_RESULTS" <<< "$output" | head -c "$MAX_OUTPUT_BYTES")
echo "$output" >> "$LLM_OUTPUT"
if (( total > MAX_RESULTS )); then
printf '\n(Showing %s of %s matching lines. Narrow with --glob, --lang, or a more specific pattern.)\n' \
"$MAX_RESULTS" "$total" >> "$LLM_OUTPUT"
fi
}
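The output cap at the end of `main` combines a line limit, a byte limit, and a summary line. The sketch below isolates that logic with small limits so the behavior is visible. The `truncate_output` function and the sample match lines are illustrative only; the real tool uses 100 lines and 32768 bytes.

```shell
#!/usr/bin/env bash
# Small limits for demonstration; the tool above uses 100 lines / 32768 bytes.
MAX_RESULTS=3
MAX_OUTPUT_BYTES=1024

# Hypothetical helper: cap output by lines, then by bytes, and say how much was cut.
truncate_output() {
  local output="$1" total
  total=$(wc -l <<< "$output" | tr -d '[:space:]')
  head -n "$MAX_RESULTS" <<< "$output" | head -c "$MAX_OUTPUT_BYTES"
  if (( total > MAX_RESULTS )); then
    printf '\n(Showing %s of %s matching lines.)\n' "$MAX_RESULTS" "$total"
  fi
}

matches=$(printf 'a.rs:1:foo()\nb.rs:2:foo()\nc.rs:3:foo()\nd.rs:4:foo()\ne.rs:5:foo()')
truncate_output "$matches"
```

Counting lines before truncating is what lets the summary report the true total rather than the capped one.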
+93
@@ -0,0 +1,93 @@
---
name: diagnose
temperature: 0.2
enabled_tools:
- execute_command
- fs_cat
- fs_ls
- web_search_coyote
skills_enabled: false
auto_continue: true
max_auto_continues: 10
---
You are an expert systems troubleshooter: equal parts SRE, sysadmin, network engineer, and homelab tinkerer. Your job
is to diagnose and fix technical problems of any kind: services that won't start, networking failures, container
issues, driver problems, permission errors, misbehaving hardware, broken configs, or anything else. You are not limited
to code.
<system>
os: {{__os__}}
distro: {{__os_distro__}}
arch: {{__arch__}}
shell: {{__shell__}}
cwd: {{__cwd__}}
now: {{__now__}}
</system>
## Prime Directive
**You run the diagnostics yourself.** Never tell the user to run a command and paste the output back. Use the
`execute_command` tool to gather evidence directly, then interpret the results for them. The user should watch you
work, not act as your terminal.
## Diagnostic Loop
Work the loop until the problem is solved or genuinely blocked:
1. **Reproduce & observe.** Run the failing thing (or inspect its state) to see the actual error with your own eyes.
Never diagnose from the user's paraphrase alone.
2. **Establish what changed.** Most breakage follows a change: updates, config edits, reboots, new hardware, expired
certs/leases. Check timestamps, package logs, and recent history early.
3. **Check the dumb stuff first.** Is the service running? Is it enabled? Is the interface up? Is the disk full? Is
DNS resolving? Is the clock right? Cheap checks before deep theories.
4. **Isolate by layer.** Split the problem space in half with each test:
- Networking: bottom-up — link → IP/DHCP → routing → DNS → transport → application.
- Software: process alive? → logs → config → dependencies/permissions → environment → binary itself.
- Containers: daemon → image → container state → logs → mounts/networks → host resources.
5. **Hypothesize, then test.** State your current best hypothesis in one line before each test, and change ONE
variable at a time. If a test disproves the hypothesis, say so and pivot; don't quietly move on.
6. **Fix the root cause, not the symptom.** A restart that "fixes" it without explanation is a data point, not a fix.
7. **Verify.** After any fix, re-run the original failing operation and confirm it now works. No verification, no
victory declaration.
## Evidence Gathering
- Primary sources, in rough order of value: exit codes and stderr, service/app logs (`journalctl`, `docker logs`,
files under `/var/log`), kernel messages (`dmesg`), state inspection (`systemctl status`, `ip`, `ss`, `df`, `free`,
`lsblk`, `nmcli`, `docker ps/inspect`), then config files.
- Make every command non-interactive and bounded: `--no-pager` for `journalctl`/`systemctl`, `-n`/`--since` to limit
log output, `timeout 10 ...` for anything that might hang, `-c` counts for `ping`. Never launch interactive TUIs
(top, htop, lazydocker); use their batch/one-shot modes or underlying CLIs instead.
- Prefer unprivileged commands. When root is genuinely required, say why and use `sudo` (the user may get a password
prompt in their terminal — that's expected).
- Search the web for exact error strings (quoted, with software name and version) when an error is unfamiliar or
smells like a known bug or recent regression. Distro wikis, GitHub issues, and bug trackers beat guessing.
## Safety Rules
Commands fall into three tiers:
1. **Read-only / inspection** (status, logs, listing, ping, dig, cat): run freely, no permission needed.
2. **Reversible state changes** (restart a service, bounce an interface, recreate a container, edit a config after
backing it up): announce what you're about to do and why in one sentence, then do it. Back up any file before
modifying it (`cp file file.bak.$(date +%s)`).
3. **Destructive or hard-to-reverse actions** (deleting data or volumes, formatting, `dd`, partitioning, package
removal, firewall flushes, forced resets): STOP and ask for explicit confirmation first, including the exact
command and a rollback plan. Never run these on your own judgment.
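The tier-2 backup step (`cp file file.bak.$(date +%s)`) could look like this when scripted. This is a sketch under the assumption that a plain file copy is enough; `backup_file` is a hypothetical helper name.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy path to <name>.bak.<unix-seconds> next to it and return the backup path."""
    src = Path(path)
    dest = src.with_name(f"{src.name}.bak.{int(time.time())}")
    shutil.copy2(src, dest)  # copy2 also preserves timestamps and mode bits
    return dest
```

Returning the backup path makes the rollback explicit: restoring is a copy of that path back over the original.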
Additional hard rules:
- Never print or transmit secrets. If command output contains tokens, keys, or passwords, redact them in your response.
- Never disable security controls (firewalls, SELinux/AppArmor, certificate validation) as a "fix" — at most as a
temporary, clearly-labeled isolation test, restored immediately after.
- If the evidence points to failing hardware or risk of data loss, stop, say so plainly, and present options before
touching anything else.
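One possible shape for the redaction rule, as a rough sketch only. The patterns below are assumptions chosen for illustration; real output can carry secrets in formats these do not catch, so treat this as a first filter and not a guarantee.

```python
import re

# Hypothetical patterns; extend them for the token formats you actually see.
_SECRET_PATTERNS = [
    # HTTP bearer credentials: keep the scheme, drop the token.
    re.compile(r"(?i)(bearer)(\s+)(\S+)"),
    # key=value or key: value pairs whose key name looks sensitive.
    re.compile(r"(?i)(password|passwd|secret|token|api[_-]?key)(\s*[=:]\s*)(\S+)"),
]


def redact(text):
    """Replace the value part of each matched secret with a fixed marker."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return text
```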
## Communication
- Lead with what you found, not what you did. Then show the key evidence: the command and the relevant lines of its
output (trimmed — never dump walls of text).
- When the problem is multi-step, keep a running todo list so the user can follow the investigation.
- On resolution, close with a short summary: **root cause → fix applied → how it was verified → how to prevent it**.
- If you're blocked (needs physical access, a password you don't have, a reboot decision), say exactly what you need
and what you'll do once you have it.
+60 -50
@@ -5,7 +5,7 @@
 # sbx cp $HOME/.config/coyote/ testing:/home/agent/.config/
 # sbx cp $HOME/.coyote_password testing:/home/agent/
 # sbx run testing --kit ./sbx-kit/
-schemaVersion: '1'
+schemaVersion: "1"
 kind: sandbox
 name: coyote
 displayName: Coyote
@@ -14,10 +14,10 @@ description: >
 CLI & REPL mode, RAG, AI tools & agents, MCP servers, skills, and macros.
 sandbox:
-image: 'docker/sandbox-templates:shell-docker'
+image: "docker/sandbox-templates:shell-docker"
 aiFilename: COYOTE.md
 entrypoint:
-run: ['bash', '-lc', 'exec /home/agent/.cargo/bin/coyote']
+run: ["bash", "-lc", "exec /home/agent/.cargo/bin/coyote"]
 network:
 # Proxy-managed LLM providers: the proxy substitutes `proxy-managed` for
@@ -50,96 +50,96 @@ network:
 serviceAuth:
 openai:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 anthropic:
 headerName: x-api-key
-valueFormat: '%s'
+valueFormat: "%s"
 gemini:
 headerName: x-goog-api-key
-valueFormat: '%s'
+valueFormat: "%s"
 cohere:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 groq:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 openrouter:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 ai21:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 cloudflare:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 deepinfra:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 deepseek:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 mistral:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 perplexity:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 voyageai:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 xai:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 jina:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 ernie:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 hunyuan:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 minimax:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 moonshot:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 qianwen:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 zhipuai:
 headerName: Authorization
-valueFormat: 'Bearer %s'
+valueFormat: "Bearer %s"
 allowedDomains:
 # Coyote release + self-update + model-registry sync
-- 'github.com:443'
-- 'api.github.com:443'
-- 'raw.githubusercontent.com:443'
-- 'objects.githubusercontent.com:443'
-- '*.githubusercontent.com:443'
+- "github.com:443"
+- "api.github.com:443"
+- "raw.githubusercontent.com:443"
+- "objects.githubusercontent.com:443"
+- "*.githubusercontent.com:443"
 # Coyote install paths (cargo install + uv + rustup + Python tool deps at runtime)
-- 'crates.io:443'
-- 'static.crates.io:443'
-- 'pypi.org:443'
-- 'files.pythonhosted.org:443'
-- 'astral.sh:443'
-- 'sh.rustup.rs:443'
-- 'static.rust-lang.org:443'
+- "crates.io:443"
+- "static.crates.io:443"
+- "pypi.org:443"
+- "files.pythonhosted.org:443"
+- "astral.sh:443"
+- "sh.rustup.rs:443"
+- "static.rust-lang.org:443"
 # LLM model OAuth + API endpoints
-- 'claude.ai:443'
-- 'console.anthropic.com:443'
-- 'accounts.google.com:443'
+- "claude.ai:443"
+- "console.anthropic.com:443"
+- "accounts.google.com:443"
 # *.googleapis.com covers oauth2 + userinfo + VertexAI regional endpoints
 # (*-aiplatform.googleapis.com). Do not narrow without re-checking VertexAI.
-- '*.googleapis.com:443'
+- "*.googleapis.com:443"
 # Bedrock and GitHub Models use signed / GitHub-PAT auth that the proxy
 # cannot rewrite. Domains are allow-listed; credentials must be injected
 # separately (see README "Extending").
-- '*.amazonaws.com:443'
-- 'models.inference.ai.azure.com:443'
+- "*.amazonaws.com:443"
+- "models.inference.ai.azure.com:443"
 credentials:
 sources:
@@ -210,7 +210,7 @@ credentials:
 environment:
 variables:
-IS_SANDBOX: '1'
+IS_SANDBOX: "1"
 COYOTE_LOG_LEVEL: INFO
 COYOTE_CONFIG_DIR: /home/agent/.config/coyote
 proxyManaged:
@@ -250,7 +250,7 @@ commands:
 libssl-dev \
 pandoc \
 bzip2
-user: '1000'
+user: "1000"
 description: Install system prerequisites (including pandoc for fetch_url_via_curl)
 - command: |
 curl -LsSf https://astral.sh/uv/install.sh | sh
@@ -258,7 +258,7 @@ commands:
 printf '#!/bin/sh\nexec uv tool run "$@"\n' > "$HOME/.local/bin/uvx"
 chmod +x "$HOME/.local/bin/uvx"
 fi
-user: '1000'
+user: "1000"
 description: Install uv and write a uvx shell wrapper (the installer may place a macOS binary at this path on Docker-for-Mac hosts, which the Linux container cannot execute)
 - command: |
 set -euo pipefail
@@ -274,7 +274,7 @@ commands:
 curl -fsSL --retry 3 "https://github.com/xo/usql/releases/download/v${USQL_VERSION}/usql_static-${USQL_VERSION}-linux-${USQL_ARCH}.tar.bz2" -o "$TMPDIR/usql.tar.bz2"
 tar -xjf "$TMPDIR/usql.tar.bz2" -C "$TMPDIR"
 sudo install -m 0755 "$TMPDIR/usql_static" /usr/local/bin/usql
-user: '1000'
+user: "1000"
 description: Install the usql universal SQL CLI (used by the built-in sql agent and execute_sql_code tool)
 - command: |
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | \
@@ -284,17 +284,27 @@ commands:
 --target x86_64-unknown-linux-musl
 . "$HOME/.cargo/env"
 cargo install --locked coyote-ai
-user: '1000'
+user: "1000"
 description: Install Coyote AI CLI via Rust's Cargo
+- command: |
+. "$HOME/.cargo/env"
+cargo install --locked iwec
+user: "1000"
+description: Install the IWE MCP server binary (iwec) used by the built-in iwe MCP server and iwe-knowledge-base skill
+- command: |
+. "$HOME/.cargo/env"
+cargo install --locked ast-grep
+user: "1000"
+description: Install ast-grep, used by the built-in ast_grep structural code search tool (and the explore agent)
 startup:
 - command:
 [
-'sh',
-'-c',
+"sh",
+"-c",
 'test -f "$HOME/.config/coyote/config.yaml" || coyote --info >/dev/null 2>&1 || true',
 ]
-user: '1000'
+user: "1000"
 background: false
 description: Bootstrap Coyote config directory on first sandbox start
+1 -1
@@ -37,7 +37,7 @@ Every `agent__spawn` result includes a session_id. **Use it.**
 Starting a fresh agent for a follow-up forces it to re-read every file it already read. That's 70%+ wasted tokens, plus the agent loses the reasoning it built up.
-After every delegation, **store the session_id** for potential continuation.
+After every delegation, **store the session_id compression-safe** for potential continuation. Long sessions compress: chat history gets replaced by a summary, and a session_id that exists only in chat history is unresumable afterward. Embed it in the todo item for that work — `todo__add "Implement auth endpoint (coder ses_abc123)"` — or in your run-state memory file. The todo list and memory survive compression; the conversation does not.
 ## Skill nudges to delegates
+40
@@ -0,0 +1,40 @@
---
description: Systematic troubleshooting of technical issues (services, networking, containers, OS) by running diagnostic commands directly instead of asking the user to.
enabled_tools: execute_command
---
A technical problem needs diagnosing. Apply this methodology strictly. Use the `execute_command` tool to gather
evidence yourself — never ask the user to run commands and paste output back.
## Loop
1. **Reproduce first.** Run the failing thing and read the actual error before theorizing.
2. **Ask "what changed?"** Updates, config edits, reboots, expirations. Check recent history early.
3. **Cheap checks first.** Service running/enabled? Interface up? Disk full? DNS resolving? Clock right?
4. **Isolate by layer, one variable at a time.** Network: link → IP → routing → DNS → transport → app.
Software: process → logs → config → deps/permissions → environment. Containers: daemon → image → container →
logs → mounts/networks → host.
5. **State each hypothesis in one line before testing it.** Pivot openly when disproved.
6. **Fix root cause, then verify** by re-running the original failing operation. No verification, no fix.
## Command Discipline
- Non-interactive and bounded, always: `--no-pager`, `-n`/`--since` on logs, `timeout 10` on anything that might
hang, `-c` on ping. No TUIs — use batch modes.
- Unprivileged first; `sudo` only when required, stating why.
- Web-search exact quoted error strings (with software name + version) for unfamiliar errors.
## Safety Tiers
1. **Read-only** (status, logs, ls, cat, ping, dig): run freely.
2. **Reversible changes** (service restart, interface bounce, config edit): announce in one sentence, back up files
first (`cp file file.bak.$(date +%s)`), then do it.
3. **Destructive** (data/volume deletion, formatting, `dd`, package removal, firewall flush): require explicit user
confirmation with the exact command and a rollback plan. Never on your own judgment.
Redact any secrets appearing in command output. Never disable security controls as a "fix". Stop and present options
if evidence suggests failing hardware or data-loss risk.
## Reporting
Lead with findings, show trimmed key evidence, and close resolved issues with: root cause → fix → verification →
prevention.
+78
@@ -0,0 +1,78 @@
---
description: Schema and discipline for writing and reading step handoff documents - the only channel between implementation steps. Evidence must be pasted, downstream plan changes proposed not imposed. Grants filesystem access for reading and writing handoffs.
enabled_tools: fs_read, fs_cat, fs_ls, fs_write
---
A handoff is the ONLY channel between step N and step N+1. The next executor runs in a fresh session: it sees the plan repo, the code, and this document — nothing else. Whatever you learned that isn't in the handoff (or in `plans/NOTES.md`) is lost. Write accordingly.
Handoffs live in `plans/handoffs/`, named to match their step plan: `plans/handoffs/03-<slug>.md` for `plans/steps/03-<slug>.md`.
## Required schema (writer)
Frontmatter:
```yaml
---
step: 3
title: Add retry policy to the fetch client
result: complete # complete | partial | blocked
---
```
Sections, all mandatory (write "None" rather than omitting — an absent section is indistinguishable from a forgotten one):
| Section | Contents |
|---|---|
| Summary | 2-4 sentences: what exists now that didn't before |
| Completed | Task-by-task, mirroring the plan's Tasks section |
| Not completed | Deferred or dropped tasks, each WITH a reason |
| Deviations | Every departure from the plan: what the plan said, what you did, why |
| Downstream plan updates | Edge-case annotations made directly (which plan, which section) and proposed diffs awaiting approval (see below) |
| Edge cases discovered | Found during implementation — including ones you handled, so the next step knows they're covered |
| Evidence | Pasted verbatim: format/lint/build/test commands, exit codes, salient output lines. Note pre-existing failures explicitly |
| Notes for next step | Warnings, gotchas, invariants the next executor must not violate |
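The schema above lends itself to a mechanical completeness check. The sketch below assumes each section is written as a markdown heading carrying the exact section name, which the schema does not state; `check_handoff` is a hypothetical helper.

```python
import re

REQUIRED_SECTIONS = [
    "Summary", "Completed", "Not completed", "Deviations",
    "Downstream plan updates", "Edge cases discovered",
    "Evidence", "Notes for next step",
]
VALID_RESULTS = {"complete", "partial", "blocked"}


def check_handoff(text):
    """Return a list of problems found in a handoff document (empty if none)."""
    problems = []
    # Frontmatter `result:` line; a trailing "# comment" is tolerated.
    match = re.search(r"^result:\s*([\w-]+)", text, flags=re.MULTILINE)
    if match is None:
        problems.append("frontmatter has no result field")
    elif match.group(1) not in VALID_RESULTS:
        problems.append(f"invalid result: {match.group(1)}")
    # Assumption: every section is a markdown heading of any level.
    headings = {h.strip() for h in re.findall(r"^#+\s+(.+)$", text, flags=re.MULTILINE)}
    for name in REQUIRED_SECTIONS:
        if name not in headings:
            problems.append(f"missing section: {name}")
    return problems
```

A section that is present but contains only "None" passes, which matches the rule that an explicit "None" is preferred over omission.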
## Evidence rules
Assertions are not evidence. "Tests pass" is a claim; this is evidence:
```
$ cargo test
...
test result: ok. 47 passed; 0 failed; exit code 0
```
- Paste the command, the exit code, and the decisive output lines (not the full log).
- Evidence must reflect the FINAL state of the code — collected after formatting and linting, re-collected after any post-review fix.
- If a check was skipped (no formatter configured, etc.), say so explicitly.
## Downstream plan updates: annotate vs propose
Two classes, with different authority:
- **Annotations (make directly).** Adding an entry to a later plan's Edge cases section. Additive, non-scope-changing. Record each in Downstream plan updates.
- **Proposals (never apply directly).** Anything touching a later plan's Objective, Tasks, Acceptance criteria, or Out of scope. Write the change as a fenced before/after diff in Downstream plan updates and flag it at the approval gate. The user applies or rejects it.
The executor who rationalizes a shortcut must not be able to quietly rewrite the spec they'll be judged against — that is why scope changes route through the user.
## Rolling notes vs handoff
- **Handoff**: step-scoped. What happened in THIS step.
- **`plans/NOTES.md`**: durable, step-independent facts ("config loader lowercases all keys", "integration tests need docker running"). Append; never rewrite others' entries. Without this file, facts discovered in step 2 are invisible to step 7, because step 7 reads only step 6's handoff.
## Reading a handoff (start of a step)
1. Check `result`. `partial` or `blocked` → read Not completed first; your plan's `depends_on` may not actually be satisfied. Escalate rather than build on missing ground.
2. Trust what has pasted evidence. Re-verify bare assertions before depending on them.
3. Apply Notes for next step and any approved proposals aimed at your step, BEFORE the staleness check.
4. Treat Deviations as corrections to your mental model of the codebase — the plans upstream of you described code that no longer exists as written.
5. Read `plans/NOTES.md` — handoffs chain pairwise; the rolling notes are the only cumulative memory.
## Anti-patterns
- "All tests pass" with nothing pasted — a claim, not a handoff
- Omitting a section instead of writing "None" — forgotten or empty, the reader can't tell
- Editing a later plan's Tasks or scope directly instead of proposing a diff
- Burying a major deviation in prose instead of the Deviations section
- Durable facts in the handoff only — lost after one more step
- Evidence collected before the formatter ran — the pasted output describes bytes that no longer exist
- Writing the handoff before the completion gate (todos done or deferred-with-reason) is satisfied
+65
@@ -0,0 +1,65 @@
---
description: Navigate and curate markdown knowledge bases (plan repos, spec repos, companion docs) with IWE graph tools. Load when the workspace is or contains a markdown knowledge base and the task involves finding, reading, or reorganizing plans, specs, designs, or notes. Activates the iwe MCP server rooted at the current directory.
enabled_mcp_servers: iwe
---
You are working with a markdown knowledge base through IWE, a graph-based knowledge tool. The `iwe` MCP server is rooted at the current working directory (`--project .`), so the knowledge base is the directory Coyote was launched in. IWE derives structure from links: a link on its own line is an *inclusion link* (parent-child hierarchy); a link inside text is an *inline reference* (cross-reference, produces backlinks). The server watches the filesystem, so external edits are picked up automatically — never ask for a restart.
## When to use this (and when not)
Use IWE tools when the task involves a corpus of markdown documents: plan repositories, spec/design collections, companion docs repos, meeting notes, PKM vaults.
Do NOT use IWE tools for:
- **Agent memory** (`.coyote/memory/`, `COYOTE.md`) — use the `memory__*` tools; they own the index conventions there.
- **Semantic/similarity search over documents** — that is RAG's job. IWE search is fuzzy title/key matching plus structural traversal, not embeddings.
- **Source code** — IWE only understands markdown.
If unsure whether the current directory is actually a knowledge base, probe with `iwe_stats` first. Few or zero documents means this skill does not apply; unload it rather than forcing the tools.
## Orientation protocol (always start here)
Never guess document keys. Orient first:
1. `iwe_stats` — corpus size and shape. Cheap sanity check.
2. `iwe_find(query="<topic>")` — fuzzy search for entry points. Use `roots` behavior via structural selectors when you want top-level topics only.
3. `iwe_tree(key="<entry>", max_depth=2)` — see the hierarchy before reading bodies.
4. `iwe_retrieve(key="<entry>", depth=1, context=1)` — read with structure.
## Reading efficiently
`iwe_retrieve` is the workhorse. Control cost explicitly:
- `depth` — how many levels of included children to expand. Start at 1-2; increase only if needed.
- `context` — parent levels to include, so you know where a document sits. `context=1` is usually enough.
- `max_tokens` — ALWAYS set a budget (e.g. 2000-4000) on large corpora; results report truncation so you can drill further deliberately.
- `exclude` — pass keys you have already read to avoid re-retrieving known content.
- `links` / `backlinks` — include outbound/inbound references when tracing how a topic connects.
Scope searches structurally with selectors on `iwe_find`/`iwe_retrieve`/`iwe_tree`:
- `in` — only sub-documents of EVERY listed key (AND)
- `in_any` — sub-documents of at least one key (OR)
- `not_in` — exclude subtrees (e.g. archives)
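To make the AND / OR / exclude semantics of the three selectors concrete, here is a toy model in Python. It is not IWE's implementation: the `ancestors` mapping and the `select` function are invented for illustration, and details such as whether a key counts as its own sub-document are left out.

```python
def select(ancestors, in_=(), in_any=(), not_in=()):
    """Filter document keys by the subtrees they sit under.

    ancestors maps a document key to the set of keys it is nested under
    (toy data standing in for the inclusion-link hierarchy).
    """
    result = []
    for key, parents in ancestors.items():
        if not all(k in parents for k in in_):                 # `in`: every key (AND)
            continue
        if in_any and not any(k in parents for k in in_any):   # `in_any`: at least one (OR)
            continue
        if any(k in parents for k in not_in):                  # `not_in`: drop these subtrees
            continue
        result.append(key)
    return sorted(result)
```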
Filter by frontmatter with the YAML query language: `status: draft`, `created: {$gte: "2026-01-01"}`, `tags: {$in: [urgent]}`, `reviewed: {$exists: true}`.
Use `iwe_squash(key=...)` to flatten a subtree into one linear document — good for producing a full plan readout or summary input.
## Writing and refactoring
Write tools: `iwe_create` (new doc from title + content), `iwe_update` (replace a doc's content), `iwe_delete` (remove + clean up references). Refactor tools: `iwe_rename` (key rename with automatic link updates everywhere), `iwe_extract` (split a section into its own doc, leaving an inclusion link), `iwe_inline` (merge a referenced doc back into its parent), `iwe_normalize` (reformat all docs consistently).
Rules:
- **Preview destructive operations**: `iwe_rename`, `iwe_delete`, `iwe_extract`, `iwe_inline`, and `iwe_normalize` support `dry_run` — use it first, show the user what will change, then apply.
- Never rename or delete by editing files directly; the refactor tools update every referencing document, manual edits break links.
- When adding a document, link it from an existing parent (inclusion link on its own line) so it joins the hierarchy instead of becoming an orphan.
- Match the corpus conventions: check an existing document's frontmatter fields before inventing your own schema.
- Do not run `iwe_normalize` across someone's knowledge base unprompted — it rewrites every file's formatting.
## Anti-patterns
- Retrieving with `depth=5` and no `max_tokens` "to get everything" — you will flood the context. Iterate: shallow first, drill selectively.
- Calling `iwe_find` repeatedly with rephrased queries when structural navigation (`iwe_tree`, selectors) would locate the document deterministically.
- Using IWE write tools on `.coyote/memory/` files — wrong tier; that corrupts the memory index.
- Creating documents without linking them into the hierarchy — orphans are invisible to depth-based retrieval.
+82
@@ -0,0 +1,82 @@
---
description: Author executable high-level plans and per-step implementation plans for phased work. Defines the plan repo layout and step-plan schema. Grants filesystem access for grounding plans in real code.
enabled_tools: fs_read, fs_grep, fs_glob, fs_ls, fs_cat, fs_write
---
You are writing implementation plans that a DIFFERENT agent will execute later, in a fresh session, with zero access to this conversation. The plan IS the executor's entire context. A plan that needs the conversation to make sense is a broken plan.
## Plan repo layout
Default layout (match the existing layout instead if the repo already has one):
```
plans/
plan.md # high-level plan; links each step plan
steps/01-<slug>.md # one file per step, numbered in execution order
handoffs/ # written by executors; see `handoff-protocol`
NOTES.md # rolling durable facts discovered during execution
```
In `plan.md`, link each step plan with an inclusion link (the link alone on its own line). This makes the plan repo an IWE hierarchy — agents navigating a large plan corpus can load `iwe-knowledge-base` and traverse it structurally instead of globbing.
## High-level plan requirements
- Ordered list of steps. Each step is independently implementable and independently verifiable — it compiles and its tests pass WITHOUT any later step existing.
- The dependency graph is explicit and acyclic. If step 4 needs step 2's API, step 4's plan says so.
- Steps are sized for one focused session: roughly 1-5 files of meaningful change. A step that needs "and then also..." is two steps.
- State what the plan does NOT cover. Scope creep starts where scope boundaries are implicit.
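The "explicit and acyclic" requirement can be checked mechanically once each step's prerequisites are known. A minimal sketch using Python's standard `graphlib` module, assuming the dependencies have already been collected into a mapping from step number to prerequisite step numbers; `execution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on):
    """Return a valid execution order for the steps, or raise ValueError.

    depends_on maps a step number to the list of steps it requires.
    """
    for step, deps in depends_on.items():
        for dep in deps:
            if dep not in depends_on:
                raise ValueError(f"step {step} depends on unknown step {dep}")
    try:
        # static_order() yields every prerequisite before its dependents.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

For example, `execution_order({1: [], 2: [1], 3: [1, 2]})` returns `[1, 2, 3]`, while a plan where steps 1 and 2 depend on each other raises `ValueError`.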
## Step plan schema
Every step plan starts with frontmatter:
```yaml
---
step: 3
title: Add retry policy to the fetch client
depends_on: [1, 2]
status: pending # pending | in-progress | complete
---
```
And contains these sections, all mandatory:
| Section | Contents |
|---|---|
| Objective | 1-3 sentences: what exists after this step that didn't before |
| Context | File paths AND pasted code snippets (5-20 lines) showing the patterns to follow. Not just paths — actual code |
| Tasks | Ordered, atomic tasks. Each maps to one todo item for the executor |
| Acceptance criteria | Measurable behaviors. These become the tests |
| Test commands | Exact commands to run, from the repo root |
| Edge cases | Known edge cases this step must handle or explicitly punt on |
| Out of scope | What the executor must NOT touch, even if tempting |
## Writing for a context-free executor
- Paste code snippets from your exploration into Context. "Follow the pattern in foo.rs" forces the executor to re-do exploration you already did.
- Use repo-relative paths from the project root. Never "the file we discussed."
- Name symbols exactly: `RetryPolicy::backoff`, not "the backoff logic."
- If a decision was made in discussion (X over Y), record the decision AND the one-line reason. The executor will face the same fork and must not re-litigate it.
- Write acceptance criteria as observable behavior ("returns 429 after 3 failed attempts"), not implementation ("uses a for loop"). Criteria that describe implementation produce tautological tests.
## Grounding (before the plan is done)
Plans rot when written from memory. Before finalizing each step plan:
1. `fs_grep` every symbol the plan references — confirm it exists and is spelled right.
2. `fs_read` the files listed in Context — confirm the pasted snippets are current.
3. Confirm the test commands actually exist (check `justfile`, `Makefile`, `package.json` scripts, CI config).
A plan referencing a function that doesn't exist fails the executor at the worst possible time: mid-implementation.
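Outside the agent, the first grounding check can be approximated with a plain substring search. This is a sketch under stated assumptions: `missing_symbols` and the default suffix list are illustrative, and a qualified name such as `RetryPolicy::backoff` should be split into bare identifiers first, because the definition site usually spells them separately.

```python
from pathlib import Path


def missing_symbols(root, symbols, suffixes=(".rs", ".py", ".md")):
    """Return the symbols that appear in no matching file under root."""
    remaining = set(symbols)
    for path in Path(root).rglob("*"):
        if not remaining:
            break  # everything was found; stop scanning early
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            remaining -= {s for s in remaining if s in text}
    return sorted(remaining)
```

An empty result only shows that each name occurs somewhere; confirming signatures still needs a read of the file.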
## Edge cases are a first-class section
For every step, enumerate the edge cases you can foresee: empty inputs, concurrent access, error paths, partial failures, migration/compat concerns. If an edge case belongs to a LATER step, write it in that step's plan now — not in a comment, not in your head. Executors are instructed to propagate newly discovered edge cases downstream; make their diff small by having the section exist.
## Anti-patterns
- "As discussed above" / "per our conversation" — the executor has no conversation
- File paths without pasted snippets in Context — forces re-exploration
- Acceptance criteria like "works correctly" — unmeasurable, untestable
- A step that depends on a later step — cycle; re-order or merge
- Omitting Out of scope — the executor will helpfully refactor things you didn't ask for
- Frontmatter without `depends_on` or `status` — breaks status queries and dependency checks
+83
@@ -0,0 +1,83 @@
---
description: Adversarial review of implementation plans against executability, verifiability, and completeness standards. Verdict is OKAY or REJECT with line-referenced complaints. Grants read-only filesystem access for ground-truth checks.
enabled_tools: fs_read, fs_grep, fs_glob, fs_ls, fs_cat
---
You are reviewing an implementation plan BEFORE any code is written. You are the critic, not a co-author: your job is to find the ways this plan fails an executor who has zero conversation context, not to redesign the approach. A flaw caught here costs one plan edit; the same flaw caught mid-implementation costs a deviation, a handoff note, and possibly rework across steps.
The plan schema you are checking against is defined in the `plan-authoring` skill — load it alongside this one if it is not already loaded.
## Review checklist (in order)
### 1. Executability without context
Read the plan as if you know nothing but what is on the page.
- Does every referenced decision carry its rationale, or does it assume a conversation you can't see?
- Does Context contain pasted code snippets, or only file paths (which force re-exploration)?
- Are symbols named exactly? "The validation logic" is not a name.
### 2. Ground truth (verify, don't trust)
Plans are written from exploration that may be stale or wrong. Spot-check claims against the actual codebase:
- `fs_grep` for every function, type, and file the plan references. Flag anything that doesn't exist or is spelled differently.
- `fs_read` 1-2 of the pasted Context snippets at their claimed locations. Flag drift.
- Check that the Test commands exist (`justfile`, `Makefile`, `package.json`, CI config).
A plan that references phantom code is an automatic REJECT.
### 3. Verifiability
- Is every acceptance criterion a measurable, observable behavior? "Works correctly" and "is robust" are unmeasurable — flag them.
- Do the criteria describe behavior rather than implementation? Implementation-shaped criteria produce tautological tests.
- Can each criterion be checked by the listed Test commands, or is there a criterion with no way to verify it?
### 4. Dependencies and ordering
- Is `depends_on` present, acyclic, and complete? If the step uses an API introduced in step N, is N listed?
- Does anything in this step silently assume a LATER step's output? That's a cycle the frontmatter hides.
- Is the step independently verifiable — will it build and pass tests without later steps existing?
### 5. Scope and sizing
- Is Out of scope present and specific? Absent scope boundaries invite helpful refactoring.
- Is the step sized for one focused session (~1-5 files of meaningful change)? Flag steps hiding an "and then also".
- Do two steps touch the same code region without an ordering constraint between them?
### 6. Edge cases
- Is the Edge cases section present and non-empty (or explicitly "none foreseen — <reason>")?
- Think adversarially for 60 seconds: empty inputs, concurrency, error paths, partial failure, compat. Anything obvious the plan misses?
- If this step creates a new surface (API, config, schema), do DOWNSTREAM step plans account for it where they must?
## Verdict format
End with exactly one of:
```
PLAN_REVIEW: OKAY
<optional: 1-3 non-blocking observations>
```
```
PLAN_REVIEW: REJECT
Complaints:
1. <file>:<line or section> — <what is wrong> — <what would fix it>
2. ...
```
Every complaint must be actionable and point at a specific location. "The plan could be clearer" is noise; "steps/03-retry.md, Acceptance criteria #2 — 'handles errors gracefully' is unmeasurable — specify the expected behavior per error class" is signal.
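Because the verdict line has a fixed shape, a caller can parse it without guessing. The sketch below assumes the review arrives as plain text and that complaints are numbered lines after `Complaints:`; `parse_verdict` is a hypothetical helper, not something the skill defines.

```python
import re


def parse_verdict(review_text):
    """Return (verdict, complaints) parsed from the text of a plan review."""
    verdicts = re.findall(
        r"^PLAN_REVIEW: (OKAY|REJECT)[ \t]*$", review_text, flags=re.MULTILINE
    )
    if len(verdicts) != 1:
        raise ValueError(f"expected exactly one verdict line, found {len(verdicts)}")
    complaints = []
    if verdicts[0] == "REJECT" and "Complaints:" in review_text:
        # Numbered lines after the "Complaints:" marker, one complaint each.
        tail = review_text.split("Complaints:", 1)[1]
        complaints = re.findall(r"^\d+\.\s+(.+)$", tail, flags=re.MULTILINE)
    return verdicts[0], complaints
```

Rejecting zero or multiple verdict lines enforces the "exactly one of" rule on the consumer side as well.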
## Scope discipline
- Review THE PLAN, not the design. If the approach is defensible, do not relitigate it because you'd have chosen differently. Flag design only when it is factually broken (races, missing dependency, contradicts the codebase).
- Do not rewrite the plan yourself. Complaints, not patches — the author owns the fix.
- Three strong complaints beat fifteen weak ones. If you have fifteen, the plan needs a rewrite, not a list: say so.
## Anti-patterns
- Approving without running a single ground-truth check — a syntax review, not a plan review
- REJECT for style or phrasing while missing a phantom-symbol reference
- Redesigning the author's approach in your complaints
- Vague complaints with no location and no fix direction
- Rubber-stamping a step with no acceptance criteria because "the tasks look reasonable"
@@ -0,0 +1,85 @@
---
description: End-to-end protocol for executing one step of a phased implementation plan - orient, staleness check, checklist, implement, edge-case sweep, verify, review, handoff, approval. Grants shell access for build/test commands.
enabled_tools: execute_command
---
You are executing ONE step of a phased implementation plan. Previous steps were executed in sessions you cannot see; later steps depend on what you do and document. The protocol below is ordered — do not skip phases, do not reorder them.
Companion skills: load `handoff-protocol` before Phase 1 (you must READ a handoff correctly) and keep it loaded for Phase 8 (you must WRITE one). Load `verification-gates` for Phase 6. The plan schema is defined in `plan-authoring`.
## Phase 1 - Orient
1. Read the previous step's handoff (`plans/handoffs/`, highest step number below yours). If none exists, you are step 1.
2. Read the current step plan (`plans/steps/`). Note its `depends_on` — confirm those steps' handoffs exist and report success. If a dependency failed or is missing, STOP and escalate via `user__ask`.
3. Read `plans/NOTES.md` for durable facts discovered by earlier steps.
4. Apply anything the previous handoff directed at your step (approved plan updates, warnings).
5. Set the plan's frontmatter `status: in-progress`.
## Phase 2 - Staleness check (BEFORE any edit)
The plan was written before steps 1..N-1 changed the codebase. Verify its assumptions still hold:
- Grep the symbols the plan references — do they still exist, with the claimed signatures?
- Read the plan's Context snippets at their claimed locations — has the code drifted?
- Confirm the Test commands still work.
Discrepancies are deviations — handle them via Phase 5's protocol BEFORE implementing. Executing a stale plan literally is the primary failure mode of phased work.
## Phase 3 - Checklist
`todo__init` with the step objective, then one `todo__add` per task in the plan's Tasks section, in order. Append the protocol's own gates as todos: edge-case sweep, verify, review, handoff. Mark items done with `todo__done` as you go — never batch. The checklist is what survives context compression; keep it truthful.
When you spawn an agent whose session you may need to resume, embed its session_id in the corresponding todo item text (`"Implement task 3 (coder ses_abc123)"`). If your context gets compressed mid-step, the plan repo tells you WHAT the step is and the todo list tells you WHERE you are and WHICH sessions to resume — re-orient from those, not from the summary's recollection.
## Phase 4 - Implement
- Implement ONLY what the plan's Tasks and Objective ask. Out of scope means out of scope.
- Follow the patterns pasted in the plan's Context. When plan and current codebase disagree, the codebase wins — record the deviation.
- Write tests from the plan's Acceptance criteria, not from your implementation. Criteria-first tests catch what tautological tests cannot.
- While in the code, note (do not fix) anything the planning exploration missed — feed it to Phase 5.
## Phase 5 - Edge-case sweep and deviations
**Edge cases.** For each edge case you discovered: if it belongs to THIS step, handle it (or punt explicitly in the handoff with a reason). If it belongs to a LATER step, check that step's plan — if the plan already covers it, done; if not, add it to that plan's Edge cases section and record the addition in your handoff.
**Deviations.** Classify each:
| Class | Definition | Action |
|---|---|---|
| Minor | Same objective and scope, mechanics differ (renamed symbol, moved file, extra helper) | Resolve it, document in handoff |
| Major | Changes scope, approach, interfaces, or invalidates a later step's assumptions | Do NOT silently proceed. Either escalate via `user__ask`, or write a proposed downstream-plan diff into the handoff per `handoff-protocol` |
Never rewrite a later step's Objective, Tasks, or Out of scope directly — edge-case annotations are the only direct downstream edit you may make.
## Phase 6 - Verify (order matters)
1. Formatter (if configured) — format BEFORE collecting evidence, so evidence reflects final code.
2. Linter (if configured) — fix findings your change introduced.
3. Build/typecheck — exit code 0.
4. FULL test suite — not just your new tests; regressions in untouched code are your problem if your change caused them.
Capture commands and exit codes verbatim — they go in the handoff as evidence. Pre-existing failures: note explicitly, don't fix, don't hide. Apply the 3-strike rule: after 3 failed fix attempts, stop, revert to working state, escalate.
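The ordering rule in this phase can be modelled as a small driver that runs the gates in sequence and stops at the first failure. The sketch below is illustrative only: `Gate`, `GATE_ORDER`, and `run_gates` are hypothetical names, and the real commands are abstracted behind a caller-supplied closure that returns an exit code.

```rust
/// The four verification gates, in the order Phase 6 prescribes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Gate {
    Format,
    Lint,
    Build,
    FullTests,
}

const GATE_ORDER: [Gate; 4] = [Gate::Format, Gate::Lint, Gate::Build, Gate::FullTests];

/// Runs each gate in order through `run`, which returns that gate's exit code.
/// Stops at the first non-zero exit code and reports it as the error.
/// On success, returns every (gate, exit code) pair for use as handoff evidence.
fn run_gates(mut run: impl FnMut(Gate) -> i32) -> Result<Vec<(Gate, i32)>, (Gate, i32)> {
    let mut evidence = Vec::new();
    for gate in GATE_ORDER {
        let code = run(gate);
        if code != 0 {
            return Err((gate, code));
        }
        evidence.push((gate, code));
    }
    Ok(evidence)
}
```

Keeping the formatter first means the build and test evidence is collected against the final bytes. The 3-strike rule would sit in the caller, around repeated invocations of `run_gates`.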
## Phase 7 - Review
Self-review the diff with `code-review` + `ai-slop-remover` loaded. For broad steps (5+ files or crossing architectural boundaries), request an independent pass (`code-reviewer` agent) instead. Fix blockers; re-run Phase 6 after any fix.
## Phase 8 - Handoff
Gate: every todo is either done or explicitly deferred with a reason. No silent drops.
Write the handoff per `handoff-protocol` — schema, pasted evidence, deviations, downstream updates, notes for the next step. Append durable, step-independent facts to `plans/NOTES.md`. Set the plan's frontmatter `status: complete`.
## Phase 9 - User approval
Present: what was done, deviations, downstream plan changes (made or proposed), evidence summary, handoff location. Then STOP — do not begin the next step. If the user requests changes, address them, re-run Phase 6, update the handoff, and present again.
## Anti-patterns
- Editing code before the staleness check — the primary source of mid-step surprises
- Implementing "while I'm here" improvements outside the plan's scope
- Tests derived from the implementation instead of the acceptance criteria
- Collecting build/test evidence BEFORE formatting/linting, then shipping different bytes
- Running only your new tests and claiming "tests pass"
- Silently absorbing a major deviation instead of escalating or proposing a plan diff
- Rewriting downstream plan scope directly instead of proposing per `handoff-protocol`
- Starting the next step without user approval
+1
@@ -91,6 +91,7 @@ enabled_tools: null # Which tools to enable by default.
# Example (comma-separated form):
# enabled_tools: fs,web_search_coyote
visible_tools: # Which tools are visible to be compiled (and are thus able to be defined in 'enabled_tools')
# - ast_grep.sh
# - demo_py.py
# - demo_sh.sh
# - demo_ts.ts
+7
@@ -133,6 +133,13 @@ impl MessageContent {
}
}
pub fn as_text(&self) -> Option<&str> {
match self {
MessageContent::Text(text) => Some(text),
_ => None,
}
}
pub fn merge_prompt(&mut self, replace_fn: impl Fn(&str) -> String) {
match self {
MessageContent::Text(text) => *text = replace_fn(text),
+9
@@ -118,6 +118,14 @@ pub struct MemoryFrontmatter {
pub description: Option<String>,
#[serde(default, rename = "type")]
pub kind: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub created: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub updated: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub superseded_by: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub expires: Option<String>,
}
#[derive(Debug, Clone)]
@@ -545,6 +553,7 @@ mod tests {
name: "test".into(),
description: Some("a test".into()),
kind: Some("user".into()),
..Default::default()
},
body: "Hello world\nmore text".into(),
};
+3 -1
@@ -135,6 +135,7 @@ const RAGS_DIR_NAME: &str = "rags";
const FUNCTIONS_DIR_NAME: &str = "functions";
const FUNCTIONS_BIN_DIR_NAME: &str = "bin";
const AGENTS_DIR_NAME: &str = "agents";
const REPL_HISTORY_DIR_NAME: &str = "repl-history";
const GLOBAL_TOOLS_DIR_NAME: &str = "tools";
const GLOBAL_TOOLS_UTILS_DIR_NAME: &str = "utils";
const BASH_PROMPT_UTILS_FILE_NAME: &str = "prompt-utils.sh";
@@ -150,7 +151,7 @@ const SBX_VAULT_MIXINS_DIR_NAME: &str = "sbx-vault-mixins";
const SBX_MIXIN_KITS_DIR_NAME: &str = "sbx-mixin-kits";
const GIT_DIR_NAME: &str = ".git";
const GITIGNORE_FILE_NAME: &str = ".gitignore";
- const DEFAULT_VISIBLE_TOOLS: [&str; 18] = [
+ const DEFAULT_VISIBLE_TOOLS: [&str; 19] = [
"execute_command.sh",
"execute_py_code.py",
"execute_sql_code.sh",
@@ -164,6 +165,7 @@ const DEFAULT_VISIBLE_TOOLS: [&str; 18] = [
"fs_read.sh",
"fs_rm.sh",
"fs_write.sh",
"ast_grep.sh",
"get_current_time.sh",
"get_current_weather.sh",
"search_wikipedia.sh",
+16
@@ -8,6 +8,8 @@ use super::{
SKILLS_DIR_NAME, WORKSPACE_MEMORY_DIR_NAME,
};
use crate::client::ProviderModels;
use crate::config::REPL_HISTORY_DIR_NAME;
use crate::config::session::Session;
use crate::utils::{get_env_name, list_file_names, normalize_env_name};
use anyhow::{Context, Result, anyhow, bail};
@@ -320,6 +322,20 @@ pub fn workspace_memory_dir_for(workspace_root: &Path) -> PathBuf {
.join(MEMORY_DIR_NAME)
}
pub fn repl_history_dir() -> PathBuf {
cache_path().join(REPL_HISTORY_DIR_NAME)
}
pub fn repl_history_file(session: &Option<Session>) -> PathBuf {
let history_key = if let Some(session) = &session {
format!("session_{}", session.name().replace('/', "_"))
} else {
"default".to_string()
};
repl_history_dir().join(history_key)
}
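To make the key derivation in `repl_history_file` concrete, here is a small stand-alone sketch. It assumes the session name is passed directly as `Option<&str>` and the history directory as a parameter, instead of the `Option<Session>` and `cache_path()` the diff uses; `history_file_for` is a hypothetical stand-in.

```rust
use std::path::{Path, PathBuf};

/// Stand-in for `repl_history_file`: builds the per-session history path.
/// Slashes in the session name are replaced so the key stays a single file name.
fn history_file_for(history_dir: &Path, session_name: Option<&str>) -> PathBuf {
    let key = match session_name {
        Some(name) => format!("session_{}", name.replace('/', "_")),
        // No active session: all history goes to one shared file.
        None => "default".to_string(),
    };
    history_dir.join(key)
}
```

With this shape, a session named `work/api` maps to `session_work_api`, and running without a session maps to `default`, so nested session names never create subdirectories under the history directory.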
pub fn log_config() -> Result<(LevelFilter, Option<PathBuf>)> {
let log_level = env::var(get_env_name("log_level"))
.ok()
+12 -2
@@ -18,10 +18,16 @@ pub(crate) const DEFAULT_MEMORY_INSTRUCTIONS: &str = indoc! {"
- `memory__read(name)`: Read a specific drill file's full content.
- `memory__write(name, content, scope)`: Create or replace a drill file (scope: 'global' | 'workspace').
The MEMORY.md index is appended automatically; do not also update the index by hand.
Optional `superseded_by` / `expires` (YYYY-MM-DD) mark a memory as stale for later cleanup.
- `memory__rename(name, new_name, scope)`: Rename a drill file. Its index entry and every
[[wikilink]] to it are rewritten automatically.
- `memory__delete(name, scope)`: Delete a drill file and its index entry. Reports any
[[wikilinks]] left dangling in other files.
- `memory__edit_index(scope, content)`: Replace the entire MEMORY.md at the given scope.
Use this to add always-on facts, reorganize, prune stale entries, or fix descriptions.
- `memory__list()`: See all known drill files and their metadata.
- - `memory__lint()`: Health-check memory for orphans, broken links, oversized files.
+ - `memory__lint()`: Health-check memory for orphans, broken links, oversized files,
stale (superseded/expired) files, and index descriptions that drifted from the files.
RULES:
- Every interaction has two outputs: your answer AND any memory updates the conversation warrants.
@@ -29,7 +35,11 @@ pub(crate) const DEFAULT_MEMORY_INSTRUCTIONS: &str = indoc! {"
- All MEMORY.md edits MUST go through `memory__edit_index`. NEVER use `fs_write`, `fs_patch`,
or any other generic file tool on MEMORY.md — Coyote manages its location and a stray
MEMORY.md outside the managed path is invisible to memory.
- - All drill files MUST go through `memory__write`. The index updates itself.
+ - All drill files MUST go through `memory__write`. The index updates itself. Renames and
deletions MUST go through `memory__rename` / `memory__delete` so links stay intact.
- When a fact becomes outdated, update it in place, delete it, or mark the old file with
`superseded_by`/`expires` so `memory__lint` flags it later. Never leave contradictory
memories side by side.
- Use [[wikilink]] notation in memory files to reference other memories by their `name:` slug.
- NEVER write secrets, credentials, or API keys to memory — memory is plaintext on disk.
Use coyote's Vault for secrets.
+39
@@ -5116,6 +5116,45 @@ mod tests {
assert!(paths::skill_file("frontend-ui-ux").exists());
}
#[test]
#[serial]
fn bundled_graph_agents_parse_and_validate() {
use crate::graph::GraphParser;
use crate::graph::validator::GraphValidator;
let _guard = TestConfigDirGuard::new();
Agent::install_builtin_agents(false).unwrap();
Skill::install_builtin_skills(false).unwrap();
let mut checked = Vec::new();
for entry in std::fs::read_dir(paths::agents_data_dir()).unwrap() {
let dir = entry.unwrap().path();
let graph_path = dir.join("graph.yaml");
if !graph_path.exists() {
continue;
}
let name = dir.file_name().unwrap().to_string_lossy().to_string();
let graph = GraphParser::new(&dir)
.load_from_file(&graph_path)
.unwrap_or_else(|e| panic!("graph.yaml for '{name}' failed to parse: {e}"));
let result = GraphValidator::new(&dir).validate(&graph);
assert!(
result.errors.is_empty(),
"graph.yaml for '{name}' failed validation: {:#?}",
result.errors
);
checked.push(name);
}
checked.sort();
for expected in ["coder", "librarian", "step-runner"] {
assert!(
checked.iter().any(|n| n == expected),
"expected bundled graph agent '{expected}' to be checked; found {checked:?}"
);
}
}
#[test]
#[serial]
fn install_functions_force_preserves_user_mcp_json() {
+8
@@ -163,6 +163,14 @@ impl Session {
self.messages.is_empty() && self.compressed_messages.is_empty()
}
pub fn messages(&self) -> &[Message] {
&self.messages
}
pub fn compressed_messages(&self) -> &[Message] {
&self.compressed_messages
}
pub fn name(&self) -> &str {
&self.name
}
+721 -46
@@ -3,6 +3,7 @@ use std::path::{Path, PathBuf};
use std::{env, fs};
use anyhow::{Context, Result, anyhow, bail};
use chrono::Local;
use indexmap::IndexMap;
use serde_json::{Value, json};
@@ -97,6 +98,32 @@ pub fn memory_function_declarations() -> Vec<FunctionDeclaration> {
..Default::default()
},
),
(
"superseded_by".to_string(),
JsonSchema {
type_value: Some("string".to_string()),
description: Some(
"Optional `name:` slug of the memory that replaces this one. \
`memory__lint` flags superseded files for cleanup. Omitting this \
on overwrite clears any previous value."
.into(),
),
..Default::default()
},
),
(
"expires".to_string(),
JsonSchema {
type_value: Some("string".to_string()),
description: Some(
"Optional ISO date (YYYY-MM-DD) after which this memory is stale. \
`memory__lint` flags expired files. Omitting this on overwrite \
clears any previous value."
.into(),
),
..Default::default()
},
),
])),
required: Some(vec![
"name".to_string(),
@@ -164,6 +191,90 @@ pub fn memory_function_declarations() -> Vec<FunctionDeclaration> {
},
agent: false,
},
FunctionDeclaration {
name: format!("{MEMORY_FUNCTION_PREFIX}rename"),
description:
"Rename a memory file. Its MEMORY.md index entry and every [[wikilink]] to it in \
other memory files are rewritten automatically."
.to_string(),
parameters: JsonSchema {
type_value: Some("object".to_string()),
properties: Some(IndexMap::from([
(
"name".to_string(),
JsonSchema {
type_value: Some("string".to_string()),
description: Some("Current `name:` slug of the memory file".into()),
..Default::default()
},
),
(
"new_name".to_string(),
JsonSchema {
type_value: Some("string".to_string()),
description: Some(
"New kebab-case slug for the file (no extension)".into(),
),
..Default::default()
},
),
(
"scope".to_string(),
JsonSchema {
type_value: Some("string".to_string()),
description: Some(
"Scope of the file: 'global' (user-level) or 'workspace' (project-level)"
.into(),
),
..Default::default()
},
),
])),
required: Some(vec![
"name".to_string(),
"new_name".to_string(),
"scope".to_string(),
]),
..Default::default()
},
agent: false,
},
FunctionDeclaration {
name: format!("{MEMORY_FUNCTION_PREFIX}delete"),
description:
"Delete a memory file and remove its MEMORY.md index entry. Reports any \
[[wikilinks]] in other memory files left dangling by the deletion."
.to_string(),
parameters: JsonSchema {
type_value: Some("object".to_string()),
properties: Some(IndexMap::from([
(
"name".to_string(),
JsonSchema {
type_value: Some("string".to_string()),
description: Some(
"The `name:` slug of the memory file to delete".into(),
),
..Default::default()
},
),
(
"scope".to_string(),
JsonSchema {
type_value: Some("string".to_string()),
description: Some(
"Scope of the file: 'global' (user-level) or 'workspace' (project-level)"
.into(),
),
..Default::default()
},
),
])),
required: Some(vec!["name".to_string(), "scope".to_string()]),
..Default::default()
},
agent: false,
},
]
}
@@ -214,47 +325,13 @@ pub fn handle_memory_tool(ctx: &mut RequestContext, cmd_name: &str, args: &Value
"workspace": store.workspace.as_ref().map(workspace_label),
}))
}
- "write" => {
- let name = arg_str(args, "name")?;
- let description = arg_str(args, "description")?;
- let content = arg_str(args, "content")?;
- let scope = arg_str(args, "scope")?;
- let kind = args.get("type").and_then(Value::as_str).map(String::from);
- let target_dir = match scope.as_str() {
- "global" => paths::global_memory_dir(),
- "workspace" => workspace_write_dir(&store, &cwd)?,
- other => bail!("unknown scope '{}': use 'global' or 'workspace'", other),
- };
- let file = MemoryFile {
- path: target_dir.join(format!("{name}.md")),
- frontmatter: MemoryFrontmatter {
- name: name.clone(),
- description: Some(description.clone()),
- kind,
- },
- body: content,
- };
- file.save()?;
- let index_path = target_dir.join("MEMORY.md");
- let index_updated = ensure_index_entry(&index_path, &name, &description)?;
- Ok(json!({
- "status": "ok",
- "path": file.path.display().to_string(),
- "index_path": index_path.display().to_string(),
- "index_updated": index_updated,
- }))
- }
+ "write" => write_memory(&store, &cwd, args),
+ "rename" => rename_memory(&store, &cwd, args),
+ "delete" => delete_memory(&store, &cwd, args),
"edit_index" => {
let scope = arg_str(args, "scope")?;
let content = arg_str(args, "content")?;
- let target_dir = match scope.as_str() {
- "global" => paths::global_memory_dir(),
- "workspace" => workspace_write_dir(&store, &cwd)?,
- other => bail!("unknown scope '{}': use 'global' or 'workspace'", other),
- };
+ let target_dir = scope_dir(&store, &cwd, &scope)?;
let index_path = write_memory_index(&target_dir, &content)?;
Ok(json!({
@@ -267,19 +344,229 @@ pub fn handle_memory_tool(ctx: &mut RequestContext, cmd_name: &str, args: &Value
}
}
fn write_memory(store: &MemoryStore, cwd: &Path, args: &Value) -> Result<Value> {
let name = arg_str(args, "name")?;
let description = arg_str(args, "description")?;
let content = arg_str(args, "content")?;
let scope = arg_str(args, "scope")?;
let kind = args.get("type").and_then(Value::as_str).map(String::from);
let superseded_by = args
.get("superseded_by")
.and_then(Value::as_str)
.map(String::from);
let expires = args
.get("expires")
.and_then(Value::as_str)
.map(String::from);
let target_dir = scope_dir(store, cwd, &scope)?;
let path = target_dir.join(format!("{name}.md"));
let previous = if path.exists() {
MemoryFile::load(&path).ok()
} else {
None
};
let today = today_string();
let created = previous
.as_ref()
.and_then(|p| p.frontmatter.created.clone())
.unwrap_or_else(|| today.clone());
let file = MemoryFile {
path,
frontmatter: MemoryFrontmatter {
name: name.clone(),
description: Some(description.clone()),
kind,
created: Some(created),
updated: Some(today),
superseded_by,
expires,
},
body: content,
};
file.save()?;
let index_path = target_dir.join("MEMORY.md");
let index_updated = ensure_index_entry(&index_path, &name, &description)?;
Ok(json!({
"status": "ok",
"path": file.path.display().to_string(),
"index_path": index_path.display().to_string(),
"index_updated": index_updated,
"replaced": previous.is_some(),
"previous_description": previous.and_then(|p| p.frontmatter.description),
}))
}
fn rename_memory(store: &MemoryStore, cwd: &Path, args: &Value) -> Result<Value> {
let name = arg_str(args, "name")?;
let new_name = arg_str(args, "new_name")?;
let scope = arg_str(args, "scope")?;
if new_name.is_empty()
|| !new_name
.chars()
.all(|c| c.is_alphanumeric() || c == '-' || c == '_')
{
bail!(
"invalid new_name '{}': use a kebab-case slug (alphanumeric, hyphens, underscores)",
new_name
);
}
if name == new_name {
bail!("new_name matches the current name");
}
let target_dir = scope_dir(store, cwd, &scope)?;
let files = store.list_files()?;
let file = files
.iter()
.find(|f| f.path.starts_with(&target_dir) && f.frontmatter.name == name)
.ok_or_else(|| anyhow!("memory file '{}' not found in scope '{}'", name, scope))?
.clone();
if target_dir.join(format!("{new_name}.md")).exists()
|| files
.iter()
.any(|f| f.path.starts_with(&target_dir) && f.frontmatter.name == new_name)
{
bail!(
"memory file '{}' already exists in scope '{}'",
new_name,
scope
);
}
let needle = format!("[[{name}]]");
let replacement = format!("[[{new_name}]]");
let mut renamed = file.clone();
renamed.path = target_dir.join(format!("{new_name}.md"));
renamed.frontmatter.name = new_name.clone();
renamed.frontmatter.updated = Some(today_string());
renamed.body = renamed.body.replace(&needle, &replacement);
renamed.save()?;
fs::remove_file(&file.path).with_context(|| format!("remove {}", file.path.display()))?;
let mut rewritten = Vec::new();
for f in &files {
if f.path == file.path || !f.body.contains(&needle) {
continue;
}
let mut updated = f.clone();
updated.body = updated.body.replace(&needle, &replacement);
updated.save()?;
rewritten.push(f.frontmatter.name.clone());
}
// Own-scope index: rewrite the wikilink, drop any leftover references to the
// old name, and guarantee the new name is present.
let index_path = target_dir.join("MEMORY.md");
if let Ok(existing) = fs::read_to_string(&index_path)
&& existing.contains(&needle)
{
fs::write(&index_path, existing.replace(&needle, &replacement))?;
}
remove_index_entry(&index_path, &name)?;
let description = renamed.frontmatter.description.clone().unwrap_or_default();
ensure_index_entry(&index_path, &new_name, &description)?;
// Other indexes (other scope's MEMORY.md, lite COYOTE.md): rewrite wikilinks only.
for other_index in other_index_paths(store, &target_dir) {
if let Ok(existing) = fs::read_to_string(&other_index)
&& existing.contains(&needle)
{
fs::write(&other_index, existing.replace(&needle, &replacement))?;
}
}
Ok(json!({
"status": "ok",
"old_path": file.path.display().to_string(),
"new_path": renamed.path.display().to_string(),
"rewritten_references": rewritten,
}))
}
fn delete_memory(store: &MemoryStore, cwd: &Path, args: &Value) -> Result<Value> {
let name = arg_str(args, "name")?;
let scope = arg_str(args, "scope")?;
let target_dir = scope_dir(store, cwd, &scope)?;
let files = store.list_files()?;
let file = files
.iter()
.find(|f| f.path.starts_with(&target_dir) && f.frontmatter.name == name)
.ok_or_else(|| anyhow!("memory file '{}' not found in scope '{}'", name, scope))?;
let deleted_path = file.path.clone();
fs::remove_file(&deleted_path).with_context(|| format!("delete {}", deleted_path.display()))?;
let index_path = target_dir.join("MEMORY.md");
let index_updated = remove_index_entry(&index_path, &name)?;
let dangling: Vec<String> = files
.iter()
.filter(|f| f.path != deleted_path && extract_wikilinks(&f.body).iter().any(|l| l == &name))
.map(|f| f.frontmatter.name.clone())
.collect();
Ok(json!({
"status": "ok",
"deleted_path": deleted_path.display().to_string(),
"index_updated": index_updated,
"dangling_references": dangling,
}))
}
fn scope_dir(store: &MemoryStore, cwd: &Path, scope: &str) -> Result<PathBuf> {
match scope {
"global" => Ok(paths::global_memory_dir()),
"workspace" => workspace_write_dir(store, cwd),
other => bail!("unknown scope '{}': use 'global' or 'workspace'", other),
}
}
fn today_string() -> String {
Local::now().format("%Y-%m-%d").to_string()
}
fn other_index_paths(store: &MemoryStore, own_dir: &Path) -> Vec<PathBuf> {
let mut out = Vec::new();
let global_index = store.global_dir.join("MEMORY.md");
if store.global_dir.as_path() != own_dir && global_index.exists() {
out.push(global_index);
}
match &store.workspace {
Some(WorkspaceMemory::Structured { dir, .. }) => {
let index = dir.join("MEMORY.md");
if dir.as_path() != own_dir && index.exists() {
out.push(index);
}
}
Some(WorkspaceMemory::Lite { file, .. }) if file.exists() => {
out.push(file.clone());
}
_ => {}
}
out
}
fn write_memory_index(target_dir: &Path, content: &str) -> Result<PathBuf> {
fs::create_dir_all(target_dir)?;
let index_path = target_dir.join("MEMORY.md");
fs::write(&index_path, content)?;
Ok(index_path)
}
fn ensure_index_entry(index_path: &Path, name: &str, description: &str) -> Result<bool> {
let existing = fs::read_to_string(index_path).unwrap_or_default();
- let already_referenced =
- existing.contains(&format!("[[{name}]]")) || existing.contains(&format!("{name}.md"));
- if already_referenced {
+ if index_references(&existing, name) {
return Ok(false);
}
@@ -297,6 +584,40 @@ fn ensure_index_entry(index_path: &Path, name: &str, description: &str) -> Resul
}
fs::write(index_path, new_content)?;
Ok(true)
}
fn line_references(line: &str, name: &str) -> bool {
let file_name = format!("{name}.md");
line.split(|c: char| !(c.is_alphanumeric() || c == '-' || c == '_' || c == '.'))
.any(|token| token == file_name || token.trim_matches('.') == name)
}
fn index_references(index: &str, name: &str) -> bool {
index.lines().any(|line| line_references(line, name))
}
fn remove_index_entry(index_path: &Path, name: &str) -> Result<bool> {
let Ok(existing) = fs::read_to_string(index_path) else {
return Ok(false);
};
let kept: Vec<&str> = existing
.lines()
.filter(|line| !line_references(line, name))
.collect();
let mut new_content = kept.join("\n");
if existing.ends_with('\n') && !new_content.is_empty() {
new_content.push('\n');
}
if new_content == existing {
return Ok(false);
}
fs::write(index_path, new_content)?;
Ok(true)
}
@@ -350,9 +671,11 @@ fn workspace_label(w: &WorkspaceMemory) -> Value {
fn lint_memory(store: &MemoryStore) -> Result<Value> {
let files = store.list_files()?;
let names: HashSet<&str> = files.iter().map(|f| f.frontmatter.name.as_str()).collect();
let today = today_string();
let mut oversized = Vec::new();
let mut broken_links = Vec::new();
let mut stale = Vec::new();
for f in &files {
if f.char_len() > PER_FILE_SOFT_CAP {
oversized.push(json!({"name": &f.frontmatter.name, "chars": f.char_len()}));
@@ -362,16 +685,54 @@ fn lint_memory(store: &MemoryStore) -> Result<Value> {
broken_links.push(json!({"from": &f.frontmatter.name, "to": link}));
}
}
if let Some(target) = &f.frontmatter.superseded_by {
stale.push(json!({
"name": &f.frontmatter.name,
"reason": "superseded",
"superseded_by": target,
"target_exists": names.contains(target.as_str()),
}));
}
if let Some(expires) = &f.frontmatter.expires
&& expires.as_str() < today.as_str()
{
stale.push(json!({
"name": &f.frontmatter.name,
"reason": "expired",
"expires": expires,
}));
}
}
- let index_content = store
- .load_global_index()?
- .or_else(|| store.load_workspace_index().ok().flatten())
+ let global_index = store.load_global_index()?.unwrap_or_default();
+ let workspace_index = store
+ .load_workspace_index()
+ .ok()
+ .flatten()
.unwrap_or_default();
let mut orphans = Vec::new();
let mut description_drift = Vec::new();
for f in &files {
- if !index_content.contains(&f.frontmatter.name) {
+ let index = if f.path.starts_with(&store.global_dir) {
&global_index
} else {
&workspace_index
};
if !index_references(index, &f.frontmatter.name) {
orphans.push(f.frontmatter.name.clone());
} else if let (Some(index_desc), Some(file_desc)) = (
index_description(index, &f.frontmatter.name),
f.frontmatter.description.as_deref(),
) && index_desc != file_desc
{
description_drift.push(json!({
"name": &f.frontmatter.name,
"index_description": index_desc,
"file_description": file_desc,
}));
}
}
@@ -380,13 +741,26 @@ fn lint_memory(store: &MemoryStore) -> Result<Value> {
"oversized": oversized,
"broken_wikilinks": broken_links,
"orphans": orphans,
"stale": stale,
"description_drift": description_drift,
}))
}
fn index_description(index: &str, name: &str) -> Option<String> {
let marker = format!("[[{name}]]");
index.lines().find_map(|line| {
let pos = line.find(&marker)?;
let rest = line[pos + marker.len()..].trim_start();
let desc = rest.strip_prefix(':')?.trim();
(!desc.is_empty()).then(|| desc.to_string())
})
}
fn extract_wikilinks(body: &str) -> Vec<String> {
let mut out = Vec::new();
let bytes = body.as_bytes();
let mut i = 0;
while i + 1 < bytes.len() {
if bytes[i] == b'['
&& bytes[i + 1] == b'['
@@ -676,4 +1050,305 @@ mod tests {
let _ = fs::remove_dir_all(&root);
}
#[test]
fn line_references_requires_exact_token_match() {
assert!(line_references("- [[auth]]: description", "auth"));
assert!(line_references("- auth.md is here", "auth"));
assert!(line_references("- referenced", "referenced"));
assert!(line_references("see auth.", "auth"));
assert!(!line_references("- [[auth-flow]]: description", "auth"));
assert!(!line_references("- oauth.md legacy", "auth"));
assert!(!line_references("- preauth notes", "auth"));
}
#[test]
fn remove_index_entry_drops_only_matching_lines() {
let root = temp_root("index_remove");
let index = root.join("MEMORY.md");
fs::write(
&index,
"# Memory Index\n\n- [[keep]]: stays\n- [[gone]]: removed\n",
)
.unwrap();
assert!(remove_index_entry(&index, "gone").unwrap());
let content = fs::read_to_string(&index).unwrap();
assert!(content.contains("[[keep]]"));
assert!(!content.contains("[[gone]]"));
assert!(!remove_index_entry(&index, "gone").unwrap());
let _ = fs::remove_dir_all(&root);
}
#[test]
fn lint_checks_orphans_against_own_scope_index() {
let root = temp_root("lint_scopes");
let global = root.join("global");
fs::create_dir_all(&global).unwrap();
fs::write(global.join("MEMORY.md"), "- [[global-note]]: g\n").unwrap();
fs::write(
global.join("global-note.md"),
"---\nname: global-note\n---\ng\n",
)
.unwrap();
let workspace = root.join("ws");
let structured = workspace.join(".coyote").join("memory");
fs::create_dir_all(&structured).unwrap();
fs::write(structured.join("MEMORY.md"), "- [[ws-note]]: w\n").unwrap();
fs::write(
structured.join("ws-note.md"),
"---\nname: ws-note\n---\nw\n",
)
.unwrap();
let store = MemoryStore {
global_dir: global,
workspace: discover_workspace_memory(&workspace),
};
let report = lint_memory(&store).unwrap();
assert!(
report["orphans"].as_array().unwrap().is_empty(),
"expected no orphans, got: {report}"
);
let _ = fs::remove_dir_all(&root);
}
#[test]
fn lint_flags_stale_and_description_drift() {
let root = temp_root("lint_stale");
let workspace = root.join("ws");
let structured = workspace.join(".coyote").join("memory");
fs::create_dir_all(&structured).unwrap();
fs::write(
structured.join("MEMORY.md"),
"- [[old-plan]]: old\n- [[bygone]]: e\n- [[drifted]]: index says this\n",
)
.unwrap();
fs::write(
structured.join("old-plan.md"),
"---\nname: old-plan\nsuperseded_by: new-plan\n---\nx\n",
)
.unwrap();
fs::write(
structured.join("bygone.md"),
"---\nname: bygone\nexpires: 2000-01-01\n---\nx\n",
)
.unwrap();
fs::write(
structured.join("drifted.md"),
"---\nname: drifted\ndescription: file says that\n---\nx\n",
)
.unwrap();
let store = MemoryStore {
global_dir: root.join("nonexistent_global"),
workspace: discover_workspace_memory(&workspace),
};
let report = lint_memory(&store).unwrap();
let stale = report["stale"].as_array().unwrap();
let reasons: Vec<(&str, &str)> = stale
.iter()
.map(|v| (v["name"].as_str().unwrap(), v["reason"].as_str().unwrap()))
.collect();
assert!(reasons.contains(&("old-plan", "superseded")));
assert!(reasons.contains(&("bygone", "expired")));
let superseded = stale.iter().find(|v| v["name"] == "old-plan").unwrap();
assert_eq!(superseded["target_exists"], false);
let drift = report["description_drift"].as_array().unwrap();
assert_eq!(drift.len(), 1);
assert_eq!(drift[0]["name"], "drifted");
let _ = fs::remove_dir_all(&root);
}
#[test]
fn delete_memory_removes_file_index_entry_and_reports_dangling() {
let root = temp_root("delete");
let workspace = root.join("ws");
let structured = workspace.join(".coyote").join("memory");
fs::create_dir_all(&structured).unwrap();
fs::write(
structured.join("MEMORY.md"),
"# Memory Index\n\n- [[doomed]]: bye\n- [[linker]]: links\n",
)
.unwrap();
fs::write(
structured.join("doomed.md"),
"---\nname: doomed\n---\nbye\n",
)
.unwrap();
fs::write(
structured.join("linker.md"),
"---\nname: linker\n---\nsee [[doomed]]\n",
)
.unwrap();
let store = MemoryStore {
global_dir: root.join("g"),
workspace: discover_workspace_memory(&workspace),
};
let args = json!({"name": "doomed", "scope": "workspace"});
let result = delete_memory(&store, &workspace, &args).unwrap();
assert_eq!(result["status"], "ok");
assert_eq!(result["index_updated"], true);
assert!(!structured.join("doomed.md").exists());
let index = fs::read_to_string(structured.join("MEMORY.md")).unwrap();
assert!(!index.contains("doomed"));
assert!(index.contains("[[linker]]"));
assert_eq!(
result["dangling_references"].as_array().unwrap(),
&vec![json!("linker")]
);
let _ = fs::remove_dir_all(&root);
}
#[test]
fn rename_memory_moves_file_and_rewrites_references() {
let root = temp_root("rename");
let workspace = root.join("ws");
let structured = workspace.join(".coyote").join("memory");
fs::create_dir_all(&structured).unwrap();
fs::write(
structured.join("MEMORY.md"),
"# Memory Index\n\n- [[old-name]]: the plan\n- [[linker]]: links\n",
)
.unwrap();
fs::write(
structured.join("old-name.md"),
"---\nname: old-name\ndescription: the plan\n---\nself link [[old-name]]\n",
)
.unwrap();
fs::write(
structured.join("linker.md"),
"---\nname: linker\n---\nsee [[old-name]] and [[old-name-extended]]\n",
)
.unwrap();
let store = MemoryStore {
global_dir: root.join("g"),
workspace: discover_workspace_memory(&workspace),
};
let args = json!({"name": "old-name", "new_name": "new-name", "scope": "workspace"});
let result = rename_memory(&store, &workspace, &args).unwrap();
assert_eq!(result["status"], "ok");
assert!(!structured.join("old-name.md").exists());
let renamed = MemoryFile::load(&structured.join("new-name.md")).unwrap();
assert_eq!(renamed.frontmatter.name, "new-name");
assert!(renamed.body.contains("[[new-name]]"));
let linker = fs::read_to_string(structured.join("linker.md")).unwrap();
assert!(linker.contains("[[new-name]]"));
assert!(
linker.contains("[[old-name-extended]]"),
"unrelated links must be untouched: {linker}"
);
let index = fs::read_to_string(structured.join("MEMORY.md")).unwrap();
assert!(index.contains("- [[new-name]]: the plan"));
assert!(!index.contains("[[old-name]]"));
assert!(index.contains("[[linker]]"));
assert_eq!(
result["rewritten_references"].as_array().unwrap(),
&vec![json!("linker")]
);
let _ = fs::remove_dir_all(&root);
}
#[test]
fn rename_memory_rejects_collisions_and_bad_slugs() {
let root = temp_root("rename_guard");
let workspace = root.join("ws");
let structured = workspace.join(".coyote").join("memory");
fs::create_dir_all(&structured).unwrap();
fs::write(structured.join("MEMORY.md"), "- [[a]]: a\n- [[b]]: b\n").unwrap();
fs::write(structured.join("a.md"), "---\nname: a\n---\nx\n").unwrap();
fs::write(structured.join("b.md"), "---\nname: b\n---\nx\n").unwrap();
let store = MemoryStore {
global_dir: root.join("g"),
workspace: discover_workspace_memory(&workspace),
};
let collision = json!({"name": "a", "new_name": "b", "scope": "workspace"});
let err = rename_memory(&store, &workspace, &collision).unwrap_err();
assert!(err.to_string().contains("already exists"));
let bad_slug = json!({"name": "a", "new_name": "bad name!", "scope": "workspace"});
let err = rename_memory(&store, &workspace, &bad_slug).unwrap_err();
assert!(err.to_string().contains("invalid new_name"));
let _ = fs::remove_dir_all(&root);
}
#[test]
fn write_memory_stamps_timestamps_and_reports_replacement() {
let root = temp_root("write_stamps");
let workspace = root.join("ws");
let structured = workspace.join(".coyote").join("memory");
fs::create_dir_all(&structured).unwrap();
fs::write(structured.join("MEMORY.md"), "# Memory Index\n").unwrap();
let store = MemoryStore {
global_dir: root.join("g"),
workspace: discover_workspace_memory(&workspace),
};
let first = json!({
"name": "fact",
"description": "first version",
"content": "body v1",
"scope": "workspace",
"expires": "2099-01-01",
});
let before = today_string();
let result = write_memory(&store, &workspace, &first).unwrap();
let after = today_string();
assert_eq!(result["replaced"], false);
assert_eq!(result["previous_description"], Value::Null);
let saved = MemoryFile::load(&structured.join("fact.md")).unwrap();
let created = saved.frontmatter.created.clone().expect("created stamped");
assert!(
created == before || created == after,
"created '{created}' should be stamped with today's date"
);
assert_eq!(saved.frontmatter.updated, Some(created.clone()));
assert_eq!(saved.frontmatter.expires.as_deref(), Some("2099-01-01"));
assert_eq!(saved.frontmatter.superseded_by, None);
let second = json!({
"name": "fact",
"description": "second version",
"content": "body v2",
"scope": "workspace",
});
let result = write_memory(&store, &workspace, &second).unwrap();
assert_eq!(result["replaced"], true);
assert_eq!(result["previous_description"], "first version");
let saved = MemoryFile::load(&structured.join("fact.md")).unwrap();
assert_eq!(
saved.frontmatter.created,
Some(created),
"creation date must be preserved across overwrites"
);
assert!(saved.frontmatter.updated.is_some());
assert_eq!(saved.frontmatter.expires, None);
let _ = fs::remove_dir_all(&root);
}
}
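The lint helpers in this file can be tried in isolation. The sketch below copies `index_description` from the diff and adds two hypothetical helpers that are not part of the commit: `is_expired`, which assumes `expires` and `today` are zero-padded `YYYY-MM-DD` strings (so lexical order matches chronological order, as the `expires.as_str() < today.as_str()` check relies on), and `line_references_sketch`, one possible token-boundary matcher that satisfies the assertions in `line_references_requires_exact_token_match`. The real `line_references` is not shown in this diff and may differ.

```rust
/// Return the description following `[[name]]:` on the first index line that
/// has a non-empty one (copied from the diff above).
fn index_description(index: &str, name: &str) -> Option<String> {
    let marker = format!("[[{name}]]");
    index.lines().find_map(|line| {
        let pos = line.find(&marker)?;
        let rest = line[pos + marker.len()..].trim_start();
        let desc = rest.strip_prefix(':')?.trim();
        (!desc.is_empty()).then(|| desc.to_string())
    })
}

/// Hypothetical helper: assumes both dates are zero-padded `YYYY-MM-DD`,
/// so a plain string comparison orders them chronologically.
fn is_expired(expires: &str, today: &str) -> bool {
    expires < today
}

/// Characters assumed to be part of a memory slug (an assumption, not taken
/// from the commit).
fn is_slug_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '-' || c == '_'
}

/// Hypothetical sketch: `name` counts as referenced only when it appears as a
/// whole token, i.e. not directly preceded or followed by a slug character.
fn line_references_sketch(line: &str, name: &str) -> bool {
    if name.is_empty() {
        return false;
    }
    line.match_indices(name).any(|(start, _)| {
        let before = line[..start].chars().next_back();
        let after = line[start + name.len()..].chars().next();
        before.map_or(true, |c| !is_slug_char(c)) && after.map_or(true, |c| !is_slug_char(c))
    })
}
```

With these definitions, `index_description("- [[drifted]]: index says this", "drifted")` yields the index-side description that the drift lint compares against the file's frontmatter, and `line_references_sketch("- oauth.md legacy", "auth")` is false because `auth` is preceded by a slug character.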
+27
@@ -1691,6 +1691,33 @@ mod tests {
assert!(f.declarations().is_empty());
}
#[test]
fn bundled_bash_tools_generate_declarations() {
let tools_dir =
std::path::Path::new(env!("CARGO_MANIFEST_DIR")).join("assets/functions/tools");
let mut checked = Vec::new();
for entry in std::fs::read_dir(&tools_dir).unwrap() {
let path = entry.unwrap().path();
if path.extension().and_then(OsStr::to_str) != Some("sh") {
continue;
}
let name = path.file_stem().unwrap().to_string_lossy().to_string();
let declarations = Functions::generate_declarations(&path)
.unwrap_or_else(|e| panic!("bundled tool '{name}' failed to parse: {e}"));
assert!(
!declarations.is_empty(),
"bundled tool '{name}' produced no function declaration"
);
checked.push(name);
}
for expected in ["fs_grep", "ast_grep", "execute_command"] {
assert!(
checked.iter().any(|n| n == expected),
"expected bundled tool '{expected}' to be checked; found {checked:?}"
);
}
}
#[test]
fn functions_append_todo_adds_declarations() {
let mut f = Functions::default();
+5 -1
@@ -1,4 +1,4 @@
-use crate::client::oauth::{OAuthProvider, load_oauth_tokens, run_oauth_flow};
use crate::client::oauth::{OAuthProvider, TokenRequestFormat, load_oauth_tokens, run_oauth_flow};
use crate::config::paths;
use anyhow::{Context, Result, anyhow};
use chrono::Utc;
@@ -63,6 +63,10 @@ impl OAuthProvider for McpOAuthProvider {
&self.scopes
}
fn token_request_format(&self) -> TokenRequestFormat {
TokenRequestFormat::FormUrlEncoded
}
fn uses_localhost_redirect(&self) -> bool {
false
}
+107 -4
@@ -6,7 +6,10 @@ use self::completer::ReplCompleter;
use self::highlighter::ReplHighlighter;
use self::prompt::ReplPrompt;
-use crate::client::{call_chat_completions, call_chat_completions_streaming, init_client, oauth};
use crate::client::{
Message, MessageRole, call_chat_completions, call_chat_completions_streaming, init_client,
oauth,
};
use crate::config::{
AgentVariables, AppConfig, AssertState, Input, LastMessage, RequestContext, StateFlags,
macro_execute,
@@ -29,9 +32,9 @@ use log::warn;
use parking_lot::RwLock;
use reedline::CursorConfig;
use reedline::{
-ColumnarMenu, EditCommand, EditMode, Emacs, KeyCode, KeyModifiers, Keybindings, Reedline,
-ReedlineEvent, ReedlineMenu, ValidationResult, Validator, Vi, default_emacs_keybindings,
-default_vi_insert_keybindings, default_vi_normal_keybindings,
ColumnarMenu, EditCommand, EditMode, Emacs, FileBackedHistory, KeyCode, KeyModifiers,
Keybindings, Reedline, ReedlineEvent, ReedlineMenu, ValidationResult, Validator, Vi,
default_emacs_keybindings, default_vi_insert_keybindings, default_vi_normal_keybindings,
};
use reedline::{MenuBuilder, Signal};
use std::sync::LazyLock;
@@ -318,6 +321,58 @@ Type ".help" for additional help.
}
}
{
let (messages_snapshot, compressed_count) = {
let ctx = self.ctx.read();
if let Some(session) = &ctx.session {
let msgs: Vec<Message> = session
.messages()
.iter()
.filter(|m| !m.role.is_system())
.cloned()
.collect();
let compressed = session.compressed_messages().len();
(msgs, compressed)
} else {
(vec![], 0)
}
};
if !messages_snapshot.is_empty() || compressed_count > 0 {
let app = Arc::clone(&self.ctx.read().app.config);
if compressed_count > 0 {
println!(
"{}",
dimmed_text(&format!(
"({compressed_count} earlier messages not shown; compressed for context)"
))
);
println!();
}
for message in &messages_snapshot {
match message.role {
MessageRole::User => {
if let Some(text) = message.content.as_text() {
println!("{}", dimmed_text("You:"));
println!("{text}");
println!();
}
}
MessageRole::Assistant => {
if let Some(text) = message.content.as_text() {
app.print_markdown(text)?;
println!();
}
}
_ => {}
}
}
println!("{}", dimmed_text("─── ↑ previous conversation ↑ ───"));
println!();
}
}
loop {
if self.abort_signal.aborted_ctrld() {
break;
@@ -393,6 +448,14 @@ Type ".help" for additional help.
editor = editor.with_buffer_editor(command, temp_file);
}
if app.save_shell_history {
let ctx = ctx.read();
let history_path = paths::repl_history_file(&ctx.session);
if let Ok(history) = FileBackedHistory::with_file(1000, history_path) {
editor = editor.with_history(Box::new(history));
}
}
Ok(editor)
}
@@ -684,6 +747,46 @@ pub async fn run_repl_command(
session.set_autonaming(false);
}
}
if let Some(session) = &ctx.session {
let messages_snapshot: Vec<Message> = session
.messages()
.iter()
.filter(|m| !m.role.is_system())
.cloned()
.collect();
let compressed_count = session.compressed_messages().len();
if !messages_snapshot.is_empty() || compressed_count > 0 {
if compressed_count > 0 {
println!(
"{}",
dimmed_text(&format!(
"({compressed_count} earlier messages not shown; compressed for context)"
))
);
println!();
}
for message in &messages_snapshot {
match message.role {
MessageRole::User => {
if let Some(text) = message.content.as_text() {
println!("{}", dimmed_text("You:"));
println!("{text}");
println!();
}
}
MessageRole::Assistant => {
if let Some(text) = message.content.as_text() {
app.print_markdown(text)?;
println!();
}
}
_ => {}
}
}
println!("{}", dimmed_text("─── ↑ previous conversation ↑ ───"));
println!();
}
}
}
".install" => {
let trimmed = args.map(str::trim).unwrap_or("");