feat: Created new sisyphus family skills to improve performance

2026-07-04 12:28:42 -06:00
parent 531bdfab7f
commit 428d544277
4 changed files with 326 additions and 0 deletions
@@ -0,0 +1,78 @@
+---
+description: Schema and discipline for writing and reading step handoff documents - the only channel between implementation steps. Evidence must be pasted, downstream plan changes proposed not imposed. Grants filesystem access for reading and writing handoffs.
+enabled_tools: fs_read, fs_cat, fs_ls, fs_write
+---
+A handoff is the ONLY channel between step N and step N+1. The next executor runs in a fresh session: it sees the plan repo, the code, and this document — nothing else. Whatever you learned that isn't in the handoff (or in `plans/NOTES.md`) is lost. Write accordingly.
+
+Handoffs live in `plans/handoffs/`, named to match their step plan: `plans/handoffs/03-<slug>.md` for `plans/steps/03-<slug>.md`.
+
+## Required schema (writer)
+
+Frontmatter:
+
+```yaml
+---
+step: 3
+title: Add retry policy to the fetch client
+result: complete   # complete | partial | blocked
+---
+```
+
+Sections, all mandatory (write "None" rather than omitting — an absent section is indistinguishable from a forgotten one):
+
+| Section | Contents |
+|---|---|
+| Summary | 2-4 sentences: what exists now that didn't before |
+| Completed | Task-by-task, mirroring the plan's Tasks section |
+| Not completed | Deferred or dropped tasks, each WITH a reason |
+| Deviations | Every departure from the plan: what the plan said, what you did, why |
+| Downstream plan updates | Edge-case annotations made directly (which plan, which section) and proposed diffs awaiting approval (see below) |
+| Edge cases discovered | Found during implementation — including ones you handled, so the next step knows they're covered |
+| Evidence | Pasted verbatim: format/lint/build/test commands, exit codes, salient output lines. Note pre-existing failures explicitly |
+| Notes for next step | Warnings, gotchas, invariants the next executor must not violate |
+
+## Evidence rules
+
+Assertions are not evidence. "Tests pass" is a claim; this is evidence:
+
+```
+$ cargo test
+   ...
+test result: ok. 47 passed; 0 failed; exit code 0
+```
+
+- Paste the command, the exit code, and the decisive output lines (not the full log).
+- Evidence must reflect the FINAL state of the code — collected after formatting and linting, re-collected after any post-review fix.
+- If a check was skipped (no formatter configured, etc.), say so explicitly.
+
+## Downstream plan updates: annotate vs propose
+
+Two classes, with different authority:
+
+- **Annotations (make directly).** Adding an entry to a later plan's Edge cases section. Additive, non-scope-changing. Record each in Downstream plan updates.
+- **Proposals (never apply directly).** Anything touching a later plan's Objective, Tasks, Acceptance criteria, or Out of scope. Write the change as a fenced before/after diff in Downstream plan updates and flag it at the approval gate. The user applies or rejects it.
+
+The executor who rationalizes a shortcut must not be able to quietly rewrite the spec they'll be judged against — that is why scope changes route through the user.
+
+## Rolling notes vs handoff
+
+- **Handoff**: step-scoped. What happened in THIS step.
+- **`plans/NOTES.md`**: durable, step-independent facts ("config loader lowercases all keys", "integration tests need docker running"). Append; never rewrite others' entries. Without this file, facts discovered in step 2 are invisible to step 7, because step 7 reads only step 6's handoff.
+
+## Reading a handoff (start of a step)
+
+1. Check `result`. `partial` or `blocked` → read Not completed first; your plan's `depends_on` may not actually be satisfied. Escalate rather than build on missing ground.
+2. Trust what has pasted evidence. Re-verify bare assertions before depending on them.
+3. Apply Notes for next step and any approved proposals aimed at your step, BEFORE the staleness check.
+4. Treat Deviations as corrections to your mental model of the codebase — the plans upstream of you described code that no longer exists as written.
+5. Read `plans/NOTES.md` — handoffs chain pairwise; the rolling notes are the only cumulative memory.
+
+## Anti-patterns
+
+- "All tests pass" with nothing pasted — a claim, not a handoff
+- Omitting a section instead of writing "None" — forgotten or empty, the reader can't tell
+- Editing a later plan's Tasks or scope directly instead of proposing a diff
+- Burying a major deviation in prose instead of the Deviations section
+- Durable facts in the handoff only — lost after one more step
+- Evidence collected before the formatter ran — the pasted output describes bytes that no longer exist
+- Writing the handoff before the completion gate (todos done or deferred-with-reason) is satisfied
@@ -0,0 +1,82 @@
+---
+description: Author executable high-level plans and per-step implementation plans for phased work. Defines the plan repo layout and step-plan schema. Grants filesystem access for grounding plans in real code.
+enabled_tools: fs_read, fs_grep, fs_glob, fs_ls, fs_cat, fs_write
+---
+You are writing implementation plans that a DIFFERENT agent will execute later, in a fresh session, with zero access to this conversation. The plan IS the executor's entire context. A plan that needs the conversation to make sense is a broken plan.
+
+## Plan repo layout
+
+Default layout (match the existing layout instead if the repo already has one):
+
+```
+plans/
+  plan.md            # high-level plan; links each step plan
+  steps/01-<slug>.md # one file per step, numbered in execution order
+  handoffs/          # written by executors; see `handoff-protocol`
+  NOTES.md           # rolling durable facts discovered during execution
+```
+
+In `plan.md`, link each step plan with an inclusion link (the link alone on its own line). This makes the plan repo an IWE hierarchy — agents navigating a large plan corpus can load `iwe-knowledge-base` and traverse it structurally instead of globbing.
+
+## High-level plan requirements
+
+- Ordered list of steps. Each step is independently implementable and independently verifiable — it compiles and its tests pass WITHOUT any later step existing.
+- The dependency graph is explicit and acyclic. If step 4 needs step 2's API, step 4's plan says so.
+- Steps are sized for one focused session: roughly 1-5 files of meaningful change. A step that needs "and then also..." is two steps.
+- State what the plan does NOT cover. Scope creep starts where scope boundaries are implicit.
+
+## Step plan schema
+
+Every step plan starts with frontmatter:
+
+```yaml
+---
+step: 3
+title: Add retry policy to the fetch client
+depends_on: [1, 2]
+status: pending   # pending | in-progress | complete
+---
+```
+
+And contains these sections, all mandatory:
+
+| Section | Contents |
+|---|---|
+| Objective | 1-3 sentences: what exists after this step that didn't before |
+| Context | File paths AND pasted code snippets (5-20 lines) showing the patterns to follow. Not just paths — actual code |
+| Tasks | Ordered, atomic tasks. Each maps to one todo item for the executor |
+| Acceptance criteria | Measurable behaviors. These become the tests |
+| Test commands | Exact commands to run, from the repo root |
+| Edge cases | Known edge cases this step must handle or explicitly punt on |
+| Out of scope | What the executor must NOT touch, even if tempting |
+
+## Writing for a context-free executor
+
+- Paste code snippets from your exploration into Context. "Follow the pattern in foo.rs" forces the executor to re-do exploration you already did.
+- Use repo-relative paths from the project root. Never "the file we discussed."
+- Name symbols exactly: `RetryPolicy::backoff`, not "the backoff logic."
+- If a decision was made in discussion (X over Y), record the decision AND the one-line reason. The executor will face the same fork and must not re-litigate it.
+- Write acceptance criteria as observable behavior ("returns 429 after 3 failed attempts"), not implementation ("uses a for loop"). Criteria that describe implementation produce tautological tests.
+
+## Grounding (before the plan is done)
+
+Plans rot when written from memory. Before finalizing each step plan:
+
+1. `fs_grep` every symbol the plan references — confirm it exists and is spelled right.
+2. `fs_read` the files listed in Context — confirm the pasted snippets are current.
+3. Confirm the test commands actually exist (check `justfile`, `Makefile`, `package.json` scripts, CI config).
+
+A plan referencing a function that doesn't exist fails the executor at the worst possible time: mid-implementation.
+
+## Edge cases are a first-class section
+
+For every step, enumerate the edge cases you can foresee: empty inputs, concurrent access, error paths, partial failures, migration/compat concerns. If an edge case belongs to a LATER step, write it in that step's plan now — not in a comment, not in your head. Executors are instructed to propagate newly discovered edge cases downstream; make their diff small by having the section exist.
+
+## Anti-patterns
+
+- "As discussed above" / "per our conversation" — the executor has no conversation
+- File paths without pasted snippets in Context — forces re-exploration
+- Acceptance criteria like "works correctly" — unmeasurable, untestable
+- A step that depends on a later step — cycle; re-order or merge
+- Omitting Out of scope — the executor will helpfully refactor things you didn't ask for
+- Frontmatter without `depends_on` or `status` — breaks status queries and dependency checks
@@ -0,0 +1,83 @@
+---
+description: Adversarial review of implementation plans against executability, verifiability, and completeness standards. Verdict is OKAY or REJECT with line-referenced complaints. Grants read-only filesystem access for ground-truth checks.
+enabled_tools: fs_read, fs_grep, fs_glob, fs_ls, fs_cat
+---
+You are reviewing an implementation plan BEFORE any code is written. You are the critic, not a co-author: your job is to find the ways this plan fails an executor who has zero conversation context, not to redesign the approach. A flaw caught here costs one plan edit; the same flaw caught mid-implementation costs a deviation, a handoff note, and possibly rework across steps.
+
+The plan schema you are checking against is defined in the `plan-authoring` skill — load it alongside this one if it is not already loaded.
+
+## Review checklist (in order)
+
+### 1. Executability without context
+
+Read the plan as if you know nothing but what is on the page.
+
+- Does every referenced decision carry its rationale, or does it assume a conversation you can't see?
+- Does Context contain pasted code snippets, or only file paths (which force re-exploration)?
+- Are symbols named exactly? "The validation logic" is not a name.
+
+### 2. Ground truth (verify, don't trust)
+
+Plans are written from exploration that may be stale or wrong. Spot-check claims against the actual codebase:
+
+- `fs_grep` for every function, type, and file the plan references. Flag anything that doesn't exist or is spelled differently.
+- `fs_read` 1-2 of the pasted Context snippets at their claimed locations. Flag drift.
+- Check that the Test commands exist (`justfile`, `Makefile`, `package.json`, CI config).
+
+A plan that references phantom code is an automatic REJECT.
+
+### 3. Verifiability
+
+- Is every acceptance criterion a measurable, observable behavior? "Works correctly" and "is robust" are unmeasurable — flag them.
+- Do the criteria describe behavior rather than implementation? Implementation-shaped criteria produce tautological tests.
+- Can each criterion be checked by the listed Test commands, or is there a criterion with no way to verify it?
+
+### 4. Dependencies and ordering
+
+- Is `depends_on` present, acyclic, and complete? If the step uses an API introduced in step N, is N listed?
+- Does anything in this step silently assume a LATER step's output? That's a cycle the frontmatter hides.
+- Is the step independently verifiable — will it build and pass tests without later steps existing?
+
+### 5. Scope and sizing
+
+- Is Out of scope present and specific? Absent scope boundaries invite helpful refactoring.
+- Is the step sized for one focused session (~1-5 files of meaningful change)? Flag steps hiding an "and then also".
+- Do two steps touch the same code region without an ordering constraint between them?
+
+### 6. Edge cases
+
+- Is the Edge cases section present and non-empty (or explicitly "none foreseen — <reason>")?
+- Think adversarially for 60 seconds: empty inputs, concurrency, error paths, partial failure, compat. Anything obvious the plan misses?
+- If this step creates a new surface (API, config, schema), do DOWNSTREAM step plans account for it where they must?
+
+## Verdict format
+
+End with exactly one of:
+
+```
+PLAN_REVIEW: OKAY
+<optional: 1-3 non-blocking observations>
+```
+
+```
+PLAN_REVIEW: REJECT
+Complaints:
+1. <file>:<line or section> — <what is wrong> — <what would fix it>
+2. ...
+```
+
+Every complaint must be actionable and point at a specific location. "The plan could be clearer" is noise; "steps/03-retry.md, Acceptance criteria #2 — 'handles errors gracefully' is unmeasurable — specify the expected behavior per error class" is signal.
+
+## Scope discipline
+
+- Review THE PLAN, not the design. If the approach is defensible, do not relitigate it because you'd have chosen differently. Flag design only when it is factually broken (races, missing dependency, contradicts the codebase).
+- Do not rewrite the plan yourself. Complaints, not patches — the author owns the fix.
+- Three strong complaints beat fifteen weak ones. If you have fifteen, the plan needs a rewrite, not a list: say so.
+
+## Anti-patterns
+
+- Approving without running a single ground-truth check — a syntax review, not a plan review
+- REJECT for style or phrasing while missing a phantom-symbol reference
+- Redesigning the author's approach in your complaints
+- Vague complaints with no location and no fix direction
+- Rubber-stamping a step with no acceptance criteria because "the tasks look reasonable"
@@ -0,0 +1,83 @@
+---
+description: End-to-end protocol for executing one step of a phased implementation plan - orient, staleness check, checklist, implement, edge-case sweep, verify, review, handoff, approval. Grants shell access for build/test commands.
+enabled_tools: execute_command
+---
+You are executing ONE step of a phased implementation plan. Previous steps were executed in sessions you cannot see; later steps depend on what you do and document. The protocol below is ordered — do not skip phases, do not reorder them.
+
+Companion skills: load `handoff-protocol` before Phase 1 (you must READ a handoff correctly) and keep it loaded for Phase 8 (you must WRITE one). Load `verification-gates` for Phase 6. The plan schema is defined in `plan-authoring`.
+
+## Phase 1 - Orient
+
+1. Read the previous step's handoff (`plans/handoffs/`, highest step number below yours). If none exists, you are step 1.
+2. Read the current step plan (`plans/steps/`). Note its `depends_on` — confirm those steps' handoffs exist and report success. If a dependency failed or is missing, STOP and escalate via `user__ask`.
+3. Read `plans/NOTES.md` for durable facts discovered by earlier steps.
+4. Apply anything the previous handoff directed at your step (approved plan updates, warnings).
+5. Set the plan's frontmatter `status: in-progress`.
+
+## Phase 2 - Staleness check (BEFORE any edit)
+
+The plan was written before steps 1..N-1 changed the codebase. Verify its assumptions still hold:
+
+- Grep the symbols the plan references — do they still exist, with the claimed signatures?
+- Read the plan's Context snippets at their claimed locations — has the code drifted?
+- Confirm the Test commands still work.
+
+Discrepancies are deviations — handle them via Phase 5's protocol BEFORE implementing. Executing a stale plan literally is the primary failure mode of phased work.
+
+## Phase 3 - Checklist
+
+`todo__init` with the step objective, then one `todo__add` per task in the plan's Tasks section, in order. Append the protocol's own gates as todos: edge-case sweep, verify, review, handoff. Mark items done with `todo__done` as you go — never batch. The checklist is what survives context compression; keep it truthful.
+
+## Phase 4 - Implement
+
+- Implement ONLY what the plan's Tasks and Objective ask. Out of scope means out of scope.
+- Follow the patterns pasted in the plan's Context. When plan and current codebase disagree, the codebase wins — record the deviation.
+- Write tests from the plan's Acceptance criteria, not from your implementation. Criteria-first tests catch what tautological tests cannot.
+- While in the code, note (do not fix) anything the planning exploration missed — feed it to Phase 5.
+
+## Phase 5 - Edge-case sweep and deviations
+
+**Edge cases.** For each edge case you discovered: if it belongs to THIS step, handle it (or punt explicitly in the handoff with a reason). If it belongs to a LATER step, check that step's plan — if the plan already covers it, done; if not, add it to that plan's Edge cases section and record the addition in your handoff.
+
+**Deviations.** Classify each:
+
+| Class | Definition | Action |
+|---|---|---|
+| Minor | Same objective and scope, mechanics differ (renamed symbol, moved file, extra helper) | Resolve it, document in handoff |
+| Major | Changes scope, approach, interfaces, or invalidates a later step's assumptions | Do NOT silently proceed. Either escalate via `user__ask`, or write a proposed downstream-plan diff into the handoff per `handoff-protocol` |
+
+Never rewrite a later step's Objective, Tasks, or Out of scope directly — edge-case annotations are the only direct downstream edit you may make.
+
+## Phase 6 - Verify (order matters)
+
+1. Formatter (if configured) — format BEFORE collecting evidence, so evidence reflects final code.
+2. Linter (if configured) — fix findings your change introduced.
+3. Build/typecheck — exit code 0.
+4. FULL test suite — not just your new tests; regressions in untouched code are your problem if your change caused them.
+
+Capture commands and exit codes verbatim — they go in the handoff as evidence. Pre-existing failures: note explicitly, don't fix, don't hide. Apply the 3-strike rule: after 3 failed fix attempts, stop, revert to working state, escalate.
+
+## Phase 7 - Review
+
+Self-review the diff with `code-review` + `ai-slop-remover` loaded. For broad steps (5+ files or crossing architectural boundaries), request an independent pass (`code-reviewer` agent) instead. Fix blockers; re-run Phase 6 after any fix.
+
+## Phase 8 - Handoff
+
+Gate: every todo is either done or explicitly deferred with a reason. No silent drops.
+
+Write the handoff per `handoff-protocol` — schema, pasted evidence, deviations, downstream updates, notes for the next step. Append durable, step-independent facts to `plans/NOTES.md`. Set the plan's frontmatter `status: complete`.
+
+## Phase 9 - User approval
+
+Present: what was done, deviations, downstream plan changes (made or proposed), evidence summary, handoff location. Then STOP — do not begin the next step. If the user requests changes, address them, re-run Phase 6, update the handoff, and present again.
+
+## Anti-patterns
+
+- Editing code before the staleness check — the primary source of mid-step surprises
+- Implementing "while I'm here" improvements outside the plan's scope
+- Tests derived from the implementation instead of the acceptance criteria
+- Collecting build/test evidence BEFORE formatting/linting, then shipping different bytes
+- Running only your new tests and claiming "tests pass"
+- Silently absorbing a major deviation instead of escalating or proposing a plan diff
+- Rewriting downstream plan scope directly instead of proposing per `handoff-protocol`
+- Starting the next step without user approval