feat: Refactored the sisyhpus agent system to utilize the new skills system to improve performance and reliability
This commit is contained in:
@@ -0,0 +1,69 @@
|
||||
---
|
||||
description: Structured 6-section delegation template and session-continuity rules for orchestrating sub-agents. Load before spawning any agent.
|
||||
---
|
||||
You are delegating work to a sub-agent. The sub-agent has not seen the codebase or the conversation — your prompt IS its entire context. Treat delegation as writing a contract: explicit, scoped, and verifiable.
|
||||
|
||||
## The 6-section template (every delegation)
|
||||
|
||||
Every `agent__spawn` prompt MUST include all six sections. Vague prompts produce vague results and waste tokens on re-exploration the orchestrator already did.
|
||||
|
||||
```
|
||||
## TASK
|
||||
[One atomic goal. One verb. One outcome. No "and also".]
|
||||
|
||||
## EXPECTED OUTCOME
|
||||
[Concrete deliverables and success criteria. "I will know this is done when ..."]
|
||||
|
||||
## REQUIRED TOOLS
|
||||
[Explicit allowlist: fs_read, fs_grep, etc. Prevents tool sprawl.]
|
||||
|
||||
## MUST DO
|
||||
[Exhaustive requirements. Leave nothing implicit. If you'd be annoyed by the agent not doing X, list X.]
|
||||
|
||||
## MUST NOT DO
|
||||
[Forbidden actions. Anticipate rogue behavior. "Do not modify files outside src/auth/."]
|
||||
|
||||
## CONTEXT
|
||||
[File paths, code snippets, existing patterns, constraints. Paste actual code lines from prior exploration — not just file paths.]
|
||||
```
|
||||
|
||||
## Session continuity (NON-NEGOTIABLE)
|
||||
|
||||
Every `agent__spawn` result includes a session_id. **Use it.**
|
||||
|
||||
- Task failed/incomplete → resume with `session_id` + a tight "Fix: <error>" prompt.
|
||||
- Follow-up on a result → resume with `session_id` + "Also: <question>".
|
||||
- Multi-turn with the same agent → always resume. Never start fresh.
|
||||
|
||||
Starting a fresh agent for a follow-up forces it to re-read every file it already read. That's 70%+ wasted tokens, plus the agent loses the reasoning it built up.
|
||||
|
||||
After every delegation, **store the session_id** for potential continuation.
|
||||
|
||||
## Skill nudges to delegates
|
||||
|
||||
Sub-agents have their own skills. Nudge them in the CONTEXT section:
|
||||
|
||||
> "Load `code-review` before evaluating the diff."
|
||||
> "Load `frontend-ui-ux` before editing component files."
|
||||
> "Load `git-master` before touching history."
|
||||
|
||||
A one-line nudge saves the delegate a `skill__list` turn.
|
||||
|
||||
## Verification after delegation
|
||||
|
||||
A delegation is NOT complete when the sub-agent returns. It is complete when YOU have verified:
|
||||
|
||||
1. Did it work as expected? (Did the file change? Did the test pass?)
|
||||
2. Did it follow existing codebase patterns?
|
||||
3. Did the EXPECTED OUTCOME actually materialize?
|
||||
4. Did it respect MUST DO and MUST NOT DO?
|
||||
|
||||
If any answer is no → resume the session with a corrective prompt. Do not re-spawn from scratch.
|
||||
|
||||
## Anti-patterns
|
||||
|
||||
- "Follow existing patterns" with no snippet → agent guesses, often wrong
|
||||
- Multi-goal prompts → agent does the easy one, skips the rest
|
||||
- Missing MUST NOT DO → agent over-reaches into unrelated files
|
||||
- Discarding session_id on failure → forced re-exploration, wasted tokens
|
||||
- Re-spawning instead of resuming for a 1-line fix → 10x cost
|
||||
@@ -0,0 +1,81 @@
|
||||
---
|
||||
description: Discipline for when and how to consult Oracle - blocking by design, never deliver an answer with Oracle pending, never bypass Oracle for design questions.
|
||||
---
|
||||
Oracle is your read-only, high-IQ advisor. Using it correctly is the difference between shipping the right thing slowly and shipping the wrong thing fast.
|
||||
|
||||
## When you MUST consult Oracle
|
||||
|
||||
Spawn `oracle` (do NOT answer yourself) any time the user asks:
|
||||
|
||||
- "How should I..." / "What's the best way to..." — design/approach questions
|
||||
- "Why does X keep..." / "What's wrong with..." — complex debugging (not simple errors)
|
||||
- "Should I use X or Y?" — technology or pattern choices
|
||||
- "How should this be structured?" — architecture and organization
|
||||
- "Review this" / "What do you think of..." — code/design review
|
||||
- Tradeoff questions — performance vs readability, complexity vs flexibility
|
||||
- Multi-component questions — anything spanning 3+ files or modules
|
||||
- Vague/open-ended — "improve this", "make this better", "clean this up"
|
||||
- After 2+ failed fix attempts on the same problem — complex debugging
|
||||
|
||||
Even if you think you know the answer, Oracle provides deeper, more thorough analysis. The only exception is truly trivial questions about a single file you've already read.
|
||||
|
||||
## Oracle is BLOCKING by design
|
||||
|
||||
The orchestrator (you) has paused work and CANNOT proceed until Oracle returns. This is intentional. The cost of Oracle's latency is paid so YOU get a thorough, considered answer rather than rushing in a wrong direction.
|
||||
|
||||
Therefore:
|
||||
|
||||
- **Do NOT implement before Oracle returns** if your implementation depends on Oracle's recommendation.
|
||||
- **Do NOT deliver the final user-facing answer** while Oracle is still running.
|
||||
- **Do NOT "time out and continue anyway"** for Oracle-dependent tasks.
|
||||
- While waiting, do only NON-OVERLAPPING prep work (work that doesn't depend on Oracle's verdict).
|
||||
|
||||
## How to consult Oracle effectively
|
||||
|
||||
Oracle has not seen the codebase or the conversation. Give it enough context to think:
|
||||
|
||||
```
|
||||
## Question
|
||||
[The decision you need help with, stated as a question]
|
||||
|
||||
## Background
|
||||
[Why this question matters now. What constraint or trigger raised it.]
|
||||
|
||||
## Code context
|
||||
[Paste the actual snippets from prior exploration — file paths alone are not enough]
|
||||
- From `path/to/file.ext`:
|
||||
<relevant 5-20 lines>
|
||||
|
||||
## What you've considered
|
||||
[Options you've already weighed and their tradeoffs as you see them]
|
||||
|
||||
## What I'd love Oracle to evaluate
|
||||
[Specific aspects: correctness, performance, security, future flexibility, etc.]
|
||||
```
|
||||
|
||||
A well-scoped Oracle consult returns a tighter answer faster.
|
||||
|
||||
## After Oracle returns
|
||||
|
||||
1. Read the recommendation, reasoning, and risks sections carefully.
|
||||
2. If the recommendation conflicts with your prior plan, update the plan — do not silently ignore Oracle.
|
||||
3. Pass Oracle's recommendation (and reasoning) to the implementer (e.g., coder) as CONTEXT in your delegation.
|
||||
4. If you disagree with Oracle's verdict, raise it with the user before implementing the alternative — don't act unilaterally against Oracle's advice.
|
||||
|
||||
## When NOT to consult Oracle
|
||||
|
||||
- Simple file operations you can do with direct tools
|
||||
- First attempt at any fix (try yourself first; consult after 2 failures)
|
||||
- Questions answerable from code you've already read
|
||||
- Trivial decisions (variable names in small functions, formatting)
|
||||
- Things you can infer from existing code patterns
|
||||
|
||||
Over-consultation wastes Oracle's budget and slows the work. Reserve Oracle for genuinely hard or load-bearing decisions.
|
||||
|
||||
## Anti-patterns (BLOCKING)
|
||||
|
||||
- Answering an architecture question yourself "just this once"
|
||||
- Delivering a user-facing answer while Oracle is still running
|
||||
- Implementing the obvious approach without consulting Oracle on a tradeoff question
|
||||
- Ignoring Oracle's recommendation because it's inconvenient
|
||||
- Polling `agent__collect` on a running Oracle (end your response, wait for notification)
|
||||
@@ -0,0 +1,70 @@
|
||||
---
|
||||
description: Fan-out exploration protocol — fire multiple research agents in parallel, wait for completion notifications, and never duplicate delegated work.
|
||||
---
|
||||
You are entering a research phase. Exploration is parallelizable; serial reads leave throughput on the table.
|
||||
|
||||
## Fan out, don't read serially
|
||||
|
||||
For any non-trivial codebase question, fire 2-5 `explore` agents in parallel, each scoped to a different angle:
|
||||
|
||||
- Auth implementation? → one for routes, one for middleware, one for token handling, one for error response shape.
|
||||
- Bug investigation? → one for the failing path, one for similar working paths, one for recent changes near the area.
|
||||
|
||||
Each agent gets a NARROW slice. Narrow scope = fast, focused result. Broad scope = the agent over-reads and returns a wall of text.
|
||||
|
||||
## The wait protocol
|
||||
|
||||
After spawning background agents:
|
||||
|
||||
1. If you have **non-overlapping** work to do (work that doesn't depend on the delegated research), do it now.
|
||||
2. If you don't, **end your response.** Do not call `agent__collect` immediately — the agent is still running.
|
||||
3. The system notifies you when the agent completes (`pending_escalations` or completion event).
|
||||
4. On notification, call `agent__collect` to retrieve results.
|
||||
|
||||
Polling `agent__collect` on a still-running agent blocks your turn for nothing.
|
||||
|
||||
## Anti-duplication rule (BLOCKING)
|
||||
|
||||
Once you delegate a search to an `explore` agent, **do not perform that same search yourself.**
|
||||
|
||||
Forbidden:
|
||||
- After firing `explore` for "auth middleware", running `fs_grep` for "auth middleware" yourself
|
||||
- "Just quickly checking" the same files the delegate is checking
|
||||
- Re-doing the research while waiting impatiently
|
||||
|
||||
Allowed:
|
||||
- Non-overlapping work in a different module
|
||||
- Preparation work that doesn't depend on the delegated result
|
||||
- Ending your response and waiting
|
||||
|
||||
Duplicate searches waste tokens, may contradict the delegate, and defeat the point of parallelism.
|
||||
|
||||
## Stop conditions
|
||||
|
||||
Stop searching when:
|
||||
|
||||
- The same information appears across multiple sources
|
||||
- Two search iterations yield no new useful data
|
||||
- A direct answer was found
|
||||
- You have enough context to proceed confidently
|
||||
|
||||
Over-exploration is as bad as under-exploration. Time spent searching is time not spent shipping.
|
||||
|
||||
## Parallel + sequential composition
|
||||
|
||||
It is fine to fire `explore` and then `oracle` when oracle needs the explore results — just sequence them:
|
||||
|
||||
1. Fire explore(s) in parallel.
|
||||
2. End response, wait for completion.
|
||||
3. Synthesize findings, fire `oracle` with those findings as CONTEXT.
|
||||
4. End response, wait for oracle.
|
||||
5. Act on oracle's recommendation.
|
||||
|
||||
Don't fire oracle blind to "save a turn" — it will give worse advice.
|
||||
|
||||
## Anti-patterns
|
||||
|
||||
- One huge "explore everything about X" agent → slow, unfocused result
|
||||
- Serial explores ("wait for first, then fire next") → unnecessary latency
|
||||
- Firing 8+ parallel agents → diminishing returns, harder to synthesize
|
||||
- Calling `agent__collect` immediately after spawn → wastes a turn
|
||||
@@ -0,0 +1,66 @@
|
||||
---
|
||||
description: Evidence requirements before claiming completion — diagnostics, build exit code, tests. No completion without proof. Grants shell access for running build/test commands.
|
||||
enabled_tools: execute_command
|
||||
---
|
||||
You are about to mark work complete. Before claiming "done," produce evidence. "I'm fairly confident it works" is not evidence.
|
||||
|
||||
## Hard gates
|
||||
|
||||
A task is NOT complete until:
|
||||
|
||||
| Change kind | Required evidence |
|
||||
|---|---|
|
||||
| File edit | Read the file to confirm the change landed; output is clean (or only pre-existing issues, explicitly noted) |
|
||||
| Build command exists | `execute_command` the build; exit code 0 |
|
||||
| Test command exists | `execute_command` the tests; pass (or explicit note of pre-existing failures unrelated to this change) |
|
||||
| Delegation | The delegate's result was received AND verified against your acceptance criteria |
|
||||
|
||||
**No evidence = not complete.** Marking a todo done without evidence is dishonest reporting.
|
||||
|
||||
## The verification loop
|
||||
|
||||
After every meaningful edit:
|
||||
|
||||
1. Read the changed file region (confirm the change actually landed where intended).
|
||||
2. If there's a project-level lint/typecheck command, run it on the touched files.
|
||||
3. Run the project's build/check command if one exists.
|
||||
4. Run the project's test command if one exists.
|
||||
5. Only then mark the corresponding todo `completed`.
|
||||
|
||||
If any step fails: do not mark complete. Fix the issue or surface it explicitly.
|
||||
|
||||
## Build/test detection (fallback)
|
||||
|
||||
If no build/test command is configured, try standard ones for the project:
|
||||
|
||||
- Rust: `cargo check`, `cargo test`
|
||||
- Node/TS: `npm run build`, `npm test`, or `pnpm` / `yarn` equivalents
|
||||
- Python: `pytest`, `python -m mypy <pkg>`, `ruff check`
|
||||
- Go: `go build ./...`, `go test ./...`
|
||||
|
||||
Run from the project root. Capture exit codes.
|
||||
|
||||
## Distinguishing your failures from pre-existing failures
|
||||
|
||||
If build or tests fail, identify the cause:
|
||||
|
||||
- Caused by your change? → fix it before reporting complete.
|
||||
- Pre-existing (unrelated)? → note it explicitly: "Done. Build passes. Note: 3 lint errors pre-existing in unrelated files, not touched."
|
||||
|
||||
Never silently leave broken state behind. Never delete a failing test to make CI green.
|
||||
|
||||
## Anti-patterns (BLOCKING)
|
||||
|
||||
- "It should work" without running anything
|
||||
- Marking a todo complete based on intent, not verified outcome
|
||||
- Suppressing errors with `@ts-ignore`, `as any`, `#[allow(...)]` on unfamiliar lints, empty catch blocks
|
||||
- Deleting failing tests to "pass"
|
||||
- Reporting "all green" when you only ran a subset
|
||||
|
||||
## Reporting completion
|
||||
|
||||
When the work is verifiably done, report in one sentence:
|
||||
|
||||
> "Done. Build passes, 47 tests pass. Modified `auth.rs:42-58` to add JWT validation."
|
||||
|
||||
Not a paragraph. Not a victory lap. Specific, terse, evidence-backed.
|
||||
Reference in New Issue
Block a user