testing

2026-04-10 15:45:51 -06:00
parent ff3419a714
commit e9e6b82e24
42 changed files with 11578 additions and 358 deletions
@@ -0,0 +1,727 @@
+# Phase 2 Implementation Plan: Engine + Emitter
+
+## Overview
+
+Phase 1 splits `Config` into `AppState` + `RequestContext`. Phase 2 takes the unified state and introduces the **Engine** — a single core function that replaces CLI's `start_directive()` and REPL's `ask()` — plus an **Emitter trait** that abstracts output away from direct stdout writes. After this phase, CLI and REPL both call `Engine::run()` with different `Emitter` implementations and behave identically to today. The API server in Phase 4 will plug in without touching core logic.
+
+**Estimated effort:** ~1 week
+**Risk:** Low-medium. The work is refactoring existing well-tested code paths into a shared shape. Most of the risk is in preserving exact terminal rendering behavior.
+**Depends on:** Phase 1 Steps 0–10 complete (`GlobalConfig` eliminated, `RequestContext` wired through all entry points).
+
+---
+
+## Why Phase 2 Exists
+
+Today's CLI and REPL have two near-identical pipelines that diverge in five specific places. The divergences are accidents of history, not intentional design:
+
+1. **Streaming flag handling.** `start_directive` forces non-streaming when extracting code; `ask` never extracts code.
+2. **Auto-continuation loop.** `ask` has complex logic for `auto_continue_count`, todo inspection, and continuation prompt injection. `start_directive` has none.
+3. **Session compression.** `ask` triggers `maybe_compress_session` and awaits completion; `start_directive` never compresses.
+4. **Session autoname.** `ask` calls `maybe_autoname_session` after each turn; `start_directive` doesn't.
+5. **Cleanup on exit.** `start_directive` calls `exit_session()` at the end; `ask` lets the REPL loop handle it.
+
+Four of these five divergences are bugs waiting to happen — they mean agents behave differently in CLI vs REPL mode, sessions don't get compressed in CLI even when they should, and auto-continuation is silently unavailable from the CLI. Phase 2 collapses both pipelines into one `Engine::run()` that handles all five behaviors uniformly, with per-request flags to control what's active (e.g., `auto_continue: bool` on `RunRequest`).
+
+The Emitter trait exists to decouple the rendering pipeline from its destination. Today, streaming output is hardcoded to write to the terminal via `crossterm`. An `Emitter` implementation can also feed an axum SSE stream, collect events for a JSON response, or capture everything for a test. The Engine sends semantic events; Emitters decide how to present them.
+
+---
+
+## The Architecture After Phase 2
+
+```
+┌─────────┐  ┌─────────┐                 ┌─────────┐
+│   CLI   │  │  REPL   │                 │   API   │ (Phase 4)
+└────┬────┘  └────┬────┘                 └────┬────┘
+     │            │                           │
+     ▼            ▼                           ▼
+┌──────────────────────────────────────────────────┐
+│            Engine::run(ctx, req, emitter)        │
+│  ┌────────────────────────────────────────────┐  │
+│  │ 1. Apply CoreCommand (if any)              │  │
+│  │ 2. Build Input from req                    │  │
+│  │ 3. apply_prelude (first turn only)         │  │
+│  │ 4. before_chat_completion                  │  │
+│  │ 5. Stream or buffered LLM call             │  │
+│  │    ├─ emit Started                         │  │
+│  │    ├─ emit AssistantDelta (per chunk)      │  │
+│  │    ├─ emit ToolCall                        │  │
+│  │    ├─ execute tool                         │  │
+│  │    ├─ emit ToolResult                      │  │
+│  │    └─ loop on tool results                 │  │
+│  │ 6. after_chat_completion                   │  │
+│  │ 7. maybe_compress_session                  │  │
+│  │ 8. maybe_autoname_session                  │  │
+│  │ 9. Auto-continuation (if applicable)       │  │
+│  │ 10. emit Finished                          │  │
+│  └────────────────────────────────────────────┘  │
+└──────────────────────────────────────────────────┘
+     │            │                           │
+     ▼            ▼                           ▼
+TerminalEmitter  TerminalEmitter          JsonEmitter / SseEmitter
+```
+
+---
+
+## Core Types
+
+### `Engine`
+
+```rust
+pub struct Engine {
+    pub app: Arc<AppState>,
+}
+
+impl Engine {
+    pub fn new(app: Arc<AppState>) -> Self { Self { app } }
+
+    pub async fn run(
+        &self,
+        ctx: &mut RequestContext,
+        req: RunRequest,
+        emitter: &dyn Emitter,
+    ) -> Result<RunOutcome, CoreError>;
+}
+```
+
+`Engine` is intentionally a thin wrapper around `Arc<AppState>`. All per-turn state lives on `RequestContext`, so the engine itself has no per-call fields. This makes it cheap to clone and makes `Engine::run` trivially testable.
+
+### `RunRequest`
+
+```rust
+pub struct RunRequest {
+    pub input: Option<UserInput>,
+    pub command: Option<CoreCommand>,
+    pub options: RunOptions,
+}
+
+pub struct UserInput {
+    pub text: String,
+    pub files: Vec<FileInput>,
+    pub media: Vec<MediaInput>,
+    pub continuation: Option<ContinuationKind>,
+}
+
+pub enum ContinuationKind {
+    Continue,
+    Regenerate,
+}
+
+pub struct RunOptions {
+    pub stream: Option<bool>,
+    pub extract_code: bool,
+    pub auto_continue: bool,
+    pub compress_session: bool,
+    pub autoname_session: bool,
+    pub apply_prelude: bool,
+    pub with_embeddings: bool,
+    pub cancel: CancellationToken,
+}
+
+impl RunOptions {
+    pub fn cli() -> Self { /* today's start_directive defaults */ }
+    pub fn repl_turn() -> Self { /* today's ask defaults */ }
+    pub fn api_oneshot() -> Self { /* API one-shot defaults */ }
+    pub fn api_session() -> Self { /* API session defaults */ }
+}
+```
+
+Two things to notice:
+
+1. **`input` is `Option`.** A `RunRequest` can carry just a `command` (e.g., `.role explain`) with no user text, just an input (a plain prompt), or both (the `.role <name> <text>` form that activates a role and immediately sends a prompt through it). The engine handles all three shapes with one code path.
+
+2. **`RunOptions` is the knob panel that replaces the five divergences.** CLI today has `auto_continue: false, compress_session: false, autoname_session: false`; REPL has all three `true`. Phase 2 exposes these as explicit options with factory constructors for each frontend's conventional defaults. This also means you can now run a CLI one-shot with auto-continuation by constructing `RunOptions::cli()` and flipping `auto_continue = true` — a capability that doesn't exist today.
+
+### `CoreCommand`
+
+```rust
+pub enum CoreCommand {
+    // State setters
+    SetModel(String),
+    UsePrompt(String),
+    UseRole { name: String, trailing_text: Option<String> },
+    UseSession(Option<String>),
+    UseAgent { name: String, session: Option<String>, variables: Vec<(String, String)> },
+    UseRag(Option<String>),
+
+    // Exit commands
+    ExitRole,
+    ExitSession,
+    ExitRag,
+    ExitAgent,
+
+    // State queries
+    Info(InfoScope),
+    RagSources,
+
+    // Config mutation
+    Set { key: String, value: String },
+
+    // Session actions
+    CompressSession,
+    EmptySession,
+    SaveSession { name: Option<String> },
+    EditSession,
+
+    // Role actions
+    SaveRole { name: Option<String> },
+    EditRole,
+
+    // RAG actions
+    EditRagDocs,
+    RebuildRag,
+
+    // Agent actions
+    EditAgentConfig,
+    ClearTodo,
+    StarterList,
+    StarterRun(usize),
+
+    // File input shortcut
+    IncludeFiles { paths: Vec<String>, trailing_text: Option<String> },
+
+    // Macro execution
+    Macro { name: String, args: Vec<String> },
+
+    // Vault
+    VaultAdd(String),
+    VaultGet(String),
+    VaultUpdate(String),
+    VaultDelete(String),
+    VaultList,
+
+    // Miscellaneous
+    EditConfig,
+    Authenticate,
+    Delete(DeleteKind),
+    Copy,
+    Help,
+}
+
+pub enum InfoScope {
+    System,
+    Role,
+    Session,
+    Rag,
+    Agent,
+}
+
+pub enum DeleteKind {
+    Role(String),
+    Session(String),
+    Rag(String),
+    Macro(String),
+    AgentData(String),
+}
+```
+
+This enum captures all 37 dot-commands identified in the explore. Three categories deserve special attention:
+
+- **LLM-triggering commands** (`UsePrompt`, `UseRole` with trailing_text, `IncludeFiles` with trailing_text, `StarterRun`, `Macro` that contains LLM calls, and the continuation variants `Continue`/`Regenerate` expressed via `UserInput.continuation`) — these don't just mutate state; they produce a full run through the LLM pipeline. The engine treats them as `RunRequest { command: Some(_), input: Some(_), .. }` — command runs first, then input flows through.
+
+- **Asynchronous commands that return immediately** (`EditConfig`, `EditRole`, `EditRagDocs`, `EditAgentConfig`, most `Vault*`, `Delete`) — these are side-effecting but don't produce an LLM interaction. The engine handles them, emits a `Result` event, and returns without invoking the LLM path.
+
+- **Context-dependent commands** (`ClearTodo`, `StarterList`, `StarterRun`, `EditAgentConfig`, etc.) — these require a specific scope (e.g., active agent). The engine validates the precondition before executing and returns a `CoreError::InvalidState { expected: "active agent" }` if the precondition fails.
+
+### `Emitter` trait and `Event` enum
+
+```rust
+#[async_trait]
+pub trait Emitter: Send + Sync {
+    async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
+}
+
+pub enum Event<'a> {
+    // Lifecycle
+    Started { request_id: Uuid, session_id: Option<SessionId>, agent: Option<&'a str> },
+    Finished { outcome: &'a RunOutcome },
+
+    // Assistant output
+    AssistantDelta(&'a str),
+    AssistantMessageEnd { full_text: &'a str },
+
+    // Tool calls
+    ToolCall { id: &'a str, name: &'a str, args: &'a str },
+    ToolResult { id: &'a str, name: &'a str, result: &'a str, is_error: bool },
+
+    // Auto-continuation
+    AutoContinueTriggered { count: usize, max: usize, remaining_todos: usize },
+
+    // Session lifecycle signals
+    SessionCompressing,
+    SessionCompressed { tokens_saved: Option<usize> },
+    SessionAutonamed(&'a str),
+
+    // Informational
+    Info(&'a str),
+    Warning(&'a str),
+
+    // Errors
+    Error(&'a CoreError),
+}
+
+pub enum EmitError {
+    ClientDisconnected,
+    WriteFailed(std::io::Error),
+}
+```
+
+Three implementations ship in Phase 2; two are stubs, one is real:
+
+- **`TerminalEmitter`** (real) — wraps today's `SseHandler` → `markdown_stream`/`raw_stream` path. This is the bulk of Phase 2's work; see "Terminal rendering details" below.
+- **`NullEmitter`** (stub, for tests) — drops all events on the floor.
+- **`CollectingEmitter`** (stub, for tests and future JSON API) — appends events to a `Vec<OwnedEvent>` for later inspection.
+
+The `JsonEmitter` and `SseEmitter` implementations land in **Phase 4** when the API server comes online.
+
+### `RunOutcome`
+
+```rust
+pub struct RunOutcome {
+    pub request_id: Uuid,
+    pub session_id: Option<SessionId>,
+    pub final_message: Option<String>,
+    pub tool_call_count: usize,
+    pub turns: usize,
+    pub compressed: bool,
+    pub autonamed: Option<String>,
+    pub auto_continued: usize,
+}
+```
+
+`RunOutcome` is what CLI/REPL ignore but the future API returns as JSON. It records everything the caller might want to know about what happened during the run.
+
+### `CoreError`
+
+```rust
+pub enum CoreError {
+    InvalidRequest { msg: String },
+    InvalidState { expected: String, found: String },
+    NotFound { what: String, name: String },
+    Cancelled,
+    ProviderError { provider: String, msg: String },
+    ToolError { tool: String, msg: String },
+    EmitterError(EmitError),
+    Io(std::io::Error),
+    Other(anyhow::Error),
+}
+
+impl CoreError {
+    pub fn is_retryable(&self) -> bool { /* ... */ }
+    pub fn http_status(&self) -> u16 { /* for future API use */ }
+    pub fn terminal_message(&self) -> String { /* for TerminalEmitter */ }
+}
+```
+
+---
+
+## Terminal Rendering Details
+
+The `TerminalEmitter` is the most delicate part of Phase 2 because it has to preserve every pixel of today's REPL/CLI behavior. Here's the mental model:
+
+**Today's flow:**
+```
+LLM client → mpsc::Sender<SseEvent> → SseHandler → render_stream
+                                                      ├─ markdown_stream (if highlight)
+                                                      └─ raw_stream (else)
+```
+
+Both `markdown_stream` and `raw_stream` write directly to stdout via `crossterm`, managing cursor positions, line clears, and incremental markdown parsing themselves.
+
+**Target flow:**
+```
+LLM client → mpsc::Sender<SseEvent> → SseHandler → TerminalEmitter::emit(Event::AssistantDelta)
+                                                      ├─ (internal) markdown_stream state machine
+                                                      └─ (internal) raw_stream state machine
+```
+
+The `TerminalEmitter` owns a `RefCell<StreamRenderState>` (or `Mutex` if we need `Send`) that wraps the existing `markdown_stream`/`raw_stream` state. Each `emit(AssistantDelta)` call feeds the chunk into this state machine exactly as `SseHandler`'s receive loop does today. The result is that the exact same crossterm calls happen in the exact same order — we've just moved them behind a trait.
+
+**Things that migrate 1:1 into `TerminalEmitter`:**
+- Spinner start/stop on first delta
+- Cursor positioning for line reprint during code block growth
+- Syntax highlighting invocation via `MarkdownRender`
+- Color/dim output for tool call banners
+- Final newline + cursor reset on `AssistantMessageEnd`
+
+**Things that the engine handles, not the emitter:**
+- Tool call *execution* (still lives in the engine loop)
+- Session state mutations (engine calls `before_chat_completion` / `after_chat_completion` on `RequestContext`)
+- Auto-continuation decisions (engine inspects agent runtime)
+- Compression and autoname decisions (engine)
+
+**Things the emitter decides, not the engine:**
+- Whether to suppress ToolCall rendering (sub-agents in today's code suppress their own output; TerminalEmitter respects a `verbose: bool` flag)
+- How to format errors (TerminalEmitter uses colored stderr; JsonEmitter will use structured JSON)
+- Whether to show a spinner at all (disabled for non-TTY output)
+
+**One gotcha:** today's `SseHandler` itself produces the `mpsc` channel that LLM clients push into. In the new model, `SseHandler` becomes an internal helper inside the engine's streaming path that converts `mpsc::Receiver<SseEvent>` into `Emitter::emit(Event::AssistantDelta(...))` calls. No LLM client code changes — they still push into the same channel type. Only the consumer side of the channel changes.
+
+---
+
+## The Engine::run Pipeline
+
+Here's the full pipeline in pseudocode, annotated with which frontend controls each behavior via `RunOptions`:
+
+```rust
+impl Engine {
+    pub async fn run(
+        &self,
+        ctx: &mut RequestContext,
+        req: RunRequest,
+        emitter: &dyn Emitter,
+    ) -> Result<RunOutcome, CoreError> {
+        let request_id = Uuid::new_v4();
+        let mut outcome = RunOutcome::new(request_id);
+
+        emitter.emit(Event::Started { request_id, session_id: ctx.session_id(), agent: ctx.agent_name() }).await?;
+
+        // 1. Execute command (if any). Commands may be LLM-triggering, mutating, or informational.
+        if let Some(command) = req.command {
+            self.dispatch_command(ctx, command, emitter, &req.options).await?;
+        }
+
+        // 2. Early return if there's no user input (pure command)
+        let Some(user_input) = req.input else {
+            emitter.emit(Event::Finished { outcome: &outcome }).await?;
+            return Ok(outcome);
+        };
+
+        // 3. Apply prelude on first turn of a fresh context (CLI/REPL only)
+        if req.options.apply_prelude && !ctx.prelude_applied {
+            apply_prelude(ctx, &req.options.cancel).await?;
+            ctx.prelude_applied = true;
+        }
+
+        // 4. Build Input from user_input + ctx
+        let input = build_input(ctx, user_input, &req.options).await?;
+
+        // 5. Wait for any in-progress compression to finish (REPL-style block)
+        while ctx.is_compressing_session() {
+            tokio::time::sleep(Duration::from_millis(100)).await;
+        }
+
+        // 6. Enter the turn loop
+        self.run_turn(ctx, input, &req.options, emitter, &mut outcome).await?;
+
+        // 7. Maybe compress session
+        if req.options.compress_session && ctx.session_needs_compression() {
+            emitter.emit(Event::SessionCompressing).await?;
+            compress_session(ctx).await?;
+            outcome.compressed = true;
+            emitter.emit(Event::SessionCompressed { tokens_saved: None }).await?;
+        }
+
+        // 8. Maybe autoname session
+        if req.options.autoname_session {
+            if let Some(name) = maybe_autoname_session(ctx).await? {
+                outcome.autonamed = Some(name.clone());
+                emitter.emit(Event::SessionAutonamed(&name)).await?;
+            }
+        }
+
+        // 9. Auto-continuation (agents only)
+        if req.options.auto_continue {
+            if let Some(continuation) = self.check_auto_continue(ctx) {
+                emitter.emit(Event::AutoContinueTriggered { .. }).await?;
+                outcome.auto_continued += 1;
+                // Recursive call with continuation prompt
+                let next_req = RunRequest {
+                    input: Some(UserInput::from_continuation(continuation)),
+                    command: None,
+                    options: req.options.clone(),
+                };
+                return Box::pin(self.run(ctx, next_req, emitter)).await;
+            }
+        }
+
+        emitter.emit(Event::Finished { outcome: &outcome }).await?;
+        Ok(outcome)
+    }
+
+    async fn run_turn(
+        &self,
+        ctx: &mut RequestContext,
+        mut input: Input,
+        options: &RunOptions,
+        emitter: &dyn Emitter,
+        outcome: &mut RunOutcome,
+    ) -> Result<(), CoreError> {
+        loop {
+            outcome.turns += 1;
+
+            before_chat_completion(ctx, &input);
+
+            let client = input.create_client(ctx)?;
+            let (output, tool_results) = if should_stream(&input, options) {
+                stream_chat_completion(ctx, &input, client, emitter, &options.cancel).await?
+            } else {
+                buffered_chat_completion(ctx, &input, client, options.extract_code, &options.cancel).await?
+            };
+
+            after_chat_completion(ctx, &input, &output, &tool_results);
+            outcome.tool_call_count += tool_results.len();
+
+            if tool_results.is_empty() {
+                outcome.final_message = Some(output);
+                return Ok(());
+            }
+
+            // Emit each tool call and result
+            for result in &tool_results {
+                emitter.emit(Event::ToolCall { .. }).await?;
+                emitter.emit(Event::ToolResult { .. }).await?;
+            }
+
+            // Loop: feed tool results back in
+            input = input.merge_tool_results(output, tool_results);
+        }
+    }
+}
+```
+
+**Key design decisions in this pipeline:**
+
+1. **Command dispatch happens first.** A `RunRequest` that carries both a command and input runs the command first (mutating `ctx`), then the input flows through the now-updated context. This lets `.role explain "tell me about X"` work as a single atomic operation — the role is activated, then the prompt is sent under the new role.
+
+2. **Tool loop is iterative, not recursive.** Today both `start_directive` and `ask` recursively call themselves after tool results. The new `run_turn` uses a `loop` instead, which is cleaner, avoids stack growth on long tool chains, and makes cancellation handling simpler. Auto-continuation remains recursive because it's a full new turn with a new prompt, not just a tool-result continuation.
+
+3. **Cancellation is checked at every await point.** `options.cancel: CancellationToken` is threaded into every async call. On cancellation, the engine emits `Event::Error(CoreError::Cancelled)` and returns. Today's `AbortSignal` pattern gets wrapped in a `CancellationToken` adapter during the migration.
+
+4. **Session state hooks fire at the same points as today.** `before_chat_completion` and `after_chat_completion` continue to exist on `RequestContext`, called from the same places in the same order. The refactor doesn't change their semantics.
+
+5. **Emitter errors don't abort the run.** If the emitter's output destination disconnects (client closes browser tab), the engine keeps running to completion so session state is correctly persisted, but it stops emitting events. The `EmitError::ClientDisconnected` case is special-cased to swallow subsequent emits. Session save + tool execution still happen.
+
+---
+
+## Migration Strategy
+
+This phase is structured as **extract, unify, rewrite frontends** — similar to Phase 1's facade pattern. The old functions stay in place until the new Engine is proven by tests and manual verification.
+
+### Step 1: Create the core types
+
+Add the new files without wiring them into anything:
+
+- `src/engine/mod.rs` — module root
+- `src/engine/engine.rs` — `Engine` struct + `run` method (initially `unimplemented!()`)
+- `src/engine/request.rs` — `RunRequest`, `UserInput`, `RunOptions`, `ContinuationKind`, `RunOutcome`
+- `src/engine/command.rs` — `CoreCommand` enum + sub-enums
+- `src/engine/error.rs` — `CoreError` enum
+- `src/engine/emitter.rs` — `Emitter` trait + `Event` enum + `EmitError`
+- `src/engine/emitters/mod.rs` — emitter module
+- `src/engine/emitters/null.rs` — `NullEmitter` (test stub)
+- `src/engine/emitters/collecting.rs` — `CollectingEmitter` (test stub)
+- `src/engine/emitters/terminal.rs` — `TerminalEmitter` (initially `unimplemented!()`)
+
+Register `pub mod engine;` in `src/main.rs`. Code compiles but nothing calls it yet.
+
+**Verification:** `cargo check` clean, `cargo test` passes.
+
+### Step 2: Implement `TerminalEmitter` against existing render code
+
+Before wiring the engine, build the `TerminalEmitter` by wrapping today's `SseHandler` + `markdown_stream` + `raw_stream` + `MarkdownRender` + `Spinner` code. Don't change any of those modules — just construct a `TerminalEmitter` that holds the state they need and forwards `emit(Event::AssistantDelta(...))` into them.
+
+```rust
+pub struct TerminalEmitter {
+    render_state: Mutex<StreamRenderState>,
+    options: TerminalEmitterOptions,
+}
+
+pub struct TerminalEmitterOptions {
+    pub highlight: bool,
+    pub theme: Option<String>,
+    pub verbose_tool_calls: bool,
+    pub show_spinner: bool,
+}
+
+impl TerminalEmitter {
+    pub fn new_from_app(app: &AppState, working_mode: WorkingMode) -> Self { /* ... */ }
+}
+```
+
+Implement `Emitter` for it, mapping each `Event` variant to the appropriate crossterm operation:
+
+| Event | TerminalEmitter action |
+|---|---|
+| `Started` | Start spinner |
+| `AssistantDelta(chunk)` | Stop spinner (if first), feed chunk into render state |
+| `AssistantMessageEnd { full_text }` | Flush render state, emit trailing newline |
+| `ToolCall { name, args }` | Print dimmed `⚙ Using <name>` banner if verbose |
+| `ToolResult { .. }` | Print dimmed result summary if verbose |
+| `AutoContinueTriggered` | Print yellow `⟳ Continuing (N/M, R todos remaining)` to stderr |
+| `SessionCompressing` | Print `Compressing session...` to stderr |
+| `SessionCompressed` | Print `Session compressed.` to stderr |
+| `SessionAutonamed` | Print `Session auto-named: <name>` to stderr |
+| `Info(msg)` | Print to stdout |
+| `Warning(msg)` | Print yellow to stderr |
+| `Error(e)` | Print red to stderr |
+| `Finished` | No-op (ensures trailing newline is flushed) |
+
+**Verification:** write integration tests that construct a `TerminalEmitter`, feed it a sequence of events manually, and compare captured stdout/stderr to golden outputs. Use `assert_cmd` or similar to snapshot the rendered output of each event variant.
+
+### Step 3: Implement `Engine::run` without wiring it
+
+Implement `Engine::run` and `Engine::run_turn` following the pseudocode above. Use the existing helper functions (`before_chat_completion`, `after_chat_completion`, `apply_prelude`, `create_client`, `call_chat_completions`, `call_chat_completions_streaming`, `maybe_compress_session`, `maybe_autoname_session`) unchanged, just called through `ctx` instead of `&GlobalConfig`.
+
+**Implementing `dispatch_command`** is the largest sub-task here because it needs to match all 37 `CoreCommand` variants and invoke the right `ctx` methods. Most variants are straightforward one-liners that call a corresponding method on `RequestContext`. A few need special handling:
+
+- `CoreCommand::UseRole { name, trailing_text }` — activate role, then if `trailing_text` is `Some`, the outer `run` will flow through with the trailing text as `UserInput.text`.
+- `CoreCommand::IncludeFiles` — reads files, converts to `FileInput` list, attaches to `ctx`'s next input (or fails if no input is provided).
+- `CoreCommand::StarterRun(id)` — looks up the starter text on the active agent, fails if no agent.
+- `CoreCommand::Macro` — delegates to `macro_execute`, which may itself call `Engine::run` internally for LLM-triggering macros.
+
+**Verification:** write unit tests for `dispatch_command` using `NullEmitter`. Each test activates a command and asserts the expected state mutation on `ctx`. This is ~37 tests, one per variant, and they catch the bulk of regressions early.
+
+Then write a handful of integration tests for `Engine::run` with `CollectingEmitter`, asserting the expected event sequence for:
+- Plain prompt, no tools, streaming
+- Plain prompt, no tools, non-streaming
+- Prompt that triggers 2 tool calls
+- Prompt that triggers auto-continuation (mock the LLM response)
+- Prompt on a session that crosses the compression threshold
+- Command-only request (`.info`)
+- Command + prompt request (`.role explain "..."`)
+
+### Step 4: Wire CLI to `Engine::run`
+
+Replace `main.rs::start_directive` with a thin wrapper:
+
+```rust
+async fn start_directive(
+    app: Arc<AppState>,
+    ctx: &mut RequestContext,
+    input_text: String,
+    files: Vec<String>,
+    code_mode: bool,
+) -> Result<()> {
+    let engine = Engine::new(app.clone());
+    let emitter = TerminalEmitter::new_from_app(&app, WorkingMode::Cmd);
+
+    let req = RunRequest {
+        input: Some(UserInput::from_text_and_files(input_text, files)),
+        command: None,
+        options: {
+            let mut o = RunOptions::cli();
+            o.extract_code = code_mode && !*IS_STDOUT_TERMINAL;
+            o
+        },
+    };
+
+    match engine.run(ctx, req, &emitter).await {
+        Ok(_outcome) => Ok(()),
+        Err(CoreError::Cancelled) => Ok(()),
+        Err(e) => Err(e.into()),
+    }
+}
+```
+
+**Verification:** manual smoke test. Run `loki "hello"`, `loki --code "write a rust hello world"`, `loki --role explain "what is TCP"`. All should produce identical output to before the change.
+
+### Step 5: Wire REPL to `Engine::run`
+
+Replace `repl/mod.rs::ask` with a wrapper that calls the engine. The REPL's outer loop that reads lines and calls `run_repl_command` stays. `run_repl_command` for non-dot-command lines constructs a `RunRequest { input: Some(...), .. }` and calls `Engine::run`. Dot-commands get parsed into `CoreCommand` and called as `RunRequest { command: Some(...), input: None, .. }` (or with input if they carry trailing text).
+
+```rust
+// In Repl:
+async fn handle_line(&mut self, line: &str) -> Result<()> {
+    let req = if let Some(rest) = line.strip_prefix('.') {
+        parse_dot_command_to_run_request(rest, &self.ctx)?
+    } else {
+        RunRequest {
+            input: Some(UserInput::from_text(line.to_string())),
+            command: None,
+            options: RunOptions::repl_turn(),
+        }
+    };
+
+    match self.engine.run(&mut self.ctx, req, &self.emitter).await {
+        Ok(_) => Ok(()),
+        Err(CoreError::Cancelled) => Ok(()),
+        Err(e) => {
+            self.emitter.emit(Event::Error(&e)).await.ok();
+            Ok(())
+        }
+    }
+}
+```
+
+**Verification:** manual smoke test of the REPL. Run through a typical session:
+1. `loki` → REPL starts
+2. `hello` → plain prompt works
+3. `.role explain` → role activates
+4. `what is TCP` → responds under the role
+5. `.session` → session starts
+6. Several messages → conversation continues
+7. `.info session` → info prints
+8. `.compress session` → compression runs
+9. `.agent sisyphus` → agent activates with sub-agents
+10. `write a hello world in rust` → tool calls + output
+11. `.exit agent` → agent exits, previous session still active
+12. `.exit` → REPL exits
+
+Every interaction should behave identically to pre-Phase-2. Any visual difference is a bug.
+
+### Step 6: Delete the old `start_directive` and `ask`
+
+Once CLI and REPL both route through `Engine::run` and all tests/smoke tests pass, delete the old function bodies. Remove any now-unused imports. Run `cargo check` and `cargo test`.
+
+**Verification:** full test suite green, no dead code warnings.
+
+### Step 7: Tidy and document
+
+- Add rustdoc comments on `Engine`, `RunRequest`, `RunOptions`, `Emitter`, `Event`, `CoreCommand`, `CoreError`.
+- Add an `examples/` subdirectory under `src/engine/` showing how to call the engine with each emitter.
+- Update `docs/AGENTS.md` with a note that CLI now supports auto-continuation (since it's no longer a REPL-only feature).
+- Update `docs/REST-API-ARCHITECTURE.md` to remove any "in Phase 2" placeholders.
+
+---
+
+## Risks and Watch Items
+
+| Risk | Severity | Mitigation |
+|---|---|---|
+| **Terminal rendering regressions** | High | Golden-file snapshot tests for every `Event` variant. Manual smoke tests across all common REPL flows. Keep `TerminalEmitter` as a thin wrapper — no logic changes in the render code itself. |
+| **Auto-continuation recursion limits** | Medium | The new `Engine::run` uses `Box::pin` for the auto-continuation recursive call. Verify with a mock LLM that `max_auto_continues = 100` doesn't blow the stack. |
+| **Cancellation during tool execution** | Medium | Tool execution currently uses `AbortSignal`; the new path uses `CancellationToken`. Write a shim that translates. Write a test that cancels mid-tool-call and verifies graceful cleanup (no orphaned subprocesses, no leaked file descriptors). |
+| **Command parsing fidelity** | Medium | The dot-command parser in today's REPL is hand-written and has edge cases. Port the parsing code verbatim into a dedicated `parse_dot_command_to_run_request` function with unit tests for every edge case found in today's code. |
+| **Macro execution recursion** | Medium | `.macro` can invoke LLM calls, which now go through `Engine::run`, which can invoke more macros. Verify there's a recursion depth limit or cycle detection; add one if missing. |
+| **Emitter error propagation** | Low | Emitter errors (ClientDisconnected) should NOT abort session save logic. Engine must continue executing after the first `EmitError::ClientDisconnected` — just stop emitting. Write a test that simulates a disconnected emitter mid-response and asserts the session is still correctly persisted. |
+| **Spinner interleaving with tool output** | Low | Today's spinner is tightly coupled to the stream handler. If the new order of operations fires a tool call before the spinner is stopped, you'll get garbled output. Test this specifically. |
+| **Feature flag: `auto_continue` in CLI** | Low | After Phase 2, CLI *could* support auto-continuation but it's not exposed. Decision: leave it off by default in `RunOptions::cli()`, add a `--auto-continue` flag in a separate follow-up if desired. Don't sneak behavior changes into this refactor. |
+
+---
+
+## What Phase 2 Does NOT Do
+
+- **No new features.** Everything that worked before works the same way after.
+- **No API server.** `JsonEmitter` and `SseEmitter` are placeholders — Phase 4 implements them.
+- **No `SessionStore` abstraction.** That's Phase 3.
+- **No `ToolScope` unification.** That landed in Phase 1 Step 6.5.
+- **No changes to LLM client code.** `call_chat_completions` and `call_chat_completions_streaming` keep their existing signatures.
+- **No MCP factory pooling.** That's Phase 5.
+- **No dot-command syntax changes.** The REPL still accepts exactly the same dot-commands; they just parse into `CoreCommand` instead of being hand-dispatched in `run_repl_command`.
+
+The sole goal of Phase 2 is: **extract the pipeline into Engine::run, route CLI and REPL through it, and prove via tests and smoke tests that nothing regressed.**
+
+---
+
+## Entry Criteria (from Phase 1)
+
+Before starting Phase 2, Phase 1 must be complete:
+
+- [ ] `GlobalConfig` type alias is removed
+- [ ] `AppState` and `RequestContext` are the only state holders
+- [ ] All 91 callsites in the original migration table have been updated
+- [ ] `cargo test` passes with no `Config`-based tests remaining
+- [ ] CLI and REPL manual smoke tests pass identically to pre-Phase-1
+
+## Exit Criteria (Phase 2 complete)
+
+- [ ] `src/engine/` module exists with Engine, Emitter, Event, CoreCommand, RunRequest, RunOutcome, CoreError
+- [ ] `TerminalEmitter` implemented and wrapping all existing render paths
+- [ ] `NullEmitter` and `CollectingEmitter` implemented
+- [ ] `start_directive` in main.rs is a thin wrapper around `Engine::run`
+- [ ] REPL's per-line handler routes through `Engine::run`
+- [ ] All 37 `CoreCommand` variants implemented with unit tests
+- [ ] Integration tests for the 7 engine scenarios listed in Step 3
+- [ ] Manual smoke tests for CLI and REPL match pre-Phase-2 behavior
+- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
+- [ ] Phase 3 (SessionStore abstraction) can begin