# Phase 2 Implementation Plan: Engine + Emitter

## Overview

Phase 1 splits `Config` into `AppState` + `RequestContext`. Phase 2 takes the unified state and introduces the **Engine** — a single core function that replaces CLI's `start_directive()` and REPL's `ask()` — plus an **Emitter trait** that abstracts output away from direct stdout writes.

After this phase, CLI and REPL both call `Engine::run()` with different `Emitter` implementations and behave identically to today. The API server in Phase 4 will plug in without touching core logic.

**Estimated effort:** ~1 week

**Risk:** Low-medium. The work is refactoring existing, well-tested code paths into a shared shape. Most of the risk is in preserving exact terminal rendering behavior.

**Depends on:** Phase 1 Steps 0–10 complete (`GlobalConfig` eliminated, `RequestContext` wired through all entry points).

---

## Why Phase 2 Exists

Today's CLI and REPL have two near-identical pipelines that diverge in five specific places. The divergences are accidents of history, not intentional design:

1. **Streaming flag handling.** `start_directive` forces non-streaming when extracting code; `ask` never extracts code.
2. **Auto-continuation loop.** `ask` has complex logic for `auto_continue_count`, todo inspection, and continuation prompt injection. `start_directive` has none.
3. **Session compression.** `ask` triggers `maybe_compress_session` and awaits completion; `start_directive` never compresses.
4. **Session autoname.** `ask` calls `maybe_autoname_session` after each turn; `start_directive` doesn't.
5. **Cleanup on exit.** `start_directive` calls `exit_session()` at the end; `ask` lets the REPL loop handle it.

Four of these five divergences are bugs waiting to happen — they mean agents behave differently in CLI vs REPL mode, sessions don't get compressed in CLI even when they should, and auto-continuation is silently unavailable from the CLI.
Phase 2 collapses both pipelines into one `Engine::run()` that handles all five behaviors uniformly, with per-request flags to control what's active (e.g., `auto_continue: bool` on `RunRequest`).

The Emitter trait exists to decouple the rendering pipeline from its destination. Today, streaming output is hardcoded to write to the terminal via `crossterm`. An `Emitter` implementation can also feed an axum SSE stream, collect events for a JSON response, or capture everything for a test. The Engine sends semantic events; Emitters decide how to present them.

---

## The Architecture After Phase 2

```
┌─────────┐    ┌─────────┐    ┌─────────┐
│   CLI   │    │  REPL   │    │   API   │  (Phase 4)
└────┬────┘    └────┬────┘    └────┬────┘
     │              │              │
     ▼              ▼              ▼
┌────────────────────────────────────────────────┐
│         Engine::run(ctx, req, emitter)         │
│  ┌──────────────────────────────────────────┐  │
│  │ 1. Apply CoreCommand (if any)            │  │
│  │ 2. Build Input from req                  │  │
│  │ 3. apply_prelude (first turn only)       │  │
│  │ 4. before_chat_completion                │  │
│  │ 5. Stream or buffered LLM call           │  │
│  │    ├─ emit Started                       │  │
│  │    ├─ emit AssistantDelta (per chunk)    │  │
│  │    ├─ emit ToolCall                      │  │
│  │    ├─ execute tool                       │  │
│  │    ├─ emit ToolResult                    │  │
│  │    └─ loop on tool results               │  │
│  │ 6. after_chat_completion                 │  │
│  │ 7. maybe_compress_session                │  │
│  │ 8. maybe_autoname_session                │  │
│  │ 9. Auto-continuation (if applicable)     │  │
│  │ 10. emit Finished                        │  │
│  └──────────────────────────────────────────┘  │
└────────────────────────────────────────────────┘
       │                │                │
       ▼                ▼                ▼
TerminalEmitter  TerminalEmitter  JsonEmitter / SseEmitter
```

---

## Core Types

### `Engine`

```rust
pub struct Engine {
    pub app: Arc<AppState>,
}

impl Engine {
    pub fn new(app: Arc<AppState>) -> Self {
        Self { app }
    }

    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError>;
}
```

`Engine` is intentionally a thin wrapper around `Arc<AppState>`. All per-turn state lives on `RequestContext`, so the engine itself has no per-call fields.
This makes it cheap to clone and makes `Engine::run` trivially testable.

### `RunRequest`

```rust
pub struct RunRequest {
    pub input: Option<UserInput>,
    pub command: Option<CoreCommand>,
    pub options: RunOptions,
}

pub struct UserInput {
    pub text: String,
    pub files: Vec<String>,
    pub media: Vec<String>,
    pub continuation: Option<ContinuationKind>,
}

pub enum ContinuationKind {
    Continue,
    Regenerate,
}

pub struct RunOptions {
    pub stream: Option<bool>,
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
    pub apply_prelude: bool,
    pub with_embeddings: bool,
    pub cancel: CancellationToken,
}

impl RunOptions {
    pub fn cli() -> Self { /* today's start_directive defaults */ }
    pub fn repl_turn() -> Self { /* today's ask defaults */ }
    pub fn api_oneshot() -> Self { /* API one-shot defaults */ }
    pub fn api_session() -> Self { /* API session defaults */ }
}
```

Two things to notice:

1. **`input` is `Option<UserInput>`.** A `RunRequest` can carry just a `command` (e.g., `.role explain`) with no user text, just an input (a plain prompt), or both (the `.role <name> <prompt>` form that activates a role and immediately sends a prompt through it). The engine handles all three shapes with one code path.
2. **`RunOptions` is the knob panel that replaces the five divergences.** CLI today has `auto_continue: false, compress_session: false, autoname_session: false`; REPL has all three `true`. Phase 2 exposes these as explicit options with factory constructors for each frontend's conventional defaults. This also means you can now run a CLI one-shot with auto-continuation by constructing `RunOptions::cli()` and flipping `auto_continue = true` — a capability that doesn't exist today.
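To make the knob panel concrete, here is a minimal sketch of the two factory constructors, using only the defaults stated above (CLI: all session upkeep off; REPL: `auto_continue`, `compress_session`, `autoname_session` on). The private `base()` helper and the omission of the `cancel: CancellationToken` field are simplifications for this sketch, not part of the real design:

```rust
// Sketch only: `base()` is invented here, and `cancel` is omitted to keep
// the example dependency-free.
#[derive(Clone, Debug, PartialEq)]
pub struct RunOptions {
    pub stream: Option<bool>, // None = use the provider default
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
    pub apply_prelude: bool,
    pub with_embeddings: bool,
}

impl RunOptions {
    fn base() -> Self {
        Self {
            stream: None,
            extract_code: false,
            auto_continue: false,
            compress_session: false,
            autoname_session: false,
            apply_prelude: true,
            with_embeddings: false,
        }
    }

    /// Today's `start_directive` behavior: one-shot, no session upkeep.
    pub fn cli() -> Self {
        Self::base()
    }

    /// Today's `ask` behavior: full session upkeep on every turn.
    pub fn repl_turn() -> Self {
        Self {
            auto_continue: true,
            compress_session: true,
            autoname_session: true,
            ..Self::base()
        }
    }
}

fn main() {
    // The new capability: a CLI one-shot with auto-continuation flipped on.
    let mut opts = RunOptions::cli();
    opts.auto_continue = true;
    assert!(opts.auto_continue && !opts.compress_session);
}
```

Because the constructors return plain values, a frontend that wants non-default behavior just mutates the returned struct rather than needing a new constructor.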
### `CoreCommand`

```rust
pub enum CoreCommand {
    // State setters
    SetModel(String),
    UsePrompt(String),
    UseRole { name: String, trailing_text: Option<String> },
    UseSession(Option<String>),
    UseAgent { name: String, session: Option<String>, variables: Vec<(String, String)> },
    UseRag(Option<String>),

    // Exit commands
    ExitRole,
    ExitSession,
    ExitRag,
    ExitAgent,

    // State queries
    Info(InfoScope),
    RagSources,

    // Config mutation
    Set { key: String, value: String },

    // Session actions
    CompressSession,
    EmptySession,
    SaveSession { name: Option<String> },
    EditSession,

    // Role actions
    SaveRole { name: Option<String> },
    EditRole,

    // RAG actions
    EditRagDocs,
    RebuildRag,

    // Agent actions
    EditAgentConfig,
    ClearTodo,
    StarterList,
    StarterRun(usize),

    // File input shortcut
    IncludeFiles { paths: Vec<String>, trailing_text: Option<String> },

    // Macro execution
    Macro { name: String, args: Vec<String> },

    // Vault
    VaultAdd(String),
    VaultGet(String),
    VaultUpdate(String),
    VaultDelete(String),
    VaultList,

    // Miscellaneous
    EditConfig,
    Authenticate,
    Delete(DeleteKind),
    Copy,
    Help,
}

pub enum InfoScope {
    System,
    Role,
    Session,
    Rag,
    Agent,
}

pub enum DeleteKind {
    Role(String),
    Session(String),
    Rag(String),
    Macro(String),
    AgentData(String),
}
```

This enum captures all 37 dot-commands identified in the explore. Three categories deserve special attention:

- **LLM-triggering commands** (`UsePrompt`, `UseRole` with trailing text, `IncludeFiles` with trailing text, `StarterRun`, any `Macro` that contains LLM calls, and the continuation variants `Continue`/`Regenerate` expressed via `UserInput.continuation`) — these don't just mutate state; they produce a full run through the LLM pipeline. The engine treats them as `RunRequest { command: Some(_), input: Some(_), .. }` — the command runs first, then the input flows through.
- **Side-effecting commands that return immediately** (`EditConfig`, `EditRole`, `EditRagDocs`, `EditAgentConfig`, most `Vault*`, `Delete`) — these mutate state or spawn an editor but don't produce an LLM interaction. The engine handles them, emits a result event, and returns without invoking the LLM path.
- **Context-dependent commands** (`ClearTodo`, `StarterList`, `StarterRun`, `EditAgentConfig`, etc.) — these require a specific scope (e.g., an active agent). The engine validates the precondition before executing and returns a `CoreError::InvalidState { expected: "active agent" }` if the precondition fails.

### `Emitter` trait and `Event` enum

```rust
#[async_trait]
pub trait Emitter: Send + Sync {
    async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}

pub enum Event<'a> {
    // Lifecycle
    Started { request_id: Uuid, session_id: Option<&'a str>, agent: Option<&'a str> },
    Finished { outcome: &'a RunOutcome },

    // Assistant output
    AssistantDelta(&'a str),
    AssistantMessageEnd { full_text: &'a str },

    // Tool calls
    ToolCall { id: &'a str, name: &'a str, args: &'a str },
    ToolResult { id: &'a str, name: &'a str, result: &'a str, is_error: bool },

    // Auto-continuation
    AutoContinueTriggered { count: usize, max: usize, remaining_todos: usize },

    // Session lifecycle signals
    SessionCompressing,
    SessionCompressed { tokens_saved: Option<usize> },
    SessionAutonamed(&'a str),

    // Informational
    Info(&'a str),
    Warning(&'a str),

    // Errors
    Error(&'a CoreError),
}

pub enum EmitError {
    ClientDisconnected,
    WriteFailed(std::io::Error),
}
```

Three implementations ship in Phase 2; two are stubs, one is real:

- **`TerminalEmitter`** (real) — wraps today's `SseHandler` → `markdown_stream`/`raw_stream` path. This is the bulk of Phase 2's work; see "Terminal Rendering Details" below.
- **`NullEmitter`** (stub, for tests) — drops all events on the floor.
- **`CollectingEmitter`** (stub, for tests and the future JSON API) — appends events to a `Vec` for later inspection.

The `JsonEmitter` and `SseEmitter` implementations land in **Phase 4** when the API server comes online.
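The two test stubs can be sketched as follows. Two liberties keep the sketch self-contained: `emit` is shown synchronously (the real trait is async behind `#[async_trait]`), and only three `Event` variants are modeled. `OwnedEvent` is an invented name for the owned mirror a collector needs, because `Event<'a>` borrows its payloads:

```rust
use std::sync::Mutex;

// Miniature, synchronous stand-ins for the real async Emitter/Event pair.
pub enum Event<'a> {
    AssistantDelta(&'a str),
    Info(&'a str),
    Warning(&'a str),
}

// Invented for this sketch: an owned mirror of Event, storable in a Vec.
#[derive(Debug, Clone, PartialEq)]
pub enum OwnedEvent {
    AssistantDelta(String),
    Info(String),
    Warning(String),
}

pub trait Emitter: Send + Sync {
    fn emit(&self, event: Event<'_>);
}

/// Drops every event on the floor.
pub struct NullEmitter;

impl Emitter for NullEmitter {
    fn emit(&self, _event: Event<'_>) {}
}

/// Records an owned copy of every event for later inspection in tests.
#[derive(Default)]
pub struct CollectingEmitter {
    pub events: Mutex<Vec<OwnedEvent>>,
}

impl Emitter for CollectingEmitter {
    fn emit(&self, event: Event<'_>) {
        let owned = match event {
            Event::AssistantDelta(s) => OwnedEvent::AssistantDelta(s.to_string()),
            Event::Info(s) => OwnedEvent::Info(s.to_string()),
            Event::Warning(s) => OwnedEvent::Warning(s.to_string()),
        };
        self.events.lock().unwrap().push(owned);
    }
}

fn main() {
    NullEmitter.emit(Event::Info("silently dropped"));

    let emitter = CollectingEmitter::default();
    emitter.emit(Event::AssistantDelta("hello"));
    emitter.emit(Event::Info("done"));
    let events = emitter.events.lock().unwrap();
    assert_eq!(events.len(), 2);
    assert_eq!(events[0], OwnedEvent::AssistantDelta("hello".to_string()));
}
```

The borrow-to-owned conversion is the one real design question here: `CollectingEmitter` cannot store `Event<'a>` directly, so Phase 2 either adds an owned mirror enum or makes `Event` own its data.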
### `RunOutcome`

```rust
pub struct RunOutcome {
    pub request_id: Uuid,
    pub session_id: Option<String>,
    pub final_message: Option<String>,
    pub tool_call_count: usize,
    pub turns: usize,
    pub compressed: bool,
    pub autonamed: Option<String>,
    pub auto_continued: usize,
}
```

`RunOutcome` is what the CLI and REPL ignore but the future API returns as JSON. It records everything the caller might want to know about what happened during the run.

### `CoreError`

```rust
pub enum CoreError {
    InvalidRequest { msg: String },
    InvalidState { expected: String, found: String },
    NotFound { what: String, name: String },
    Cancelled,
    ProviderError { provider: String, msg: String },
    ToolError { tool: String, msg: String },
    EmitterError(EmitError),
    Io(std::io::Error),
    Other(anyhow::Error),
}

impl CoreError {
    pub fn is_retryable(&self) -> bool { /* ... */ }
    pub fn http_status(&self) -> u16 { /* for future API use */ }
    pub fn terminal_message(&self) -> String { /* for TerminalEmitter */ }
}
```

---

## Terminal Rendering Details

The `TerminalEmitter` is the most delicate part of Phase 2 because it has to preserve every pixel of today's REPL/CLI behavior. Here's the mental model:

**Today's flow:**

```
LLM client → mpsc::Sender → SseHandler → render_stream
                                           ├─ markdown_stream (if highlight)
                                           └─ raw_stream (else)
```

Both `markdown_stream` and `raw_stream` write directly to stdout via `crossterm`, managing cursor positions, line clears, and incremental markdown parsing themselves.

**Target flow:**

```
LLM client → mpsc::Sender → SseHandler → TerminalEmitter::emit(Event::AssistantDelta)
                                           ├─ (internal) markdown_stream state machine
                                           └─ (internal) raw_stream state machine
```

The `TerminalEmitter` owns a `RefCell` (or a `Mutex`, if we need `Send`) that wraps the existing `markdown_stream`/`raw_stream` state. Each `emit(AssistantDelta)` call feeds the chunk into this state machine exactly as `SseHandler`'s receive loop does today.
The result is that the exact same crossterm calls happen in the exact same order — we've just moved them behind a trait.

**Things that migrate 1:1 into `TerminalEmitter`:**

- Spinner start/stop on first delta
- Cursor positioning for line reprint during code block growth
- Syntax highlighting invocation via `MarkdownRender`
- Color/dim output for tool call banners
- Final newline + cursor reset on `AssistantMessageEnd`

**Things the engine handles, not the emitter:**

- Tool call *execution* (still lives in the engine loop)
- Session state mutations (engine calls `before_chat_completion` / `after_chat_completion` on `RequestContext`)
- Auto-continuation decisions (engine inspects agent runtime)
- Compression and autoname decisions (engine)

**Things the emitter decides, not the engine:**

- Whether to suppress `ToolCall` rendering (sub-agents in today's code suppress their own output; `TerminalEmitter` respects a `verbose: bool` flag)
- How to format errors (`TerminalEmitter` uses colored stderr; `JsonEmitter` will use structured JSON)
- Whether to show a spinner at all (disabled for non-TTY output)

**One gotcha:** today's `SseHandler` itself produces the `mpsc` channel that LLM clients push into. In the new model, `SseHandler` becomes an internal helper inside the engine's streaming path that converts the `mpsc::Receiver` into `Emitter::emit(Event::AssistantDelta(...))` calls. No LLM client code changes — they still push into the same channel type. Only the consumer side of the channel changes.

---

## The `Engine::run` Pipeline

Here's the full pipeline in pseudocode, annotated with which frontend controls each behavior via `RunOptions`:

```rust
impl Engine {
    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError> {
        let request_id = Uuid::new_v4();
        let mut outcome = RunOutcome::new(request_id);

        emitter.emit(Event::Started {
            request_id,
            session_id: ctx.session_id(),
            agent: ctx.agent_name(),
        }).await?;

        // 1. Execute command (if any). Commands may be LLM-triggering,
        //    mutating, or informational.
        if let Some(command) = req.command {
            self.dispatch_command(ctx, command, emitter, &req.options).await?;
        }

        // 2. Early return if there's no user input (pure command)
        let Some(user_input) = req.input else {
            emitter.emit(Event::Finished { outcome: &outcome }).await?;
            return Ok(outcome);
        };

        // 3. Apply prelude on first turn of a fresh context (CLI/REPL only)
        if req.options.apply_prelude && !ctx.prelude_applied {
            apply_prelude(ctx, &req.options.cancel).await?;
            ctx.prelude_applied = true;
        }

        // 4. Build Input from user_input + ctx
        let input = build_input(ctx, user_input, &req.options).await?;

        // 5. Wait for any in-progress compression to finish (REPL-style block)
        while ctx.is_compressing_session() {
            tokio::time::sleep(Duration::from_millis(100)).await;
        }

        // 6. Enter the turn loop
        self.run_turn(ctx, input, &req.options, emitter, &mut outcome).await?;

        // 7. Maybe compress session
        if req.options.compress_session && ctx.session_needs_compression() {
            emitter.emit(Event::SessionCompressing).await?;
            compress_session(ctx).await?;
            outcome.compressed = true;
            emitter.emit(Event::SessionCompressed { tokens_saved: None }).await?;
        }

        // 8. Maybe autoname session
        if req.options.autoname_session {
            if let Some(name) = maybe_autoname_session(ctx).await? {
                outcome.autonamed = Some(name.clone());
                emitter.emit(Event::SessionAutonamed(&name)).await?;
            }
        }

        // 9. Auto-continuation (agents only)
        if req.options.auto_continue {
            if let Some(continuation) = self.check_auto_continue(ctx) {
                emitter.emit(Event::AutoContinueTriggered { .. }).await?;
                outcome.auto_continued += 1;

                // Recursive call with continuation prompt
                let next_req = RunRequest {
                    input: Some(UserInput::from_continuation(continuation)),
                    command: None,
                    options: req.options.clone(),
                };
                return Box::pin(self.run(ctx, next_req, emitter)).await;
            }
        }

        emitter.emit(Event::Finished { outcome: &outcome }).await?;
        Ok(outcome)
    }

    async fn run_turn(
        &self,
        ctx: &mut RequestContext,
        mut input: Input,
        options: &RunOptions,
        emitter: &dyn Emitter,
        outcome: &mut RunOutcome,
    ) -> Result<(), CoreError> {
        loop {
            outcome.turns += 1;
            before_chat_completion(ctx, &input);

            let client = input.create_client(ctx)?;
            let (output, tool_results) = if should_stream(&input, options) {
                stream_chat_completion(ctx, &input, client, emitter, &options.cancel).await?
            } else {
                buffered_chat_completion(ctx, &input, client, options.extract_code, &options.cancel).await?
            };

            after_chat_completion(ctx, &input, &output, &tool_results);
            outcome.tool_call_count += tool_results.len();

            if tool_results.is_empty() {
                outcome.final_message = Some(output);
                return Ok(());
            }

            // Emit each tool call and result
            for result in &tool_results {
                emitter.emit(Event::ToolCall { .. }).await?;
                emitter.emit(Event::ToolResult { .. }).await?;
            }

            // Loop: feed tool results back in
            input = input.merge_tool_results(output, tool_results);
        }
    }
}
```

**Key design decisions in this pipeline:**

1. **Command dispatch happens first.** A `RunRequest` that carries both a command and input runs the command first (mutating `ctx`), then the input flows through the now-updated context. This lets `.role explain "tell me about X"` work as a single atomic operation — the role is activated, then the prompt is sent under the new role.
2. **Tool loop is iterative, not recursive.** Today both `start_directive` and `ask` recursively call themselves after tool results. The new `run_turn` uses a `loop` instead, which is cleaner, avoids stack growth on long tool chains, and makes cancellation handling simpler.
Auto-continuation remains recursive because it's a full new turn with a new prompt, not just a tool-result continuation.
3. **Cancellation is checked at every await point.** `options.cancel: CancellationToken` is threaded into every async call. On cancellation, the engine emits `Event::Error(CoreError::Cancelled)` and returns. Today's `AbortSignal` pattern gets wrapped in a `CancellationToken` adapter during the migration.
4. **Session state hooks fire at the same points as today.** `before_chat_completion` and `after_chat_completion` continue to exist on `RequestContext`, called from the same places in the same order. The refactor doesn't change their semantics.
5. **Emitter errors don't abort the run.** If the emitter's output destination disconnects (the client closes a browser tab), the engine keeps running to completion so session state is correctly persisted, but it stops emitting events. The `EmitError::ClientDisconnected` case is special-cased to swallow subsequent emits. Session save and tool execution still happen.

---

## Migration Strategy

This phase is structured as **extract, unify, rewrite frontends** — similar to Phase 1's facade pattern. The old functions stay in place until the new Engine is proven by tests and manual verification.
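Design decision 5 can be sketched as a small gate the engine consults around every emit. All names here (`EmitGate`, `absorb`) are invented for illustration; the real engine may just track a boolean:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Invented names for this sketch; only ClientDisconnected gets special treatment.
pub enum EmitError {
    ClientDisconnected,
    WriteFailed(std::io::Error),
}

/// Remembers whether the output destination has gone away, so the engine can
/// keep executing (tool calls, session save) while silently dropping emits.
pub struct EmitGate {
    disconnected: AtomicBool,
}

impl EmitGate {
    pub fn new() -> Self {
        Self { disconnected: AtomicBool::new(false) }
    }

    /// Swallows ClientDisconnected so the run continues; real write failures
    /// still propagate to the caller.
    pub fn absorb(&self, res: Result<(), EmitError>) -> Result<(), EmitError> {
        match res {
            Err(EmitError::ClientDisconnected) => {
                self.disconnected.store(true, Ordering::Relaxed);
                Ok(())
            }
            other => other,
        }
    }

    /// The engine checks this to skip emitting entirely after a disconnect.
    pub fn is_open(&self) -> bool {
        !self.disconnected.load(Ordering::Relaxed)
    }
}

fn main() {
    let gate = EmitGate::new();
    assert!(gate.is_open());
    // A disconnect is absorbed rather than aborting the run.
    assert!(gate.absorb(Err(EmitError::ClientDisconnected)).is_ok());
    assert!(!gate.is_open());
    // Success passes through unchanged.
    assert!(gate.absorb(Ok(())).is_ok());
}
```

A gate like this also gives the "Emitter error propagation" risk below an obvious test seam: feed it a disconnect mid-response and assert the run still persists the session.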
### Step 1: Create the core types

Add the new files without wiring them into anything:

- `src/engine/mod.rs` — module root
- `src/engine/engine.rs` — `Engine` struct + `run` method (initially `unimplemented!()`)
- `src/engine/request.rs` — `RunRequest`, `UserInput`, `RunOptions`, `ContinuationKind`, `RunOutcome`
- `src/engine/command.rs` — `CoreCommand` enum + sub-enums
- `src/engine/error.rs` — `CoreError` enum
- `src/engine/emitter.rs` — `Emitter` trait + `Event` enum + `EmitError`
- `src/engine/emitters/mod.rs` — emitter module
- `src/engine/emitters/null.rs` — `NullEmitter` (test stub)
- `src/engine/emitters/collecting.rs` — `CollectingEmitter` (test stub)
- `src/engine/emitters/terminal.rs` — `TerminalEmitter` (initially `unimplemented!()`)

Register `pub mod engine;` in `src/main.rs`. Code compiles but nothing calls it yet.

**Verification:** `cargo check` clean, `cargo test` passes.

### Step 2: Implement `TerminalEmitter` against existing render code

Before wiring the engine, build the `TerminalEmitter` by wrapping today's `SseHandler` + `markdown_stream` + `raw_stream` + `MarkdownRender` + `Spinner` code. Don't change any of those modules — just construct a `TerminalEmitter` that holds the state they need and forwards `emit(Event::AssistantDelta(...))` into them.

```rust
pub struct TerminalEmitter {
    render_state: Mutex<RenderState>, // placeholder name for the existing markdown_stream/raw_stream state
    options: TerminalEmitterOptions,
}

pub struct TerminalEmitterOptions {
    pub highlight: bool,
    pub theme: Option<String>,
    pub verbose_tool_calls: bool,
    pub show_spinner: bool,
}

impl TerminalEmitter {
    pub fn new_from_app(app: &AppState, working_mode: WorkingMode) -> Self { /* ... */ }
}
```

Implement `Emitter` for it, mapping each `Event` variant to the appropriate crossterm operation:

| Event | TerminalEmitter action |
|---|---|
| `Started` | Start spinner |
| `AssistantDelta(chunk)` | Stop spinner (if first), feed chunk into render state |
| `AssistantMessageEnd { full_text }` | Flush render state, emit trailing newline |
| `ToolCall { name, args }` | Print dimmed `⚙ Using <name>` banner if verbose |
| `ToolResult { .. }` | Print dimmed result summary if verbose |
| `AutoContinueTriggered` | Print yellow `⟳ Continuing (N/M, R todos remaining)` to stderr |
| `SessionCompressing` | Print `Compressing session...` to stderr |
| `SessionCompressed` | Print `Session compressed.` to stderr |
| `SessionAutonamed` | Print `Session auto-named: <name>` to stderr |
| `Info(msg)` | Print to stdout |
| `Warning(msg)` | Print yellow to stderr |
| `Error(e)` | Print red to stderr |
| `Finished` | No-op (ensures trailing newline is flushed) |

**Verification:** write integration tests that construct a `TerminalEmitter`, feed it a sequence of events manually, and compare captured stdout/stderr to golden outputs. Use `assert_cmd` or similar to snapshot the rendered output of each event variant.

### Step 3: Implement `Engine::run` without wiring it

Implement `Engine::run` and `Engine::run_turn` following the pseudocode above. Use the existing helper functions (`before_chat_completion`, `after_chat_completion`, `apply_prelude`, `create_client`, `call_chat_completions`, `call_chat_completions_streaming`, `maybe_compress_session`, `maybe_autoname_session`) unchanged, just called through `ctx` instead of `&GlobalConfig`.

**Implementing `dispatch_command`** is the largest sub-task here because it needs to match all 37 `CoreCommand` variants and invoke the right `ctx` methods. Most variants are straightforward one-liners that call a corresponding method on `RequestContext`.
A few need special handling:

- `CoreCommand::UseRole { name, trailing_text }` — activates the role; if `trailing_text` is `Some`, the outer `run` then flows through with the trailing text as `UserInput.text`.
- `CoreCommand::IncludeFiles` — reads the files, converts them to a `FileInput` list, and attaches them to `ctx`'s next input (or fails if no input is provided).
- `CoreCommand::StarterRun(id)` — looks up the starter text on the active agent; fails if no agent is active.
- `CoreCommand::Macro` — delegates to `macro_execute`, which may itself call `Engine::run` internally for LLM-triggering macros.

**Verification:** write unit tests for `dispatch_command` using `NullEmitter`. Each test activates a command and asserts the expected state mutation on `ctx`. This is ~37 tests, one per variant, and they catch the bulk of regressions early.

Then write a handful of integration tests for `Engine::run` with `CollectingEmitter`, asserting the expected event sequence for:

- Plain prompt, no tools, streaming
- Plain prompt, no tools, non-streaming
- Prompt that triggers 2 tool calls
- Prompt that triggers auto-continuation (mock the LLM response)
- Prompt on a session that crosses the compression threshold
- Command-only request (`.info`)
- Command + prompt request (`.role explain "..."`)

### Step 4: Wire CLI to `Engine::run`

Replace `main.rs::start_directive` with a thin wrapper:

```rust
async fn start_directive(
    app: Arc<AppState>,
    ctx: &mut RequestContext,
    input_text: String,
    files: Vec<String>,
    code_mode: bool,
) -> Result<()> {
    let engine = Engine::new(app.clone());
    let emitter = TerminalEmitter::new_from_app(&app, WorkingMode::Cmd);

    let req = RunRequest {
        input: Some(UserInput::from_text_and_files(input_text, files)),
        command: None,
        options: {
            let mut o = RunOptions::cli();
            o.extract_code = code_mode && !*IS_STDOUT_TERMINAL;
            o
        },
    };

    match engine.run(ctx, req, &emitter).await {
        Ok(_outcome) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => Err(e.into()),
    }
}
```

**Verification:** manual smoke test. Run `loki "hello"`, `loki --code "write a rust hello world"`, and `loki --role explain "what is TCP"`. All should produce output identical to before the change.

### Step 5: Wire REPL to `Engine::run`

Replace `repl/mod.rs::ask` with a wrapper that calls the engine. The REPL's outer loop that reads lines and calls `run_repl_command` stays. For non-dot-command lines, `run_repl_command` constructs a `RunRequest { input: Some(...), .. }` and calls `Engine::run`. Dot-commands get parsed into `CoreCommand` and submitted as `RunRequest { command: Some(...), input: None, .. }` (or with input if they carry trailing text).

```rust
// In Repl:
async fn handle_line(&mut self, line: &str) -> Result<()> {
    let req = if let Some(rest) = line.strip_prefix('.') {
        parse_dot_command_to_run_request(rest, &self.ctx)?
    } else {
        RunRequest {
            input: Some(UserInput::from_text(line.to_string())),
            command: None,
            options: RunOptions::repl_turn(),
        }
    };

    match self.engine.run(&mut self.ctx, req, &self.emitter).await {
        Ok(_) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => {
            self.emitter.emit(Event::Error(&e)).await.ok();
            Ok(())
        }
    }
}
```

**Verification:** manual smoke test of the REPL. Run through a typical session:

1. `loki` → REPL starts
2. `hello` → plain prompt works
3. `.role explain` → role activates
4. `what is TCP` → responds under the role
5. `.session` → session starts
6. Several messages → conversation continues
7. `.info session` → info prints
8. `.compress session` → compression runs
9. `.agent sisyphus` → agent activates with sub-agents
10. `write a hello world in rust` → tool calls + output
11. `.exit agent` → agent exits, previous session still active
12. `.exit` → REPL exits

Every interaction should behave identically to pre-Phase-2. Any visual difference is a bug.

### Step 6: Delete the old `start_directive` and `ask`

Once CLI and REPL both route through `Engine::run` and all tests/smoke tests pass, delete the old function bodies. Remove any now-unused imports.
Run `cargo check` and `cargo test`.

**Verification:** full test suite green, no dead-code warnings.

### Step 7: Tidy and document

- Add rustdoc comments on `Engine`, `RunRequest`, `RunOptions`, `Emitter`, `Event`, `CoreCommand`, `CoreError`.
- Add an `examples/` subdirectory under `src/engine/` showing how to call the engine with each emitter.
- Update `docs/AGENTS.md` with a note that CLI now supports auto-continuation (since it's no longer a REPL-only feature).
- Update `docs/REST-API-ARCHITECTURE.md` to remove any "in Phase 2" placeholders.

---

## Risks and Watch Items

| Risk | Severity | Mitigation |
|---|---|---|
| **Terminal rendering regressions** | High | Golden-file snapshot tests for every `Event` variant. Manual smoke tests across all common REPL flows. Keep `TerminalEmitter` as a thin wrapper — no logic changes in the render code itself. |
| **Auto-continuation recursion limits** | Medium | The new `Engine::run` uses `Box::pin` for the auto-continuation recursive call. Verify with a mock LLM that `max_auto_continues = 100` doesn't blow the stack. |
| **Cancellation during tool execution** | Medium | Tool execution currently uses `AbortSignal`; the new path uses `CancellationToken`. Write a shim that translates between them. Write a test that cancels mid-tool-call and verifies graceful cleanup (no orphaned subprocesses, no leaked file descriptors). |
| **Command parsing fidelity** | Medium | The dot-command parser in today's REPL is hand-written and has edge cases. Port the parsing code verbatim into a dedicated `parse_dot_command_to_run_request` function with unit tests for every edge case found in today's code. |
| **Macro execution recursion** | Medium | `.macro` can invoke LLM calls, which now go through `Engine::run`, which can invoke more macros. Verify there's a recursion depth limit or cycle detection; add one if missing. |
| **Emitter error propagation** | Low | Emitter errors (`ClientDisconnected`) must NOT abort session-save logic. The engine must continue executing after the first `EmitError::ClientDisconnected` — just stop emitting. Write a test that simulates a disconnected emitter mid-response and asserts the session is still correctly persisted. |
| **Spinner interleaving with tool output** | Low | Today's spinner is tightly coupled to the stream handler. If the new order of operations fires a tool call before the spinner is stopped, you'll get garbled output. Test this specifically. |
| **Feature flag: `auto_continue` in CLI** | Low | After Phase 2, CLI *could* support auto-continuation, but it's not exposed. Decision: leave it off by default in `RunOptions::cli()` and add a `--auto-continue` flag in a separate follow-up if desired. Don't sneak behavior changes into this refactor. |

---

## What Phase 2 Does NOT Do

- **No new features.** Everything that worked before works the same way after.
- **No API server.** `JsonEmitter` and `SseEmitter` are placeholders — Phase 4 implements them.
- **No `SessionStore` abstraction.** That's Phase 3.
- **No `ToolScope` unification.** That landed in Phase 1 Step 6.5.
- **No changes to LLM client code.** `call_chat_completions` and `call_chat_completions_streaming` keep their existing signatures.
- **No MCP factory pooling.** That's Phase 5.
- **No dot-command syntax changes.** The REPL still accepts exactly the same dot-commands; they just parse into `CoreCommand` instead of being hand-dispatched in `run_repl_command`.
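The parsing change in that last bullet might look like the following deliberately naive sketch. It covers three commands, models `Info` without its scope argument, and ignores quoting and the other edge cases that the real, verbatim-ported parser must handle:

```rust
// Naive illustration only; the real REPL parser is ported verbatim in Phase 2.
#[derive(Debug, PartialEq)]
pub enum CoreCommand {
    Info, // real variant carries an InfoScope; omitted here for brevity
    UseRole { name: String, trailing_text: Option<String> },
    ExitRole,
}

/// `rest` is the line with the leading '.' already stripped.
pub fn parse_dot_command(rest: &str) -> Option<CoreCommand> {
    let mut parts = rest.splitn(3, ' ');
    let cmd = parts.next()?;
    match cmd {
        "info" => Some(CoreCommand::Info),
        "role" => {
            let name = parts.next()?.to_string();
            let trailing_text = parts.next().map(str::to_string);
            Some(CoreCommand::UseRole { name, trailing_text })
        }
        // `.exit role` arrives here as cmd = "exit", next token = "role"
        "exit" => match parts.next() {
            Some("role") => Some(CoreCommand::ExitRole),
            _ => None,
        },
        _ => None,
    }
}

fn main() {
    // `.role explain what is TCP`: activate the role, then send the trailing prompt
    assert_eq!(
        parse_dot_command("role explain what is TCP"),
        Some(CoreCommand::UseRole {
            name: "explain".to_string(),
            trailing_text: Some("what is TCP".to_string()),
        })
    );
    assert_eq!(parse_dot_command("info"), Some(CoreCommand::Info));
    assert_eq!(parse_dot_command("exit role"), Some(CoreCommand::ExitRole));
}
```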
The sole goal of Phase 2 is: **extract the pipeline into `Engine::run`, route CLI and REPL through it, and prove via tests and smoke tests that nothing regressed.**

---

## Entry Criteria (from Phase 1)

Before starting Phase 2, Phase 1 must be complete:

- [ ] `GlobalConfig` type alias is removed
- [ ] `AppState` and `RequestContext` are the only state holders
- [ ] All 91 callsites in the original migration table have been updated
- [ ] `cargo test` passes with no `Config`-based tests remaining
- [ ] CLI and REPL manual smoke tests pass identically to pre-Phase-1

## Exit Criteria (Phase 2 complete)

- [ ] `src/engine/` module exists with `Engine`, `Emitter`, `Event`, `CoreCommand`, `RunRequest`, `RunOutcome`, `CoreError`
- [ ] `TerminalEmitter` implemented and wrapping all existing render paths
- [ ] `NullEmitter` and `CollectingEmitter` implemented
- [ ] `start_directive` in `main.rs` is a thin wrapper around `Engine::run`
- [ ] REPL's per-line handler routes through `Engine::run`
- [ ] All 37 `CoreCommand` variants implemented with unit tests
- [ ] Integration tests for the 7 engine scenarios listed in Step 3
- [ ] Manual smoke tests for CLI and REPL match pre-Phase-2 behavior
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] Phase 3 (`SessionStore` abstraction) can begin