loki/docs/PHASE-2-IMPLEMENTATION-PLAN.md
2026-04-10 15:45:51 -06:00

# Phase 2 Implementation Plan: Engine + Emitter
## Overview
Phase 1 splits `Config` into `AppState` + `RequestContext`. Phase 2 takes the unified state and introduces the **Engine** — a single core function that replaces CLI's `start_directive()` and REPL's `ask()` — plus an **Emitter trait** that abstracts output away from direct stdout writes. After this phase, CLI and REPL both call `Engine::run()` with different `Emitter` implementations and behave identically to today. The API server in Phase 4 will plug in without touching core logic.
**Estimated effort:** ~1 week
**Risk:** Low-medium. The work is refactoring existing well-tested code paths into a shared shape. Most of the risk is in preserving exact terminal rendering behavior.
**Depends on:** Phase 1 Steps 0–10 complete (`GlobalConfig` eliminated, `RequestContext` wired through all entry points).
---
## Why Phase 2 Exists
Today's CLI and REPL have two near-identical pipelines that diverge in five specific places. The divergences are accidents of history, not intentional design:
1. **Streaming flag handling.** `start_directive` forces non-streaming when extracting code; `ask` never extracts code.
2. **Auto-continuation loop.** `ask` has complex logic for `auto_continue_count`, todo inspection, and continuation prompt injection. `start_directive` has none.
3. **Session compression.** `ask` triggers `maybe_compress_session` and awaits completion; `start_directive` never compresses.
4. **Session autoname.** `ask` calls `maybe_autoname_session` after each turn; `start_directive` doesn't.
5. **Cleanup on exit.** `start_directive` calls `exit_session()` at the end; `ask` lets the REPL loop handle it.
Four of these five divergences are bugs waiting to happen — they mean agents behave differently in CLI vs REPL mode, sessions don't get compressed in CLI even when they should, and auto-continuation is silently unavailable from the CLI. Phase 2 collapses both pipelines into one `Engine::run()` that handles all five behaviors uniformly, with per-request flags to control what's active (e.g., `auto_continue: bool` on `RunRequest`).
The Emitter trait exists to decouple the rendering pipeline from its destination. Today, streaming output is hardcoded to write to the terminal via `crossterm`. An `Emitter` implementation can also feed an axum SSE stream, collect events for a JSON response, or capture everything for a test. The Engine sends semantic events; Emitters decide how to present them.
---
## The Architecture After Phase 2
```
  ┌─────────┐       ┌─────────┐       ┌─────────┐
  │   CLI   │       │  REPL   │       │   API   │  (Phase 4)
  └────┬────┘       └────┬────┘       └────┬────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌──────────────────────────────────────────────────┐
│  Engine::run(ctx, req, emitter)                  │
│  ┌────────────────────────────────────────────┐  │
│  │ 1. Apply CoreCommand (if any)              │  │
│  │ 2. Build Input from req                    │  │
│  │ 3. apply_prelude (first turn only)         │  │
│  │ 4. before_chat_completion                  │  │
│  │ 5. Stream or buffered LLM call             │  │
│  │    ├─ emit Started                         │  │
│  │    ├─ emit AssistantDelta (per chunk)      │  │
│  │    ├─ emit ToolCall                        │  │
│  │    ├─ execute tool                         │  │
│  │    ├─ emit ToolResult                      │  │
│  │    └─ loop on tool results                 │  │
│  │ 6. after_chat_completion                   │  │
│  │ 7. maybe_compress_session                  │  │
│  │ 8. maybe_autoname_session                  │  │
│  │ 9. Auto-continuation (if applicable)       │  │
│  │ 10. emit Finished                          │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘
       │                 │                 │
       ▼                 ▼                 ▼
TerminalEmitter   TerminalEmitter   JsonEmitter / SseEmitter
```
---
## Core Types
### `Engine`
```rust
pub struct Engine {
    pub app: Arc<AppState>,
}

impl Engine {
    pub fn new(app: Arc<AppState>) -> Self { Self { app } }

    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError>;
}
```
`Engine` is intentionally a thin wrapper around `Arc<AppState>`. All per-turn state lives on `RequestContext`, so the engine itself has no per-call fields. This makes it cheap to clone and makes `Engine::run` trivially testable.
### `RunRequest`
```rust
pub struct RunRequest {
    pub input: Option<UserInput>,
    pub command: Option<CoreCommand>,
    pub options: RunOptions,
}

pub struct UserInput {
    pub text: String,
    pub files: Vec<FileInput>,
    pub media: Vec<MediaInput>,
    pub continuation: Option<ContinuationKind>,
}

pub enum ContinuationKind {
    Continue,
    Regenerate,
}

pub struct RunOptions {
    pub stream: Option<bool>,
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
    pub apply_prelude: bool,
    pub with_embeddings: bool,
    pub cancel: CancellationToken,
}

impl RunOptions {
    pub fn cli() -> Self { /* today's start_directive defaults */ }
    pub fn repl_turn() -> Self { /* today's ask defaults */ }
    pub fn api_oneshot() -> Self { /* API one-shot defaults */ }
    pub fn api_session() -> Self { /* API session defaults */ }
}
```
Two things to notice:
1. **`input` is `Option`.** A `RunRequest` can carry just a `command` (e.g., `.role explain`) with no user text, just an input (a plain prompt), or both (the `.role <name> <text>` form that activates a role and immediately sends a prompt through it). The engine handles all three shapes with one code path.
2. **`RunOptions` is the knob panel that replaces the five divergences.** CLI today has `auto_continue: false, compress_session: false, autoname_session: false`; REPL has all three `true`. Phase 2 exposes these as explicit options with factory constructors for each frontend's conventional defaults. This also means you can now run a CLI one-shot with auto-continuation by constructing `RunOptions::cli()` and flipping `auto_continue = true` — a capability that doesn't exist today.
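As a rough illustration of the factory constructors, the sketch below models only the three flags that differ between CLI and REPL today. It is a simplified stand-in, not the real struct: `cancel`, `apply_prelude`, and `with_embeddings` are left out, the `stream: None` default is an assumption, and `cli_with_auto_continue` is a hypothetical helper added for the example.

```rust
/// Simplified stand-in for the plan's `RunOptions` (several fields omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct RunOptions {
    pub stream: Option<bool>,
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
}

impl RunOptions {
    /// Mirrors today's `start_directive`: none of the REPL-only behaviors run.
    pub fn cli() -> Self {
        Self {
            stream: None, // assumption: defer to the configured default
            extract_code: false,
            auto_continue: false,
            compress_session: false,
            autoname_session: false,
        }
    }

    /// Mirrors today's `ask`: all three REPL-only behaviors are enabled.
    pub fn repl_turn() -> Self {
        Self {
            auto_continue: true,
            compress_session: true,
            autoname_session: true,
            ..Self::cli()
        }
    }
}

/// Hypothetical helper: a CLI one-shot that opts in to auto-continuation.
pub fn cli_with_auto_continue() -> RunOptions {
    RunOptions {
        auto_continue: true,
        ..RunOptions::cli()
    }
}
```

Struct-update syntax keeps each preset a small delta over `cli()`, so a new flag only needs a default in one place.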
### `CoreCommand`
```rust
pub enum CoreCommand {
    // State setters
    SetModel(String),
    UsePrompt(String),
    UseRole { name: String, trailing_text: Option<String> },
    UseSession(Option<String>),
    UseAgent { name: String, session: Option<String>, variables: Vec<(String, String)> },
    UseRag(Option<String>),

    // Exit commands
    ExitRole,
    ExitSession,
    ExitRag,
    ExitAgent,

    // State queries
    Info(InfoScope),
    RagSources,

    // Config mutation
    Set { key: String, value: String },

    // Session actions
    CompressSession,
    EmptySession,
    SaveSession { name: Option<String> },
    EditSession,

    // Role actions
    SaveRole { name: Option<String> },
    EditRole,

    // RAG actions
    EditRagDocs,
    RebuildRag,

    // Agent actions
    EditAgentConfig,
    ClearTodo,
    StarterList,
    StarterRun(usize),

    // File input shortcut
    IncludeFiles { paths: Vec<String>, trailing_text: Option<String> },

    // Macro execution
    Macro { name: String, args: Vec<String> },

    // Vault
    VaultAdd(String),
    VaultGet(String),
    VaultUpdate(String),
    VaultDelete(String),
    VaultList,

    // Miscellaneous
    EditConfig,
    Authenticate,
    Delete(DeleteKind),
    Copy,
    Help,
}

pub enum InfoScope {
    System,
    Role,
    Session,
    Rag,
    Agent,
}

pub enum DeleteKind {
    Role(String),
    Session(String),
    Rag(String),
    Macro(String),
    AgentData(String),
}
```
This enum captures all 37 dot-commands identified during exploration. Three categories deserve special attention:
- **LLM-triggering commands** (`UsePrompt`, `UseRole` with trailing_text, `IncludeFiles` with trailing_text, `StarterRun`, `Macro` that contains LLM calls, and the continuation variants `Continue`/`Regenerate` expressed via `UserInput.continuation`) — these don't just mutate state; they produce a full run through the LLM pipeline. The engine treats them as `RunRequest { command: Some(_), input: Some(_), .. }` — command runs first, then input flows through.
- **Asynchronous commands that return immediately** (`EditConfig`, `EditRole`, `EditRagDocs`, `EditAgentConfig`, most `Vault*`, `Delete`) — these are side-effecting but don't produce an LLM interaction. The engine handles them, reports the result through the emitter (as an `Info` or `Error` event, since `Event` has no dedicated result variant), and returns without invoking the LLM path.
- **Context-dependent commands** (`ClearTodo`, `StarterList`, `StarterRun`, `EditAgentConfig`, etc.) — these require a specific scope (e.g., active agent). The engine validates the precondition before executing and returns a `CoreError::InvalidState { expected: "active agent", .. }` (with `found` describing the actual state) if the precondition fails.
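The precondition check for context-dependent commands could look roughly like the sketch below. The `RequestContext` here is a hypothetical two-field slice of the real type, `require_agent` and `dispatch` are illustrative names, and only two command variants are modelled.

```rust
#[derive(Debug, PartialEq)]
pub enum CoreError {
    InvalidState { expected: String, found: String },
}

/// Hypothetical slice of `RequestContext`: just what this sketch needs.
pub struct RequestContext {
    pub agent: Option<String>,
    pub todos: Vec<String>,
}

#[derive(Debug)]
pub enum CoreCommand {
    ClearTodo,
    StarterList,
}

/// Returns the active agent's name, or `InvalidState` if none is active.
fn require_agent(ctx: &RequestContext) -> Result<&str, CoreError> {
    ctx.agent.as_deref().ok_or_else(|| CoreError::InvalidState {
        expected: "active agent".to_string(),
        found: "no active agent".to_string(),
    })
}

/// Validates the precondition first, then performs the state mutation.
pub fn dispatch(ctx: &mut RequestContext, cmd: CoreCommand) -> Result<(), CoreError> {
    match cmd {
        CoreCommand::ClearTodo => {
            require_agent(ctx)?;
            ctx.todos.clear();
            Ok(())
        }
        CoreCommand::StarterList => {
            // A real implementation would emit the starter list here.
            require_agent(ctx)?;
            Ok(())
        }
    }
}
```

Because the check runs before any mutation, a failed precondition leaves `ctx` untouched.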
### `Emitter` trait and `Event` enum
```rust
#[async_trait]
pub trait Emitter: Send + Sync {
    async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}

pub enum Event<'a> {
    // Lifecycle
    Started { request_id: Uuid, session_id: Option<SessionId>, agent: Option<&'a str> },
    Finished { outcome: &'a RunOutcome },

    // Assistant output
    AssistantDelta(&'a str),
    AssistantMessageEnd { full_text: &'a str },

    // Tool calls
    ToolCall { id: &'a str, name: &'a str, args: &'a str },
    ToolResult { id: &'a str, name: &'a str, result: &'a str, is_error: bool },

    // Auto-continuation
    AutoContinueTriggered { count: usize, max: usize, remaining_todos: usize },

    // Session lifecycle signals
    SessionCompressing,
    SessionCompressed { tokens_saved: Option<usize> },
    SessionAutonamed(&'a str),

    // Informational
    Info(&'a str),
    Warning(&'a str),

    // Errors
    Error(&'a CoreError),
}

pub enum EmitError {
    ClientDisconnected,
    WriteFailed(std::io::Error),
}
```
Three implementations ship in Phase 2; two are stubs, one is real:
- **`TerminalEmitter`** (real) — wraps today's `SseHandler` → `markdown_stream`/`raw_stream` path. This is the bulk of Phase 2's work; see "Terminal rendering details" below.
- **`NullEmitter`** (stub, for tests) — drops all events on the floor.
- **`CollectingEmitter`** (stub, for tests and future JSON API) — appends events to a `Vec<OwnedEvent>` for later inspection.
The `JsonEmitter` and `SseEmitter` implementations land in **Phase 4** when the API server comes online.
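To make the two stub emitters concrete, here is a minimal std-only sketch. It is deliberately not the plan's API: `emit` is synchronous instead of `async`, `Event` is trimmed to three variants, and the field layout of `OwnedEvent` is an assumption (the plan names the type but does not define it).

```rust
use std::sync::Mutex;

/// Trimmed-down borrowed event; the plan's `Event<'a>` has more variants.
#[derive(Debug)]
pub enum Event<'a> {
    AssistantDelta(&'a str),
    ToolCall { name: &'a str, args: &'a str },
    Info(&'a str),
}

/// Owned mirror of `Event`, so collected events outlive the borrowed data.
#[derive(Debug, Clone, PartialEq)]
pub enum OwnedEvent {
    AssistantDelta(String),
    ToolCall { name: String, args: String },
    Info(String),
}

impl<'a> From<Event<'a>> for OwnedEvent {
    fn from(event: Event<'a>) -> Self {
        match event {
            Event::AssistantDelta(t) => OwnedEvent::AssistantDelta(t.to_string()),
            Event::ToolCall { name, args } => OwnedEvent::ToolCall {
                name: name.to_string(),
                args: args.to_string(),
            },
            Event::Info(m) => OwnedEvent::Info(m.to_string()),
        }
    }
}

#[derive(Debug)]
pub enum EmitError {
    ClientDisconnected,
}

/// Synchronous stand-in for the async `Emitter` trait.
pub trait Emitter: Send + Sync {
    fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}

/// Drops every event.
pub struct NullEmitter;

impl Emitter for NullEmitter {
    fn emit(&self, _event: Event<'_>) -> Result<(), EmitError> {
        Ok(())
    }
}

/// Appends every event to an in-memory list for later inspection.
#[derive(Default)]
pub struct CollectingEmitter {
    events: Mutex<Vec<OwnedEvent>>,
}

impl CollectingEmitter {
    /// Returns a snapshot of everything emitted so far.
    pub fn events(&self) -> Vec<OwnedEvent> {
        self.events.lock().unwrap().clone()
    }
}

impl Emitter for CollectingEmitter {
    fn emit(&self, event: Event<'_>) -> Result<(), EmitError> {
        self.events.lock().unwrap().push(OwnedEvent::from(event));
        Ok(())
    }
}
```

The `Mutex` is what lets `CollectingEmitter` mutate through `&self` while still satisfying the `Send + Sync` bound.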
### `RunOutcome`
```rust
pub struct RunOutcome {
    pub request_id: Uuid,
    pub session_id: Option<SessionId>,
    pub final_message: Option<String>,
    pub tool_call_count: usize,
    pub turns: usize,
    pub compressed: bool,
    pub autonamed: Option<String>,
    pub auto_continued: usize,
}
```
`RunOutcome` is what CLI/REPL ignore but the future API returns as JSON. It records everything the caller might want to know about what happened during the run.
### `CoreError`
```rust
pub enum CoreError {
    InvalidRequest { msg: String },
    InvalidState { expected: String, found: String },
    NotFound { what: String, name: String },
    Cancelled,
    ProviderError { provider: String, msg: String },
    ToolError { tool: String, msg: String },
    EmitterError(EmitError),
    Io(std::io::Error),
    Other(anyhow::Error),
}

impl CoreError {
    pub fn is_retryable(&self) -> bool { /* ... */ }
    pub fn http_status(&self) -> u16 { /* for future API use */ }
    pub fn terminal_message(&self) -> String { /* for TerminalEmitter */ }
}
```
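The plan leaves the bodies of the `CoreError` helpers open. One possible shape is sketched below. The retry policy and the status-code mapping are assumptions made for illustration, not decisions taken by this plan, and the `EmitterError`, `Io`, and `Other` variants are omitted to keep the sketch std-only.

```rust
#[derive(Debug)]
pub enum CoreError {
    InvalidRequest { msg: String },
    InvalidState { expected: String, found: String },
    NotFound { what: String, name: String },
    Cancelled,
    ProviderError { provider: String, msg: String },
    ToolError { tool: String, msg: String },
}

impl CoreError {
    /// Assumption: only upstream provider failures are worth retrying.
    pub fn is_retryable(&self) -> bool {
        matches!(self, CoreError::ProviderError { .. })
    }

    /// Assumed mapping for a future HTTP API.
    pub fn http_status(&self) -> u16 {
        match self {
            CoreError::InvalidRequest { .. } => 400,
            CoreError::InvalidState { .. } => 409,
            CoreError::NotFound { .. } => 404,
            // 499 is non-standard (nginx's "client closed request").
            CoreError::Cancelled => 499,
            CoreError::ProviderError { .. } => 502,
            CoreError::ToolError { .. } => 500,
        }
    }

    /// One-line message suitable for printing to stderr.
    pub fn terminal_message(&self) -> String {
        match self {
            CoreError::InvalidRequest { msg } => format!("invalid request: {}", msg),
            CoreError::InvalidState { expected, found } => {
                format!("expected {}, found {}", expected, found)
            }
            CoreError::NotFound { what, name } => format!("{} '{}' not found", what, name),
            CoreError::Cancelled => "cancelled".to_string(),
            CoreError::ProviderError { provider, msg } => format!("{}: {}", provider, msg),
            CoreError::ToolError { tool, msg } => format!("tool {} failed: {}", tool, msg),
        }
    }
}
```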
---
## Terminal Rendering Details
The `TerminalEmitter` is the most delicate part of Phase 2 because it has to preserve every pixel of today's REPL/CLI behavior. Here's the mental model:
**Today's flow:**
```
LLM client → mpsc::Sender<SseEvent> → SseHandler → render_stream
                                                     ├─ markdown_stream (if highlight)
                                                     └─ raw_stream (else)
```
Both `markdown_stream` and `raw_stream` write directly to stdout via `crossterm`, managing cursor positions, line clears, and incremental markdown parsing themselves.
**Target flow:**
```
LLM client → mpsc::Sender<SseEvent> → SseHandler → TerminalEmitter::emit(Event::AssistantDelta)
                                                     ├─ (internal) markdown_stream state machine
                                                     └─ (internal) raw_stream state machine
```
The `TerminalEmitter` owns a `Mutex<StreamRenderState>` that wraps the existing `markdown_stream`/`raw_stream` state (a `RefCell` is not an option because the `Emitter` trait requires `Send + Sync`). Each `emit(AssistantDelta)` call feeds the chunk into this state machine exactly as `SseHandler`'s receive loop does today. The result is that the exact same crossterm calls happen in the exact same order; we've just moved them behind a trait.
**Things that migrate 1:1 into `TerminalEmitter`:**
- Spinner start/stop on first delta
- Cursor positioning for line reprint during code block growth
- Syntax highlighting invocation via `MarkdownRender`
- Color/dim output for tool call banners
- Final newline + cursor reset on `AssistantMessageEnd`
**Things that the engine handles, not the emitter:**
- Tool call *execution* (still lives in the engine loop)
- Session state mutations (engine calls `before_chat_completion` / `after_chat_completion` on `RequestContext`)
- Auto-continuation decisions (engine inspects agent runtime)
- Compression and autoname decisions (engine)
**Things the emitter decides, not the engine:**
- Whether to suppress ToolCall rendering (sub-agents in today's code suppress their own output; TerminalEmitter respects a `verbose: bool` flag)
- How to format errors (TerminalEmitter uses colored stderr; JsonEmitter will use structured JSON)
- Whether to show a spinner at all (disabled for non-TTY output)
**One gotcha:** today's `SseHandler` itself produces the `mpsc` channel that LLM clients push into. In the new model, `SseHandler` becomes an internal helper inside the engine's streaming path that converts `mpsc::Receiver<SseEvent>` into `Emitter::emit(Event::AssistantDelta(...))` calls. No LLM client code changes — they still push into the same channel type. Only the consumer side of the channel changes.
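The consumer-side change can be sketched with a plain `std::sync::mpsc` channel. Everything here is illustrative: the real code uses an async channel, the `SseEvent` shape (`Text`/`Done`) is assumed, `forward_stream` is a hypothetical name, and `Recorder` exists only to make the example observable.

```rust
use std::cell::RefCell;
use std::sync::mpsc::Receiver;

/// Assumed shape of the channel payload; the real `SseEvent` may differ.
#[derive(Debug)]
pub enum SseEvent {
    Text(String),
    Done,
}

pub enum Event<'a> {
    AssistantDelta(&'a str),
    AssistantMessageEnd { full_text: &'a str },
}

/// Synchronous, infallible stand-in for the plan's `Emitter` trait.
pub trait Emitter {
    fn emit(&self, event: Event<'_>);
}

/// Drains the receiver, forwarding each chunk as an `AssistantDelta`, and
/// returns the accumulated text once the producer sends `Done` or hangs up.
pub fn forward_stream(rx: Receiver<SseEvent>, emitter: &dyn Emitter) -> String {
    let mut full_text = String::new();
    for event in rx {
        match event {
            SseEvent::Text(chunk) => {
                emitter.emit(Event::AssistantDelta(&chunk));
                full_text.push_str(&chunk);
            }
            SseEvent::Done => break,
        }
    }
    emitter.emit(Event::AssistantMessageEnd { full_text: &full_text });
    full_text
}

/// Illustration-only emitter that records what it was asked to render.
#[derive(Default)]
pub struct Recorder {
    pub log: RefCell<Vec<String>>,
}

impl Emitter for Recorder {
    fn emit(&self, event: Event<'_>) {
        let line = match event {
            Event::AssistantDelta(chunk) => format!("delta:{}", chunk),
            Event::AssistantMessageEnd { full_text } => format!("end:{}", full_text),
        };
        self.log.borrow_mut().push(line);
    }
}
```

The producer side is untouched: it still only sees a `Sender<SseEvent>`.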
---
## The Engine::run Pipeline
Here's the full pipeline in pseudocode, annotated with which frontend controls each behavior via `RunOptions`:
```rust
impl Engine {
    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError> {
        let request_id = Uuid::new_v4();
        let mut outcome = RunOutcome::new(request_id);
        emitter
            .emit(Event::Started {
                request_id,
                session_id: ctx.session_id(),
                agent: ctx.agent_name(),
            })
            .await?;

        // 1. Execute command (if any). Commands may be LLM-triggering, mutating, or informational.
        if let Some(command) = req.command {
            self.dispatch_command(ctx, command, emitter, &req.options).await?;
        }

        // 2. Early return if there's no user input (pure command)
        let Some(user_input) = req.input else {
            emitter.emit(Event::Finished { outcome: &outcome }).await?;
            return Ok(outcome);
        };

        // 3. Apply prelude on first turn of a fresh context (CLI/REPL only)
        if req.options.apply_prelude && !ctx.prelude_applied {
            apply_prelude(ctx, &req.options.cancel).await?;
            ctx.prelude_applied = true;
        }

        // 4. Build Input from user_input + ctx
        let input = build_input(ctx, user_input, &req.options).await?;

        // 5. Wait for any in-progress compression to finish (REPL-style block)
        while ctx.is_compressing_session() {
            tokio::time::sleep(Duration::from_millis(100)).await;
        }

        // 6. Enter the turn loop
        self.run_turn(ctx, input, &req.options, emitter, &mut outcome).await?;

        // 7. Maybe compress session
        if req.options.compress_session && ctx.session_needs_compression() {
            emitter.emit(Event::SessionCompressing).await?;
            compress_session(ctx).await?;
            outcome.compressed = true;
            emitter.emit(Event::SessionCompressed { tokens_saved: None }).await?;
        }

        // 8. Maybe autoname session
        if req.options.autoname_session {
            if let Some(name) = maybe_autoname_session(ctx).await? {
                outcome.autonamed = Some(name.clone());
                emitter.emit(Event::SessionAutonamed(&name)).await?;
            }
        }

        // 9. Auto-continuation (agents only)
        if req.options.auto_continue {
            if let Some(continuation) = self.check_auto_continue(ctx) {
                emitter.emit(Event::AutoContinueTriggered { .. }).await?;
                outcome.auto_continued += 1;
                // Recursive call with continuation prompt
                let next_req = RunRequest {
                    input: Some(UserInput::from_continuation(continuation)),
                    command: None,
                    options: req.options.clone(),
                };
                // The nested run builds its own RunOutcome, so fold this run's
                // counters into it instead of discarding them.
                let mut next = Box::pin(self.run(ctx, next_req, emitter)).await?;
                next.auto_continued += outcome.auto_continued;
                next.turns += outcome.turns;
                next.tool_call_count += outcome.tool_call_count;
                next.compressed |= outcome.compressed;
                next.autonamed = next.autonamed.or(outcome.autonamed);
                return Ok(next);
            }
        }

        emitter.emit(Event::Finished { outcome: &outcome }).await?;
        Ok(outcome)
    }

    async fn run_turn(
        &self,
        ctx: &mut RequestContext,
        mut input: Input,
        options: &RunOptions,
        emitter: &dyn Emitter,
        outcome: &mut RunOutcome,
    ) -> Result<(), CoreError> {
        loop {
            outcome.turns += 1;
            before_chat_completion(ctx, &input);

            let client = input.create_client(ctx)?;
            let (output, tool_results) = if should_stream(&input, options) {
                stream_chat_completion(ctx, &input, client, emitter, &options.cancel).await?
            } else {
                buffered_chat_completion(ctx, &input, client, options.extract_code, &options.cancel).await?
            };

            after_chat_completion(ctx, &input, &output, &tool_results);
            outcome.tool_call_count += tool_results.len();

            if tool_results.is_empty() {
                outcome.final_message = Some(output);
                return Ok(());
            }

            // Emit each tool call and result
            for result in &tool_results {
                emitter.emit(Event::ToolCall { .. }).await?;
                emitter.emit(Event::ToolResult { .. }).await?;
            }

            // Loop: feed tool results back in
            input = input.merge_tool_results(output, tool_results);
        }
    }
}
```
**Key design decisions in this pipeline:**
1. **Command dispatch happens first.** A `RunRequest` that carries both a command and input runs the command first (mutating `ctx`), then the input flows through the now-updated context. This lets `.role explain "tell me about X"` work as a single atomic operation — the role is activated, then the prompt is sent under the new role.
2. **Tool loop is iterative, not recursive.** Today both `start_directive` and `ask` recursively call themselves after tool results. The new `run_turn` uses a `loop` instead, which is cleaner, avoids stack growth on long tool chains, and makes cancellation handling simpler. Auto-continuation remains recursive because it's a full new turn with a new prompt, not just a tool-result continuation.
3. **Cancellation is checked at every await point.** `options.cancel: CancellationToken` is threaded into every async call. On cancellation, the engine emits `Event::Error(CoreError::Cancelled)` and returns. Today's `AbortSignal` pattern gets wrapped in a `CancellationToken` adapter during the migration.
4. **Session state hooks fire at the same points as today.** `before_chat_completion` and `after_chat_completion` continue to exist on `RequestContext`, called from the same places in the same order. The refactor doesn't change their semantics.
5. **Emitter errors don't abort the run.** If the emitter's output destination disconnects (client closes browser tab), the engine keeps running to completion so session state is correctly persisted, but it stops emitting events. The `EmitError::ClientDisconnected` case is special-cased to swallow subsequent emits. Session save + tool execution still happen.
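Decision 5 can be realised with a small wrapper around whatever emitter the frontend supplies, so that the `?` on each `emit` call never sees a disconnect. The sketch below assumes a synchronous `emit` over plain strings to stay std-only. `DisconnectTolerant` and the `DropsAfter` test double are hypothetical names, not part of the plan.

```rust
use std::cell::{Cell, RefCell};
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug)]
pub enum EmitError {
    ClientDisconnected,
    WriteFailed(std::io::Error),
}

/// Synchronous stand-in for the plan's `Emitter`, with events as strings.
pub trait Emitter {
    fn emit(&self, event: &str) -> Result<(), EmitError>;
}

/// After the first `ClientDisconnected`, silently drops all further events.
pub struct DisconnectTolerant<E> {
    inner: E,
    disconnected: AtomicBool,
}

impl<E: Emitter> DisconnectTolerant<E> {
    pub fn new(inner: E) -> Self {
        Self { inner, disconnected: AtomicBool::new(false) }
    }
    pub fn is_disconnected(&self) -> bool {
        self.disconnected.load(Ordering::Relaxed)
    }
    pub fn inner(&self) -> &E {
        &self.inner
    }
}

impl<E: Emitter> Emitter for DisconnectTolerant<E> {
    fn emit(&self, event: &str) -> Result<(), EmitError> {
        if self.is_disconnected() {
            return Ok(()); // destination is gone: drop the event
        }
        match self.inner.emit(event) {
            Err(EmitError::ClientDisconnected) => {
                self.disconnected.store(true, Ordering::Relaxed);
                Ok(()) // swallow, so the engine keeps running
            }
            other => other, // success and other failures pass through
        }
    }
}

/// Test double: delivers `budget` events, then reports a disconnect.
pub struct DropsAfter {
    budget: Cell<usize>,
    pub attempts: Cell<usize>,
    pub delivered: RefCell<Vec<String>>,
}

impl DropsAfter {
    pub fn new(budget: usize) -> Self {
        Self {
            budget: Cell::new(budget),
            attempts: Cell::new(0),
            delivered: RefCell::new(Vec::new()),
        }
    }
}

impl Emitter for DropsAfter {
    fn emit(&self, event: &str) -> Result<(), EmitError> {
        self.attempts.set(self.attempts.get() + 1);
        if self.budget.get() == 0 {
            return Err(EmitError::ClientDisconnected);
        }
        self.budget.set(self.budget.get() - 1);
        self.delivered.borrow_mut().push(event.to_string());
        Ok(())
    }
}
```

Only `ClientDisconnected` is swallowed. A `WriteFailed` still propagates, which seems the safer default for terminal output.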
---
## Migration Strategy
This phase is structured as **extract, unify, rewrite frontends** — similar to Phase 1's facade pattern. The old functions stay in place until the new Engine is proven by tests and manual verification.
### Step 1: Create the core types
Add the new files without wiring them into anything:
- `src/engine/mod.rs` — module root
- `src/engine/engine.rs` — `Engine` struct + `run` method (initially `unimplemented!()`)
- `src/engine/request.rs` — `RunRequest`, `UserInput`, `RunOptions`, `ContinuationKind`, `RunOutcome`
- `src/engine/command.rs` — `CoreCommand` enum + sub-enums
- `src/engine/error.rs` — `CoreError` enum
- `src/engine/emitter.rs` — `Emitter` trait + `Event` enum + `EmitError`
- `src/engine/emitters/mod.rs` — emitter module
- `src/engine/emitters/null.rs` — `NullEmitter` (test stub)
- `src/engine/emitters/collecting.rs` — `CollectingEmitter` (test stub)
- `src/engine/emitters/terminal.rs` — `TerminalEmitter` (initially `unimplemented!()`)
Register `pub mod engine;` in `src/main.rs`. Code compiles but nothing calls it yet.
**Verification:** `cargo check` clean, `cargo test` passes.
### Step 2: Implement `TerminalEmitter` against existing render code
Before wiring the engine, build the `TerminalEmitter` by wrapping today's `SseHandler` + `markdown_stream` + `raw_stream` + `MarkdownRender` + `Spinner` code. Don't change any of those modules — just construct a `TerminalEmitter` that holds the state they need and forwards `emit(Event::AssistantDelta(...))` into them.
```rust
pub struct TerminalEmitter {
    render_state: Mutex<StreamRenderState>,
    options: TerminalEmitterOptions,
}

pub struct TerminalEmitterOptions {
    pub highlight: bool,
    pub theme: Option<String>,
    pub verbose_tool_calls: bool,
    pub show_spinner: bool,
}

impl TerminalEmitter {
    pub fn new_from_app(app: &AppState, working_mode: WorkingMode) -> Self { /* ... */ }
}
```
Implement `Emitter` for it, mapping each `Event` variant to the appropriate crossterm operation:
| Event | TerminalEmitter action |
|---|---|
| `Started` | Start spinner |
| `AssistantDelta(chunk)` | Stop spinner (if first), feed chunk into render state |
| `AssistantMessageEnd { full_text }` | Flush render state, emit trailing newline |
| `ToolCall { name, args }` | Print dimmed `⚙ Using <name>` banner if verbose |
| `ToolResult { .. }` | Print dimmed result summary if verbose |
| `AutoContinueTriggered` | Print yellow `⟳ Continuing (N/M, R todos remaining)` to stderr |
| `SessionCompressing` | Print `Compressing session...` to stderr |
| `SessionCompressed` | Print `Session compressed.` to stderr |
| `SessionAutonamed` | Print `Session auto-named: <name>` to stderr |
| `Info(msg)` | Print to stdout |
| `Warning(msg)` | Print yellow to stderr |
| `Error(e)` | Print red to stderr |
| `Finished` | Flush any pending trailing newline; otherwise a no-op |
**Verification:** write integration tests that construct a `TerminalEmitter`, feed it a sequence of events manually, and compare captured stdout/stderr to golden outputs. Use `assert_cmd` or similar to snapshot the rendered output of each event variant.
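One way to make golden comparisons cheap is to have the emitter write into injected sinks rather than the process's stdout/stderr, so a test can pass `Vec<u8>` buffers. The sketch below shows the idea only: `WriterEmitter` is a hypothetical name, it covers four events, and it omits colors, the spinner, and markdown rendering, all of which the real `TerminalEmitter` must keep.

```rust
use std::io::{self, Write};
use std::sync::Mutex;

pub enum Event<'a> {
    AssistantDelta(&'a str),
    AssistantMessageEnd,
    ToolCall { name: &'a str },
    Warning(&'a str),
}

/// Emitter that renders into injected sinks instead of the real terminal.
pub struct WriterEmitter<O: Write, E: Write> {
    out: Mutex<O>,
    err: Mutex<E>,
    verbose_tool_calls: bool,
}

impl<O: Write, E: Write> WriterEmitter<O, E> {
    pub fn new(out: O, err: E, verbose_tool_calls: bool) -> Self {
        Self { out: Mutex::new(out), err: Mutex::new(err), verbose_tool_calls }
    }

    pub fn emit(&self, event: Event<'_>) -> io::Result<()> {
        match event {
            // Assistant text goes to the "stdout" sink verbatim.
            Event::AssistantDelta(chunk) => {
                self.out.lock().unwrap().write_all(chunk.as_bytes())
            }
            Event::AssistantMessageEnd => self.out.lock().unwrap().write_all(b"\n"),
            // Tool banners are printed only in verbose mode.
            Event::ToolCall { name } if self.verbose_tool_calls => {
                writeln!(self.out.lock().unwrap(), "⚙ Using {}", name)
            }
            Event::ToolCall { .. } => Ok(()),
            // Warnings go to the "stderr" sink.
            Event::Warning(msg) => writeln!(self.err.lock().unwrap(), "{}", msg),
        }
    }

    /// Consumes the emitter and returns both sinks for comparison.
    pub fn into_captured(self) -> (O, E) {
        (self.out.into_inner().unwrap(), self.err.into_inner().unwrap())
    }
}
```

A golden test then feeds a fixed event sequence and compares the two captured buffers against stored expected output.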
### Step 3: Implement `Engine::run` without wiring it
Implement `Engine::run` and `Engine::run_turn` following the pseudocode above. Use the existing helper functions (`before_chat_completion`, `after_chat_completion`, `apply_prelude`, `create_client`, `call_chat_completions`, `call_chat_completions_streaming`, `maybe_compress_session`, `maybe_autoname_session`) unchanged, just called through `ctx` instead of `&GlobalConfig`.
**Implementing `dispatch_command`** is the largest sub-task here because it needs to match all 37 `CoreCommand` variants and invoke the right `ctx` methods. Most variants are straightforward one-liners that call a corresponding method on `RequestContext`. A few need special handling:
- `CoreCommand::UseRole { name, trailing_text }` — activate role, then if `trailing_text` is `Some`, the outer `run` will flow through with the trailing text as `UserInput.text`.
- `CoreCommand::IncludeFiles` — reads files, converts to `FileInput` list, attaches to `ctx`'s next input (or fails if no input is provided).
- `CoreCommand::StarterRun(id)` — looks up the starter text on the active agent, fails if no agent.
- `CoreCommand::Macro` — delegates to `macro_execute`, which may itself call `Engine::run` internally for LLM-triggering macros.
**Verification:** write unit tests for `dispatch_command` using `NullEmitter`. Each test activates a command and asserts the expected state mutation on `ctx`. This is ~37 tests, one per variant, and they catch the bulk of regressions early.
Then write a handful of integration tests for `Engine::run` with `CollectingEmitter`, asserting the expected event sequence for:
- Plain prompt, no tools, streaming
- Plain prompt, no tools, non-streaming
- Prompt that triggers 2 tool calls
- Prompt that triggers auto-continuation (mock the LLM response)
- Prompt on a session that crosses the compression threshold
- Command-only request (`.info`)
- Command + prompt request (`.role explain "..."`)
### Step 4: Wire CLI to `Engine::run`
Replace `main.rs::start_directive` with a thin wrapper:
```rust
async fn start_directive(
    app: Arc<AppState>,
    ctx: &mut RequestContext,
    input_text: String,
    files: Vec<String>,
    code_mode: bool,
) -> Result<()> {
    let engine = Engine::new(app.clone());
    let emitter = TerminalEmitter::new_from_app(&app, WorkingMode::Cmd);

    let req = RunRequest {
        input: Some(UserInput::from_text_and_files(input_text, files)),
        command: None,
        options: {
            let mut o = RunOptions::cli();
            o.extract_code = code_mode && !*IS_STDOUT_TERMINAL;
            o
        },
    };

    match engine.run(ctx, req, &emitter).await {
        Ok(_outcome) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => Err(e.into()),
    }
}
```
**Verification:** manual smoke test. Run `loki "hello"`, `loki --code "write a rust hello world"`, `loki --role explain "what is TCP"`. All should produce identical output to before the change.
### Step 5: Wire REPL to `Engine::run`
Replace `repl/mod.rs::ask` with a wrapper that calls the engine. The REPL's outer loop that reads lines and calls `run_repl_command` stays. `run_repl_command` for non-dot-command lines constructs a `RunRequest { input: Some(...), .. }` and calls `Engine::run`. Dot-commands get parsed into `CoreCommand` and called as `RunRequest { command: Some(...), input: None, .. }` (or with input if they carry trailing text).
```rust
// In Repl:
async fn handle_line(&mut self, line: &str) -> Result<()> {
    let req = if let Some(rest) = line.strip_prefix('.') {
        parse_dot_command_to_run_request(rest, &self.ctx)?
    } else {
        RunRequest {
            input: Some(UserInput::from_text(line.to_string())),
            command: None,
            options: RunOptions::repl_turn(),
        }
    };

    match self.engine.run(&mut self.ctx, req, &self.emitter).await {
        Ok(_) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => {
            self.emitter.emit(Event::Error(&e)).await.ok();
            Ok(())
        }
    }
}
```
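A toy version of the dot-command parsing step is sketched below for four commands. It is not the real parser: `parse_line` and `ParsedLine` are hypothetical names, quoting and the many edge cases of today's hand-written parser are ignored, and copying the trailing text into `input` is an assumption about how `UseRole` hands off to the prompt path.

```rust
#[derive(Debug, PartialEq)]
pub enum InfoScope {
    System,
    Role,
    Session,
    Rag,
    Agent,
}

#[derive(Debug, PartialEq)]
pub enum CoreCommand {
    UseRole { name: String, trailing_text: Option<String> },
    Info(InfoScope),
    ExitRole,
    Help,
}

/// Hypothetical parse result: the two optional halves of a `RunRequest`.
#[derive(Debug, PartialEq)]
pub struct ParsedLine {
    pub command: Option<CoreCommand>,
    pub input: Option<String>,
}

pub fn parse_line(line: &str) -> Result<ParsedLine, String> {
    // Lines without a leading dot are plain prompts.
    let Some(rest) = line.strip_prefix('.') else {
        return Ok(ParsedLine { command: None, input: Some(line.to_string()) });
    };
    let mut words = rest.trim().splitn(2, char::is_whitespace);
    let name = words.next().unwrap_or("");
    let args = words.next().map(str::trim).filter(|s| !s.is_empty());

    let mut input = None;
    let command = match (name, args) {
        ("help", None) => CoreCommand::Help,
        ("exit", Some("role")) => CoreCommand::ExitRole,
        ("info", scope) => CoreCommand::Info(match scope {
            None => InfoScope::System,
            Some("role") => InfoScope::Role,
            Some("session") => InfoScope::Session,
            Some("rag") => InfoScope::Rag,
            Some("agent") => InfoScope::Agent,
            Some(other) => return Err(format!("unknown info scope: {}", other)),
        }),
        ("role", Some(args)) => {
            // First word is the role name; anything after it is the prompt.
            let (role, text) = match args.split_once(char::is_whitespace) {
                Some((role, text)) => (role, Some(text.trim().to_string())),
                None => (args, None),
            };
            input = text.clone();
            CoreCommand::UseRole { name: role.to_string(), trailing_text: text }
        }
        _ => return Err(format!("unknown or malformed command: .{}", rest)),
    };
    Ok(ParsedLine { command: Some(command), input })
}
```

Returning both halves lets the caller build a command-only, input-only, or combined `RunRequest` from a single parse.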
**Verification:** manual smoke test of the REPL. Run through a typical session:
1. `loki` → REPL starts
2. `hello` → plain prompt works
3. `.role explain` → role activates
4. `what is TCP` → responds under the role
5. `.session` → session starts
6. Several messages → conversation continues
7. `.info session` → info prints
8. `.compress session` → compression runs
9. `.agent sisyphus` → agent activates with sub-agents
10. `write a hello world in rust` → tool calls + output
11. `.exit agent` → agent exits, previous session still active
12. `.exit` → REPL exits
Every interaction should behave identically to pre-Phase-2. Any visual difference is a bug.
### Step 6: Delete the old `start_directive` and `ask`
Once CLI and REPL both route through `Engine::run` and all tests/smoke tests pass, delete the old function bodies. Remove any now-unused imports. Run `cargo check` and `cargo test`.
**Verification:** full test suite green, no dead code warnings.
### Step 7: Tidy and document
- Add rustdoc comments on `Engine`, `RunRequest`, `RunOptions`, `Emitter`, `Event`, `CoreCommand`, `CoreError`.
- Add an `examples/` subdirectory under `src/engine/` showing how to call the engine with each emitter.
- Update `docs/AGENTS.md` with a note that auto-continuation is no longer REPL-only at the engine level: `RunOptions::cli()` keeps it off by default, and no CLI flag exposes it yet.
- Update `docs/REST-API-ARCHITECTURE.md` to remove any "in Phase 2" placeholders.
---
## Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| **Terminal rendering regressions** | High | Golden-file snapshot tests for every `Event` variant. Manual smoke tests across all common REPL flows. Keep `TerminalEmitter` as a thin wrapper — no logic changes in the render code itself. |
| **Auto-continuation recursion limits** | Medium | The new `Engine::run` uses `Box::pin` for the auto-continuation recursive call. Verify with a mock LLM that `max_auto_continues = 100` doesn't blow the stack. |
| **Cancellation during tool execution** | Medium | Tool execution currently uses `AbortSignal`; the new path uses `CancellationToken`. Write a shim that translates. Write a test that cancels mid-tool-call and verifies graceful cleanup (no orphaned subprocesses, no leaked file descriptors). |
| **Command parsing fidelity** | Medium | The dot-command parser in today's REPL is hand-written and has edge cases. Port the parsing code verbatim into a dedicated `parse_dot_command_to_run_request` function with unit tests for every edge case found in today's code. |
| **Macro execution recursion** | Medium | `.macro` can invoke LLM calls, which now go through `Engine::run`, which can invoke more macros. Verify there's a recursion depth limit or cycle detection; add one if missing. |
| **Emitter error propagation** | Low | Emitter errors (ClientDisconnected) should NOT abort session save logic. Engine must continue executing after the first `EmitError::ClientDisconnected` — just stop emitting. Write a test that simulates a disconnected emitter mid-response and asserts the session is still correctly persisted. |
| **Spinner interleaving with tool output** | Low | Today's spinner is tightly coupled to the stream handler. If the new order of operations fires a tool call before the spinner is stopped, you'll get garbled output. Test this specifically. |
| **Feature flag: `auto_continue` in CLI** | Low | After Phase 2, CLI *could* support auto-continuation but it's not exposed. Decision: leave it off by default in `RunOptions::cli()`, add a `--auto-continue` flag in a separate follow-up if desired. Don't sneak behavior changes into this refactor. |
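For the macro recursion row above, a depth limit can be enforced with a small RAII guard. This is a sketch under stated assumptions: `MAX_MACRO_DEPTH`, `DepthGuard`, and the toy `run_macro` (where each macro simply names the next macro it calls) are all hypothetical, and the real limit and where the counter lives are open questions.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical limit; the real value would need to be chosen deliberately.
pub const MAX_MACRO_DEPTH: usize = 8;

/// Increments a shared depth counter on entry and decrements it on drop.
pub struct DepthGuard<'a> {
    depth: &'a Cell<usize>,
}

impl<'a> DepthGuard<'a> {
    pub fn enter(depth: &'a Cell<usize>, max: usize) -> Result<Self, String> {
        if depth.get() >= max {
            return Err(format!("macro recursion limit ({}) exceeded", max));
        }
        depth.set(depth.get() + 1);
        Ok(Self { depth })
    }
}

impl<'a> Drop for DepthGuard<'a> {
    fn drop(&mut self) {
        self.depth.set(self.depth.get() - 1);
    }
}

/// Toy macro runner: `calls` maps a macro to the macro it invokes next.
/// Returns how many macros ran, or an error once the limit is hit.
pub fn run_macro(
    name: &str,
    calls: &HashMap<&str, &str>,
    depth: &Cell<usize>,
) -> Result<usize, String> {
    let _guard = DepthGuard::enter(depth, MAX_MACRO_DEPTH)?;
    match calls.get(name) {
        Some(next) => run_macro(next, calls, depth).map(|n| n + 1),
        None => Ok(1),
    }
}
```

Because the guard decrements in `Drop`, the counter returns to zero on both the success and the error path.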
---
## What Phase 2 Does NOT Do
- **No new features.** Everything that worked before works the same way after.
- **No API server.** `JsonEmitter` and `SseEmitter` are placeholders — Phase 4 implements them.
- **No `SessionStore` abstraction.** That's Phase 3.
- **No `ToolScope` unification.** That landed in Phase 1 Step 6.5.
- **No changes to LLM client code.** `call_chat_completions` and `call_chat_completions_streaming` keep their existing signatures.
- **No MCP factory pooling.** That's Phase 5.
- **No dot-command syntax changes.** The REPL still accepts exactly the same dot-commands; they just parse into `CoreCommand` instead of being hand-dispatched in `run_repl_command`.
The sole goal of Phase 2 is: **extract the pipeline into Engine::run, route CLI and REPL through it, and prove via tests and smoke tests that nothing regressed.**
---
## Entry Criteria (from Phase 1)
Before starting Phase 2, Phase 1 must be complete:
- [ ] `GlobalConfig` type alias is removed
- [ ] `AppState` and `RequestContext` are the only state holders
- [ ] All 91 callsites in the original migration table have been updated
- [ ] `cargo test` passes with no `Config`-based tests remaining
- [ ] CLI and REPL manual smoke tests pass identically to pre-Phase-1
## Exit Criteria (Phase 2 complete)
- [ ] `src/engine/` module exists with Engine, Emitter, Event, CoreCommand, RunRequest, RunOutcome, CoreError
- [ ] `TerminalEmitter` implemented and wrapping all existing render paths
- [ ] `NullEmitter` and `CollectingEmitter` implemented
- [ ] `start_directive` in main.rs is a thin wrapper around `Engine::run`
- [ ] REPL's per-line handler routes through `Engine::run`
- [ ] All 37 `CoreCommand` variants implemented with unit tests
- [ ] Integration tests for the 7 engine scenarios listed in Step 3
- [ ] Manual smoke tests for CLI and REPL match pre-Phase-2 behavior
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] Phase 3 (SessionStore abstraction) can begin