2026-04-10 15:45:51 -06:00
parent ff3419a714
commit e9e6b82e24
42 changed files with 11578 additions and 358 deletions
# Phase 2 Implementation Plan: Engine + Emitter
## Overview
Phase 1 splits `Config` into `AppState` + `RequestContext`. Phase 2 takes the unified state and introduces the **Engine** — a single core function that replaces CLI's `start_directive()` and REPL's `ask()` — plus an **Emitter trait** that abstracts output away from direct stdout writes. After this phase, CLI and REPL both call `Engine::run()` with different `Emitter` implementations and behave identically to today. The API server in Phase 4 will plug in without touching core logic.
**Estimated effort:** ~1 week
**Risk:** Low-medium. The work is refactoring existing well-tested code paths into a shared shape. Most of the risk is in preserving exact terminal rendering behavior.
**Depends on:** Phase 1 Steps 0-10 complete (`GlobalConfig` eliminated, `RequestContext` wired through all entry points).
---
## Why Phase 2 Exists
Today's CLI and REPL have two near-identical pipelines that diverge in five specific places. The divergences are accidents of history, not intentional design:
1. **Streaming flag handling.** `start_directive` forces non-streaming when extracting code; `ask` never extracts code.
2. **Auto-continuation loop.** `ask` has complex logic for `auto_continue_count`, todo inspection, and continuation prompt injection. `start_directive` has none.
3. **Session compression.** `ask` triggers `maybe_compress_session` and awaits completion; `start_directive` never compresses.
4. **Session autoname.** `ask` calls `maybe_autoname_session` after each turn; `start_directive` doesn't.
5. **Cleanup on exit.** `start_directive` calls `exit_session()` at the end; `ask` lets the REPL loop handle it.
Four of these five divergences are bugs waiting to happen — they mean agents behave differently in CLI vs REPL mode, sessions don't get compressed in CLI even when they should, and auto-continuation is silently unavailable from the CLI. Phase 2 collapses both pipelines into one `Engine::run()` that handles all five behaviors uniformly, with per-request flags to control what's active (e.g., `auto_continue: bool` on `RunRequest`).
The Emitter trait exists to decouple the rendering pipeline from its destination. Today, streaming output is hardcoded to write to the terminal via `crossterm`. An `Emitter` implementation can also feed an axum SSE stream, collect events for a JSON response, or capture everything for a test. The Engine sends semantic events; Emitters decide how to present them.
---
## The Architecture After Phase 2
```
   ┌─────────┐      ┌─────────┐      ┌─────────┐
   │   CLI   │      │  REPL   │      │   API   │  (Phase 4)
   └────┬────┘      └────┬────┘      └────┬────┘
        │                │                │
        ▼                ▼                ▼
┌──────────────────────────────────────────────────┐
│          Engine::run(ctx, req, emitter)          │
│ ┌──────────────────────────────────────────────┐ │
│ │  1. Apply CoreCommand (if any)               │ │
│ │  2. Build Input from req                     │ │
│ │  3. apply_prelude (first turn only)          │ │
│ │  4. before_chat_completion                   │ │
│ │  5. Stream or buffered LLM call              │ │
│ │      ├─ emit Started                         │ │
│ │      ├─ emit AssistantDelta (per chunk)      │ │
│ │      ├─ emit ToolCall                        │ │
│ │      ├─ execute tool                         │ │
│ │      ├─ emit ToolResult                      │ │
│ │      └─ loop on tool results                 │ │
│ │  6. after_chat_completion                    │ │
│ │  7. maybe_compress_session                   │ │
│ │  8. maybe_autoname_session                   │ │
│ │  9. Auto-continuation (if applicable)        │ │
│ │ 10. emit Finished                            │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
        │                │                │
        ▼                ▼                ▼
 TerminalEmitter  TerminalEmitter  JsonEmitter / SseEmitter
```
---
## Core Types
### `Engine`
```rust
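#[derive(Clone)]
pub struct Engine {
    pub app: Arc<AppState>,
}

impl Engine {
    pub fn new(app: Arc<AppState>) -> Self { Self { app } }

    /// Single entry point shared by CLI, REPL, and (in Phase 4) the API server.
    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError> {
        // Signature only; the body is spelled out in "The Engine::run Pipeline".
        unimplemented!()
    }
}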
```
`Engine` is intentionally a thin wrapper around `Arc<AppState>`. All per-turn state lives on `RequestContext`, so the engine itself has no per-call fields. This makes it cheap to clone and makes `Engine::run` trivially testable.
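As a minimal sketch of why cloning is cheap, the following std-only example shows two `Engine` handles sharing one `AppState` allocation. The `default_model` field and the `shared_handles` helper are hypothetical and exist only for illustration; the real `AppState` is defined in Phase 1.

```rust
use std::sync::Arc;

/// Stand-in for the real `AppState`; the single field is hypothetical.
pub struct AppState {
    pub default_model: String,
}

/// Thin handle: the only field is a shared pointer, so `Clone` bumps a
/// reference count instead of duplicating any state.
#[derive(Clone)]
pub struct Engine {
    pub app: Arc<AppState>,
}

impl Engine {
    pub fn new(app: Arc<AppState>) -> Self {
        Self { app }
    }
}

/// Builds two engines over one `AppState` and reports how many handles share it.
pub fn shared_handles() -> usize {
    let app = Arc::new(AppState { default_model: "example-model".to_string() });
    let first = Engine::new(app.clone());
    let second = first.clone();
    // Both engines point at the same allocation as `app`.
    assert!(Arc::ptr_eq(&first.app, &second.app));
    Arc::strong_count(&app)
}
```

Because nothing per-call lives on the engine, a test can build one from a fixture `AppState` and reuse it across cases.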
### `RunRequest`
```rust
pub struct RunRequest {
    pub input: Option<UserInput>,
    pub command: Option<CoreCommand>,
    pub options: RunOptions,
}

pub struct UserInput {
    pub text: String,
    pub files: Vec<FileInput>,
    pub media: Vec<MediaInput>,
    pub continuation: Option<ContinuationKind>,
}

pub enum ContinuationKind {
    Continue,
    Regenerate,
}

pub struct RunOptions {
    pub stream: Option<bool>,
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
    pub apply_prelude: bool,
    pub with_embeddings: bool,
    pub cancel: CancellationToken,
}

impl RunOptions {
    pub fn cli() -> Self { /* today's start_directive defaults */ }
    pub fn repl_turn() -> Self { /* today's ask defaults */ }
    pub fn api_oneshot() -> Self { /* API one-shot defaults */ }
    pub fn api_session() -> Self { /* API session defaults */ }
}
```
Two things to notice:
1. **`input` is `Option`.** A `RunRequest` can carry just a `command` (e.g., `.role explain`) with no user text, just an input (a plain prompt), or both (the `.role <name> <text>` form that activates a role and immediately sends a prompt through it). The engine handles all three shapes with one code path.
2. **`RunOptions` is the knob panel that replaces the five divergences.** CLI today has `auto_continue: false, compress_session: false, autoname_session: false`; REPL has all three `true`. Phase 2 exposes these as explicit options with factory constructors for each frontend's conventional defaults. This also means you can now run a CLI one-shot with auto-continuation by constructing `RunOptions::cli()` and flipping `auto_continue = true` — a capability that doesn't exist today.
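The factory constructors described above could look roughly like the sketch below. It keeps only a subset of the `RunOptions` fields (`cancel` and the rest are dropped so the example compiles with the standard library alone), and it assumes `stream: None` means "defer to configuration", which the plan does not state. `cli_with_auto_continue` is a hypothetical helper.

```rust
/// Subset of `RunOptions`; the remaining fields are omitted to keep the
/// sketch dependency-free.
#[derive(Clone, Debug, PartialEq)]
pub struct RunOptions {
    pub stream: Option<bool>,
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
}

impl RunOptions {
    /// Mirrors today's `start_directive`: none of the REPL-only behaviors run.
    pub fn cli() -> Self {
        Self {
            stream: None, // assumption: `None` defers to the configured default
            extract_code: false,
            auto_continue: false,
            compress_session: false,
            autoname_session: false,
        }
    }

    /// Mirrors today's `ask`: all three session behaviors are enabled.
    pub fn repl_turn() -> Self {
        Self {
            auto_continue: true,
            compress_session: true,
            autoname_session: true,
            ..Self::cli()
        }
    }
}

/// A CLI one-shot that opts in to auto-continuation and nothing else.
pub fn cli_with_auto_continue() -> RunOptions {
    RunOptions { auto_continue: true, ..RunOptions::cli() }
}
```

Starting from a preset and overriding single fields keeps each frontend's defaults in one place while still letting a caller flip one knob.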
### `CoreCommand`
```rust
pub enum CoreCommand {
    // State setters
    SetModel(String),
    UsePrompt(String),
    UseRole { name: String, trailing_text: Option<String> },
    UseSession(Option<String>),
    UseAgent { name: String, session: Option<String>, variables: Vec<(String, String)> },
    UseRag(Option<String>),

    // Exit commands
    ExitRole,
    ExitSession,
    ExitRag,
    ExitAgent,

    // State queries
    Info(InfoScope),
    RagSources,

    // Config mutation
    Set { key: String, value: String },

    // Session actions
    CompressSession,
    EmptySession,
    SaveSession { name: Option<String> },
    EditSession,

    // Role actions
    SaveRole { name: Option<String> },
    EditRole,

    // RAG actions
    EditRagDocs,
    RebuildRag,

    // Agent actions
    EditAgentConfig,
    ClearTodo,
    StarterList,
    StarterRun(usize),

    // File input shortcut
    IncludeFiles { paths: Vec<String>, trailing_text: Option<String> },

    // Macro execution
    Macro { name: String, args: Vec<String> },

    // Vault
    VaultAdd(String),
    VaultGet(String),
    VaultUpdate(String),
    VaultDelete(String),
    VaultList,

    // Miscellaneous
    EditConfig,
    Authenticate,
    Delete(DeleteKind),
    Copy,
    Help,
}

pub enum InfoScope {
    System,
    Role,
    Session,
    Rag,
    Agent,
}

pub enum DeleteKind {
    Role(String),
    Session(String),
    Rag(String),
    Macro(String),
    AgentData(String),
}
```
This enum captures all 37 dot-commands identified in the explore. Three categories deserve special attention:
- **LLM-triggering commands** (`UsePrompt`, `UseRole` with trailing_text, `IncludeFiles` with trailing_text, `StarterRun`, `Macro` that contains LLM calls, and the continuation variants `Continue`/`Regenerate` expressed via `UserInput.continuation`) — these don't just mutate state; they produce a full run through the LLM pipeline. The engine treats them as `RunRequest { command: Some(_), input: Some(_), .. }` — command runs first, then input flows through.
- **Asynchronous commands that return immediately** (`EditConfig`, `EditRole`, `EditRagDocs`, `EditAgentConfig`, most `Vault*`, `Delete`) — these are side-effecting but don't produce an LLM interaction. The engine handles them, emits a `Result` event, and returns without invoking the LLM path.
- **Context-dependent commands** (`ClearTodo`, `StarterList`, `StarterRun`, `EditAgentConfig`, etc.) — these require a specific scope (e.g., an active agent). The engine validates the precondition before executing and, if it fails, returns `CoreError::InvalidState` with `expected` set to `"active agent"` and `found` describing the actual state.
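The precondition check for context-dependent commands might be sketched as follows. The two-field `RequestContext` and the helper names `require_agent` and `clear_todo` are assumptions made for illustration; only the `InvalidState { expected, found }` shape comes from the plan.

```rust
#[derive(Debug, PartialEq)]
pub enum CoreError {
    InvalidState { expected: String, found: String },
}

/// Hypothetical slice of `RequestContext`: the active agent and its todo list.
pub struct RequestContext {
    pub agent: Option<String>,
    pub todos: Vec<String>,
}

/// Returns the active agent's name, or `InvalidState` when no agent is active.
fn require_agent(ctx: &RequestContext) -> Result<&str, CoreError> {
    ctx.agent.as_deref().ok_or_else(|| CoreError::InvalidState {
        expected: "active agent".to_string(),
        found: "no agent".to_string(),
    })
}

/// `ClearTodo` handler: validate first, mutate only if the check passes.
pub fn clear_todo(ctx: &mut RequestContext) -> Result<(), CoreError> {
    require_agent(ctx)?;
    ctx.todos.clear();
    Ok(())
}
```

Validating before any mutation means a rejected command leaves the context untouched, which is what the per-variant unit tests can assert.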
### `Emitter` trait and `Event` enum
```rust
#[async_trait]
pub trait Emitter: Send + Sync {
    async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}

pub enum Event<'a> {
    // Lifecycle
    Started { request_id: Uuid, session_id: Option<SessionId>, agent: Option<&'a str> },
    Finished { outcome: &'a RunOutcome },

    // Assistant output
    AssistantDelta(&'a str),
    AssistantMessageEnd { full_text: &'a str },

    // Tool calls
    ToolCall { id: &'a str, name: &'a str, args: &'a str },
    ToolResult { id: &'a str, name: &'a str, result: &'a str, is_error: bool },

    // Auto-continuation
    AutoContinueTriggered { count: usize, max: usize, remaining_todos: usize },

    // Session lifecycle signals
    SessionCompressing,
    SessionCompressed { tokens_saved: Option<usize> },
    SessionAutonamed(&'a str),

    // Informational
    Info(&'a str),
    Warning(&'a str),

    // Errors
    Error(&'a CoreError),
}

pub enum EmitError {
    ClientDisconnected,
    WriteFailed(std::io::Error),
}
```
Three implementations ship in Phase 2; two are stubs, one is real:
- **`TerminalEmitter`** (real) — wraps today's `SseHandler` → `markdown_stream`/`raw_stream` path. This is the bulk of Phase 2's work; see "Terminal rendering details" below.
- **`NullEmitter`** (stub, for tests) — drops all events on the floor.
- **`CollectingEmitter`** (stub, for tests and future JSON API) — appends events to a `Vec<OwnedEvent>` for later inspection.
The `JsonEmitter` and `SseEmitter` implementations land in **Phase 4** when the API server comes online.
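A rough sketch of the two test emitters follows. To stay runnable without `async_trait` or a runtime, it uses a synchronous stand-in for the `Emitter` trait and only three `Event` variants. The plan names `OwnedEvent` but does not define it, so the layout here is a guess.

```rust
use std::sync::Mutex;

/// Subset of the borrowed `Event` enum from the plan.
pub enum Event<'a> {
    AssistantDelta(&'a str),
    ToolCall { id: &'a str, name: &'a str, args: &'a str },
    Info(&'a str),
}

/// Owned mirror of `Event`, so events can outlive the call that produced them.
#[derive(Debug, Clone, PartialEq)]
pub enum OwnedEvent {
    AssistantDelta(String),
    ToolCall { id: String, name: String, args: String },
    Info(String),
}

impl From<Event<'_>> for OwnedEvent {
    fn from(event: Event<'_>) -> Self {
        match event {
            Event::AssistantDelta(s) => OwnedEvent::AssistantDelta(s.to_owned()),
            Event::ToolCall { id, name, args } => OwnedEvent::ToolCall {
                id: id.to_owned(),
                name: name.to_owned(),
                args: args.to_owned(),
            },
            Event::Info(s) => OwnedEvent::Info(s.to_owned()),
        }
    }
}

#[derive(Debug)]
pub enum EmitError {
    ClientDisconnected,
}

/// Synchronous stand-in for the async `Emitter` trait.
pub trait Emitter: Send + Sync {
    fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}

/// Drops every event.
pub struct NullEmitter;

impl Emitter for NullEmitter {
    fn emit(&self, _event: Event<'_>) -> Result<(), EmitError> {
        Ok(())
    }
}

/// Records every event, in order, for later inspection.
#[derive(Default)]
pub struct CollectingEmitter {
    events: Mutex<Vec<OwnedEvent>>,
}

impl CollectingEmitter {
    pub fn events(&self) -> Vec<OwnedEvent> {
        self.events.lock().unwrap().clone()
    }
}

impl Emitter for CollectingEmitter {
    fn emit(&self, event: Event<'_>) -> Result<(), EmitError> {
        self.events.lock().unwrap().push(event.into());
        Ok(())
    }
}
```

The owned copy is what makes assertions possible after the run: the borrowed `Event<'a>` values are gone by the time a test inspects the result.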
### `RunOutcome`
```rust
pub struct RunOutcome {
    pub request_id: Uuid,
    pub session_id: Option<SessionId>,
    pub final_message: Option<String>,
    pub tool_call_count: usize,
    pub turns: usize,
    pub compressed: bool,
    pub autonamed: Option<String>,
    pub auto_continued: usize,
}
```
`RunOutcome` is what CLI/REPL ignore but the future API returns as JSON. It records everything the caller might want to know about what happened during the run.
### `CoreError`
```rust
pub enum CoreError {
    InvalidRequest { msg: String },
    InvalidState { expected: String, found: String },
    NotFound { what: String, name: String },
    Cancelled,
    ProviderError { provider: String, msg: String },
    ToolError { tool: String, msg: String },
    EmitterError(EmitError),
    Io(std::io::Error),
    Other(anyhow::Error),
}

impl CoreError {
    pub fn is_retryable(&self) -> bool { /* ... */ }
    pub fn http_status(&self) -> u16 { /* for future API use */ }
    pub fn terminal_message(&self) -> String { /* for TerminalEmitter */ }
}
```
---
## Terminal Rendering Details
The `TerminalEmitter` is the most delicate part of Phase 2 because it has to preserve every pixel of today's REPL/CLI behavior. Here's the mental model:
**Today's flow:**
```
LLM client → mpsc::Sender<SseEvent> → SseHandler → render_stream
                                                   ├─ markdown_stream (if highlight)
                                                   └─ raw_stream (else)
```
Both `markdown_stream` and `raw_stream` write directly to stdout via `crossterm`, managing cursor positions, line clears, and incremental markdown parsing themselves.
**Target flow:**
```
LLM client → mpsc::Sender<SseEvent> → SseHandler → TerminalEmitter::emit(Event::AssistantDelta)
                                                   ├─ (internal) markdown_stream state machine
                                                   └─ (internal) raw_stream state machine
```
The `TerminalEmitter` owns a `Mutex<StreamRenderState>` that wraps the existing `markdown_stream`/`raw_stream` state. A `RefCell` would be lighter, but `Emitter` requires `Send + Sync` and `RefCell` is not `Sync`, so the interior mutability has to come from a lock. Each `emit(AssistantDelta)` call feeds the chunk into this state machine exactly as `SseHandler`'s receive loop does today. The result is that the exact same crossterm calls happen in the exact same order; they have only moved behind a trait.
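The interior-mutability arrangement can be sketched with the standard library alone. `StreamRenderState` here is a toy stand-in that only splits text into completed lines. The real state wraps the crossterm-based renderers, and the method names `emit_delta` and `rendered_lines` are hypothetical.

```rust
use std::sync::Mutex;

/// Toy stand-in for the state `markdown_stream`/`raw_stream` keep between
/// chunks: text received since the last complete line, plus finished lines.
#[derive(Default)]
struct StreamRenderState {
    pending: String,
    rendered_lines: Vec<String>,
}

impl StreamRenderState {
    /// Feeds one chunk; every completed line moves to `rendered_lines`.
    fn feed(&mut self, chunk: &str) {
        self.pending.push_str(chunk);
        while let Some(pos) = self.pending.find('\n') {
            let line: String = self.pending.drain(..=pos).collect();
            self.rendered_lines.push(line.trim_end_matches('\n').to_string());
        }
    }
}

pub struct TerminalEmitter {
    render_state: Mutex<StreamRenderState>,
}

impl TerminalEmitter {
    pub fn new() -> Self {
        Self { render_state: Mutex::new(StreamRenderState::default()) }
    }

    /// Takes `&self`, as `Emitter::emit` does; the lock supplies the mutation.
    pub fn emit_delta(&self, chunk: &str) {
        self.render_state.lock().unwrap().feed(chunk);
    }

    pub fn rendered_lines(&self) -> Vec<String> {
        self.render_state.lock().unwrap().rendered_lines.clone()
    }
}

/// Compile-time check that the emitter can satisfy `Emitter: Send + Sync`.
fn assert_send_sync<T: Send + Sync>() {}

pub fn emitter_is_send_sync() {
    assert_send_sync::<TerminalEmitter>();
}
```

Swapping the `Mutex` for a `RefCell` makes `emitter_is_send_sync` fail to compile, which is a cheap way to pin the trait bound in a test.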
**Things that migrate 1:1 into `TerminalEmitter`:**
- Spinner start/stop on first delta
- Cursor positioning for line reprint during code block growth
- Syntax highlighting invocation via `MarkdownRender`
- Color/dim output for tool call banners
- Final newline + cursor reset on `AssistantMessageEnd`
**Things that the engine handles, not the emitter:**
- Tool call *execution* (still lives in the engine loop)
- Session state mutations (engine calls `before_chat_completion` / `after_chat_completion` on `RequestContext`)
- Auto-continuation decisions (engine inspects agent runtime)
- Compression and autoname decisions (engine)
**Things the emitter decides, not the engine:**
- Whether to suppress ToolCall rendering (sub-agents in today's code suppress their own output; TerminalEmitter respects a `verbose: bool` flag)
- How to format errors (TerminalEmitter uses colored stderr; JsonEmitter will use structured JSON)
- Whether to show a spinner at all (disabled for non-TTY output)
**One gotcha:** today's `SseHandler` itself produces the `mpsc` channel that LLM clients push into. In the new model, `SseHandler` becomes an internal helper inside the engine's streaming path that converts `mpsc::Receiver<SseEvent>` into `Emitter::emit(Event::AssistantDelta(...))` calls. No LLM client code changes — they still push into the same channel type. Only the consumer side of the channel changes.
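The consumer-side change can be illustrated with `std::sync::mpsc` standing in for the async channel. The `SseEvent` variants, the synchronous `Emitter` trait, the `forward` function, and the `Recorder` test emitter are all simplified assumptions, not the existing types.

```rust
use std::cell::RefCell;
use std::sync::mpsc::{channel, Receiver};

/// Simplified stand-in for the existing `SseEvent`; variant names are assumed.
pub enum SseEvent {
    Text(String),
    Done,
}

pub enum Event<'a> {
    AssistantDelta(&'a str),
    AssistantMessageEnd { full_text: &'a str },
}

/// Synchronous stand-in for the async `Emitter` trait.
pub trait Emitter {
    fn emit(&self, event: Event<'_>);
}

/// Consumer side of the channel: turns each chunk into an `AssistantDelta`
/// and finishes with `AssistantMessageEnd` once `Done` arrives or the
/// sender is dropped.
pub fn forward(rx: Receiver<SseEvent>, emitter: &dyn Emitter) -> String {
    let mut full_text = String::new();
    for event in rx {
        match event {
            SseEvent::Text(chunk) => {
                emitter.emit(Event::AssistantDelta(&chunk));
                full_text.push_str(&chunk);
            }
            SseEvent::Done => break,
        }
    }
    emitter.emit(Event::AssistantMessageEnd { full_text: &full_text });
    full_text
}

/// Test emitter that records a one-line description of each event.
#[derive(Default)]
pub struct Recorder {
    pub seen: RefCell<Vec<String>>,
}

impl Emitter for Recorder {
    fn emit(&self, event: Event<'_>) {
        let line = match event {
            Event::AssistantDelta(s) => format!("delta:{s}"),
            Event::AssistantMessageEnd { full_text } => format!("end:{full_text}"),
        };
        self.seen.borrow_mut().push(line);
    }
}

pub fn demo() -> (String, Vec<String>) {
    let (tx, rx) = channel();
    // Producer side is untouched: a client pushes into the same channel type.
    for chunk in ["Hel", "lo"] {
        tx.send(SseEvent::Text(chunk.to_string())).unwrap();
    }
    tx.send(SseEvent::Done).unwrap();
    let recorder = Recorder::default();
    let text = forward(rx, &recorder);
    (text, recorder.seen.into_inner())
}
```

Only `forward` knows about the emitter; the sending half of the channel is used exactly as before, which is the property the migration relies on.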
---
## The Engine::run Pipeline
Here's the full pipeline in pseudocode, annotated with which frontend controls each behavior via `RunOptions`:
```rust
impl Engine {
    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError> {
        let request_id = Uuid::new_v4();
        let mut outcome = RunOutcome::new(request_id);
        emitter.emit(Event::Started { request_id, session_id: ctx.session_id(), agent: ctx.agent_name() }).await?;

        // 1. Execute command (if any). Commands may be LLM-triggering, mutating, or informational.
        if let Some(command) = req.command {
            self.dispatch_command(ctx, command, emitter, &req.options).await?;
        }

        // 2. Early return if there's no user input (pure command)
        let Some(user_input) = req.input else {
            emitter.emit(Event::Finished { outcome: &outcome }).await?;
            return Ok(outcome);
        };

        // 3. Apply prelude on first turn of a fresh context (CLI/REPL only)
        if req.options.apply_prelude && !ctx.prelude_applied {
            apply_prelude(ctx, &req.options.cancel).await?;
            ctx.prelude_applied = true;
        }

        // 4. Build Input from user_input + ctx
        let input = build_input(ctx, user_input, &req.options).await?;

        // 5. Wait for any in-progress compression to finish (REPL-style block)
        while ctx.is_compressing_session() {
            tokio::time::sleep(Duration::from_millis(100)).await;
        }

        // 6. Enter the turn loop
        self.run_turn(ctx, input, &req.options, emitter, &mut outcome).await?;

        // 7. Maybe compress session
        if req.options.compress_session && ctx.session_needs_compression() {
            emitter.emit(Event::SessionCompressing).await?;
            compress_session(ctx).await?;
            outcome.compressed = true;
            emitter.emit(Event::SessionCompressed { tokens_saved: None }).await?;
        }

        // 8. Maybe autoname session
        if req.options.autoname_session {
            if let Some(name) = maybe_autoname_session(ctx).await? {
                outcome.autonamed = Some(name.clone());
                emitter.emit(Event::SessionAutonamed(&name)).await?;
            }
        }
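
        // 9. Auto-continuation (agents only)
        if req.options.auto_continue {
            if let Some(continuation) = self.check_auto_continue(ctx) {
                emitter.emit(Event::AutoContinueTriggered { .. }).await?;
                outcome.auto_continued += 1;
                let next_req = RunRequest {
                    input: Some(UserInput::from_continuation(continuation)),
                    command: None,
                    options: req.options.clone(),
                };
                // Recursive call with continuation prompt. The nested run starts from a
                // fresh RunOutcome, so fold this run's counters into the returned value
                // (the nested run's own Finished event still reports only its own counters).
                let mut next = Box::pin(self.run(ctx, next_req, emitter)).await?;
                next.turns += outcome.turns;
                next.tool_call_count += outcome.tool_call_count;
                next.auto_continued += outcome.auto_continued;
                next.compressed |= outcome.compressed;
                next.autonamed = next.autonamed.or(outcome.autonamed);
                return Ok(next);
            }
        }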

        emitter.emit(Event::Finished { outcome: &outcome }).await?;
        Ok(outcome)
    }

    async fn run_turn(
        &self,
        ctx: &mut RequestContext,
        mut input: Input,
        options: &RunOptions,
        emitter: &dyn Emitter,
        outcome: &mut RunOutcome,
    ) -> Result<(), CoreError> {
        loop {
            outcome.turns += 1;
            before_chat_completion(ctx, &input);

            let client = input.create_client(ctx)?;
            let (output, tool_results) = if should_stream(&input, options) {
                stream_chat_completion(ctx, &input, client, emitter, &options.cancel).await?
            } else {
                buffered_chat_completion(ctx, &input, client, options.extract_code, &options.cancel).await?
            };

            after_chat_completion(ctx, &input, &output, &tool_results);
            outcome.tool_call_count += tool_results.len();

            if tool_results.is_empty() {
                outcome.final_message = Some(output);
                return Ok(());
            }

            // Emit each tool call and result
            for result in &tool_results {
                emitter.emit(Event::ToolCall { .. }).await?;
                emitter.emit(Event::ToolResult { .. }).await?;
            }

            // Loop: feed tool results back in
            input = input.merge_tool_results(output, tool_results);
        }
    }
}
```
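The shape of the `run_turn` loop can be tried out in isolation with a mock model. In this sketch the `Reply` type and the `model` and `run_tool` closures are hypothetical stand-ins for the chat-completion call and the tool executor. Everything is synchronous so it runs without an async runtime.

```rust
/// One model response: final text plus the names of any tools it asked for.
pub struct Reply {
    pub text: String,
    pub tool_calls: Vec<String>,
}

/// Drives the model until it stops asking for tools.
/// Returns (final text, number of turns, number of tool calls).
pub fn run_turn(
    mut prompt: String,
    mut model: impl FnMut(&str) -> Reply,
    run_tool: impl Fn(&str) -> String,
) -> (String, usize, usize) {
    let (mut turns, mut tool_calls) = (0, 0);
    loop {
        turns += 1;
        let reply = model(&prompt);
        if reply.tool_calls.is_empty() {
            return (reply.text, turns, tool_calls);
        }
        tool_calls += reply.tool_calls.len();
        // Feed the results back in and go around again; stack depth stays flat
        // no matter how long the tool chain gets.
        for call in &reply.tool_calls {
            prompt.push_str(&format!("\n[{call} -> {}]", run_tool(call)));
        }
    }
}

/// Mock conversation: one tool call, then two, then a final answer.
pub fn demo() -> (String, usize, usize) {
    let mut call_no = 0;
    run_turn(
        "start".to_string(),
        |_prompt| {
            call_no += 1;
            match call_no {
                1 => Reply { text: String::new(), tool_calls: vec!["read_file".into()] },
                2 => Reply { text: String::new(), tool_calls: vec!["grep".into(), "ls".into()] },
                _ => Reply { text: "done".into(), tool_calls: vec![] },
            }
        },
        |name| format!("{name} ok"),
    )
}
```

The counters here correspond to `outcome.turns` and `outcome.tool_call_count` in the pseudocode.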
**Key design decisions in this pipeline:**
1. **Command dispatch happens first.** A `RunRequest` that carries both a command and input runs the command first (mutating `ctx`), then the input flows through the now-updated context. This lets `.role explain "tell me about X"` work as a single atomic operation — the role is activated, then the prompt is sent under the new role.
2. **Tool loop is iterative, not recursive.** Today both `start_directive` and `ask` recursively call themselves after tool results. The new `run_turn` uses a `loop` instead, which is cleaner, avoids stack growth on long tool chains, and makes cancellation handling simpler. Auto-continuation remains recursive because it's a full new turn with a new prompt, not just a tool-result continuation.
3. **Cancellation is checked at every await point.** `options.cancel: CancellationToken` is threaded into every async call. On cancellation, the engine emits `Event::Error(CoreError::Cancelled)` and returns. Today's `AbortSignal` pattern gets wrapped in a `CancellationToken` adapter during the migration.
4. **Session state hooks fire at the same points as today.** `before_chat_completion` and `after_chat_completion` continue to exist on `RequestContext`, called from the same places in the same order. The refactor doesn't change their semantics.
5. **Emitter errors don't abort the run.** If the emitter's output destination disconnects (client closes browser tab), the engine keeps running to completion so session state is correctly persisted, but it stops emitting events. The `EmitError::ClientDisconnected` case is special-cased to swallow subsequent emits, so the `emitter.emit(...).await?` calls in the pseudocode are shorthand: a disconnect must not propagate through `?`. Session save + tool execution still happen.
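One way to get the "swallow after disconnect" behavior from decision 5 is a wrapper emitter, sketched below. `DisconnectGuard` and `FlakyClient` are hypothetical names, the trait is a synchronous stand-in that takes `&str` events, and `WriteFailed` carries a `String` instead of `std::io::Error` so the error type can derive `PartialEq`.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

#[derive(Debug, PartialEq)]
pub enum EmitError {
    ClientDisconnected,
    WriteFailed(String),
}

/// Synchronous stand-in for the async `Emitter` trait.
pub trait Emitter {
    fn emit(&self, event: &str) -> Result<(), EmitError>;
}

/// Wraps any emitter. After the first `ClientDisconnected` it drops events
/// silently, so the engine's `?` never sees a disconnect.
pub struct DisconnectGuard<E> {
    inner: E,
    disconnected: AtomicBool,
}

impl<E: Emitter> DisconnectGuard<E> {
    pub fn new(inner: E) -> Self {
        Self { inner, disconnected: AtomicBool::new(false) }
    }

    pub fn is_disconnected(&self) -> bool {
        self.disconnected.load(Ordering::Relaxed)
    }
}

impl<E: Emitter> Emitter for DisconnectGuard<E> {
    fn emit(&self, event: &str) -> Result<(), EmitError> {
        if self.is_disconnected() {
            return Ok(()); // swallow everything after the first disconnect
        }
        match self.inner.emit(event) {
            Err(EmitError::ClientDisconnected) => {
                self.disconnected.store(true, Ordering::Relaxed);
                Ok(())
            }
            other => other, // success and write failures pass through unchanged
        }
    }
}

/// Test double: accepts `budget` events, then reports a disconnect.
pub struct FlakyClient {
    pub budget: usize,
    pub attempts: AtomicUsize,
}

impl Emitter for FlakyClient {
    fn emit(&self, _event: &str) -> Result<(), EmitError> {
        let n = self.attempts.fetch_add(1, Ordering::Relaxed);
        if n < self.budget { Ok(()) } else { Err(EmitError::ClientDisconnected) }
    }
}

pub fn demo() -> (bool, usize) {
    let guard = DisconnectGuard::new(FlakyClient { budget: 2, attempts: AtomicUsize::new(0) });
    for chunk in ["a", "b", "c", "d", "e"] {
        guard.emit(chunk).expect("guard never surfaces a disconnect");
    }
    (guard.is_disconnected(), guard.inner.attempts.load(Ordering::Relaxed))
}
```

Putting the policy in a wrapper keeps `Engine::run` free of disconnect checks, although the engine could equally track the flag itself.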
---
## Migration Strategy
This phase is structured as **extract, unify, rewrite frontends** — similar to Phase 1's facade pattern. The old functions stay in place until the new Engine is proven by tests and manual verification.
### Step 1: Create the core types
Add the new files without wiring them into anything:
- `src/engine/mod.rs` — module root
- `src/engine/engine.rs` — `Engine` struct + `run` method (initially `unimplemented!()`)
- `src/engine/request.rs` — `RunRequest`, `UserInput`, `RunOptions`, `ContinuationKind`, `RunOutcome`
- `src/engine/command.rs` — `CoreCommand` enum + sub-enums
- `src/engine/error.rs` — `CoreError` enum
- `src/engine/emitter.rs` — `Emitter` trait + `Event` enum + `EmitError`
- `src/engine/emitters/mod.rs` — emitter module
- `src/engine/emitters/null.rs` — `NullEmitter` (test stub)
- `src/engine/emitters/collecting.rs` — `CollectingEmitter` (test stub)
- `src/engine/emitters/terminal.rs` — `TerminalEmitter` (initially `unimplemented!()`)
Register `pub mod engine;` in `src/main.rs`. Code compiles but nothing calls it yet.
**Verification:** `cargo check` clean, `cargo test` passes.
### Step 2: Implement `TerminalEmitter` against existing render code
Before wiring the engine, build the `TerminalEmitter` by wrapping today's `SseHandler` + `markdown_stream` + `raw_stream` + `MarkdownRender` + `Spinner` code. Don't change any of those modules — just construct a `TerminalEmitter` that holds the state they need and forwards `emit(Event::AssistantDelta(...))` into them.
```rust
pub struct TerminalEmitter {
    render_state: Mutex<StreamRenderState>,
    options: TerminalEmitterOptions,
}

pub struct TerminalEmitterOptions {
    pub highlight: bool,
    pub theme: Option<String>,
    pub verbose_tool_calls: bool,
    pub show_spinner: bool,
}

impl TerminalEmitter {
    pub fn new_from_app(app: &AppState, working_mode: WorkingMode) -> Self { /* ... */ }
}
```
Implement `Emitter` for it, mapping each `Event` variant to the appropriate crossterm operation:
| Event | TerminalEmitter action |
|---|---|
| `Started` | Start spinner |
| `AssistantDelta(chunk)` | Stop spinner (if first), feed chunk into render state |
| `AssistantMessageEnd { full_text }` | Flush render state, emit trailing newline |
| `ToolCall { name, args, .. }` | Print dimmed `⚙ Using <name>` banner if verbose |
| `ToolResult { .. }` | Print dimmed result summary if verbose |
| `AutoContinueTriggered` | Print yellow `⟳ Continuing (N/M, R todos remaining)` to stderr |
| `SessionCompressing` | Print `Compressing session...` to stderr |
| `SessionCompressed` | Print `Session compressed.` to stderr |
| `SessionAutonamed` | Print `Session auto-named: <name>` to stderr |
| `Info(msg)` | Print to stdout |
| `Warning(msg)` | Print yellow to stderr |
| `Error(e)` | Print red to stderr |
| `Finished` | Flush any pending trailing newline; no other output |
**Verification:** write integration tests that construct a `TerminalEmitter`, feed it a sequence of events manually, and compare captured stdout/stderr to golden outputs. Use `assert_cmd` or similar to snapshot the rendered output of each event variant.
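A golden-output test along these lines could route events into in-memory buffers instead of the real terminal. This sketch assumes the emitter's writes can be redirected to any `std::io::Write`, which the existing crossterm-based code may not allow without changes. It covers four events from the table and leaves out colors and the spinner. The `render` and `capture` helpers are hypothetical.

```rust
use std::io::Write;

/// Subset of the `Event` enum from the plan.
pub enum Event<'a> {
    AssistantDelta(&'a str),
    AssistantMessageEnd { full_text: &'a str },
    SessionCompressing,
    Warning(&'a str),
}

/// Writes each event to the stream the table assigns it: assistant text to
/// `out`, status and warnings to `err`.
pub fn render(
    event: &Event<'_>,
    out: &mut impl Write,
    err: &mut impl Write,
) -> std::io::Result<()> {
    match event {
        Event::AssistantDelta(chunk) => write!(out, "{chunk}"),
        Event::AssistantMessageEnd { .. } => writeln!(out),
        Event::SessionCompressing => writeln!(err, "Compressing session..."),
        Event::Warning(msg) => writeln!(err, "{msg}"),
    }
}

/// Renders a whole event sequence and returns (stdout text, stderr text).
pub fn capture(events: &[Event<'_>]) -> (String, String) {
    let mut out: Vec<u8> = Vec::new();
    let mut err: Vec<u8> = Vec::new();
    for event in events {
        render(event, &mut out, &mut err).expect("writing to a Vec cannot fail");
    }
    (String::from_utf8(out).unwrap(), String::from_utf8(err).unwrap())
}
```

A golden test then compares the two captured strings against checked-in expected files, one pair per scenario.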
### Step 3: Implement `Engine::run` without wiring it
Implement `Engine::run` and `Engine::run_turn` following the pseudocode above. Use the existing helper functions (`before_chat_completion`, `after_chat_completion`, `apply_prelude`, `create_client`, `call_chat_completions`, `call_chat_completions_streaming`, `maybe_compress_session`, `maybe_autoname_session`) unchanged, just called through `ctx` instead of `&GlobalConfig`.
**Implementing `dispatch_command`** is the largest sub-task here because it needs to match all 37 `CoreCommand` variants and invoke the right `ctx` methods. Most variants are straightforward one-liners that call a corresponding method on `RequestContext`. A few need special handling:
- `CoreCommand::UseRole { name, trailing_text }` — activate role, then if `trailing_text` is `Some`, the outer `run` will flow through with the trailing text as `UserInput.text`.
- `CoreCommand::IncludeFiles` — reads files, converts to `FileInput` list, attaches to `ctx`'s next input (or fails if no input is provided).
- `CoreCommand::StarterRun(id)` — looks up the starter text on the active agent, fails if no agent.
- `CoreCommand::Macro` — delegates to `macro_execute`, which may itself call `Engine::run` internally for LLM-triggering macros.
**Verification:** write unit tests for `dispatch_command` using `NullEmitter`. Each test activates a command and asserts the expected state mutation on `ctx`. This is ~37 tests, one per variant, and they catch the bulk of regressions early.
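A per-variant test of this kind might look like the sketch below, which covers three variants against a hypothetical two-field `RequestContext`. Returning the trailing text from `dispatch_command` is an assumption about how the outer `run` picks it up; the plan only says the text flows through as `UserInput.text`.

```rust
pub enum CoreCommand {
    SetModel(String),
    UseRole { name: String, trailing_text: Option<String> },
    ExitRole,
}

/// Hypothetical slice of `RequestContext`.
#[derive(Default, Debug, PartialEq)]
pub struct RequestContext {
    pub model: Option<String>,
    pub role: Option<String>,
}

/// Applies one command to the context. Returns the trailing text, if any,
/// so the caller can send it through the LLM pipeline as a prompt.
pub fn dispatch_command(ctx: &mut RequestContext, command: CoreCommand) -> Option<String> {
    match command {
        CoreCommand::SetModel(name) => {
            ctx.model = Some(name);
            None
        }
        CoreCommand::UseRole { name, trailing_text } => {
            ctx.role = Some(name);
            trailing_text
        }
        CoreCommand::ExitRole => {
            ctx.role = None;
            None
        }
    }
}
```

Each test builds a default context, dispatches one command, and asserts on the fields that command is supposed to touch and on nothing else.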
Then write a handful of integration tests for `Engine::run` with `CollectingEmitter`, asserting the expected event sequence for:
- Plain prompt, no tools, streaming
- Plain prompt, no tools, non-streaming
- Prompt that triggers 2 tool calls
- Prompt that triggers auto-continuation (mock the LLM response)
- Prompt on a session that crosses the compression threshold
- Command-only request (`.info`)
- Command + prompt request (`.role explain "..."`)
### Step 4: Wire CLI to `Engine::run`
Replace `main.rs::start_directive` with a thin wrapper:
```rust
async fn start_directive(
    app: Arc<AppState>,
    ctx: &mut RequestContext,
    input_text: String,
    files: Vec<String>,
    code_mode: bool,
) -> Result<()> {
    let engine = Engine::new(app.clone());
    let emitter = TerminalEmitter::new_from_app(&app, WorkingMode::Cmd);

    let req = RunRequest {
        input: Some(UserInput::from_text_and_files(input_text, files)),
        command: None,
        options: {
            let mut o = RunOptions::cli();
            o.extract_code = code_mode && !*IS_STDOUT_TERMINAL;
            o
        },
    };

    match engine.run(ctx, req, &emitter).await {
        Ok(_outcome) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => Err(e.into()),
    }
}
```
**Verification:** manual smoke test. Run `loki "hello"`, `loki --code "write a rust hello world"`, `loki --role explain "what is TCP"`. All should produce identical output to before the change.
### Step 5: Wire REPL to `Engine::run`
Replace `repl/mod.rs::ask` with a wrapper that calls the engine. The REPL's outer loop that reads lines and calls `run_repl_command` stays. `run_repl_command` for non-dot-command lines constructs a `RunRequest { input: Some(...), .. }` and calls `Engine::run`. Dot-commands get parsed into `CoreCommand` and called as `RunRequest { command: Some(...), input: None, .. }` (or with input if they carry trailing text).
```rust
// In Repl:
async fn handle_line(&mut self, line: &str) -> Result<()> {
    let req = if let Some(rest) = line.strip_prefix('.') {
        parse_dot_command_to_run_request(rest, &self.ctx)?
    } else {
        RunRequest {
            input: Some(UserInput::from_text(line.to_string())),
            command: None,
            options: RunOptions::repl_turn(),
        }
    };

    match self.engine.run(&mut self.ctx, req, &self.emitter).await {
        Ok(_) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => {
            self.emitter.emit(Event::Error(&e)).await.ok();
            Ok(())
        }
    }
}
```
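To make the three request shapes concrete, here is a toy line parser covering `.help`, `.exit role`, and `.role`. It is not the real dot-command parser: quoting and edge cases are ignored, `input` is a plain `String` rather than `UserInput`, and the function name `parse_line` is hypothetical. Copying the trailing text into both `trailing_text` and `input` is one possible reading of the plan.

```rust
#[derive(Debug, PartialEq)]
pub enum CoreCommand {
    UseRole { name: String, trailing_text: Option<String> },
    ExitRole,
    Help,
}

/// Simplified `RunRequest`: options are omitted.
#[derive(Debug, PartialEq)]
pub struct RunRequest {
    pub command: Option<CoreCommand>,
    pub input: Option<String>,
}

/// Parses one REPL line into a command-only, input-only, or combined request.
pub fn parse_line(line: &str) -> Result<RunRequest, String> {
    // Lines without a leading dot are plain prompts.
    let Some(rest) = line.strip_prefix('.') else {
        return Ok(RunRequest { command: None, input: Some(line.to_string()) });
    };
    let (name, args) = rest.split_once(' ').unwrap_or((rest, ""));
    match (name, args.trim()) {
        ("help", "") => Ok(RunRequest { command: Some(CoreCommand::Help), input: None }),
        ("exit", "role") => Ok(RunRequest { command: Some(CoreCommand::ExitRole), input: None }),
        ("role", args) if !args.is_empty() => {
            let (role, text) = args.split_once(' ').unwrap_or((args, ""));
            let text = text.trim();
            let trailing = (!text.is_empty()).then(|| text.to_string());
            Ok(RunRequest {
                command: Some(CoreCommand::UseRole {
                    name: role.to_string(),
                    trailing_text: trailing.clone(),
                }),
                // With trailing text the request carries both a command and a prompt.
                input: trailing,
            })
        }
        _ => Err(format!("unknown command: .{rest}")),
    }
}
```

For example, `.role explain` yields a command-only request, while `.role explain what is TCP` yields a request with both fields set.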
**Verification:** manual smoke test of the REPL. Run through a typical session:
1. `loki` → REPL starts
2. `hello` → plain prompt works
3. `.role explain` → role activates
4. `what is TCP` → responds under the role
5. `.session` → session starts
6. Several messages → conversation continues
7. `.info session` → info prints
8. `.compress session` → compression runs
9. `.agent sisyphus` → agent activates with sub-agents
10. `write a hello world in rust` → tool calls + output
11. `.exit agent` → agent exits, previous session still active
12. `.exit` → REPL exits
Every interaction should behave identically to pre-Phase-2. Any visual difference is a bug.
### Step 6: Delete the old `start_directive` and `ask`
Once CLI and REPL both route through `Engine::run` and all tests/smoke tests pass, delete the old function bodies. Remove any now-unused imports. Run `cargo check` and `cargo test`.
**Verification:** full test suite green, no dead code warnings.
### Step 7: Tidy and document
- Add rustdoc comments on `Engine`, `RunRequest`, `RunOptions`, `Emitter`, `Event`, `CoreCommand`, `CoreError`.
- Add an `examples/` subdirectory under `src/engine/` showing how to call the engine with each emitter.
- Update `docs/AGENTS.md` with a note that auto-continuation is no longer tied to the REPL code path: the engine supports it for any frontend, although `RunOptions::cli()` keeps it off until a CLI flag is added.
- Update `docs/REST-API-ARCHITECTURE.md` to remove any "in Phase 2" placeholders.
---
## Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| **Terminal rendering regressions** | High | Golden-file snapshot tests for every `Event` variant. Manual smoke tests across all common REPL flows. Keep `TerminalEmitter` as a thin wrapper — no logic changes in the render code itself. |
| **Auto-continuation recursion limits** | Medium | The new `Engine::run` uses `Box::pin` for the auto-continuation recursive call. Verify with a mock LLM that `max_auto_continues = 100` doesn't blow the stack. |
| **Cancellation during tool execution** | Medium | Tool execution currently uses `AbortSignal`; the new path uses `CancellationToken`. Write a shim that translates. Write a test that cancels mid-tool-call and verifies graceful cleanup (no orphaned subprocesses, no leaked file descriptors). |
| **Command parsing fidelity** | Medium | The dot-command parser in today's REPL is hand-written and has edge cases. Port the parsing code verbatim into a dedicated `parse_dot_command_to_run_request` function with unit tests for every edge case found in today's code. |
| **Macro execution recursion** | Medium | `.macro` can invoke LLM calls, which now go through `Engine::run`, which can invoke more macros. Verify there's a recursion depth limit or cycle detection; add one if missing. |
| **Emitter error propagation** | Low | Emitter errors (ClientDisconnected) should NOT abort session save logic. Engine must continue executing after the first `EmitError::ClientDisconnected` — just stop emitting. Write a test that simulates a disconnected emitter mid-response and asserts the session is still correctly persisted. |
| **Spinner interleaving with tool output** | Low | Today's spinner is tightly coupled to the stream handler. If the new order of operations fires a tool call before the spinner is stopped, you'll get garbled output. Test this specifically. |
| **Feature flag: `auto_continue` in CLI** | Low | After Phase 2, CLI *could* support auto-continuation but it's not exposed. Decision: leave it off by default in `RunOptions::cli()`, add a `--auto-continue` flag in a separate follow-up if desired. Don't sneak behavior changes into this refactor. |
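
For the macro recursion row above, a depth limit is probably the simplest guard if none exists today. The sketch below is purely illustrative: the macro table format, the `@name` convention for nested macros, and the `MacroError` type are invented here and say nothing about how `macro_execute` actually works.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum MacroError {
    TooDeep { limit: usize },
    Unknown(String),
}

/// Runs a macro from a hypothetical table in which each macro is a list of
/// steps and a step starting with '@' invokes another macro. Plain steps are
/// appended to `log`; nesting beyond `limit` is rejected.
pub fn run_macro(
    macros: &HashMap<&str, Vec<&str>>,
    name: &str,
    depth: usize,
    limit: usize,
    log: &mut Vec<String>,
) -> Result<(), MacroError> {
    if depth >= limit {
        return Err(MacroError::TooDeep { limit });
    }
    let steps = macros
        .get(name)
        .ok_or_else(|| MacroError::Unknown(name.to_string()))?;
    for step in steps {
        match step.strip_prefix('@') {
            Some(nested) => run_macro(macros, nested, depth + 1, limit, log)?,
            None => log.push(step.to_string()),
        }
    }
    Ok(())
}

pub fn demo() -> (Vec<String>, Result<(), MacroError>) {
    let mut macros: HashMap<&str, Vec<&str>> = HashMap::new();
    macros.insert("greet", vec!["hello", "@name"]);
    macros.insert("name", vec!["world"]);
    let mut log = Vec::new();
    run_macro(&macros, "greet", 0, 8, &mut log).expect("acyclic macros stay under the limit");

    // Introduce a cycle: greet -> name -> greet -> ...
    macros.insert("name", vec!["@greet"]);
    let cyclic = run_macro(&macros, "greet", 0, 8, &mut Vec::new());
    (log, cyclic)
}
```

A depth counter catches cycles without tracking which macros are on the stack, at the cost of also rejecting legitimately deep (but finite) nesting beyond the limit.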
---
## What Phase 2 Does NOT Do
- **No new features.** Everything that worked before works the same way after.
- **No API server.** `JsonEmitter` and `SseEmitter` are placeholders — Phase 4 implements them.
- **No `SessionStore` abstraction.** That's Phase 3.
- **No `ToolScope` unification.** That landed in Phase 1 Step 6.5.
- **No changes to LLM client code.** `call_chat_completions` and `call_chat_completions_streaming` keep their existing signatures.
- **No MCP factory pooling.** That's Phase 5.
- **No dot-command syntax changes.** The REPL still accepts exactly the same dot-commands; they just parse into `CoreCommand` instead of being hand-dispatched in `run_repl_command`.
The sole goal of Phase 2 is: **extract the pipeline into Engine::run, route CLI and REPL through it, and prove via tests and smoke tests that nothing regressed.**
---
## Entry Criteria (from Phase 1)
Before starting Phase 2, Phase 1 must be complete:
- [ ] `GlobalConfig` type alias is removed
- [ ] `AppState` and `RequestContext` are the only state holders
- [ ] All 91 callsites in the original migration table have been updated
- [ ] `cargo test` passes with no `Config`-based tests remaining
- [ ] CLI and REPL manual smoke tests pass identically to pre-Phase-1
## Exit Criteria (Phase 2 complete)
- [ ] `src/engine/` module exists with Engine, Emitter, Event, CoreCommand, RunRequest, RunOutcome, CoreError
- [ ] `TerminalEmitter` implemented and wrapping all existing render paths
- [ ] `NullEmitter` and `CollectingEmitter` implemented
- [ ] `start_directive` in main.rs is a thin wrapper around `Engine::run`
- [ ] REPL's per-line handler routes through `Engine::run`
- [ ] All 37 `CoreCommand` variants implemented with unit tests
- [ ] Integration tests for the 7 engine scenarios listed in Step 3
- [ ] Manual smoke tests for CLI and REPL match pre-Phase-2 behavior
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] Phase 3 (SessionStore abstraction) can begin