# Phase 2 Implementation Plan: Engine + Emitter

## Overview

Phase 1 splits `Config` into `AppState` + `RequestContext`. Phase 2 takes the unified state and introduces the **Engine** — a single core function that replaces CLI's `start_directive()` and REPL's `ask()` — plus an **Emitter trait** that abstracts output away from direct stdout writes. After this phase, CLI and REPL both call `Engine::run()` with different `Emitter` implementations and behave identically to today. The API server in Phase 4 will plug in without touching core logic.

**Estimated effort:** ~1 week
**Risk:** Low-medium. The work is refactoring existing well-tested code paths into a shared shape. Most of the risk is in preserving exact terminal rendering behavior.
**Depends on:** Phase 1 Steps 0–10 complete (`GlobalConfig` eliminated, `RequestContext` wired through all entry points).

---
## Why Phase 2 Exists

Today's CLI and REPL have two near-identical pipelines that diverge in five specific places. The divergences are accidents of history, not intentional design:

1. **Streaming flag handling.** `start_directive` forces non-streaming when extracting code; `ask` never extracts code.
2. **Auto-continuation loop.** `ask` has complex logic for `auto_continue_count`, todo inspection, and continuation prompt injection. `start_directive` has none.
3. **Session compression.** `ask` triggers `maybe_compress_session` and awaits completion; `start_directive` never compresses.
4. **Session autoname.** `ask` calls `maybe_autoname_session` after each turn; `start_directive` doesn't.
5. **Cleanup on exit.** `start_directive` calls `exit_session()` at the end; `ask` lets the REPL loop handle it.

Four of these five divergences are bugs waiting to happen — they mean agents behave differently in CLI vs REPL mode, sessions don't get compressed in CLI even when they should, and auto-continuation is silently unavailable from the CLI. Phase 2 collapses both pipelines into one `Engine::run()` that handles all five behaviors uniformly, with per-request flags to control what's active (e.g., `auto_continue: bool` on `RunRequest`).

The Emitter trait exists to decouple the rendering pipeline from its destination. Today, streaming output is hardcoded to write to the terminal via `crossterm`. An `Emitter` implementation can also feed an axum SSE stream, collect events for a JSON response, or capture everything for a test. The Engine sends semantic events; Emitters decide how to present them.

---
## The Architecture After Phase 2

```
  ┌─────────┐       ┌─────────┐       ┌─────────┐
  │   CLI   │       │  REPL   │       │   API   │  (Phase 4)
  └────┬────┘       └────┬────┘       └────┬────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌──────────────────────────────────────────────────┐
│          Engine::run(ctx, req, emitter)          │
│  ┌────────────────────────────────────────────┐  │
│  │ 1. Apply CoreCommand (if any)              │  │
│  │ 2. Build Input from req                    │  │
│  │ 3. apply_prelude (first turn only)         │  │
│  │ 4. before_chat_completion                  │  │
│  │ 5. Stream or buffered LLM call             │  │
│  │     ├─ emit Started                        │  │
│  │     ├─ emit AssistantDelta (per chunk)     │  │
│  │     ├─ emit ToolCall                       │  │
│  │     ├─ execute tool                        │  │
│  │     ├─ emit ToolResult                     │  │
│  │     └─ loop on tool results                │  │
│  │ 6. after_chat_completion                   │  │
│  │ 7. maybe_compress_session                  │  │
│  │ 8. maybe_autoname_session                  │  │
│  │ 9. Auto-continuation (if applicable)       │  │
│  │10. emit Finished                           │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘
       │                 │                 │
       ▼                 ▼                 ▼
TerminalEmitter   TerminalEmitter   JsonEmitter / SseEmitter
```

---
## Core Types

### `Engine`

```rust
pub struct Engine {
    pub app: Arc<AppState>,
}

impl Engine {
    pub fn new(app: Arc<AppState>) -> Self { Self { app } }

    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError> { /* see "The Engine::run Pipeline" */ }
}
```

`Engine` is intentionally a thin wrapper around `Arc<AppState>`. All per-turn state lives on `RequestContext`, so the engine itself has no per-call fields. This makes it cheap to clone and makes `Engine::run` trivially testable.
### `RunRequest`

```rust
pub struct RunRequest {
    pub input: Option<UserInput>,
    pub command: Option<CoreCommand>,
    pub options: RunOptions,
}

pub struct UserInput {
    pub text: String,
    pub files: Vec<FileInput>,
    pub media: Vec<MediaInput>,
    pub continuation: Option<ContinuationKind>,
}

pub enum ContinuationKind {
    Continue,
    Regenerate,
}

// `Clone` is needed because the auto-continuation path calls `req.options.clone()`.
#[derive(Clone)]
pub struct RunOptions {
    pub stream: Option<bool>,
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
    pub apply_prelude: bool,
    pub with_embeddings: bool,
    pub cancel: CancellationToken,
}

impl RunOptions {
    pub fn cli() -> Self { /* today's start_directive defaults */ }
    pub fn repl_turn() -> Self { /* today's ask defaults */ }
    pub fn api_oneshot() -> Self { /* API one-shot defaults */ }
    pub fn api_session() -> Self { /* API session defaults */ }
}
```

Two things to notice:

1. **`input` is `Option`.** A `RunRequest` can carry just a `command` (e.g., `.role explain`) with no user text, just an input (a plain prompt), or both (the `.role <name> <text>` form that activates a role and immediately sends a prompt through it). The engine handles all three shapes with one code path.

2. **`RunOptions` is the knob panel that replaces the five divergences.** CLI today has `auto_continue: false, compress_session: false, autoname_session: false`; REPL has all three `true`. Phase 2 exposes these as explicit options with factory constructors for each frontend's conventional defaults. This also means you can now run a CLI one-shot with auto-continuation by constructing `RunOptions::cli()` and flipping `auto_continue = true` — a capability that doesn't exist today.

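
The factory constructors might look roughly like the sketch below. It is a reduced stand-in, not the planned type: it keeps only the three flags whose CLI and REPL values are stated above and leaves out every other `RunOptions` field, whose defaults this plan does not specify. The helper `cli_with_auto_continue` is a hypothetical name used only for illustration.

```rust
/// Reduced stand-in for the plan's `RunOptions`: only the three flags whose
/// CLI/REPL values are stated in the text. All other fields are omitted.
#[derive(Clone, Debug, PartialEq)]
pub struct RunOptions {
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
}

impl RunOptions {
    /// Mirrors today's `start_directive`: all three behaviors off.
    pub fn cli() -> Self {
        Self { auto_continue: false, compress_session: false, autoname_session: false }
    }

    /// Mirrors today's `ask`: all three behaviors on.
    pub fn repl_turn() -> Self {
        Self { auto_continue: true, compress_session: true, autoname_session: true }
    }
}

/// Hypothetical helper: a CLI one-shot that opts in to auto-continuation
/// while keeping every other CLI default.
pub fn cli_with_auto_continue() -> RunOptions {
    RunOptions { auto_continue: true, ..RunOptions::cli() }
}
```

Struct update syntax (`..RunOptions::cli()`) keeps the override explicit at the call site, so a frontend never has to restate the defaults it is not changing.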
### `CoreCommand`

```rust
pub enum CoreCommand {
    // State setters
    SetModel(String),
    UsePrompt(String),
    UseRole { name: String, trailing_text: Option<String> },
    UseSession(Option<String>),
    UseAgent { name: String, session: Option<String>, variables: Vec<(String, String)> },
    UseRag(Option<String>),

    // Exit commands
    ExitRole,
    ExitSession,
    ExitRag,
    ExitAgent,

    // State queries
    Info(InfoScope),
    RagSources,

    // Config mutation
    Set { key: String, value: String },

    // Session actions
    CompressSession,
    EmptySession,
    SaveSession { name: Option<String> },
    EditSession,

    // Role actions
    SaveRole { name: Option<String> },
    EditRole,

    // RAG actions
    EditRagDocs,
    RebuildRag,

    // Agent actions
    EditAgentConfig,
    ClearTodo,
    StarterList,
    StarterRun(usize),

    // File input shortcut
    IncludeFiles { paths: Vec<String>, trailing_text: Option<String> },

    // Macro execution
    Macro { name: String, args: Vec<String> },

    // Vault
    VaultAdd(String),
    VaultGet(String),
    VaultUpdate(String),
    VaultDelete(String),
    VaultList,

    // Miscellaneous
    EditConfig,
    Authenticate,
    Delete(DeleteKind),
    Copy,
    Help,
}

pub enum InfoScope {
    System,
    Role,
    Session,
    Rag,
    Agent,
}

pub enum DeleteKind {
    Role(String),
    Session(String),
    Rag(String),
    Macro(String),
    AgentData(String),
}
```

This enum captures all 37 dot-commands identified in the exploration. Three categories deserve special attention:

- **LLM-triggering commands** (`UsePrompt`, `UseRole` with trailing_text, `IncludeFiles` with trailing_text, `StarterRun`, `Macro` that contains LLM calls, and the continuation variants `Continue`/`Regenerate` expressed via `UserInput.continuation`) — these don't just mutate state; they produce a full run through the LLM pipeline. The engine treats them as `RunRequest { command: Some(_), input: Some(_), .. }` — command runs first, then input flows through.

- **Asynchronous commands that return immediately** (`EditConfig`, `EditRole`, `EditRagDocs`, `EditAgentConfig`, most `Vault*`, `Delete`) — these are side-effecting but don't produce an LLM interaction. The engine handles them, emits an event describing the result, and returns without invoking the LLM path.

- **Context-dependent commands** (`ClearTodo`, `StarterList`, `StarterRun`, `EditAgentConfig`, etc.) — these require a specific scope (e.g., active agent). The engine validates the precondition before executing and returns a `CoreError::InvalidState { expected: "active agent", .. }` if the precondition fails.

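
The precondition check for context-dependent commands could be sketched as follows. Everything except the `InvalidState` variant is an assumption made for illustration: `Ctx` is a hypothetical stand-in for `RequestContext`, and `require_agent` and `clear_todo` are invented helper names, not functions from the codebase.

```rust
/// Subset of the plan's `CoreError`, limited to the variant used here.
#[derive(Debug, PartialEq)]
pub enum CoreError {
    InvalidState { expected: String, found: String },
}

/// Hypothetical stand-in for `RequestContext`: only the active agent and its todos.
pub struct Ctx {
    pub agent: Option<String>,
    pub todos: Vec<String>,
}

/// Returns the active agent's name, or `InvalidState` when no agent is active.
fn require_agent(ctx: &Ctx) -> Result<&str, CoreError> {
    ctx.agent.as_deref().ok_or_else(|| CoreError::InvalidState {
        expected: "active agent".to_string(),
        found: "no agent".to_string(),
    })
}

/// `ClearTodo`-style handler: validate first, mutate only afterwards.
/// Returns how many todos were cleared.
pub fn clear_todo(ctx: &mut Ctx) -> Result<usize, CoreError> {
    require_agent(ctx)?;
    let cleared = ctx.todos.len();
    ctx.todos.clear();
    Ok(cleared)
}
```

Checking the precondition before touching any state means a failed command leaves the context exactly as it was, which keeps the per-variant unit tests simple.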
### `Emitter` trait and `Event` enum

```rust
#[async_trait]
pub trait Emitter: Send + Sync {
    async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}

pub enum Event<'a> {
    // Lifecycle
    Started { request_id: Uuid, session_id: Option<SessionId>, agent: Option<&'a str> },
    Finished { outcome: &'a RunOutcome },

    // Assistant output
    AssistantDelta(&'a str),
    AssistantMessageEnd { full_text: &'a str },

    // Tool calls
    ToolCall { id: &'a str, name: &'a str, args: &'a str },
    ToolResult { id: &'a str, name: &'a str, result: &'a str, is_error: bool },

    // Auto-continuation
    AutoContinueTriggered { count: usize, max: usize, remaining_todos: usize },

    // Session lifecycle signals
    SessionCompressing,
    SessionCompressed { tokens_saved: Option<usize> },
    SessionAutonamed(&'a str),

    // Informational
    Info(&'a str),
    Warning(&'a str),

    // Errors
    Error(&'a CoreError),
}

pub enum EmitError {
    ClientDisconnected,
    WriteFailed(std::io::Error),
}
```

Three implementations ship in Phase 2; two are stubs, one is real:

- **`TerminalEmitter`** (real) — wraps today's `SseHandler` → `markdown_stream`/`raw_stream` path. This is the bulk of Phase 2's work; see "Terminal Rendering Details" below.
- **`NullEmitter`** (stub, for tests) — drops all events on the floor.
- **`CollectingEmitter`** (stub, for tests and future JSON API) — appends events to a `Vec<OwnedEvent>` for later inspection.

The `JsonEmitter` and `SseEmitter` implementations land in **Phase 4** when the API server comes online.

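
A `CollectingEmitter` along these lines is one possible shape. This sketch makes several simplifying assumptions: the trait is synchronous rather than `#[async_trait]` so that it needs only the standard library, `Event` is cut down to two variants, `EmitError` is a unit struct, and `OwnedEvent` (named but not defined in this plan) is guessed to be an owned mirror of `Event`.

```rust
use std::sync::Mutex;

/// Borrowed event, reduced to two variants for the sketch.
pub enum Event<'a> {
    AssistantDelta(&'a str),
    ToolCall { id: &'a str, name: &'a str, args: &'a str },
}

/// Assumed owned mirror of `Event`, so events can outlive the `emit` call.
#[derive(Debug, Clone, PartialEq)]
pub enum OwnedEvent {
    AssistantDelta(String),
    ToolCall { id: String, name: String, args: String },
}

impl From<Event<'_>> for OwnedEvent {
    fn from(e: Event<'_>) -> Self {
        match e {
            Event::AssistantDelta(s) => OwnedEvent::AssistantDelta(s.to_owned()),
            Event::ToolCall { id, name, args } => OwnedEvent::ToolCall {
                id: id.to_owned(),
                name: name.to_owned(),
                args: args.to_owned(),
            },
        }
    }
}

/// Placeholder error type; the plan's `EmitError` has two variants.
#[derive(Debug)]
pub struct EmitError;

/// Synchronous simplification of the plan's async `Emitter` trait.
pub trait Emitter: Send + Sync {
    fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}

/// Appends every event to an in-memory list; `Mutex` keeps it `Sync`.
#[derive(Default)]
pub struct CollectingEmitter {
    events: Mutex<Vec<OwnedEvent>>,
}

impl CollectingEmitter {
    /// Snapshot of everything emitted so far, in emission order.
    pub fn events(&self) -> Vec<OwnedEvent> {
        self.events.lock().unwrap().clone()
    }
}

impl Emitter for CollectingEmitter {
    fn emit(&self, event: Event<'_>) -> Result<(), EmitError> {
        self.events.lock().unwrap().push(event.into());
        Ok(())
    }
}
```

A test can then drive the engine with this emitter and compare `events()` against the expected sequence.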
### `RunOutcome`

```rust
pub struct RunOutcome {
    pub request_id: Uuid,
    pub session_id: Option<SessionId>,
    pub final_message: Option<String>,
    pub tool_call_count: usize,
    pub turns: usize,
    pub compressed: bool,
    pub autonamed: Option<String>,
    pub auto_continued: usize,
}
```

`RunOutcome` is what CLI/REPL ignore but the future API returns as JSON. It records everything the caller might want to know about what happened during the run.
### `CoreError`

```rust
pub enum CoreError {
    InvalidRequest { msg: String },
    InvalidState { expected: String, found: String },
    NotFound { what: String, name: String },
    Cancelled,
    ProviderError { provider: String, msg: String },
    ToolError { tool: String, msg: String },
    EmitterError(EmitError),
    Io(std::io::Error),
    Other(anyhow::Error),
}

impl CoreError {
    pub fn is_retryable(&self) -> bool { /* ... */ }
    pub fn http_status(&self) -> u16 { /* for future API use */ }
    pub fn terminal_message(&self) -> String { /* for TerminalEmitter */ }
}
```

---
## Terminal Rendering Details

The `TerminalEmitter` is the most delicate part of Phase 2 because it has to preserve every pixel of today's REPL/CLI behavior. Here's the mental model:

**Today's flow:**
```
LLM client → mpsc::Sender<SseEvent> → SseHandler → render_stream
                                                     ├─ markdown_stream (if highlight)
                                                     └─ raw_stream (else)
```

Both `markdown_stream` and `raw_stream` write directly to stdout via `crossterm`, managing cursor positions, line clears, and incremental markdown parsing themselves.

**Target flow:**
```
LLM client → mpsc::Sender<SseEvent> → SseHandler → TerminalEmitter::emit(Event::AssistantDelta)
                                                     ├─ (internal) markdown_stream state machine
                                                     └─ (internal) raw_stream state machine
```

The `TerminalEmitter` owns a `Mutex<StreamRenderState>` that wraps the existing `markdown_stream`/`raw_stream` state. (A `RefCell` is not an option here: `Emitter` requires `Send + Sync`, and `RefCell` is not `Sync`.) Each `emit(AssistantDelta)` call feeds the chunk into this state machine exactly as `SseHandler`'s receive loop does today. The result is that the exact same crossterm calls happen in the exact same order — we've just moved them behind a trait.

**Things that migrate 1:1 into `TerminalEmitter`:**
- Spinner start/stop on first delta
- Cursor positioning for line reprint during code block growth
- Syntax highlighting invocation via `MarkdownRender`
- Color/dim output for tool call banners
- Final newline + cursor reset on `AssistantMessageEnd`

**Things that the engine handles, not the emitter:**
- Tool call *execution* (still lives in the engine loop)
- Session state mutations (engine calls `before_chat_completion` / `after_chat_completion` on `RequestContext`)
- Auto-continuation decisions (engine inspects agent runtime)
- Compression and autoname decisions (engine)

**Things the emitter decides, not the engine:**
- Whether to suppress ToolCall rendering (sub-agents in today's code suppress their own output; TerminalEmitter respects a `verbose: bool` flag)
- How to format errors (TerminalEmitter uses colored stderr; JsonEmitter will use structured JSON)
- Whether to show a spinner at all (disabled for non-TTY output)

**One gotcha:** today's `SseHandler` itself produces the `mpsc` channel that LLM clients push into. In the new model, `SseHandler` becomes an internal helper inside the engine's streaming path that converts `mpsc::Receiver<SseEvent>` into `Emitter::emit(Event::AssistantDelta(...))` calls. No LLM client code changes — they still push into the same channel type. Only the consumer side of the channel changes.

---
## The Engine::run Pipeline

Here's the full pipeline in pseudocode, annotated with which frontend controls each behavior via `RunOptions`:

```rust
impl Engine {
    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError> {
        let request_id = Uuid::new_v4();
        let mut outcome = RunOutcome::new(request_id);

        emitter.emit(Event::Started { request_id, session_id: ctx.session_id(), agent: ctx.agent_name() }).await?;

        // 1. Execute command (if any). Commands may be LLM-triggering, mutating, or informational.
        if let Some(command) = req.command {
            self.dispatch_command(ctx, command, emitter, &req.options).await?;
        }

        // 2. Early return if there's no user input (pure command)
        let Some(user_input) = req.input else {
            emitter.emit(Event::Finished { outcome: &outcome }).await?;
            return Ok(outcome);
        };

        // 3. Apply prelude on first turn of a fresh context (CLI/REPL only)
        if req.options.apply_prelude && !ctx.prelude_applied {
            apply_prelude(ctx, &req.options.cancel).await?;
            ctx.prelude_applied = true;
        }

        // 4. Build Input from user_input + ctx
        let input = build_input(ctx, user_input, &req.options).await?;

        // 5. Wait for any in-progress compression to finish (REPL-style block)
        while ctx.is_compressing_session() {
            tokio::time::sleep(Duration::from_millis(100)).await;
        }

        // 6. Enter the turn loop
        self.run_turn(ctx, input, &req.options, emitter, &mut outcome).await?;

        // 7. Maybe compress session
        if req.options.compress_session && ctx.session_needs_compression() {
            emitter.emit(Event::SessionCompressing).await?;
            compress_session(ctx).await?;
            outcome.compressed = true;
            emitter.emit(Event::SessionCompressed { tokens_saved: None }).await?;
        }

        // 8. Maybe autoname session
        if req.options.autoname_session {
            if let Some(name) = maybe_autoname_session(ctx).await? {
                outcome.autonamed = Some(name.clone());
                emitter.emit(Event::SessionAutonamed(&name)).await?;
            }
        }

        // 9. Auto-continuation (agents only)
        if req.options.auto_continue {
            if let Some(continuation) = self.check_auto_continue(ctx) {
                // Pseudocode: event fields elided.
                emitter.emit(Event::AutoContinueTriggered { .. }).await?;
                outcome.auto_continued += 1;
                // Recursive call with continuation prompt.
                // Note: as written, the nested run builds a fresh `RunOutcome`, so the
                // counters accumulated in `outcome` above are not carried into the result.
                let next_req = RunRequest {
                    input: Some(UserInput::from_continuation(continuation)),
                    command: None,
                    options: req.options.clone(),
                };
                return Box::pin(self.run(ctx, next_req, emitter)).await;
            }
        }

        emitter.emit(Event::Finished { outcome: &outcome }).await?;
        Ok(outcome)
    }

    async fn run_turn(
        &self,
        ctx: &mut RequestContext,
        mut input: Input,
        options: &RunOptions,
        emitter: &dyn Emitter,
        outcome: &mut RunOutcome,
    ) -> Result<(), CoreError> {
        loop {
            outcome.turns += 1;

            before_chat_completion(ctx, &input);

            let client = input.create_client(ctx)?;
            let (output, tool_results) = if should_stream(&input, options) {
                stream_chat_completion(ctx, &input, client, emitter, &options.cancel).await?
            } else {
                buffered_chat_completion(ctx, &input, client, options.extract_code, &options.cancel).await?
            };

            after_chat_completion(ctx, &input, &output, &tool_results);
            outcome.tool_call_count += tool_results.len();

            if tool_results.is_empty() {
                outcome.final_message = Some(output);
                return Ok(());
            }

            // Emit each tool call and result (pseudocode: event fields elided)
            for result in &tool_results {
                emitter.emit(Event::ToolCall { .. }).await?;
                emitter.emit(Event::ToolResult { .. }).await?;
            }

            // Loop: feed tool results back in
            input = input.merge_tool_results(output, tool_results);
        }
    }
}
```
**Key design decisions in this pipeline:**

1. **Command dispatch happens first.** A `RunRequest` that carries both a command and input runs the command first (mutating `ctx`), then the input flows through the now-updated context. This lets `.role explain "tell me about X"` work as a single atomic operation — the role is activated, then the prompt is sent under the new role.

2. **Tool loop is iterative, not recursive.** Today both `start_directive` and `ask` recursively call themselves after tool results. The new `run_turn` uses a `loop` instead, which is cleaner, avoids stack growth on long tool chains, and makes cancellation handling simpler. Auto-continuation remains recursive because it's a full new turn with a new prompt, not just a tool-result continuation.

3. **Cancellation is checked at every await point.** `options.cancel: CancellationToken` is threaded into every async call. On cancellation, the engine emits `Event::Error(CoreError::Cancelled)` and returns. Today's `AbortSignal` pattern gets wrapped in a `CancellationToken` adapter during the migration.

4. **Session state hooks fire at the same points as today.** `before_chat_completion` and `after_chat_completion` continue to exist on `RequestContext`, called from the same places in the same order. The refactor doesn't change their semantics.

5. **Emitter errors don't abort the run.** If the emitter's output destination disconnects (client closes browser tab), the engine keeps running to completion so session state is correctly persisted, but it stops emitting events. The `EmitError::ClientDisconnected` case is special-cased to swallow subsequent emits. Session save + tool execution still happen.

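
Decision 5 needs some care, because the pseudocode propagates every emit failure with `?`, which would end the run on a disconnect. One possible way to reconcile the two is a wrapper emitter that records the first `ClientDisconnected` and silently drops everything after it. The sketch below is an illustration under stated assumptions, not the planned implementation: `DisconnectGuard` and `FlakyEmitter` are hypothetical names, the trait is synchronous with `&str` events so that it needs only the standard library, and `WriteFailed` carries a `String` instead of `std::io::Error` so results can be compared in tests.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

#[derive(Debug, PartialEq)]
pub enum EmitError {
    ClientDisconnected,
    WriteFailed(String), // simplification: the plan wraps std::io::Error
}

/// Synchronous simplification of the plan's async `Emitter` trait.
pub trait Emitter: Send + Sync {
    fn emit(&self, event: &str) -> Result<(), EmitError>;
}

/// Hypothetical wrapper: after the first `ClientDisconnected`,
/// later events are dropped and reported as success.
pub struct DisconnectGuard<E: Emitter> {
    inner: E,
    disconnected: AtomicBool,
}

impl<E: Emitter> DisconnectGuard<E> {
    pub fn new(inner: E) -> Self {
        Self { inner, disconnected: AtomicBool::new(false) }
    }
    pub fn is_disconnected(&self) -> bool {
        self.disconnected.load(Ordering::Relaxed)
    }
    pub fn inner(&self) -> &E {
        &self.inner
    }
}

impl<E: Emitter> Emitter for DisconnectGuard<E> {
    fn emit(&self, event: &str) -> Result<(), EmitError> {
        if self.is_disconnected() {
            return Ok(()); // swallow: the run continues, nothing is written
        }
        match self.inner.emit(event) {
            Err(EmitError::ClientDisconnected) => {
                self.disconnected.store(true, Ordering::Relaxed);
                Ok(())
            }
            other => other, // success and WriteFailed pass through unchanged
        }
    }
}

/// Test double: accepts `limit` events, then reports a disconnect.
pub struct FlakyEmitter {
    limit: usize,
    calls: AtomicUsize,
}

impl FlakyEmitter {
    pub fn new(limit: usize) -> Self {
        Self { limit, calls: AtomicUsize::new(0) }
    }
    pub fn calls(&self) -> usize {
        self.calls.load(Ordering::Relaxed)
    }
}

impl Emitter for FlakyEmitter {
    fn emit(&self, _event: &str) -> Result<(), EmitError> {
        if self.calls.fetch_add(1, Ordering::Relaxed) < self.limit {
            Ok(())
        } else {
            Err(EmitError::ClientDisconnected)
        }
    }
}
```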
---
## Migration Strategy

This phase is structured as **extract, unify, rewrite frontends** — similar to Phase 1's facade pattern. The old functions stay in place until the new Engine is proven by tests and manual verification.

### Step 1: Create the core types

Add the new files without wiring them into anything:

- `src/engine/mod.rs` — module root
- `src/engine/engine.rs` — `Engine` struct + `run` method (initially `unimplemented!()`)
- `src/engine/request.rs` — `RunRequest`, `UserInput`, `RunOptions`, `ContinuationKind`, `RunOutcome`
- `src/engine/command.rs` — `CoreCommand` enum + sub-enums
- `src/engine/error.rs` — `CoreError` enum
- `src/engine/emitter.rs` — `Emitter` trait + `Event` enum + `EmitError`
- `src/engine/emitters/mod.rs` — emitter module
- `src/engine/emitters/null.rs` — `NullEmitter` (test stub)
- `src/engine/emitters/collecting.rs` — `CollectingEmitter` (test stub)
- `src/engine/emitters/terminal.rs` — `TerminalEmitter` (initially `unimplemented!()`)

Register `pub mod engine;` in `src/main.rs`. Code compiles but nothing calls it yet.

**Verification:** `cargo check` clean, `cargo test` passes.
### Step 2: Implement `TerminalEmitter` against existing render code

Before wiring the engine, build the `TerminalEmitter` by wrapping today's `SseHandler` + `markdown_stream` + `raw_stream` + `MarkdownRender` + `Spinner` code. Don't change any of those modules — just construct a `TerminalEmitter` that holds the state they need and forwards `emit(Event::AssistantDelta(...))` into them.

```rust
pub struct TerminalEmitter {
    render_state: Mutex<StreamRenderState>,
    options: TerminalEmitterOptions,
}

pub struct TerminalEmitterOptions {
    pub highlight: bool,
    pub theme: Option<String>,
    pub verbose_tool_calls: bool,
    pub show_spinner: bool,
}

impl TerminalEmitter {
    pub fn new_from_app(app: &AppState, working_mode: WorkingMode) -> Self { /* ... */ }
}
```

Implement `Emitter` for it, mapping each `Event` variant to the appropriate crossterm operation:

| Event | TerminalEmitter action |
|---|---|
| `Started` | Start spinner |
| `AssistantDelta(chunk)` | Stop spinner (if first), feed chunk into render state |
| `AssistantMessageEnd { full_text }` | Flush render state, emit trailing newline |
| `ToolCall { name, args, .. }` | Print dimmed `⚙ Using <name>` banner if verbose |
| `ToolResult { .. }` | Print dimmed result summary if verbose |
| `AutoContinueTriggered` | Print yellow `⟳ Continuing (N/M, R todos remaining)` to stderr |
| `SessionCompressing` | Print `Compressing session...` to stderr |
| `SessionCompressed` | Print `Session compressed.` to stderr |
| `SessionAutonamed` | Print `Session auto-named: <name>` to stderr |
| `Info(msg)` | Print to stdout |
| `Warning(msg)` | Print yellow to stderr |
| `Error(e)` | Print red to stderr |
| `Finished` | Prints nothing; only ensures the trailing newline is flushed |

**Verification:** write integration tests that construct a `TerminalEmitter`, feed it a sequence of events manually, and compare captured stdout/stderr to golden outputs. Use `assert_cmd` or similar to snapshot the rendered output of each event variant.
### Step 3: Implement `Engine::run` without wiring it

Implement `Engine::run` and `Engine::run_turn` following the pseudocode above. Use the existing helper functions (`before_chat_completion`, `after_chat_completion`, `apply_prelude`, `create_client`, `call_chat_completions`, `call_chat_completions_streaming`, `maybe_compress_session`, `maybe_autoname_session`) unchanged, just called through `ctx` instead of `&GlobalConfig`.

**Implementing `dispatch_command`** is the largest sub-task here because it needs to match all 37 `CoreCommand` variants and invoke the right `ctx` methods. Most variants are straightforward one-liners that call a corresponding method on `RequestContext`. A few need special handling:

- `CoreCommand::UseRole { name, trailing_text }` — activate role, then if `trailing_text` is `Some`, the outer `run` will flow through with the trailing text as `UserInput.text`.
- `CoreCommand::IncludeFiles` — reads files, converts to `FileInput` list, attaches to `ctx`'s next input (or fails if no input is provided).
- `CoreCommand::StarterRun(id)` — looks up the starter text on the active agent, fails if no agent.
- `CoreCommand::Macro` — delegates to `macro_execute`, which may itself call `Engine::run` internally for LLM-triggering macros.

**Verification:** write unit tests for `dispatch_command` using `NullEmitter`. Each test activates a command and asserts the expected state mutation on `ctx`. This is ~37 tests, one per variant, and they catch the bulk of regressions early.

Then write a handful of integration tests for `Engine::run` with `CollectingEmitter`, asserting the expected event sequence for:

- Plain prompt, no tools, streaming
- Plain prompt, no tools, non-streaming
- Prompt that triggers 2 tool calls
- Prompt that triggers auto-continuation (mock the LLM response)
- Prompt on a session that crosses the compression threshold
- Command-only request (`.info`)
- Command + prompt request (`.role explain "..."`)
### Step 4: Wire CLI to `Engine::run`

Replace `main.rs::start_directive` with a thin wrapper:

```rust
async fn start_directive(
    app: Arc<AppState>,
    ctx: &mut RequestContext,
    input_text: String,
    files: Vec<String>,
    code_mode: bool,
) -> Result<()> {
    let engine = Engine::new(app.clone());
    let emitter = TerminalEmitter::new_from_app(&app, WorkingMode::Cmd);

    let req = RunRequest {
        input: Some(UserInput::from_text_and_files(input_text, files)),
        command: None,
        options: {
            let mut o = RunOptions::cli();
            o.extract_code = code_mode && !*IS_STDOUT_TERMINAL;
            o
        },
    };

    match engine.run(ctx, req, &emitter).await {
        Ok(_outcome) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => Err(e.into()),
    }
}
```

**Verification:** manual smoke test. Run `loki "hello"`, `loki --code "write a rust hello world"`, `loki --role explain "what is TCP"`. All should produce identical output to before the change.
### Step 5: Wire REPL to `Engine::run`

Replace `repl/mod.rs::ask` with a wrapper that calls the engine. The REPL's outer loop that reads lines and calls `run_repl_command` stays. For non-dot-command lines, `run_repl_command` constructs a `RunRequest { input: Some(...), .. }` and calls `Engine::run`. Dot-commands are parsed into a `CoreCommand` and sent as `RunRequest { command: Some(...), input: None, .. }` (or with input if they carry trailing text).

```rust
// In Repl:
async fn handle_line(&mut self, line: &str) -> Result<()> {
    let req = if let Some(rest) = line.strip_prefix('.') {
        parse_dot_command_to_run_request(rest, &self.ctx)?
    } else {
        RunRequest {
            input: Some(UserInput::from_text(line.to_string())),
            command: None,
            options: RunOptions::repl_turn(),
        }
    };

    match self.engine.run(&mut self.ctx, req, &self.emitter).await {
        Ok(_) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => {
            self.emitter.emit(Event::Error(&e)).await.ok();
            Ok(())
        }
    }
}
```
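
To make the three request shapes concrete, here is a rough sketch of how a line could be split into an optional command and optional prompt text. It is not the existing parser, which this plan says should be ported verbatim. The sketch assumes whitespace-separated arguments, does no quote handling, and covers only `.role` and `.exit agent`. `ParsedLine`, `ParseError`, and `parse_line` are hypothetical names, and `UseRole` is simplified to carry only the role name, with trailing text returned separately as the input.

```rust
/// Hypothetical subset of `CoreCommand`, enough to show the request shapes.
#[derive(Debug, PartialEq)]
pub enum CoreCommand {
    UseRole { name: String },
    ExitAgent,
}

/// What a REPL line turns into: an optional command plus optional prompt text.
#[derive(Debug, PartialEq)]
pub struct ParsedLine {
    pub command: Option<CoreCommand>,
    pub input: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    UnknownCommand(String),
    MissingArgument(&'static str),
}

/// Splits off the first whitespace-delimited word; the remainder is trimmed.
fn split_word(s: &str) -> (&str, &str) {
    match s.split_once(char::is_whitespace) {
        Some((word, rest)) => (word, rest.trim()),
        None => (s, ""),
    }
}

pub fn parse_line(line: &str) -> Result<ParsedLine, ParseError> {
    // Lines without a leading dot are plain prompts.
    let Some(rest) = line.strip_prefix('.') else {
        return Ok(ParsedLine { command: None, input: Some(line.to_string()) });
    };
    let (word, args) = split_word(rest);
    match word {
        "role" => {
            let (name, text) = split_word(args);
            if name.is_empty() {
                return Err(ParseError::MissingArgument("role name"));
            }
            Ok(ParsedLine {
                command: Some(CoreCommand::UseRole { name: name.to_string() }),
                // Trailing text, if any, becomes the prompt sent under the new role.
                input: (!text.is_empty()).then(|| text.to_string()),
            })
        }
        "exit" if args == "agent" => Ok(ParsedLine {
            command: Some(CoreCommand::ExitAgent),
            input: None,
        }),
        other => Err(ParseError::UnknownCommand(other.to_string())),
    }
}
```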
**Verification:** manual smoke test of the REPL. Run through a typical session:

1. `loki` → REPL starts
2. `hello` → plain prompt works
3. `.role explain` → role activates
4. `what is TCP` → responds under the role
5. `.session` → session starts
6. Several messages → conversation continues
7. `.info session` → info prints
8. `.compress session` → compression runs
9. `.agent sisyphus` → agent activates with sub-agents
10. `write a hello world in rust` → tool calls + output
11. `.exit agent` → agent exits, previous session still active
12. `.exit` → REPL exits

Every interaction should behave identically to pre-Phase-2. Any visual difference is a bug.
### Step 6: Delete the old `start_directive` and `ask`

Once CLI and REPL both route through `Engine::run` and all tests/smoke tests pass, delete the old function bodies. Remove any now-unused imports. Run `cargo check` and `cargo test`.

**Verification:** full test suite green, no dead code warnings.

### Step 7: Tidy and document

- Add rustdoc comments on `Engine`, `RunRequest`, `RunOptions`, `Emitter`, `Event`, `CoreCommand`, `CoreError`.
- Add an `examples/` subdirectory under `src/engine/` showing how to call the engine with each emitter.
- Update `docs/AGENTS.md` with a note that CLI now supports auto-continuation (since it's no longer a REPL-only feature).
- Update `docs/REST-API-ARCHITECTURE.md` to remove any "in Phase 2" placeholders.

---
## Risks and Watch Items

| Risk | Severity | Mitigation |
|---|---|---|
| **Terminal rendering regressions** | High | Golden-file snapshot tests for every `Event` variant. Manual smoke tests across all common REPL flows. Keep `TerminalEmitter` as a thin wrapper — no logic changes in the render code itself. |
| **Auto-continuation recursion limits** | Medium | The new `Engine::run` uses `Box::pin` for the auto-continuation recursive call. Verify with a mock LLM that `max_auto_continues = 100` doesn't blow the stack. |
| **Cancellation during tool execution** | Medium | Tool execution currently uses `AbortSignal`; the new path uses `CancellationToken`. Write a shim that translates. Write a test that cancels mid-tool-call and verifies graceful cleanup (no orphaned subprocesses, no leaked file descriptors). |
| **Command parsing fidelity** | Medium | The dot-command parser in today's REPL is hand-written and has edge cases. Port the parsing code verbatim into a dedicated `parse_dot_command_to_run_request` function with unit tests for every edge case found in today's code. |
| **Macro execution recursion** | Medium | `.macro` can invoke LLM calls, which now go through `Engine::run`, which can invoke more macros. Verify there's a recursion depth limit or cycle detection; add one if missing. |
| **Emitter error propagation** | Low | Emitter errors (ClientDisconnected) should NOT abort session save logic. Engine must continue executing after the first `EmitError::ClientDisconnected` — just stop emitting. Write a test that simulates a disconnected emitter mid-response and asserts the session is still correctly persisted. |
| **Spinner interleaving with tool output** | Low | Today's spinner is tightly coupled to the stream handler. If the new order of operations fires a tool call before the spinner is stopped, you'll get garbled output. Test this specifically. |
| **Feature flag: `auto_continue` in CLI** | Low | After Phase 2, CLI *could* support auto-continuation but it's not exposed. Decision: leave it off by default in `RunOptions::cli()`, add a `--auto-continue` flag in a separate follow-up if desired. Don't sneak behavior changes into this refactor. |

---
## What Phase 2 Does NOT Do

- **No new features.** Everything that worked before works the same way after.
- **No API server.** `JsonEmitter` and `SseEmitter` are placeholders — Phase 4 implements them.
- **No `SessionStore` abstraction.** That's Phase 3.
- **No `ToolScope` unification.** That landed in Phase 1 Step 6.5.
- **No changes to LLM client code.** `call_chat_completions` and `call_chat_completions_streaming` keep their existing signatures.
- **No MCP factory pooling.** That's Phase 5.
- **No dot-command syntax changes.** The REPL still accepts exactly the same dot-commands; they just parse into `CoreCommand` instead of being hand-dispatched in `run_repl_command`.

The sole goal of Phase 2 is: **extract the pipeline into Engine::run, route CLI and REPL through it, and prove via tests and smoke tests that nothing regressed.**

---
## Entry Criteria (from Phase 1)

Before starting Phase 2, Phase 1 must be complete:

- [ ] `GlobalConfig` type alias is removed
- [ ] `AppState` and `RequestContext` are the only state holders
- [ ] All 91 callsites in the original migration table have been updated
- [ ] `cargo test` passes with no `Config`-based tests remaining
- [ ] CLI and REPL manual smoke tests pass identically to pre-Phase-1

## Exit Criteria (Phase 2 complete)

- [ ] `src/engine/` module exists with Engine, Emitter, Event, CoreCommand, RunRequest, RunOutcome, CoreError
- [ ] `TerminalEmitter` implemented and wrapping all existing render paths
- [ ] `NullEmitter` and `CollectingEmitter` implemented
- [ ] `start_directive` in main.rs is a thin wrapper around `Engine::run`
- [ ] REPL's per-line handler routes through `Engine::run`
- [ ] All 37 `CoreCommand` variants implemented with unit tests
- [ ] Integration tests for the 7 engine scenarios listed in Step 3
- [ ] Manual smoke tests for CLI and REPL match pre-Phase-2 behavior
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] Phase 3 (SessionStore abstraction) can begin