loki/docs/PHASE-2-IMPLEMENTATION-PLAN.md
2026-04-10 15:45:51 -06:00

Phase 2 Implementation Plan: Engine + Emitter

Overview

Phase 1 splits Config into AppState + RequestContext. Phase 2 takes the unified state and introduces the Engine — a single core function that replaces CLI's start_directive() and REPL's ask() — plus an Emitter trait that abstracts output away from direct stdout writes. After this phase, CLI and REPL both call Engine::run() with different Emitter implementations and behave identically to today. The API server in Phase 4 will plug in without touching core logic.

Estimated effort: ~1 week
Risk: Low-medium. The work is refactoring existing, well-tested code paths into a shared shape. Most of the risk is in preserving exact terminal rendering behavior.
Depends on: Phase 1 Steps 0–10 complete (GlobalConfig eliminated, RequestContext wired through all entry points).


Why Phase 2 Exists

Today's CLI and REPL have two near-identical pipelines that diverge in five specific places. The divergences are accidents of history, not intentional design:

  1. Streaming flag handling. start_directive forces non-streaming when extracting code; ask never extracts code.
  2. Auto-continuation loop. ask has complex logic for auto_continue_count, todo inspection, and continuation prompt injection. start_directive has none.
  3. Session compression. ask triggers maybe_compress_session and awaits completion; start_directive never compresses.
  4. Session autoname. ask calls maybe_autoname_session after each turn; start_directive doesn't.
  5. Cleanup on exit. start_directive calls exit_session() at the end; ask lets the REPL loop handle it.

Four of these five divergences are bugs waiting to happen — they mean agents behave differently in CLI vs REPL mode, sessions don't get compressed in CLI even when they should, and auto-continuation is silently unavailable from the CLI. Phase 2 collapses both pipelines into one Engine::run() that handles all five behaviors uniformly, with per-request flags to control what's active (e.g., auto_continue: bool on RunRequest).

The Emitter trait exists to decouple the rendering pipeline from its destination. Today, streaming output is hardcoded to write to the terminal via crossterm. An Emitter implementation can also feed an axum SSE stream, collect events for a JSON response, or capture everything for a test. The Engine sends semantic events; Emitters decide how to present them.


The Architecture After Phase 2

┌─────────┐  ┌─────────┐                 ┌─────────┐
│   CLI   │  │  REPL   │                 │   API   │ (Phase 4)
└────┬────┘  └────┬────┘                 └────┬────┘
     │            │                           │
     ▼            ▼                           ▼
┌──────────────────────────────────────────────────┐
│            Engine::run(ctx, req, emitter)        │
│  ┌────────────────────────────────────────────┐  │
│  │ 1. Apply CoreCommand (if any)              │  │
│  │ 2. Build Input from req                    │  │
│  │ 3. apply_prelude (first turn only)         │  │
│  │ 4. before_chat_completion                  │  │
│  │ 5. Stream or buffered LLM call             │  │
│  │    ├─ emit Started                         │  │
│  │    ├─ emit AssistantDelta (per chunk)      │  │
│  │    ├─ emit ToolCall                        │  │
│  │    ├─ execute tool                         │  │
│  │    ├─ emit ToolResult                      │  │
│  │    └─ loop on tool results                 │  │
│  │ 6. after_chat_completion                   │  │
│  │ 7. maybe_compress_session                  │  │
│  │ 8. maybe_autoname_session                  │  │
│  │ 9. Auto-continuation (if applicable)       │  │
│  │ 10. emit Finished                          │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘
     │            │                           │
     ▼            ▼                           ▼
TerminalEmitter  TerminalEmitter          JsonEmitter / SseEmitter

Core Types

Engine

pub struct Engine {
    pub app: Arc<AppState>,
}

impl Engine {
    pub fn new(app: Arc<AppState>) -> Self { Self { app } }

    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError>;
}

Engine is intentionally a thin wrapper around Arc<AppState>. All per-turn state lives on RequestContext, so the engine itself has no per-call fields. This makes it cheap to clone and makes Engine::run trivially testable.

RunRequest

pub struct RunRequest {
    pub input: Option<UserInput>,
    pub command: Option<CoreCommand>,
    pub options: RunOptions,
}

pub struct UserInput {
    pub text: String,
    pub files: Vec<FileInput>,
    pub media: Vec<MediaInput>,
    pub continuation: Option<ContinuationKind>,
}

pub enum ContinuationKind {
    Continue,
    Regenerate,
}

pub struct RunOptions {
    pub stream: Option<bool>,
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
    pub apply_prelude: bool,
    pub with_embeddings: bool,
    pub cancel: CancellationToken,
}

impl RunOptions {
    pub fn cli() -> Self { /* today's start_directive defaults */ }
    pub fn repl_turn() -> Self { /* today's ask defaults */ }
    pub fn api_oneshot() -> Self { /* API one-shot defaults */ }
    pub fn api_session() -> Self { /* API session defaults */ }
}

Two things to notice:

  1. input is Option. A RunRequest can carry just a command (e.g., .role explain) with no user text, just an input (a plain prompt), or both (the .role <name> <text> form that activates a role and immediately sends a prompt through it). The engine handles all three shapes with one code path.

  2. RunOptions is the knob panel that replaces the five divergences. CLI today has auto_continue: false, compress_session: false, autoname_session: false; REPL has all three true. Phase 2 exposes these as explicit options with factory constructors for each frontend's conventional defaults. This also means you can now run a CLI one-shot with auto-continuation by constructing RunOptions::cli() and flipping auto_continue = true — a capability that doesn't exist today.
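As a concrete sketch, the factory defaults fall directly out of the five divergences; the field set below is simplified (the real RunOptions also carries stream, extract_code, with_embeddings, and a CancellationToken):

```rust
// Simplified sketch of RunOptions; field set reduced to the four
// behavior flags that differ between CLI and REPL today.
#[derive(Clone, Debug)]
pub struct RunOptions {
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
    pub apply_prelude: bool,
}

impl RunOptions {
    // Today's start_directive behavior: no auto-continue, no compression,
    // no autonaming (divergences 2-4), but the prelude still applies.
    pub fn cli() -> Self {
        Self {
            auto_continue: false,
            compress_session: false,
            autoname_session: false,
            apply_prelude: true,
        }
    }

    // Today's ask behavior: all three session behaviors enabled.
    pub fn repl_turn() -> Self {
        Self {
            auto_continue: true,
            compress_session: true,
            autoname_session: true,
            apply_prelude: true,
        }
    }
}

fn main() {
    // The new capability: a CLI one-shot with auto-continuation enabled.
    let mut opts = RunOptions::cli();
    opts.auto_continue = true;
    assert!(opts.auto_continue && !opts.compress_session);
}
```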

CoreCommand

pub enum CoreCommand {
    // State setters
    SetModel(String),
    UsePrompt(String),
    UseRole { name: String, trailing_text: Option<String> },
    UseSession(Option<String>),
    UseAgent { name: String, session: Option<String>, variables: Vec<(String, String)> },
    UseRag(Option<String>),

    // Exit commands
    ExitRole,
    ExitSession,
    ExitRag,
    ExitAgent,

    // State queries
    Info(InfoScope),
    RagSources,

    // Config mutation
    Set { key: String, value: String },

    // Session actions
    CompressSession,
    EmptySession,
    SaveSession { name: Option<String> },
    EditSession,

    // Role actions
    SaveRole { name: Option<String> },
    EditRole,

    // RAG actions
    EditRagDocs,
    RebuildRag,

    // Agent actions
    EditAgentConfig,
    ClearTodo,
    StarterList,
    StarterRun(usize),

    // File input shortcut
    IncludeFiles { paths: Vec<String>, trailing_text: Option<String> },

    // Macro execution
    Macro { name: String, args: Vec<String> },

    // Vault
    VaultAdd(String),
    VaultGet(String),
    VaultUpdate(String),
    VaultDelete(String),
    VaultList,

    // Miscellaneous
    EditConfig,
    Authenticate,
    Delete(DeleteKind),
    Copy,
    Help,
}

pub enum InfoScope {
    System,
    Role,
    Session,
    Rag,
    Agent,
}

pub enum DeleteKind {
    Role(String),
    Session(String),
    Rag(String),
    Macro(String),
    AgentData(String),
}

This enum captures all 37 dot-commands identified during the exploration pass. Three categories deserve special attention:

  • LLM-triggering commands (UsePrompt, UseRole with trailing_text, IncludeFiles with trailing_text, StarterRun, Macro that contains LLM calls, and the continuation variants Continue/Regenerate expressed via UserInput.continuation) — these don't just mutate state; they produce a full run through the LLM pipeline. The engine treats them as RunRequest { command: Some(_), input: Some(_), .. } — command runs first, then input flows through.

  • Side-effecting commands that return immediately (EditConfig, EditRole, EditRagDocs, EditAgentConfig, most Vault*, Delete) — these mutate state but don't produce an LLM interaction. The engine handles them, emits a Result event, and returns without invoking the LLM path.

  • Context-dependent commands (ClearTodo, StarterList, StarterRun, EditAgentConfig, etc.) — these require a specific scope (e.g., active agent). The engine validates the precondition before executing and returns a CoreError::InvalidState { expected: "active agent" } if the precondition fails.
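The precondition check for context-dependent commands can be sketched like this (types pared down to the relevant slice; require_agent is a hypothetical helper, not existing code):

```rust
#[derive(Debug, PartialEq)]
pub enum CoreError {
    InvalidState { expected: String, found: String },
}

// Stand-in for the slice of RequestContext that dispatch cares about.
pub struct Ctx {
    pub agent: Option<String>,
}

// Validate the "active agent" precondition before running an agent-scoped
// command such as ClearTodo or StarterRun.
fn require_agent(ctx: &Ctx) -> Result<&str, CoreError> {
    ctx.agent.as_deref().ok_or_else(|| CoreError::InvalidState {
        expected: "active agent".into(),
        found: "no agent".into(),
    })
}

fn main() {
    let bare = Ctx { agent: None };
    assert!(require_agent(&bare).is_err());

    let with_agent = Ctx { agent: Some("sisyphus".into()) };
    assert_eq!(require_agent(&with_agent).unwrap(), "sisyphus");
}
```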

Emitter trait and Event enum

#[async_trait]
pub trait Emitter: Send + Sync {
    async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}

pub enum Event<'a> {
    // Lifecycle
    Started { request_id: Uuid, session_id: Option<SessionId>, agent: Option<&'a str> },
    Finished { outcome: &'a RunOutcome },

    // Assistant output
    AssistantDelta(&'a str),
    AssistantMessageEnd { full_text: &'a str },

    // Tool calls
    ToolCall { id: &'a str, name: &'a str, args: &'a str },
    ToolResult { id: &'a str, name: &'a str, result: &'a str, is_error: bool },

    // Auto-continuation
    AutoContinueTriggered { count: usize, max: usize, remaining_todos: usize },

    // Session lifecycle signals
    SessionCompressing,
    SessionCompressed { tokens_saved: Option<usize> },
    SessionAutonamed(&'a str),

    // Informational
    Info(&'a str),
    Warning(&'a str),

    // Errors
    Error(&'a CoreError),
}

pub enum EmitError {
    ClientDisconnected,
    WriteFailed(std::io::Error),
}

Three implementations ship in Phase 2; two are stubs, one is real:

  • TerminalEmitter (real) — wraps today's SseHandler → markdown_stream/raw_stream path. This is the bulk of Phase 2's work; see "Terminal Rendering Details" below.
  • NullEmitter (stub, for tests) — drops all events on the floor.
  • CollectingEmitter (stub, for tests and future JSON API) — appends events to a Vec<OwnedEvent> for later inspection.

The JsonEmitter and SseEmitter implementations land in Phase 4 when the API server comes online.
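A synchronous sketch of CollectingEmitter conveys the idea (the real trait method is async, and the real Event<'a> borrows from the run, so it would be converted into an owned form before storage):

```rust
use std::sync::Mutex;

// Owned, simplified event type; a stand-in for the OwnedEvent that the
// real CollectingEmitter would store.
#[derive(Debug, Clone, PartialEq)]
pub enum OwnedEvent {
    Started,
    AssistantDelta(String),
    Finished,
}

// Synchronous sketch; the real Emitter::emit is async and fallible.
pub struct CollectingEmitter {
    events: Mutex<Vec<OwnedEvent>>,
}

impl CollectingEmitter {
    pub fn new() -> Self {
        Self { events: Mutex::new(Vec::new()) }
    }

    pub fn emit(&self, event: OwnedEvent) {
        self.events.lock().unwrap().push(event);
    }

    pub fn into_events(self) -> Vec<OwnedEvent> {
        self.events.into_inner().unwrap()
    }
}

fn main() {
    let emitter = CollectingEmitter::new();
    emitter.emit(OwnedEvent::Started);
    emitter.emit(OwnedEvent::AssistantDelta("hi".into()));
    emitter.emit(OwnedEvent::Finished);
    assert_eq!(emitter.into_events().len(), 3);
}
```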

RunOutcome

pub struct RunOutcome {
    pub request_id: Uuid,
    pub session_id: Option<SessionId>,
    pub final_message: Option<String>,
    pub tool_call_count: usize,
    pub turns: usize,
    pub compressed: bool,
    pub autonamed: Option<String>,
    pub auto_continued: usize,
}

RunOutcome is what CLI/REPL ignore but the future API returns as JSON. It records everything the caller might want to know about what happened during the run.

CoreError

pub enum CoreError {
    InvalidRequest { msg: String },
    InvalidState { expected: String, found: String },
    NotFound { what: String, name: String },
    Cancelled,
    ProviderError { provider: String, msg: String },
    ToolError { tool: String, msg: String },
    EmitterError(EmitError),
    Io(std::io::Error),
    Other(anyhow::Error),
}

impl CoreError {
    pub fn is_retryable(&self) -> bool { /* ... */ }
    pub fn http_status(&self) -> u16 { /* for future API use */ }
    pub fn terminal_message(&self) -> String { /* for TerminalEmitter */ }
}
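A plausible sketch of the two helper methods, shown here as an assumption rather than the final mapping:

```rust
#[derive(Debug)]
pub enum CoreError {
    InvalidRequest { msg: String },
    InvalidState { expected: String, found: String },
    NotFound { what: String, name: String },
    Cancelled,
    ProviderError { provider: String, msg: String },
}

impl CoreError {
    // Hypothetical status mapping; the real table is an open design choice.
    pub fn http_status(&self) -> u16 {
        match self {
            CoreError::InvalidRequest { .. } => 400,
            CoreError::InvalidState { .. } => 409,
            CoreError::NotFound { .. } => 404,
            CoreError::Cancelled => 499, // nginx-style "client closed request"
            CoreError::ProviderError { .. } => 502,
        }
    }

    // Only upstream provider failures are worth retrying automatically.
    pub fn is_retryable(&self) -> bool {
        matches!(self, CoreError::ProviderError { .. })
    }
}

fn main() {
    assert_eq!(CoreError::Cancelled.http_status(), 499);
    let provider_err = CoreError::ProviderError {
        provider: "openai".into(),
        msg: "timeout".into(),
    };
    assert!(provider_err.is_retryable());
}
```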

Terminal Rendering Details

The TerminalEmitter is the most delicate part of Phase 2 because it has to preserve every pixel of today's REPL/CLI behavior. Here's the mental model:

Today's flow:

LLM client → mpsc::Sender<SseEvent> → SseHandler → render_stream
                                                      ├─ markdown_stream (if highlight)
                                                      └─ raw_stream (else)

Both markdown_stream and raw_stream write directly to stdout via crossterm, managing cursor positions, line clears, and incremental markdown parsing themselves.

Target flow:

LLM client → mpsc::Sender<SseEvent> → SseHandler → TerminalEmitter::emit(Event::AssistantDelta)
                                                      ├─ (internal) markdown_stream state machine
                                                      └─ (internal) raw_stream state machine

The TerminalEmitter owns a RefCell<StreamRenderState> (or Mutex if we need Send) that wraps the existing markdown_stream/raw_stream state. Each emit(AssistantDelta) call feeds the chunk into this state machine exactly as SseHandler's receive loop does today. The result is that the exact same crossterm calls happen in the exact same order — we've just moved them behind a trait.
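A minimal stand-in shows the feed discipline, with a toy line-counting state machine in place of the real markdown_stream/raw_stream wrapping:

```rust
use std::sync::Mutex;

// Toy stand-in for StreamRenderState: the real one wraps the
// markdown_stream/raw_stream machinery; this one only demonstrates the
// one-chunk-at-a-time, in-arrival-order feed behind the Mutex.
struct StreamRenderState {
    buffered_line: String,
    lines_rendered: usize,
}

struct TerminalEmitterSketch {
    render_state: Mutex<StreamRenderState>,
}

impl TerminalEmitterSketch {
    // Each AssistantDelta feeds its chunk into the state machine exactly
    // as SseHandler's receive loop does today.
    fn on_delta(&self, chunk: &str) {
        let mut state = self.render_state.lock().unwrap();
        for part in chunk.split_inclusive('\n') {
            state.buffered_line.push_str(part.trim_end_matches('\n'));
            if part.ends_with('\n') {
                state.lines_rendered += 1; // the real code reprints/highlights here
                state.buffered_line.clear();
            }
        }
    }
}

fn main() {
    let emitter = TerminalEmitterSketch {
        render_state: Mutex::new(StreamRenderState {
            buffered_line: String::new(),
            lines_rendered: 0,
        }),
    };
    // A code block arriving split across two deltas mid-line:
    emitter.on_delta("fn main() {\n    println!");
    emitter.on_delta("(\"hi\");\n}\n");
    let state = emitter.render_state.lock().unwrap();
    assert_eq!(state.lines_rendered, 3);
}
```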

Things that migrate 1:1 into TerminalEmitter:

  • Spinner start/stop on first delta
  • Cursor positioning for line reprint during code block growth
  • Syntax highlighting invocation via MarkdownRender
  • Color/dim output for tool call banners
  • Final newline + cursor reset on AssistantMessageEnd

Things that the engine handles, not the emitter:

  • Tool call execution (still lives in the engine loop)
  • Session state mutations (engine calls before_chat_completion / after_chat_completion on RequestContext)
  • Auto-continuation decisions (engine inspects agent runtime)
  • Compression and autoname decisions (engine)

Things the emitter decides, not the engine:

  • Whether to suppress ToolCall rendering (sub-agents in today's code suppress their own output; TerminalEmitter respects a verbose: bool flag)
  • How to format errors (TerminalEmitter uses colored stderr; JsonEmitter will use structured JSON)
  • Whether to show a spinner at all (disabled for non-TTY output)

One gotcha: today's SseHandler itself produces the mpsc channel that LLM clients push into. In the new model, SseHandler becomes an internal helper inside the engine's streaming path that converts mpsc::Receiver<SseEvent> into Emitter::emit(Event::AssistantDelta(...)) calls. No LLM client code changes — they still push into the same channel type. Only the consumer side of the channel changes.
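The consumer-side change can be illustrated synchronously, with std::sync::mpsc standing in for the tokio channel and a closure standing in for Emitter::emit (names are illustrative, not the real types):

```rust
use std::sync::mpsc;

// Stand-in for today's SseEvent; only the text variant matters here.
enum SseEvent {
    Text(String),
    Done,
}

// The new consumer: drain the channel LLM clients push into and forward
// each chunk as an AssistantDelta-style emit call. The producer side of
// the channel is untouched.
fn bridge(rx: mpsc::Receiver<SseEvent>, mut emit_delta: impl FnMut(&str)) {
    for event in rx {
        match event {
            SseEvent::Text(chunk) => emit_delta(&chunk),
            SseEvent::Done => break,
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(SseEvent::Text("hel".into())).unwrap();
    tx.send(SseEvent::Text("lo".into())).unwrap();
    tx.send(SseEvent::Done).unwrap();

    let mut seen = String::new();
    bridge(rx, |chunk| seen.push_str(chunk));
    assert_eq!(seen, "hello");
}
```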


The Engine::run Pipeline

Here's the full pipeline in pseudocode, annotated with which frontend controls each behavior via RunOptions:

impl Engine {
    pub async fn run(
        &self,
        ctx: &mut RequestContext,
        req: RunRequest,
        emitter: &dyn Emitter,
    ) -> Result<RunOutcome, CoreError> {
        let request_id = Uuid::new_v4();
        let mut outcome = RunOutcome::new(request_id);

        emitter.emit(Event::Started { request_id, session_id: ctx.session_id(), agent: ctx.agent_name() }).await?;

        // 1. Execute command (if any). Commands may be LLM-triggering, mutating, or informational.
        if let Some(command) = req.command {
            self.dispatch_command(ctx, command, emitter, &req.options).await?;
        }

        // 2. Early return if there's no user input (pure command)
        let Some(user_input) = req.input else {
            emitter.emit(Event::Finished { outcome: &outcome }).await?;
            return Ok(outcome);
        };

        // 3. Apply prelude on first turn of a fresh context (CLI/REPL only)
        if req.options.apply_prelude && !ctx.prelude_applied {
            apply_prelude(ctx, &req.options.cancel).await?;
            ctx.prelude_applied = true;
        }

        // 4. Build Input from user_input + ctx
        let input = build_input(ctx, user_input, &req.options).await?;

        // 5. Wait for any in-progress compression to finish (REPL-style block)
        while ctx.is_compressing_session() {
            tokio::time::sleep(Duration::from_millis(100)).await;
        }

        // 6. Enter the turn loop
        self.run_turn(ctx, input, &req.options, emitter, &mut outcome).await?;

        // 7. Maybe compress session
        if req.options.compress_session && ctx.session_needs_compression() {
            emitter.emit(Event::SessionCompressing).await?;
            compress_session(ctx).await?;
            outcome.compressed = true;
            emitter.emit(Event::SessionCompressed { tokens_saved: None }).await?;
        }

        // 8. Maybe autoname session
        if req.options.autoname_session {
            if let Some(name) = maybe_autoname_session(ctx).await? {
                outcome.autonamed = Some(name.clone());
                emitter.emit(Event::SessionAutonamed(&name)).await?;
            }
        }

        // 9. Auto-continuation (agents only)
        if req.options.auto_continue {
            if let Some(continuation) = self.check_auto_continue(ctx) {
                emitter.emit(Event::AutoContinueTriggered { .. }).await?;
                outcome.auto_continued += 1;
                // Recursive call with continuation prompt
                let next_req = RunRequest {
                    input: Some(UserInput::from_continuation(continuation)),
                    command: None,
                    options: req.options.clone(),
                };
                return Box::pin(self.run(ctx, next_req, emitter)).await;
            }
        }

        emitter.emit(Event::Finished { outcome: &outcome }).await?;
        Ok(outcome)
    }

    async fn run_turn(
        &self,
        ctx: &mut RequestContext,
        mut input: Input,
        options: &RunOptions,
        emitter: &dyn Emitter,
        outcome: &mut RunOutcome,
    ) -> Result<(), CoreError> {
        loop {
            outcome.turns += 1;

            before_chat_completion(ctx, &input);

            let client = input.create_client(ctx)?;
            let (output, tool_results) = if should_stream(&input, options) {
                stream_chat_completion(ctx, &input, client, emitter, &options.cancel).await?
            } else {
                buffered_chat_completion(ctx, &input, client, options.extract_code, &options.cancel).await?
            };

            after_chat_completion(ctx, &input, &output, &tool_results);
            outcome.tool_call_count += tool_results.len();

            if tool_results.is_empty() {
                outcome.final_message = Some(output);
                return Ok(());
            }

            // Emit each tool call and result
            for result in &tool_results {
                emitter.emit(Event::ToolCall { .. }).await?;
                emitter.emit(Event::ToolResult { .. }).await?;
            }

            // Loop: feed tool results back in
            input = input.merge_tool_results(output, tool_results);
        }
    }
}

Key design decisions in this pipeline:

  1. Command dispatch happens first. A RunRequest that carries both a command and input runs the command first (mutating ctx), then the input flows through the now-updated context. This lets .role explain "tell me about X" work as a single atomic operation — the role is activated, then the prompt is sent under the new role.

  2. Tool loop is iterative, not recursive. Today both start_directive and ask recursively call themselves after tool results. The new run_turn uses a loop instead, which is cleaner, avoids stack growth on long tool chains, and makes cancellation handling simpler. Auto-continuation remains recursive because it's a full new turn with a new prompt, not just a tool-result continuation.

  3. Cancellation is checked at every await point. options.cancel: CancellationToken is threaded into every async call. On cancellation, the engine emits Event::Error(CoreError::Cancelled) and returns. Today's AbortSignal pattern gets wrapped in a CancellationToken adapter during the migration.

  4. Session state hooks fire at the same points as today. before_chat_completion and after_chat_completion continue to exist on RequestContext, called from the same places in the same order. The refactor doesn't change their semantics.

  5. Emitter errors don't abort the run. If the emitter's output destination disconnects (client closes browser tab), the engine keeps running to completion so session state is correctly persisted, but it stops emitting events. The EmitError::ClientDisconnected case is special-cased to swallow subsequent emits. Session save + tool execution still happen.
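Decision 5 can be sketched as a small wrapper (names hypothetical) that latches on the first ClientDisconnected and drops everything after it:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

#[derive(Debug)]
enum EmitError {
    ClientDisconnected,
}

// Once the destination disconnects, swallow every subsequent emit so the
// engine keeps running and session state is still persisted.
struct DisconnectTolerant<F: Fn(&str) -> Result<(), EmitError>> {
    inner: F,
    disconnected: AtomicBool,
}

impl<F: Fn(&str) -> Result<(), EmitError>> DisconnectTolerant<F> {
    fn emit(&self, event: &str) {
        if self.disconnected.load(Ordering::Relaxed) {
            return; // already gone: drop the event, don't error the run
        }
        if let Err(EmitError::ClientDisconnected) = (self.inner)(event) {
            self.disconnected.store(true, Ordering::Relaxed);
        }
    }
}

fn main() {
    let calls = AtomicUsize::new(0);
    let emitter = DisconnectTolerant {
        inner: |_: &str| {
            calls.fetch_add(1, Ordering::Relaxed);
            Err(EmitError::ClientDisconnected) // destination gone immediately
        },
        disconnected: AtomicBool::new(false),
    };
    emitter.emit("Started");
    emitter.emit("AssistantDelta");
    emitter.emit("Finished");
    // Only the first emit reached the failing destination.
    assert_eq!(calls.load(Ordering::Relaxed), 1);
}
```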


Migration Strategy

This phase is structured as extract, unify, rewrite frontends — similar to Phase 1's facade pattern. The old functions stay in place until the new Engine is proven by tests and manual verification.

Step 1: Create the core types

Add the new files without wiring them into anything:

  • src/engine/mod.rs — module root
  • src/engine/engine.rs — Engine struct + run method (initially unimplemented!())
  • src/engine/request.rs — RunRequest, UserInput, RunOptions, ContinuationKind, RunOutcome
  • src/engine/command.rs — CoreCommand enum + sub-enums
  • src/engine/error.rs — CoreError enum
  • src/engine/emitter.rs — Emitter trait + Event enum + EmitError
  • src/engine/emitters/mod.rs — emitter module
  • src/engine/emitters/null.rs — NullEmitter (test stub)
  • src/engine/emitters/collecting.rs — CollectingEmitter (test stub)
  • src/engine/emitters/terminal.rs — TerminalEmitter (initially unimplemented!())

Register pub mod engine; in src/main.rs. Code compiles but nothing calls it yet.

Verification: cargo check clean, cargo test passes.

Step 2: Implement TerminalEmitter against existing render code

Before wiring the engine, build the TerminalEmitter by wrapping today's SseHandler + markdown_stream + raw_stream + MarkdownRender + Spinner code. Don't change any of those modules — just construct a TerminalEmitter that holds the state they need and forwards emit(Event::AssistantDelta(...)) into them.

pub struct TerminalEmitter {
    render_state: Mutex<StreamRenderState>,
    options: TerminalEmitterOptions,
}

pub struct TerminalEmitterOptions {
    pub highlight: bool,
    pub theme: Option<String>,
    pub verbose_tool_calls: bool,
    pub show_spinner: bool,
}

impl TerminalEmitter {
    pub fn new_from_app(app: &AppState, working_mode: WorkingMode) -> Self { /* ... */ }
}

Implement Emitter for it, mapping each Event variant to the appropriate crossterm operation:

| Event | TerminalEmitter action |
| --- | --- |
| Started | Start spinner |
| AssistantDelta(chunk) | Stop spinner (if first), feed chunk into render state |
| AssistantMessageEnd { full_text } | Flush render state, emit trailing newline |
| ToolCall { name, args } | Print dimmed ⚙ Using <name> banner if verbose |
| ToolResult { .. } | Print dimmed result summary if verbose |
| AutoContinueTriggered | Print yellow ⟳ Continuing (N/M, R todos remaining) to stderr |
| SessionCompressing | Print Compressing session... to stderr |
| SessionCompressed | Print Session compressed. to stderr |
| SessionAutonamed | Print Session auto-named: <name> to stderr |
| Info(msg) | Print to stdout |
| Warning(msg) | Print yellow to stderr |
| Error(e) | Print red to stderr |
| Finished | No-op (ensures trailing newline is flushed) |

Verification: write integration tests that construct a TerminalEmitter, feed it a sequence of events manually, and compare captured stdout/stderr to golden outputs. Use assert_cmd or similar to snapshot the rendered output of each event variant.
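The golden-output idea in miniature: render into any Write sink instead of stdout, then compare captured bytes to a stored golden string (the escape codes here are illustrative, not today's exact sequences):

```rust
use std::io::Write;

// Render a Warning-style event into an arbitrary sink. In production the
// sink is stderr; in tests it's a Vec<u8> compared against a golden file.
fn render_warning(out: &mut impl Write, msg: &str) -> std::io::Result<()> {
    write!(out, "\x1b[33m{}\x1b[0m\n", msg) // yellow, then reset, then newline
}

fn main() {
    let mut captured: Vec<u8> = Vec::new();
    render_warning(&mut captured, "model fallback in effect").unwrap();

    let golden = "\x1b[33mmodel fallback in effect\x1b[0m\n";
    assert_eq!(String::from_utf8(captured).unwrap(), golden);
}
```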

Step 3: Implement Engine::run without wiring it

Implement Engine::run and Engine::run_turn following the pseudocode above. Use the existing helper functions (before_chat_completion, after_chat_completion, apply_prelude, create_client, call_chat_completions, call_chat_completions_streaming, maybe_compress_session, maybe_autoname_session) unchanged, just called through ctx instead of &GlobalConfig.

Implementing dispatch_command is the largest sub-task here because it needs to match all 37 CoreCommand variants and invoke the right ctx methods. Most variants are straightforward one-liners that call a corresponding method on RequestContext. A few need special handling:

  • CoreCommand::UseRole { name, trailing_text } — activate role, then if trailing_text is Some, the outer run will flow through with the trailing text as UserInput.text.
  • CoreCommand::IncludeFiles — reads files, converts to FileInput list, attaches to ctx's next input (or fails if no input is provided).
  • CoreCommand::StarterRun(id) — looks up the starter text on the active agent, fails if no agent.
  • CoreCommand::Macro — delegates to macro_execute, which may itself call Engine::run internally for LLM-triggering macros.

Verification: write unit tests for dispatch_command using NullEmitter. Each test activates a command and asserts the expected state mutation on ctx. This is ~37 tests, one per variant, and they catch the bulk of regressions early.

Then write a handful of integration tests for Engine::run with CollectingEmitter, asserting the expected event sequence for:

  • Plain prompt, no tools, streaming
  • Plain prompt, no tools, non-streaming
  • Prompt that triggers 2 tool calls
  • Prompt that triggers auto-continuation (mock the LLM response)
  • Prompt on a session that crosses the compression threshold
  • Command-only request (.info)
  • Command + prompt request (.role explain "...")
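These tests all share one shape: run the scenario, then assert on the recorded event sequence. A sketch of the kind of invariants worth asserting, with event names reduced to strings purely for illustration:

```rust
// Check ordering invariants on a recorded event sequence: Started first,
// Finished last, and every ToolCall paired with a ToolResult.
fn assert_event_invariants(events: &[&str]) {
    assert_eq!(events.first(), Some(&"Started"), "Started must come first");
    assert_eq!(events.last(), Some(&"Finished"), "Finished must come last");
    let calls = events.iter().filter(|e| **e == "ToolCall").count();
    let results = events.iter().filter(|e| **e == "ToolResult").count();
    assert_eq!(calls, results, "each ToolCall needs a ToolResult");
}

fn main() {
    // "Prompt that triggers 2 tool calls" scenario, as recorded events:
    let events = [
        "Started",
        "AssistantDelta",
        "ToolCall", "ToolResult",
        "ToolCall", "ToolResult",
        "AssistantDelta",
        "AssistantMessageEnd",
        "Finished",
    ];
    assert_event_invariants(&events);
}
```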

Step 4: Wire CLI to Engine::run

Replace main.rs::start_directive with a thin wrapper:

async fn start_directive(
    app: Arc<AppState>,
    ctx: &mut RequestContext,
    input_text: String,
    files: Vec<String>,
    code_mode: bool,
) -> Result<()> {
    let engine = Engine::new(app.clone());
    let emitter = TerminalEmitter::new_from_app(&app, WorkingMode::Cmd);

    let req = RunRequest {
        input: Some(UserInput::from_text_and_files(input_text, files)),
        command: None,
        options: {
            let mut o = RunOptions::cli();
            o.extract_code = code_mode && !*IS_STDOUT_TERMINAL;
            o
        },
    };

    match engine.run(ctx, req, &emitter).await {
        Ok(_outcome) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => Err(e.into()),
    }
}

Verification: manual smoke test. Run loki "hello", loki --code "write a rust hello world", loki --role explain "what is TCP". All should produce identical output to before the change.

Step 5: Wire REPL to Engine::run

Replace repl/mod.rs::ask with a wrapper that calls the engine. The REPL's outer loop that reads lines and calls run_repl_command stays. run_repl_command for non-dot-command lines constructs a RunRequest { input: Some(...), .. } and calls Engine::run. Dot-commands get parsed into CoreCommand and called as RunRequest { command: Some(...), input: None, .. } (or with input if they carry trailing text).

// In Repl:
async fn handle_line(&mut self, line: &str) -> Result<()> {
    let req = if let Some(rest) = line.strip_prefix('.') {
        parse_dot_command_to_run_request(rest, &self.ctx)?
    } else {
        RunRequest {
            input: Some(UserInput::from_text(line.to_string())),
            command: None,
            options: RunOptions::repl_turn(),
        }
    };

    match self.engine.run(&mut self.ctx, req, &self.emitter).await {
        Ok(_) => Ok(()),
        Err(CoreError::Cancelled) => Ok(()),
        Err(e) => {
            self.emitter.emit(Event::Error(&e)).await.ok();
            Ok(())
        }
    }
}

Verification: manual smoke test of the REPL. Run through a typical session:

  1. loki → REPL starts
  2. hello → plain prompt works
  3. .role explain → role activates
  4. what is TCP → responds under the role
  5. .session → session starts
  6. Several messages → conversation continues
  7. .info session → info prints
  8. .compress session → compression runs
  9. .agent sisyphus → agent activates with sub-agents
  10. write a hello world in rust → tool calls + output
  11. .exit agent → agent exits, previous session still active
  12. .exit → REPL exits

Every interaction should behave identically to pre-Phase-2. Any visual difference is a bug.

Step 6: Delete the old start_directive and ask

Once CLI and REPL both route through Engine::run and all tests/smoke tests pass, delete the old function bodies. Remove any now-unused imports. Run cargo check and cargo test.

Verification: full test suite green, no dead code warnings.

Step 7: Tidy and document

  • Add rustdoc comments on Engine, RunRequest, RunOptions, Emitter, Event, CoreCommand, CoreError.
  • Add an examples/ subdirectory under src/engine/ showing how to call the engine with each emitter.
  • Update docs/AGENTS.md with a note that CLI now supports auto-continuation (since it's no longer a REPL-only feature).
  • Update docs/REST-API-ARCHITECTURE.md to remove any "in Phase 2" placeholders.

Risks and Watch Items

| Risk | Severity | Mitigation |
| --- | --- | --- |
| Terminal rendering regressions | High | Golden-file snapshot tests for every Event variant. Manual smoke tests across all common REPL flows. Keep TerminalEmitter as a thin wrapper — no logic changes in the render code itself. |
| Auto-continuation recursion limits | Medium | The new Engine::run uses Box::pin for the auto-continuation recursive call. Verify with a mock LLM that max_auto_continues = 100 doesn't blow the stack. |
| Cancellation during tool execution | Medium | Tool execution currently uses AbortSignal; the new path uses CancellationToken. Write a shim that translates. Write a test that cancels mid-tool-call and verifies graceful cleanup (no orphaned subprocesses, no leaked file descriptors). |
| Command parsing fidelity | Medium | The dot-command parser in today's REPL is hand-written and has edge cases. Port the parsing code verbatim into a dedicated parse_dot_command_to_run_request function with unit tests for every edge case found in today's code. |
| Macro execution recursion | Medium | .macro can invoke LLM calls, which now go through Engine::run, which can invoke more macros. Verify there's a recursion depth limit or cycle detection; add one if missing. |
| Emitter error propagation | Low | Emitter errors (ClientDisconnected) should NOT abort session save logic. Engine must continue executing after the first EmitError::ClientDisconnected — just stop emitting. Write a test that simulates a disconnected emitter mid-response and asserts the session is still correctly persisted. |
| Spinner interleaving with tool output | Low | Today's spinner is tightly coupled to the stream handler. If the new order of operations fires a tool call before the spinner is stopped, you'll get garbled output. Test this specifically. |
| Feature flag: auto_continue in CLI | Low | After Phase 2, CLI could support auto-continuation but it's not exposed. Decision: leave it off by default in RunOptions::cli(), add a --auto-continue flag in a separate follow-up if desired. Don't sneak behavior changes into this refactor. |
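To make the parser-fidelity risk concrete, here is a deliberately minimal sketch of the target function's shape, covering just two commands (the real port must reproduce today's hand-written parser verbatim, edge cases included):

```rust
// Hypothetical, minimal parser; variant set reduced to two for illustration.
#[derive(Debug, PartialEq)]
enum CoreCommand {
    UseRole { name: String, trailing_text: Option<String> },
    Info,
}

// Input is the line with the leading '.' already stripped.
fn parse_dot_command(rest: &str) -> Option<CoreCommand> {
    let mut parts = rest.splitn(3, ' ');
    match parts.next()? {
        "role" => {
            let name = parts.next()?.to_string();
            let trailing_text = parts.next().map(str::to_string);
            Some(CoreCommand::UseRole { name, trailing_text })
        }
        "info" => Some(CoreCommand::Info),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_dot_command("info"), Some(CoreCommand::Info));
    // The .role <name> <text> form carries trailing text into UserInput:
    assert_eq!(
        parse_dot_command("role explain what is TCP"),
        Some(CoreCommand::UseRole {
            name: "explain".into(),
            trailing_text: Some("what is TCP".into()),
        })
    );
}
```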

What Phase 2 Does NOT Do

  • No new features. Everything that worked before works the same way after.
  • No API server. JsonEmitter and SseEmitter are placeholders — Phase 4 implements them.
  • No SessionStore abstraction. That's Phase 3.
  • No ToolScope unification. That landed in Phase 1 Step 6.5.
  • No changes to LLM client code. call_chat_completions and call_chat_completions_streaming keep their existing signatures.
  • No MCP factory pooling. That's Phase 5.
  • No dot-command syntax changes. The REPL still accepts exactly the same dot-commands; they just parse into CoreCommand instead of being hand-dispatched in run_repl_command.

The sole goal of Phase 2 is: extract the pipeline into Engine::run, route CLI and REPL through it, and prove via tests and smoke tests that nothing regressed.


Entry Criteria (from Phase 1)

Before starting Phase 2, Phase 1 must be complete:

  • GlobalConfig type alias is removed
  • AppState and RequestContext are the only state holders
  • All 91 callsites in the original migration table have been updated
  • cargo test passes with no Config-based tests remaining
  • CLI and REPL manual smoke tests pass identically to pre-Phase-1

Exit Criteria (Phase 2 complete)

  • src/engine/ module exists with Engine, Emitter, Event, CoreCommand, RunRequest, RunOutcome, CoreError
  • TerminalEmitter implemented and wrapping all existing render paths
  • NullEmitter and CollectingEmitter implemented
  • start_directive in main.rs is a thin wrapper around Engine::run
  • REPL's per-line handler routes through Engine::run
  • All 37 CoreCommand variants implemented with unit tests
  • Integration tests for the 7 engine scenarios listed in Step 3
  • Manual smoke tests for CLI and REPL match pre-Phase-2 behavior
  • cargo check, cargo test, cargo clippy all clean
  • Phase 3 (SessionStore abstraction) can begin