Phase 2 Implementation Plan: Engine + Emitter
Overview
Phase 1 splits Config into AppState + RequestContext. Phase 2 takes the unified state and introduces the Engine — a single core function that replaces CLI's start_directive() and REPL's ask() — plus an Emitter trait that abstracts output away from direct stdout writes. After this phase, CLI and REPL both call Engine::run() with different Emitter implementations and behave identically to today. The API server in Phase 4 will plug in without touching core logic.
Estimated effort: ~1 week
Risk: Low-medium. The work is refactoring existing well-tested code paths into a shared shape. Most of the risk is in preserving exact terminal rendering behavior.
Depends on: Phase 1 Steps 0–10 complete (GlobalConfig eliminated, RequestContext wired through all entry points).
Why Phase 2 Exists
Today's CLI and REPL have two near-identical pipelines that diverge in five specific places. The divergences are accidents of history, not intentional design:
- Streaming flag handling. start_directive forces non-streaming when extracting code; ask never extracts code.
- Auto-continuation loop. ask has complex logic for auto_continue_count, todo inspection, and continuation prompt injection. start_directive has none.
- Session compression. ask triggers maybe_compress_session and awaits completion; start_directive never compresses.
- Session autoname. ask calls maybe_autoname_session after each turn; start_directive doesn't.
- Cleanup on exit. start_directive calls exit_session() at the end; ask lets the REPL loop handle it.
Four of these five divergences are bugs waiting to happen — they mean agents behave differently in CLI vs REPL mode, sessions don't get compressed in CLI even when they should, and auto-continuation is silently unavailable from the CLI. Phase 2 collapses both pipelines into one Engine::run() that handles all five behaviors uniformly, with per-request flags to control what's active (e.g., auto_continue: bool on RunOptions).
The Emitter trait exists to decouple the rendering pipeline from its destination. Today, streaming output is hardcoded to write to the terminal via crossterm. An Emitter implementation can also feed an axum SSE stream, collect events for a JSON response, or capture everything for a test. The Engine sends semantic events; Emitters decide how to present them.
The Architecture After Phase 2
┌─────────┐ ┌─────────┐ ┌─────────┐
│ CLI │ │ REPL │ │ API │ (Phase 4)
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────┐
│ Engine::run(ctx, req, emitter) │
│ ┌────────────────────────────────────────────┐ │
│ │ 1. Apply CoreCommand (if any) │ │
│ │ 2. Build Input from req │ │
│ │ 3. apply_prelude (first turn only) │ │
│ │ 4. before_chat_completion │ │
│ │ 5. Stream or buffered LLM call │ │
│ │ ├─ emit Started │ │
│ │ ├─ emit AssistantDelta (per chunk) │ │
│ │ ├─ emit ToolCall │ │
│ │ ├─ execute tool │ │
│ │ ├─ emit ToolResult │ │
│ │ └─ loop on tool results │ │
│ │ 6. after_chat_completion │ │
│ │ 7. maybe_compress_session │ │
│ │ 8. maybe_autoname_session │ │
│ │ 9. Auto-continuation (if applicable) │ │
│ │ 10. emit Finished │ │
│ └────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
TerminalEmitter TerminalEmitter JsonEmitter / SseEmitter
Core Types
Engine
pub struct Engine {
pub app: Arc<AppState>,
}
impl Engine {
pub fn new(app: Arc<AppState>) -> Self { Self { app } }
pub async fn run(
&self,
ctx: &mut RequestContext,
req: RunRequest,
emitter: &dyn Emitter,
) -> Result<RunOutcome, CoreError>;
}
Engine is intentionally a thin wrapper around Arc<AppState>. All per-turn state lives on RequestContext, so the engine itself has no per-call fields. This makes it cheap to clone and makes Engine::run trivially testable.
RunRequest
pub struct RunRequest {
pub input: Option<UserInput>,
pub command: Option<CoreCommand>,
pub options: RunOptions,
}
pub struct UserInput {
pub text: String,
pub files: Vec<FileInput>,
pub media: Vec<MediaInput>,
pub continuation: Option<ContinuationKind>,
}
pub enum ContinuationKind {
Continue,
Regenerate,
}
pub struct RunOptions {
pub stream: Option<bool>,
pub extract_code: bool,
pub auto_continue: bool,
pub compress_session: bool,
pub autoname_session: bool,
pub apply_prelude: bool,
pub with_embeddings: bool,
pub cancel: CancellationToken,
}
impl RunOptions {
pub fn cli() -> Self { /* today's start_directive defaults */ }
pub fn repl_turn() -> Self { /* today's ask defaults */ }
pub fn api_oneshot() -> Self { /* API one-shot defaults */ }
pub fn api_session() -> Self { /* API session defaults */ }
}
Two things to notice:
- input is Option. A RunRequest can carry just a command (e.g., .role explain) with no user text, just an input (a plain prompt), or both (the .role <name> <text> form that activates a role and immediately sends a prompt through it). The engine handles all three shapes with one code path.
- RunOptions is the knob panel that replaces the five divergences. CLI today has auto_continue: false, compress_session: false, autoname_session: false; REPL has all three true. Phase 2 exposes these as explicit options with factory constructors for each frontend's conventional defaults. This also means you can now run a CLI one-shot with auto-continuation by constructing RunOptions::cli() and flipping auto_continue = true — a capability that doesn't exist today.
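The factory-plus-override pattern can be sketched as below. Field defaults mirror the divergence table above; the cancel token and embeddings flag are omitted so the sketch stays dependency-free, and the exact defaults are assumptions from this plan, not the final code.

```rust
// Minimal sketch of RunOptions with per-frontend factory constructors.
#[derive(Clone, Debug, PartialEq)]
pub struct RunOptions {
    pub stream: Option<bool>,
    pub extract_code: bool,
    pub auto_continue: bool,
    pub compress_session: bool,
    pub autoname_session: bool,
    pub apply_prelude: bool,
}

impl RunOptions {
    // Today's start_directive defaults: no auto-continue, no session upkeep.
    pub fn cli() -> Self {
        Self {
            stream: None,
            extract_code: false,
            auto_continue: false,
            compress_session: false,
            autoname_session: false,
            apply_prelude: true,
        }
    }

    // Today's ask defaults: all three session behaviors enabled.
    pub fn repl_turn() -> Self {
        Self {
            auto_continue: true,
            compress_session: true,
            autoname_session: true,
            ..Self::cli()
        }
    }
}

fn main() {
    // A CLI one-shot with auto-continuation enabled -- the new capability.
    let mut opts = RunOptions::cli();
    opts.auto_continue = true;
    assert!(opts.auto_continue && !opts.compress_session);
}
```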
CoreCommand
pub enum CoreCommand {
// State setters
SetModel(String),
UsePrompt(String),
UseRole { name: String, trailing_text: Option<String> },
UseSession(Option<String>),
UseAgent { name: String, session: Option<String>, variables: Vec<(String, String)> },
UseRag(Option<String>),
// Exit commands
ExitRole,
ExitSession,
ExitRag,
ExitAgent,
// State queries
Info(InfoScope),
RagSources,
// Config mutation
Set { key: String, value: String },
// Session actions
CompressSession,
EmptySession,
SaveSession { name: Option<String> },
EditSession,
// Role actions
SaveRole { name: Option<String> },
EditRole,
// RAG actions
EditRagDocs,
RebuildRag,
// Agent actions
EditAgentConfig,
ClearTodo,
StarterList,
StarterRun(usize),
// File input shortcut
IncludeFiles { paths: Vec<String>, trailing_text: Option<String> },
// Macro execution
Macro { name: String, args: Vec<String> },
// Vault
VaultAdd(String),
VaultGet(String),
VaultUpdate(String),
VaultDelete(String),
VaultList,
// Miscellaneous
EditConfig,
Authenticate,
Delete(DeleteKind),
Copy,
Help,
}
pub enum InfoScope {
System,
Role,
Session,
Rag,
Agent,
}
pub enum DeleteKind {
Role(String),
Session(String),
Rag(String),
Macro(String),
AgentData(String),
}
This enum captures all 37 dot-commands identified during the earlier exploration pass. Three categories deserve special attention:
- LLM-triggering commands (UsePrompt, UseRole with trailing_text, IncludeFiles with trailing_text, StarterRun, Macro that contains LLM calls, and the continuation variants Continue/Regenerate expressed via UserInput.continuation) — these don't just mutate state; they produce a full run through the LLM pipeline. The engine treats them as RunRequest { command: Some(_), input: Some(_), .. } — command runs first, then input flows through.
- Side-effecting commands that return immediately (EditConfig, EditRole, EditRagDocs, EditAgentConfig, most Vault*, Delete) — these mutate state or open an editor but don't produce an LLM interaction. The engine handles them, emits a Result event, and returns without invoking the LLM path.
- Context-dependent commands (ClearTodo, StarterList, StarterRun, EditAgentConfig, etc.) — these require a specific scope (e.g., an active agent). The engine validates the precondition before executing and returns a CoreError::InvalidState { expected: "active agent" } if the precondition fails.
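The precondition check for the third category can be sketched as a small helper. The RequestContext here is a stand-in with only the field this example needs; the real type comes from Phase 1, and require_agent is a hypothetical method name.

```rust
// Sketch: validate "active agent" before running ClearTodo, StarterRun, etc.
#[derive(Debug)]
pub enum CoreError {
    InvalidState { expected: String, found: String },
}

pub struct RequestContext {
    pub agent_name: Option<String>,
}

impl RequestContext {
    // Returns the active agent name, or InvalidState if none is active.
    pub fn require_agent(&self) -> Result<&str, CoreError> {
        self.agent_name.as_deref().ok_or_else(|| CoreError::InvalidState {
            expected: "active agent".into(),
            found: "no agent".into(),
        })
    }
}

fn main() {
    let ctx = RequestContext { agent_name: None };
    // dispatch_command would call require_agent() before executing ClearTodo.
    assert!(matches!(
        ctx.require_agent(),
        Err(CoreError::InvalidState { .. })
    ));
}
```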
Emitter trait and Event enum
#[async_trait]
pub trait Emitter: Send + Sync {
async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}
pub enum Event<'a> {
// Lifecycle
Started { request_id: Uuid, session_id: Option<SessionId>, agent: Option<&'a str> },
Finished { outcome: &'a RunOutcome },
// Assistant output
AssistantDelta(&'a str),
AssistantMessageEnd { full_text: &'a str },
// Tool calls
ToolCall { id: &'a str, name: &'a str, args: &'a str },
ToolResult { id: &'a str, name: &'a str, result: &'a str, is_error: bool },
// Auto-continuation
AutoContinueTriggered { count: usize, max: usize, remaining_todos: usize },
// Session lifecycle signals
SessionCompressing,
SessionCompressed { tokens_saved: Option<usize> },
SessionAutonamed(&'a str),
// Informational
Info(&'a str),
Warning(&'a str),
// Errors
Error(&'a CoreError),
}
pub enum EmitError {
ClientDisconnected,
WriteFailed(std::io::Error),
}
Three implementations ship in Phase 2; two are stubs, one is real:
- TerminalEmitter (real) — wraps today's SseHandler → markdown_stream/raw_stream path. This is the bulk of Phase 2's work; see "Terminal Rendering Details" below.
- NullEmitter (stub, for tests) — drops all events on the floor.
- CollectingEmitter (stub, for tests and future JSON API) — appends events to a Vec<OwnedEvent> for later inspection.
The JsonEmitter and SseEmitter implementations land in Phase 4 when the API server comes online.
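A CollectingEmitter can be sketched as below. This version is deliberately simplified: the real trait is async and its Event variants borrow their payloads, so the real stub would convert each Event into an owned form before storing it.

```rust
use std::sync::Mutex;

// Simplified, synchronous sketch of CollectingEmitter with owned events.
#[derive(Debug, Clone, PartialEq)]
pub enum OwnedEvent {
    AssistantDelta(String),
    Info(String),
}

pub trait Emitter {
    fn emit(&self, event: OwnedEvent);
}

#[derive(Default)]
pub struct CollectingEmitter {
    // Mutex because emit takes &self but must mutate the buffer.
    pub events: Mutex<Vec<OwnedEvent>>,
}

impl Emitter for CollectingEmitter {
    fn emit(&self, event: OwnedEvent) {
        self.events.lock().unwrap().push(event);
    }
}

fn main() {
    let emitter = CollectingEmitter::default();
    emitter.emit(OwnedEvent::AssistantDelta("hel".into()));
    emitter.emit(OwnedEvent::AssistantDelta("lo".into()));
    assert_eq!(emitter.events.lock().unwrap().len(), 2);
}
```

Tests then assert on the recorded sequence rather than on rendered terminal output, which is what makes the engine testable without a TTY.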
RunOutcome
pub struct RunOutcome {
pub request_id: Uuid,
pub session_id: Option<SessionId>,
pub final_message: Option<String>,
pub tool_call_count: usize,
pub turns: usize,
pub compressed: bool,
pub autonamed: Option<String>,
pub auto_continued: usize,
}
RunOutcome is what CLI/REPL ignore but the future API returns as JSON. It records everything the caller might want to know about what happened during the run.
CoreError
pub enum CoreError {
InvalidRequest { msg: String },
InvalidState { expected: String, found: String },
NotFound { what: String, name: String },
Cancelled,
ProviderError { provider: String, msg: String },
ToolError { tool: String, msg: String },
EmitterError(EmitError),
Io(std::io::Error),
Other(anyhow::Error),
}
impl CoreError {
pub fn is_retryable(&self) -> bool { /* ... */ }
pub fn http_status(&self) -> u16 { /* for future API use */ }
pub fn terminal_message(&self) -> String { /* for TerminalEmitter */ }
}
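The http_status mapping might look like the sketch below. The actual codes are a Phase 4 decision; these are conventional guesses, and the variant payloads are dropped to keep the sketch short.

```rust
// Sketch of CoreError::http_status with assumed, conventional status codes.
pub enum CoreError {
    InvalidRequest,
    InvalidState,
    NotFound,
    Cancelled,
    ProviderError,
    Other,
}

impl CoreError {
    pub fn http_status(&self) -> u16 {
        match self {
            CoreError::InvalidRequest => 400,
            CoreError::InvalidState => 409,
            CoreError::NotFound => 404,
            CoreError::Cancelled => 499, // nginx convention: client closed request
            CoreError::ProviderError => 502, // upstream LLM provider failed
            CoreError::Other => 500,
        }
    }
}

fn main() {
    assert_eq!(CoreError::NotFound.http_status(), 404);
    assert_eq!(CoreError::ProviderError.http_status(), 502);
}
```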
Terminal Rendering Details
The TerminalEmitter is the most delicate part of Phase 2 because it has to preserve every pixel of today's REPL/CLI behavior. Here's the mental model:
Today's flow:
LLM client → mpsc::Sender<SseEvent> → SseHandler → render_stream
├─ markdown_stream (if highlight)
└─ raw_stream (else)
Both markdown_stream and raw_stream write directly to stdout via crossterm, managing cursor positions, line clears, and incremental markdown parsing themselves.
Target flow:
LLM client → mpsc::Sender<SseEvent> → SseHandler → TerminalEmitter::emit(Event::AssistantDelta)
├─ (internal) markdown_stream state machine
└─ (internal) raw_stream state machine
The TerminalEmitter owns a RefCell<StreamRenderState> (or Mutex if we need Send) that wraps the existing markdown_stream/raw_stream state. Each emit(AssistantDelta) call feeds the chunk into this state machine exactly as SseHandler's receive loop does today. The result is that the exact same crossterm calls happen in the exact same order — we've just moved them behind a trait.
Things that migrate 1:1 into TerminalEmitter:
- Spinner start/stop on first delta
- Cursor positioning for line reprint during code block growth
- Syntax highlighting invocation via MarkdownRender
- Color/dim output for tool call banners
- Final newline + cursor reset on AssistantMessageEnd
Things that the engine handles, not the emitter:
- Tool call execution (still lives in the engine loop)
- Session state mutations (engine calls before_chat_completion/after_chat_completion on RequestContext)
- Auto-continuation decisions (engine inspects agent runtime)
- Compression and autoname decisions (engine)
Things the emitter decides, not the engine:
- Whether to suppress ToolCall rendering (sub-agents in today's code suppress their own output; TerminalEmitter respects a verbose: bool flag)
- How to format errors (TerminalEmitter uses colored stderr; JsonEmitter will use structured JSON)
- Whether to show a spinner at all (disabled for non-TTY output)
One gotcha: today's SseHandler itself produces the mpsc channel that LLM clients push into. In the new model, SseHandler becomes an internal helper inside the engine's streaming path that converts mpsc::Receiver<SseEvent> into Emitter::emit(Event::AssistantDelta(...)) calls. No LLM client code changes — they still push into the same channel type. Only the consumer side of the channel changes.
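The consumer-side change can be sketched with std's synchronous channel. The SseEvent shape here is a stand-in for the real type, and the real loop is async over tokio's mpsc, but the structure is the same: producers keep pushing, and a small forwarding loop turns chunks into emitter calls.

```rust
use std::sync::mpsc;

// Synchronous sketch of converting the SseEvent channel into emitter calls.
enum SseEvent {
    Text(String),
    Done,
}

fn forward_to_emitter(
    rx: mpsc::Receiver<SseEvent>,
    mut emit_delta: impl FnMut(&str),
) -> String {
    let mut full_text = String::new();
    while let Ok(event) = rx.recv() {
        match event {
            SseEvent::Text(chunk) => {
                // In the real code: emitter.emit(Event::AssistantDelta(&chunk))
                emit_delta(&chunk);
                full_text.push_str(&chunk);
            }
            SseEvent::Done => break,
        }
    }
    // Becomes Event::AssistantMessageEnd { full_text } in the real code.
    full_text
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(SseEvent::Text("hello ".into())).unwrap();
    tx.send(SseEvent::Text("world".into())).unwrap();
    tx.send(SseEvent::Done).unwrap();
    let mut deltas = 0;
    let full = forward_to_emitter(rx, |_| deltas += 1);
    assert_eq!(full, "hello world");
    assert_eq!(deltas, 2);
}
```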
The Engine::run Pipeline
Here's the full pipeline in pseudocode, annotated with which frontend controls each behavior via RunOptions:
impl Engine {
pub async fn run(
&self,
ctx: &mut RequestContext,
req: RunRequest,
emitter: &dyn Emitter,
) -> Result<RunOutcome, CoreError> {
let request_id = Uuid::new_v4();
let mut outcome = RunOutcome::new(request_id);
emitter.emit(Event::Started { request_id, session_id: ctx.session_id(), agent: ctx.agent_name() }).await?;
// 1. Execute command (if any). Commands may be LLM-triggering, mutating, or informational.
if let Some(command) = req.command {
self.dispatch_command(ctx, command, emitter, &req.options).await?;
}
// 2. Early return if there's no user input (pure command)
let Some(user_input) = req.input else {
emitter.emit(Event::Finished { outcome: &outcome }).await?;
return Ok(outcome);
};
// 3. Apply prelude on first turn of a fresh context (CLI/REPL only)
if req.options.apply_prelude && !ctx.prelude_applied {
apply_prelude(ctx, &req.options.cancel).await?;
ctx.prelude_applied = true;
}
// 4. Build Input from user_input + ctx
let input = build_input(ctx, user_input, &req.options).await?;
// 5. Wait for any in-progress compression to finish (REPL-style block)
while ctx.is_compressing_session() {
tokio::time::sleep(Duration::from_millis(100)).await;
}
// 6. Enter the turn loop
self.run_turn(ctx, input, &req.options, emitter, &mut outcome).await?;
// 7. Maybe compress session
if req.options.compress_session && ctx.session_needs_compression() {
emitter.emit(Event::SessionCompressing).await?;
compress_session(ctx).await?;
outcome.compressed = true;
emitter.emit(Event::SessionCompressed { tokens_saved: None }).await?;
}
// 8. Maybe autoname session
if req.options.autoname_session {
if let Some(name) = maybe_autoname_session(ctx).await? {
outcome.autonamed = Some(name.clone());
emitter.emit(Event::SessionAutonamed(&name)).await?;
}
}
// 9. Auto-continuation (agents only)
if req.options.auto_continue {
if let Some(continuation) = self.check_auto_continue(ctx) {
emitter.emit(Event::AutoContinueTriggered { .. }).await?;
outcome.auto_continued += 1;
// Recursive call with continuation prompt
let next_req = RunRequest {
input: Some(UserInput::from_continuation(continuation)),
command: None,
options: req.options.clone(),
};
return Box::pin(self.run(ctx, next_req, emitter)).await;
}
}
emitter.emit(Event::Finished { outcome: &outcome }).await?;
Ok(outcome)
}
async fn run_turn(
&self,
ctx: &mut RequestContext,
mut input: Input,
options: &RunOptions,
emitter: &dyn Emitter,
outcome: &mut RunOutcome,
) -> Result<(), CoreError> {
loop {
outcome.turns += 1;
before_chat_completion(ctx, &input);
let client = input.create_client(ctx)?;
let (output, tool_results) = if should_stream(&input, options) {
stream_chat_completion(ctx, &input, client, emitter, &options.cancel).await?
} else {
buffered_chat_completion(ctx, &input, client, options.extract_code, &options.cancel).await?
};
after_chat_completion(ctx, &input, &output, &tool_results);
outcome.tool_call_count += tool_results.len();
if tool_results.is_empty() {
outcome.final_message = Some(output);
return Ok(());
}
// Emit each tool call and result
for result in &tool_results {
emitter.emit(Event::ToolCall { .. }).await?;
emitter.emit(Event::ToolResult { .. }).await?;
}
// Loop: feed tool results back in
input = input.merge_tool_results(output, tool_results);
}
}
}
Key design decisions in this pipeline:
- Command dispatch happens first. A RunRequest that carries both a command and input runs the command first (mutating ctx), then the input flows through the now-updated context. This lets .role explain "tell me about X" work as a single atomic operation — the role is activated, then the prompt is sent under the new role.
- Tool loop is iterative, not recursive. Today both start_directive and ask recursively call themselves after tool results. The new run_turn uses a loop instead, which is cleaner, avoids stack growth on long tool chains, and makes cancellation handling simpler. Auto-continuation remains recursive because it's a full new turn with a new prompt, not just a tool-result continuation.
- Cancellation is checked at every await point. options.cancel: CancellationToken is threaded into every async call. On cancellation, the engine emits Event::Error(CoreError::Cancelled) and returns. Today's AbortSignal pattern gets wrapped in a CancellationToken adapter during the migration.
- Session state hooks fire at the same points as today. before_chat_completion and after_chat_completion continue to exist on RequestContext, called from the same places in the same order. The refactor doesn't change their semantics.
- Emitter errors don't abort the run. If the emitter's output destination disconnects (client closes browser tab), the engine keeps running to completion so session state is correctly persisted, but it stops emitting events. The EmitError::ClientDisconnected case is special-cased to swallow subsequent emits. Session save + tool execution still happen.
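The swallow-after-disconnect behavior can be sketched as a latch around the inner emitter. Types here are simplified and synchronous; FlakyEmitter is a hypothetical stand-in that always reports a disconnected client.

```rust
use std::cell::Cell;

// Sketch: after the first ClientDisconnected, later emits become no-ops
// so the engine can finish the run and persist session state.
#[derive(Debug)]
enum EmitError {
    ClientDisconnected,
}

struct FlakyEmitter; // stand-in: always reports a disconnected client

impl FlakyEmitter {
    fn emit(&self, _event: &str) -> Result<(), EmitError> {
        Err(EmitError::ClientDisconnected)
    }
}

struct SwallowingEmitter {
    inner: FlakyEmitter,
    disconnected: Cell<bool>,
}

impl SwallowingEmitter {
    fn emit(&self, event: &str) -> Result<(), EmitError> {
        if self.disconnected.get() {
            return Ok(()); // drop silently; the run keeps going
        }
        if let Err(EmitError::ClientDisconnected) = self.inner.emit(event) {
            self.disconnected.set(true); // latch; don't abort the run
        }
        Ok(())
    }
}

fn main() {
    let emitter = SwallowingEmitter {
        inner: FlakyEmitter,
        disconnected: Cell::new(false),
    };
    assert!(emitter.emit("Started").is_ok());
    assert!(emitter.disconnected.get());
    assert!(emitter.emit("Finished").is_ok()); // swallowed, run continues
}
```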
Migration Strategy
This phase is structured as extract, unify, rewrite frontends — similar to Phase 1's facade pattern. The old functions stay in place until the new Engine is proven by tests and manual verification.
Step 1: Create the core types
Add the new files without wiring them into anything:
- src/engine/mod.rs — module root
- src/engine/engine.rs — Engine struct + run method (initially unimplemented!())
- src/engine/request.rs — RunRequest, UserInput, RunOptions, ContinuationKind, RunOutcome
- src/engine/command.rs — CoreCommand enum + sub-enums
- src/engine/error.rs — CoreError enum
- src/engine/emitter.rs — Emitter trait + Event enum + EmitError
- src/engine/emitters/mod.rs — emitter module
- src/engine/emitters/null.rs — NullEmitter (test stub)
- src/engine/emitters/collecting.rs — CollectingEmitter (test stub)
- src/engine/emitters/terminal.rs — TerminalEmitter (initially unimplemented!())
Register pub mod engine; in src/main.rs. Code compiles but nothing calls it yet.
Verification: cargo check clean, cargo test passes.
Step 2: Implement TerminalEmitter against existing render code
Before wiring the engine, build the TerminalEmitter by wrapping today's SseHandler + markdown_stream + raw_stream + MarkdownRender + Spinner code. Don't change any of those modules — just construct a TerminalEmitter that holds the state they need and forwards emit(Event::AssistantDelta(...)) into them.
pub struct TerminalEmitter {
render_state: Mutex<StreamRenderState>,
options: TerminalEmitterOptions,
}
pub struct TerminalEmitterOptions {
pub highlight: bool,
pub theme: Option<String>,
pub verbose_tool_calls: bool,
pub show_spinner: bool,
}
impl TerminalEmitter {
pub fn new_from_app(app: &AppState, working_mode: WorkingMode) -> Self { /* ... */ }
}
Implement Emitter for it, mapping each Event variant to the appropriate crossterm operation:
| Event | TerminalEmitter action |
|---|---|
| Started | Start spinner |
| AssistantDelta(chunk) | Stop spinner (if first), feed chunk into render state |
| AssistantMessageEnd { full_text } | Flush render state, emit trailing newline |
| ToolCall { name, args } | Print dimmed ⚙ Using <name> banner if verbose |
| ToolResult { .. } | Print dimmed result summary if verbose |
| AutoContinueTriggered | Print yellow ⟳ Continuing (N/M, R todos remaining) to stderr |
| SessionCompressing | Print Compressing session... to stderr |
| SessionCompressed | Print Session compressed. to stderr |
| SessionAutonamed | Print Session auto-named: <name> to stderr |
| Info(msg) | Print to stdout |
| Warning(msg) | Print yellow to stderr |
| Error(e) | Print red to stderr |
| Finished | No-op (ensures trailing newline is flushed) |
Verification: write integration tests that construct a TerminalEmitter, feed it a sequence of events manually, and compare captured stdout/stderr to golden outputs. Use assert_cmd or similar to snapshot the rendered output of each event variant.
Step 3: Implement Engine::run without wiring it
Implement Engine::run and Engine::run_turn following the pseudocode above. Use the existing helper functions (before_chat_completion, after_chat_completion, apply_prelude, create_client, call_chat_completions, call_chat_completions_streaming, maybe_compress_session, maybe_autoname_session) unchanged, just called through ctx instead of &GlobalConfig.
Implementing dispatch_command is the largest sub-task here because it needs to match all 37 CoreCommand variants and invoke the right ctx methods. Most variants are straightforward one-liners that call a corresponding method on RequestContext. A few need special handling:
- CoreCommand::UseRole { name, trailing_text } — activate the role, then if trailing_text is Some, the outer run flows through with the trailing text as UserInput.text.
- CoreCommand::IncludeFiles — reads files, converts them to a FileInput list, and attaches them to ctx's next input (or fails if no input is provided).
- CoreCommand::StarterRun(id) — looks up the starter text on the active agent; fails if no agent is active.
- CoreCommand::Macro — delegates to macro_execute, which may itself call Engine::run internally for LLM-triggering macros.
Verification: write unit tests for dispatch_command using NullEmitter. Each test activates a command and asserts the expected state mutation on ctx. This is ~37 tests, one per variant, and they catch the bulk of regressions early.
Then write a handful of integration tests for Engine::run with CollectingEmitter, asserting the expected event sequence for:
- Plain prompt, no tools, streaming
- Plain prompt, no tools, non-streaming
- Prompt that triggers 2 tool calls
- Prompt that triggers auto-continuation (mock the LLM response)
- Prompt on a session that crosses the compression threshold
- Command-only request (.info)
- Command + prompt request (.role explain "...")
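One of these scenario tests can be sketched as a pure sequence assertion. The EventKind tags and run_plain_prompt stub below are illustrative stand-ins; a real test would drive Engine::run with a mocked LLM client and a CollectingEmitter, then compare the recorded event kinds the same way.

```rust
// Sketch: assert the event order for "plain prompt, no tools, streaming".
#[derive(Debug, PartialEq)]
enum EventKind {
    Started,
    AssistantDelta,
    AssistantMessageEnd,
    Finished,
}

// Stand-in for Engine::run + CollectingEmitter: one delta per chunk,
// bracketed by lifecycle events.
fn run_plain_prompt(chunks: &[&str]) -> Vec<EventKind> {
    let mut events = vec![EventKind::Started];
    for _ in chunks {
        events.push(EventKind::AssistantDelta);
    }
    events.push(EventKind::AssistantMessageEnd);
    events.push(EventKind::Finished);
    events
}

fn main() {
    let seq = run_plain_prompt(&["hel", "lo"]);
    assert_eq!(
        seq,
        vec![
            EventKind::Started,
            EventKind::AssistantDelta,
            EventKind::AssistantDelta,
            EventKind::AssistantMessageEnd,
            EventKind::Finished,
        ]
    );
}
```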
Step 4: Wire CLI to Engine::run
Replace main.rs::start_directive with a thin wrapper:
async fn start_directive(
app: Arc<AppState>,
ctx: &mut RequestContext,
input_text: String,
files: Vec<String>,
code_mode: bool,
) -> Result<()> {
let engine = Engine::new(app.clone());
let emitter = TerminalEmitter::new_from_app(&app, WorkingMode::Cmd);
let req = RunRequest {
input: Some(UserInput::from_text_and_files(input_text, files)),
command: None,
options: {
let mut o = RunOptions::cli();
o.extract_code = code_mode && !*IS_STDOUT_TERMINAL;
o
},
};
match engine.run(ctx, req, &emitter).await {
Ok(_outcome) => Ok(()),
Err(CoreError::Cancelled) => Ok(()),
Err(e) => Err(e.into()),
}
}
Verification: manual smoke test. Run loki "hello", loki --code "write a rust hello world", loki --role explain "what is TCP". All should produce identical output to before the change.
Step 5: Wire REPL to Engine::run
Replace repl/mod.rs::ask with a wrapper that calls the engine. The REPL's outer loop that reads lines and calls run_repl_command stays. run_repl_command for non-dot-command lines constructs a RunRequest { input: Some(...), .. } and calls Engine::run. Dot-commands get parsed into CoreCommand and called as RunRequest { command: Some(...), input: None, .. } (or with input if they carry trailing text).
// In Repl:
async fn handle_line(&mut self, line: &str) -> Result<()> {
let req = if let Some(rest) = line.strip_prefix('.') {
parse_dot_command_to_run_request(rest, &self.ctx)?
} else {
RunRequest {
input: Some(UserInput::from_text(line.to_string())),
command: None,
options: RunOptions::repl_turn(),
}
};
match self.engine.run(&mut self.ctx, req, &self.emitter).await {
Ok(_) => Ok(()),
Err(CoreError::Cancelled) => Ok(()),
Err(e) => {
self.emitter.emit(Event::Error(&e)).await.ok();
Ok(())
}
}
}
Verification: manual smoke test of the REPL. Run through a typical session:
- loki → REPL starts
- hello → plain prompt works
- .role explain → role activates
- what is TCP → responds under the role
- .session → session starts
- Several messages → conversation continues
- .info session → info prints
- .compress session → compression runs
- .agent sisyphus → agent activates with sub-agents
- write a hello world in rust → tool calls + output
- .exit agent → agent exits, previous session still active
- .exit → REPL exits
Every interaction should behave identically to pre-Phase-2. Any visual difference is a bug.
Step 6: Delete the old start_directive and ask
Once CLI and REPL both route through Engine::run and all tests/smoke tests pass, delete the old function bodies. Remove any now-unused imports. Run cargo check and cargo test.
Verification: full test suite green, no dead code warnings.
Step 7: Tidy and document
- Add rustdoc comments on Engine, RunRequest, RunOptions, Emitter, Event, CoreCommand, CoreError.
- Add an examples/ subdirectory under src/engine/ showing how to call the engine with each emitter.
- Update docs/AGENTS.md with a note that auto-continuation is no longer REPL-only at the engine level (the CLI flag itself lands in a follow-up; see the risk table).
- Update docs/REST-API-ARCHITECTURE.md to remove any "in Phase 2" placeholders.
Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| Terminal rendering regressions | High | Golden-file snapshot tests for every Event variant. Manual smoke tests across all common REPL flows. Keep TerminalEmitter as a thin wrapper — no logic changes in the render code itself. |
| Auto-continuation recursion limits | Medium | The new Engine::run uses Box::pin for the auto-continuation recursive call. Verify with a mock LLM that max_auto_continues = 100 doesn't blow the stack. |
| Cancellation during tool execution | Medium | Tool execution currently uses AbortSignal; the new path uses CancellationToken. Write a shim that translates. Write a test that cancels mid-tool-call and verifies graceful cleanup (no orphaned subprocesses, no leaked file descriptors). |
| Command parsing fidelity | Medium | The dot-command parser in today's REPL is hand-written and has edge cases. Port the parsing code verbatim into a dedicated parse_dot_command_to_run_request function with unit tests for every edge case found in today's code. |
| Macro execution recursion | Medium | .macro can invoke LLM calls, which now go through Engine::run, which can invoke more macros. Verify there's a recursion depth limit or cycle detection; add one if missing. |
| Emitter error propagation | Low | Emitter errors (ClientDisconnected) should NOT abort session save logic. Engine must continue executing after the first EmitError::ClientDisconnected — just stop emitting. Write a test that simulates a disconnected emitter mid-response and asserts the session is still correctly persisted. |
| Spinner interleaving with tool output | Low | Today's spinner is tightly coupled to the stream handler. If the new order of operations fires a tool call before the spinner is stopped, you'll get garbled output. Test this specifically. |
| Feature flag: auto_continue in CLI | Low | After Phase 2, CLI could support auto-continuation but it's not exposed. Decision: leave it off by default in RunOptions::cli(), add a --auto-continue flag in a separate follow-up if desired. Don't sneak behavior changes into this refactor. |
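The macro-recursion mitigation can be sketched as a depth guard. MAX_MACRO_DEPTH and the counter's location are assumptions; the real guard would likely live on RequestContext and be checked by macro_execute before it re-enters Engine::run.

```rust
use std::cell::Cell;

// Sketch: bound macro -> Engine::run -> macro recursion with a depth counter.
const MAX_MACRO_DEPTH: usize = 8; // assumed limit, not from the codebase

pub struct MacroGuard {
    depth: Cell<usize>,
}

impl MacroGuard {
    // Called on entry to macro_execute; rejects the call past the limit.
    pub fn enter(&self) -> Result<(), String> {
        let d = self.depth.get();
        if d >= MAX_MACRO_DEPTH {
            return Err(format!("macro recursion limit ({MAX_MACRO_DEPTH}) exceeded"));
        }
        self.depth.set(d + 1);
        Ok(())
    }

    // Called when macro_execute returns, including on error paths.
    pub fn exit(&self) {
        self.depth.set(self.depth.get().saturating_sub(1));
    }
}

fn main() {
    let guard = MacroGuard { depth: Cell::new(0) };
    for _ in 0..MAX_MACRO_DEPTH {
        assert!(guard.enter().is_ok());
    }
    assert!(guard.enter().is_err()); // next nested macro is rejected
    guard.exit();
    assert!(guard.enter().is_ok()); // freed slot can be reused
}
```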
What Phase 2 Does NOT Do
- No new features. Everything that worked before works the same way after.
- No API server. JsonEmitter and SseEmitter are placeholders — Phase 4 implements them.
- No SessionStore abstraction. That's Phase 3.
- No ToolScope unification. That landed in Phase 1 Step 6.5.
- No changes to LLM client code. call_chat_completions and call_chat_completions_streaming keep their existing signatures.
- No MCP factory pooling. That's Phase 5.
- No dot-command syntax changes. The REPL still accepts exactly the same dot-commands; they just parse into CoreCommand instead of being hand-dispatched in run_repl_command.
The sole goal of Phase 2 is: extract the pipeline into Engine::run, route CLI and REPL through it, and prove via tests and smoke tests that nothing regressed.
Entry Criteria (from Phase 1)
Before starting Phase 2, Phase 1 must be complete:
- GlobalConfig type alias is removed
- AppState and RequestContext are the only state holders
- All 91 callsites in the original migration table have been updated
- cargo test passes with no Config-based tests remaining
- CLI and REPL manual smoke tests pass identically to pre-Phase-1
Exit Criteria (Phase 2 complete)
- src/engine/ module exists with Engine, Emitter, Event, CoreCommand, RunRequest, RunOutcome, CoreError
- TerminalEmitter implemented and wrapping all existing render paths
- NullEmitter and CollectingEmitter implemented
- start_directive in main.rs is a thin wrapper around Engine::run
- REPL's per-line handler routes through Engine::run
- All 37 CoreCommand variants implemented with unit tests
- Integration tests for the 7 engine scenarios listed in Step 3
- Manual smoke tests for CLI and REPL match pre-Phase-2 behavior
- cargo check, cargo test, cargo clippy all clean
- Phase 3 (SessionStore abstraction) can begin