2026-04-15 12:56:00 -06:00
parent ff3419a714
commit 63b6678e73
82 changed files with 14800 additions and 3310 deletions
+108
@@ -0,0 +1,108 @@
# Phase 1 QA — Test Implementation Plan
## Purpose
Verify that all existing Loki behaviors are preserved after the
Phase 1 refactoring (Config god-state → AppState + RequestContext
split). Tests should validate behavior, not implementation details,
unless a specific implementation pattern is fragile and needs
regression protection.
## Reference codebases
- **Old code**: `~/code/testing/loki` (branch: `develop`)
- **New code**: `~/code/loki` (branch: working branch with Phase 1)
## Process (per iteration)
1. Read the previous iteration's test implementation notes (if any)
2. Read the test plan file for the current feature area
3. Read the old code to identify the logic that creates those flows
4. While reading old code:
- Note additional behaviors not in the plan file → update the file
- Note feature overlaps / context-switching scenarios → add tests
5. Create unit/integration tests in the new code
6. Ensure all tests pass
7. Write test implementation notes for the iteration
8. Pause for user approval before proceeding to next iteration
## Test philosophy
- **Behavior over implementation**: Test what the system DOES, not
HOW it does it internally
- **Exception**: If implementation logic is fragile and a slight
change would break Loki, add an implementation-specific test
- **No business logic changes**: Only modify non-test code if a
genuine bug is discovered (old behavior missing in new code)
- **Context switching**: Pay special attention to state transitions
(role→agent, MCP-enabled→disabled, etc.)
## Test location
All new tests go in the `tests/` directory as integration tests, or
inline as `#[cfg(test)] mod tests` in the relevant source file,
depending on what is being tested:
- **Unit tests** (pure logic, no I/O): inline in source file
- **Integration tests** (multi-module, state transitions): `tests/`
- **Behavior tests** (config parsing, tool resolution): can be either
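
The inline layout for pure-logic unit tests can be sketched as follows. `parse_enabled` is a hypothetical helper invented for illustration (it is not a Loki function); it only stands in for any I/O-free logic that would get an inline test module.

```rust
/// Hypothetical pure helper (not part of Loki): splits a comma-separated
/// setting into trimmed, non-empty names.
pub fn parse_enabled(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

// Inline unit tests live next to the code they exercise and are only
// compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parse_enabled_trims_and_skips_empty_entries() {
        assert_eq!(parse_enabled(" fs, web ,,"), vec!["fs", "web"]);
    }
}
```

Tests that cross module boundaries or exercise state transitions would instead go in a file under `tests/`, where only the crate's public API is visible.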
## Feature areas (test plan files)
Each feature area has a plan file in `docs/testing/plans/`. The
files are numbered for execution order (dependencies first):
| # | File | Feature area | Priority |
|---|---|---|---|
| 01 | `01-config-and-appconfig.md` | Config loading, AppConfig fields, defaults | High |
| 02 | `02-roles.md` | Role loading, retrieval, role-likes, temp roles | High |
| 03 | `03-sessions.md` | Session create/load/save, compression, autoname | High |
| 04 | `04-agents.md` | Agent init, tool compilation, variables, lifecycle | Critical |
| 05 | `05-mcp-lifecycle.md` | MCP server start/stop, factory, runtime, scope transitions | Critical |
| 06 | `06-tool-evaluation.md` | eval_tool_calls, ToolCall dispatch, tool handlers | Critical |
| 07 | `07-input-construction.md` | Input::from_str, from_files, field capturing, function selection | High |
| 08 | `08-request-context.md` | RequestContext methods, scope transitions, state management | Critical |
| 09 | `09-repl-commands.md` | REPL command handlers, state assertions, argument parsing | High |
| 10 | `10-cli-flags.md` | CLI argument handling, mode switching, early exits | High |
| 11 | `11-sub-agent-spawning.md` | Supervisor, child agents, escalation, messaging | Critical |
| 12 | `12-rag.md` | RAG init/load/search, embeddings, document management | Medium |
| 13 | `13-completions-and-prompt.md` | Tab completion, prompt rendering, highlighter | Medium |
| 14 | `14-macros.md` | Macro loading, execution, variable interpolation | Medium |
| 15 | `15-vault.md` | Secret management, interpolation in MCP config | Medium |
| 16 | `16-functions-and-tools.md` | Function declarations, tool compilation, binaries | High |
## Iteration tracking
Each completed iteration produces a notes file at:
`docs/testing/notes/ITERATION-<N>-NOTES.md`
These notes contain:
- Which plan file(s) were addressed
- Tests created (file paths, test names)
- Bugs discovered (if any)
- Observations for future iterations
- Updates made to other plan files
## Intentional improvements (NEW ≠ OLD)
These are behavioral changes that are intentional and should NOT
be tested for old-code parity:
| # | What | Old | New |
|---|---|---|---|
| 1 | Agent list hides `.shared` | Shown | Hidden |
| 2 | Tool file priority | Filesystem order | .sh > .py > .ts > .js |
| 3 | MCP disabled + agent | Warning, continues | Error, blocks |
| 4 | Role MCP warning | Always when mcp_support=false | Only when role has MCP |
| 5 | Enabled tools completions | Shows internal tools | Hides user__/mcp_/todo__/agent__ |
| 6 | MCP server completions | Only aliases | Configured servers + aliases |
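
Row 2 (tool file priority) can be illustrated with a minimal sketch. `pick_tool_file` and the `tools.<ext>` naming are assumptions made for this example, not the actual Loki implementation; the point is only that the result no longer depends on the order in which the filesystem lists entries.

```rust
/// Extensions in the priority order from row 2 (highest first).
const TOOL_EXT_PRIORITY: [&str; 4] = ["sh", "py", "ts", "js"];

/// Hypothetical helper: given the file names found in a tools directory,
/// return the `tools.<ext>` file whose extension ranks highest,
/// regardless of the order in which the names are supplied.
pub fn pick_tool_file<'a>(names: &[&'a str]) -> Option<&'a str> {
    TOOL_EXT_PRIORITY.iter().find_map(|ext| {
        let wanted = format!("tools.{ext}");
        names.iter().copied().find(|name| *name == wanted)
    })
}
```

A parity test against the old code would be wrong here: the old behavior depended on directory listing order, so only the new ordering should be asserted.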
## How to pick up in a new session
If context is lost (new chat session):
1. Read this file first
2. Read the latest `docs/testing/notes/ITERATION-<N>-NOTES.md`
3. That file tells you which plan file to work on next
4. Read that plan file
5. Follow the process above
+52
@@ -0,0 +1,52 @@
# Iteration 1 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/01-config-and-appconfig.md`
## Tests created
| File | Test name | What it verifies |
|---|---|---|
| `src/config/mod.rs` | `config_defaults_match_expected` | All Config::default() fields match old code values |
| `src/config/app_config.rs` | `to_app_config_copies_serialized_fields` | to_app_config copies model_id, temperature, top_p, dry_run, stream, save, highlight, compression_threshold, rag_top_k |
| `src/config/app_config.rs` | `to_app_config_copies_clients` | clients field populated (empty by default) |
| `src/config/app_config.rs` | `to_app_config_copies_mapping_fields` | mapping_tools and mapping_mcp_servers copied correctly |
| `src/config/app_config.rs` | `editor_returns_configured_value` | editor() returns configured value |
| `src/config/app_config.rs` | `editor_falls_back_to_env` | editor() doesn't panic without config |
| `src/config/app_config.rs` | `light_theme_default_is_false` | light_theme() default |
| `src/config/app_config.rs` | `sync_models_url_has_default` | sync_models_url() has non-empty default |
| `src/config/request_context.rs` | `to_request_context_creates_clean_state` | RequestContext starts with clean state (no role/session/agent, empty tool_scope, no agent_runtime) |
| `src/config/request_context.rs` | `update_app_config_persists_changes` | Dynamic config updates via clone-mutate-replace persist |
**Total: 10 new tests (59 → 69)**
## Bugs discovered
None. The `save` default was `false` in both old and new code
(my plan file incorrectly said `true` — corrected).
## Observations for future iterations
1. `Config::default().save` is `false`. Plan file 01 incorrectly
listed it as `true` and has been corrected to reflect the
actual default.
2. `AppConfig::default()` doesn't exist natively (no derive).
Tests construct it via `Config::default().to_app_config()`.
This is fine since that's how it's created in production.
3. The `visible_tools` field computation happens during
`Config::init` (not `to_app_config`). Testing the full
visible_tools resolution requires integration-level testing
with actual tool files. Deferred to plan file 16
(functions-and-tools).
4. Testing `Config::init` directly is difficult because it reads
from the filesystem, starts MCP servers, etc. The unit tests
focus on the conversion paths which are the Phase 1 surface.
## Next iteration
Plan file 02: Roles — role loading, retrieve_role, use_role/exit_role,
use_prompt, extract_role, one-shot role messages, MCP context switching.
+71
@@ -0,0 +1,71 @@
# Iteration 2 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/02-roles.md`
## Tests created
### src/config/role.rs (12 new tests, 15 total)
| Test name | What it verifies |
|---|---|
| `role_new_parses_prompt` | Role::new extracts prompt text |
| `role_new_parses_metadata` | Metadata block parses model, temperature, top_p |
| `role_new_parses_enabled_tools` | enabled_tools from metadata |
| `role_new_parses_enabled_mcp_servers` | enabled_mcp_servers from metadata |
| `role_new_no_metadata_has_none_fields` | No metadata → all optional fields None |
| `role_builtin_shell_loads` | Built-in "shell" role loads |
| `role_builtin_code_loads` | Built-in "code" role loads |
| `role_builtin_nonexistent_errors` | Non-existent built-in → error |
| `role_default_has_empty_fields` | Default role has empty name/prompt |
| `role_set_model_updates_model` | set_model() changes the model |
| `role_set_temperature_works` | set_temperature() changes temperature |
| `role_export_includes_metadata` | export() includes metadata and prompt |
### src/config/request_context.rs (5 new tests, 7 total)
| Test name | What it verifies |
|---|---|
| `use_role_obj_sets_role` | use_role_obj sets role on ctx |
| `exit_role_clears_role` | exit_role clears role from ctx |
| `use_prompt_creates_temp_role` | use_prompt creates TEMP_ROLE_NAME role |
| `extract_role_returns_standalone_role` | extract_role returns active role |
| `extract_role_returns_default_when_nothing_active` | extract_role returns default role |
**Total: 17 new tests (69 → 86)**
## Bugs discovered
None. Role parsing behavior matches between old and new code.
## Observations for future iterations
1. `retrieve_role` (which calls `Model::retrieve_model`) can't be
easily unit-tested without a real client config. It depends on
having at least one configured client. Deferred to integration
testing or plan 08 (RequestContext scope transitions).
2. The `use_role` async method (which calls `rebuild_tool_scope`)
requires async test runtime and MCP infrastructure. Deferred to
plan 05 (MCP lifecycle) and 08 (RequestContext).
3. `use_role_obj` correctly rejects when agent is active — tested
implicitly through the error path, but creating a mock Agent
is complex. Noted for plan 04 (agents).
4. The `extract_role` priority order (session > agent > role > default)
is an important behavioral contract. Tests verify the role and
default cases. Session and agent cases are deferred to plans 03, 04.
5. Added `create_test_ctx()` helper to request_context.rs tests.
Future iterations should reuse this.
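
The priority order from observation 4 can be sketched as below. `Ctx` and its three `Option<String>` fields are simplified stand-ins invented for this example; the real `RequestContext` holds richer session, agent, and role types.

```rust
/// Simplified stand-in for the request state (hypothetical fields).
#[derive(Default)]
pub struct Ctx {
    pub session_role: Option<String>,
    pub agent_role: Option<String>,
    pub role: Option<String>,
}

/// Resolves the effective role name in the order
/// session > agent > role > default (empty name).
pub fn extract_role(ctx: &Ctx) -> String {
    ctx.session_role
        .clone()
        .or_else(|| ctx.agent_role.clone())
        .or_else(|| ctx.role.clone())
        .unwrap_or_default()
}
```

A test for each level sets only that level and the ones below it, then asserts that the higher level wins.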
## Plan file updates
Updated 02-roles.md to mark completed items.
## Next iteration
Plan file 03: Sessions — session create/load/save, compression,
autoname, carry-over, exit, context switching.
+76
@@ -0,0 +1,76 @@
# Iteration 3 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/03-sessions.md`
## Tests created
### src/config/session.rs (15 new tests)
| Test name | What it verifies |
|---|---|
| `session_default_is_empty` | Default session is empty, no name, no role, not dirty |
| `session_new_from_ctx_captures_save_session` | new_from_ctx captures name, empty, not dirty |
| `session_set_role_captures_role_info` | set_role copies model_id, temperature, role_name, marks dirty |
| `session_clear_role` | clear_role removes role_name |
| `session_guard_empty_passes_when_empty` | guard_empty OK when empty |
| `session_needs_compression_threshold` | Empty session doesn't need compression |
| `session_needs_compression_returns_false_when_compressing` | Already compressing → false |
| `session_needs_compression_returns_false_when_threshold_zero` | Zero threshold → false |
| `session_set_compressing_flag` | set_compressing toggles flag |
| `session_set_save_session_this_time` | Doesn't panic |
| `session_save_session_returns_configured_value` | save_session get/set roundtrip |
| `session_compress_moves_messages` | compress moves messages to compressed, adds system |
| `session_is_not_empty_after_compress` | Session with compressed messages is not empty |
| `session_need_autoname_default_false` | Default session doesn't need autoname |
| `session_set_autonaming_doesnt_panic` | set_autonaming safe without autoname |
### src/config/request_context.rs (4 new tests, 11 total)
| Test name | What it verifies |
|---|---|
| `exit_session_clears_session` | exit_session removes session from ctx |
| `empty_session_clears_messages` | empty_session keeps session but clears it |
| `maybe_compress_session_returns_false_when_no_session` | No session → no compression |
| `maybe_autoname_session_returns_false_when_no_session` | No session → no autoname |
**Total: 19 new tests (86 → 105)**
## Bugs discovered
None. Session behavior matches between old and new code.
## Observations for future iterations
1. `Session::new_from_ctx` and `Session::load_from_ctx` have
`#[allow(dead_code)]` annotations — they were bridge methods.
Should verify if they're still needed or if the old `Session::new`
and `Session::load` (which take `&Config`) should be cleaned up
in a future pass.
2. The `compress` method moves messages to `compressed_messages` and
adds a single system message with the summary. This is a critical
behavioral contract — if the summary format changes, sessions
could break.
3. `needs_compression` uses `self.compression_threshold` (session-
level) with fallback to the global threshold. This priority
(session > global) is important behavior.
4. Session carry-over (the "incorporate last Q&A?" prompt) happens
inside `use_session` which is async and involves user interaction
(inquire::Confirm). Can't unit test this — needs integration test
or manual verification.
5. The `extract_role` test for session-active case should verify that
`session.to_role()` is returned. Added note to plan 02.
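
The threshold rules in observation 3, together with the two "returns false" tests above, can be sketched as follows. This `Session` struct, the `tokens` field, and the strict `>` comparison are assumptions for illustration only; the notes confirm just the session-over-global priority and the two early-exit cases.

```rust
/// Simplified stand-in for a session (hypothetical fields).
pub struct Session {
    pub compression_threshold: Option<usize>,
    pub compressing: bool,
    pub tokens: usize,
}

impl Session {
    /// Session-level threshold wins over the global one. A compression
    /// already in progress, or a threshold of zero, disables the check.
    pub fn needs_compression(&self, global_threshold: usize) -> bool {
        if self.compressing {
            return false;
        }
        let threshold = self.compression_threshold.unwrap_or(global_threshold);
        threshold > 0 && self.tokens > threshold
    }
}
```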
## Plan file updates
Updated 03-sessions.md to mark completed items.
## Next iteration
Plan file 04: Agents — agent init, tool compilation, variables,
lifecycle, MCP, RAG, auto-continuation.
+71
@@ -0,0 +1,71 @@
# Iteration 4 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/04-agents.md`
## Tests created
### src/config/agent.rs (4 new tests)
| Test name | What it verifies |
|---|---|
| `agent_config_parses_from_yaml` | Full AgentConfig YAML with all fields |
| `agent_config_defaults` | Minimal AgentConfig gets correct defaults |
| `agent_config_with_model` | model_id, temperature, top_p from YAML |
| `agent_config_inject_defaults_true` | inject_todo/spawn_instructions default true |
### src/config/agent_runtime.rs (2 new tests)
| Test name | What it verifies |
|---|---|
| `agent_runtime_new_defaults` | All fields default correctly |
| `agent_runtime_builder_pattern` | with_depth, with_parent_supervisor work |
### src/config/request_context.rs (6 new tests, 17 total)
| Test name | What it verifies |
|---|---|
| `exit_agent_clears_all_agent_state` | exit_agent clears agent, agent_runtime, rag |
| `current_depth_returns_zero_without_agent` | Default depth is 0 |
| `current_depth_returns_agent_runtime_depth` | Depth from agent_runtime |
| `supervisor_returns_none_without_agent` | No agent → no supervisor |
| `inbox_returns_none_without_agent` | No agent → no inbox |
| `root_escalation_queue_returns_none_without_agent` | No agent → no queue |
**Total: 12 new tests (105 → 117)**
## Bugs discovered
None.
## Observations for future iterations
1. `Agent::init` can't be unit tested easily — requires agent config
files, tool files on disk. Integration tests with temp directories
would be needed for full coverage.
2. AgentConfig default values verified:
- `max_concurrent_agents` = 4
- `max_agent_depth` = 3
- `max_auto_continues` = 10
- `inject_todo_instructions` = true
- `inject_spawn_instructions` = true
These are important behavioral contracts.
3. The `exit_agent` test shows that clearing agent state also
rebuilds the tool_scope with fresh functions. This is the
correct behavior for returning to the global context.
4. Agent variable interpolation (special vars like __os__, __cwd__)
happens in Agent::init which is filesystem-dependent. Deferred.
5. `list_agents()` (which filters hidden dirs) is tested via the
`.shared` exclusion noted in improvements. Could add a unit test
with a temp dir if needed.
## Next iteration
Plan file 05: MCP Lifecycle — the most critical test area. McpFactory,
McpRuntime, spawn_mcp_server, rebuild_tool_scope MCP integration,
scope transition MCP behavior.
@@ -0,0 +1,62 @@
# Test Plan: Config Loading and AppConfig
## Feature description
Loki loads its configuration from a YAML file (`config.yaml`) into
a `Config` struct, then converts it to `AppConfig` (immutable,
shared) + `RequestContext` (mutable, per-request). The `AppConfig`
holds all serialized fields; `RequestContext` holds runtime state.
## Behaviors to test
### Config loading
- [ ] Config loads from YAML file with all supported fields
- [x] Missing optional fields get correct defaults (config_defaults_match_expected)
- [ ] `model_id` defaults to first available model if empty (requires Config::init, integration test)
- [x] `temperature`, `top_p` default to `None`
- [x] `stream` defaults to `true`
- [x] `save` defaults to `false` (CORRECTED: was listed as true)
- [x] `highlight` defaults to `true`
- [x] `dry_run` defaults to `false`
- [x] `function_calling_support` defaults to `true`
- [x] `mcp_server_support` defaults to `true`
- [x] `compression_threshold` defaults to `4000`
- [ ] `document_loaders` populated from config and defaults (requires Config::init)
- [x] `clients` parsed from config (to_app_config_copies_clients)
### AppConfig conversion
- [x] `to_app_config()` copies all serialized fields correctly
- [x] `clients` field populated on AppConfig
- [ ] `visible_tools` correctly computed from `enabled_tools` config (deferred to plan 16)
- [x] `mapping_tools` correctly parsed
- [x] `mapping_mcp_servers` correctly parsed
- [ ] `user_agent` resolved (auto → crate name/version)
### RequestContext conversion
- [x] `to_request_context()` copies all runtime fields (to_request_context_creates_clean_state)
- [ ] `model` field populated with resolved model (requires Model::retrieve_model)
- [ ] `working_mode` set correctly (Repl vs Cmd)
- [x] `tool_scope` starts with default (empty)
- [x] `agent_runtime` starts as `None`
### AppConfig field accessors
- [x] `editor()` returns configured editor or $EDITOR
- [x] `light_theme()` returns theme flag
- [ ] `render_options()` returns options for markdown rendering
- [x] `sync_models_url()` returns configured or default URL
### Dynamic config updates
- [x] `update_app_config` closure correctly clones and replaces Arc
- [x] Changes to `dry_run`, `stream`, `save` persist across calls
- [x] Changes visible to subsequent `ctx.app.config` reads
## Context switching scenarios
- [ ] AppConfig remains immutable after construction (no field mutation)
- [ ] Multiple RequestContexts can share the same AppState
- [ ] Changing AppConfig fields (via clone-mutate-replace) doesn't
affect other references to the old Arc
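
The clone-mutate-replace pattern referenced above can be sketched with `std::sync::Arc`. The trimmed `AppConfig` and the flat `Ctx` are simplifications assumed for this example (the plan refers to the real path as `ctx.app.config`); only the three flags named in the plan are modeled.

```rust
use std::sync::Arc;

/// Trimmed stand-in for the immutable, shared configuration.
#[derive(Clone, Debug, PartialEq)]
pub struct AppConfig {
    pub dry_run: bool,
    pub stream: bool,
    pub save: bool,
}

/// Simplified context holding a shared handle to the configuration.
pub struct Ctx {
    pub config: Arc<AppConfig>,
}

impl Ctx {
    /// Clone the current config, apply the mutation to the copy, then swap
    /// in a new Arc. Holders of the old Arc keep seeing the old values.
    pub fn update_app_config(&mut self, mutate: impl FnOnce(&mut AppConfig)) {
        let mut next = (*self.config).clone();
        mutate(&mut next);
        self.config = Arc::new(next);
    }
}
```

The unchecked context-switching items map onto two assertions: the new value is visible through `ctx.config`, and a previously cloned `Arc` still reports the old value.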
## Old code reference
- `src/config/mod.rs`: `Config` struct, `Config::init`, defaults
- `src/config/bridge.rs`: `to_app_config`, `to_request_context`
- `src/config/app_config.rs`: `AppConfig` struct and methods
+68
@@ -0,0 +1,68 @@
# Test Plan: Roles
## Feature description
Roles define a system prompt + optional model/temperature/MCP config
that customizes LLM behavior. Roles can be built-in or user-defined
(markdown files). Roles are "role-likes" — sessions and agents also
implement the RoleLike trait.
## Behaviors to test
### Role loading
- [x] Built-in roles load correctly (shell, code)
- [ ] User-defined roles load from markdown files (requires filesystem)
- [x] Role parses model_id from metadata
- [x] Role parses temperature, top_p from metadata
- [x] Role parses enabled_tools from metadata
- [x] Role parses enabled_mcp_servers from metadata
- [ ] Role with no model_id inherits current model (requires retrieve_role + client config)
- [ ] Role with no temperature inherits from AppConfig (requires retrieve_role)
- [ ] Role with no top_p inherits from AppConfig (requires retrieve_role)
### retrieve_role
- [ ] Retrieves by name from file system
- [ ] Resolves model via Model::retrieve_model
- [ ] Falls back to current model if role has no model_id
- [ ] Sets temperature/top_p from AppConfig when role doesn't specify
### use_role (scope transition)
- [x] Sets role on RequestContext (use_role_obj_sets_role)
- [ ] Triggers rebuild_tool_scope (async, deferred to plan 05/08)
- [ ] MCP servers start if role has enabled_mcp_servers (deferred to plan 05)
- [ ] MCP meta functions added to function list (deferred to plan 05)
- [ ] Previous role cleared when switching (deferred to plan 08)
- [x] Role-like temperature/top_p take effect (role_set_temperature_works)
### exit_role
- [x] Clears role from RequestContext (exit_role_clears_role)
- [ ] Followed by bootstrap_tools to restore global tool scope (async, deferred)
- [ ] MCP servers from role are stopped (deferred to plan 05)
- [ ] Global MCP servers restored (deferred to plan 05)
### use_prompt (temp role)
- [x] Creates a TEMP_ROLE_NAME role with the prompt text (use_prompt_creates_temp_role)
- [x] Uses current model
- [x] Activates via use_role_obj
### extract_role
- [ ] Returns role from agent if agent active (deferred to plan 04)
- [ ] Returns role from session if session active with role (deferred to plan 03)
- [x] Returns standalone role if active (extract_role_returns_standalone_role)
- [x] Returns default role if none active (extract_role_returns_default_when_nothing_active)
### One-shot role messages (REPL)
- [ ] `.role coder write hello` sends message with role, then exits role
- [ ] Original state restored after one-shot
## Context switching scenarios
- [ ] Role → different role: old role replaced, MCP swapped
- [ ] Role → session: role cleared, session takes over
- [ ] Role with MCP → exit: MCP servers stop, global MCP restored
- [ ] No MCP → role with MCP: servers start
- [ ] Role with MCP → role without MCP: servers stop
## Old code reference
- `src/config/mod.rs`: `use_role`, `exit_role`, `retrieve_role`
- `src/config/role.rs`: `Role` struct, parsing
- `src/config/request_context.rs`: `use_role`, `exit_role`, `use_prompt`, `retrieve_role`
+66
@@ -0,0 +1,66 @@
# Test Plan: Sessions
## Feature description
Sessions persist conversation history across multiple turns. They
store messages, role context, model info, and optional MCP config.
Sessions can be temporary, named, or auto-named.
## Behaviors to test
### Session creation
- [ ] Temp session created with TEMP_SESSION_NAME
- [ ] Named session created at correct file path
- [ ] New session captures current role via extract_role
- [ ] New session captures save_session from AppConfig
- [ ] Session tracks model_id
### Session loading
- [ ] Named session loads from YAML file
- [ ] Loaded session resolves model via Model::retrieve_model
- [ ] Loaded session restores role_prompt if role exists
- [ ] Auto-named sessions (prefixed `_/`) handled correctly
### Session saving
- [ ] Session saved to correct path
- [ ] Session file contains messages, model_id, role info
- [ ] save_session flag controls whether session is persisted
- [ ] set_save_session_this_time overrides for current turn
### Session lifecycle
- [ ] use_session creates or loads session
- [ ] Already in session → error
- [ ] exit_session saves and clears
- [ ] empty_session clears messages but keeps session active
### Session carry-over
- [ ] New empty session with last_message prompts "incorporate?"
- [ ] If accepted, last Q&A added to session
- [ ] If declined, session starts fresh
- [ ] Only prompts when continuous and output not empty
### Session compression
- [ ] maybe_compress_session returns true when threshold exceeded
- [ ] compress_session reduces message count
- [ ] Compression message shown to user
- [ ] Session usable after compression
### Session autoname
- [ ] maybe_autoname_session returns true for new sessions
- [ ] Auto-naming sets session name based on content
- [ ] Autoname only triggers once per session
### Session info
- [ ] session_info returns formatted session details
- [ ] Shows message count, model, role, tokens
## Context switching scenarios
- [ ] Session → role change: role updated within session
- [ ] Session → exit session: messages saved, state cleared
- [ ] Agent session → exit: agent session cleanup
- [ ] Session with MCP → exit: MCP servers handled
## Old code reference
- `src/config/mod.rs`: `use_session`, `exit_session`, `empty_session`
- `src/config/session.rs`: `Session` struct, new, load, save
- `src/config/request_context.rs`: `use_session`, `exit_session`
+77
@@ -0,0 +1,77 @@
# Test Plan: Agents
## Feature description
Agents combine a role (instructions), tools (bash/python/ts scripts),
optional RAG, optional MCP servers, and optional sub-agent spawning
capability. Agent::init compiles tools, resolves model, loads RAG,
and sets up the agent environment.
## Behaviors to test
### Agent initialization
- [ ] Agent::init loads config.yaml from agent directory
- [ ] Agent tools compiled from tools.sh / tools.py / tools.ts
- [ ] Tool file priority: .sh > .py > .ts > .js
- [ ] Global tools loaded (from global_tools config)
- [ ] Model resolved from agent config or defaults to current
- [ ] Agent with no model_id uses current model
- [ ] Temperature/top_p from agent config applied
- [ ] Dynamic instructions (_instructions function) invoked if configured
- [ ] Static instructions loaded from config
- [ ] Agent variables interpolated into instructions
- [ ] Special variables (__os__, __cwd__, __now__, etc.) interpolated
- [ ] Agent .env file loaded if present
- [ ] Built-in agents installed on first run (skip if exists)
### Agent tools
- [ ] Agent-specific tools available as function declarations
- [ ] Global tools (from global_tools) also available
- [ ] Tool binaries built in agent bin directory
- [ ] clear_agent_bin_dir removes old binaries before rebuild
- [ ] Tool declarations include name, description, parameters
### Agent with MCP
- [ ] MCP servers listed in agent config started
- [ ] MCP meta functions (invoke/search/describe) added
- [ ] Agent with MCP but mcp_server_support=false → error
- [ ] MCP servers stopped on agent exit
### Agent with RAG
- [ ] RAG documents loaded from agent config
- [ ] RAG available during agent conversation
- [ ] RAG search results included in context
### Agent sessions
- [ ] Agent session started (temp or named)
- [ ] agent_session config used if no explicit session
- [ ] Agent session variables initialized
### Agent lifecycle
- [ ] use_agent checks function_calling_support
- [ ] use_agent errors if agent already active
- [ ] exit_agent clears agent, session, rag, supervisor
- [ ] exit_agent restores global tool scope
### Auto-continuation
- [ ] Agents with auto_continue=true continue after incomplete todos
- [ ] max_auto_continues limits continuation attempts
- [ ] Continuation prompt sent with todo state
- [ ] clear todo stops continuation
### Conversation starters
- [ ] Starters loaded from agent config
- [ ] .starter lists available starters
- [ ] .starter <n> sends the starter as a message
## Context switching scenarios
- [ ] Agent → exit: tools cleared, MCP stopped, session ended
- [ ] Agent with MCP → exit: MCP servers released, global MCP restored
- [ ] Already in agent → start agent: error
- [ ] Agent with RAG → exit: RAG cleared
## Old code reference
- `src/config/agent.rs` — Agent::init, agent config parsing
- `src/config/mod.rs` — use_agent, exit_agent
- `src/config/request_context.rs` — use_agent, exit_agent
- `src/function/mod.rs` — Functions::init_agent, tool compilation
+98
@@ -0,0 +1,98 @@
# Test Plan: MCP Server Lifecycle
## Feature description
MCP (Model Context Protocol) servers are external tools that run
as subprocesses communicating via stdio. Loki manages their lifecycle
through McpFactory (start/share via Weak dedup) and McpRuntime
(per-scope active server handles). Servers are started/stopped
during scope transitions (role/session/agent enter/exit).
## Behaviors to test
### MCP config loading
- [ ] mcp.json parsed correctly from functions directory
- [ ] Server specs include command, args, env, cwd
- [ ] Vault secrets interpolated in mcp.json
- [ ] Missing secrets reported as warnings
- [ ] McpServersConfig stored on AppState.mcp_config
### McpFactory
- [ ] acquire() spawns new server when none active
- [ ] acquire() returns existing handle via Weak upgrade
- [ ] acquire() spawns fresh when Weak is dead
- [ ] Multiple acquire() calls for same spec share handle
- [ ] Different specs get different handles
- [ ] McpServerKey built correctly from spec (sorted args/env)
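
The Weak-based sharing that these `acquire()` items describe can be sketched as below. `Server`, `Factory`, the string key, and `spawn_count` are hypothetical simplifications: the real `McpFactory` keys on `McpServerKey` and spawns a subprocess, neither of which is modeled here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Stand-in for a connected server handle.
pub struct Server {
    pub key: String,
}

/// Keeps only weak references, so a server is released once every
/// scope that acquired it has dropped its Arc.
#[derive(Default)]
pub struct Factory {
    active: HashMap<String, Weak<Server>>,
    pub spawn_count: usize,
}

impl Factory {
    pub fn acquire(&mut self, key: &str) -> Arc<Server> {
        // Reuse the live handle if the Weak can still be upgraded.
        if let Some(live) = self.active.get(key).and_then(|weak| weak.upgrade()) {
            return live;
        }
        // Otherwise "spawn" a fresh server and remember a Weak to it.
        let fresh = Arc::new(Server { key: key.to_string() });
        self.active.insert(key.to_string(), Arc::downgrade(&fresh));
        self.spawn_count += 1;
        fresh
    }
}
```

With this shape, the first three checklist items reduce to pointer-equality and spawn-count assertions, which need no real subprocess.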
### McpRuntime
- [ ] insert() adds server handle by name
- [ ] get() retrieves handle by name
- [ ] server_names() returns all active names
- [ ] is_empty() correct for empty/non-empty
- [ ] search() finds tools by keyword (BM25 ranking)
- [ ] describe() returns tool input schema
- [ ] invoke() calls tool on server and returns result
### spawn_mcp_server
- [ ] Builds Command from spec (command, args, env, cwd)
- [ ] Creates TokioChildProcess transport
- [ ] Completes rmcp handshake (serve)
- [ ] Returns Arc<ConnectedServer>
- [ ] Log file created when log_path provided
### rebuild_tool_scope (MCP integration)
- [ ] Empty enabled_mcp_servers → no servers acquired
- [ ] "all" → all configured servers acquired
- [ ] Comma-separated list → only listed servers acquired
- [ ] Mapping resolution: alias → actual server key(s)
- [ ] MCP meta functions appended for each started server
- [ ] Old ToolScope dropped (releasing old server handles)
- [ ] Loading spinner shown during acquisition
- [ ] AbortSignal properly threaded through
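
The first three selection rules above (empty, "all", comma-separated list) can be sketched as a pure function. `select_servers` is a hypothetical helper; alias mapping is left out, and silently dropping names that are not configured is an assumption of this sketch, not confirmed behavior.

```rust
/// Hypothetical helper: decide which configured servers to acquire
/// for a given `enabled_mcp_servers` setting.
pub fn select_servers(enabled: Option<&str>, configured: &[&str]) -> Vec<String> {
    match enabled.map(str::trim) {
        // Unset or empty: acquire nothing.
        None | Some("") => Vec::new(),
        // "all": every configured server.
        Some("all") => configured.iter().map(|name| name.to_string()).collect(),
        // Comma-separated list: only the listed, configured servers.
        Some(list) => list
            .split(',')
            .map(str::trim)
            .filter(|name| configured.contains(name))
            .map(String::from)
            .collect(),
    }
}
```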
### Server lifecycle during scope transitions
- [ ] Enter role with MCP: servers start
- [ ] Exit role: servers stop (handle dropped)
- [ ] Enter role A (MCP-X) → exit → enter role B (MCP-Y):
X stops, Y starts
- [ ] Enter role with MCP → exit to no MCP: servers stop,
global MCP restored
- [ ] Start REPL with global MCP → enter agent with different MCP:
agent MCP takes over
- [ ] Exit agent: agent MCP stops, global MCP restored
### MCP tool invocation chain
- [ ] LLM calls mcp__search_<server> → search results returned
- [ ] LLM calls mcp__describe_<server> tool_name → schema returned
- [ ] LLM calls mcp__invoke_<server> tool args → tool executed
- [ ] Server not found → "MCP server not found in runtime" error
- [ ] Tool not found → appropriate error
### MCP support flag
- [ ] mcp_server_support=false → no MCP servers started
- [ ] mcp_server_support=false + agent with MCP → error (blocks)
- [ ] mcp_server_support=false + role with MCP → warning, continues
- [ ] .set mcp_server_support true → MCP servers start
### MCP in child agents
- [ ] Child agent MCP servers acquired via factory
- [ ] Child agent MCP runtime populated
- [ ] Child agent MCP tool invocations work
- [ ] Child agent exit drops MCP handles
## Context switching scenarios (comprehensive)
- [ ] No MCP → role with MCP → exit role → no MCP
- [ ] Global MCP-A → role MCP-B → exit role → global MCP-A
- [ ] Global MCP-A → agent MCP-B → exit agent → global MCP-A
- [ ] Role MCP-A → session MCP-B (overrides) → exit session
- [ ] Agent MCP → child agent MCP → child exits → parent MCP intact
- [ ] .set enabled_mcp_servers X → .set enabled_mcp_servers Y:
X released, Y acquired
- [ ] .set enabled_mcp_servers null → all released
## Old code reference
- `src/mcp/mod.rs` — McpRegistry, init, reinit, start/stop
- `src/config/mcp_factory.rs` — McpFactory, acquire, McpServerKey
- `src/config/tool_scope.rs` — ToolScope, McpRuntime
- `src/config/request_context.rs` — rebuild_tool_scope, bootstrap_tools
+59
@@ -0,0 +1,59 @@
# Test Plan: Tool Evaluation
## Feature description
When the LLM returns tool calls, `eval_tool_calls` dispatches each
call to the appropriate handler. Handlers include: shell tools
(bash/python/ts scripts), MCP tools, supervisor tools (agent spawn),
todo tools, and user interaction tools.
## Behaviors to test
### eval_tool_calls dispatch
- [ ] Calls dispatched to correct handler by function name prefix
- [ ] Tool results returned for each call
- [ ] Multiple concurrent tool calls processed
- [ ] Tool call tracker updated (chain length, repeats)
- [ ] Root agent (depth 0) checks escalation queue after eval
- [ ] Escalation notifications injected into results
### ToolCall::eval routing
- [ ] agent__* → handle_supervisor_tool
- [ ] todo__* → handle_todo_tool
- [ ] user__* → handle_user_tool (depth 0) or escalate (depth > 0)
- [ ] mcp_invoke_* → invoke_mcp_tool
- [ ] mcp_search_* → search_mcp_tools
- [ ] mcp_describe_* → describe_mcp_tool
- [ ] Other → shell tool execution
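The routing table above can be sketched as a pure function, which makes the prefix rules easy to cover with a table-driven unit test. The `Route` enum and `route` function are hypothetical names for illustration; the real dispatch lives in `ToolCall::eval` and calls the handlers directly.

```rust
/// Hypothetical route labels, one per handler named in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    Supervisor,
    Todo,
    User,
    Escalate,
    McpInvoke,
    McpSearch,
    McpDescribe,
    Shell,
}

/// Pick a route from the function name prefix and the agent depth.
fn route(name: &str, depth: usize) -> Route {
    if name.starts_with("agent__") {
        Route::Supervisor
    } else if name.starts_with("todo__") {
        Route::Todo
    } else if name.starts_with("user__") {
        // Only the root agent (depth 0) may prompt the user directly.
        if depth == 0 { Route::User } else { Route::Escalate }
    } else if name.starts_with("mcp_invoke_") {
        Route::McpInvoke
    } else if name.starts_with("mcp_search_") {
        Route::McpSearch
    } else if name.starts_with("mcp_describe_") {
        Route::McpDescribe
    } else {
        // Anything without a known prefix falls through to shell tools.
        Route::Shell
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn user_tools_escalate_below_root() {
        assert_eq!(route("user__ask", 0), Route::User);
        assert_eq!(route("user__ask", 1), Route::Escalate);
    }
}
```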
### Shell tool execution
- [ ] Tool binary found and executed
- [ ] Arguments passed correctly
- [ ] Environment variables set (LLM_OUTPUT, etc.)
- [ ] Tool output returned as result
- [ ] Tool failure → error returned as tool result (not panic)
### Tool call tracking
- [ ] Tracker counts consecutive identical calls
- [ ] Max repeats triggers warning
- [ ] Chain length tracked across turns
- [ ] Tracker state preserved across tool-result loops
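A rough sketch of the tracking behavior, assuming "identical" means same function name and same serialized arguments, and that the warning fires once the consecutive count reaches a configured limit. `CallTracker` and its fields are hypothetical names, not the real tracker type.

```rust
/// Hypothetical tracker: counts consecutive identical calls and total chain length.
#[derive(Debug)]
struct CallTracker {
    last: Option<(String, String)>,
    repeats: usize,
    chain_len: usize,
    max_repeats: usize,
}

impl CallTracker {
    fn new(max_repeats: usize) -> Self {
        Self { last: None, repeats: 0, chain_len: 0, max_repeats }
    }

    /// Record one call. Returns true when the caller should emit a warning
    /// because the same call has now been seen `max_repeats` times in a row.
    fn record(&mut self, name: &str, args: &str) -> bool {
        let key = (name.to_string(), args.to_string());
        if self.last.as_ref() == Some(&key) {
            self.repeats += 1;
        } else {
            // A different call resets the consecutive counter.
            self.last = Some(key);
            self.repeats = 1;
        }
        // Chain length keeps growing regardless of whether calls repeat.
        self.chain_len += 1;
        self.repeats >= self.max_repeats
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn warns_on_third_identical_call() {
        let mut tracker = CallTracker::new(3);
        assert!(!tracker.record("fs_cat", "{}"));
        assert!(!tracker.record("fs_cat", "{}"));
        assert!(tracker.record("fs_cat", "{}"));
    }
}
```

Because the tracker is plain state with no I/O, the "preserved across tool-result loops" check reduces to passing the same instance through successive `record` calls and asserting on `chain_len`.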
### Function selection
- [ ] select_functions filters by role's enabled_tools
- [ ] select_functions includes MCP meta functions for enabled servers
- [ ] select_functions includes agent functions when agent active
- [ ] "all" enables all functions
- [ ] Comma-separated list enables specific functions
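The `enabled_tools` filtering rules can be illustrated with a small standalone function. This is only a sketch: `select_by_enabled_tools` is a hypothetical helper, the tool names in the test are made up, and treating a missing or empty value as "no tools" is an assumption that should be checked against the old code.

```rust
/// Filter `available` tool names by an `enabled_tools` setting.
/// Assumption: `None` or an empty string selects nothing.
fn select_by_enabled_tools<'a>(
    available: &[&'a str],
    enabled_tools: Option<&str>,
) -> Vec<&'a str> {
    match enabled_tools.map(str::trim) {
        None | Some("") => Vec::new(),
        // "all" enables every available function.
        Some("all") => available.to_vec(),
        // Otherwise treat the value as a comma-separated allow-list.
        Some(list) => {
            let wanted: Vec<&str> = list
                .split(',')
                .map(str::trim)
                .filter(|name| !name.is_empty())
                .collect();
            available
                .iter()
                .copied()
                .filter(|name| wanted.contains(name))
                .collect()
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn comma_list_selects_named_tools_only() {
        let available = ["fs_cat", "fs_ls", "web_search"];
        let picked = select_by_enabled_tools(&available, Some("fs_ls, web_search"));
        assert_eq!(picked, vec!["fs_ls", "web_search"]);
    }
}
```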
## Context switching scenarios
- [ ] Tool calls during agent → agent tools available
- [ ] Tool calls during role → role tools available
- [ ] Tool calls with MCP → MCP invoke/search/describe work
- [ ] No agent → no agent__/todo__ tools in declarations
## Old code reference
- `src/function/mod.rs` — eval_tool_calls, ToolCall::eval
- `src/function/supervisor.rs` — handle_supervisor_tool
- `src/function/todo.rs` — handle_todo_tool
- `src/function/user_interaction.rs` — handle_user_tool
@@ -0,0 +1,58 @@
# Test Plan: Input Construction
## Feature description
`Input` encapsulates a single chat turn's data: text, files, role,
model, session context, RAG embeddings, and function declarations.
It's constructed at the start of each turn and captures all needed
state from `RequestContext`.
## Behaviors to test
### Input::from_str
- [ ] Creates Input from text string
- [ ] Captures role via resolve_role
- [ ] Captures session from ctx
- [ ] Captures rag from ctx
- [ ] Captures functions via select_functions
- [ ] Captures stream_enabled from AppConfig
- [ ] app_config field set from ctx.app.config
- [ ] Empty text → is_empty() returns true
### Input::from_files
- [ ] Loads file contents
- [ ] Supports multiple files
- [ ] Supports directories (recursive)
- [ ] Supports URLs (fetches content)
- [ ] Supports loader syntax (e.g., jina:url)
- [ ] Last message carry-over (%% syntax)
- [ ] Combines file content with text
- [ ] document_loaders from AppConfig used
### resolve_role
- [ ] Returns provided role if given
- [ ] Extracts role from agent if agent active
- [ ] Extracts role from session if session has role
- [ ] Returns default model-based role otherwise
- [ ] with_session flag set correctly
- [ ] with_agent flag set correctly
### Input methods
- [ ] stream() returns stream_enabled && !model.no_stream()
- [ ] create_client() uses app_config to init client
- [ ] prepare_completion_data() uses captured functions
- [ ] build_messages() uses captured session
- [ ] echo_messages() uses captured session
- [ ] set_regenerate(role) refreshes role
- [ ] use_embeddings() searches RAG if present
- [ ] merge_tool_results() creates continuation input
## Context switching scenarios
- [ ] Input with agent → agent functions selected
- [ ] Input with MCP → MCP meta functions in declarations
- [ ] Input with RAG → embeddings included after use_embeddings
- [ ] Input without session → no session messages in build_messages
## Old code reference
- `src/config/input.rs` — Input struct, from_str, from_files
- `src/config/mod.rs` — select_functions, extract_role
@@ -0,0 +1,69 @@
# Test Plan: RequestContext
## Feature description
`RequestContext` is the per-request mutable state container. It holds
the active model, role, session, agent, RAG, tool scope, and agent
runtime. It provides methods for scope transitions, state queries,
and chat completion lifecycle.
## Behaviors to test
### State management
- [ ] info() returns formatted system info
- [ ] state() returns correct StateFlags combination
- [ ] current_model() returns active model
- [ ] role_info(), session_info(), rag_info(), agent_info() format correctly
- [ ] sysinfo() returns system details
- [ ] working_mode correctly distinguishes Repl vs Cmd
### Scope transitions
- [ ] use_role changes role, rebuilds tool scope
- [ ] use_session creates/loads session, rebuilds tool scope
- [ ] use_agent initializes agent with all subsystems
- [ ] exit_role clears role
- [ ] exit_session saves and clears session
- [ ] exit_agent clears agent, supervisor, rag, session
- [ ] exit_rag clears rag
- [ ] bootstrap_tools rebuilds tool scope with global MCP
### Chat completion lifecycle
- [ ] before_chat_completion sets up for API call
- [ ] after_chat_completion saves messages, updates state
- [ ] discontinuous_last_message marks last message as non-continuous
### ToolScope management
- [ ] rebuild_tool_scope creates fresh Functions
- [ ] rebuild_tool_scope acquires MCP servers via factory
- [ ] rebuild_tool_scope appends user interaction functions in REPL mode
- [ ] rebuild_tool_scope appends MCP meta functions for started servers
- [ ] Tool tracker preserved across scope rebuilds
### AgentRuntime management
- [ ] agent_runtime populated by use_agent
- [ ] agent_runtime cleared by exit_agent
- [ ] Accessor methods (current_depth, supervisor, inbox, etc.) return
correct values when agent active
- [ ] Accessor methods return defaults when no agent
### Settings update
- [ ] update() handles all .set keys correctly
- [ ] update_app_config() clones and replaces Arc properly
- [ ] delete() handles all delete subcommands
### Session helpers
- [ ] list_sessions() returns session names
- [ ] list_autoname_sessions() returns auto-named sessions
- [ ] session_file() returns correct path
- [ ] save_session() persists session
- [ ] empty_session() clears messages
## Context switching scenarios
- [ ] No state → use_role → exit_role → no state
- [ ] No state → use_agent → exit_agent → no state
- [ ] Role → use_agent (error: agent requires exiting role first)
- [ ] Agent → exit_agent → use_role (clean transition)
## Old code reference
- `src/config/request_context.rs` — all methods
- `src/config/mod.rs` — original Config methods (for parity)
@@ -0,0 +1,61 @@
# Test Plan: REPL Commands
## Feature description
The REPL processes dot-commands (`.role`, `.session`, `.agent`, etc.)
and plain text (chat messages). Each command has state assertions
(e.g., `.info role` requires an active role).
## Behaviors to test
### Command parsing
- [ ] Dot-commands parsed correctly (command + args)
- [ ] Multi-line input (:::) handled
- [ ] Plain text treated as chat message
- [ ] Empty input ignored
### State assertions (REPL_COMMANDS array)
- [ ] Each command's assert_state enforced correctly
- [ ] Invalid state → command rejected with appropriate error
- [ ] Commands with AssertState::pass() always available
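To make the state-assertion checks concrete, here is a simplified model using plain bit masks. The flag constants, the two `AssertState` variants, and `check` are assumptions for illustration only; the real `StateFlags` and `AssertState` definitions should be read from the source before writing the actual tests.

```rust
/// Hypothetical scope bits; the real StateFlags may differ.
const ROLE: u8 = 1 << 0;
const SESSION: u8 = 1 << 1;
const AGENT: u8 = 1 << 2;
const RAG: u8 = 1 << 3;

/// Simplified rule attached to a REPL command.
#[derive(Debug, Clone, Copy)]
enum AssertState {
    /// At least one of these bits must be active.
    AnyOf(u8),
    /// None of these bits may be active.
    NoneOf(u8),
}

impl AssertState {
    /// Always satisfied: forbidding the empty set rejects nothing.
    fn pass() -> Self {
        AssertState::NoneOf(0)
    }

    fn holds(self, state: u8) -> bool {
        match self {
            AssertState::AnyOf(mask) => (state & mask) != 0,
            AssertState::NoneOf(mask) => (state & mask) == 0,
        }
    }
}

/// Reject a command with an error message when its rule does not hold.
fn check(command: &str, rule: AssertState, state: u8) -> Result<(), String> {
    if rule.holds(state) {
        Ok(())
    } else {
        Err(format!("`{}` is unavailable in the current state", command))
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn info_role_requires_active_role() {
        assert!(check(".info role", AssertState::AnyOf(ROLE), SESSION | RAG).is_err());
        assert!(check(".info role", AssertState::AnyOf(ROLE), ROLE | AGENT).is_ok());
    }
}
```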
### Command handlers (each one)
- [ ] .help — prints help text
- [ ] .info [subcommand] — displays appropriate info
- [ ] .model <name> — switches model
- [ ] .prompt <text> — sets temp role
- [ ] .role <name> [text] — enters role or one-shot
- [ ] .session [name] — starts/resumes session
- [ ] .agent <name> [session] [key=value] — starts agent
- [ ] .rag [name] — initializes RAG
- [ ] .starter [n] — lists or executes conversation starter
- [ ] .set <key> <value> — updates setting
- [ ] .delete <type> — deletes item
- [ ] .exit [type] — exits scope or REPL
- [ ] .save role/session [name] — saves to file
- [ ] .edit role/session/config/agent-config/rag-docs — opens editor
- [ ] .empty session — clears session
- [ ] .compress session — compresses session
- [ ] .rebuild rag — rebuilds RAG
- [ ] .sources rag — shows RAG sources
- [ ] .copy — copies last response
- [ ] .continue — continues response
- [ ] .regenerate — regenerates response
- [ ] .file <path> [-- text] — includes files
- [ ] .macro <name> [text] — runs/creates macro
- [ ] .authenticate — OAuth flow
- [ ] .vault <cmd> [name] — vault operations
- [ ] .clear todo — clears agent todo
### ask function (chat flow)
- [ ] Input constructed from text
- [ ] Embeddings applied if RAG active
- [ ] Waits for compression to complete
- [ ] before_chat_completion called
- [ ] Streaming vs non-streaming based on config
- [ ] Tool results loop (recursive ask with merged results)
- [ ] after_chat_completion called
- [ ] Auto-continuation for agents with todos
## Old code reference
- `src/repl/mod.rs` — run_repl_command, ask, REPL_COMMANDS
@@ -0,0 +1,56 @@
# Test Plan: CLI Flags
## Feature description
Loki CLI accepts flags for model, role, session, agent, file input,
execution mode, and various info/list commands. Flags determine
the execution path through main.rs.
## Behaviors to test
### Early-exit flags
- [ ] --info prints info and exits
- [ ] --list-models prints models and exits
- [ ] --list-roles prints roles and exits
- [ ] --list-sessions prints sessions and exits
- [ ] --list-agents prints agents and exits
- [ ] --list-rags prints RAGs and exits
- [ ] --list-macros prints macros and exits
- [ ] --sync-models fetches models and exits
- [ ] --build-tools (with --agent) builds and exits
- [ ] --authenticate runs OAuth and exits
- [ ] --completions generates shell completions and exits
- [ ] Vault flags (--add/get/update/delete-secret, --list-secrets) run and exit
### Mode selection
- [ ] No text/file → REPL mode
- [ ] Text provided → command mode (single-shot)
- [ ] --agent → agent mode
- [ ] --role → role mode
- [ ] --execute (-e) → shell execute mode
- [ ] --code (-c) → code output mode
- [ ] --prompt → temp role mode
- [ ] --macro → macro execution mode
### Flag combinations
- [ ] --model + any mode → model applied
- [ ] --session + --role → session with role
- [ ] --session + --agent → agent with session
- [ ] --agent + --agent-variable → variables set
- [ ] --dry-run + any mode → input shown, no API call
- [ ] --no-stream + any mode → non-streaming response
- [ ] --file + text → file content + text combined
- [ ] --empty-session + --session → fresh session
- [ ] --save-session + --session → force save
### Prelude
- [ ] apply_prelude runs before main execution
- [ ] Prelude "role:name" loads role
- [ ] Prelude "session:name" loads session
- [ ] Prelude "<session>:<role>" loads both
- [ ] Prelude skipped if macro_flag set
- [ ] Prelude skipped if state already has role/session/agent
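The three prelude forms listed above could be parsed roughly as follows. This sketch assumes the third form is a session name followed by a role name, that the literal prefixes `role` and `session` take priority, and that empty names are invalid; `Prelude` and `parse_prelude` are hypothetical names, not the actual `apply_prelude` internals.

```rust
/// Hypothetical parsed form of the prelude setting.
#[derive(Debug, PartialEq, Eq)]
enum Prelude {
    Role(String),
    Session(String),
    SessionWithRole { session: String, role: String },
}

/// Parse a prelude string; returns None when it has no usable `left:right` shape.
fn parse_prelude(value: &str) -> Option<Prelude> {
    let (left, right) = value.split_once(':')?;
    if left.is_empty() || right.is_empty() {
        return None;
    }
    Some(match left {
        "role" => Prelude::Role(right.to_string()),
        "session" => Prelude::Session(right.to_string()),
        // Any other left-hand side is read as a session name.
        session => Prelude::SessionWithRole {
            session: session.to_string(),
            role: right.to_string(),
        },
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn combined_form_loads_both() {
        assert_eq!(
            parse_prelude("work:coder"),
            Some(Prelude::SessionWithRole {
                session: "work".to_string(),
                role: "coder".to_string(),
            })
        );
    }
}
```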
## Old code reference
- `src/cli/mod.rs` — Cli struct, flag definitions
- `src/main.rs` — run(), flag processing, mode branching
@@ -0,0 +1,59 @@
# Test Plan: Sub-Agent Spawning
## Feature description
Agents with can_spawn_agents=true can spawn child agents that run
in parallel as background tokio tasks. Children communicate results
back to the parent via collect/check. Escalation allows children
to request user input through the parent.
## Behaviors to test
### Spawn
- [ ] agent__spawn creates child agent in background
- [ ] Child gets own RequestContext with incremented depth
- [ ] Child gets own session, model, functions
- [ ] Child gets shared root_escalation_queue
- [ ] Child gets inbox for teammate messaging
- [ ] Child MCP servers acquired if configured
- [ ] Max concurrent agents enforced
- [ ] Max depth enforced
- [ ] Agent not found → error
- [ ] can_spawn_agents=false → no spawn tools available
### Collect/Check
- [ ] agent__check returns PENDING or result
- [ ] agent__collect blocks until done, returns output
- [ ] Output summarized when it exceeds threshold
- [ ] Summarization uses configured model
### Task queue
- [ ] agent__task_create creates tasks with dependencies
- [ ] agent__task_complete marks done, unblocks dependents
- [ ] Auto-dispatch spawns agent for unblocked tasks
- [ ] agent__task_list shows all tasks with status
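The dependency behavior behind `agent__task_create` and `agent__task_complete` can be modelled as below. This is a sketch under assumptions: numeric task ids, and a `complete` call that returns the tasks it just unblocked so a test can assert on what auto-dispatch would pick up. `TaskQueue` here is a hypothetical stand-in, not the supervisor's real type.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical task queue: each pending task maps to the ids it still waits on.
#[derive(Debug, Default)]
struct TaskQueue {
    pending: BTreeMap<u32, BTreeSet<u32>>,
    done: BTreeSet<u32>,
}

impl TaskQueue {
    /// Register a task; dependencies that are already done are not waited on.
    fn create(&mut self, id: u32, deps: &[u32]) {
        let waiting: BTreeSet<u32> = deps
            .iter()
            .copied()
            .filter(|dep| !self.done.contains(dep))
            .collect();
        self.pending.insert(id, waiting);
    }

    /// Pending tasks with no outstanding dependencies, in id order.
    fn ready(&self) -> Vec<u32> {
        self.pending
            .iter()
            .filter(|(_, waiting)| waiting.is_empty())
            .map(|(id, _)| *id)
            .collect()
    }

    /// Mark a task done and return the tasks that became unblocked by it.
    fn complete(&mut self, id: u32) -> Vec<u32> {
        if self.pending.remove(&id).is_none() {
            return Vec::new(); // unknown or already completed task
        }
        self.done.insert(id);
        let mut unblocked = Vec::new();
        for (task, waiting) in self.pending.iter_mut() {
            if waiting.remove(&id) && waiting.is_empty() {
                unblocked.push(*task);
            }
        }
        unblocked
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn completing_a_task_unblocks_its_dependents() {
        let mut queue = TaskQueue::default();
        queue.create(1, &[]);
        queue.create(2, &[1]);
        assert_eq!(queue.ready(), vec![1]);
        assert_eq!(queue.complete(1), vec![2]);
    }
}
```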
### Escalation
- [ ] Child calls user__ask → escalation created
- [ ] Parent sees pending_escalations notification
- [ ] agent__reply_escalation unblocks child
- [ ] Escalation timeout → fallback message
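The timeout case is the easiest one to get wrong in a test, so a small model may help. The sketch below swaps in a blocking `std::sync::mpsc` channel instead of the tokio primitives the real code presumably uses, purely to stay dependency-free; `await_reply` and the fallback text are hypothetical.

```rust
use std::sync::mpsc::Receiver;
use std::time::Duration;

/// Wait for the parent's reply to an escalation, or give the child a
/// fallback message when nothing arrives within `timeout`.
fn await_reply(rx: &Receiver<String>, timeout: Duration, fallback: &str) -> String {
    // recv_timeout errors on both timeout and a dropped sender;
    // either way the child receives the fallback.
    rx.recv_timeout(timeout)
        .unwrap_or_else(|_| fallback.to_string())
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::sync::mpsc::channel;

    #[test]
    fn reply_unblocks_child_and_timeout_falls_back() {
        let (tx, rx) = channel::<String>();
        tx.send("approved".to_string()).unwrap();
        assert_eq!(await_reply(&rx, Duration::from_millis(50), "no reply"), "approved");
        // No second reply is sent, so the wait times out.
        assert_eq!(await_reply(&rx, Duration::from_millis(10), "no reply"), "no reply");
    }
}
```

Keeping the timeout short and injectable avoids slow tests; the real test would pass a small duration through whatever configuration the escalation queue exposes.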
### Teammate messaging
- [ ] agent__send_message delivers to sibling inbox
- [ ] agent__check_inbox drains messages
### Child agent lifecycle
- [ ] run_child_agent loops: create input → call completions → process results
- [ ] Child uses before/after_chat_completion
- [ ] Child tool calls evaluated via eval_tool_calls
- [ ] Child exits cleanly, supervisor cancels on completion
## Context switching scenarios
- [ ] Parent spawns child with MCP → child MCP works independently
- [ ] Parent exits agent → all children cancelled
- [ ] Multiple children share escalation queue correctly
## Old code reference
- `src/function/supervisor.rs` — all handler functions
- `src/supervisor/` — Supervisor, EscalationQueue, Inbox, TaskQueue
@@ -0,0 +1,17 @@
# Test Plan: RAG
## Behaviors to test
- [ ] Rag::init creates new RAG with embedding model
- [ ] Rag::load loads existing RAG from disk
- [ ] Rag::create builds vector store from documents
- [ ] Rag::refresh_document_paths updates document list
- [ ] RAG search returns relevant embeddings
- [ ] RAG template formats context + sources + input
- [ ] Reranker model applied when configured
- [ ] top_k controls number of results
- [ ] RAG sources tracked for .sources command
- [ ] exit_rag clears RAG from context
## Old code reference
- `src/rag/mod.rs` — Rag struct and methods
- `src/config/request_context.rs` — use_rag, edit_rag_docs, rebuild_rag
@@ -0,0 +1,30 @@
# Test Plan: Tab Completion and Prompt
## Behaviors to test
### Tab completion (repl_complete)
- [ ] .role<TAB> → role names (no hidden files)
- [ ] .agent<TAB> → agent names (no .shared)
- [ ] .session<TAB> → session names
- [ ] .rag<TAB> → RAG names
- [ ] .macro<TAB> → macro names
- [ ] .model<TAB> → model names with descriptions
- [ ] .set <TAB> → setting keys (sorted)
- [ ] .set temperature <TAB> → current value suggestions
- [ ] .set enabled_tools <TAB> → tool names (no internal tools)
- [ ] .set enabled_mcp_servers <TAB> → configured servers + aliases
- [ ] .delete <TAB> → type names
- [ ] .vault <TAB> → subcommands
- [ ] .agent <name> <TAB> → session names for that agent
- [ ] Fuzzy filtering applied to all completions
### Prompt rendering
- [ ] Left prompt shows role/session/agent name
- [ ] Right prompt shows model name
- [ ] Prompt updates after scope transitions
- [ ] Multi-line indicator shown during ::: input
## Old code reference
- `src/config/request_context.rs` — repl_complete
- `src/repl/completer.rs` — ReplCompleter
- `src/repl/prompt.rs` — ReplPrompt
@@ -0,0 +1,14 @@
# Test Plan: Macros
## Behaviors to test
- [ ] Macro loaded from YAML file
- [ ] Macro steps executed sequentially
- [ ] Each step runs through run_repl_command
- [ ] Variable interpolation in macro steps
- [ ] Built-in macros installed on first run
- [ ] macro_execute creates isolated RequestContext
- [ ] Macro context inherits tool scope from parent
- [ ] Macro context has macro_flag set
## Old code reference
- `src/config/macros.rs` — macro_execute, Macro struct
@@ -0,0 +1,16 @@
# Test Plan: Vault
## Behaviors to test
- [ ] Vault add stores encrypted secret
- [ ] Vault get decrypts and returns secret
- [ ] Vault update replaces secret value
- [ ] Vault delete removes secret
- [ ] Vault list shows all secret names
- [ ] Secrets interpolated in MCP config (mcp.json)
- [ ] Missing secrets produce warning during MCP init
- [ ] Vault accessible from REPL (.vault commands)
- [ ] Vault accessible from CLI (--add/get/update/delete-secret)
## Old code reference
- `src/vault/mod.rs` — GlobalVault, operations
- `src/mcp/mod.rs` — interpolate_secrets
@@ -0,0 +1,43 @@
# Test Plan: Functions and Tools
## Behaviors to test
### Function declarations
- [ ] Functions::init loads from visible_tools config
- [ ] Tool declarations parsed from bash scripts (argc annotations)
- [ ] Tool declarations parsed from python scripts (docstrings)
- [ ] Tool declarations parsed from typescript (JSDoc + type inference)
- [ ] Each declaration has name, description, parameters
- [ ] Agent tools loaded via Functions::init_agent
- [ ] Global tools loaded via build_global_tool_declarations
### Tool compilation
- [ ] Bash tools compiled to bin directory
- [ ] Python tools compiled to bin directory
- [ ] TypeScript tools compiled to bin directory
- [ ] clear_agent_bin_dir removes old binaries
- [ ] Tool file priority: .sh > .py > .ts > .js
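The file priority rule (.sh > .py > .ts > .js) is a good candidate for a pure unit test. The helper below is a hypothetical sketch of that rule, not the actual `agent_functions_file` logic, and the file names in the test are made up.

```rust
use std::path::Path;

/// Extensions in descending priority, as stated in the plan.
const PRIORITY: [&str; 4] = ["sh", "py", "ts", "js"];

/// Return the candidate whose extension ranks highest, if any matches.
fn pick_tool_file<'a>(candidates: &[&'a str]) -> Option<&'a str> {
    PRIORITY.iter().find_map(|ext| {
        candidates.iter().copied().find(|path| {
            Path::new(path).extension().and_then(|e| e.to_str()) == Some(*ext)
        })
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn python_wins_over_typescript_and_javascript() {
        let files = ["tools.js", "tools.py", "tools.ts"];
        assert_eq!(pick_tool_file(&files), Some("tools.py"));
    }
}
```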
### User interaction functions
- [ ] append_user_interaction_functions adds user__ask/confirm/input/checkbox
- [ ] Only appended in REPL mode
- [ ] User interaction tools work at depth 0 (direct prompt)
- [ ] User interaction tools escalate at depth > 0
### MCP meta functions
- [ ] append_mcp_meta_functions adds invoke/search/describe per server
- [ ] Meta functions removed when ToolScope rebuilt without those servers
- [ ] Function names follow mcp_invoke_<server> pattern
### Function selection
- [ ] select_functions filters by role's enabled_tools
- [ ] "all" enables everything
- [ ] Specific tool names enabled selectively
- [ ] mapping_tools aliases resolved
- [ ] Agent functions included when agent active
- [ ] MCP meta functions included when servers active
## Old code reference
- `src/function/mod.rs` — Functions struct, init, init_agent
- `src/config/paths.rs` — agent_functions_file (priority)
- `src/parsers/` — bash, python, typescript parsers