docs: Documentation for the RESTful API POC

2026-05-01 14:45:13 -06:00
parent 69648afe27
commit e0b15b3f63
73 changed files with 2 additions and 2 deletions
@@ -0,0 +1,108 @@
# Phase 1 QA — Test Implementation Plan
## Purpose
Verify that all existing Loki behaviors are preserved after the
Phase 1 refactoring (Config god-state → AppState + RequestContext
split). Tests should validate behavior, not implementation details,
unless a specific implementation pattern is fragile and needs
regression protection.
## Reference codebases
- **Old code**: `~/code/testing/loki` (branch: `develop`)
- **New code**: `~/code/loki` (branch: working branch with Phase 1)
## Process (per iteration)
1. Read the previous iteration's test implementation notes (if any)
2. Read the test plan file for the current feature area
3. Read the old code to identify the logic that creates those flows
4. While reading old code:
   - Note additional behaviors not in the plan file → update the file
   - Note feature overlaps / context-switching scenarios → add tests
5. Create unit/integration tests in the new code
6. Ensure all tests pass
7. Write test implementation notes for the iteration
8. Pause for user approval before proceeding to next iteration
## Test philosophy
- **Behavior over implementation**: Test what the system DOES, not
HOW it does it internally
- **Exception**: If implementation logic is fragile and a slight
change would break Loki, add an implementation-specific test
- **No business logic changes**: Only modify non-test code if a
genuine bug is discovered (old behavior missing in new code)
- **Context switching**: Pay special attention to state transitions
(role→agent, MCP-enabled→disabled, etc.)
## Test location
All new tests go either in the `tests/` directory as integration tests
or inline as `#[cfg(test)] mod tests` in the relevant source file,
depending on what's being tested:
- **Unit tests** (pure logic, no I/O): inline in source file
- **Integration tests** (multi-module, state transitions): `tests/`
- **Behavior tests** (config parsing, tool resolution): can be either
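
As a minimal sketch of the inline layout, assuming a hypothetical pure helper `normalize_name` (it is not part of Loki and only illustrates where the test module sits):

```rust
/// Hypothetical pure helper, used only to illustrate the layout.
pub fn normalize_name(raw: &str) -> String {
    raw.trim().to_lowercase()
}

// Unit tests live next to the code they exercise and are compiled
// only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn normalize_name_trims_and_lowercases() {
        assert_eq!(normalize_name("  Shell "), "shell");
    }
}
```

Anything that needs several modules or real state transitions would instead go in a file under `tests/`.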
## Feature areas (test plan files)
Each feature area has a plan file in `docs/testing/plans/`. The
files are numbered for execution order (dependencies first):
| # | File | Feature area | Priority | Status |
|---|---|---|---|---|
| 01 | `01-config-and-appconfig.md` | Config loading, AppConfig fields, defaults | High | ✅ Iter 1-4 |
| 02 | `02-roles.md` | Role loading, retrieval, role-likes, temp roles | High | ✅ Iter 1-4 |
| 03 | `03-sessions.md` | Session create/load/save, compression, autoname | High | ✅ Iter 1-4 |
| 04 | `04-agents.md` | Agent init, tool compilation, variables, lifecycle | Critical | ✅ Iter 1-4 |
| 05 | `05-mcp-lifecycle.md` | MCP server start/stop, factory, runtime, scope transitions | Critical | ✅ Iter 5 |
| 06 | `06-tool-evaluation.md` | eval_tool_calls, ToolCall dispatch, tool handlers | Critical | ✅ Iter 6 |
| 07 | `07-input-construction.md` | Input::from_str, from_files, field capturing, function selection | High | ✅ Iter 7 |
| 08 | `08-request-context.md` | RequestContext methods, scope transitions, state management | Critical | ✅ Iter 8 |
| 09 | `09-repl-commands.md` | REPL command handlers, state assertions, argument parsing | High | ✅ Iter 9 |
| 10 | `10-cli-flags.md` | CLI argument handling, mode switching, early exits | High | ✅ Iter 10 |
| 11 | `11-sub-agent-spawning.md` | Supervisor, child agents, escalation, messaging | Critical | ✅ Iter 11 |
| 12 | `12-rag.md` | RAG init/load/search, embeddings, document management | Medium | ✅ Iter 12 |
| 13 | `13-completions-and-prompt.md` | Tab completion, prompt rendering, highlighter | Medium | ✅ Iter 13 |
| 14 | `14-macros.md` | Macro loading, execution, variable interpolation | Medium | ✅ Iter 13 |
| 15 | `15-vault.md` | Secret management, interpolation in MCP config | Medium | ✅ Iter 13 |
| 16 | `16-functions-and-tools.md` | Function declarations, tool compilation, binaries | High | ✅ Iter 13 |
## Iteration tracking
Each completed iteration produces a notes file at:
`docs/testing/notes/ITERATION-<N>-NOTES.md`
These notes contain:
- Which plan file(s) were addressed
- Tests created (file paths, test names)
- Bugs discovered (if any)
- Observations for future iterations
- Updates made to other plan files
## Intentional improvements (NEW ≠ OLD)
These are behavioral changes that are intentional and should NOT
be tested for old-code parity:
| # | What | Old | New |
|---|---|---|---|
| 1 | Agent list hides `.shared` | Shown | Hidden |
| 2 | Tool file priority | Filesystem order | .sh > .py > .ts > .js |
| 3 | MCP disabled + agent | Warning, continues | Error, blocks |
| 4 | Role MCP warning | Always when mcp_support=false | Only when role has MCP |
| 5 | Enabled tools completions | Shows internal tools | Hides user__/mcp_/todo__/agent__ |
| 6 | MCP server completions | Only aliases | Configured servers + aliases |
## How to pick up in a new session
If context is lost (new chat session):
1. Read this file first
2. Read the latest `docs/testing/notes/ITERATION-<N>-NOTES.md`
3. That file tells you which plan file to work on next
4. Read that plan file
5. Follow the process above
@@ -0,0 +1,52 @@
# Iteration 1 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/01-config-and-appconfig.md`
## Tests created
| File | Test name | What it verifies |
|---|---|---|
| `src/config/mod.rs` | `config_defaults_match_expected` | All Config::default() fields match old code values |
| `src/config/app_config.rs` | `to_app_config_copies_serialized_fields` | to_app_config copies model_id, temperature, top_p, dry_run, stream, save, highlight, compression_threshold, rag_top_k |
| `src/config/app_config.rs` | `to_app_config_copies_clients` | clients field populated (empty by default) |
| `src/config/app_config.rs` | `to_app_config_copies_mapping_fields` | mapping_tools and mapping_mcp_servers copied correctly |
| `src/config/app_config.rs` | `editor_returns_configured_value` | editor() returns configured value |
| `src/config/app_config.rs` | `editor_falls_back_to_env` | editor() doesn't panic without config |
| `src/config/app_config.rs` | `light_theme_default_is_false` | light_theme() default |
| `src/config/app_config.rs` | `sync_models_url_has_default` | sync_models_url() has non-empty default |
| `src/config/request_context.rs` | `to_request_context_creates_clean_state` | RequestContext starts with clean state (no role/session/agent, empty tool_scope, no agent_runtime) |
| `src/config/request_context.rs` | `update_app_config_persists_changes` | Dynamic config updates via clone-mutate-replace persist |
**Total: 10 new tests (59 → 69)**
## Bugs discovered
None. The `save` default was `false` in both old and new code
(my plan file incorrectly said `true` — corrected).
## Observations for future iterations
1. The `Config::default().save` is `false`, but the plan file
01 incorrectly listed it as `true`. Plan file should be
updated to reflect the actual default.
2. `AppConfig::default()` doesn't exist natively (no derive).
Tests construct it via `Config::default().to_app_config()`.
This is fine since that's how it's created in production.
3. The `visible_tools` field computation happens during
`Config::init` (not `to_app_config`). Testing the full
visible_tools resolution requires integration-level testing
with actual tool files. Deferred to plan file 16
(functions-and-tools).
4. Testing `Config::init` directly is difficult because it reads
from the filesystem, starts MCP servers, etc. The unit tests
focus on the conversion paths which are the Phase 1 surface.
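
The clone-mutate-replace update checked by `update_app_config_persists_changes` might look roughly like the sketch below. The struct layouts and the closure-based signature are assumptions made for illustration; only the pattern itself comes from the notes above.

```rust
use std::sync::Arc;

// Hypothetical, trimmed-down stand-ins for the real types.
#[derive(Clone, Debug, PartialEq)]
pub struct AppConfig {
    pub temperature: Option<f64>,
    pub save: bool,
}

pub struct RequestContext {
    pub app_config: Arc<AppConfig>,
}

impl RequestContext {
    /// Clone the shared config, mutate the private copy, then swap it in.
    /// Other holders of the old `Arc` keep seeing the old values.
    pub fn update_app_config(&mut self, mutate: impl FnOnce(&mut AppConfig)) {
        let mut next = (*self.app_config).clone();
        mutate(&mut next);
        self.app_config = Arc::new(next);
    }
}
```

The point of the pattern is that the shared configuration is never mutated in place, so a test can assert both that the context sees the change and that earlier snapshots do not.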
## Next iteration
Plan file 02: Roles — role loading, retrieve_role, use_role/exit_role,
use_prompt, extract_role, one-shot role messages, MCP context switching.
@@ -0,0 +1,86 @@
# Iteration 10 — Test Implementation Notes
## Plan files addressed
- `docs/testing/plans/09-repl-commands.md` (completed in same session)
- `docs/testing/plans/10-cli-flags.md`
## Tests created
### src/config/mod.rs (8 new tests — iteration 9)
AssertState::assert tests for all 4 variants + pass/bare.
### src/repl/mod.rs (31 new tests — iteration 9)
REPL_COMMANDS array validation, command state assertions for 13
specific commands, parse_command edge cases, split_first_arg,
ReplCommand::is_valid, multiline regex.
### src/cli/mod.rs (31 new tests — iteration 10)
| Test name | What it verifies |
|---|---|
| `parse_no_args_defaults` | All flags default unset |
| `parse_model_flag` | --model value |
| `parse_model_short_flag` | -m value |
| `parse_role_flag` | --role value |
| `parse_session_with_name` | --session value |
| `parse_agent_flag` | --agent value |
| `parse_agent_short_flag` | -a value |
| `parse_execute_flag` | -e flag |
| `parse_code_flag` | -c flag |
| `parse_no_stream_flag` | -S flag |
| `parse_dry_run_flag` | --dry-run flag |
| `parse_info_flag` | --info flag |
| `parse_list_flags` | All 6 --list-* flags |
| `parse_file_flag_single` | Single -f |
| `parse_file_flag_multiple` | Multiple -f accumulate |
| `parse_trailing_text` | Trailing args as text vec |
| `parse_prompt_flag` | --prompt value |
| `parse_empty_session_flag` | --empty-session flag |
| `parse_save_session_flag` | --save-session flag |
| `parse_build_tools_flag` | --build-tools flag |
| `parse_sync_models_flag` | --sync-models flag |
| `parse_model_with_role` | -m + -r combined |
| `parse_agent_with_file_and_text` | -a + -f + text combined |
| `parse_role_with_session` | -r + -s combined |
| `cli_text_returns_none_when_no_text_no_stdin` | No input → None |
| `cli_text_joins_trailing_args` | Args joined with spaces |
| `parse_add_secret_flag` | --add-secret value |
| `parse_get_secret_flag` | --get-secret value |
| `parse_list_secrets_flag` | --list-secrets flag |
| `parse_rag_flag` | --rag value |
| `parse_macro_flag` | --macro value |
**Total: 70 new tests across iterations 9+10 (342 total in suite)**
## Bugs discovered
None.
## Observations for future iterations
1. **Clap parsing is fully testable**: Using `try_parse_from` with
synthetic arg arrays, all flag parsing and combinations can be
verified without running the actual binary.
2. **Cli::text() has stdin dependency**: When stdin is not a
terminal, it reads from stdin. This branch can't be easily
unit-tested. The terminal-detection branch (no stdin) is tested.
3. **Prelude is async + filesystem**: apply_prelude needs real role
and session files. Deferred to integration tests.
4. **Mode selection is runtime behavior**: The actual mode branching
(REPL vs CMD) happens in main.rs based on parsed flags. Testing
the flag parsing verifies the inputs to that branching logic.
5. **Exclusive flags**: Vault flags (--add-secret, --get-secret,
etc.) are marked `exclusive = true` in clap, meaning they
can't be combined with other args. This is enforced by clap.
## Next iteration
Plan file 11: Sub-Agent Spawning — supervisor, child agents,
escalation, messaging.
@@ -0,0 +1,159 @@
# Iteration 11 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/11-sub-agent-spawning.md`
## Tests created
### src/supervisor/escalation.rs (11 new tests)
| Test name | What it verifies |
|---|---|
| `queue_default_has_no_pending` | Default queue empty |
| `submit_and_has_pending` | Submit makes has_pending true |
| `submit_returns_id` | Returns the request's id |
| `take_removes_request` | Take removes and empties queue |
| `take_nonexistent_returns_none` | Missing id → None |
| `pending_summary_contains_fields` | Summary has id, agent_id, question |
| `pending_summary_includes_options_when_present` | Options included |
| `pending_summary_empty_when_no_requests` | Empty queue → empty summary |
| `reply_reaches_receiver` | oneshot channel delivers reply |
| `new_escalation_id_has_prefix` | Starts with "esc_" |
| `new_escalation_id_unique` | Two calls produce different ids |
### src/supervisor/mailbox.rs (8 new tests)
| Test name | What it verifies |
|---|---|
| `inbox_new_is_empty` | New inbox drains empty |
| `inbox_default_is_empty` | Default inbox drains empty |
| `deliver_and_drain` | Deliver + drain returns message |
| `drain_empties_inbox` | Second drain returns empty |
| `drain_orders_shutdown_before_task_before_text` | Priority ordering |
| `clone_preserves_messages` | Clone has same messages |
| `clone_is_independent` | Clone doesn't share mutations |
| `multiple_deliveries` | 5 messages all drained |
### src/supervisor/mod.rs (11 new tests)
| Test name | What it verifies |
|---|---|
| `supervisor_new_empty` | Initial state: 0 active, correct limits |
| `supervisor_register_increments_count` | Register increases active_count |
| `supervisor_register_rejects_at_capacity` | At max → error with "at capacity" |
| `supervisor_register_rejects_exceeding_depth` | Over max_depth → error |
| `supervisor_register_allows_at_max_depth` | Exactly max_depth → ok |
| `supervisor_take_removes_handle` | Take decrements count |
| `supervisor_take_nonexistent_returns_none` | Missing → None |
| `supervisor_list_agents` | Lists all registered agent ids/names |
| `supervisor_inbox_returns_handle_inbox` | Inbox accessor works |
| `supervisor_task_queue_accessible` | task_queue/task_queue_mut work |
| `agent_exit_status_equality` | Completed == Completed, != Failed |
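
The capacity and depth boundaries exercised by the `supervisor_register_*` tests can be sketched as follows. This is a simplified stand-in, not the real `Supervisor`: the field names, the `String` error type, and the exact error wording (apart from the "at capacity" fragment) are assumptions.

```rust
/// Hypothetical stand-in that tracks agents by id only.
pub struct Supervisor {
    max_concurrent: usize,
    max_depth: usize,
    active: Vec<String>,
}

impl Supervisor {
    pub fn new(max_concurrent: usize, max_depth: usize) -> Self {
        Self { max_concurrent, max_depth, active: Vec::new() }
    }

    pub fn active_count(&self) -> usize {
        self.active.len()
    }

    /// Rejects a registration when the supervisor is full or when the
    /// requested depth is strictly greater than `max_depth`.
    pub fn register(&mut self, agent_id: &str, depth: usize) -> Result<(), String> {
        if self.active.len() >= self.max_concurrent {
            return Err(format!("supervisor at capacity ({} agents)", self.max_concurrent));
        }
        if depth > self.max_depth {
            return Err(format!("depth {} exceeds max depth {}", depth, self.max_depth));
        }
        self.active.push(agent_id.to_string());
        Ok(())
    }
}
```

Note the asymmetry the tests pin down: a depth equal to `max_depth` is accepted, while an active count equal to the capacity is rejected.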
### src/supervisor/taskqueue.rs (10 new tests, 16 total)
| Test name | What it verifies |
|---|---|
| `test_fail_sets_status` | fail() sets TaskStatus::Failed |
| `test_get_returns_none_for_missing` | get() on nonexistent → None |
| `test_dispatch_agent_stored` | dispatch_agent and prompt captured |
| `test_claim_blocked_task_fails` | Can't claim blocked task |
| `test_list_sorted_by_id` | list() returns numeric order |
| `test_default_is_empty` | TaskQueue::default() empty |
| `test_dependency_on_nonexistent_task_errors` | Bad dep → error |
| `test_complete_nonexistent_returns_empty` | Complete unknown → empty |
| `test_task_node_is_runnable` | Pending + unblocked = runnable |
| `test_task_node_not_runnable_when_blocked` | Blocked = not runnable |
### src/function/supervisor.rs (36 new handler integration tests)
| Test name | What it verifies |
|---|---|
| `handle_list_empty_supervisor` | Empty supervisor → 0 active, empty agents |
| `handle_list_with_agents` | Registered agents appear in list |
| `handle_list_no_supervisor_errors` | No supervisor → error |
| `handle_check_unknown_agent` | Check unknown → error status |
| `handle_check_pending_agent` | Check running agent → pending status |
| `handle_cancel_registered_agent` | Cancel removes and signals abort |
| `handle_cancel_unknown_agent` | Cancel unknown → error status |
| `handle_cancel_no_supervisor_errors` | No supervisor → error |
| `handle_send_message_to_registered_agent` | Message delivered to inbox |
| `handle_send_message_to_unknown_agent` | Unknown agent → error status |
| `handle_check_inbox_with_messages` | Inbox drains messages with count |
| `handle_check_inbox_no_inbox` | No inbox → count 0 |
| `handle_check_inbox_empty_inbox` | Empty inbox → count 0 |
| `handle_reply_escalation_success` | Reply delivered via oneshot |
| `handle_reply_escalation_missing_id` | Missing id → error status |
| `handle_reply_escalation_no_queue_errors` | No queue → error |
| `handle_task_create_simple` | Simple task created with id |
| `handle_task_create_with_dependencies` | Task with blocked_by |
| `handle_task_create_with_dispatch_agent` | Auto-dispatch flag set |
| `handle_task_create_agent_without_prompt_errors` | Agent without prompt → error |
| `handle_task_list_empty` | Empty queue → empty tasks array |
| `handle_task_list_with_tasks` | Tasks listed |
| `handle_task_complete_unblocks_dependents` | Complete unblocks with newly_runnable |
| `handle_task_fail_marks_failed` | Fail sets status |
| `handle_task_fail_reports_blocked_dependents` | Reports blocked deps |
| `handle_task_fail_missing_task` | Missing task → error status |
| `dispatch_unknown_action_errors` | Unknown action → error |
| `dispatch_routes_list` | agent__list → handle_list |
| `dispatch_routes_task_list` | agent__task_list → handle_task_list |
| `new_for_child_inherits_escalation_queue` | Shared Arc |
| `new_for_child_sets_depth_and_id` | Depth and self_agent_id |
| `new_for_child_has_inbox` | Shared inbox Arc |
| `new_for_child_inherits_parent_supervisor` | parent_supervisor set |
| `new_for_child_starts_with_empty_scope` | Empty functions, mcp, role, session |
| `ensure_root_escalation_queue_creates_on_first_call` | Lazy init |
| `ensure_root_escalation_queue_returns_same_on_second_call` | Same Arc |
### Infrastructure
- Added `AppState::test_default()` method for cross-module test construction
- Refactored `input.rs` and `request_context.rs` test helpers to use `test_default()`
**Total: 76 new tests (418 total in suite)**
## Bugs discovered
None.
## Observations for future iterations
1. **Supervisor.register enforces both capacity and depth**: These
are the two runaway safeguards. Both tested at boundaries
(at capacity, at max_depth, over max_depth).
2. **EscalationQueue uses oneshot channels**: The reply_tx/rx pair
enables async blocking-wait semantics for child agents. The
channel delivery is verified end-to-end in the test.
3. **Inbox drain ordering is a priority system**: Shutdown messages
come first, then task completions, then text. This ensures
lifecycle-critical messages aren't buried under chat.
4. **AgentHandle requires a tokio JoinHandle**: Creating test
handles requires a tokio runtime. Used `rt.spawn()` with
`mem::forget(rt)` to keep the handle alive. This is a test-only
pattern — not ideal but necessary since JoinHandle can't be
mocked.
5. **handle_spawn requires real agent config on disk**: This is the
only handler that calls Agent::init. All other handlers (list,
check, cancel, messaging, tasks, escalation) work with just a
RequestContext + Supervisor, which we can construct in tests.
6. **Handler integration tests cover the full dispatch chain**: The
tests call handler functions with real RequestContext instances
containing real Supervisor/EscalationQueue/Inbox instances. This
verifies the JSON arg parsing, supervisor interactions, and
response formatting all at once.
7. **AppState::test_default() centralizes test construction**: Added
a `#[cfg(test)]` constructor that avoids importing private
modules (mcp_factory, rag_cache) from outside the config module.
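
The drain ordering from observation 3 can be illustrated with a small sketch. The `Message` variants and the plain `Vec` storage are assumptions; the real inbox may be shared behind an `Arc` and carry richer payloads.

```rust
// Hypothetical message kinds, ordered by how urgent they are.
#[derive(Clone, Debug, PartialEq)]
pub enum Message {
    Text(String),
    TaskCompleted(u64),
    Shutdown,
}

impl Message {
    // Lower number drains first.
    fn priority(&self) -> u8 {
        match self {
            Message::Shutdown => 0,
            Message::TaskCompleted(_) => 1,
            Message::Text(_) => 2,
        }
    }
}

#[derive(Default)]
pub struct Inbox {
    messages: Vec<Message>,
}

impl Inbox {
    pub fn deliver(&mut self, msg: Message) {
        self.messages.push(msg);
    }

    /// Takes every pending message and returns them by priority.
    /// `sort_by_key` is stable, so arrival order is kept within a kind.
    pub fn drain(&mut self) -> Vec<Message> {
        let mut drained = std::mem::take(&mut self.messages);
        drained.sort_by_key(|m| m.priority());
        drained
    }
}
```

A stable sort is one simple way to get "shutdown, then task completions, then text" without reordering messages of the same kind.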
## Next iteration
Plan file 12: RAG — RAG init/load/search, embeddings, document
management.
@@ -0,0 +1,71 @@
# Iteration 12 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/12-rag.md`
## Tests created
### src/rag/mod.rs (22 new tests)
| Test name | What it verifies |
|---|---|
| `document_id_round_trip` | new(5,17) → split → (5,17) |
| `document_id_zero_zero` | new(0,0) → split → (0,0) |
| `document_id_large_values` | new(1000,9999) round-trips |
| `document_id_debug_format` | Debug produces "3-7" format |
| `document_id_equality` | Same file+doc → equal |
| `document_id_inequality` | Different doc → not equal |
| `document_id_ordering` | (0,1) < (1,0) |
| `rag_document_new` | Sets page_content, empty metadata |
| `rag_document_default` | Empty content and metadata |
| `rag_data_new_defaults` | All fields set correctly |
| `rag_data_get_returns_document` | Gets by file+doc index |
| `rag_data_get_returns_none_for_missing_file` | Missing file → None |
| `rag_data_get_returns_none_for_missing_document` | Missing doc index → None |
| `rag_data_del_removes_files_and_vectors` | Del removes both |
| `rag_data_del_nonexistent_is_noop` | Del missing → noop |
| `rag_data_add_inserts_files_and_vectors` | Add inserts files+vectors, updates next_file_id |
| `rag_template_contains_placeholders` | __CONTEXT__, __SOURCES__, __INPUT__ present |
| `get_separators_returns_language_specific` | rs/py/md have language separators |
| `get_separators_unknown_returns_defaults` | xyz → DEFAULT_SEPARATORS |
| `get_separators_all_known_extensions` | All 22 known extensions differ from defaults |
| `rag_data_build_bm25_empty` | Empty data → no search results |
| `rag_data_build_bm25_finds_documents` | BM25 finds "rust" in first doc |
**Total: 22 new tests (440 total in suite)**
## Bugs discovered
None.
## Observations for future iterations
1. **Rag struct can't be constructed without an embedding model**:
Rag::init requires prompting the user for model selection,
Rag::load requires a YAML file on disk, and Rag::create
requires pre-built RagData with vectors. All RAG lifecycle
operations are I/O-bound.
2. **DocumentId uses bit packing**: file_index in the upper half,
document_index in the lower half of a usize. This is tested
with round-trip, zero, and large-value cases.
3. **RagData operations (get/del/add) are fully testable**: These
are pure data structure operations that don't need I/O. The
BM25 search engine can also be built and queried in tests.
4. **The text splitter already has comprehensive tests**: 5 existing
tests cover split_text, create_documents, chunk headers,
markdown splitting, and HTML splitting. No additional splitter
tests needed.
5. **get_separators covers 22 language extensions**: All are
verified to return language-specific separators rather than
defaults. This ensures the splitter uses appropriate chunk
boundaries for each language.
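
The bit packing described in observation 2 can be sketched as below. The exact shift width, the masking, and the derive list are assumptions; only the upper-half/lower-half layout, the `new`/`split` round trip, and the "3-7" debug format come from the notes.

```rust
use std::fmt;

// Half of the bits in a usize (32 on 64-bit targets).
const HALF_BITS: u32 = usize::BITS / 2;
const LOW_MASK: usize = (1 << HALF_BITS) - 1;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct DocumentId(usize);

impl DocumentId {
    /// Packs `file_index` into the upper half and `document_index`
    /// into the lower half. Values wider than half a usize are truncated.
    pub fn new(file_index: usize, document_index: usize) -> Self {
        Self((file_index << HALF_BITS) | (document_index & LOW_MASK))
    }

    /// Recovers `(file_index, document_index)`.
    pub fn split(self) -> (usize, usize) {
        (self.0 >> HALF_BITS, self.0 & LOW_MASK)
    }
}

impl fmt::Debug for DocumentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (file_index, document_index) = self.split();
        write!(f, "{file_index}-{document_index}")
    }
}
```

Because the file index occupies the high bits, comparing the packed integers orders ids by file first and by document second, which is what `document_id_ordering` checks.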
## Next iteration
Plan file 13: Completions and Prompt — tab completion, prompt
rendering, highlighter.
@@ -0,0 +1,107 @@
# Iteration 13 — Test Implementation Notes
## Plan files addressed
- `docs/testing/plans/12-rag.md` (completed in same session)
- `docs/testing/plans/13-completions-and-prompt.md`
- `docs/testing/plans/14-macros.md`
- `docs/testing/plans/15-vault.md`
- `docs/testing/plans/16-functions-and-tools.md`
## Tests created
### src/rag/mod.rs (22 new tests — iteration 12)
DocumentId round-trip/equality/ordering/debug, RagDocument new/default,
RagData new/get/del/add/build_bm25, RAG_TEMPLATE placeholders,
get_separators language mapping.
### src/config/macros.rs (21 new tests — iteration 13)
| Test name | What it verifies |
|---|---|
| `resolve_no_variables` | Empty vars → empty output |
| `resolve_required_variable_provided` | Arg maps to variable |
| `resolve_required_variable_missing_errors` | Missing required → error |
| `resolve_default_variable_uses_default` | Default used when no arg |
| `resolve_default_variable_overridden` | Arg overrides default |
| `resolve_rest_variable_captures_all_remaining` | Rest joins remaining args |
| `resolve_rest_variable_with_default` | Rest default used |
| `resolve_multiple_variables` | Mixed required + default |
| `usage_no_variables` | Just macro name |
| `usage_required_variable` | `<name>` format |
| `usage_optional_variable` | `[name]` format |
| `usage_rest_variable` | `<name>...` format |
| `usage_rest_with_default` | `[name]...` format |
| `usage_mixed_variables` | Mixed format |
| `interpolate_replaces_variables` | {{name}} → value |
| `interpolate_multiple_variables` | Multiple replacements |
| `interpolate_no_variables_passthrough` | No vars → unchanged |
| `interpolate_variable_not_found_left_as_is` | Missing var → {{name}} kept |
| `deserialize_macro_from_yaml` | Full YAML with steps + variables |
| `deserialize_macro_with_defaults` | Variables with defaults + rest |
| `deserialize_macro_no_variables` | Steps only, empty vars default |
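
The three variable modes and the interpolation behavior covered by the table can be sketched as follows. `MacroVariable`, the free-function signatures, and the error text are hypothetical; the real code exposes these as `Macro` methods whose signatures are not shown in these notes.

```rust
use std::collections::HashMap;

/// Hypothetical variable declaration: required when `default` is None.
pub struct MacroVariable {
    pub name: String,
    pub default: Option<String>,
    pub rest: bool,
}

/// Maps positional args onto variables: required, defaulted, or rest.
pub fn resolve_variables(
    vars: &[MacroVariable],
    args: &[&str],
) -> Result<HashMap<String, String>, String> {
    let mut out = HashMap::new();
    for (i, var) in vars.iter().enumerate() {
        let value = if var.rest && i < args.len() {
            // A rest variable swallows every remaining argument.
            Some(args[i..].join(" "))
        } else {
            args.get(i).map(|s| s.to_string())
        };
        match value.or_else(|| var.default.clone()) {
            Some(v) => {
                out.insert(var.name.clone(), v);
            }
            None => return Err(format!("missing required variable '{}'", var.name)),
        }
    }
    Ok(out)
}

/// Replaces each `{{name}}` with its value; unknown names are left as-is.
pub fn interpolate(command: &str, vars: &HashMap<String, String>) -> String {
    let mut out = command.to_string();
    for (key, value) in vars {
        out = out.replace(&format!("{{{{{key}}}}}"), value);
    }
    out
}
```

This sketch replaces keys one after another, so a value that itself contains `{{other}}` could be substituted again; whether the real implementation behaves that way is not stated in these notes.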
### src/vault/mod.rs (6 new tests)
| Test name | What it verifies |
|---|---|
| `secret_re_matches_double_braces` | {{MY_SECRET}} captured |
| `secret_re_matches_with_surrounding_text` | Captures in context |
| `secret_re_no_match_single_braces` | {NOT} not matched |
| `secret_re_no_match_plain_text` | No match for plain text |
| `secret_re_matches_with_spaces` | {{ SPACED }} captured |
| `vault_default_creates_instance` | Default has no password file |
### src/parsers/common.rs (8 new tests)
| Test name | What it verifies |
|---|---|
| `underscore_simple` | No-op for simple names |
| `underscore_dashes_to_underscores` | my-func → my_func |
| `underscore_spaces_to_underscores` | my func → my_func |
| `underscore_special_chars_removed` | @! → _ |
| `underscore_consecutive_specials_collapsed` | --- → single _ |
| `underscore_leading_trailing_stripped` | -name- → name |
| `underscore_uppercase_lowered` | MyFunc → myfunc |
| `underscore_mixed` | Get-User Info → get_user_info |
**Total: 57 new tests across iterations 12+13 (475 total in suite)**
## Bugs discovered
None.
## Observations
1. **Macro::resolve_variables has 3 variable modes**: required
(no default), optional (with default), and rest (captures
remaining args). All three modes tested with multiple
combinations.
2. **Macro::interpolate_command is a simple string replacement**:
{{key}} → value. Missing keys are left as-is (no error),
which is the correct behavior for gradual interpolation.
3. **SECRET_RE uses fancy_regex**: The `{{(.+)}}` pattern requires
double braces. Single braces don't match, which prevents false
positives on JSON-like content.
4. **Vault operations all require terminal interaction or password
file**: add_secret and update_secret prompt for passwords via
inquire. get_secret/delete_secret/list_secrets need a tokio
runtime + password file. These are integration-test territory.
5. **parsers::common::underscore is more than s/-/_/**: It lowercases,
replaces all non-alphanumeric chars with _, collapses consecutive
underscores, and strips leading/trailing underscores. Thorough
edge cases tested.
6. **Python and TypeScript parsers have excellent existing test
suites**: ~400 lines of tests each covering declaration parsing,
type inference, docstring extraction. No additional tests needed.
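
The normalization rules listed in observation 5 can be sketched in a few lines. This is an illustrative reimplementation, not the code in `parsers::common`; in particular, restricting "alphanumeric" to ASCII is an assumption.

```rust
/// Lowercases, maps every non-alphanumeric run to a single `_`,
/// and strips leading and trailing underscores.
pub fn underscore(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for ch in name.chars() {
        if ch.is_ascii_alphanumeric() {
            out.push(ch.to_ascii_lowercase());
        } else if !out.ends_with('_') {
            // Collapse consecutive special characters into one underscore.
            out.push('_');
        }
    }
    out.trim_matches('_').to_string()
}
```

Checking `ends_with('_')` before pushing handles the collapsing step in the same pass, so no second scan over the string is needed.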
## Final summary
All 16 plan files have been addressed across iterations 1-13.
475 total tests, all passing, 0 errors.
@@ -0,0 +1,100 @@
# Iteration 14 — Integration Test Implementation Notes
## Focus
Filesystem-based integration tests (Tier 1 + Tier 2) for behaviors
that were previously untestable without real config directories.
## Infrastructure changes
1. **Added `serial_test` dev-dependency** — Env-var-based config dir
isolation (`TestConfigDirGuard`) requires serialization to prevent
parallel test races. All 25 tests using `TestConfigDirGuard` now
use `#[serial]`.
2. **Added `src/test_helpers.rs`** — Shared test utilities module
(`#[cfg(test)]`) with `TestConfigDirGuard`, `default_app_state`,
`create_test_ctx`, and `run_async` helpers, available to all
modules. Not yet used by all modules (existing module-local
helpers kept for backward compatibility).
## Tests created
### src/config/request_context.rs (17 new integration tests)
| Test name | What it verifies |
|---|---|
| `retrieve_role_from_markdown_file` | Writes .md file, retrieves role with correct name/prompt |
| `retrieve_role_builtin_exists` | Built-in roles retrievable |
| `retrieve_role_nonexistent_errors` | Unknown role → error |
| `retrieve_role_no_model_id_inherits_current_model` | No model_id → uses current model |
| `list_roles_finds_markdown_files` | .md files listed, .txt ignored |
| `list_roles_empty_dir` | Empty roles dir → empty list |
| `session_new_from_ctx_captures_state` | Name captured, starts empty |
| `session_save_creates_file` | Save creates YAML file on disk |
| `use_session_errors_when_already_in_session` | Double session → error |
| `use_session_creates_temp_session` | None → temp session |
| `use_session_creates_named_session` | Name → named session |
| `exit_session_roundtrip` | use_session → exit_session → None |
| `use_role_obj_and_exit_role_full_cycle` | Set role → exit → None |
| `use_role_obj_twice_replaces_role` | Second role replaces first |
| `list_macros_finds_yaml_files` | .yaml macro files listed |
| `list_rags_finds_yaml_files` | .yaml RAG files listed |
| `list_rags_empty_dir` | Empty RAGs dir → empty list |
### src/config/input.rs (5 new integration tests)
| Test name | What it verifies |
|---|---|
| `from_files_loads_single_text_file` | File content + text combined |
| `from_files_loads_multiple_files` | Multiple files all loaded |
| `from_files_with_no_paths_just_text` | No files → just text |
| `from_files_with_external_command` | Backtick command executed |
| `from_files_nonexistent_file_errors` | Missing file → error |
### Serialization fixes (6 existing tests)
Added `#[serial]` to all `rebuild_tool_scope_*` tests to prevent
env-var race conditions with filesystem integration tests.
**Total: 22 new tests (497 total in suite)**
## Bugs discovered
1. **Test parallelism race condition with env vars**: The
`TestConfigDirGuard` sets a process-global env var. When tests
run in parallel, two guards stomp each other's values. Fixed
by adding `serial_test` crate and `#[serial]` attribute to all
filesystem-dependent tests.
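
The race comes from the guard writing to process-global state. A minimal sketch of such a guard is shown below, with a plain `std::sync::Mutex` standing in for the `#[serial]` attribute that the actual fix uses. The variable name `LOKI_TEST_CONFIG_DIR` and the guard's fields are assumptions, not the real `TestConfigDirGuard`.

```rust
use std::env;
use std::ffi::OsString;
use std::path::Path;
use std::sync::{Mutex, MutexGuard};

// Hypothetical variable name; the real one is not given in these notes.
const CONFIG_DIR_VAR: &str = "LOKI_TEST_CONFIG_DIR";

// One process-wide lock, standing in for `#[serial]`.
static ENV_LOCK: Mutex<()> = Mutex::new(());

pub struct TestConfigDirGuard {
    previous: Option<OsString>,
    _lock: MutexGuard<'static, ()>,
}

impl TestConfigDirGuard {
    pub fn new(dir: &Path) -> Self {
        // A poisoned lock only means another test panicked; keep going.
        let lock = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
        let previous = env::var_os(CONFIG_DIR_VAR);
        // Env mutation is process-global (and `unsafe` in edition 2024);
        // holding ENV_LOCK is what keeps guards from overlapping.
        unsafe { env::set_var(CONFIG_DIR_VAR, dir) };
        Self { previous, _lock: lock }
    }
}

impl Drop for TestConfigDirGuard {
    fn drop(&mut self) {
        // Restore the old value before the lock field is released.
        match self.previous.take() {
            Some(value) => unsafe { env::set_var(CONFIG_DIR_VAR, value) },
            None => unsafe { env::remove_var(CONFIG_DIR_VAR) },
        }
    }
}
```

Without the lock (or `#[serial]`), two tests constructing guards at the same time would each overwrite the other's directory, which is the failure described above.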
## Observations
1. **Session loading from disk requires Model::retrieve_model**:
`Session::load_from_ctx` calls `Model::retrieve_model` to
resolve the session's model_id. Without a valid model provider
config, this fails. Session loading tests are limited to
`new_from_ctx` (creation) and `save` (serialization).
2. **use_session with empty session prompts user**: The Confirm
dialog for "incorporate last Q&A?" requires terminal interaction.
Tests avoid this by: (a) having no last_message, or (b) using
named sessions that already exist on disk.
3. **Input::from_files with external commands works**: The backtick
syntax (`` `echo hello` ``) actually runs the command and captures
output. This is a real integration test — it runs `/bin/echo`.
4. **Vault CRUD was skipped**: Vault operations require a password
file with actual encrypted content via the `gman` crate's
`LocalProvider`. The `add_secret` method also prompts for a
password via `inquire`. Testing vault requires either mocking
the terminal or using `LocalProvider` directly with a pre-created
password file — deferred to a future iteration.
## Final counts
| Category | Tests |
|---|---|
| Unit tests (iterations 1-13) | 475 |
| Integration tests (iteration 14) | 22 |
| **Total** | **497** |
@@ -0,0 +1,71 @@
# Iteration 2 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/02-roles.md`
## Tests created
### src/config/role.rs (12 new tests, 15 total)
| Test name | What it verifies |
|---|---|
| `role_new_parses_prompt` | Role::new extracts prompt text |
| `role_new_parses_metadata` | Metadata block parses model, temperature, top_p |
| `role_new_parses_enabled_tools` | enabled_tools from metadata |
| `role_new_parses_enabled_mcp_servers` | enabled_mcp_servers from metadata |
| `role_new_no_metadata_has_none_fields` | No metadata → all optional fields None |
| `role_builtin_shell_loads` | Built-in "shell" role loads |
| `role_builtin_code_loads` | Built-in "code" role loads |
| `role_builtin_nonexistent_errors` | Non-existent built-in → error |
| `role_default_has_empty_fields` | Default role has empty name/prompt |
| `role_set_model_updates_model` | set_model() changes the model |
| `role_set_temperature_works` | set_temperature() changes temperature |
| `role_export_includes_metadata` | export() includes metadata and prompt |
### src/config/request_context.rs (5 new tests, 7 total)
| Test name | What it verifies |
|---|---|
| `use_role_obj_sets_role` | use_role_obj sets role on ctx |
| `exit_role_clears_role` | exit_role clears role from ctx |
| `use_prompt_creates_temp_role` | use_prompt creates TEMP_ROLE_NAME role |
| `extract_role_returns_standalone_role` | extract_role returns active role |
| `extract_role_returns_default_when_nothing_active` | extract_role returns default role |
**Total: 17 new tests (69 → 86)**
## Bugs discovered
None. Role parsing behavior matches between old and new code.
## Observations for future iterations
1. `retrieve_role` (which calls `Model::retrieve_model`) can't be
easily unit-tested without a real client config. It depends on
having at least one configured client. Deferred to integration
testing or plan 08 (RequestContext scope transitions).
2. The `use_role` async method (which calls `rebuild_tool_scope`)
requires an async test runtime and MCP infrastructure. Deferred to
plans 05 (MCP lifecycle) and 08 (RequestContext).
3. `use_role_obj` correctly rejects when agent is active — tested
implicitly through the error path, but creating a mock Agent
is complex. Noted for plan 04 (agents).
4. The `extract_role` priority order (session > agent > role > default)
is an important behavioral contract. Tests verify the role and
default cases. Session and agent cases are deferred to plans 03 and 04.
5. Added `create_test_ctx()` helper to request_context.rs tests.
Future iterations should reuse this.
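
The priority order in observation 4 can be sketched as a chain of fallbacks. This is a minimal illustration, not Loki's code: `Ctx`, its three `Option<Role>` fields, and the two-field `Role` are hypothetical stand-ins, and only the ordering (session > agent > role > default) is taken from the notes.

```rust
/// Hypothetical stand-in for Loki's `Role`; only two fields are modeled.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Role {
    pub name: String,
    pub prompt: String,
}

/// Hypothetical context: each field holds the role that scope would yield.
#[derive(Default)]
pub struct Ctx {
    pub session_role: Option<Role>, // role derived from the active session
    pub agent_role: Option<Role>,   // role derived from the active agent
    pub role: Option<Role>,         // standalone role (e.g. set by `use_role_obj`)
}

impl Ctx {
    /// Returns the role of the highest-priority active scope,
    /// falling back to the default (empty) role when nothing is active.
    pub fn extract_role(&self) -> Role {
        self.session_role
            .clone()
            .or_else(|| self.agent_role.clone())
            .or_else(|| self.role.clone())
            .unwrap_or_default()
    }
}
```

A test for the deferred session and agent cases would set the higher-priority field alongside a standalone role and assert that the higher-priority one is returned.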
## Plan file updates
Updated 02-roles.md to mark completed items.
## Next iteration
Plan file 03: Sessions — session create/load/save, compression,
autoname, carry-over, exit, context switching.
@@ -0,0 +1,76 @@
# Iteration 3 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/03-sessions.md`
## Tests created
### src/config/session.rs (15 new tests)
| Test name | What it verifies |
|---|---|
| `session_default_is_empty` | Default session is empty, no name, no role, not dirty |
| `session_new_from_ctx_captures_save_session` | new_from_ctx captures name, empty, not dirty |
| `session_set_role_captures_role_info` | set_role copies model_id, temperature, role_name, marks dirty |
| `session_clear_role` | clear_role removes role_name |
| `session_guard_empty_passes_when_empty` | guard_empty OK when empty |
| `session_needs_compression_threshold` | Empty session doesn't need compression |
| `session_needs_compression_returns_false_when_compressing` | Already compressing → false |
| `session_needs_compression_returns_false_when_threshold_zero` | Zero threshold → false |
| `session_set_compressing_flag` | set_compressing toggles flag |
| `session_set_save_session_this_time` | Doesn't panic |
| `session_save_session_returns_configured_value` | save_session get/set roundtrip |
| `session_compress_moves_messages` | compress moves messages to compressed, adds system |
| `session_is_not_empty_after_compress` | Session with compressed messages is not empty |
| `session_need_autoname_default_false` | Default session doesn't need autoname |
| `session_set_autonaming_doesnt_panic` | set_autonaming safe without autoname |
### src/config/request_context.rs (4 new tests, 11 total)
| Test name | What it verifies |
|---|---|
| `exit_session_clears_session` | exit_session removes session from ctx |
| `empty_session_clears_messages` | empty_session keeps session but clears it |
| `maybe_compress_session_returns_false_when_no_session` | No session → no compression |
| `maybe_autoname_session_returns_false_when_no_session` | No session → no autoname |
**Total: 19 new tests (86 → 105)**
## Bugs discovered
None. Session behavior matches between old and new code.
## Observations for future iterations
1. `Session::new_from_ctx` and `Session::load_from_ctx` have
`#[allow(dead_code)]` annotations — they were bridge methods.
Should verify if they're still needed or if the old `Session::new`
and `Session::load` (which take `&Config`) should be cleaned up
in a future pass.
2. The `compress` method moves messages to `compressed_messages` and
adds a single system message with the summary. This is a critical
behavioral contract — if the summary format changes, sessions
could break.
3. `needs_compression` uses `self.compression_threshold` (session-
level) with fallback to the global threshold. This priority
(session > global) is important behavior.
4. Session carry-over (the "incorporate last Q&A?" prompt) happens
inside `use_session` which is async and involves user interaction
(inquire::Confirm). Can't unit test this — needs integration test
or manual verification.
5. The `extract_role` test for the session-active case should verify
that `session.to_role()` is returned. Added a note to plan 02.
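
The threshold rules from observation 3 and the `needs_compression` tests can be summarized in a small sketch. It is an assumption-heavy simplification: the `tokens` field, the `>` comparison, and passing the global threshold as a parameter are all hypothetical; only the session-over-global priority, the zero-threshold case, and the already-compressing case come from the notes.

```rust
/// Hypothetical, simplified session; field names are illustrative only.
#[derive(Default)]
pub struct Session {
    pub compression_threshold: Option<usize>, // session-level override
    pub compressing: bool,                    // set while a compression is in flight
    pub tokens: usize,                        // assumed measure of session size
}

impl Session {
    /// The session-level threshold wins over the global one. A zero
    /// threshold or an in-flight compression disables the check.
    pub fn needs_compression(&self, global_threshold: usize) -> bool {
        if self.compressing {
            return false;
        }
        let threshold = self.compression_threshold.unwrap_or(global_threshold);
        threshold > 0 && self.tokens > threshold
    }
}
```

An empty session has zero tokens in this model, so it never needs compression, which matches the `session_needs_compression_threshold` test description.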
## Plan file updates
Updated 03-sessions.md to mark completed items.
## Next iteration
Plan file 04: Agents — agent init, tool compilation, variables,
lifecycle, MCP, RAG, auto-continuation.
@@ -0,0 +1,71 @@
# Iteration 4 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/04-agents.md`
## Tests created
### src/config/agent.rs (4 new tests)
| Test name | What it verifies |
|---|---|
| `agent_config_parses_from_yaml` | Full AgentConfig YAML with all fields |
| `agent_config_defaults` | Minimal AgentConfig gets correct defaults |
| `agent_config_with_model` | model_id, temperature, top_p from YAML |
| `agent_config_inject_defaults_true` | inject_todo/spawn_instructions default true |
### src/config/agent_runtime.rs (2 new tests)
| Test name | What it verifies |
|---|---|
| `agent_runtime_new_defaults` | All fields default correctly |
| `agent_runtime_builder_pattern` | with_depth, with_parent_supervisor work |
### src/config/request_context.rs (6 new tests, 17 total)
| Test name | What it verifies |
|---|---|
| `exit_agent_clears_all_agent_state` | exit_agent clears agent, agent_runtime, rag |
| `current_depth_returns_zero_without_agent` | Default depth is 0 |
| `current_depth_returns_agent_runtime_depth` | Depth from agent_runtime |
| `supervisor_returns_none_without_agent` | No agent → no supervisor |
| `inbox_returns_none_without_agent` | No agent → no inbox |
| `root_escalation_queue_returns_none_without_agent` | No agent → no queue |
**Total: 12 new tests (105 → 117)**
## Bugs discovered
None.
## Observations for future iterations
1. `Agent::init` can't be unit-tested easily: it requires agent config
files and tool files on disk. Integration tests with temp directories
would be needed for full coverage.
2. AgentConfig default values verified:
- `max_concurrent_agents` = 4
- `max_agent_depth` = 3
- `max_auto_continues` = 10
- `inject_todo_instructions` = true
- `inject_spawn_instructions` = true
These are important behavioral contracts.
3. The `exit_agent` test shows that clearing agent state also
rebuilds the tool_scope with fresh functions. This is the
correct behavior for returning to the global context.
4. Agent variable interpolation (special vars like `__os__`, `__cwd__`)
happens in `Agent::init`, which is filesystem-dependent. Deferred.
5. `list_agents()` (which filters hidden dirs) is tested via the
`.shared` exclusion noted in improvements. Could add a unit test
with a temp dir if needed.
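
The defaults listed in observation 2 can be pinned with a plain `Default` impl. This is a sketch under assumptions: the real `AgentConfig` is parsed from YAML and has more fields, and the `usize` field types here are guesses; only the five names and values come from the notes.

```rust
/// Hypothetical subset of `AgentConfig`; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentConfig {
    pub max_concurrent_agents: usize,
    pub max_agent_depth: usize,
    pub max_auto_continues: usize,
    pub inject_todo_instructions: bool,
    pub inject_spawn_instructions: bool,
}

impl Default for AgentConfig {
    /// Values copied from the defaults verified in this iteration.
    fn default() -> Self {
        Self {
            max_concurrent_agents: 4,
            max_agent_depth: 3,
            max_auto_continues: 10,
            inject_todo_instructions: true,
            inject_spawn_instructions: true,
        }
    }
}
```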
## Next iteration
Plan file 05: MCP Lifecycle — the most critical test area. McpFactory,
McpRuntime, spawn_mcp_server, rebuild_tool_scope MCP integration,
scope transition MCP behavior.
@@ -0,0 +1,129 @@
# Iteration 5 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/05-mcp-lifecycle.md`
## Tests created
### src/config/mcp_factory.rs (12 new tests)
| Test name | What it verifies |
|---|---|
| `key_from_stdio_spec_captures_command_args_env` | McpServerKey extracts command, args, env from stdio spec |
| `key_from_stdio_spec_sorts_args_and_env` | Args and env are sorted for deterministic key hashing |
| `key_from_stdio_spec_defaults_empty_when_none` | None args/env default to empty vecs |
| `key_from_remote_http_spec` | Http transport key captures url and transport type |
| `key_from_remote_sse_spec_with_sorted_headers` | SSE headers sorted for deterministic keys |
| `key_equality_same_spec_produces_equal_keys` | Same spec → equal keys (sharing contract) |
| `key_inequality_different_names` | Different server names → different keys |
| `key_inequality_different_commands` | Different commands → different keys (isolation contract) |
| `key_env_bool_and_int_coerce_to_string` | JsonField::Bool/Int coerced to String in key |
| `factory_try_get_active_returns_none_when_empty` | Empty factory returns None |
| `factory_try_get_active_returns_none_for_unknown_key` | Unknown key returns None |
| `factory_default_has_empty_active_map` | Default factory has empty internal map |
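
The key tests above rely on normalizing a spec before it is used as a map key. A minimal sketch of that idea follows; `ServerKey` and `from_stdio` are hypothetical names (the notes refer to `McpServerKey::from_spec`), and the field layout is assumed. Sorting args and env and defaulting `None` to empty are the behaviors taken from the table.

```rust
use std::collections::HashMap;

/// Hypothetical stdio-only key; the real key also covers remote transports.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ServerKey {
    pub name: String,
    pub command: String,
    pub args: Vec<String>,
    pub env: Vec<(String, String)>,
}

impl ServerKey {
    /// Builds a key whose equality and hash do not depend on the order
    /// in which args or env entries were written in the config.
    pub fn from_stdio(
        name: &str,
        command: &str,
        args: Option<Vec<String>>,
        env: Option<HashMap<String, String>>,
    ) -> Self {
        let mut args = args.unwrap_or_default();
        args.sort();
        // HashMap iteration order is unspecified, so collect and sort.
        let mut env: Vec<(String, String)> = env.unwrap_or_default().into_iter().collect();
        env.sort();
        Self {
            name: name.to_string(),
            command: command.to_string(),
            args,
            env,
        }
    }
}
```

Two specs that differ only in entry order then produce equal keys and hit the same map slot, which is the sharing contract; a different command produces a different key, which is the isolation contract.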
### src/config/tool_scope.rs (6 new tests)
| Test name | What it verifies |
|---|---|
| `mcp_runtime_new_is_empty` | New McpRuntime has no servers |
| `mcp_runtime_default_is_empty` | Default McpRuntime is empty |
| `mcp_runtime_get_returns_none_for_missing_server` | get() on nonexistent server returns None |
| `tool_scope_default_has_empty_mcp_runtime` | Default ToolScope has empty MCP runtime |
| `tool_scope_default_has_empty_functions` | Default ToolScope has no functions |
| `tool_scope_default_tracker_has_no_loops` | Default ToolScope tracker detects no loops |
### src/mcp/mod.rs (30 new tests)
| Test name | What it verifies |
|---|---|
| `validate_stdio_with_command_succeeds` | Valid stdio spec passes |
| `validate_stdio_missing_command_fails` | Stdio without command is rejected |
| `validate_stdio_with_url_fails` | Stdio with url (remote field) is rejected |
| `validate_stdio_with_headers_fails` | Stdio with headers (remote field) is rejected |
| `validate_http_with_url_succeeds` | Valid http spec passes |
| `validate_http_missing_url_fails` | Http without url is rejected |
| `validate_http_with_command_fails` | Http with command (stdio field) is rejected |
| `validate_http_with_args_fails` | Http with args (stdio field) is rejected |
| `validate_http_with_cwd_fails` | Http with cwd (stdio field) is rejected |
| `validate_sse_with_url_succeeds` | Valid SSE spec passes |
| `validate_sse_missing_url_fails` | SSE without url is rejected |
| `is_remote_true_for_http_and_sse` | Http and SSE are remote transports |
| `is_remote_false_for_stdio` | Stdio is not remote |
| `deserialize_stdio_server_from_json` | Full stdio spec from JSON |
| `deserialize_http_server_from_json` | Http spec with headers from JSON |
| `deserialize_env_with_mixed_types` | Env with String, Bool, Int values |
| `deserialize_multiple_servers` | Multiple server entries parsed |
| `deserialize_empty_servers_map` | Empty mcpServers map parsed |
| `deserialize_server_with_cwd` | cwd field parsed correctly |
| `resolve_all_returns_all_configured_servers` | "all" resolves to all config keys |
| `resolve_comma_separated_returns_matching_servers` | Comma-separated list filters correctly |
| `resolve_single_server_name` | Single name resolved |
| `resolve_none_returns_empty` | None enabled → empty list |
| `resolve_no_config_returns_empty` | No config → empty list |
| `resolve_nonexistent_server_filtered_out` | Unknown names silently filtered |
| `resolve_all_nonexistent_returns_empty` | All unknown → empty list |
| `resolve_trims_whitespace` | Whitespace in comma list trimmed |
| `registry_default_is_empty` | Default registry: empty, no config, no log |
| `registry_with_config_reports_config` | Config accessor works |
| `meta_function_prefixes_are_correct` | mcp_invoke/search/describe prefixes |
### src/config/request_context.rs (6 new tests)
| Test name | What it verifies |
|---|---|
| `rebuild_tool_scope_mcp_disabled_skips_servers` | mcp_server_support=false → empty runtime |
| `rebuild_tool_scope_no_enabled_servers_yields_empty_runtime` | None enabled → empty runtime |
| `rebuild_tool_scope_no_mcp_config_yields_empty_runtime` | No mcp_config → empty runtime |
| `rebuild_tool_scope_preserves_tool_tracker` | Tracker survives rebuild |
| `rebuild_tool_scope_repl_mode_appends_user_interaction_functions` | REPL adds user__ functions |
| `rebuild_tool_scope_cmd_mode_no_user_interaction_functions` | CMD skips user__ functions |
**Total: 54 new tests (176 total in suite)**
## Bugs discovered
None.
## Observations for future iterations
1. **ConnectedServer untestable without subprocess**: `ConnectedServer`
(= `RunningService<RoleClient, ()>`) cannot be constructed without
a real MCP server subprocess. This blocks unit testing for:
- McpFactory.acquire() full flow (spawn + insert + Weak sharing)
- McpRuntime.insert/get with real handles
- McpRuntime.search/describe/invoke (need live tool catalog)
- All scope transition tests (role/session/agent MCP start/stop)
These require integration tests with a mock MCP server binary
(e.g., a simple echo server). Recommended for a dedicated
integration test iteration.
2. **McpServerKey sorting guarantees sharing correctness**: The
sorting of args, env, and headers in McpServerKey::from_spec
is critical — without it, HashMap key equality would be
non-deterministic. Tests verify this explicitly.
3. **rebuild_tool_scope has 3 guard clauses that prevent server
acquisition**: mcp_server_support=false, mcp_config=None,
enabled_mcp_servers=None. All three paths tested.
4. **REPL vs CMD mode differs in user interaction functions**: The
`rebuild_tool_scope` method conditionally appends `user__*`
functions only in REPL mode. Tested both paths.
5. **McpServer::validate enforces strict transport/field separation**:
Stdio servers cannot have url/headers, remote servers cannot have
command/args/cwd. This prevents misconfiguration. All cross-field
conflict cases tested.
6. **McpRegistry.resolve_server_ids is private** but tested via
`#[cfg(test)]` in the same module. It's the core of server ID
resolution for "all", comma-separated, and empty cases.
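
The resolution rules covered by the `resolve_*` tests can be sketched as one pure function. The signature below is hypothetical (the real `resolve_server_ids` is a private method on `McpRegistry` and its parameters are not shown in these notes); the behaviors are the ones listed in the table: "all", comma-separated lists, trimming, silent filtering of unknown names, and empty results when either input is missing.

```rust
/// Hypothetical free-function version of server ID resolution.
/// `configured` is the list of server names from the MCP config, if any;
/// `enabled` is the raw `enabled_mcp_servers` value, if any.
pub fn resolve_server_ids(configured: Option<&[&str]>, enabled: Option<&str>) -> Vec<String> {
    let (configured, enabled) = match (configured, enabled) {
        (Some(c), Some(e)) => (c, e),
        // No config or nothing enabled: nothing to resolve.
        _ => return Vec::new(),
    };
    if enabled.trim() == "all" {
        return configured.iter().map(|s| s.to_string()).collect();
    }
    enabled
        .split(',')
        .map(str::trim)
        // Unknown names are dropped silently rather than reported.
        .filter(|id| configured.contains(id))
        .map(str::to_string)
        .collect()
}
```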
## Next iteration
Plan file 06: Tool Evaluation — eval_tool_calls, ToolCall dispatch,
tool handlers, MCP tool invocation chain (mcp__search, mcp__describe,
mcp__invoke).
@@ -0,0 +1,96 @@
# Iteration 6 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/06-tool-evaluation.md`
## Tests created
### src/function/mod.rs (36 new tests)
| Test name | What it verifies |
|---|---|
| `toolcall_new_sets_fields` | ToolCall::new sets name, arguments, id |
| `toolcall_default_has_empty_fields` | Default ToolCall has empty/null fields |
| `toolcall_with_thought_signature` | with_thought_signature sets value |
| `toolcall_with_thought_signature_none` | with_thought_signature(None) clears |
| `dedup_removes_duplicate_ids_keeps_last` | Duplicate ids → last occurrence kept |
| `dedup_keeps_unique_ids` | Unique ids → all kept |
| `dedup_keeps_calls_without_ids` | No-id calls always kept |
| `dedup_preserves_last_occurrence_order` | Ordering based on last occurrence position |
| `dedup_empty_input_returns_empty` | Empty vec → empty result |
| `dedup_mixed_with_and_without_ids` | Mixed id/no-id dedup behavior |
| `tracker_default_values` | Default max_repeats=2, chain_len=3 |
| `tracker_no_loop_on_fresh_tracker` | Fresh tracker returns None |
| `tracker_no_loop_below_threshold` | Below max_repeats → no loop |
| `tracker_detects_loop_at_max_repeats` | At max_repeats → loop detected |
| `tracker_different_args_no_loop` | Different args break loop detection |
| `tracker_different_names_no_loop` | Different names break loop detection |
| `tracker_chain_detection` | Chain of identical calls detected |
| `tracker_record_call_respects_capacity` | Capacity bounded by chain_len * max_repeats |
| `tracker_loop_message_contains_call_history` | Loop message includes call_history JSON |
| `prefix_constants_are_correct` | All 6 prefixes: todo__, agent__, user__, mcp_invoke/search/describe |
| `functions_default_is_empty` | Default Functions has no declarations |
| `functions_append_todo_adds_declarations` | 5 todo tools: init, add, done, list, clear |
| `functions_append_supervisor_adds_declarations` | Supervisor: spawn, check, collect, list, cancel, reply |
| `functions_append_teammate_adds_declarations` | Teammate: send_message, check_inbox |
| `functions_append_user_interaction_adds_declarations` | User: ask, confirm, input, checkbox |
| `functions_append_mcp_meta_creates_three_per_server` | 3 MCP meta functions per server |
| `functions_append_mcp_meta_multiple_servers` | Multiple servers → 3 each |
| `functions_append_mcp_meta_empty_servers` | Empty servers → no declarations |
| `functions_find_returns_declaration` | find() returns matching declaration |
| `functions_find_returns_none_for_missing` | find() returns None for unknown |
| `functions_contains_true_for_existing` | contains() true for known function |
| `functions_contains_false_for_missing` | contains() false for unknown |
| `functions_mcp_invoke_declaration_has_tool_and_arguments_params` | Invoke schema: tool + arguments params |
| `functions_mcp_search_declaration_has_query_and_top_k_params` | Search schema: query + top_k params |
| `functions_mcp_describe_declaration_has_tool_param` | Describe schema: tool param |
| `functions_supervisor_includes_task_queue_tools` | Task queue: create, list, complete, fail |
| `tool_result_stores_call_and_output` | ToolResult::new stores both fields |
**Total: 36 new tests (212 total in suite)**
## Bugs discovered
None.
## Observations for future iterations
1. **ToolCall::dedup keeps the LAST occurrence**: The implementation
iterates in reverse and reverses again, so when duplicate ids
exist, the last occurrence wins. My initial tests assumed first-
wins behavior — caught and corrected during the iteration.
2. **ToolCall::eval requires full RequestContext**: The dispatch
routing (`agent__*`, `todo__*`, `user__*`, `mcp_*`, shell
fallback) cannot be unit-tested because `eval()` takes
`&mut RequestContext` which requires an initialized AppState.
The prefix routing is verified indirectly through prefix
constant tests and function declaration tests.
3. **Functions::init requires filesystem**: It calls
`build_global_tool_declarations` which reads tool files from
disk. Can't unit-test without a temp directory with actual
tool scripts. Function filtering by `enabled_tools` is thus
deferred.
4. **All function declaration appenders are fully testable**: The
`append_*` methods on Functions work without I/O and produce
the exact function declarations the LLM sees. This is the most
important behavioral contract to test.
5. **MCP meta function schemas are critical**: The invoke, search,
and describe meta functions each have specific parameter schemas
(tool+arguments, query+top_k, tool). Tests verify these schemas
exist with correct fields and required params.
6. **ToolCallTracker loop detection has two mechanisms**:
- Consecutive repeat detection (same call N times in a row)
- Chain detection (same call repeated across the last chain_len
entries)
Both are tested independently.
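
The last-wins behavior from observation 1 follows directly from the reverse, filter, reverse shape. The sketch below assumes a reduced `ToolCall` with only `name` and an optional `id`, and a free function instead of the real `ToolCall::dedup`; the algorithm itself is the one described in the notes.

```rust
use std::collections::HashSet;

/// Hypothetical reduced `ToolCall`; the real type also carries arguments.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub id: Option<String>,
}

/// Keeps the last occurrence of each id and every call without an id.
pub fn dedup(calls: Vec<ToolCall>) -> Vec<ToolCall> {
    let mut seen = HashSet::new();
    let mut kept: Vec<ToolCall> = calls
        .into_iter()
        .rev() // walk from the end so the last occurrence is seen first
        .filter(|call| match &call.id {
            Some(id) => seen.insert(id.clone()), // false if already seen
            None => true,
        })
        .collect();
    kept.reverse(); // restore original relative order
    kept
}
```

For calls `a(id 1), b(id 2), c(id 1), d(no id)` this yields `b, c, d`: the earlier `a` is dropped and `c` stays at its own position.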
## Next iteration
Plan file 07: Input Construction — Input::from_str, from_files,
field capturing, function selection.
@@ -0,0 +1,97 @@
# Iteration 7 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/07-input-construction.md`
## Tests created
### src/config/input.rs (31 new tests)
| Test name | What it verifies |
|---|---|
| `resolve_role_with_explicit_role` | Explicit role returned, with_session/agent false |
| `resolve_role_without_role_no_session_no_agent` | Default role, both flags false |
| `resolve_role_without_role_with_session` | with_session true when session present |
| `resolve_role_explicit_role_overrides_session_flag` | Explicit role forces with_session=false |
| `resolve_paths_detects_last_reply_syntax` | %% sets with_last_reply=true |
| `resolve_paths_detects_url` | https:// classified as remote URL |
| `resolve_paths_detects_external_command` | Backtick-wrapped → external command |
| `resolve_paths_empty_input` | Empty vec → all empty, no last reply |
| `resolve_paths_rejects_url_with_glob_suffix` | URL ending in `**` → error |
| `resolve_paths_mixed_inputs` | %% + URL + cmd all detected |
| `input_from_str_captures_text` | Text stored correctly |
| `input_from_str_with_explicit_role` | Role name captured |
| `input_from_str_captures_stream_from_config` | stream=false from config |
| `input_is_empty_with_no_text_and_no_medias` | Empty text + no medias = empty |
| `input_is_not_empty_with_text` | Text present = not empty |
| `input_set_text_changes_text` | set_text updates text |
| `input_text_returns_patched_when_set` | Patched text overrides |
| `input_clear_patch_restores_original` | clear_patch removes override |
| `input_set_continue_output_accumulates` | Multiple calls concatenate |
| `input_set_regenerate_sets_flag_and_clears_tool_calls` | Flag set, tool_calls cleared |
| `input_summary_truncates_long_text` | >80 chars → truncated with ... |
| `input_summary_preserves_short_text` | Short text unchanged |
| `input_raw_with_no_files` | Raw returns just text |
| `input_render_with_no_medias` | Render returns just text |
| `input_with_agent_false_when_no_agent` | No agent context → false |
| `input_session_returns_none_when_with_session_false` | Explicit role → no session access |
| `input_session_returns_some_when_with_session_true` | Session context → session access |
| `is_image_recognizes_image_extensions` | png/jpeg/jpg/webp/gif recognized |
| `is_image_rejects_non_image_extensions` | txt/rs/pdf rejected |
| `resolve_data_url_returns_path_for_known_hash` | Hash lookup returns path |
| `resolve_data_url_returns_original_for_non_data_url` | Non-data URL returned as-is |
### src/config/request_context.rs (7 new tests)
| Test name | What it verifies |
|---|---|
| `select_functions_returns_none_when_no_tools_enabled` | No enabled_tools → None |
| `select_functions_returns_none_when_function_calling_disabled` | function_calling_support=false → None |
| `select_functions_all_enabled_tools_returns_all_non_mcp` | "all" → all non-MCP declarations |
| `select_functions_comma_separated_filters` | Comma list → matching subset |
| `select_enabled_mcp_servers_returns_empty_when_mcp_disabled` | mcp_server_support=false → empty |
| `select_enabled_mcp_servers_all_returns_all_mcp_functions` | "all" → all MCP functions |
| `select_enabled_mcp_servers_comma_filters` | Server name → only that server's 3 functions |
**Total: 38 new tests (250 total in suite)**
## Bugs discovered
None.
## Observations for future iterations
1. **Input::from_files is async and I/O-heavy**: It fetches URLs,
reads files from disk, expands globs, and runs external commands.
Full testing requires integration tests with temp files/dirs.
2. **resolve_role with agent**: Testing requires an initialized
Agent (which needs config files on disk). The agent path is
tested indirectly through the existing `exit_agent` test in
iteration 4.
3. **resolve_paths is a pure function**: No I/O, fully testable.
It cleanly separates path classification (URL vs local vs cmd
vs loader) from actual loading. Good design for testing.
4. **select_functions has complex filtering**: It filters non-MCP
declarations by enabled_tools, then adds user__ functions for
non-agent contexts, then merges agent-specific functions. The
MCP selection mirrors this with MCP-prefixed declarations.
Both paths fully tested.
5. **Input captures state at construction time**: All fields
(stream_enabled, session, rag, functions) are captured from
RequestContext at Input creation. This snapshot-at-creation
pattern means the Input is independent of later context changes.
6. **The %% syntax for last-reply carry-over** is detected in
resolve_paths (pure function) but the actual last_reply
retrieval happens in from_files (async). Tested the detection
part.
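
Because `resolve_paths` is pure, its classification step can be illustrated without any I/O. The sketch below is a simplified guess at the shape: `ResolvedPaths`, its field names, the `String` error type, and the treatment of `http://` are assumptions, and the loader category mentioned in observation 3 is left out. The `%%`, URL, backtick, and glob-suffix rules come from the test table.

```rust
/// Hypothetical result type; the real function's return shape is not shown in the notes.
#[derive(Debug, Default, PartialEq)]
pub struct ResolvedPaths {
    pub with_last_reply: bool,
    pub remote_urls: Vec<String>,
    pub external_cmds: Vec<String>,
    pub local_paths: Vec<String>,
}

/// Classifies raw inputs without touching the network or filesystem.
pub fn resolve_paths(inputs: &[&str]) -> Result<ResolvedPaths, String> {
    let mut out = ResolvedPaths::default();
    for &input in inputs {
        if input == "%%" {
            // Last-reply marker: only the flag is set here.
            out.with_last_reply = true;
        } else if input.len() > 2 && input.starts_with('`') && input.ends_with('`') {
            // Backtick-wrapped text is an external command; strip the backticks.
            out.external_cmds.push(input[1..input.len() - 1].to_string());
        } else if input.starts_with("http://") || input.starts_with("https://") {
            if input.ends_with("**") {
                return Err(format!("invalid URL with glob suffix: {input}"));
            }
            out.remote_urls.push(input.to_string());
        } else {
            out.local_paths.push(input.to_string());
        }
    }
    Ok(out)
}
```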
## Next iteration
Plan file 08: Request Context — RequestContext methods, scope
transitions, state management.
@@ -0,0 +1,69 @@
# Iteration 8 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/08-request-context.md`
## Tests created
### src/config/request_context.rs (22 new tests, 51 total in file)
| Test name | What it verifies |
|---|---|
| `state_empty_context` | Empty context → empty StateFlags |
| `state_with_role_only` | Role set → ROLE flag |
| `state_with_empty_session` | Empty session → SESSION_EMPTY flag |
| `state_flags_combine_role_and_session` | Multiple flags combine correctly |
| `role_info_errors_when_no_role` | No role → error |
| `role_info_succeeds_with_role` | Role present → exports prompt |
| `agent_info_errors_when_no_agent` | No agent → error |
| `rag_info_errors_when_no_rag` | No RAG → error |
| `use_role_obj_errors_when_agent_active` | Agent blocks role assignment |
| `exit_rag_clears_rag` | exit_rag() sets rag to None |
| `discontinuous_last_message_sets_continuous_false` | Marks last message non-continuous |
| `discontinuous_last_message_noop_when_none` | No last message → no-op |
| `before_chat_completion_sets_last_message` | Creates LastMessage with empty output |
| `role_like_mut_returns_none_when_empty` | No active scope → None |
| `role_like_mut_returns_role_when_only_role` | Role only → returns role |
| `role_like_mut_prefers_session_over_role` | Session takes priority |
| `working_mode_cmd` | CMD mode flags correct |
| `working_mode_repl` | REPL mode flags correct |
| `session_file_returns_yaml_path` | Correct .yaml suffix |
| `session_file_with_subdir` | subdir/name → nested path |
| `is_compressing_session_false_when_no_session` | No session → false |
| `is_compressing_session_false_with_default_session` | Default session → false |
**Total: 22 new tests (272 total in suite)**
## Bugs discovered
None.
## Observations for future iterations
1. **Rag struct has no Default**: Rag requires an AppConfig, name,
embedding model, and HNSW index. Can't create test instances
without heavy setup. RAG-related state tests (state with RAG,
exit_rag with actual RAG) deferred.
2. **role_like_mut priority is session > agent > role > None**:
The session-over-role priority is verified. Agent priority
can't be easily tested without agent init (filesystem).
3. **StateFlags is a bitflags type**: Tested empty, individual
flags (ROLE, SESSION_EMPTY), and combinations. The SESSION
flag (non-empty session) requires adding messages to a session
which needs more setup — deferred.
4. **info() and sysinfo() require model provider config**: These
format system info strings that include model details. Testing
requires a valid model provider configuration.
5. **The RequestContext test file now has 51 tests** spanning
iterations 1 through 5, 7, and 8. It's the most heavily tested
module, which matches its role as the central state container.
## Next iteration
Plan file 09: REPL Commands — REPL command handlers, state
assertions, argument parsing.
@@ -0,0 +1,90 @@
# Iteration 9 — Test Implementation Notes
## Plan file addressed
`docs/testing/plans/09-repl-commands.md`
## Tests created
### src/config/mod.rs (8 new tests)
| Test name | What it verifies |
|---|---|
| `assert_state_pass_always_true` | pass() true for all flag combos |
| `assert_state_bare_only_empty` | bare() only matches empty |
| `assert_state_true_requires_flag_present` | True requires any match |
| `assert_state_true_with_multiple_flags_any_match` | OR semantics for True flags |
| `assert_state_false_requires_flag_absent` | False requires all absent |
| `assert_state_false_with_multiple_flags` | Multiple False flags all checked |
| `assert_state_truefalse_requires_true_present_and_false_absent` | Both conditions |
| `assert_state_equal_exact_match` | Exact flag equality |
### src/repl/mod.rs (31 new tests, 33 total in file)
| Test name | What it verifies |
|---|---|
| `repl_commands_has_39_entries` | Array size |
| `repl_commands_all_start_with_dot` | All commands dotted |
| `repl_commands_no_empty_descriptions` | All have descriptions |
| `repl_commands_help_is_always_available` | .help → pass |
| `repl_commands_exit_is_always_available` | .exit → pass |
| `repl_commands_info_role_requires_role` | .info role → True(ROLE) |
| `repl_commands_session_blocked_when_already_in_session` | .session → False(SESSION) |
| `repl_commands_exit_session_requires_session` | .exit session → True(SESSION) |
| `repl_commands_exit_agent_requires_agent` | .exit agent → True(AGENT) |
| `repl_commands_agent_only_when_bare` | .agent → Equal(empty) |
| `repl_commands_role_blocked_in_session_or_agent` | .role → False(SESSION\|AGENT) |
| `repl_commands_prompt_blocked_in_session_or_agent` | .prompt → False(SESSION\|AGENT) |
| `repl_commands_rag_blocked_in_agent` | .rag → False(AGENT) |
| `repl_commands_starter_requires_agent` | .starter → True(AGENT) |
| `repl_commands_clear_todo_requires_agent` | .clear todo → True(AGENT) |
| `repl_commands_edit_role_requires_role_not_session` | .edit role → TrueFalse |
| `repl_commands_exit_rag_requires_rag_not_agent` | .exit rag → TrueFalse |
| `parse_command_plain_text_returns_none` | Plain text → None |
| `parse_command_empty_returns_none` | Empty → None |
| `parse_command_whitespace_only_returns_none` | Whitespace → None |
| `parse_command_dot_only` | Single dot → (".", None) |
| `split_first_arg_none_input` | None → None |
| `split_first_arg_single_word` | "role" → ("role", None) |
| `split_first_arg_two_words` | "role x" → ("role", Some("x")) |
| `split_first_arg_with_extra_spaces` | Extra spaces trimmed |
| `repl_command_is_valid_pass_always_true` | pass → always valid |
| `repl_command_is_valid_respects_true` | True → enforced |
| `repl_command_is_valid_respects_false` | False → enforced |
| `multiline_regex_captures_content_between_markers` | :::content::: captured |
| `multiline_regex_does_not_match_single_marker` | Unclosed → no match |
| `multiline_regex_does_not_match_plain_text` | Plain text → no match |
**Total: 39 new tests (311 total in suite)**
## Bugs discovered
None.
## Observations for future iterations
1. **AssertState has 4 variants with distinct semantics**:
- True: any of the required flags must be present (OR)
- False: all of the forbidden flags must be absent (AND)
- TrueFalse: True AND False simultaneously
- Equal: exact flag match
This is a critical invariant for REPL command availability.
2. **The .agent command uses AssertState::bare()** (Equal(empty)),
meaning it's only available when NO other scope is active. This
is stricter than False — it requires exactly empty state.
3. **All 39 REPL commands** have correct dot prefixes and non-empty
descriptions. Verified as structural invariants.
4. **The multiline ::: syntax** is handled by a regex that requires
both opening and closing markers. The ReplValidator marks
single-marker input as Incomplete for the line editor.
5. **Command handler tests** (the actual .role, .session, .agent
implementations) require full async RequestContext with
filesystem access. These are integration tests and are deferred.
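
The four `AssertState` semantics in observation 1 can be written out as a single match. This is a sketch, not the real definition: the notes say `StateFlags` is a bitflags type, while here it is modeled as a plain `u8` with made-up bit values; `is_satisfied_by` is a hypothetical method name, and implementing `pass()` as `False(0)` is an assumption (the notes only say it is true for all flag combinations).

```rust
/// Hypothetical stand-in for the bitflags type; bit values are invented.
pub type StateFlags = u8;
pub const ROLE: StateFlags = 1 << 0;
pub const SESSION: StateFlags = 1 << 1;
pub const AGENT: StateFlags = 1 << 2;
pub const RAG: StateFlags = 1 << 3;

pub enum AssertState {
    True(StateFlags),                  // any of these flags present (OR)
    False(StateFlags),                 // all of these flags absent (AND)
    TrueFalse(StateFlags, StateFlags), // both conditions at once
    Equal(StateFlags),                 // exact match
}

impl AssertState {
    /// Always satisfied: forbidding no flags can never fail (assumed encoding).
    pub fn pass() -> Self {
        AssertState::False(0)
    }

    /// Satisfied only when no scope is active.
    pub fn bare() -> Self {
        AssertState::Equal(0)
    }

    pub fn is_satisfied_by(&self, flags: StateFlags) -> bool {
        match *self {
            AssertState::True(t) => (t & flags) != 0,
            AssertState::False(f) => (f & flags) == 0,
            AssertState::TrueFalse(t, f) => (t & flags) != 0 && (f & flags) == 0,
            AssertState::Equal(e) => e == flags,
        }
    }
}
```

Under this model `.role` (`False(SESSION | AGENT)`) stays available when a RAG is active, while `.agent` (`bare()`) does not, which is the stricter behavior described in observation 2.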
## Next iteration
Check the TEST-IMPLEMENTATION-PLAN.md for what plan file comes next.
@@ -0,0 +1,62 @@
# Test Plan: Config Loading and AppConfig
## Feature description
Loki loads its configuration from a YAML file (`config.yaml`) into
a `Config` struct, then converts it to `AppConfig` (immutable,
shared) + `RequestContext` (mutable, per-request). The `AppConfig`
holds all serialized fields; `RequestContext` holds runtime state.
## Behaviors to test
### Config loading
- [ ] Config loads from YAML file with all supported fields
- [x] Missing optional fields get correct defaults (config_defaults_match_expected)
- [ ] `model_id` defaults to first available model if empty (requires Config::init, integration test)
- [x] `temperature`, `top_p` default to `None`
- [x] `stream` defaults to `true`
- [x] `save` defaults to `false` (CORRECTED: was listed as true)
- [x] `highlight` defaults to `true`
- [x] `dry_run` defaults to `false`
- [x] `function_calling_support` defaults to `true`
- [x] `mcp_server_support` defaults to `true`
- [x] `compression_threshold` defaults to `4000`
- [ ] `document_loaders` populated from config and defaults (requires Config::init)
- [x] `clients` parsed from config (to_app_config_copies_clients)
### AppConfig conversion
- [x] `to_app_config()` copies all serialized fields correctly
- [x] `clients` field populated on AppConfig
- [ ] `visible_tools` correctly computed from `enabled_tools` config (deferred to plan 16)
- [x] `mapping_tools` correctly parsed
- [x] `mapping_mcp_servers` correctly parsed
- [ ] `user_agent` resolved (auto → crate name/version)
### RequestContext conversion
- [x] `to_request_context()` copies all runtime fields (to_request_context_creates_clean_state)
- [ ] `model` field populated with resolved model (requires Model::retrieve_model)
- [ ] `working_mode` set correctly (Repl vs Cmd)
- [x] `tool_scope` starts with default (empty)
- [x] `agent_runtime` starts as `None`
### AppConfig field accessors
- [x] `editor()` returns configured editor or $EDITOR
- [x] `light_theme()` returns theme flag
- [ ] `render_options()` returns options for markdown rendering
- [x] `sync_models_url()` returns configured or default URL
### Dynamic config updates
- [x] `update_app_config` closure correctly clones and replaces Arc
- [x] Changes to `dry_run`, `stream`, `save` persist across calls
- [x] Changes visible to subsequent `ctx.app.config` reads
## Context switching scenarios
- [ ] AppConfig remains immutable after construction (no field mutation)
- [ ] Multiple RequestContexts can share the same AppState
- [ ] Changing AppConfig fields (via clone-mutate-replace) doesn't
affect other references to the old Arc
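
The clone-mutate-replace scenario above can be sketched as follows. This is a minimal illustration, not the actual Loki code: the three-field `AppConfig`, the `AppState` wrapper, and the free-function form of `update_app_config` are simplified stand-ins assumed for the example.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the real `AppConfig`; only three fields shown.
#[derive(Clone, Debug, PartialEq)]
pub struct AppConfig {
    pub dry_run: bool,
    pub stream: bool,
    pub save: bool,
}

/// Hypothetical stand-in for `AppState`: holds the current config snapshot.
pub struct AppState {
    pub config: Arc<AppConfig>,
}

/// Clone the current config, apply `f` to the clone, then swap in a new Arc.
/// Clones of the old Arc keep pointing at the old, unchanged snapshot.
pub fn update_app_config(state: &mut AppState, f: impl FnOnce(&mut AppConfig)) {
    let mut next = (*state.config).clone();
    f(&mut next);
    state.config = Arc::new(next);
}
```

A test built on this shape would hold a clone of the old `Arc`, run the update, and assert that only the new snapshot reflects the change.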
## Old code reference
- `src/config/mod.rs`: `Config` struct, `Config::init`, defaults
- `src/config/bridge.rs`: `to_app_config`, `to_request_context`
- `src/config/app_config.rs`: `AppConfig` struct and methods
@@ -0,0 +1,68 @@
# Test Plan: Roles
## Feature description
Roles define a system prompt + optional model/temperature/MCP config
that customizes LLM behavior. Roles can be built-in or user-defined
(markdown files). Roles are "role-likes" — sessions and agents also
implement the RoleLike trait.
## Behaviors to test
### Role loading
- [x] Built-in roles load correctly (shell, code)
- [ ] User-defined roles load from markdown files (requires filesystem)
- [x] Role parses model_id from metadata
- [x] Role parses temperature, top_p from metadata
- [x] Role parses enabled_tools from metadata
- [x] Role parses enabled_mcp_servers from metadata
- [ ] Role with no model_id inherits current model (requires retrieve_role + client config)
- [ ] Role with no temperature inherits from AppConfig (requires retrieve_role)
- [ ] Role with no top_p inherits from AppConfig (requires retrieve_role)
### retrieve_role
- [ ] Retrieves by name from file system
- [ ] Resolves model via Model::retrieve_model
- [ ] Falls back to current model if role has no model_id
- [ ] Sets temperature/top_p from AppConfig when role doesn't specify
### use_role (scope transition)
- [x] Sets role on RequestContext (use_role_obj_sets_role)
- [ ] Triggers rebuild_tool_scope (async, deferred to plan 05/08)
- [ ] MCP servers start if role has enabled_mcp_servers (deferred to plan 05)
- [ ] MCP meta functions added to function list (deferred to plan 05)
- [ ] Previous role cleared when switching (deferred to plan 08)
- [x] Role-like temperature/top_p take effect (role_set_temperature_works)
### exit_role
- [x] Clears role from RequestContext (exit_role_clears_role)
- [ ] Followed by bootstrap_tools to restore global tool scope (async, deferred)
- [ ] MCP servers from role are stopped (deferred to plan 05)
- [ ] Global MCP servers restored (deferred to plan 05)
### use_prompt (temp role)
- [x] Creates a TEMP_ROLE_NAME role with the prompt text (use_prompt_creates_temp_role)
- [x] Uses current model
- [x] Activates via use_role_obj
### extract_role
- [ ] Returns role from agent if agent active (deferred to plan 04)
- [ ] Returns role from session if session active with role (deferred to plan 03)
- [x] Returns standalone role if active (extract_role_returns_standalone_role)
- [x] Returns default role if none active (extract_role_returns_default_when_nothing_active)
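
The four `extract_role` cases suggest a fallback chain. The sketch below assumes the checklist order is also the precedence order (agent, then session, then standalone role, then default); the `Ctx` struct and its field names are hypothetical and much simpler than the real `RequestContext`.

```rust
/// Hypothetical, simplified role type.
#[derive(Debug, Clone, PartialEq)]
pub struct Role {
    pub name: String,
}

/// Hypothetical stand-in for the parts of `RequestContext` that matter here.
#[derive(Default)]
pub struct Ctx {
    pub agent_role: Option<Role>,
    pub session_role: Option<Role>,
    pub role: Option<Role>,
}

/// Assumed precedence: agent, then session, then standalone role, then default.
pub fn extract_role(ctx: &Ctx) -> Role {
    ctx.agent_role
        .clone()
        .or_else(|| ctx.session_role.clone())
        .or_else(|| ctx.role.clone())
        .unwrap_or_else(|| Role { name: "default".to_string() })
}
```

Each checklist item then maps to one assertion with a different combination of populated fields.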
### One-shot role messages (REPL)
- [ ] `.role coder write hello` sends message with role, then exits role
- [ ] Original state restored after one-shot
## Context switching scenarios
- [ ] Role → different role: old role replaced, MCP swapped
- [ ] Role → session: role cleared, session takes over
- [ ] Role with MCP → exit: MCP servers stop, global MCP restored
- [ ] No MCP → role with MCP: servers start
- [ ] Role with MCP → role without MCP: servers stop
## Old code reference
- `src/config/mod.rs`: `use_role`, `exit_role`, `retrieve_role`
- `src/config/role.rs`: `Role` struct, parsing
- `src/config/request_context.rs`: `use_role`, `exit_role`, `use_prompt`, `retrieve_role`
@@ -0,0 +1,66 @@
# Test Plan: Sessions
## Feature description
Sessions persist conversation history across multiple turns. They
store messages, role context, model info, and optional MCP config.
Sessions can be temporary, named, or auto-named.
## Behaviors to test
### Session creation
- [ ] Temp session created with TEMP_SESSION_NAME
- [ ] Named session created at correct file path
- [ ] New session captures current role via extract_role
- [ ] New session captures save_session from AppConfig
- [ ] Session tracks model_id
### Session loading
- [ ] Named session loads from YAML file
- [ ] Loaded session resolves model via Model::retrieve_model
- [ ] Loaded session restores role_prompt if role exists
- [ ] Auto-named sessions (prefixed `_/`) handled correctly
### Session saving
- [ ] Session saved to correct path
- [ ] Session file contains messages, model_id, role info
- [ ] save_session flag controls whether session is persisted
- [ ] set_save_session_this_time overrides for current turn
### Session lifecycle
- [ ] use_session creates or loads session
- [ ] Already in session → error
- [ ] exit_session saves and clears
- [ ] empty_session clears messages but keeps session active
### Session carry-over
- [ ] New empty session with last_message prompts "incorporate?"
- [ ] If accepted, last Q&A added to session
- [ ] If declined, session starts fresh
- [ ] Only prompts when continuous and output not empty
### Session compression
- [ ] maybe_compress_session returns true when threshold exceeded
- [ ] compress_session reduces message count
- [ ] Compression message shown to user
- [ ] Session usable after compression
### Session autoname
- [ ] maybe_autoname_session returns true for new sessions
- [ ] Auto-naming sets session name based on content
- [ ] Autoname only triggers once per session
### Session info
- [ ] session_info returns formatted session details
- [ ] Shows message count, model, role, tokens
## Context switching scenarios
- [ ] Session → role change: role updated within session
- [ ] Session → exit session: messages saved, state cleared
- [ ] Agent session → exit: agent session cleanup
- [ ] Session with MCP → exit: MCP servers handled
## Old code reference
- `src/config/mod.rs`: `use_session`, `exit_session`, `empty_session`
- `src/config/session.rs`: `Session` struct, new, load, save
- `src/config/request_context.rs`: `use_session`, `exit_session`
@@ -0,0 +1,77 @@
# Test Plan: Agents
## Feature description
Agents combine a role (instructions), tools (bash/python/ts scripts),
optional RAG, optional MCP servers, and optional sub-agent spawning
capability. Agent::init compiles tools, resolves model, loads RAG,
and sets up the agent environment.
## Behaviors to test
### Agent initialization
- [ ] Agent::init loads config.yaml from agent directory
- [ ] Agent tools compiled from tools.sh / tools.py / tools.ts
- [ ] Tool file priority: .sh > .py > .ts > .js
- [ ] Global tools loaded (from global_tools config)
- [ ] Model resolved from agent config or defaults to current
- [ ] Agent with no model_id uses current model
- [ ] Temperature/top_p from agent config applied
- [ ] Dynamic instructions (_instructions function) invoked if configured
- [ ] Static instructions loaded from config
- [ ] Agent variables interpolated into instructions
- [ ] Special variables (__os__, __cwd__, __now__, etc.) interpolated
- [ ] Agent .env file loaded if present
- [ ] Built-in agents installed on first run (skip if exists)
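
The tool file priority (`.sh > .py > .ts > .js`) can be tested as pure logic if the lookup is separated from the filesystem. A minimal sketch, assuming the candidate file names are `tools.sh`, `tools.py`, `tools.ts`, and `tools.js`; `pick_tools_file` is a hypothetical helper, not a function from the Loki codebase.

```rust
/// Candidate tool files in priority order (highest first).
/// The `tools.js` name is an assumption based on the `.js` extension in the plan.
pub const TOOL_FILE_PRIORITY: [&str; 4] = ["tools.sh", "tools.py", "tools.ts", "tools.js"];

/// Return the highest-priority tool file among the names present in a directory.
pub fn pick_tools_file<'a>(present: &[&'a str]) -> Option<&'a str> {
    TOOL_FILE_PRIORITY
        .iter()
        .find_map(|wanted| present.iter().copied().find(|name| name == wanted))
}
```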
### Agent tools
- [ ] Agent-specific tools available as function declarations
- [ ] Global tools (from global_tools) also available
- [ ] Tool binaries built in agent bin directory
- [ ] clear_agent_bin_dir removes old binaries before rebuild
- [ ] Tool declarations include name, description, parameters
### Agent with MCP
- [ ] MCP servers listed in agent config started
- [ ] MCP meta functions (invoke/search/describe) added
- [ ] Agent with MCP but mcp_server_support=false → error
- [ ] MCP servers stopped on agent exit
### Agent with RAG
- [ ] RAG documents loaded from agent config
- [ ] RAG available during agent conversation
- [ ] RAG search results included in context
### Agent sessions
- [ ] Agent session started (temp or named)
- [ ] agent_session config used if no explicit session
- [ ] Agent session variables initialized
### Agent lifecycle
- [ ] use_agent checks function_calling_support
- [ ] use_agent errors if agent already active
- [ ] exit_agent clears agent, session, rag, supervisor
- [ ] exit_agent restores global tool scope
### Auto-continuation
- [ ] Agents with auto_continue=true continue after incomplete todos
- [ ] max_auto_continues limits continuation attempts
- [ ] Continuation prompt sent with todo state
- [ ] clear todo stops continuation
### Conversation starters
- [ ] Starters loaded from agent config
- [ ] .starter lists available starters
- [ ] .starter <n> sends the starter as a message
## Context switching scenarios
- [ ] Agent → exit: tools cleared, MCP stopped, session ended
- [ ] Agent with MCP → exit: MCP servers released, global MCP restored
- [ ] Already in agent → start agent: error
- [ ] Agent with RAG → exit: RAG cleared
## Old code reference
- `src/config/agent.rs` — Agent::init, agent config parsing
- `src/config/mod.rs` — use_agent, exit_agent
- `src/config/request_context.rs` — use_agent, exit_agent
- `src/function/mod.rs` — Functions::init_agent, tool compilation
@@ -0,0 +1,118 @@
# Test Plan: MCP Server Lifecycle
## Feature description
MCP (Model Context Protocol) servers are external tools that run
as subprocesses communicating via stdio. Loki manages their lifecycle
through McpFactory (start/share via Weak dedup) and McpRuntime
(per-scope active server handles). Servers are started/stopped
during scope transitions (role/session/agent enter/exit).
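
The "start/share via Weak dedup" idea can be illustrated without a real subprocess. In the sketch below, `ConnectedServer`, the string key, and `spawn_count` are hypothetical simplifications; the real `McpFactory` is presumably shared across contexts and would need synchronization that this single-owner version omits.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for a connected server handle.
pub struct ConnectedServer {
    pub name: String,
}

/// Hypothetical factory: it stores only Weak references, so a server lives
/// exactly as long as some scope still holds a strong Arc to it.
#[derive(Default)]
pub struct McpFactory {
    active: HashMap<String, Weak<ConnectedServer>>,
    /// Counts simulated spawns so tests can observe sharing.
    pub spawn_count: usize,
}

impl McpFactory {
    pub fn acquire(&mut self, key: &str) -> Arc<ConnectedServer> {
        // Reuse the live handle if the Weak reference can still be upgraded.
        if let Some(existing) = self.active.get(key).and_then(Weak::upgrade) {
            return existing;
        }
        // Otherwise "spawn" a fresh server (real code would start a subprocess).
        self.spawn_count += 1;
        let server = Arc::new(ConnectedServer { name: key.to_string() });
        self.active.insert(key.to_string(), Arc::downgrade(&server));
        server
    }
}
```

Replacing the subprocess with a counter like this is one way to cover the three `acquire()` cases that are currently marked as requiring a real subprocess.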
## Behaviors to test
### MCP config loading
- [x] mcp.json parsed correctly from functions directory
- [x] Server specs include command, args, env, cwd
- [ ] Vault secrets interpolated in mcp.json
- [ ] Missing secrets reported as warnings
- [x] McpServersConfig stored on AppState.mcp_config
### McpFactory
- [ ] acquire() spawns new server when none active (requires real subprocess)
- [ ] acquire() returns existing handle via Weak upgrade (requires real subprocess)
- [ ] acquire() spawns fresh when Weak is dead (requires real subprocess)
- [ ] Multiple acquire() calls for same spec share handle (requires real subprocess)
- [x] Different specs get different handles (via key inequality)
- [x] McpServerKey built correctly from spec (sorted args/env)
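
The "sorted args/env" property means two specs that differ only in ordering map to the same key. A rough sketch, assuming a key made of name, command, args, and env; the field layout and constructor signature are hypothetical, and args are sorted only because the plan says so.

```rust
use std::collections::HashMap;

/// Hypothetical key type: equal keys mean "same server, share the handle".
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct McpServerKey {
    pub name: String,
    pub command: String,
    pub args: Vec<String>,
    pub env: Vec<(String, String)>,
}

impl McpServerKey {
    /// Build a key with args and env sorted so that ordering differences
    /// (for example HashMap iteration order) do not produce distinct keys.
    pub fn new(name: &str, command: &str, args: &[&str], env: &HashMap<String, String>) -> Self {
        let mut args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
        args.sort();
        let mut env: Vec<(String, String)> =
            env.iter().map(|(k, v)| (k.clone(), v.clone())).collect();
        env.sort();
        Self { name: name.to_string(), command: command.to_string(), args, env }
    }
}
```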
### McpRuntime
- [ ] insert() adds server handle by name (requires Arc<ConnectedServer>)
- [ ] get() retrieves handle by name (requires Arc<ConnectedServer>)
- [x] server_names() returns all active names
- [x] is_empty() correct for empty/non-empty
- [ ] search() finds tools by keyword (BM25 ranking) (requires live server)
- [ ] describe() returns tool input schema (requires live server)
- [ ] invoke() calls tool on server and returns result (requires live server)
### spawn_mcp_server
- [ ] Builds Command from spec (command, args, env, cwd) (integration test)
- [ ] Creates TokioChildProcess transport (integration test)
- [ ] Completes rmcp handshake (serve) (integration test)
- [ ] Returns Arc<ConnectedServer> (integration test)
- [ ] Log file created when log_path provided (integration test)
### rebuild_tool_scope (MCP integration)
- [x] Empty enabled_mcp_servers → no servers acquired
- [ ] "all" → all configured servers acquired (requires real subprocess)
- [ ] Comma-separated list → only listed servers acquired (requires real subprocess)
- [ ] Mapping resolution: alias → actual server key(s) (requires real subprocess)
- [ ] MCP meta functions appended for each started server (requires real subprocess)
- [ ] Old ToolScope dropped (releasing old server handles) (requires real subprocess)
- [ ] Loading spinner shown during acquisition (UI test)
- [ ] AbortSignal properly threaded through (integration test)
### Server lifecycle during scope transitions
- [ ] Enter role with MCP: servers start (integration test)
- [ ] Exit role: servers stop (handle dropped) (integration test)
- [ ] Enter role A (MCP-X) → exit → enter role B (MCP-Y):
X stops, Y starts (integration test)
- [ ] Enter role with MCP → exit to no MCP: servers stop,
global MCP restored (integration test)
- [ ] Start REPL with global MCP → enter agent with different MCP:
agent MCP takes over (integration test)
- [ ] Exit agent: agent MCP stops, global MCP restored (integration test)
### MCP tool invocation chain
- [ ] LLM calls mcp__search_<server> → search results returned (integration test)
- [ ] LLM calls mcp__describe_<server> tool_name → schema returned (integration test)
- [ ] LLM calls mcp__invoke_<server> tool args → tool executed (integration test)
- [ ] Server not found → "MCP server not found in runtime" error (tested via McpRuntime.get)
- [ ] Tool not found → appropriate error (requires live server)
### MCP support flag
- [x] mcp_server_support=false → no MCP servers started
- [ ] mcp_server_support=false + agent with MCP → error (blocks) (requires agent init)
- [ ] mcp_server_support=false + role with MCP → warning, continues (requires role init)
- [ ] .set mcp_server_support true → MCP servers start (requires live server)
### MCP in child agents
- [ ] Child agent MCP servers acquired via factory (integration test)
- [ ] Child agent MCP runtime populated (integration test)
- [ ] Child agent MCP tool invocations work (integration test)
- [ ] Child agent exit drops MCP handles (integration test)
## Context switching scenarios (comprehensive)
- [ ] No MCP → role with MCP → exit role → no MCP (integration test)
- [ ] Global MCP-A → role MCP-B → exit role → global MCP-A (integration test)
- [ ] Global MCP-A → agent MCP-B → exit agent → global MCP-A (integration test)
- [ ] Role MCP-A → session MCP-B (overrides) → exit session (integration test)
- [ ] Agent MCP → child agent MCP → child exits → parent MCP intact (integration test)
- [ ] .set enabled_mcp_servers X → .set enabled_mcp_servers Y:
X released, Y acquired (integration test)
- [ ] .set enabled_mcp_servers null → all released (integration test)
## Additional behaviors tested (not in original plan)
- [x] McpServerKey equality: same spec → equal keys
- [x] McpServerKey inequality: different names → different keys
- [x] McpServerKey inequality: different commands → different keys
- [x] McpServerKey env coercion: Bool/Int → String
- [x] McpFactory default has empty active map
- [x] McpServer::is_remote() true for Http/Sse, false for Stdio
- [x] McpServer::validate() all cross-field conflicts (6 cases)
- [x] McpServersConfig: empty servers map, multiple servers, cwd field
- [x] McpRegistry: default state, config accessor
- [x] McpRegistry: resolve with whitespace trimming
- [x] McpRegistry: resolve all-nonexistent returns empty
- [x] rebuild_tool_scope: no mcp_config yields empty runtime
- [x] rebuild_tool_scope: preserves tool_tracker across rebuild
- [x] rebuild_tool_scope: REPL mode appends user interaction functions
- [x] rebuild_tool_scope: CMD mode excludes user interaction functions
- [x] MCP meta function name prefix constants are correct
- [x] ToolScope default: empty functions, runtime, tracker
## Old code reference
- `src/mcp/mod.rs` — McpRegistry, init, reinit, start/stop
- `src/config/mcp_factory.rs` — McpFactory, acquire, McpServerKey
- `src/config/tool_scope.rs` — ToolScope, McpRuntime
- `src/config/request_context.rs` — rebuild_tool_scope, bootstrap_tools
@@ -0,0 +1,85 @@
# Test Plan: Tool Evaluation
## Feature description
When the LLM returns tool calls, `eval_tool_calls` dispatches each
call to the appropriate handler. Handlers include: shell tools
(bash/python/ts scripts), MCP tools, supervisor tools (agent spawn),
todo tools, and user interaction tools.
## Behaviors to test
### eval_tool_calls dispatch
- [ ] Calls dispatched to correct handler by function name prefix (requires RequestContext)
- [ ] Tool results returned for each call (requires RequestContext)
- [ ] Multiple concurrent tool calls processed (requires RequestContext)
- [x] Tool call tracker updated (chain length, repeats)
- [ ] Root agent (depth 0) checks escalation queue after eval (requires RequestContext)
- [ ] Escalation notifications injected into results (requires RequestContext)
### ToolCall::eval routing
- [ ] agent__* → handle_supervisor_tool (requires RequestContext)
- [ ] todo__* → handle_todo_tool (requires RequestContext)
- [ ] user__* → handle_user_tool (depth 0) or escalate (depth > 0) (requires RequestContext)
- [ ] mcp_invoke_* → invoke_mcp_tool (requires RequestContext + live MCP)
- [ ] mcp_search_* → search_mcp_tools (requires RequestContext + live MCP)
- [ ] mcp_describe_* → describe_mcp_tool (requires RequestContext + live MCP)
- [ ] Other → shell tool execution (requires RequestContext + binary)
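
The routing table above is a pure function of the tool name and the agent depth, so it could be tested without a `RequestContext` if the decision is split from the handlers. The `Route` enum and `route` function below are hypothetical; the prefixes are copied from the list above and should be checked against the real prefix constants.

```rust
/// Hypothetical routing outcome; the real code calls handlers directly.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Supervisor,
    Todo,
    UserInteraction,
    UserEscalation,
    McpInvoke,
    McpSearch,
    McpDescribe,
    Shell,
}

/// Decide where a tool call goes based on its name prefix and agent depth.
pub fn route(name: &str, depth: usize) -> Route {
    if name.starts_with("agent__") {
        Route::Supervisor
    } else if name.starts_with("todo__") {
        Route::Todo
    } else if name.starts_with("user__") {
        // Root agent talks to the user directly; children escalate.
        if depth == 0 { Route::UserInteraction } else { Route::UserEscalation }
    } else if name.starts_with("mcp_invoke_") {
        Route::McpInvoke
    } else if name.starts_with("mcp_search_") {
        Route::McpSearch
    } else if name.starts_with("mcp_describe_") {
        Route::McpDescribe
    } else {
        Route::Shell
    }
}
```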
### Shell tool execution
- [ ] Tool binary found and executed (integration test)
- [ ] Arguments passed correctly (integration test)
- [ ] Environment variables set (LLM_OUTPUT, etc.) (integration test)
- [ ] Tool output returned as result (integration test)
- [ ] Tool failure → error returned as tool result (not panic) (integration test)
### Tool call tracking
- [x] Tracker counts consecutive identical calls
- [x] Max repeats triggers warning
- [x] Chain length tracked across turns
- [x] Tracker state preserved across tool-result loops
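
To make the tracking behaviors concrete, here is one possible shape for a repeat detector. It is an assumption-heavy sketch: the real `ToolCallTracker` may define "loop" differently, and the constructor arguments, `consecutive_repeats`, `is_looping`, and `history_len` are names invented for this example.

```rust
use std::collections::VecDeque;

/// Hypothetical tracker: bounded history of (name, arguments) pairs.
pub struct ToolCallTracker {
    max_repeats: usize,
    capacity: usize,
    history: VecDeque<(String, String)>,
}

impl ToolCallTracker {
    /// `capacity` is assumed to be greater than zero.
    pub fn new(max_repeats: usize, capacity: usize) -> Self {
        Self { max_repeats, capacity, history: VecDeque::new() }
    }

    /// Record a call, evicting the oldest entry once capacity is reached.
    pub fn record_call(&mut self, name: &str, args: &str) {
        if self.history.len() == self.capacity {
            self.history.pop_front();
        }
        self.history.push_back((name.to_string(), args.to_string()));
    }

    /// Number of identical calls at the tail of the history.
    pub fn consecutive_repeats(&self) -> usize {
        let Some(last) = self.history.back() else { return 0 };
        self.history.iter().rev().take_while(|call| *call == last).count()
    }

    /// True once the trailing run of identical calls reaches `max_repeats`.
    pub fn is_looping(&self) -> bool {
        self.consecutive_repeats() >= self.max_repeats
    }

    pub fn history_len(&self) -> usize {
        self.history.len()
    }
}
```

Because a call with a different name or different arguments resets the trailing run, this shape naturally satisfies the "different args breaks loop" and "different names breaks loop" cases.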
### Function selection
- [ ] select_functions filters by role's enabled_tools (requires filesystem)
- [x] select_functions includes MCP meta functions for enabled servers
- [x] select_functions includes agent functions when agent active (via append tests)
- [ ] "all" enables all functions (requires filesystem)
- [ ] Comma-separated list enables specific functions (requires filesystem)
## Context switching scenarios
- [ ] Tool calls during agent → agent tools available (integration test)
- [ ] Tool calls during role → role tools available (integration test)
- [ ] Tool calls with MCP → MCP invoke/search/describe work (integration test)
- [x] No agent → no agent__/todo__ tools in declarations (via Functions::default)
## Additional behaviors tested (not in original plan)
- [x] ToolCall::new sets name, arguments, id correctly
- [x] ToolCall::default has empty/null fields
- [x] ToolCall::with_thought_signature sets and clears
- [x] ToolCall::dedup keeps last occurrence for duplicate ids
- [x] ToolCall::dedup keeps all calls without ids
- [x] ToolCall::dedup empty input returns empty
- [x] ToolCall::dedup mixed with/without ids
- [x] ToolCallTracker default values (max_repeats=2, chain_len=3)
- [x] ToolCallTracker no loop on fresh tracker
- [x] ToolCallTracker no loop below threshold
- [x] ToolCallTracker different args breaks loop
- [x] ToolCallTracker different names breaks loop
- [x] ToolCallTracker record_call respects capacity
- [x] ToolCallTracker loop message includes call_history
- [x] All 6 prefix constants verified
- [x] Functions::append_todo adds all 5 todo tools
- [x] Functions::append_supervisor adds spawn/check/collect/list/cancel/reply + task queue
- [x] Functions::append_teammate adds send_message/check_inbox
- [x] Functions::append_user_interaction adds ask/confirm/input/checkbox
- [x] Functions::append_mcp_meta creates 3 per server with correct schemas
- [x] Functions::append_mcp_meta empty servers → no declarations
- [x] Functions::find/contains work correctly
- [x] ToolResult::new stores call and output
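
The `ToolCall::dedup` cases listed above (keep the last occurrence for duplicate ids, keep every call without an id) can be expressed with a reverse pass. The `ToolCall` struct here is a hypothetical two-field reduction, and the free function `dedup` is a sketch rather than the real method.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced tool call: only the fields dedup cares about.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub id: Option<String>,
}

/// Keep the last call for each id, keep all id-less calls, preserve order.
pub fn dedup(calls: Vec<ToolCall>) -> Vec<ToolCall> {
    let mut seen = HashSet::new();
    let mut kept = Vec::with_capacity(calls.len());
    // Walk backwards so the first time an id is seen is its last occurrence.
    for call in calls.into_iter().rev() {
        let keep = match &call.id {
            Some(id) => seen.insert(id.clone()),
            None => true,
        };
        if keep {
            kept.push(call);
        }
    }
    kept.reverse();
    kept
}
```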
## Old code reference
- `src/function/mod.rs` — eval_tool_calls, ToolCall::eval
- `src/function/supervisor.rs` — handle_supervisor_tool
- `src/function/todo.rs` — handle_todo_tool
- `src/function/user_interaction.rs` — handle_user_tool
@@ -0,0 +1,88 @@
# Test Plan: Input Construction
## Feature description
`Input` encapsulates a single chat turn's data: text, files, role,
model, session context, RAG embeddings, and function declarations.
It's constructed at the start of each turn and captures all needed
state from `RequestContext`.
## Behaviors to test
### Input::from_str
- [x] Creates Input from text string
- [x] Captures role via resolve_role
- [x] Captures session from ctx
- [ ] Captures rag from ctx (requires RAG setup)
- [ ] Captures functions via select_functions (tested separately)
- [x] Captures stream_enabled from AppConfig
- [x] app_config field set from ctx.app.config
- [x] Empty text → is_empty() returns true
### Input::from_files
- [ ] Loads file contents (async + filesystem)
- [ ] Supports multiple files (async + filesystem)
- [ ] Supports directories (recursive) (async + filesystem)
- [ ] Supports URLs (fetches content) (async + network)
- [ ] Supports loader syntax (e.g., jina:url) (async + loader)
- [x] Last message carry-over (%% syntax) (via resolve_paths)
- [ ] Combines file content with text (async)
- [ ] document_loaders from AppConfig used (async)
### resolve_role
- [x] Returns provided role if given
- [ ] Extracts role from agent if agent active (requires agent init)
- [x] Extracts role from session if session has role
- [x] Returns default model-based role otherwise
- [x] with_session flag set correctly
- [x] with_agent flag set correctly
### Input methods
- [ ] stream() returns stream_enabled && !model.no_stream() (requires Model with no_stream)
- [ ] create_client() uses app_config to init client (requires client config)
- [ ] prepare_completion_data() uses captured functions (requires Model)
- [ ] build_messages() uses captured session (requires Message setup)
- [ ] echo_messages() uses captured session (requires Message setup)
- [x] set_regenerate(role) refreshes role
- [ ] use_embeddings() searches RAG if present (requires RAG)
- [ ] merge_tool_results() creates continuation input (requires ToolResult)
## Context switching scenarios
- [ ] Input with agent → agent functions selected (requires agent init)
- [x] Input with MCP → MCP meta functions in declarations (via select_functions tests)
- [ ] Input with RAG → embeddings included after use_embeddings (requires RAG)
- [x] Input without session → no session messages in build_messages (via session() test)
## Additional behaviors tested (not in original plan)
- [x] resolve_role: explicit role overrides session flag
- [x] resolve_paths: empty input
- [x] resolve_paths: URL detection (https://)
- [x] resolve_paths: external command detection (backtick syntax)
- [x] resolve_paths: rejects URL with glob suffix
- [x] resolve_paths: mixed inputs (%%, URL, external cmd)
- [x] Input::set_text changes text
- [x] Input::patched_text overrides text()
- [x] Input::clear_patch restores original
- [x] Input::set_continue_output accumulates
- [x] Input::summary truncates long text with ...
- [x] Input::summary preserves short text
- [x] Input::raw() with no files
- [x] Input::render() with no medias
- [x] Input::session() returns None when with_session=false
- [x] Input::session() returns Some when with_session=true
- [x] is_image recognizes png/jpeg/jpg/webp/gif
- [x] is_image rejects non-image extensions
- [x] resolve_data_url returns path for known hash
- [x] resolve_data_url returns original for non-data URL
- [x] select_functions: None when no tools enabled
- [x] select_functions: None when function_calling disabled
- [x] select_functions: "all" returns all non-MCP
- [x] select_functions: comma-separated filters
- [x] select_enabled_mcp_servers: empty when MCP disabled
- [x] select_enabled_mcp_servers: "all" returns all MCP functions
- [x] select_enabled_mcp_servers: comma filters by server name
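
The `select_functions` cases above follow a small decision table. The sketch below assumes a simplified signature (a slice of available non-MCP function names, an optional `enabled_tools` string, and a function-calling flag); the real method reads these from the role and `AppConfig`.

```rust
/// Hypothetical, simplified selection: returns None when nothing is selected.
pub fn select_functions(
    available: &[&str],
    enabled_tools: Option<&str>,
    function_calling: bool,
) -> Option<Vec<String>> {
    if !function_calling {
        return None;
    }
    let spec = enabled_tools?.trim();
    if spec.is_empty() {
        return None;
    }
    let selected: Vec<String> = if spec == "all" {
        available.iter().map(|name| name.to_string()).collect()
    } else {
        // Comma-separated list: keep only the names that are actually available.
        let wanted: Vec<&str> = spec
            .split(',')
            .map(str::trim)
            .filter(|name| !name.is_empty())
            .collect();
        available
            .iter()
            .filter(|name| wanted.contains(*name))
            .map(|name| name.to_string())
            .collect()
    };
    if selected.is_empty() { None } else { Some(selected) }
}
```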
## Old code reference
- `src/config/input.rs` — Input struct, from_str, from_files
- `src/config/mod.rs` — select_functions, extract_role
@@ -0,0 +1,87 @@
# Test Plan: RequestContext
## Feature description
`RequestContext` is the per-request mutable state container. It holds
the active model, role, session, agent, RAG, tool scope, and agent
runtime. It provides methods for scope transitions, state queries,
and chat completion lifecycle.
## Behaviors to test
### State management
- [ ] info() returns formatted system info (requires model provider config)
- [x] state() returns correct StateFlags combination
- [ ] current_model() returns active model (tested implicitly via extract_role)
- [x] role_info() errors when no role, succeeds with role
- [ ] session_info() format (requires filesystem for sessions)
- [x] rag_info() errors when no rag
- [x] agent_info() errors when no agent
- [ ] sysinfo() returns system details (requires model provider config)
- [x] working_mode correctly distinguishes Repl vs Cmd
### Scope transitions
- [x] use_role changes role (via use_role_obj)
- [ ] use_session creates/loads session, rebuilds tool scope (async + filesystem)
- [x] use_agent initializes agent with all subsystems (via exit_agent test)
- [x] exit_role clears role
- [x] exit_session saves and clears session
- [x] exit_agent clears agent, supervisor, rag, session
- [x] exit_rag clears rag
- [ ] bootstrap_tools rebuilds tool scope with global MCP (async + MCP servers)
### Chat completion lifecycle
- [x] before_chat_completion sets up for API call
- [ ] after_chat_completion saves messages, updates state (async + client)
- [x] discontinuous_last_message marks last message as non-continuous
### ToolScope management
- [x] rebuild_tool_scope creates fresh Functions
- [ ] rebuild_tool_scope acquires MCP servers via factory (requires live MCP)
- [x] rebuild_tool_scope appends user interaction functions in REPL mode
- [ ] rebuild_tool_scope appends MCP meta functions for started servers (requires live MCP)
- [x] Tool tracker preserved across scope rebuilds
### AgentRuntime management
- [x] agent_runtime populated by use_agent (via exit_agent test)
- [x] agent_runtime cleared by exit_agent
- [x] Accessor methods (current_depth, supervisor, inbox, etc.) return
correct values when agent active
- [x] Accessor methods return defaults when no agent
### Settings update
- [ ] update() handles all .set keys correctly (requires REPL command infra)
- [x] update_app_config() clones and replaces Arc properly
- [ ] delete() handles all delete subcommands (requires REPL command infra)
### Session helpers
- [ ] list_sessions() returns session names (requires filesystem)
- [ ] list_autoname_sessions() returns auto-named sessions (requires filesystem)
- [x] session_file() returns correct path
- [ ] save_session() persists session (requires filesystem)
- [x] empty_session() clears messages
## Context switching scenarios
- [x] No state → use_role → exit_role → no state
- [x] No state → use_agent → exit_agent → no state
- [x] Agent active → use_role_obj errors
- [ ] Agent → exit_agent → use_role (clean transition) (async)
## Additional behaviors tested (not in original plan)
- [x] state() empty context returns empty flags
- [x] state() role only → ROLE flag
- [x] state() empty session → SESSION_EMPTY flag
- [x] state() role + session flags combine
- [x] discontinuous_last_message noop when no last_message
- [x] before_chat_completion creates LastMessage with empty output and continuous=true
- [x] role_like_mut returns None when no active scope
- [x] role_like_mut returns role when only role active
- [x] role_like_mut prefers session over role
- [x] session_file handles subdir/name format
- [x] is_compressing_session false with no session
- [x] is_compressing_session false with default session
## Old code reference
- `src/config/request_context.rs` — all methods
- `src/config/mod.rs` — original Config methods (for parity)
@@ -0,0 +1,92 @@
# Test Plan: REPL Commands
## Feature description
The REPL processes dot-commands (`.role`, `.session`, `.agent`, etc.)
and plain text (chat messages). Each command has state assertions
(e.g., `.info role` requires an active role).
## Behaviors to test
### Command parsing
- [x] Dot-commands parsed correctly (command + args)
- [x] Multi-line input (:::) handled (regex)
- [x] Plain text treated as chat message (parse_command returns None)
- [x] Empty input ignored (parse_command returns None)
### State assertions (REPL_COMMANDS array)
- [x] Each command's assert_state enforced correctly
- [x] Invalid state → command rejected (via is_valid)
- [x] Commands with AssertState::pass() always available
### Command handlers (each one)
- [ ] .help — prints help text
- [ ] .info [subcommand] — displays appropriate info
- [ ] .model <name> — switches model
- [ ] .prompt <text> — sets temp role
- [ ] .role <name> [text] — enters role or one-shot
- [ ] .session [name] — starts/resumes session
- [ ] .agent <name> [session] [key=value] — starts agent
- [ ] .rag [name] — initializes RAG
- [ ] .starter [n] — lists or executes conversation starter
- [ ] .set <key> <value> — updates setting
- [ ] .delete <type> — deletes item
- [ ] .exit [type] — exits scope or REPL
- [ ] .save role/session [name] — saves to file
- [ ] .edit role/session/config/agent-config/rag-docs — opens editor
- [ ] .empty session — clears session
- [ ] .compress session — compresses session
- [ ] .rebuild rag — rebuilds RAG
- [ ] .sources rag — shows RAG sources
- [ ] .copy — copies last response
- [ ] .continue — continues response
- [ ] .regenerate — regenerates response
- [ ] .file <path> [-- text] — includes files
- [ ] .macro <name> [text] — runs/creates macro
- [ ] .authenticate — OAuth flow
- [ ] .vault <cmd> [name] — vault operations
- [ ] .clear todo — clears agent todo
### ask function (chat flow)
- [ ] Input constructed from text
- [ ] Embeddings applied if RAG active
- [ ] Waits for compression to complete
- [ ] before_chat_completion called
- [ ] Streaming vs non-streaming based on config
- [ ] Tool results loop (recursive ask with merged results)
- [ ] after_chat_completion called
- [ ] Auto-continuation for agents with todos
## Additional behaviors tested (not in original plan)
- [x] AssertState::pass() always returns true (all flag combos)
- [x] AssertState::bare() only matches empty flags
- [x] AssertState::True requires any matching flag present
- [x] AssertState::True with multiple flags — any match suffices
- [x] AssertState::False requires all specified flags absent
- [x] AssertState::False with multiple flags
- [x] AssertState::TrueFalse — true present AND false absent
- [x] AssertState::Equal — exact flag match
- [x] REPL_COMMANDS has exactly 39 entries
- [x] All commands start with '.'
- [x] All commands have non-empty descriptions
- [x] .help, .exit always available (pass)
- [x] .info role requires ROLE
- [x] .session blocked when already in session
- [x] .exit session requires session
- [x] .exit agent requires agent
- [x] .agent only when bare (no role/session/agent)
- [x] .role blocked in session/agent
- [x] .prompt blocked in session/agent
- [x] .rag blocked in agent
- [x] .starter requires agent
- [x] .clear todo requires agent
- [x] .edit role requires ROLE, blocked in SESSION
- [x] .exit rag requires RAG, blocked in AGENT
- [x] split_first_arg: None, single word, two words, extra spaces
- [x] parse_command: plain text, empty, whitespace, dot only
- [x] ReplCommand::is_valid with pass/True/False
- [x] Multiline regex: captures content, rejects unclosed, rejects plain text
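
The `AssertState` semantics listed above can be summarized in a few lines. This sketch uses plain `u8` bit flags and a hypothetical `check` method; the real type likely uses a bitflags-style `StateFlags`, and `pass()`/`bare()` are modeled here as `False(0)` and `Equal(0)`, which is an assumption.

```rust
pub type StateFlags = u8;
pub const ROLE: StateFlags = 1 << 0;
pub const SESSION: StateFlags = 1 << 1;
pub const AGENT: StateFlags = 1 << 2;
pub const RAG: StateFlags = 1 << 3;

pub enum AssertState {
    /// Any of the given flags must be present.
    True(StateFlags),
    /// All of the given flags must be absent.
    False(StateFlags),
    /// Any of the first set present AND all of the second set absent.
    TrueFalse(StateFlags, StateFlags),
    /// Flags must match exactly.
    Equal(StateFlags),
}

impl AssertState {
    /// Always satisfied: no flag is forbidden.
    pub fn pass() -> Self {
        AssertState::False(0)
    }

    /// Satisfied only when no scope is active.
    pub fn bare() -> Self {
        AssertState::Equal(0)
    }

    pub fn check(&self, flags: StateFlags) -> bool {
        match *self {
            AssertState::True(t) => (flags & t) != 0,
            AssertState::False(f) => (flags & f) == 0,
            AssertState::TrueFalse(t, f) => (flags & t) != 0 && (flags & f) == 0,
            AssertState::Equal(e) => flags == e,
        }
    }
}
```

For example, ".edit role requires ROLE, blocked in SESSION" corresponds to `AssertState::TrueFalse(ROLE, SESSION)` in this model.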
## Old code reference
- `src/repl/mod.rs` — run_repl_command, ask, REPL_COMMANDS
@@ -0,0 +1,67 @@
# Test Plan: CLI Flags
## Feature description
Loki CLI accepts flags for model, role, session, agent, file input,
execution mode, and various info/list commands. Flags determine
the execution path through main.rs.
## Behaviors to test
### Early-exit flags
- [x] --info parsed correctly
- [x] --list-models parsed correctly
- [x] --list-roles parsed correctly
- [x] --list-sessions parsed correctly
- [x] --list-agents parsed correctly
- [x] --list-rags parsed correctly
- [x] --list-macros parsed correctly
- [x] --sync-models parsed correctly
- [x] --build-tools parsed correctly
- [ ] --authenticate runs OAuth and exits (integration)
- [ ] --completions generates shell completions and exits (integration)
- [x] Vault flags (--add/get/update/delete-secret, --list-secrets) parsed
### Mode selection
- [x] No text/file → text returns None (REPL indicator)
- [x] Text provided → text joined and returned
- [x] --agent → agent field set
- [x] --role → role field set
- [x] --execute (-e) → execute flag set
- [x] --code (-c) → code flag set
- [x] --prompt → prompt field set
- [x] --macro → macro_name field set
### Flag combinations
- [x] --model + --role parsed together
- [x] --session + --role parsed together
- [ ] --session + --agent → agent with session (integration)
- [ ] --agent + --agent-variable → variables set (integration)
- [x] --dry-run flag parsed
- [x] --no-stream (-S) flag parsed
- [x] --file + text → both parsed
- [x] --empty-session + --session parsed
- [x] --save-session + --session parsed
### Prelude
- [ ] apply_prelude runs before main execution (async + filesystem)
- [ ] Prelude "role:name" loads role (async + filesystem)
- [ ] Prelude "session:name" loads session (async + filesystem)
- [ ] Prelude "<session>:<role>" loads both session and role (async + filesystem)
- [ ] Prelude skipped if macro_flag set (async)
- [ ] Prelude skipped if state already has role/session/agent (async)
## Additional behaviors tested (not in original plan)
- [x] Default Cli has all flags unset/empty
- [x] Short flags: -m, -r, -a, -s, -e, -c, -S, -f
- [x] Multiple -f flags accumulate
- [x] Trailing text args collected as vec
- [x] Cli::text() returns None with no args (terminal stdin)
- [x] Cli::text() joins trailing args with spaces
- [x] --rag flag parsed
- [x] --macro flag parsed
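
The two `Cli::text()` behaviors above (None with no args, trailing args joined with spaces) can be sketched without any argument-parsing crate. The struct below is a hypothetical stand-in with a single assumed field; the real `Cli` in `src/cli/mod.rs` has many more fields and may also consult stdin.

```rust
/// Hypothetical stand-in for the real Cli struct. Only the trailing text
/// arguments are modelled; the field name is an assumption.
#[derive(Default)]
struct Cli {
    text: Vec<String>,
}

impl Cli {
    /// Returns None when no trailing text was given (the REPL indicator),
    /// otherwise the trailing arguments joined with single spaces.
    fn text(&self) -> Option<String> {
        if self.text.is_empty() {
            None
        } else {
            Some(self.text.join(" "))
        }
    }
}
```

A test in this style asserts on the returned `Option` only, which keeps it independent of how the flags are parsed.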
## Old code reference
- `src/cli/mod.rs` — Cli struct, flag definitions
- `src/main.rs` — run(), flag processing, mode branching
@@ -0,0 +1,106 @@
# Test Plan: Sub-Agent Spawning
## Feature description
Agents with can_spawn_agents=true can spawn child agents that run
in parallel as background tokio tasks. Children communicate results
back to the parent via collect/check. Escalation allows children
to request user input through the parent.
## Behaviors to test
### Spawn
- [ ] agent__spawn creates child agent in background (requires agent config on disk)
- [x] Child gets own RequestContext with incremented depth (new_for_child)
- [x] Child starts with empty scope (new_for_child)
- [x] Child gets shared root_escalation_queue (new_for_child)
- [x] Child gets inbox for teammate messaging (new_for_child)
- [x] Child inherits parent_supervisor (new_for_child)
- [ ] Child MCP servers acquired if configured (requires live MCP)
- [x] Max concurrent agents enforced (Supervisor.register)
- [x] Max depth enforced (Supervisor.register)
- [ ] Agent not found → error (requires agent config on disk)
- [ ] can_spawn_agents=false → no spawn tools available (requires agent init)
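
The two limits enforced by `Supervisor.register` can be sketched as follows. This is a simplified, hypothetical model: the field names, the `String` error type, and the agent bookkeeping are assumptions. The inclusive depth check follows the plan's boundary case (registering at exactly `max_depth` is allowed).

```rust
/// Hypothetical stand-in for the real Supervisor. Field names and the error
/// type are assumptions made for illustration.
struct Supervisor {
    max_concurrent: usize,
    max_depth: usize,
    agents: Vec<String>,
}

impl Supervisor {
    fn new(max_concurrent: usize, max_depth: usize) -> Self {
        Self { max_concurrent, max_depth, agents: Vec::new() }
    }

    /// Registers a child agent, rejecting it when either limit is exceeded.
    fn register(&mut self, id: &str, depth: usize) -> Result<(), String> {
        if self.agents.len() >= self.max_concurrent {
            return Err(format!("max concurrent agents ({}) reached", self.max_concurrent));
        }
        // depth == max_depth is still allowed; only deeper children are rejected.
        if depth > self.max_depth {
            return Err(format!("max depth ({}) exceeded", self.max_depth));
        }
        self.agents.push(id.to_string());
        Ok(())
    }
}
```

Keeping both checks inside `register` is what lets them be unit tested without agent config on disk or a live LLM client.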
### Collect/Check
- [x] agent__check returns PENDING for running agent
- [x] agent__check returns error for unknown agent
- [ ] agent__collect blocks until done, returns output (requires real child completion)
- [ ] Output summarization when exceeds threshold (requires LLM client)
- [ ] Summarization uses configured model (requires LLM client)
### Task queue (handler integration tests)
- [x] handle_task_create creates tasks (simple, with deps, with dispatch_agent)
- [x] handle_task_create errors when agent set without prompt
- [x] handle_task_complete unblocks dependents
- [x] handle_task_list shows all tasks
- [x] handle_task_fail marks failed and reports blocked dependents
- [x] handle_task_fail returns error for missing task
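
The dependency behavior these handler tests exercise (completion unblocks dependents, failure reports blocked dependents, missing tasks error) can be modelled with a small in-memory queue. The types and method names below are hypothetical and much simpler than the real `TaskQueue`; dispatch agents and prompts are left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
enum Status {
    Pending,
    Completed,
    Failed,
}

struct Task {
    status: Status,
    blocked_by: BTreeSet<u32>,
}

/// Hypothetical, simplified task queue keyed by increasing numeric ids.
#[derive(Default)]
struct TaskQueue {
    tasks: BTreeMap<u32, Task>,
    next_id: u32,
}

impl TaskQueue {
    /// Creates a task; every dependency must already exist.
    fn create(&mut self, deps: &[u32]) -> Result<u32, String> {
        if let Some(missing) = deps.iter().find(|d| !self.tasks.contains_key(*d)) {
            return Err(format!("dependency {missing} does not exist"));
        }
        // Only dependencies that are not yet completed block the new task.
        let blocked_by: BTreeSet<u32> = deps
            .iter()
            .copied()
            .filter(|d| self.tasks[d].status != Status::Completed)
            .collect();
        self.next_id += 1;
        self.tasks.insert(self.next_id, Task { status: Status::Pending, blocked_by });
        Ok(self.next_id)
    }

    fn is_runnable(&self, id: u32) -> bool {
        self.tasks
            .get(&id)
            .is_some_and(|t| t.status == Status::Pending && t.blocked_by.is_empty())
    }

    /// Marks a task completed and returns the ids that became runnable.
    fn complete(&mut self, id: u32) -> Result<Vec<u32>, String> {
        let task = self.tasks.get_mut(&id).ok_or_else(|| format!("task {id} not found"))?;
        task.status = Status::Completed;
        let mut unblocked = Vec::new();
        for (other_id, other) in self.tasks.iter_mut() {
            let was_blocked = other.blocked_by.remove(&id);
            if was_blocked && other.blocked_by.is_empty() && other.status == Status::Pending {
                unblocked.push(*other_id);
            }
        }
        Ok(unblocked)
    }

    /// Marks a task failed and returns the ids still blocked on it.
    fn fail(&mut self, id: u32) -> Result<Vec<u32>, String> {
        let task = self.tasks.get_mut(&id).ok_or_else(|| format!("task {id} not found"))?;
        task.status = Status::Failed;
        Ok(self
            .tasks
            .iter()
            .filter(|(_, t)| t.blocked_by.contains(&id))
            .map(|(other_id, _)| *other_id)
            .collect())
    }
}
```

Returning the affected ids from `complete` and `fail` gives a handler test something concrete to assert on without inspecting internal state.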
### Escalation (handler integration tests)
- [x] handle_reply_escalation delivers reply via oneshot channel
- [x] handle_reply_escalation errors for missing escalation_id
- [x] handle_reply_escalation errors when no queue
- [x] Pending summary contains correct fields
- [x] Reply reaches receiver via oneshot channel
- [ ] Escalation timeout → fallback message (requires tokio timeout)
### Teammate messaging (handler integration tests)
- [x] handle_send_message delivers to registered agent's inbox
- [x] handle_send_message errors for unknown agent
- [x] handle_check_inbox returns messages with count
- [x] handle_check_inbox returns empty when no inbox
- [x] handle_check_inbox returns empty for empty inbox
### Cancel/List (handler integration tests)
- [x] handle_list returns empty for fresh supervisor
- [x] handle_list returns registered agents
- [x] handle_list errors when no supervisor
- [x] handle_cancel removes agent and signals abort
- [x] handle_cancel errors for unknown agent
- [x] handle_cancel errors when no supervisor
### Dispatch routing
- [x] Unknown action → error with "Unknown supervisor action"
- [x] agent__list routes to handle_list
- [x] agent__task_list routes to handle_task_list
### Child agent lifecycle
- [ ] run_child_agent loops (requires LLM client)
- [ ] Child uses before/after_chat_completion (requires LLM client)
- [ ] Child tool calls evaluated (requires LLM client)
- [ ] Child exits cleanly (requires LLM client)
## Context switching scenarios
- [ ] Parent spawns child with MCP (requires live MCP + agent config)
- [ ] Parent exits agent → all children cancelled (requires agent init)
- [x] Multiple children share escalation queue (new_for_child + ensure_root_escalation_queue)
## Additional behaviors tested (not in original plan)
- [x] EscalationQueue: default, submit, take, take_nonexistent, has_pending
- [x] EscalationQueue: pending_summary with/without options, empty
- [x] EscalationQueue: reply via oneshot channel
- [x] new_escalation_id: prefix and uniqueness
- [x] Inbox: new/default empty, deliver+drain, drain empties, multiple deliveries
- [x] Inbox: clone preserves messages, clone is independent
- [x] Supervisor: new defaults, register count, take removes, take nonexistent
- [x] Supervisor: inbox accessor, list_agents, task_queue accessible
- [x] Supervisor: register allows at max_depth boundary
- [x] AgentExitStatus: equality/inequality
- [x] TaskQueue: fail sets status, get missing returns None
- [x] TaskQueue: dispatch_agent/prompt stored, claim blocked fails
- [x] TaskQueue: list sorted by id, default empty
- [x] TaskQueue: dependency on nonexistent errors, complete nonexistent
- [x] TaskNode: is_runnable when pending+unblocked, not when blocked
## Integration handler tests added
- [x] All handle_* functions tested via handler integration tests (36 tests)
- [x] new_for_child: depth, id, inbox, escalation queue, parent supervisor, empty scope
- [x] ensure_root_escalation_queue: lazy init, same Arc on repeated calls
- [x] AppState::test_default() helper added for cross-module test construction
## Old code reference
- `src/function/supervisor.rs` — all handler functions
- `src/supervisor/` — Supervisor, EscalationQueue, Inbox, TaskQueue
@@ -0,0 +1,33 @@
# Test Plan: RAG
## Behaviors to test
- [ ] Rag::init creates new RAG with embedding model (requires LLM client)
- [ ] Rag::load loads existing RAG from disk (requires filesystem)
- [ ] Rag::create builds vector store from documents (requires embedding model)
- [ ] Rag::refresh_document_paths updates document list (requires filesystem)
- [ ] RAG search returns relevant embeddings (requires embedding model)
- [x] RAG template contains required placeholders
- [ ] Reranker model applied when configured (requires LLM client)
- [ ] top_k controls number of results (requires embedding model)
- [ ] RAG sources tracked for .sources command (requires full Rag struct)
- [x] exit_rag clears RAG from context (tested in iteration 8)
## Additional behaviors tested
- [x] DocumentId: new/split round-trip, zero/zero, large values
- [x] DocumentId: Debug format ("file-doc"), equality, inequality, ordering
- [x] RagDocument: new with content, default empty
- [x] RagData: new sets all defaults, empty collections
- [x] RagData::get: returns document, None for missing file, None for missing doc index
- [x] RagData::del: removes files + associated vectors, noop for nonexistent
- [x] RagData::add: inserts files, vectors, updates next_file_id
- [x] RagData::build_bm25: empty data returns no results
- [x] RagData::build_bm25: finds documents by keyword (BM25 ranking)
- [x] RAG_TEMPLATE: contains __CONTEXT__, __SOURCES__, __INPUT__
- [x] get_separators: Rust/Python/Markdown return language-specific separators
- [x] get_separators: unknown extension returns defaults
- [x] get_separators: all 22 known extensions have language-specific separators
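
The `DocumentId` checks above (new/split round-trip, zero/zero, large values, "file-doc" Debug format, ordering) can be sketched as below. The bit-packing layout and the `u32`/`u64` widths are assumptions chosen for illustration; the real type in `src/rag/mod.rs` may store its parts differently.

```rust
use std::fmt;

/// Hypothetical sketch: a file index and a document index packed into one
/// integer. The 32/32-bit layout is an assumption, not the real encoding.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct DocumentId(u64);

impl DocumentId {
    fn new(file_index: u32, document_index: u32) -> Self {
        Self((u64::from(file_index) << 32) | u64::from(document_index))
    }

    /// Recovers the (file_index, document_index) pair passed to `new`.
    fn split(self) -> (u32, u32) {
        ((self.0 >> 32) as u32, (self.0 & 0xFFFF_FFFF) as u32)
    }
}

impl fmt::Debug for DocumentId {
    /// Prints the id as "file-doc", for example "3-7".
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (file, doc) = self.split();
        write!(f, "{file}-{doc}")
    }
}
```

With this layout, ordering compares the file index first and the document index second, which is one way the "ordering" test case could hold.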
## Old code reference
- `src/rag/mod.rs` — Rag struct and methods
- `src/config/request_context.rs` — use_rag, edit_rag_docs, rebuild_rag
@@ -0,0 +1,35 @@
# Test Plan: Tab Completion and Prompt
## Behaviors to test
### Tab completion (repl_complete)
- [ ] .role<TAB> → role names (no hidden files)
- [ ] .agent<TAB> → agent names (no .shared)
- [ ] .session<TAB> → session names
- [ ] .rag<TAB> → RAG names
- [ ] .macro<TAB> → macro names
- [ ] .model<TAB> → model names with descriptions
- [ ] .set <TAB> → setting keys (sorted)
- [ ] .set temperature <TAB> → current value suggestions
- [ ] .set enabled_tools <TAB> → tool names (no internal tools)
- [ ] .set enabled_mcp_servers <TAB> → configured servers + aliases
- [ ] .delete <TAB> → type names
- [ ] .vault <TAB> → subcommands
- [ ] .agent <name> <TAB> → session names for that agent
- [ ] Fuzzy filtering applied to all completions
### Prompt rendering
- [ ] Left prompt shows role/session/agent name
- [ ] Right prompt shows model name
- [ ] Prompt updates after scope transitions
- [ ] Multi-line indicator shown during ::: input
## Status
Most completion logic requires filesystem access to list roles, sessions, and agents.
The `split_line` function already has tests, and the prompt rendering methods are trivial
wrappers around stored strings, so additional unit tests here would yield little.
## Old code reference
- `src/config/request_context.rs` — repl_complete
- `src/repl/completer.rs` — ReplCompleter (split_line already tested)
- `src/repl/prompt.rs` — ReplPrompt
@@ -0,0 +1,24 @@
# Test Plan: Macros
## Behaviors to test
- [ ] Macro loaded from YAML file (requires filesystem)
- [ ] Macro steps executed sequentially (requires async + RequestContext)
- [ ] Each step runs through run_repl_command (requires async)
- [x] Variable interpolation in macro steps
- [ ] Built-in macros installed on first run (requires filesystem)
- [ ] macro_execute creates isolated RequestContext (requires async)
- [ ] Macro context inherits tool scope from parent (requires async)
- [ ] Macro context has macro_flag set (requires async)
## Additional behaviors tested
- [x] resolve_variables: no variables, required provided, required missing errors
- [x] resolve_variables: default used, default overridden
- [x] resolve_variables: rest captures remaining args, rest with default
- [x] resolve_variables: multiple variables mixed
- [x] usage: no variables, required, optional, rest, rest+default, mixed
- [x] interpolate_command: single, multiple, no vars, missing var passthrough
- [x] YAML deserialization: with variables, with defaults, no variables
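
The four `interpolate_command` cases (single, multiple, no vars, missing var passthrough) can be sketched with a small std-only substitution routine. The `{{name}}` placeholder syntax and the function signature are assumptions made for this example; the real implementation in `src/config/macros.rs` may use a different syntax.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: replace `{{name}}` placeholders with variable values.
/// The placeholder syntax is an assumption; unknown names are left untouched.
fn interpolate_command(command: &str, vars: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(command.len());
    let mut rest = command;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let name = &after[..end];
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    // Missing variable: pass the placeholder through unchanged.
                    None => {
                        out.push_str("{{");
                        out.push_str(name);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unclosed placeholder: copy the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With `name` bound to `coder`, the sketch turns `.role {{name}}` into `.role coder` and leaves `say {{missing}}` as it is, mirroring the passthrough case.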
## Old code reference
- `src/config/macros.rs` — macro_execute, Macro struct
@@ -0,0 +1,25 @@
# Test Plan: Vault
## Behaviors to test
- [ ] Vault add stores encrypted secret (requires terminal + password file)
- [ ] Vault get decrypts and returns secret (requires password file)
- [ ] Vault update replaces secret value (requires terminal + password file)
- [ ] Vault delete removes secret (requires password file)
- [ ] Vault list shows all secret names (requires password file)
- [ ] Secrets interpolated in MCP config (mcp.json) (requires Vault with secrets)
- [ ] Missing secrets produce warning during MCP init (requires Vault)
- [x] Vault accessible from CLI (flag parsing tested in iteration 10)
- [ ] Vault accessible from REPL (.vault commands) (requires REPL infra)
## Additional behaviors tested
- [x] SECRET_RE matches {{DOUBLE_BRACES}}
- [x] SECRET_RE matches with surrounding text
- [x] SECRET_RE does not match {SINGLE_BRACES}
- [x] SECRET_RE does not match plain text
- [x] SECRET_RE matches with spaces inside braces
- [x] Vault::default() creates instance with no password file
## Old code reference
- `src/vault/mod.rs` — GlobalVault, operations
- `src/mcp/mod.rs` — interpolate_secrets
@@ -0,0 +1,57 @@
# Test Plan: Functions and Tools
## Behaviors to test
### Function declarations
- [ ] Functions::init loads from visible_tools config
- [ ] Tool declarations parsed from bash scripts (argc annotations)
- [ ] Tool declarations parsed from python scripts (docstrings)
- [ ] Tool declarations parsed from typescript (JSDoc + type inference)
- [ ] Each declaration has name, description, parameters
- [ ] Agent tools loaded via Functions::init_agent
- [ ] Global tools loaded via build_global_tool_declarations
### Tool compilation
- [ ] Bash tools compiled to bin directory
- [ ] Python tools compiled to bin directory
- [ ] TypeScript tools compiled to bin directory
- [ ] clear_agent_bin_dir removes old binaries
- [ ] Tool file priority: .sh > .py > .ts > .js
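
The file priority rule (`.sh > .py > .ts > .js`) can be checked without touching the filesystem if the lookup is expressed over a list of candidate file names. The helper below is hypothetical and only illustrates the ordering; it is not the `agent_functions_file` function from `src/config/paths.rs`.

```rust
/// Extensions in priority order, highest first, as stated in the plan.
const EXTENSION_PRIORITY: [&str; 4] = ["sh", "py", "ts", "js"];

/// Hypothetical helper: among `files`, pick the one named `<tool>.<ext>`
/// whose extension comes first in the priority order.
fn pick_tool_file<'a>(tool: &str, files: &[&'a str]) -> Option<&'a str> {
    EXTENSION_PRIORITY.iter().find_map(|ext| {
        let wanted = format!("{tool}.{ext}");
        files.iter().copied().find(|f| *f == wanted)
    })
}
```

Given the candidates `tools.js` and `tools.py`, the sketch selects `tools.py`, since `.py` outranks `.js`.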
### User interaction functions
- [ ] append_user_interaction_functions adds user__ask/confirm/input/checkbox
- [ ] Only appended in REPL mode
- [ ] User interaction tools work at depth 0 (direct prompt)
- [ ] User interaction tools escalate at depth > 0
### MCP meta functions
- [ ] append_mcp_meta_functions adds invoke/search/describe per server
- [ ] Meta functions removed when ToolScope rebuilt without those servers
- [ ] Function names follow mcp_invoke_<server> pattern
### Function selection
- [ ] select_functions filters by role's enabled_tools
- [ ] "all" enables everything
- [ ] Specific tool names enabled selectively
- [ ] mapping_tools aliases resolved
- [ ] Agent functions included when agent active
- [ ] MCP meta functions included when servers active
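
The selection rules above ("all", specific names, `mapping_tools` aliases) can be sketched as a pure function over strings. Everything about the shape is an assumption: that `enabled_tools` is a comma-separated string, that aliases expand to comma-separated tool names, and the signature itself. Agent and MCP meta functions are not modelled.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical sketch of tool selection. `declared` is the list of known
/// tool names; the result keeps their declaration order.
fn select_functions(
    enabled_tools: &str,
    mapping_tools: &HashMap<String, String>,
    declared: &[&str],
) -> Vec<String> {
    // "all" enables every declared tool.
    if enabled_tools.trim() == "all" {
        return declared.iter().map(|name| name.to_string()).collect();
    }
    let mut wanted = BTreeSet::new();
    for item in enabled_tools.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        match mapping_tools.get(item) {
            // An alias expands to its comma-separated list of tool names.
            Some(expansion) => {
                for name in expansion.split(',').map(str::trim).filter(|s| !s.is_empty()) {
                    wanted.insert(name);
                }
            }
            // Anything else is treated as a literal tool name.
            None => {
                wanted.insert(item);
            }
        }
    }
    // Names that were requested but never declared are silently dropped.
    declared
        .iter()
        .copied()
        .filter(|name| wanted.contains(*name))
        .map(String::from)
        .collect()
}
```

Expressed this way, each selection rule maps to a single assertion on the returned list, with no filesystem or role loading involved.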
## Status
- Function declarations, append methods, find/contains tested in iteration 6
- MCP meta functions tested in iterations 5-7
- Function selection tested in iteration 7
- User interaction functions tested in iterations 6-7
- Python parser: extensive existing tests (400+ lines)
- TypeScript parser: extensive existing tests (400+ lines)
- parsers::common::underscore tested in iteration 13
- Functions::init and tool compilation require filesystem
## Additional behaviors tested
- [x] parsers::common::underscore: simple, dashes, spaces, special chars, consecutive, leading/trailing, uppercase, mixed
## Old code reference
- `src/function/mod.rs` — Functions struct, init, init_agent
- `src/config/paths.rs` — agent_functions_file (priority)
- `src/parsers/` — bash, python, typescript parsers