docs: Documentation for the RESTful API POC
@@ -0,0 +1,108 @@
# Phase 1 QA: Test Implementation Plan

## Purpose

Verify that all existing Loki behaviors are preserved after the
Phase 1 refactoring (Config god-state → AppState + RequestContext
split). Tests should validate behavior, not implementation details,
unless a specific implementation pattern is fragile and needs
regression protection.

## Reference codebases

- **Old code**: `~/code/testing/loki` (branch: `develop`)
- **New code**: `~/code/loki` (branch: working branch with Phase 1)

## Process (per iteration)

1. Read the previous iteration's test implementation notes (if any)
2. Read the test plan file for the current feature area
3. Read the old code to identify the logic that creates those flows
4. While reading old code:
   - Note additional behaviors not in the plan file → update the file
   - Note feature overlaps / context-switching scenarios → add tests
5. Create unit/integration tests in the new code
6. Ensure all tests pass
7. Write test implementation notes for the iteration
8. Pause for user approval before proceeding to next iteration

## Test philosophy

- **Behavior over implementation**: Test what the system DOES, not
  HOW it does it internally
- **Exception**: If implementation logic is fragile and a slight
  change would break Loki, add an implementation-specific test
- **No business logic changes**: Only modify non-test code if a
  genuine bug is discovered (old behavior missing in new code)
- **Context switching**: Pay special attention to state transitions
  (role→agent, MCP-enabled→disabled, etc.)

## Test location

All new tests go either in the `tests/` directory as integration
tests, or inline as `#[cfg(test)] mod tests` in the relevant source
file, depending on what's being tested:

- **Unit tests** (pure logic, no I/O): inline in source file
- **Integration tests** (multi-module, state transitions): `tests/`
- **Behavior tests** (config parsing, tool resolution): can be either
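
As a rough illustration of the inline option, a unit test module might sit next to the code it covers as shown below. This is only a sketch: `normalize_name` is a hypothetical helper invented for the example and is not a function from the Loki codebase.

```rust
// Hypothetical pure helper, used only to illustrate test placement.
pub fn normalize_name(raw: &str) -> String {
    raw.trim().to_lowercase()
}

// Inline unit tests live in the same file as the code under test and are
// compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn normalize_name_trims_and_lowercases() {
        assert_eq!(normalize_name("  Coder "), "coder");
    }
}
```

Tests that need several modules or state transitions would instead go in a file under `tests/`, where only the crate's public API is visible.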

## Feature areas (test plan files)

Each feature area has a plan file in `docs/testing/plans/`. The
files are numbered for execution order (dependencies first):

| # | File | Feature area | Priority | Status |
|---|---|---|---|---|
| 01 | `01-config-and-appconfig.md` | Config loading, AppConfig fields, defaults | High | ✅ Iter 1-4 |
| 02 | `02-roles.md` | Role loading, retrieval, role-likes, temp roles | High | ✅ Iter 1-4 |
| 03 | `03-sessions.md` | Session create/load/save, compression, autoname | High | ✅ Iter 1-4 |
| 04 | `04-agents.md` | Agent init, tool compilation, variables, lifecycle | Critical | ✅ Iter 1-4 |
| 05 | `05-mcp-lifecycle.md` | MCP server start/stop, factory, runtime, scope transitions | Critical | ✅ Iter 5 |
| 06 | `06-tool-evaluation.md` | eval_tool_calls, ToolCall dispatch, tool handlers | Critical | ✅ Iter 6 |
| 07 | `07-input-construction.md` | Input::from_str, from_files, field capturing, function selection | High | ✅ Iter 7 |
| 08 | `08-request-context.md` | RequestContext methods, scope transitions, state management | Critical | ✅ Iter 8 |
| 09 | `09-repl-commands.md` | REPL command handlers, state assertions, argument parsing | High | ✅ Iter 9 |
| 10 | `10-cli-flags.md` | CLI argument handling, mode switching, early exits | High | ✅ Iter 10 |
| 11 | `11-sub-agent-spawning.md` | Supervisor, child agents, escalation, messaging | Critical | ✅ Iter 11 |
| 12 | `12-rag.md` | RAG init/load/search, embeddings, document management | Medium | ✅ Iter 12 |
| 13 | `13-completions-and-prompt.md` | Tab completion, prompt rendering, highlighter | Medium | ✅ Iter 13 |
| 14 | `14-macros.md` | Macro loading, execution, variable interpolation | Medium | ✅ Iter 13 |
| 15 | `15-vault.md` | Secret management, interpolation in MCP config | Medium | ✅ Iter 13 |
| 16 | `16-functions-and-tools.md` | Function declarations, tool compilation, binaries | High | ✅ Iter 13 |

## Iteration tracking

Each completed iteration produces a notes file at:
`docs/testing/notes/ITERATION-<N>-NOTES.md`

These notes contain:
- Which plan file(s) were addressed
- Tests created (file paths, test names)
- Bugs discovered (if any)
- Observations for future iterations
- Updates made to other plan files

## Intentional improvements (NEW ≠ OLD)

These are behavioral changes that are intentional and should NOT
be tested for old-code parity:

| # | What | Old | New |
|---|---|---|---|
| 1 | Agent list hides `.shared` | Shown | Hidden |
| 2 | Tool file priority | Filesystem order | `.sh` > `.py` > `.ts` > `.js` |
| 3 | MCP disabled + agent | Warning, continues | Error, blocks |
| 4 | Role MCP warning | Always when mcp_support=false | Only when role has MCP |
| 5 | Enabled tools completions | Shows internal tools | Hides `user__`/`mcp_`/`todo__`/`agent__` |
| 6 | MCP server completions | Only aliases | Configured servers + aliases |
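
Row 2 of the table above replaces filesystem ordering with a fixed extension priority. A minimal sketch of that selection rule follows; `pick_tool_file` and `PRIORITY` are hypothetical names chosen for the example, and skipping files with unknown extensions is an assumption rather than documented behavior.

```rust
use std::path::PathBuf;

// Priority taken from the table above: .sh > .py > .ts > .js.
const PRIORITY: [&str; 4] = ["sh", "py", "ts", "js"];

/// Returns the candidate whose extension ranks highest.
/// Files with unknown extensions are skipped (an assumption of this sketch).
pub fn pick_tool_file(candidates: &[PathBuf]) -> Option<&PathBuf> {
    candidates
        .iter()
        .filter_map(|path| {
            let ext = path.extension()?.to_str()?;
            let rank = PRIORITY.iter().position(|known| *known == ext)?;
            Some((rank, path))
        })
        // Lower rank means higher priority.
        .min_by_key(|(rank, _)| *rank)
        .map(|(_, path)| path)
}
```

A test written against this rule does not depend on directory listing order, which is why the old filesystem-order behavior should not be asserted for parity.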

## How to pick up in a new session

If context is lost (new chat session):

1. Read this file first
2. Read the latest `docs/testing/notes/ITERATION-<N>-NOTES.md`
3. That file tells you which plan file to work on next
4. Read that plan file
5. Follow the process above
@@ -0,0 +1,52 @@
# Iteration 1: Test Implementation Notes

## Plan file addressed

`docs/testing/plans/01-config-and-appconfig.md`

## Tests created

| File | Test name | What it verifies |
|---|---|---|
| `src/config/mod.rs` | `config_defaults_match_expected` | All Config::default() fields match old code values |
| `src/config/app_config.rs` | `to_app_config_copies_serialized_fields` | to_app_config copies model_id, temperature, top_p, dry_run, stream, save, highlight, compression_threshold, rag_top_k |
| `src/config/app_config.rs` | `to_app_config_copies_clients` | clients field populated (empty by default) |
| `src/config/app_config.rs` | `to_app_config_copies_mapping_fields` | mapping_tools and mapping_mcp_servers copied correctly |
| `src/config/app_config.rs` | `editor_returns_configured_value` | editor() returns configured value |
| `src/config/app_config.rs` | `editor_falls_back_to_env` | editor() doesn't panic without config |
| `src/config/app_config.rs` | `light_theme_default_is_false` | light_theme() default |
| `src/config/app_config.rs` | `sync_models_url_has_default` | sync_models_url() has non-empty default |
| `src/config/request_context.rs` | `to_request_context_creates_clean_state` | RequestContext starts with clean state (no role/session/agent, empty tool_scope, no agent_runtime) |
| `src/config/request_context.rs` | `update_app_config_persists_changes` | Dynamic config updates via clone-mutate-replace persist |

**Total: 10 new tests (59 → 69)**

## Bugs discovered

None. The `save` default was `false` in both old and new code
(my plan file incorrectly said `true`; corrected).

## Observations for future iterations

1. The `Config::default().save` is `false`, but the plan file
   01 incorrectly listed it as `true`. Plan file should be
   updated to reflect the actual default.

2. `AppConfig::default()` doesn't exist natively (no derive).
   Tests construct it via `Config::default().to_app_config()`.
   This is fine since that's how it's created in production.

3. The `visible_tools` field computation happens during
   `Config::init` (not `to_app_config`). Testing the full
   visible_tools resolution requires integration-level testing
   with actual tool files. Deferred to plan file 16
   (functions-and-tools).

4. Testing `Config::init` directly is difficult because it reads
   from the filesystem, starts MCP servers, etc. The unit tests
   focus on the conversion paths which are the Phase 1 surface.
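
The `update_app_config_persists_changes` test in the table above exercises a clone-mutate-replace update. The sketch below shows one way such an update can work, assuming the shared config is held behind an `Arc`; the trimmed-down `AppConfig` and `AppState` types and the closure-based signature are hypothetical stand-ins, not the real Loki definitions.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical, trimmed-down stand-in for the real config type.
#[derive(Clone, Debug, PartialEq)]
pub struct AppConfig {
    pub save: bool,
    pub temperature: Option<f64>,
}

// Hypothetical holder for the shared, replaceable config.
pub struct AppState {
    config: RwLock<Arc<AppConfig>>,
}

impl AppState {
    pub fn new(config: AppConfig) -> Self {
        Self { config: RwLock::new(Arc::new(config)) }
    }

    /// Cheap snapshot; holders of an older snapshot never see later updates.
    pub fn config(&self) -> Arc<AppConfig> {
        let guard = self.config.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Clone the current config, mutate the copy, then swap it in.
    pub fn update_app_config(&self, mutate: impl FnOnce(&mut AppConfig)) {
        let mut slot = self.config.write().unwrap();
        let mut next = (**slot).clone();
        mutate(&mut next);
        *slot = Arc::new(next);
    }
}
```

A persistence test in this style would take a snapshot, apply an update, and assert that a fresh `config()` call reflects the change while the old snapshot does not.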

## Next iteration

Plan file 02: Roles. Covers role loading, retrieve_role,
use_role/exit_role, use_prompt, extract_role, one-shot role messages,
and MCP context switching.
@@ -0,0 +1,86 @@
# Iteration 10: Test Implementation Notes

## Plan files addressed

- `docs/testing/plans/09-repl-commands.md` (completed in same session)
- `docs/testing/plans/10-cli-flags.md`

## Tests created

### src/config/mod.rs (8 new tests, iteration 9)

AssertState::assert tests for all 4 variants + pass/bare.

### src/repl/mod.rs (31 new tests, iteration 9)

REPL_COMMANDS array validation, command state assertions for 13
specific commands, parse_command edge cases, split_first_arg,
ReplCommand::is_valid, multiline regex.

### src/cli/mod.rs (31 new tests, iteration 10)

| Test name | What it verifies |
|---|---|
| `parse_no_args_defaults` | All flags default unset |
| `parse_model_flag` | --model value |
| `parse_model_short_flag` | -m value |
| `parse_role_flag` | --role value |
| `parse_session_with_name` | --session value |
| `parse_agent_flag` | --agent value |
| `parse_agent_short_flag` | -a value |
| `parse_execute_flag` | -e flag |
| `parse_code_flag` | -c flag |
| `parse_no_stream_flag` | -S flag |
| `parse_dry_run_flag` | --dry-run flag |
| `parse_info_flag` | --info flag |
| `parse_list_flags` | All 6 --list-* flags |
| `parse_file_flag_single` | Single -f |
| `parse_file_flag_multiple` | Multiple -f accumulate |
| `parse_trailing_text` | Trailing args as text vec |
| `parse_prompt_flag` | --prompt value |
| `parse_empty_session_flag` | --empty-session flag |
| `parse_save_session_flag` | --save-session flag |
| `parse_build_tools_flag` | --build-tools flag |
| `parse_sync_models_flag` | --sync-models flag |
| `parse_model_with_role` | -m + -r combined |
| `parse_agent_with_file_and_text` | -a + -f + text combined |
| `parse_role_with_session` | -r + -s combined |
| `cli_text_returns_none_when_no_text_no_stdin` | No input → None |
| `cli_text_joins_trailing_args` | Args joined with spaces |
| `parse_add_secret_flag` | --add-secret value |
| `parse_get_secret_flag` | --get-secret value |
| `parse_list_secrets_flag` | --list-secrets flag |
| `parse_rag_flag` | --rag value |
| `parse_macro_flag` | --macro value |

**Total: 70 new tests across iterations 9+10 (342 total in suite)**

## Bugs discovered

None.

## Observations for future iterations

1. **Clap parsing is fully testable**: Using `try_parse_from` with
   synthetic arg arrays, all flag parsing and combinations can be
   verified without running the actual binary.

2. **Cli::text() has stdin dependency**: When stdin is not a
   terminal, it reads from stdin. This branch can't be easily
   unit-tested. The terminal-detection branch (no stdin) is tested.

3. **Prelude is async + filesystem**: apply_prelude needs real role
   and session files. Deferred to integration tests.

4. **Mode selection is runtime behavior**: The actual mode branching
   (REPL vs CMD) happens in main.rs based on parsed flags. Testing
   the flag parsing verifies the inputs to that branching logic.

5. **Exclusive flags**: Vault flags (--add-secret, --get-secret,
   etc.) are marked `exclusive = true` in clap, meaning they
   can't be combined with other args. This is enforced by clap.
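
Observation 2 notes that the stdin branch of `Cli::text()` is hard to unit-test. One possible way around that, sketched here under assumptions, is to pass the piped text in as a parameter so both branches become pure logic. The free function `cli_text` and its signature are hypothetical, and the rule for combining piped text with trailing args is an assumption of this sketch, not confirmed Loki behavior.

```rust
/// Hypothetical reduction of the text-collection logic.
/// `stdin_text` stands in for piped input (None when stdin is a terminal).
pub fn cli_text(trailing: &[&str], stdin_text: Option<&str>) -> Option<String> {
    // Trailing args are joined with single spaces.
    let joined = trailing.join(" ");
    match (joined.is_empty(), stdin_text) {
        // No args and no piped input: nothing to send.
        (true, None) => None,
        (false, None) => Some(joined),
        (true, Some(piped)) => Some(piped.to_string()),
        // Assumption: args first, then piped text on a new line.
        (false, Some(piped)) => Some(format!("{joined}\n{piped}")),
    }
}
```

With the input injected like this, the "no input → None" and "args joined with spaces" cases from the table need no terminal detection at all.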

## Next iteration

Plan file 11: Sub-Agent Spawning. Covers supervisor, child agents,
escalation, and messaging.
@@ -0,0 +1,159 @@
# Iteration 11: Test Implementation Notes

## Plan file addressed

`docs/testing/plans/11-sub-agent-spawning.md`

## Tests created

### src/supervisor/escalation.rs (11 new tests)

| Test name | What it verifies |
|---|---|
| `queue_default_has_no_pending` | Default queue empty |
| `submit_and_has_pending` | Submit makes has_pending true |
| `submit_returns_id` | Returns the request's id |
| `take_removes_request` | Take removes and empties queue |
| `take_nonexistent_returns_none` | Missing id → None |
| `pending_summary_contains_fields` | Summary has id, agent_id, question |
| `pending_summary_includes_options_when_present` | Options included |
| `pending_summary_empty_when_no_requests` | Empty queue → empty summary |
| `reply_reaches_receiver` | oneshot channel delivers reply |
| `new_escalation_id_has_prefix` | Starts with "esc_" |
| `new_escalation_id_unique` | Two calls produce different ids |

### src/supervisor/mailbox.rs (8 new tests)

| Test name | What it verifies |
|---|---|
| `inbox_new_is_empty` | New inbox drains empty |
| `inbox_default_is_empty` | Default inbox drains empty |
| `deliver_and_drain` | Deliver + drain returns message |
| `drain_empties_inbox` | Second drain returns empty |
| `drain_orders_shutdown_before_task_before_text` | Priority ordering |
| `clone_preserves_messages` | Clone has same messages |
| `clone_is_independent` | Clone doesn't share mutations |
| `multiple_deliveries` | 5 messages all drained |
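
The priority ordering checked by `drain_orders_shutdown_before_task_before_text` can be pictured with the sketch below. It is an illustration only: the `Message` variants, the `Inbox` layout, and the method signatures are hypothetical and almost certainly simpler than the real mailbox types.

```rust
// Hypothetical message kinds, ordered by how urgent they are.
#[derive(Clone, Debug, PartialEq)]
pub enum Message {
    Shutdown,
    TaskCompleted(String),
    Text(String),
}

impl Message {
    // Lower number drains first.
    fn priority(&self) -> u8 {
        match self {
            Message::Shutdown => 0,
            Message::TaskCompleted(_) => 1,
            Message::Text(_) => 2,
        }
    }
}

#[derive(Clone, Default)]
pub struct Inbox {
    messages: Vec<Message>,
}

impl Inbox {
    pub fn deliver(&mut self, message: Message) {
        self.messages.push(message);
    }

    /// Empties the inbox and returns messages in priority order.
    pub fn drain(&mut self) -> Vec<Message> {
        let mut out = std::mem::take(&mut self.messages);
        // `sort_by_key` is stable, so equal-priority messages keep delivery order.
        out.sort_by_key(|m| m.priority());
        out
    }
}
```

Using a stable sort keeps chat messages in the order they arrived while still moving lifecycle messages to the front.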

### src/supervisor/mod.rs (11 new tests)

| Test name | What it verifies |
|---|---|
| `supervisor_new_empty` | Initial state: 0 active, correct limits |
| `supervisor_register_increments_count` | Register increases active_count |
| `supervisor_register_rejects_at_capacity` | At max → error with "at capacity" |
| `supervisor_register_rejects_exceeding_depth` | Over max_depth → error |
| `supervisor_register_allows_at_max_depth` | Exactly max_depth → ok |
| `supervisor_take_removes_handle` | Take decrements count |
| `supervisor_take_nonexistent_returns_none` | Missing → None |
| `supervisor_list_agents` | Lists all registered agent ids/names |
| `supervisor_inbox_returns_handle_inbox` | Inbox accessor works |
| `supervisor_task_queue_accessible` | task_queue/task_queue_mut work |
| `agent_exit_status_equality` | Completed == Completed, != Failed |
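
The capacity and depth checks in the table above can be sketched as follows. The struct layout, the `register` signature, and the error strings are hypothetical (only the "at capacity" wording comes from the table); the real `Supervisor` stores agent handles rather than bare depths.

```rust
use std::collections::HashMap;

// Hypothetical, simplified supervisor that tracks only agent id → depth.
pub struct Supervisor {
    max_concurrent: usize,
    max_depth: usize,
    agents: HashMap<String, usize>,
}

impl Supervisor {
    pub fn new(max_concurrent: usize, max_depth: usize) -> Self {
        Self { max_concurrent, max_depth, agents: HashMap::new() }
    }

    pub fn active_count(&self) -> usize {
        self.agents.len()
    }

    /// Rejects registration when the pool is full or the agent is nested too deep.
    pub fn register(&mut self, id: &str, depth: usize) -> Result<(), String> {
        if self.agents.len() >= self.max_concurrent {
            return Err(format!("at capacity ({} agents)", self.max_concurrent));
        }
        // Exactly max_depth is allowed; only deeper nesting is rejected.
        if depth > self.max_depth {
            return Err(format!("depth {depth} exceeds max depth {}", self.max_depth));
        }
        self.agents.insert(id.to_string(), depth);
        Ok(())
    }

    /// Removes an agent, returning its depth if it was registered.
    pub fn take(&mut self, id: &str) -> Option<usize> {
        self.agents.remove(id)
    }
}
```

The boundary cases worth asserting are the ones named in the table: exactly at `max_depth` succeeds, one level deeper fails, and a full pool fails regardless of depth.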

### src/supervisor/taskqueue.rs (10 new tests, 16 total)

| Test name | What it verifies |
|---|---|
| `test_fail_sets_status` | fail() sets TaskStatus::Failed |
| `test_get_returns_none_for_missing` | get() on nonexistent → None |
| `test_dispatch_agent_stored` | dispatch_agent and prompt captured |
| `test_claim_blocked_task_fails` | Can't claim blocked task |
| `test_list_sorted_by_id` | list() returns numeric order |
| `test_default_is_empty` | TaskQueue::default() empty |
| `test_dependency_on_nonexistent_task_errors` | Bad dep → error |
| `test_complete_nonexistent_returns_empty` | Complete unknown → empty |
| `test_task_node_is_runnable` | Pending + unblocked = runnable |
| `test_task_node_not_runnable_when_blocked` | Blocked = not runnable |

### src/function/supervisor.rs (36 new handler integration tests)

| Test name | What it verifies |
|---|---|
| `handle_list_empty_supervisor` | Empty supervisor → 0 active, empty agents |
| `handle_list_with_agents` | Registered agents appear in list |
| `handle_list_no_supervisor_errors` | No supervisor → error |
| `handle_check_unknown_agent` | Check unknown → error status |
| `handle_check_pending_agent` | Check running agent → pending status |
| `handle_cancel_registered_agent` | Cancel removes and signals abort |
| `handle_cancel_unknown_agent` | Cancel unknown → error status |
| `handle_cancel_no_supervisor_errors` | No supervisor → error |
| `handle_send_message_to_registered_agent` | Message delivered to inbox |
| `handle_send_message_to_unknown_agent` | Unknown agent → error status |
| `handle_check_inbox_with_messages` | Inbox drains messages with count |
| `handle_check_inbox_no_inbox` | No inbox → count 0 |
| `handle_check_inbox_empty_inbox` | Empty inbox → count 0 |
| `handle_reply_escalation_success` | Reply delivered via oneshot |
| `handle_reply_escalation_missing_id` | Missing id → error status |
| `handle_reply_escalation_no_queue_errors` | No queue → error |
| `handle_task_create_simple` | Simple task created with id |
| `handle_task_create_with_dependencies` | Task with blocked_by |
| `handle_task_create_with_dispatch_agent` | Auto-dispatch flag set |
| `handle_task_create_agent_without_prompt_errors` | Agent without prompt → error |
| `handle_task_list_empty` | Empty queue → empty tasks array |
| `handle_task_list_with_tasks` | Tasks listed |
| `handle_task_complete_unblocks_dependents` | Complete unblocks with newly_runnable |
| `handle_task_fail_marks_failed` | Fail sets status |
| `handle_task_fail_reports_blocked_dependents` | Reports blocked deps |
| `handle_task_fail_missing_task` | Missing task → error status |
| `dispatch_unknown_action_errors` | Unknown action → error |
| `dispatch_routes_list` | agent__list → handle_list |
| `dispatch_routes_task_list` | agent__task_list → handle_task_list |
| `new_for_child_inherits_escalation_queue` | Shared Arc |
| `new_for_child_sets_depth_and_id` | Depth and self_agent_id |
| `new_for_child_has_inbox` | Shared inbox Arc |
| `new_for_child_inherits_parent_supervisor` | parent_supervisor set |
| `new_for_child_starts_with_empty_scope` | Empty functions, mcp, role, session |
| `ensure_root_escalation_queue_creates_on_first_call` | Lazy init |
| `ensure_root_escalation_queue_returns_same_on_second_call` | Same Arc |

### Infrastructure

- Added `AppState::test_default()` method for cross-module test construction
- Refactored `input.rs` and `request_context.rs` test helpers to use `test_default()`

**Total: 76 new tests (418 total in suite)**

## Bugs discovered

None.

## Observations for future iterations

1. **Supervisor.register enforces both capacity and depth**: These
   are the two runaway safeguards. Both tested at boundaries
   (at capacity, at max_depth, over max_depth).

2. **EscalationQueue uses oneshot channels**: The reply_tx/rx pair
   enables async blocking-wait semantics for child agents. The
   channel delivery is verified end-to-end in the test.

3. **Inbox drain ordering is a priority system**: Shutdown messages
   come first, then task completions, then text. This ensures
   lifecycle-critical messages aren't buried under chat.

4. **AgentHandle requires a tokio JoinHandle**: Creating test
   handles requires a tokio runtime. Used `rt.spawn()` with
   `mem::forget(rt)` to keep the handle alive. This is a test-only
   pattern: not ideal, but necessary since JoinHandle can't be
   mocked.

5. **handle_spawn requires real agent config on disk**: This is the
   only handler that calls Agent::init. All other handlers (list,
   check, cancel, messaging, tasks, escalation) work with just a
   RequestContext + Supervisor, which we can construct in tests.

6. **Handler integration tests cover the full dispatch chain**: The
   tests call handler functions with real RequestContext instances
   containing real Supervisor/EscalationQueue/Inbox instances. This
   verifies the JSON arg parsing, supervisor interactions, and
   response formatting all at once.

7. **AppState::test_default() centralizes test construction**: Added
   a `#[cfg(test)]` constructor that avoids importing private
   modules (mcp_factory, rag_cache) from outside the config module.

## Next iteration

Plan file 12: RAG. Covers RAG init/load/search, embeddings, and
document management.
@@ -0,0 +1,71 @@
# Iteration 12: Test Implementation Notes

## Plan file addressed

`docs/testing/plans/12-rag.md`

## Tests created

### src/rag/mod.rs (22 new tests)

| Test name | What it verifies |
|---|---|
| `document_id_round_trip` | new(5,17) → split → (5,17) |
| `document_id_zero_zero` | new(0,0) → split → (0,0) |
| `document_id_large_values` | new(1000,9999) round-trips |
| `document_id_debug_format` | Debug produces "3-7" format |
| `document_id_equality` | Same file+doc → equal |
| `document_id_inequality` | Different doc → not equal |
| `document_id_ordering` | (0,1) < (1,0) |
| `rag_document_new` | Sets page_content, empty metadata |
| `rag_document_default` | Empty content and metadata |
| `rag_data_new_defaults` | All fields set correctly |
| `rag_data_get_returns_document` | Gets by file+doc index |
| `rag_data_get_returns_none_for_missing_file` | Missing file → None |
| `rag_data_get_returns_none_for_missing_document` | Missing doc index → None |
| `rag_data_del_removes_files_and_vectors` | Del removes both |
| `rag_data_del_nonexistent_is_noop` | Del missing → noop |
| `rag_data_add_inserts_files_and_vectors` | Add inserts files+vectors, updates next_file_id |
| `rag_template_contains_placeholders` | `__CONTEXT__`, `__SOURCES__`, `__INPUT__` present |
| `get_separators_returns_language_specific` | rs/py/md have language separators |
| `get_separators_unknown_returns_defaults` | xyz → DEFAULT_SEPARATORS |
| `get_separators_all_known_extensions` | All 22 known extensions differ from defaults |
| `rag_data_build_bm25_empty` | Empty data → no search results |
| `rag_data_build_bm25_finds_documents` | BM25 finds "rust" in first doc |

**Total: 22 new tests (440 total in suite)**

## Bugs discovered

None.

## Observations for future iterations

1. **Rag struct can't be constructed without an embedding model**:
   Rag::init requires prompting the user for model selection,
   Rag::load requires a YAML file on disk, and Rag::create
   requires pre-built RagData with vectors. All RAG lifecycle
   operations are I/O-bound.

2. **DocumentId uses bit packing**: file_index in the upper half,
   document_index in the lower half of a usize. This is tested
   with round-trip, zero, and large-value cases.

3. **RagData operations (get/del/add) are fully testable**: These
   are pure data structure operations that don't need I/O. The
   BM25 search engine can also be built and queried in tests.

4. **The text splitter already has comprehensive tests**: 5 existing
   tests cover split_text, create_documents, chunk headers,
   markdown splitting, and HTML splitting. No additional splitter
   tests needed.

5. **get_separators covers 22 language extensions**: All are
   verified to return language-specific separators rather than
   defaults. This ensures the splitter uses appropriate chunk
   boundaries for each language.
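
The bit packing described in observation 2 can be illustrated with a short sketch. The layout (file index in the upper half of a `usize`, document index in the lower half) and the "3-7" debug format come from the notes above; the constant names and exact method signatures are assumptions made for the example.

```rust
use std::fmt;

// Half the width of usize: 32 bits on 64-bit targets, 16 on 32-bit targets.
const HALF_BITS: u32 = usize::BITS / 2;
const LOW_MASK: usize = (1 << HALF_BITS) - 1;

/// Packs (file_index, document_index) into one usize.
/// Deriving Ord on the packed value orders by file first, then document.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct DocumentId(usize);

impl DocumentId {
    pub fn new(file_index: usize, document_index: usize) -> Self {
        Self((file_index << HALF_BITS) | (document_index & LOW_MASK))
    }

    /// Recovers (file_index, document_index) from the packed value.
    pub fn split(self) -> (usize, usize) {
        (self.0 >> HALF_BITS, self.0 & LOW_MASK)
    }
}

impl fmt::Debug for DocumentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (file, doc) = self.split();
        write!(f, "{file}-{doc}")
    }
}
```

Because the file index occupies the high bits, comparing packed values directly gives (0,1) < (1,0), which is the ordering the `document_id_ordering` test checks.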

## Next iteration

Plan file 13: Completions and Prompt. Covers tab completion, prompt
rendering, and the highlighter.
@@ -0,0 +1,107 @@
# Iteration 13: Test Implementation Notes

## Plan files addressed

- `docs/testing/plans/12-rag.md` (completed in same session)
- `docs/testing/plans/13-completions-and-prompt.md`
- `docs/testing/plans/14-macros.md`
- `docs/testing/plans/15-vault.md`
- `docs/testing/plans/16-functions-and-tools.md`

## Tests created

### src/rag/mod.rs (22 new tests, iteration 12)

DocumentId round-trip/equality/ordering/debug, RagDocument new/default,
RagData new/get/del/add/build_bm25, RAG_TEMPLATE placeholders,
get_separators language mapping.

### src/config/macros.rs (21 new tests, iteration 13)

| Test name | What it verifies |
|---|---|
| `resolve_no_variables` | Empty vars → empty output |
| `resolve_required_variable_provided` | Arg maps to variable |
| `resolve_required_variable_missing_errors` | Missing required → error |
| `resolve_default_variable_uses_default` | Default used when no arg |
| `resolve_default_variable_overridden` | Arg overrides default |
| `resolve_rest_variable_captures_all_remaining` | Rest joins remaining args |
| `resolve_rest_variable_with_default` | Rest default used |
| `resolve_multiple_variables` | Mixed required + default |
| `usage_no_variables` | Just macro name |
| `usage_required_variable` | `<name>` format |
| `usage_optional_variable` | `[name]` format |
| `usage_rest_variable` | `<name>...` format |
| `usage_rest_with_default` | `[name]...` format |
| `usage_mixed_variables` | Mixed format |
| `interpolate_replaces_variables` | `{{name}}` → value |
| `interpolate_multiple_variables` | Multiple replacements |
| `interpolate_no_variables_passthrough` | No vars → unchanged |
| `interpolate_variable_not_found_left_as_is` | Missing var → `{{name}}` kept |
| `deserialize_macro_from_yaml` | Full YAML with steps + variables |
| `deserialize_macro_with_defaults` | Variables with defaults + rest |
| `deserialize_macro_no_variables` | Steps only, empty vars default |
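
The interpolation behavior covered by the `interpolate_*` tests above (known `{{key}}` placeholders replaced, unknown ones left untouched) can be sketched as plain string replacement. The free function below and its `HashMap` parameter are hypothetical; the real `Macro::interpolate_command` may take different arguments.

```rust
use std::collections::HashMap;

/// Replaces every `{{key}}` with its value; unknown placeholders are kept as-is.
pub fn interpolate_command(command: &str, vars: &HashMap<String, String>) -> String {
    let mut out = command.to_string();
    for (key, value) in vars {
        // "{{{{" and "}}}}" are escaped braces, so this builds the literal `{{key}}`.
        let placeholder = format!("{{{{{key}}}}}");
        out = out.replace(&placeholder, value);
    }
    out
}
```

One caveat of this simple approach: a value that itself contains a `{{...}}` placeholder could be expanded by a later loop pass, so the result can depend on map iteration order in that case.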

### src/vault/mod.rs (6 new tests)

| Test name | What it verifies |
|---|---|
| `secret_re_matches_double_braces` | `{{MY_SECRET}}` captured |
| `secret_re_matches_with_surrounding_text` | Captures in context |
| `secret_re_no_match_single_braces` | `{NOT}` not matched |
| `secret_re_no_match_plain_text` | No match for plain text |
| `secret_re_matches_with_spaces` | `{{ SPACED }}` captured |
| `vault_default_creates_instance` | Default has no password file |

### src/parsers/common.rs (8 new tests)

| Test name | What it verifies |
|---|---|
| `underscore_simple` | No-op for simple names |
| `underscore_dashes_to_underscores` | my-func → my_func |
| `underscore_spaces_to_underscores` | my func → my_func |
| `underscore_special_chars_removed` | `@!` → `_` |
| `underscore_consecutive_specials_collapsed` | `---` → single `_` |
| `underscore_leading_trailing_stripped` | -name- → name |
| `underscore_uppercase_lowered` | MyFunc → myfunc |
| `underscore_mixed` | Get-User Info → get_user_info |
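
The normalization rules exercised by the `underscore_*` tests can be summarized in a few lines. This is a sketch written from the test descriptions, not the actual `parsers::common::underscore` source; in particular, treating only ASCII letters and digits as alphanumeric is an assumption.

```rust
/// Lowercases, maps every non-alphanumeric run to a single `_`,
/// and strips leading and trailing underscores.
pub fn underscore(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for ch in name.chars() {
        if ch.is_ascii_alphanumeric() {
            out.push(ch.to_ascii_lowercase());
        } else if !out.is_empty() && !out.ends_with('_') {
            // Skipping when `out` is empty strips leading separators;
            // skipping after an existing `_` collapses consecutive ones.
            out.push('_');
        }
    }
    // At most one trailing separator can remain after collapsing.
    if out.ends_with('_') {
        out.pop();
    }
    out
}
```

Handling collapse and stripping in the same pass avoids a second scan over the string.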

**Total: 57 new tests across iterations 12+13 (475 total in suite)**

## Bugs discovered

None.

## Observations

1. **Macro::resolve_variables has 3 variable modes**: required
   (no default), optional (with default), and rest (captures
   remaining args). All three modes tested with multiple
   combinations.

2. **Macro::interpolate_command is a simple string replacement**:
   `{{key}}` → value. Missing keys are left as-is (no error),
   which is the correct behavior for gradual interpolation.

3. **SECRET_RE uses fancy_regex**: The `{{(.+)}}` pattern requires
   double braces. Single braces don't match, which prevents false
   positives on JSON-like content.

4. **Vault operations all require terminal interaction or password
   file**: add_secret and update_secret prompt for passwords via
   inquire. get_secret/delete_secret/list_secrets need a tokio
   runtime + password file. These are integration-test territory.

5. **parsers::common::underscore is more than s/-/_/**: It lowercases,
   replaces all non-alphanumeric chars with `_`, collapses consecutive
   underscores, and strips leading/trailing underscores. Thorough
   edge cases tested.

6. **Python and TypeScript parsers have excellent existing test
   suites**: ~400 lines of tests each covering declaration parsing,
   type inference, docstring extraction. No additional tests needed.

## Final summary

All 16 plan files have been addressed across iterations 1-13.
475 total tests, all passing, 0 errors.
@@ -0,0 +1,100 @@
# Iteration 14: Integration Test Implementation Notes

## Focus

Filesystem-based integration tests (Tier 1 + Tier 2) for behaviors
that were previously untestable without real config directories.

## Infrastructure changes

1. **Added `serial_test` dev-dependency**: Env-var-based config dir
   isolation (`TestConfigDirGuard`) requires serialization to prevent
   parallel test races. All 25 tests using `TestConfigDirGuard` now
   use `#[serial]`.

2. **Added `src/test_helpers.rs`**: Shared test utilities module
   (`#[cfg(test)]`) with `TestConfigDirGuard`, `default_app_state`,
   `create_test_ctx`, and `run_async` helpers, available to all
   modules. Not yet used by all modules (existing module-local
   helpers kept for backward compatibility).
|
||||
|
||||
## Tests created

### src/config/request_context.rs (17 new integration tests)

| Test name | What it verifies |
|---|---|
| `retrieve_role_from_markdown_file` | Writes .md file, retrieves role with correct name/prompt |
| `retrieve_role_builtin_exists` | Built-in roles retrievable |
| `retrieve_role_nonexistent_errors` | Unknown role → error |
| `retrieve_role_no_model_id_inherits_current_model` | No model_id → uses current model |
| `list_roles_finds_markdown_files` | .md files listed, .txt ignored |
| `list_roles_empty_dir` | Empty roles dir → empty list |
| `session_new_from_ctx_captures_state` | Name captured, starts empty |
| `session_save_creates_file` | Save creates YAML file on disk |
| `use_session_errors_when_already_in_session` | Double session → error |
| `use_session_creates_temp_session` | None → temp session |
| `use_session_creates_named_session` | Name → named session |
| `exit_session_roundtrip` | use_session → exit_session → None |
| `use_role_obj_and_exit_role_full_cycle` | Set role → exit → None |
| `use_role_obj_twice_replaces_role` | Second role replaces first |
| `list_macros_finds_yaml_files` | .yaml macro files listed |
| `list_rags_finds_yaml_files` | .yaml RAG files listed |
| `list_rags_empty_dir` | Empty RAGs dir → empty list |

### src/config/input.rs (5 new integration tests)

| Test name | What it verifies |
|---|---|
| `from_files_loads_single_text_file` | File content + text combined |
| `from_files_loads_multiple_files` | Multiple files all loaded |
| `from_files_with_no_paths_just_text` | No files → just text |
| `from_files_with_external_command` | Backtick command executed |
| `from_files_nonexistent_file_errors` | Missing file → error |

### Serialization fixes (6 existing tests)

Added `#[serial]` to all `rebuild_tool_scope_*` tests to prevent
env-var race conditions with filesystem integration tests.

**Total: 22 new tests (497 total in suite)**

## Bugs discovered

1. **Test parallelism race condition with env vars**: The
`TestConfigDirGuard` sets a process-global env var. When tests
run in parallel, two guards stomp each other's values. Fixed
by adding the `serial_test` crate and the `#[serial]` attribute
to all filesystem-dependent tests.

## Observations

1. **Session loading from disk requires Model::retrieve_model**:
`Session::load_from_ctx` calls `Model::retrieve_model` to
resolve the session's model_id. Without a valid model provider
config, this fails. Session loading tests are limited to
`new_from_ctx` (creation) and `save` (serialization).

2. **use_session with empty session prompts user**: The Confirm
dialog for "incorporate last Q&A?" requires terminal interaction.
Tests avoid this by: (a) having no last_message, or (b) using
named sessions that already exist on disk.

3. **Input::from_files with external commands works**: The backtick
syntax (`` `echo hello` ``) actually runs the command and captures
output. This is a real integration test — it runs `/bin/echo`.

4. **Vault CRUD was skipped**: Vault operations require a password
file with actual encrypted content via the `gman` crate's
`LocalProvider`. The `add_secret` method also prompts for a
password via `inquire`. Testing vault requires either mocking
the terminal or using `LocalProvider` directly with a pre-created
password file — deferred to a future iteration.

## Final counts

| Category | Tests |
|---|---|
| Unit tests (iterations 1-13) | 475 |
| Integration tests (iteration 14) | 22 |
| **Total** | **497** |
@@ -0,0 +1,71 @@
# Iteration 2 — Test Implementation Notes

## Plan file addressed

`docs/testing/plans/02-roles.md`

## Tests created

### src/config/role.rs (12 new tests, 15 total)

| Test name | What it verifies |
|---|---|
| `role_new_parses_prompt` | Role::new extracts prompt text |
| `role_new_parses_metadata` | Metadata block parses model, temperature, top_p |
| `role_new_parses_enabled_tools` | enabled_tools from metadata |
| `role_new_parses_enabled_mcp_servers` | enabled_mcp_servers from metadata |
| `role_new_no_metadata_has_none_fields` | No metadata → all optional fields None |
| `role_builtin_shell_loads` | Built-in "shell" role loads |
| `role_builtin_code_loads` | Built-in "code" role loads |
| `role_builtin_nonexistent_errors` | Non-existent built-in → error |
| `role_default_has_empty_fields` | Default role has empty name/prompt |
| `role_set_model_updates_model` | set_model() changes the model |
| `role_set_temperature_works` | set_temperature() changes temperature |
| `role_export_includes_metadata` | export() includes metadata and prompt |

### src/config/request_context.rs (5 new tests, 7 total)

| Test name | What it verifies |
|---|---|
| `use_role_obj_sets_role` | use_role_obj sets role on ctx |
| `exit_role_clears_role` | exit_role clears role from ctx |
| `use_prompt_creates_temp_role` | use_prompt creates TEMP_ROLE_NAME role |
| `extract_role_returns_standalone_role` | extract_role returns active role |
| `extract_role_returns_default_when_nothing_active` | extract_role returns default role |

**Total: 17 new tests (69 → 86)**

## Bugs discovered

None. Role parsing behavior matches between old and new code.

## Observations for future iterations

1. `retrieve_role` (which calls `Model::retrieve_model`) can't be
easily unit-tested without a real client config. It depends on
having at least one configured client. Deferred to integration
testing or plan 08 (RequestContext scope transitions).

2. The `use_role` async method (which calls `rebuild_tool_scope`)
requires an async test runtime and MCP infrastructure. Deferred to
plan 05 (MCP lifecycle) and 08 (RequestContext).

3. `use_role_obj` correctly rejects when an agent is active — tested
implicitly through the error path, but creating a mock Agent
is complex. Noted for plan 04 (agents).

4. The `extract_role` priority order (session > agent > role > default)
is an important behavioral contract. Tests verify the role and
default cases. Session and agent cases deferred to plans 03, 04.

5. Added `create_test_ctx()` helper to request_context.rs tests.
Future iterations should reuse this.

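The `extract_role` priority order from observation 4 can be pinned down with a small sketch. The `Role` and `Ctx` types below are simplified stand-ins invented for illustration (the real `RequestContext` holds full session and agent objects), so only the ordering logic should be read as reflecting the notes.

```rust
/// Simplified stand-in for the real role type.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Role {
    pub name: String,
    pub prompt: String,
}

/// Hypothetical context: each scope is reduced to the role it would yield.
#[derive(Default)]
pub struct Ctx {
    pub session_role: Option<Role>, // stands in for `session.to_role()`
    pub agent_role: Option<Role>,
    pub role: Option<Role>,
}

/// Returns the first available role in the order
/// session > agent > role > default.
pub fn extract_role(ctx: &Ctx) -> Role {
    ctx.session_role
        .clone()
        .or_else(|| ctx.agent_role.clone())
        .or_else(|| ctx.role.clone())
        .unwrap_or_default()
}

/// Small helper for building named roles in examples.
pub fn named(name: &str) -> Role {
    Role {
        name: name.to_string(),
        prompt: String::new(),
    }
}
```

A test for each of the four cases then only needs to populate one more field of `Ctx` and check which name comes back.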
## Plan file updates

Updated 02-roles.md to mark completed items.

## Next iteration

Plan file 03: Sessions — session create/load/save, compression,
autoname, carry-over, exit, context switching.
@@ -0,0 +1,76 @@
# Iteration 3 — Test Implementation Notes

## Plan file addressed

`docs/testing/plans/03-sessions.md`

## Tests created

### src/config/session.rs (15 new tests)

| Test name | What it verifies |
|---|---|
| `session_default_is_empty` | Default session is empty, no name, no role, not dirty |
| `session_new_from_ctx_captures_save_session` | new_from_ctx captures name, empty, not dirty |
| `session_set_role_captures_role_info` | set_role copies model_id, temperature, role_name, marks dirty |
| `session_clear_role` | clear_role removes role_name |
| `session_guard_empty_passes_when_empty` | guard_empty OK when empty |
| `session_needs_compression_threshold` | Empty session doesn't need compression |
| `session_needs_compression_returns_false_when_compressing` | Already compressing → false |
| `session_needs_compression_returns_false_when_threshold_zero` | Zero threshold → false |
| `session_set_compressing_flag` | set_compressing toggles flag |
| `session_set_save_session_this_time` | Doesn't panic |
| `session_save_session_returns_configured_value` | save_session get/set roundtrip |
| `session_compress_moves_messages` | compress moves messages to compressed, adds system |
| `session_is_not_empty_after_compress` | Session with compressed messages is not empty |
| `session_need_autoname_default_false` | Default session doesn't need autoname |
| `session_set_autonaming_doesnt_panic` | set_autonaming safe without autoname |

### src/config/request_context.rs (4 new tests, 11 total)

| Test name | What it verifies |
|---|---|
| `exit_session_clears_session` | exit_session removes session from ctx |
| `empty_session_clears_messages` | empty_session keeps session but clears it |
| `maybe_compress_session_returns_false_when_no_session` | No session → no compression |
| `maybe_autoname_session_returns_false_when_no_session` | No session → no autoname |

**Total: 19 new tests (86 → 105)**

## Bugs discovered

None. Session behavior matches between old and new code.

## Observations for future iterations

1. `Session::new_from_ctx` and `Session::load_from_ctx` have
`#[allow(dead_code)]` annotations — they were bridge methods.
Should verify if they're still needed or if the old `Session::new`
and `Session::load` (which take `&Config`) should be cleaned up
in a future pass.

2. The `compress` method moves messages to `compressed_messages` and
adds a single system message with the summary. This is a critical
behavioral contract — if the summary format changes, sessions
could break.

3. `needs_compression` uses `self.compression_threshold`
(session-level) with fallback to the global threshold. This priority
(session > global) is important behavior.

4. Session carry-over (the "incorporate last Q&A?" prompt) happens
inside `use_session`, which is async and involves user interaction
(inquire::Confirm). Can't unit test this — needs an integration test
or manual verification.

5. The `extract_role` test for the session-active case should verify
that `session.to_role()` is returned. Added note to plan 02.

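The threshold rules in observation 3, together with the "already compressing" and "zero threshold" cases from the test table, can be sketched as follows. The `Session` struct here is a hypothetical reduction, and both the use of a token count and the strict `>` comparison are assumptions; the notes do not say what is measured or where the boundary sits.

```rust
/// Hypothetical, reduced session: only the fields the check needs.
pub struct Session {
    /// Session-level override; `None` means "use the global value".
    pub compression_threshold: Option<usize>,
    /// True while a compression pass is already running.
    pub compressing: bool,
    /// Assumed measure of session size (tokens).
    pub token_count: usize,
}

impl Session {
    pub fn needs_compression(&self, global_threshold: usize) -> bool {
        // Never start a second pass while one is in flight.
        if self.compressing {
            return false;
        }
        // Session-level value wins over the global one.
        let threshold = self.compression_threshold.unwrap_or(global_threshold);
        // A zero threshold disables compression entirely.
        if threshold == 0 {
            return false;
        }
        // Assumption: strictly greater than the threshold triggers it.
        self.token_count > threshold
    }
}
```

Under these assumptions an empty session never needs compression, since a token count of zero cannot exceed any positive threshold.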
## Plan file updates

Updated 03-sessions.md to mark completed items.

## Next iteration

Plan file 04: Agents — agent init, tool compilation, variables,
lifecycle, MCP, RAG, auto-continuation.
@@ -0,0 +1,71 @@
# Iteration 4 — Test Implementation Notes

## Plan file addressed

`docs/testing/plans/04-agents.md`

## Tests created

### src/config/agent.rs (4 new tests)

| Test name | What it verifies |
|---|---|
| `agent_config_parses_from_yaml` | Full AgentConfig YAML with all fields |
| `agent_config_defaults` | Minimal AgentConfig gets correct defaults |
| `agent_config_with_model` | model_id, temperature, top_p from YAML |
| `agent_config_inject_defaults_true` | inject_todo/spawn_instructions default true |

### src/config/agent_runtime.rs (2 new tests)

| Test name | What it verifies |
|---|---|
| `agent_runtime_new_defaults` | All fields default correctly |
| `agent_runtime_builder_pattern` | with_depth, with_parent_supervisor work |

### src/config/request_context.rs (6 new tests, 17 total)

| Test name | What it verifies |
|---|---|
| `exit_agent_clears_all_agent_state` | exit_agent clears agent, agent_runtime, rag |
| `current_depth_returns_zero_without_agent` | Default depth is 0 |
| `current_depth_returns_agent_runtime_depth` | Depth from agent_runtime |
| `supervisor_returns_none_without_agent` | No agent → no supervisor |
| `inbox_returns_none_without_agent` | No agent → no inbox |
| `root_escalation_queue_returns_none_without_agent` | No agent → no queue |

**Total: 12 new tests (105 → 117)**

## Bugs discovered

None.

## Observations for future iterations

1. `Agent::init` can't be unit tested easily — it requires agent config
files and tool files on disk. Integration tests with temp directories
would be needed for full coverage.

2. AgentConfig default values verified:
   - `max_concurrent_agents` = 4
   - `max_agent_depth` = 3
   - `max_auto_continues` = 10
   - `inject_todo_instructions` = true
   - `inject_spawn_instructions` = true

   These are important behavioral contracts.

3. The `exit_agent` test shows that clearing agent state also
rebuilds the tool_scope with fresh functions. This is the
correct behavior for returning to the global context.

4. Agent variable interpolation (special vars like `__os__`, `__cwd__`)
happens in `Agent::init`, which is filesystem-dependent. Deferred.

5. `list_agents()` (which filters hidden dirs) is tested via the
`.shared` exclusion noted in improvements. Could add a unit test
with a temp dir if needed.

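The default values listed in observation 2 lend themselves to a "pin the defaults" test. The sketch below uses a hand-written `Default` impl on a reduced struct; the field types are assumptions, and the real `AgentConfig` is deserialized from YAML and carries more fields than shown here.

```rust
/// Reduced, hypothetical version of the agent config: only the five
/// fields whose defaults are listed in the notes.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentConfig {
    pub max_concurrent_agents: usize,
    pub max_agent_depth: usize,
    pub max_auto_continues: usize,
    pub inject_todo_instructions: bool,
    pub inject_spawn_instructions: bool,
}

impl Default for AgentConfig {
    fn default() -> Self {
        AgentConfig {
            max_concurrent_agents: 4,
            max_agent_depth: 3,
            max_auto_continues: 10,
            inject_todo_instructions: true,
            inject_spawn_instructions: true,
        }
    }
}
```

Asserting each field individually, rather than comparing whole structs, makes a failing test name the exact default that drifted.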
## Next iteration

Plan file 05: MCP Lifecycle — the most critical test area. McpFactory,
McpRuntime, spawn_mcp_server, rebuild_tool_scope MCP integration,
scope transition MCP behavior.
@@ -0,0 +1,129 @@
# Iteration 5 — Test Implementation Notes

## Plan file addressed

`docs/testing/plans/05-mcp-lifecycle.md`

## Tests created

### src/config/mcp_factory.rs (12 new tests)

| Test name | What it verifies |
|---|---|
| `key_from_stdio_spec_captures_command_args_env` | McpServerKey extracts command, args, env from stdio spec |
| `key_from_stdio_spec_sorts_args_and_env` | Args and env are sorted for deterministic key hashing |
| `key_from_stdio_spec_defaults_empty_when_none` | None args/env default to empty vecs |
| `key_from_remote_http_spec` | Http transport key captures url and transport type |
| `key_from_remote_sse_spec_with_sorted_headers` | SSE headers sorted for deterministic keys |
| `key_equality_same_spec_produces_equal_keys` | Same spec → equal keys (sharing contract) |
| `key_inequality_different_names` | Different server names → different keys |
| `key_inequality_different_commands` | Different commands → different keys (isolation contract) |
| `key_env_bool_and_int_coerce_to_string` | JsonField::Bool/Int coerced to String in key |
| `factory_try_get_active_returns_none_when_empty` | Empty factory returns None |
| `factory_try_get_active_returns_none_for_unknown_key` | Unknown key returns None |
| `factory_default_has_empty_active_map` | Default factory has empty internal map |

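The sorting contract exercised by the key tests above can be illustrated with a reduced key type. `ServerKey`, `from_stdio`, and `hash_of` are hypothetical names for this sketch, and the real `McpServerKey` also covers remote transports and headers; the point shown is only that sorting before storing makes equality and hashing independent of input order.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Reduced, hypothetical key for a stdio server.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ServerKey {
    pub name: String,
    pub command: String,
    pub args: Vec<String>,
    pub env: Vec<(String, String)>,
}

impl ServerKey {
    pub fn from_stdio(name: &str, command: &str, args: &[&str], env: &[(&str, &str)]) -> Self {
        // Sort both collections so that the derived `Eq`/`Hash`
        // do not depend on the order the caller supplied.
        let mut args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
        args.sort();
        let mut env: Vec<(String, String)> = env
            .iter()
            .map(|&(k, v)| (k.to_string(), v.to_string()))
            .collect();
        env.sort();
        ServerKey {
            name: name.to_string(),
            command: command.to_string(),
            args,
            env,
        }
    }
}

/// Hashes a key with a freshly constructed hasher, so equal keys
/// always produce equal hashes within one run.
pub fn hash_of(key: &ServerKey) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}
```

With this shape, two specs that differ only in the order of their environment entries map to the same `HashMap` slot, which is what allows a running server to be shared.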
### src/config/tool_scope.rs (6 new tests)

| Test name | What it verifies |
|---|---|
| `mcp_runtime_new_is_empty` | New McpRuntime has no servers |
| `mcp_runtime_default_is_empty` | Default McpRuntime is empty |
| `mcp_runtime_get_returns_none_for_missing_server` | get() on nonexistent server returns None |
| `tool_scope_default_has_empty_mcp_runtime` | Default ToolScope has empty MCP runtime |
| `tool_scope_default_has_empty_functions` | Default ToolScope has no functions |
| `tool_scope_default_tracker_has_no_loops` | Default ToolScope tracker detects no loops |

### src/mcp/mod.rs (30 new tests)

| Test name | What it verifies |
|---|---|
| `validate_stdio_with_command_succeeds` | Valid stdio spec passes |
| `validate_stdio_missing_command_fails` | Stdio without command is rejected |
| `validate_stdio_with_url_fails` | Stdio with url (remote field) is rejected |
| `validate_stdio_with_headers_fails` | Stdio with headers (remote field) is rejected |
| `validate_http_with_url_succeeds` | Valid http spec passes |
| `validate_http_missing_url_fails` | Http without url is rejected |
| `validate_http_with_command_fails` | Http with command (stdio field) is rejected |
| `validate_http_with_args_fails` | Http with args (stdio field) is rejected |
| `validate_http_with_cwd_fails` | Http with cwd (stdio field) is rejected |
| `validate_sse_with_url_succeeds` | Valid SSE spec passes |
| `validate_sse_missing_url_fails` | SSE without url is rejected |
| `is_remote_true_for_http_and_sse` | Http and SSE are remote transports |
| `is_remote_false_for_stdio` | Stdio is not remote |
| `deserialize_stdio_server_from_json` | Full stdio spec from JSON |
| `deserialize_http_server_from_json` | Http spec with headers from JSON |
| `deserialize_env_with_mixed_types` | Env with String, Bool, Int values |
| `deserialize_multiple_servers` | Multiple server entries parsed |
| `deserialize_empty_servers_map` | Empty mcpServers map parsed |
| `deserialize_server_with_cwd` | cwd field parsed correctly |
| `resolve_all_returns_all_configured_servers` | "all" resolves to all config keys |
| `resolve_comma_separated_returns_matching_servers` | Comma-separated list filters correctly |
| `resolve_single_server_name` | Single name resolved |
| `resolve_none_returns_empty` | None enabled → empty list |
| `resolve_no_config_returns_empty` | No config → empty list |
| `resolve_nonexistent_server_filtered_out` | Unknown names silently filtered |
| `resolve_all_nonexistent_returns_empty` | All unknown → empty list |
| `resolve_trims_whitespace` | Whitespace in comma list trimmed |
| `registry_default_is_empty` | Default registry: empty, no config, no log |
| `registry_with_config_reports_config` | Config accessor works |
| `meta_function_prefixes_are_correct` | mcp_invoke/search/describe prefixes |

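The `resolve_*` rows above describe a small resolution rule that can be written as a pure function. The signature below is an assumption made for illustration (the real `resolve_server_ids` is a private method on `McpRegistry`), and the output order of the comma-separated case follows the request order, which the notes do not specify.

```rust
/// Hypothetical free-function version of server ID resolution.
/// `configured` is `None` when there is no MCP config at all;
/// `enabled` is the raw setting: `None`, "all", or a comma-separated list.
pub fn resolve_server_ids(configured: Option<&[&str]>, enabled: Option<&str>) -> Vec<String> {
    let configured = match configured {
        Some(c) => c,
        None => return Vec::new(),
    };
    let enabled = match enabled {
        Some(e) => e,
        None => return Vec::new(),
    };
    if enabled.trim() == "all" {
        return configured.iter().map(|s| s.to_string()).collect();
    }
    enabled
        .split(',')
        .map(|s| s.trim())
        // Unknown names are dropped silently rather than reported.
        .filter(|s| configured.contains(s))
        .map(|s| s.to_string())
        .collect()
}
```

Keeping the rule free of I/O is what makes each row in the table a one-line assertion.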
### src/config/request_context.rs (6 new tests)

| Test name | What it verifies |
|---|---|
| `rebuild_tool_scope_mcp_disabled_skips_servers` | mcp_server_support=false → empty runtime |
| `rebuild_tool_scope_no_enabled_servers_yields_empty_runtime` | None enabled → empty runtime |
| `rebuild_tool_scope_no_mcp_config_yields_empty_runtime` | No mcp_config → empty runtime |
| `rebuild_tool_scope_preserves_tool_tracker` | Tracker survives rebuild |
| `rebuild_tool_scope_repl_mode_appends_user_interaction_functions` | REPL adds user__ functions |
| `rebuild_tool_scope_cmd_mode_no_user_interaction_functions` | CMD skips user__ functions |

**Total: 54 new tests (176 total in suite)**

## Bugs discovered

None.

## Observations for future iterations

1. **ConnectedServer untestable without subprocess**: `ConnectedServer`
(= `RunningService<RoleClient, ()>`) cannot be constructed without
a real MCP server subprocess. This blocks unit testing for:
   - McpFactory.acquire() full flow (spawn + insert + Weak sharing)
   - McpRuntime.insert/get with real handles
   - McpRuntime.search/describe/invoke (need live tool catalog)
   - All scope transition tests (role/session/agent MCP start/stop)

   These require integration tests with a mock MCP server binary
   (e.g., a simple echo server). Recommended for a dedicated
   integration test iteration.

2. **McpServerKey sorting guarantees sharing correctness**: The
sorting of args, env, and headers in McpServerKey::from_spec
is critical — without it, HashMap key equality would be
non-deterministic. Tests verify this explicitly.

3. **rebuild_tool_scope has 3 guard clauses that prevent server
acquisition**: mcp_server_support=false, mcp_config=None,
enabled_mcp_servers=None. All three paths tested.

4. **REPL vs CMD mode differs in user interaction functions**: The
`rebuild_tool_scope` method conditionally appends `user__*`
functions only in REPL mode. Tested both paths.

5. **McpServer::validate enforces strict transport/field separation**:
Stdio servers cannot have url/headers, remote servers cannot have
command/args/cwd. This prevents misconfiguration. All cross-field
conflict cases tested.

6. **McpRegistry.resolve_server_ids is private** but tested via
`#[cfg(test)]` in the same module. It's the core of server ID
resolution for "all", comma-separated, and empty cases.

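The transport/field separation from observation 5 can be sketched as a validation function. `Transport`, `ServerSpec`, and the error strings are hypothetical simplifications; only the rule itself (stdio needs `command` and rejects `url`/`headers`, remote needs `url` and rejects `command`/`args`/`cwd`) is taken from the notes.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Transport {
    Stdio,
    Http,
    Sse,
}

/// Hypothetical, flattened server spec with every field optional.
#[derive(Debug, Clone)]
pub struct ServerSpec {
    pub transport: Transport,
    pub command: Option<String>,
    pub args: Option<Vec<String>>,
    pub cwd: Option<String>,
    pub url: Option<String>,
    pub headers: Option<Vec<(String, String)>>,
}

impl ServerSpec {
    pub fn new(transport: Transport) -> Self {
        ServerSpec {
            transport,
            command: None,
            args: None,
            cwd: None,
            url: None,
            headers: None,
        }
    }

    /// Http and Sse count as remote; Stdio does not.
    pub fn is_remote(&self) -> bool {
        self.transport != Transport::Stdio
    }

    pub fn validate(&self) -> Result<(), String> {
        if self.is_remote() {
            if self.url.is_none() {
                return Err("remote server requires `url`".to_string());
            }
            if self.command.is_some() || self.args.is_some() || self.cwd.is_some() {
                return Err("remote server cannot set command/args/cwd".to_string());
            }
        } else {
            if self.command.is_none() {
                return Err("stdio server requires `command`".to_string());
            }
            if self.url.is_some() || self.headers.is_some() {
                return Err("stdio server cannot set url/headers".to_string());
            }
        }
        Ok(())
    }
}
```

Each `validate_*` test in the table then corresponds to setting one field on a fresh spec and checking `is_ok()` or `is_err()`.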
## Next iteration

Plan file 06: Tool Evaluation — eval_tool_calls, ToolCall dispatch,
tool handlers, MCP tool invocation chain (mcp__search, mcp__describe,
mcp__invoke).
@@ -0,0 +1,96 @@
# Iteration 6 — Test Implementation Notes

## Plan file addressed

`docs/testing/plans/06-tool-evaluation.md`

## Tests created

### src/function/mod.rs (36 new tests)

| Test name | What it verifies |
|---|---|
| `toolcall_new_sets_fields` | ToolCall::new sets name, arguments, id |
| `toolcall_default_has_empty_fields` | Default ToolCall has empty/null fields |
| `toolcall_with_thought_signature` | with_thought_signature sets value |
| `toolcall_with_thought_signature_none` | with_thought_signature(None) clears |
| `dedup_removes_duplicate_ids_keeps_last` | Duplicate ids → last occurrence kept |
| `dedup_keeps_unique_ids` | Unique ids → all kept |
| `dedup_keeps_calls_without_ids` | No-id calls always kept |
| `dedup_preserves_last_occurrence_order` | Ordering based on last occurrence position |
| `dedup_empty_input_returns_empty` | Empty vec → empty result |
| `dedup_mixed_with_and_without_ids` | Mixed id/no-id dedup behavior |
| `tracker_default_values` | Default max_repeats=2, chain_len=3 |
| `tracker_no_loop_on_fresh_tracker` | Fresh tracker returns None |
| `tracker_no_loop_below_threshold` | Below max_repeats → no loop |
| `tracker_detects_loop_at_max_repeats` | At max_repeats → loop detected |
| `tracker_different_args_no_loop` | Different args break loop detection |
| `tracker_different_names_no_loop` | Different names break loop detection |
| `tracker_chain_detection` | Chain of identical calls detected |
| `tracker_record_call_respects_capacity` | Capacity bounded by chain_len * max_repeats |
| `tracker_loop_message_contains_call_history` | Loop message includes call_history JSON |
| `prefix_constants_are_correct` | All 6 prefixes: todo__, agent__, user__, mcp_invoke/search/describe |
| `functions_default_is_empty` | Default Functions has no declarations |
| `functions_append_todo_adds_declarations` | 5 todo tools: init, add, done, list, clear |
| `functions_append_supervisor_adds_declarations` | Supervisor: spawn, check, collect, list, cancel, reply |
| `functions_append_teammate_adds_declarations` | Teammate: send_message, check_inbox |
| `functions_append_user_interaction_adds_declarations` | User: ask, confirm, input, checkbox |
| `functions_append_mcp_meta_creates_three_per_server` | 3 MCP meta functions per server |
| `functions_append_mcp_meta_multiple_servers` | Multiple servers → 3 each |
| `functions_append_mcp_meta_empty_servers` | Empty servers → no declarations |
| `functions_find_returns_declaration` | find() returns matching declaration |
| `functions_find_returns_none_for_missing` | find() returns None for unknown |
| `functions_contains_true_for_existing` | contains() true for known function |
| `functions_contains_false_for_missing` | contains() false for unknown |
| `functions_mcp_invoke_declaration_has_tool_and_arguments_params` | Invoke schema: tool + arguments params |
| `functions_mcp_search_declaration_has_query_and_top_k_params` | Search schema: query + top_k params |
| `functions_mcp_describe_declaration_has_tool_param` | Describe schema: tool param |
| `functions_supervisor_includes_task_queue_tools` | Task queue: create, list, complete, fail |
| `tool_result_stores_call_and_output` | ToolResult::new stores both fields |

**Total: 36 new tests (212 total in suite)**

## Bugs discovered

None.

## Observations for future iterations

1. **ToolCall::dedup keeps the LAST occurrence**: The implementation
iterates in reverse and reverses again, so when duplicate ids
exist, the last occurrence wins. My initial tests assumed
first-wins behavior — caught and corrected during the iteration.

2. **ToolCall::eval requires full RequestContext**: The dispatch
routing (`agent__*`, `todo__*`, `user__*`, `mcp_*`, shell
fallback) cannot be unit-tested because `eval()` takes
`&mut RequestContext`, which requires an initialized AppState.
The prefix routing is verified indirectly through prefix
constant tests and function declaration tests.

3. **Functions::init requires filesystem**: It calls
`build_global_tool_declarations`, which reads tool files from
disk. Can't unit-test without a temp directory with actual
tool scripts. Function filtering by `enabled_tools` is thus
deferred.

4. **All function declaration appenders are fully testable**: The
`append_*` methods on Functions work without I/O and produce
the exact function declarations the LLM sees. This is the most
important behavioral contract to test.

5. **MCP meta function schemas are critical**: The invoke, search,
and describe meta functions each have specific parameter schemas
(tool+arguments, query+top_k, tool). Tests verify these schemas
exist with correct fields and required params.

6. **ToolCallTracker loop detection has two mechanisms**:
   - Consecutive repeat detection (same call N times in a row)
   - Chain detection (same call repeated across the last chain_len
     entries)

   Both are tested independently.

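The last-occurrence-wins behavior from observation 1 can be sketched as below. The `ToolCall` struct is reduced to a name and an optional id for illustration; the reverse-iterate-then-reverse shape follows the description in the notes, while the `HashSet` used to track seen ids is an assumption about how the implementation does it.

```rust
use std::collections::HashSet;

/// Reduced, hypothetical tool call: just a name and an optional id.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub id: Option<String>,
}

/// Shorthand constructor used by examples.
pub fn call(name: &str, id: Option<&str>) -> ToolCall {
    ToolCall {
        name: name.to_string(),
        id: id.map(|s| s.to_string()),
    }
}

/// Removes calls whose id appears again later in the list.
/// Calls without an id are always kept.
pub fn dedup(calls: Vec<ToolCall>) -> Vec<ToolCall> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut kept: Vec<ToolCall> = Vec::new();
    // Walk backwards so the first time an id is seen is its LAST occurrence.
    for c in calls.into_iter().rev() {
        let is_new = match c.id {
            Some(ref id) => seen.insert(id.clone()),
            None => true,
        };
        if is_new {
            kept.push(c);
        }
    }
    // Restore the original relative order of the survivors.
    kept.reverse();
    kept
}
```

For the input `a(id=1), b(id=2), c(id=1), d(no id)`, the sketch keeps `b, c, d`: the earlier call with id 1 is dropped and the survivors stay in their original relative order.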
## Next iteration

Plan file 07: Input Construction — Input::from_str, from_files,
field capturing, function selection.
@@ -0,0 +1,97 @@
# Iteration 7 — Test Implementation Notes

## Plan file addressed

`docs/testing/plans/07-input-construction.md`

## Tests created

### src/config/input.rs (31 new tests)

| Test name | What it verifies |
|---|---|
| `resolve_role_with_explicit_role` | Explicit role returned, with_session/agent false |
| `resolve_role_without_role_no_session_no_agent` | Default role, both flags false |
| `resolve_role_without_role_with_session` | with_session true when session present |
| `resolve_role_explicit_role_overrides_session_flag` | Explicit role forces with_session=false |
| `resolve_paths_detects_last_reply_syntax` | %% sets with_last_reply=true |
| `resolve_paths_detects_url` | https:// classified as remote URL |
| `resolve_paths_detects_external_command` | Backtick-wrapped → external command |
| `resolve_paths_empty_input` | Empty vec → all empty, no last reply |
| `resolve_paths_rejects_url_with_glob_suffix` | URL** → error |
| `resolve_paths_mixed_inputs` | %% + URL + cmd all detected |
| `input_from_str_captures_text` | Text stored correctly |
| `input_from_str_with_explicit_role` | Role name captured |
| `input_from_str_captures_stream_from_config` | stream=false from config |
| `input_is_empty_with_no_text_and_no_medias` | Empty text + no medias = empty |
| `input_is_not_empty_with_text` | Text present = not empty |
| `input_set_text_changes_text` | set_text updates text |
| `input_text_returns_patched_when_set` | Patched text overrides |
| `input_clear_patch_restores_original` | clear_patch removes override |
| `input_set_continue_output_accumulates` | Multiple calls concatenate |
| `input_set_regenerate_sets_flag_and_clears_tool_calls` | Flag set, tool_calls cleared |
| `input_summary_truncates_long_text` | >80 chars → truncated with ... |
| `input_summary_preserves_short_text` | Short text unchanged |
| `input_raw_with_no_files` | Raw returns just text |
| `input_render_with_no_medias` | Render returns just text |
| `input_with_agent_false_when_no_agent` | No agent context → false |
| `input_session_returns_none_when_with_session_false` | Explicit role → no session access |
| `input_session_returns_some_when_with_session_true` | Session context → session access |
| `is_image_recognizes_image_extensions` | png/jpeg/jpg/webp/gif recognized |
| `is_image_rejects_non_image_extensions` | txt/rs/pdf rejected |
| `resolve_data_url_returns_path_for_known_hash` | Hash lookup returns path |
| `resolve_data_url_returns_original_for_non_data_url` | Non-data URL returned as-is |

### src/config/request_context.rs (7 new tests)

| Test name | What it verifies |
|---|---|
| `select_functions_returns_none_when_no_tools_enabled` | No enabled_tools → None |
| `select_functions_returns_none_when_function_calling_disabled` | function_calling_support=false → None |
| `select_functions_all_enabled_tools_returns_all_non_mcp` | "all" → all non-MCP declarations |
| `select_functions_comma_separated_filters` | Comma list → matching subset |
| `select_enabled_mcp_servers_returns_empty_when_mcp_disabled` | mcp_server_support=false → empty |
| `select_enabled_mcp_servers_all_returns_all_mcp_functions` | "all" → all MCP functions |
| `select_enabled_mcp_servers_comma_filters` | Server name → only that server's 3 functions |

**Total: 38 new tests (250 total in suite)**

## Bugs discovered

None.

## Observations for future iterations

1. **Input::from_files is async and I/O-heavy**: It fetches URLs,
reads files from disk, expands globs, and runs external commands.
Full testing requires integration tests with temp files/dirs.

2. **resolve_role with agent**: Testing requires an initialized
Agent (which needs config files on disk). The agent path is
tested indirectly through the existing `exit_agent` test in
iteration 4.

3. **resolve_paths is a pure function**: No I/O, fully testable.
It cleanly separates path classification (URL vs local vs cmd
vs loader) from actual loading. Good design for testing.

4. **select_functions has complex filtering**: It filters non-MCP
declarations by enabled_tools, then adds user__ functions for
non-agent contexts, then merges agent-specific functions. The
MCP selection mirrors this with MCP-prefixed declarations.
Both paths fully tested.

5. **Input captures state at construction time**: All fields
(stream_enabled, session, rag, functions) are captured from
RequestContext at Input creation. This snapshot-at-creation
pattern means the Input is independent of later context changes.

6. **The %% syntax for last-reply carry-over** is detected in
resolve_paths (pure function) but the actual last_reply
retrieval happens in from_files (async). Tested the detection
part.

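The classification step described in observations 3 and 6 can be sketched as a pure function. Everything about the shape below is an assumption for illustration: the `ResolvedPaths` struct, the field names, the `http://`/`https://` prefix check, and the error text. The loader category mentioned in the notes is left out because its syntax is not described here.

```rust
/// Hypothetical result of classifying raw path arguments.
#[derive(Debug, Default, PartialEq)]
pub struct ResolvedPaths {
    pub local_paths: Vec<String>,
    pub remote_urls: Vec<String>,
    pub external_cmds: Vec<String>,
    pub with_last_reply: bool,
}

/// Classifies each argument without touching the filesystem or network.
pub fn resolve_paths(paths: &[&str]) -> Result<ResolvedPaths, String> {
    let mut out = ResolvedPaths::default();
    for &p in paths {
        if p == "%%" {
            // Marker for "carry over the last reply".
            out.with_last_reply = true;
        } else if p.len() > 2 && p.starts_with('`') && p.ends_with('`') {
            // Backtick-wrapped text is an external command; strip the ticks.
            out.external_cmds.push(p[1..p.len() - 1].to_string());
        } else if p.starts_with("http://") || p.starts_with("https://") {
            if p.ends_with("**") {
                return Err(format!("glob suffix is not supported on URLs: {}", p));
            }
            out.remote_urls.push(p.to_string());
        } else {
            out.local_paths.push(p.to_string());
        }
    }
    Ok(out)
}
```

Because the function only inspects strings, the mixed-input case can be tested by passing all four kinds of argument at once and checking each bucket.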
## Next iteration

Plan file 08: Request Context — RequestContext methods, scope
transitions, state management.
@@ -0,0 +1,69 @@
# Iteration 8 — Test Implementation Notes

## Plan file addressed

`docs/testing/plans/08-request-context.md`

## Tests created
|
||||
|
||||
### src/config/request_context.rs (22 new tests, 51 total in file)
|
||||
|
||||
| Test name | What it verifies |
|---|---|
| `state_empty_context` | Empty context → empty StateFlags |
| `state_with_role_only` | Role set → ROLE flag |
| `state_with_empty_session` | Empty session → SESSION_EMPTY flag |
| `state_flags_combine_role_and_session` | Multiple flags combine correctly |
| `role_info_errors_when_no_role` | No role → error |
| `role_info_succeeds_with_role` | Role present → exports prompt |
| `agent_info_errors_when_no_agent` | No agent → error |
| `rag_info_errors_when_no_rag` | No RAG → error |
| `use_role_obj_errors_when_agent_active` | Agent blocks role assignment |
| `exit_rag_clears_rag` | exit_rag() sets rag to None |
| `discontinuous_last_message_sets_continuous_false` | Marks last message non-continuous |
| `discontinuous_last_message_noop_when_none` | No last message → no-op |
| `before_chat_completion_sets_last_message` | Creates LastMessage with empty output |
| `role_like_mut_returns_none_when_empty` | No active scope → None |
| `role_like_mut_returns_role_when_only_role` | Role only → returns role |
| `role_like_mut_prefers_session_over_role` | Session takes priority |
| `working_mode_cmd` | CMD mode flags correct |
| `working_mode_repl` | REPL mode flags correct |
| `session_file_returns_yaml_path` | Correct .yaml suffix |
| `session_file_with_subdir` | subdir/name → nested path |
| `is_compressing_session_false_when_no_session` | No session → false |
| `is_compressing_session_false_with_default_session` | Default session → false |

**Total: 22 new tests (272 total in suite)**

## Bugs discovered

None.

## Observations for future iterations

1. **Rag struct has no Default**: Rag requires an AppConfig, name,
   embedding model, and HNSW index. Can't create test instances
   without heavy setup. RAG-related state tests (state with RAG,
   exit_rag with actual RAG) deferred.

2. **role_like_mut priority is session > agent > role > None**:
   The session-over-role priority is verified. Agent priority
   can't be easily tested without agent init (filesystem).

3. **StateFlags is a bitflags type**: Tested empty, individual
   flags (ROLE, SESSION_EMPTY), and combinations. The SESSION
   flag (non-empty session) requires adding messages to a session
   which needs more setup — deferred.

4. **info() and sysinfo() require model provider config**: These
   format system info strings that include model details. Testing
   requires a valid model provider configuration.

5. **The RequestContext test file now has 51 tests** spanning
   iterations 1, 4, 5, 7, and 8. It's the most heavily tested
   module, which matches its role as the central state container.

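
The priority order in observation 2 (session > agent > role > None) can be sketched as follows. The trait and structs here are simplified, hypothetical stand-ins: the real `RoleLike` trait has more methods, and the real scopes carry far more state. The sketch only shows the lookup order.

```rust
/// Simplified stand-in for the `RoleLike` trait (only one setter is modelled).
trait RoleLike {
    fn set_temperature(&mut self, value: Option<f64>);
}

#[derive(Default)]
struct Role { temperature: Option<f64> }
#[derive(Default)]
struct Session { temperature: Option<f64> }
#[derive(Default)]
struct Agent { temperature: Option<f64> }

impl RoleLike for Role {
    fn set_temperature(&mut self, value: Option<f64>) { self.temperature = value; }
}
impl RoleLike for Session {
    fn set_temperature(&mut self, value: Option<f64>) { self.temperature = value; }
}
impl RoleLike for Agent {
    fn set_temperature(&mut self, value: Option<f64>) { self.temperature = value; }
}

/// Hypothetical context holding at most one of each scope.
#[derive(Default)]
struct RequestContext {
    role: Option<Role>,
    session: Option<Session>,
    agent: Option<Agent>,
}

impl RequestContext {
    /// Session wins over agent, agent wins over role; `None` when no scope is active.
    fn role_like_mut(&mut self) -> Option<&mut dyn RoleLike> {
        if let Some(session) = self.session.as_mut() {
            return Some(session as &mut dyn RoleLike);
        }
        if let Some(agent) = self.agent.as_mut() {
            return Some(agent as &mut dyn RoleLike);
        }
        self.role.as_mut().map(|role| role as &mut dyn RoleLike)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn agent_takes_priority_over_role() {
        let mut ctx = RequestContext::default();
        ctx.role = Some(Role::default());
        ctx.agent = Some(Agent::default());
        ctx.role_like_mut().unwrap().set_temperature(Some(0.5));
        assert_eq!(ctx.agent.as_ref().unwrap().temperature, Some(0.5));
        assert_eq!(ctx.role.as_ref().unwrap().temperature, None);
    }
}
```

With plain structs like these, the agent branch can be exercised without filesystem access, which is the part the notes list as hard to reach in the real code.
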
## Next iteration

Plan file 09: REPL Commands — REPL command handlers, state
assertions, argument parsing.

@@ -0,0 +1,90 @@
# Iteration 9 — Test Implementation Notes

## Plan file addressed

`docs/testing/plans/09-repl-commands.md`

## Tests created

### src/config/mod.rs (8 new tests)

| Test name | What it verifies |
|---|---|
| `assert_state_pass_always_true` | pass() true for all flag combos |
| `assert_state_bare_only_empty` | bare() only matches empty |
| `assert_state_true_requires_flag_present` | True requires any match |
| `assert_state_true_with_multiple_flags_any_match` | OR semantics for True flags |
| `assert_state_false_requires_flag_absent` | False requires all absent |
| `assert_state_false_with_multiple_flags` | Multiple False flags all checked |
| `assert_state_truefalse_requires_true_present_and_false_absent` | Both conditions |
| `assert_state_equal_exact_match` | Exact flag equality |

### src/repl/mod.rs (31 new tests, 33 total in file)

| Test name | What it verifies |
|---|---|
| `repl_commands_has_39_entries` | Array size |
| `repl_commands_all_start_with_dot` | All commands dotted |
| `repl_commands_no_empty_descriptions` | All have descriptions |
| `repl_commands_help_is_always_available` | .help → pass |
| `repl_commands_exit_is_always_available` | .exit → pass |
| `repl_commands_info_role_requires_role` | .info role → True(ROLE) |
| `repl_commands_session_blocked_when_already_in_session` | .session → False(SESSION) |
| `repl_commands_exit_session_requires_session` | .exit session → True(SESSION) |
| `repl_commands_exit_agent_requires_agent` | .exit agent → True(AGENT) |
| `repl_commands_agent_only_when_bare` | .agent → Equal(empty) |
| `repl_commands_role_blocked_in_session_or_agent` | .role → False(SESSION\|AGENT) |
| `repl_commands_prompt_blocked_in_session_or_agent` | .prompt → False(SESSION\|AGENT) |
| `repl_commands_rag_blocked_in_agent` | .rag → False(AGENT) |
| `repl_commands_starter_requires_agent` | .starter → True(AGENT) |
| `repl_commands_clear_todo_requires_agent` | .clear todo → True(AGENT) |
| `repl_commands_edit_role_requires_role_not_session` | .edit role → TrueFalse |
| `repl_commands_exit_rag_requires_rag_not_agent` | .exit rag → TrueFalse |
| `parse_command_plain_text_returns_none` | Plain text → None |
| `parse_command_empty_returns_none` | Empty → None |
| `parse_command_whitespace_only_returns_none` | Whitespace → None |
| `parse_command_dot_only` | Single dot → (".", None) |
| `split_first_arg_none_input` | None → None |
| `split_first_arg_single_word` | "role" → ("role", None) |
| `split_first_arg_two_words` | "role x" → ("role", Some("x")) |
| `split_first_arg_with_extra_spaces` | Extra spaces trimmed |
| `repl_command_is_valid_pass_always_true` | pass → always valid |
| `repl_command_is_valid_respects_true` | True → enforced |
| `repl_command_is_valid_respects_false` | False → enforced |
| `multiline_regex_captures_content_between_markers` | :::content::: captured |
| `multiline_regex_does_not_match_single_marker` | Unclosed → no match |
| `multiline_regex_does_not_match_plain_text` | Plain text → no match |

**Total: 39 new tests (311 total in suite)**

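
The `split_first_arg_*` rows in the table above pin down a small contract. A minimal sketch of a function satisfying those four rows is shown here; the signature is an assumption (the real helper in `src/repl/mod.rs` may differ), and returning `None` for empty or whitespace-only input is an additional assumption not stated in the table.

```rust
/// Hypothetical signature: split "head rest" into the first word and the remainder.
/// Assumption: empty or whitespace-only input is treated like `None`.
fn split_first_arg(args: Option<&str>) -> Option<(&str, Option<&str>)> {
    let args = args?.trim();
    if args.is_empty() {
        return None;
    }
    match args.split_once(char::is_whitespace) {
        Some((head, rest)) => {
            // Collapse any run of separating whitespace before the remainder.
            let rest = rest.trim();
            Some((head, if rest.is_empty() { None } else { Some(rest) }))
        }
        None => Some((args, None)),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn split_first_arg_matches_table_rows() {
        assert_eq!(split_first_arg(None), None);
        assert_eq!(split_first_arg(Some("role")), Some(("role", None)));
        assert_eq!(split_first_arg(Some("role x")), Some(("role", Some("x"))));
        assert_eq!(split_first_arg(Some("  role   x  ")), Some(("role", Some("x"))));
    }
}
```
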
## Bugs discovered

None.

## Observations for future iterations

1. **AssertState has 4 variants with distinct semantics**:
   - True: any of the required flags must be present (OR)
   - False: all of the forbidden flags must be absent (AND)
   - TrueFalse: True AND False simultaneously
   - Equal: exact flag match
   This is a critical invariant for REPL command availability.

2. **The .agent command uses AssertState::bare()** (Equal(empty)),
   meaning it's only available when NO other scope is active. This
   is stricter than False — it requires exactly empty state.

3. **All 39 REPL commands** have correct dot prefixes and non-empty
   descriptions. Verified as structural invariants.

4. **The multiline ::: syntax** is handled by a regex that requires
   both opening and closing markers. The ReplValidator marks
   single-marker input as Incomplete for the line editor.

5. **Command handler tests** (the actual .role, .session, .agent
   implementations) require full async RequestContext with
   filesystem access. These are integration tests and are deferred.

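
The four `AssertState` semantics from observation 1 can be written down as a small sketch. The notes say `StateFlags` is a bitflags type; to stay dependency-free this sketch uses a hand-rolled `u8` wrapper with made-up bit values. The method name `is_satisfied_by` and the choice to model `pass()` as `False(empty)` are assumptions; `bare()` as `Equal(empty)` comes from observation 2.

```rust
/// Illustrative flag set; bit values are arbitrary, not the real `StateFlags` layout.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct StateFlags(u8);

impl StateFlags {
    const EMPTY: Self = Self(0);
    const ROLE: Self = Self(1 << 0);
    const SESSION: Self = Self(1 << 1);
    const AGENT: Self = Self(1 << 2);
    const RAG: Self = Self(1 << 3);

    fn union(self, other: Self) -> Self { Self(self.0 | other.0) }
    fn intersects(self, other: Self) -> bool { self.0 & other.0 != 0 }
}

#[derive(Clone, Copy, Debug)]
enum AssertState {
    True(StateFlags),
    False(StateFlags),
    TrueFalse(StateFlags, StateFlags),
    Equal(StateFlags),
}

impl AssertState {
    /// Assumption: "always available" is modelled as "nothing is forbidden".
    fn pass() -> Self { AssertState::False(StateFlags::EMPTY) }
    /// Only the exactly-empty state matches.
    fn bare() -> Self { AssertState::Equal(StateFlags::EMPTY) }

    fn is_satisfied_by(self, state: StateFlags) -> bool {
        match self {
            // OR: any required flag present.
            AssertState::True(required) => state.intersects(required),
            // AND: every forbidden flag absent.
            AssertState::False(forbidden) => !state.intersects(forbidden),
            AssertState::TrueFalse(required, forbidden) => {
                state.intersects(required) && !state.intersects(forbidden)
            }
            AssertState::Equal(expected) => state == expected,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn bare_is_stricter_than_false() {
        let in_rag = StateFlags::RAG;
        assert!(AssertState::False(StateFlags::AGENT).is_satisfied_by(in_rag));
        assert!(!AssertState::bare().is_satisfied_by(in_rag));
    }
}
```

The test shows why `.agent` with `bare()` is stricter than a `False(...)` guard: a state with only RAG active passes `False(AGENT)` but fails `Equal(empty)`.
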
## Next iteration

Check the TEST-IMPLEMENTATION-PLAN.md for what plan file comes next.

@@ -0,0 +1,62 @@
# Test Plan: Config Loading and AppConfig

## Feature description

Loki loads its configuration from a YAML file (`config.yaml`) into
a `Config` struct, then converts it to `AppConfig` (immutable,
shared) + `RequestContext` (mutable, per-request). The `AppConfig`
holds all serialized fields; `RequestContext` holds runtime state.

## Behaviors to test

### Config loading
- [ ] Config loads from YAML file with all supported fields
- [x] Missing optional fields get correct defaults (config_defaults_match_expected)
- [ ] `model_id` defaults to first available model if empty (requires Config::init, integration test)
- [x] `temperature`, `top_p` default to `None`
- [x] `stream` defaults to `true`
- [x] `save` defaults to `false` (CORRECTED: was listed as true)
- [x] `highlight` defaults to `true`
- [x] `dry_run` defaults to `false`
- [x] `function_calling_support` defaults to `true`
- [x] `mcp_server_support` defaults to `true`
- [x] `compression_threshold` defaults to `4000`
- [ ] `document_loaders` populated from config and defaults (requires Config::init)
- [x] `clients` parsed from config (to_app_config_copies_clients)

### AppConfig conversion
- [x] `to_app_config()` copies all serialized fields correctly
- [x] `clients` field populated on AppConfig
- [ ] `visible_tools` correctly computed from `enabled_tools` config (deferred to plan 16)
- [x] `mapping_tools` correctly parsed
- [x] `mapping_mcp_servers` correctly parsed
- [ ] `user_agent` resolved (auto → crate name/version)

### RequestContext conversion
- [x] `to_request_context()` copies all runtime fields (to_request_context_creates_clean_state)
- [ ] `model` field populated with resolved model (requires Model::retrieve_model)
- [ ] `working_mode` set correctly (Repl vs Cmd)
- [x] `tool_scope` starts with default (empty)
- [x] `agent_runtime` starts as `None`

### AppConfig field accessors
- [x] `editor()` returns configured editor or $EDITOR
- [x] `light_theme()` returns theme flag
- [ ] `render_options()` returns options for markdown rendering
- [x] `sync_models_url()` returns configured or default URL

### Dynamic config updates
- [x] `update_app_config` closure correctly clones and replaces Arc
- [x] Changes to `dry_run`, `stream`, `save` persist across calls
- [x] Changes visible to subsequent `ctx.app.config` reads

## Context switching scenarios
- [ ] AppConfig remains immutable after construction (no field mutation)
- [ ] Multiple RequestContexts can share the same AppState
- [ ] Changing AppConfig fields (via clone-mutate-replace) doesn't
  affect other references to the old Arc

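
The clone-mutate-replace behavior listed under "Dynamic config updates" and the Arc-isolation scenario above can be sketched like this. The struct shapes and the `update_app_config` signature are assumptions made for illustration; only the names `AppConfig`, `AppState`, `RequestContext`, `ctx.app.config`, and the three fields come from the plan.

```rust
use std::sync::Arc;

/// Hypothetical subset of `AppConfig`, using the defaults listed in the plan.
#[derive(Clone, Debug, PartialEq)]
struct AppConfig {
    dry_run: bool,
    stream: bool,
    save: bool,
}

/// Hypothetical shared state holding the immutable config behind an `Arc`.
struct AppState {
    config: Arc<AppConfig>,
}

struct RequestContext {
    app: Arc<AppState>,
}

impl RequestContext {
    /// Clone the current config, let the closure mutate the clone, then swap
    /// in a new `Arc`. Holders of the old `Arc` keep seeing the old values.
    fn update_app_config(&mut self, update: impl FnOnce(&mut AppConfig)) {
        let mut config = (*self.app.config).clone();
        update(&mut config);
        self.app = Arc::new(AppState { config: Arc::new(config) });
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn update_does_not_touch_old_arc() {
        let config = Arc::new(AppConfig { dry_run: false, stream: true, save: false });
        let mut ctx = RequestContext {
            app: Arc::new(AppState { config: Arc::clone(&config) }),
        };
        ctx.update_app_config(|c| c.dry_run = true);
        assert!(ctx.app.config.dry_run);
        assert!(!config.dry_run);
    }
}
```

Because nothing is mutated in place, a second `RequestContext` that still holds the original `AppState` is unaffected, which is what the last context-switching item asks a test to confirm.
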
## Old code reference
- `src/config/mod.rs` — `Config` struct, `Config::init`, defaults
- `src/config/bridge.rs` — `to_app_config`, `to_request_context`
- `src/config/app_config.rs` — `AppConfig` struct and methods

@@ -0,0 +1,68 @@
# Test Plan: Roles

## Feature description

Roles define a system prompt + optional model/temperature/MCP config
that customizes LLM behavior. Roles can be built-in or user-defined
(markdown files). Roles are "role-likes" — sessions and agents also
implement the RoleLike trait.

## Behaviors to test

### Role loading
- [x] Built-in roles load correctly (shell, code)
- [ ] User-defined roles load from markdown files (requires filesystem)
- [x] Role parses model_id from metadata
- [x] Role parses temperature, top_p from metadata
- [x] Role parses enabled_tools from metadata
- [x] Role parses enabled_mcp_servers from metadata
- [ ] Role with no model_id inherits current model (requires retrieve_role + client config)
- [ ] Role with no temperature inherits from AppConfig (requires retrieve_role)
- [ ] Role with no top_p inherits from AppConfig (requires retrieve_role)

### retrieve_role
- [ ] Retrieves by name from file system
- [ ] Resolves model via Model::retrieve_model
- [ ] Falls back to current model if role has no model_id
- [ ] Sets temperature/top_p from AppConfig when role doesn't specify

### use_role (scope transition)
- [x] Sets role on RequestContext (use_role_obj_sets_role)
- [ ] Triggers rebuild_tool_scope (async, deferred to plan 05/08)
- [ ] MCP servers start if role has enabled_mcp_servers (deferred to plan 05)
- [ ] MCP meta functions added to function list (deferred to plan 05)
- [ ] Previous role cleared when switching (deferred to plan 08)
- [x] Role-like temperature/top_p take effect (role_set_temperature_works)

### exit_role
- [x] Clears role from RequestContext (exit_role_clears_role)
- [ ] Followed by bootstrap_tools to restore global tool scope (async, deferred)
- [ ] MCP servers from role are stopped (deferred to plan 05)
- [ ] Global MCP servers restored (deferred to plan 05)

### use_prompt (temp role)
- [x] Creates a TEMP_ROLE_NAME role with the prompt text (use_prompt_creates_temp_role)
- [x] Uses current model
- [x] Activates via use_role_obj

### extract_role
- [ ] Returns role from agent if agent active (deferred to plan 04)
- [ ] Returns role from session if session active with role (deferred to plan 03)
- [x] Returns standalone role if active (extract_role_returns_standalone_role)
- [x] Returns default role if none active (extract_role_returns_default_when_nothing_active)

### One-shot role messages (REPL)
- [ ] `.role coder write hello` sends message with role, then exits role
- [ ] Original state restored after one-shot

## Context switching scenarios
- [ ] Role → different role: old role replaced, MCP swapped
- [ ] Role → session: role cleared, session takes over
- [ ] Role with MCP → exit: MCP servers stop, global MCP restored
- [ ] No MCP → role with MCP: servers start
- [ ] Role with MCP → role without MCP: servers stop

## Old code reference
- `src/config/mod.rs` — `use_role`, `exit_role`, `retrieve_role`
- `src/config/role.rs` — `Role` struct, parsing
- `src/config/request_context.rs` — `use_role`, `exit_role`, `use_prompt`, `retrieve_role`

@@ -0,0 +1,66 @@
# Test Plan: Sessions

## Feature description

Sessions persist conversation history across multiple turns. They
store messages, role context, model info, and optional MCP config.
Sessions can be temporary, named, or auto-named.

## Behaviors to test

### Session creation
- [ ] Temp session created with TEMP_SESSION_NAME
- [ ] Named session created at correct file path
- [ ] New session captures current role via extract_role
- [ ] New session captures save_session from AppConfig
- [ ] Session tracks model_id

### Session loading
- [ ] Named session loads from YAML file
- [ ] Loaded session resolves model via Model::retrieve_model
- [ ] Loaded session restores role_prompt if role exists
- [ ] Auto-named sessions (prefixed `_/`) handled correctly

### Session saving
- [ ] Session saved to correct path
- [ ] Session file contains messages, model_id, role info
- [ ] save_session flag controls whether session is persisted
- [ ] set_save_session_this_time overrides for current turn

### Session lifecycle
- [ ] use_session creates or loads session
- [ ] Already in session → error
- [ ] exit_session saves and clears
- [ ] empty_session clears messages but keeps session active

### Session carry-over
- [ ] New empty session with last_message prompts "incorporate?"
- [ ] If accepted, last Q&A added to session
- [ ] If declined, session starts fresh
- [ ] Only prompts when continuous and output not empty

### Session compression
- [ ] maybe_compress_session returns true when threshold exceeded
- [ ] compress_session reduces message count
- [ ] Compression message shown to user
- [ ] Session usable after compression

### Session autoname
- [ ] maybe_autoname_session returns true for new sessions
- [ ] Auto-naming sets session name based on content
- [ ] Autoname only triggers once per session

### Session info
- [ ] session_info returns formatted session details
- [ ] Shows message count, model, role, tokens

## Context switching scenarios
- [ ] Session → role change: role updated within session
- [ ] Session → exit session: messages saved, state cleared
- [ ] Agent session → exit: agent session cleanup
- [ ] Session with MCP → exit: MCP servers handled

## Old code reference
- `src/config/mod.rs` — `use_session`, `exit_session`, `empty_session`
- `src/config/session.rs` — `Session` struct, new, load, save
- `src/config/request_context.rs` — `use_session`, `exit_session`

@@ -0,0 +1,77 @@
# Test Plan: Agents

## Feature description

Agents combine a role (instructions), tools (bash/python/ts scripts),
optional RAG, optional MCP servers, and optional sub-agent spawning
capability. Agent::init compiles tools, resolves model, loads RAG,
and sets up the agent environment.

## Behaviors to test

### Agent initialization
- [ ] Agent::init loads config.yaml from agent directory
- [ ] Agent tools compiled from tools.sh / tools.py / tools.ts
- [ ] Tool file priority: .sh > .py > .ts > .js
- [ ] Global tools loaded (from global_tools config)
- [ ] Model resolved from agent config or defaults to current
- [ ] Agent with no model_id uses current model
- [ ] Temperature/top_p from agent config applied
- [ ] Dynamic instructions (_instructions function) invoked if configured
- [ ] Static instructions loaded from config
- [ ] Agent variables interpolated into instructions
- [ ] Special variables (__os__, __cwd__, __now__, etc.) interpolated
- [ ] Agent .env file loaded if present
- [ ] Built-in agents installed on first run (skip if exists)

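
The "Tool file priority: .sh > .py > .ts > .js" item above could be covered without touching the filesystem if the selection logic takes an existence check as a parameter. The sketch below is one way to do that; the function name, the `tools.<ext>` naming for `.js`, and the injected predicate are assumptions for illustration, not the existing `Agent::init` code.

```rust
/// Extensions in priority order, as listed in the plan.
const TOOL_EXTENSIONS: [&str; 4] = ["sh", "py", "ts", "js"];

/// Return the highest-priority tools file for which `exists` reports true.
/// `exists` is injected so a unit test can fake the agent directory.
fn pick_tools_file(exists: impl Fn(&str) -> bool) -> Option<String> {
    TOOL_EXTENSIONS
        .iter()
        .map(|ext| format!("tools.{ext}"))
        .find(|name| exists(name.as_str()))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn shell_wins_when_several_files_exist() {
        let picked = pick_tools_file(|name| name == "tools.sh" || name == "tools.py");
        assert_eq!(picked, Some("tools.sh".to_string()));
    }
}
```

In production code the predicate would wrap a path lookup in the agent directory; in tests it is a closure over a fixed set of names.
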
### Agent tools
- [ ] Agent-specific tools available as function declarations
- [ ] Global tools (from global_tools) also available
- [ ] Tool binaries built in agent bin directory
- [ ] clear_agent_bin_dir removes old binaries before rebuild
- [ ] Tool declarations include name, description, parameters

### Agent with MCP
- [ ] MCP servers listed in agent config started
- [ ] MCP meta functions (invoke/search/describe) added
- [ ] Agent with MCP but mcp_server_support=false → error
- [ ] MCP servers stopped on agent exit

### Agent with RAG
- [ ] RAG documents loaded from agent config
- [ ] RAG available during agent conversation
- [ ] RAG search results included in context

### Agent sessions
- [ ] Agent session started (temp or named)
- [ ] agent_session config used if no explicit session
- [ ] Agent session variables initialized

### Agent lifecycle
- [ ] use_agent checks function_calling_support
- [ ] use_agent errors if agent already active
- [ ] exit_agent clears agent, session, rag, supervisor
- [ ] exit_agent restores global tool scope

### Auto-continuation
- [ ] Agents with auto_continue=true continue after incomplete todos
- [ ] max_auto_continues limits continuation attempts
- [ ] Continuation prompt sent with todo state
- [ ] clear todo stops continuation

### Conversation starters
- [ ] Starters loaded from agent config
- [ ] .starter lists available starters
- [ ] .starter <n> sends the starter as a message

## Context switching scenarios
- [ ] Agent → exit: tools cleared, MCP stopped, session ended
- [ ] Agent with MCP → exit: MCP servers released, global MCP restored
- [ ] Already in agent → start agent: error
- [ ] Agent with RAG → exit: RAG cleared

## Old code reference
- `src/config/agent.rs` — Agent::init, agent config parsing
- `src/config/mod.rs` — use_agent, exit_agent
- `src/config/request_context.rs` — use_agent, exit_agent
- `src/function/mod.rs` — Functions::init_agent, tool compilation

@@ -0,0 +1,118 @@
# Test Plan: MCP Server Lifecycle

## Feature description

MCP (Model Context Protocol) servers are external tools that run
as subprocesses communicating via stdio. Loki manages their lifecycle
through McpFactory (start/share via Weak dedup) and McpRuntime
(per-scope active server handles). Servers are started/stopped
during scope transitions (role/session/agent enter/exit).

## Behaviors to test

### MCP config loading
- [x] mcp.json parsed correctly from functions directory
- [x] Server specs include command, args, env, cwd
- [ ] Vault secrets interpolated in mcp.json
- [ ] Missing secrets reported as warnings
- [x] McpServersConfig stored on AppState.mcp_config

### McpFactory
- [ ] acquire() spawns new server when none active (requires real subprocess)
- [ ] acquire() returns existing handle via Weak upgrade (requires real subprocess)
- [ ] acquire() spawns fresh when Weak is dead (requires real subprocess)
- [ ] Multiple acquire() calls for same spec share handle (requires real subprocess)
- [x] Different specs get different handles (via key inequality)
- [x] McpServerKey built correctly from spec (sorted args/env)

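
The Weak-based sharing described for `McpFactory::acquire()` can be sketched with standard-library types only. Everything below is a simplified model: `ConnectedServer` is a dummy struct instead of a live subprocess handle, `acquire` is synchronous and takes `&mut self`, and the `spawned` counter exists only so the behavior is observable. The sorted args/env in the key follow the plan item above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Hypothetical key: args and env are sorted so equal specs compare equal.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct McpServerKey {
    name: String,
    command: String,
    args: Vec<String>,
    env: Vec<(String, String)>,
}

impl McpServerKey {
    fn new(name: &str, command: &str, args: &[&str], env: &[(&str, &str)]) -> Self {
        let mut args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
        args.sort();
        let mut env: Vec<(String, String)> =
            env.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect();
        env.sort();
        Self { name: name.to_string(), command: command.to_string(), args, env }
    }
}

/// Dummy stand-in for a connected server handle.
#[derive(Debug)]
struct ConnectedServer {
    id: usize,
}

#[derive(Default)]
struct McpFactory {
    active: HashMap<McpServerKey, Weak<ConnectedServer>>,
    spawned: usize,
}

impl McpFactory {
    /// Reuse a live handle if the `Weak` still upgrades; otherwise "spawn" anew.
    fn acquire(&mut self, key: &McpServerKey) -> Arc<ConnectedServer> {
        if let Some(live) = self.active.get(key).and_then(Weak::upgrade) {
            return live;
        }
        self.spawned += 1;
        let server = Arc::new(ConnectedServer { id: self.spawned });
        self.active.insert(key.clone(), Arc::downgrade(&server));
        server
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn same_key_shares_handle_until_all_holders_drop() {
        let mut factory = McpFactory::default();
        let key = McpServerKey::new("demo", "demo-server", &["--stdio"], &[]);
        let first = factory.acquire(&key);
        let second = factory.acquire(&key);
        assert!(Arc::ptr_eq(&first, &second));
        drop((first, second));
        assert_eq!(factory.acquire(&key).id, 2);
    }
}
```

A fake like this lets the three "requires real subprocess" dedup cases be checked at the data-structure level, though it does not replace integration tests against a real server.
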
### McpRuntime
- [ ] insert() adds server handle by name (requires Arc<ConnectedServer>)
- [ ] get() retrieves handle by name (requires Arc<ConnectedServer>)
- [x] server_names() returns all active names
- [x] is_empty() correct for empty/non-empty
- [ ] search() finds tools by keyword (BM25 ranking) (requires live server)
- [ ] describe() returns tool input schema (requires live server)
- [ ] invoke() calls tool on server and returns result (requires live server)

### spawn_mcp_server
- [ ] Builds Command from spec (command, args, env, cwd) (integration test)
- [ ] Creates TokioChildProcess transport (integration test)
- [ ] Completes rmcp handshake (serve) (integration test)
- [ ] Returns Arc<ConnectedServer> (integration test)
- [ ] Log file created when log_path provided (integration test)

### rebuild_tool_scope (MCP integration)
- [x] Empty enabled_mcp_servers → no servers acquired
- [ ] "all" → all configured servers acquired (requires real subprocess)
- [ ] Comma-separated list → only listed servers acquired (requires real subprocess)
- [ ] Mapping resolution: alias → actual server key(s) (requires real subprocess)
- [ ] MCP meta functions appended for each started server (requires real subprocess)
- [ ] Old ToolScope dropped (releasing old server handles) (requires real subprocess)
- [ ] Loading spinner shown during acquisition (UI test)
- [ ] AbortSignal properly threaded through (integration test)

### Server lifecycle during scope transitions
- [ ] Enter role with MCP: servers start (integration test)
- [ ] Exit role: servers stop (handle dropped) (integration test)
- [ ] Enter role A (MCP-X) → exit → enter role B (MCP-Y):
  X stops, Y starts (integration test)
- [ ] Enter role with MCP → exit to no MCP: servers stop,
  global MCP restored (integration test)
- [ ] Start REPL with global MCP → enter agent with different MCP:
  agent MCP takes over (integration test)
- [ ] Exit agent: agent MCP stops, global MCP restored (integration test)

### MCP tool invocation chain
- [ ] LLM calls mcp__search_<server> → search results returned (integration test)
- [ ] LLM calls mcp__describe_<server> tool_name → schema returned (integration test)
- [ ] LLM calls mcp__invoke_<server> tool args → tool executed (integration test)
- [ ] Server not found → "MCP server not found in runtime" error (tested via McpRuntime.get)
- [ ] Tool not found → appropriate error (requires live server)

### MCP support flag
- [x] mcp_server_support=false → no MCP servers started
- [ ] mcp_server_support=false + agent with MCP → error (blocks) (requires agent init)
- [ ] mcp_server_support=false + role with MCP → warning, continues (requires role init)
- [ ] .set mcp_server_support true → MCP servers start (requires live server)

### MCP in child agents
- [ ] Child agent MCP servers acquired via factory (integration test)
- [ ] Child agent MCP runtime populated (integration test)
- [ ] Child agent MCP tool invocations work (integration test)
- [ ] Child agent exit drops MCP handles (integration test)

## Context switching scenarios (comprehensive)
- [ ] No MCP → role with MCP → exit role → no MCP (integration test)
- [ ] Global MCP-A → role MCP-B → exit role → global MCP-A (integration test)
- [ ] Global MCP-A → agent MCP-B → exit agent → global MCP-A (integration test)
- [ ] Role MCP-A → session MCP-B (overrides) → exit session (integration test)
- [ ] Agent MCP → child agent MCP → child exits → parent MCP intact (integration test)
- [ ] .set enabled_mcp_servers X → .set enabled_mcp_servers Y:
  X released, Y acquired (integration test)
- [ ] .set enabled_mcp_servers null → all released (integration test)

## Additional behaviors tested (not in original plan)

- [x] McpServerKey equality: same spec → equal keys
- [x] McpServerKey inequality: different names → different keys
- [x] McpServerKey inequality: different commands → different keys
- [x] McpServerKey env coercion: Bool/Int → String
- [x] McpFactory default has empty active map
- [x] McpServer::is_remote() true for Http/Sse, false for Stdio
- [x] McpServer::validate() all cross-field conflicts (6 cases)
- [x] McpServersConfig: empty servers map, multiple servers, cwd field
- [x] McpRegistry: default state, config accessor
- [x] McpRegistry: resolve with whitespace trimming
- [x] McpRegistry: resolve all-nonexistent returns empty
- [x] rebuild_tool_scope: no mcp_config yields empty runtime
- [x] rebuild_tool_scope: preserves tool_tracker across rebuild
- [x] rebuild_tool_scope: REPL mode appends user interaction functions
- [x] rebuild_tool_scope: CMD mode excludes user interaction functions
- [x] MCP meta function name prefix constants are correct
- [x] ToolScope default: empty functions, runtime, tracker

## Old code reference
- `src/mcp/mod.rs` — McpRegistry, init, reinit, start/stop
- `src/config/mcp_factory.rs` — McpFactory, acquire, McpServerKey
- `src/config/tool_scope.rs` — ToolScope, McpRuntime
- `src/config/request_context.rs` — rebuild_tool_scope, bootstrap_tools

@@ -0,0 +1,85 @@
# Test Plan: Tool Evaluation

## Feature description

When the LLM returns tool calls, `eval_tool_calls` dispatches each
call to the appropriate handler. Handlers include: shell tools
(bash/python/ts scripts), MCP tools, supervisor tools (agent spawn),
todo tools, and user interaction tools.

## Behaviors to test

### eval_tool_calls dispatch
- [ ] Calls dispatched to correct handler by function name prefix (requires RequestContext)
- [ ] Tool results returned for each call (requires RequestContext)
- [ ] Multiple concurrent tool calls processed (requires RequestContext)
- [x] Tool call tracker updated (chain length, repeats)
- [ ] Root agent (depth 0) checks escalation queue after eval (requires RequestContext)
- [ ] Escalation notifications injected into results (requires RequestContext)

### ToolCall::eval routing
- [ ] agent__* → handle_supervisor_tool (requires RequestContext)
- [ ] todo__* → handle_todo_tool (requires RequestContext)
- [ ] user__* → handle_user_tool (depth 0) or escalate (depth > 0) (requires RequestContext)
- [ ] mcp_invoke_* → invoke_mcp_tool (requires RequestContext + live MCP)
- [ ] mcp_search_* → search_mcp_tools (requires RequestContext + live MCP)
- [ ] mcp_describe_* → describe_mcp_tool (requires RequestContext + live MCP)
- [ ] Other → shell tool execution (requires RequestContext + binary)

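
The routing list above amounts to a prefix table with a shell fallback. One way to make that decision testable without a `RequestContext` is to separate "which handler" from "run the handler", as in the sketch below. The `Handler` enum and `route` function are hypothetical; the prefix strings are copied from the list above and may not match the real constants exactly.

```rust
/// Hypothetical handler tags; the real code calls handler functions directly.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Handler {
    Supervisor,
    Todo,
    User,
    Escalate,
    McpInvoke,
    McpSearch,
    McpDescribe,
    Shell,
}

/// Prefixes as written in the plan; treat them as illustrative values.
const ROUTES: [(&str, Handler); 6] = [
    ("agent__", Handler::Supervisor),
    ("todo__", Handler::Todo),
    ("user__", Handler::User),
    ("mcp_invoke_", Handler::McpInvoke),
    ("mcp_search_", Handler::McpSearch),
    ("mcp_describe_", Handler::McpDescribe),
];

/// Pick a handler from the function name; anything unmatched is a shell tool.
/// User tools are handled directly at depth 0 and escalated at depth > 0.
fn route(function_name: &str, depth: usize) -> Handler {
    let handler = ROUTES
        .iter()
        .find(|(prefix, _)| function_name.starts_with(*prefix))
        .map(|(_, handler)| *handler)
        .unwrap_or(Handler::Shell);
    match handler {
        Handler::User if depth > 0 => Handler::Escalate,
        other => other,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn user_tools_escalate_below_root() {
        assert_eq!(route("user__ask", 0), Handler::User);
        assert_eq!(route("user__ask", 1), Handler::Escalate);
    }
}
```

With a pure function like this, the seven routing items could be covered by unit tests, leaving only the handler bodies for integration tests.
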
### Shell tool execution
|
||||
- [ ] Tool binary found and executed (integration test)
|
||||
- [ ] Arguments passed correctly (integration test)
|
||||
- [ ] Environment variables set (LLM_OUTPUT, etc.) (integration test)
|
||||
- [ ] Tool output returned as result (integration test)
|
||||
- [ ] Tool failure → error returned as tool result (not panic) (integration test)
|
||||
|
||||
### Tool call tracking
|
||||
- [x] Tracker counts consecutive identical calls
|
||||
- [x] Max repeats triggers warning
|
||||
- [x] Chain length tracked across turns
|
||||
- [x] Tracker state preserved across tool-result loops
|
||||
|
||||
### Function selection
|
||||
- [ ] select_functions filters by role's enabled_tools (requires filesystem)
|
||||
- [x] select_functions includes MCP meta functions for enabled servers
|
||||
- [x] select_functions includes agent functions when agent active (via append tests)
- [ ] "all" enables all functions (requires filesystem)
- [ ] Comma-separated list enables specific functions (requires filesystem)

## Context switching scenarios
- [ ] Tool calls during agent → agent tools available (integration test)
- [ ] Tool calls during role → role tools available (integration test)
- [ ] Tool calls with MCP → MCP invoke/search/describe work (integration test)
- [x] No agent → no agent__/todo__ tools in declarations (via Functions::default)

## Additional behaviors tested (not in original plan)

- [x] ToolCall::new sets name, arguments, id correctly
- [x] ToolCall::default has empty/null fields
- [x] ToolCall::with_thought_signature sets and clears
- [x] ToolCall::dedup keeps last occurrence for duplicate ids
- [x] ToolCall::dedup keeps all calls without ids
- [x] ToolCall::dedup empty input returns empty
- [x] ToolCall::dedup mixed with/without ids
- [x] ToolCallTracker default values (max_repeats=2, chain_len=3)
- [x] ToolCallTracker no loop on fresh tracker
- [x] ToolCallTracker no loop below threshold
- [x] ToolCallTracker different args breaks loop
- [x] ToolCallTracker different names breaks loop
- [x] ToolCallTracker record_call respects capacity
- [x] ToolCallTracker loop message includes call_history
- [x] All 6 prefix constants verified
- [x] Functions::append_todo adds all 5 todo tools
- [x] Functions::append_supervisor adds spawn/check/collect/list/cancel/reply + task queue
- [x] Functions::append_teammate adds send_message/check_inbox
- [x] Functions::append_user_interaction adds ask/confirm/input/checkbox
- [x] Functions::append_mcp_meta creates 3 per server with correct schemas
- [x] Functions::append_mcp_meta empty servers → no declarations
- [x] Functions::find/contains work correctly
- [x] ToolResult::new stores call and output

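
The `ToolCall::dedup` items above describe an order-preserving de-duplication keyed on the call id. The following is a minimal sketch of that behavior, not Loki's actual code: `Call` is a hypothetical stand-in for `ToolCall`, and it assumes the surviving entry keeps the position of its last occurrence.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a tool call; only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
struct Call {
    name: String,
    id: Option<String>,
}

/// Keep the last occurrence of each id; keep every call that has no id.
fn dedup(calls: Vec<Call>) -> Vec<Call> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut kept = Vec::with_capacity(calls.len());
    // Walk backwards so the first time an id is met is its last occurrence.
    for call in calls.into_iter().rev() {
        let is_duplicate = match &call.id {
            // `insert` returns false when the id was already recorded.
            Some(id) => !seen.insert(id.clone()),
            None => false,
        };
        if !is_duplicate {
            kept.push(call);
        }
    }
    // Restore the original relative order of the survivors.
    kept.reverse();
    kept
}
```

Walking the input in reverse keeps the logic to a single pass and one `HashSet`, which is why the "keeps last occurrence" and "keeps all calls without ids" cases fall out of the same loop.
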
## Old code reference
- `src/function/mod.rs` — eval_tool_calls, ToolCall::eval
- `src/function/supervisor.rs` — handle_supervisor_tool
- `src/function/todo.rs` — handle_todo_tool
- `src/function/user_interaction.rs` — handle_user_tool
@@ -0,0 +1,88 @@
# Test Plan: Input Construction

## Feature description

`Input` encapsulates a single chat turn's data: text, files, role,
model, session context, RAG embeddings, and function declarations.
It's constructed at the start of each turn and captures all needed
state from `RequestContext`.

## Behaviors to test

### Input::from_str
- [x] Creates Input from text string
- [x] Captures role via resolve_role
- [x] Captures session from ctx
- [ ] Captures rag from ctx (requires RAG setup)
- [ ] Captures functions via select_functions (tested separately)
- [x] Captures stream_enabled from AppConfig
- [x] app_config field set from ctx.app.config
- [x] Empty text → is_empty() returns true

### Input::from_files
- [ ] Loads file contents (async + filesystem)
- [ ] Supports multiple files (async + filesystem)
- [ ] Supports directories (recursive) (async + filesystem)
- [ ] Supports URLs (fetches content) (async + network)
- [ ] Supports loader syntax (e.g., jina:url) (async + loader)
- [x] Last message carry-over (%% syntax) (via resolve_paths)
- [ ] Combines file content with text (async)
- [ ] document_loaders from AppConfig used (async)

### resolve_role
- [x] Returns provided role if given
- [ ] Extracts role from agent if agent active (requires agent init)
- [x] Extracts role from session if session has role
- [x] Returns default model-based role otherwise
- [x] with_session flag set correctly
- [x] with_agent flag set correctly

### Input methods
- [ ] stream() returns stream_enabled && !model.no_stream() (requires Model with no_stream)
- [ ] create_client() uses app_config to init client (requires client config)
- [ ] prepare_completion_data() uses captured functions (requires Model)
- [ ] build_messages() uses captured session (requires Message setup)
- [ ] echo_messages() uses captured session (requires Message setup)
- [x] set_regenerate(role) refreshes role
- [ ] use_embeddings() searches RAG if present (requires RAG)
- [ ] merge_tool_results() creates continuation input (requires ToolResult)

## Context switching scenarios
- [ ] Input with agent → agent functions selected (requires agent init)
- [x] Input with MCP → MCP meta functions in declarations (via select_functions tests)
- [ ] Input with RAG → embeddings included after use_embeddings (requires RAG)
- [x] Input without session → no session messages in build_messages (via session() test)

## Additional behaviors tested (not in original plan)

- [x] resolve_role: explicit role overrides session flag
- [x] resolve_paths: empty input
- [x] resolve_paths: URL detection (https://)
- [x] resolve_paths: external command detection (backtick syntax)
- [x] resolve_paths: rejects URL with glob suffix
- [x] resolve_paths: mixed inputs (%%, URL, external cmd)
- [x] Input::set_text changes text
- [x] Input::patched_text overrides text()
- [x] Input::clear_patch restores original
- [x] Input::set_continue_output accumulates
- [x] Input::summary truncates long text with ...
- [x] Input::summary preserves short text
- [x] Input::raw() with no files
- [x] Input::render() with no medias
- [x] Input::session() returns None when with_session=false
- [x] Input::session() returns Some when with_session=true
- [x] is_image recognizes png/jpeg/jpg/webp/gif
- [x] is_image rejects non-image extensions
- [x] resolve_data_url returns path for known hash
- [x] resolve_data_url returns original for non-data URL
- [x] select_functions: None when no tools enabled
- [x] select_functions: None when function_calling disabled
- [x] select_functions: "all" returns all non-MCP
- [x] select_functions: comma-separated filters
- [x] select_enabled_mcp_servers: empty when MCP disabled
- [x] select_enabled_mcp_servers: "all" returns all MCP functions
- [x] select_enabled_mcp_servers: comma filters by server name

## Old code reference
- `src/config/input.rs` — Input struct, from_str, from_files
- `src/config/mod.rs` — select_functions, extract_role
@@ -0,0 +1,87 @@
# Test Plan: RequestContext

## Feature description

`RequestContext` is the per-request mutable state container. It holds
the active model, role, session, agent, RAG, tool scope, and agent
runtime. It provides methods for scope transitions, state queries,
and chat completion lifecycle.

## Behaviors to test

### State management
- [ ] info() returns formatted system info (requires model provider config)
- [x] state() returns correct StateFlags combination
- [ ] current_model() returns active model (tested implicitly via extract_role)
- [x] role_info() errors when no role, succeeds with role
- [ ] session_info() format (requires filesystem for sessions)
- [x] rag_info() errors when no rag
- [x] agent_info() errors when no agent
- [ ] sysinfo() returns system details (requires model provider config)
- [x] working_mode correctly distinguishes Repl vs Cmd

### Scope transitions
- [x] use_role changes role (via use_role_obj)
- [ ] use_session creates/loads session, rebuilds tool scope (async + filesystem)
- [x] use_agent initializes agent with all subsystems (via exit_agent test)
- [x] exit_role clears role
- [x] exit_session saves and clears session
- [x] exit_agent clears agent, supervisor, rag, session
- [x] exit_rag clears rag
- [ ] bootstrap_tools rebuilds tool scope with global MCP (async + MCP servers)

### Chat completion lifecycle
- [x] before_chat_completion sets up for API call
- [ ] after_chat_completion saves messages, updates state (async + client)
- [x] discontinuous_last_message marks last message as non-continuous

### ToolScope management
- [x] rebuild_tool_scope creates fresh Functions
- [ ] rebuild_tool_scope acquires MCP servers via factory (requires live MCP)
- [x] rebuild_tool_scope appends user interaction functions in REPL mode
- [ ] rebuild_tool_scope appends MCP meta functions for started servers (requires live MCP)
- [x] Tool tracker preserved across scope rebuilds

### AgentRuntime management
- [x] agent_runtime populated by use_agent (via exit_agent test)
- [x] agent_runtime cleared by exit_agent
- [x] Accessor methods (current_depth, supervisor, inbox, etc.) return
  correct values when agent active
- [x] Accessor methods return defaults when no agent

### Settings update
- [ ] update() handles all .set keys correctly (requires REPL command infra)
- [x] update_app_config() clones and replaces Arc properly
- [ ] delete() handles all delete subcommands (requires REPL command infra)

### Session helpers
- [ ] list_sessions() returns session names (requires filesystem)
- [ ] list_autoname_sessions() returns auto-named sessions (requires filesystem)
- [x] session_file() returns correct path
- [ ] save_session() persists session (requires filesystem)
- [x] empty_session() clears messages

## Context switching scenarios
- [x] No state → use_role → exit_role → no state
- [x] No state → use_agent → exit_agent → no state
- [x] Agent active → use_role_obj errors
- [ ] Agent → exit_agent → use_role (clean transition) (async)

## Additional behaviors tested (not in original plan)

- [x] state() empty context returns empty flags
- [x] state() role only → ROLE flag
- [x] state() empty session → SESSION_EMPTY flag
- [x] state() role + session flags combine
- [x] discontinuous_last_message noop when no last_message
- [x] before_chat_completion creates LastMessage with empty output and continuous=true
- [x] role_like_mut returns None when no active scope
- [x] role_like_mut returns role when only role active
- [x] role_like_mut prefers session over role
- [x] session_file handles subdir/name format
- [x] is_compressing_session false with no session
- [x] is_compressing_session false with default session

## Old code reference
- `src/config/request_context.rs` — all methods
- `src/config/mod.rs` — original Config methods (for parity)
@@ -0,0 +1,92 @@
# Test Plan: REPL Commands

## Feature description

The REPL processes dot-commands (`.role`, `.session`, `.agent`, etc.)
and plain text (chat messages). Each command has state assertions
(e.g., `.info role` requires an active role).

## Behaviors to test

### Command parsing
- [x] Dot-commands parsed correctly (command + args)
- [x] Multi-line input (:::) handled (regex)
- [x] Plain text treated as chat message (parse_command returns None)
- [x] Empty input ignored (parse_command returns None)

### State assertions (REPL_COMMANDS array)
- [x] Each command's assert_state enforced correctly
- [x] Invalid state → command rejected (via is_valid)
- [x] Commands with AssertState::pass() always available

### Command handlers (each one)
- [ ] .help — prints help text
- [ ] .info [subcommand] — displays appropriate info
- [ ] .model <name> — switches model
- [ ] .prompt <text> — sets temp role
- [ ] .role <name> [text] — enters role or one-shot
- [ ] .session [name] — starts/resumes session
- [ ] .agent <name> [session] [key=value] — starts agent
- [ ] .rag [name] — initializes RAG
- [ ] .starter [n] — lists or executes conversation starter
- [ ] .set <key> <value> — updates setting
- [ ] .delete <type> — deletes item
- [ ] .exit [type] — exits scope or REPL
- [ ] .save role/session [name] — saves to file
- [ ] .edit role/session/config/agent-config/rag-docs — opens editor
- [ ] .empty session — clears session
- [ ] .compress session — compresses session
- [ ] .rebuild rag — rebuilds RAG
- [ ] .sources rag — shows RAG sources
- [ ] .copy — copies last response
- [ ] .continue — continues response
- [ ] .regenerate — regenerates response
- [ ] .file <path> [-- text] — includes files
- [ ] .macro <name> [text] — runs/creates macro
- [ ] .authenticate — OAuth flow
- [ ] .vault <cmd> [name] — vault operations
- [ ] .clear todo — clears agent todo

### ask function (chat flow)
- [ ] Input constructed from text
- [ ] Embeddings applied if RAG active
- [ ] Waits for compression to complete
- [ ] before_chat_completion called
- [ ] Streaming vs non-streaming based on config
- [ ] Tool results loop (recursive ask with merged results)
- [ ] after_chat_completion called
- [ ] Auto-continuation for agents with todos

## Additional behaviors tested (not in original plan)

- [x] AssertState::pass() always returns true (all flag combos)
- [x] AssertState::bare() only matches empty flags
- [x] AssertState::True requires any matching flag present
- [x] AssertState::True with multiple flags — any match suffices
- [x] AssertState::False requires all specified flags absent
- [x] AssertState::False with multiple flags
- [x] AssertState::TrueFalse — true present AND false absent
- [x] AssertState::Equal — exact flag match
- [x] REPL_COMMANDS has exactly 39 entries
- [x] All commands start with '.'
- [x] All commands have non-empty descriptions
- [x] .help, .exit always available (pass)
- [x] .info role requires ROLE
- [x] .session blocked when already in session
- [x] .exit session requires session
- [x] .exit agent requires agent
- [x] .agent only when bare (no role/session/agent)
- [x] .role blocked in session/agent
- [x] .prompt blocked in session/agent
- [x] .rag blocked in agent
- [x] .starter requires agent
- [x] .clear todo requires agent
- [x] .edit role requires ROLE, blocked in SESSION
- [x] .exit rag requires RAG, blocked in AGENT
- [x] split_first_arg: None, single word, two words, extra spaces
- [x] parse_command: plain text, empty, whitespace, dot only
- [x] ReplCommand::is_valid with pass/True/False
- [x] Multiline regex: captures content, rejects unclosed, rejects plain text

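
The `AssertState` items above spell out four matching rules plus two convenience constructors. A minimal sketch of those rules follows, using plain `u8` bit masks in place of the real `StateFlags` type. The constant values and the choice to model `pass()` as `False` with no flags and `bare()` as `Equal` with no flags are assumptions made for illustration, not a description of Loki's implementation.

```rust
/// Hypothetical bit flags standing in for `StateFlags`.
const ROLE: u8 = 1 << 0;
const SESSION: u8 = 1 << 1;
const AGENT: u8 = 1 << 2;

#[derive(Debug, Clone, Copy)]
enum AssertState {
    /// At least one of these flags must be present.
    True(u8),
    /// None of these flags may be present.
    False(u8),
    /// First mask: any present. Second mask: all absent.
    TrueFalse(u8, u8),
    /// Flags must match exactly.
    Equal(u8),
}

impl AssertState {
    /// Forbids nothing, so it holds in every state.
    fn pass() -> Self {
        AssertState::False(0)
    }

    /// Holds only when no flag is set.
    fn bare() -> Self {
        AssertState::Equal(0)
    }

    fn assert(self, flags: u8) -> bool {
        match self {
            AssertState::True(t) => (t & flags) != 0,
            AssertState::False(f) => (f & flags) == 0,
            AssertState::TrueFalse(t, f) => (t & flags) != 0 && (f & flags) == 0,
            AssertState::Equal(e) => e == flags,
        }
    }
}
```

Under this model, a rule such as ".edit role requires ROLE, blocked in SESSION" would be written as `AssertState::TrueFalse(ROLE, SESSION)`.
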
## Old code reference
- `src/repl/mod.rs` — run_repl_command, ask, REPL_COMMANDS
@@ -0,0 +1,67 @@
# Test Plan: CLI Flags

## Feature description

Loki CLI accepts flags for model, role, session, agent, file input,
execution mode, and various info/list commands. Flags determine
the execution path through main.rs.

## Behaviors to test

### Early-exit flags
- [x] --info parsed correctly
- [x] --list-models parsed correctly
- [x] --list-roles parsed correctly
- [x] --list-sessions parsed correctly
- [x] --list-agents parsed correctly
- [x] --list-rags parsed correctly
- [x] --list-macros parsed correctly
- [x] --sync-models parsed correctly
- [x] --build-tools parsed correctly
- [ ] --authenticate runs OAuth and exits (integration)
- [ ] --completions generates shell completions and exits (integration)
- [x] Vault flags (--add/get/update/delete-secret, --list-secrets) parsed

### Mode selection
- [x] No text/file → text returns None (REPL indicator)
- [x] Text provided → text joined and returned
- [x] --agent → agent field set
- [x] --role → role field set
- [x] --execute (-e) → execute flag set
- [x] --code (-c) → code flag set
- [x] --prompt → prompt field set
- [x] --macro → macro_name field set

### Flag combinations
- [x] --model + --role parsed together
- [x] --session + --role parsed together
- [ ] --session + --agent → agent with session (integration)
- [ ] --agent + --agent-variable → variables set (integration)
- [x] --dry-run flag parsed
- [x] --no-stream (-S) flag parsed
- [x] --file + text → both parsed
- [x] --empty-session + --session parsed
- [x] --save-session + --session parsed

### Prelude
- [ ] apply_prelude runs before main execution (async + filesystem)
- [ ] Prelude "role:name" loads role (async + filesystem)
- [ ] Prelude "session:name" loads session (async + filesystem)
- [ ] Prelude "session:role" loads both (async + filesystem)
- [ ] Prelude skipped if macro_flag set (async)
- [ ] Prelude skipped if state already has role/session/agent (async)

## Additional behaviors tested (not in original plan)

- [x] Default Cli has all flags unset/empty
- [x] Short flags: -m, -r, -a, -s, -e, -c, -S, -f
- [x] Multiple -f flags accumulate
- [x] Trailing text args collected as vec
- [x] Cli::text() returns None with no args (terminal stdin)
- [x] Cli::text() joins trailing args with spaces
- [x] --rag flag parsed
- [x] --macro flag parsed

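
The two `Cli::text()` items above (no trailing args yields `None`, otherwise args are joined with spaces) can be illustrated with a small sketch. The struct below is a hypothetical, trimmed-down stand-in for `Cli`; it deliberately ignores piped stdin, which the real method may also consult.

```rust
/// Hypothetical, trimmed-down stand-in for the `Cli` struct.
#[derive(Debug, Default)]
struct Cli {
    /// Trailing positional arguments collected by the argument parser.
    text: Vec<String>,
}

impl Cli {
    /// `None` means "no prompt text", which a caller can treat as the REPL indicator.
    /// Piped stdin is not considered in this sketch.
    fn text(&self) -> Option<String> {
        if self.text.is_empty() {
            None
        } else {
            Some(self.text.join(" "))
        }
    }
}
```
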
## Old code reference
- `src/cli/mod.rs` — Cli struct, flag definitions
- `src/main.rs` — run(), flag processing, mode branching
@@ -0,0 +1,106 @@
# Test Plan: Sub-Agent Spawning

## Feature description

Agents with can_spawn_agents=true can spawn child agents that run
in parallel as background tokio tasks. Children communicate results
back to the parent via collect/check. Escalation allows children
to request user input through the parent.

## Behaviors to test

### Spawn
- [ ] agent__spawn creates child agent in background (requires agent config on disk)
- [x] Child gets own RequestContext with incremented depth (new_for_child)
- [x] Child starts with empty scope (new_for_child)
- [x] Child gets shared root_escalation_queue (new_for_child)
- [x] Child gets inbox for teammate messaging (new_for_child)
- [x] Child inherits parent_supervisor (new_for_child)
- [ ] Child MCP servers acquired if configured (requires live MCP)
- [x] Max concurrent agents enforced (Supervisor.register)
- [x] Max depth enforced (Supervisor.register)
- [ ] Agent not found → error (requires agent config on disk)
- [ ] can_spawn_agents=false → no spawn tools available (requires agent init)

### Collect/Check
- [x] agent__check returns PENDING for running agent
- [x] agent__check returns error for unknown agent
- [ ] agent__collect blocks until done, returns output (requires real child completion)
- [ ] Output summarization when exceeds threshold (requires LLM client)
- [ ] Summarization uses configured model (requires LLM client)

### Task queue (handler integration tests)
- [x] handle_task_create creates tasks (simple, with deps, with dispatch_agent)
- [x] handle_task_create errors when agent set without prompt
- [x] handle_task_complete unblocks dependents
- [x] handle_task_list shows all tasks
- [x] handle_task_fail marks failed and reports blocked dependents
- [x] handle_task_fail returns error for missing task

### Escalation (handler integration tests)
- [x] handle_reply_escalation delivers reply via oneshot channel
- [x] handle_reply_escalation errors for missing escalation_id
- [x] handle_reply_escalation errors when no queue
- [x] Pending summary contains correct fields
- [x] Reply reaches receiver via oneshot channel
- [ ] Escalation timeout → fallback message (requires tokio timeout)

### Teammate messaging (handler integration tests)
- [x] handle_send_message delivers to registered agent's inbox
- [x] handle_send_message errors for unknown agent
- [x] handle_check_inbox returns messages with count
- [x] handle_check_inbox returns empty when no inbox
- [x] handle_check_inbox returns empty for empty inbox

### Cancel/List (handler integration tests)
- [x] handle_list returns empty for fresh supervisor
- [x] handle_list returns registered agents
- [x] handle_list errors when no supervisor
- [x] handle_cancel removes agent and signals abort
- [x] handle_cancel errors for unknown agent
- [x] handle_cancel errors when no supervisor

### Dispatch routing
- [x] Unknown action → error with "Unknown supervisor action"
- [x] agent__list routes to handle_list
- [x] agent__task_list routes to handle_task_list

### Child agent lifecycle
- [ ] run_child_agent loops (requires LLM client)
- [ ] Child uses before/after_chat_completion (requires LLM client)
- [ ] Child tool calls evaluated (requires LLM client)
- [ ] Child exits cleanly (requires LLM client)

## Context switching scenarios
- [ ] Parent spawns child with MCP (requires live MCP + agent config)
- [ ] Parent exits agent → all children cancelled (requires agent init)
- [x] Multiple children share escalation queue (new_for_child + ensure_root_escalation_queue)

## Additional behaviors tested (not in original plan)

- [x] EscalationQueue: default, submit, take, take_nonexistent, has_pending
- [x] EscalationQueue: pending_summary with/without options, empty
- [x] EscalationQueue: reply via oneshot channel
- [x] new_escalation_id: prefix and uniqueness
- [x] Inbox: new/default empty, deliver+drain, drain empties, multiple deliveries
- [x] Inbox: clone preserves messages, clone is independent
- [x] Supervisor: new defaults, register count, take removes, take nonexistent
- [x] Supervisor: inbox accessor, list_agents, task_queue accessible
- [x] Supervisor: register allows at max_depth boundary
- [x] AgentExitStatus: equality/inequality
- [x] TaskQueue: fail sets status, get missing returns None
- [x] TaskQueue: dispatch_agent/prompt stored, claim blocked fails
- [x] TaskQueue: list sorted by id, default empty
- [x] TaskQueue: dependency on nonexistent errors, complete nonexistent
- [x] TaskNode: is_runnable when pending+unblocked, not when blocked

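
The task-queue items above (dependencies block a task, completing a task unblocks its dependents, unknown dependencies are rejected) can be sketched with a small dependency map. Everything here is hypothetical: the type names mirror the checklist, but the fields, the `create`/`complete` signatures, and the `u32` ids are assumptions chosen to keep the example self-contained.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Pending,
    Completed,
}

#[derive(Debug)]
struct TaskNode {
    status: Status,
    /// Ids of tasks that must complete before this one can run.
    blocked_by: BTreeSet<u32>,
}

impl TaskNode {
    fn is_runnable(&self) -> bool {
        self.status == Status::Pending && self.blocked_by.is_empty()
    }
}

#[derive(Debug, Default)]
struct TaskQueue {
    tasks: BTreeMap<u32, TaskNode>,
    next_id: u32,
}

impl TaskQueue {
    /// Create a task; every dependency must already exist.
    fn create(&mut self, deps: &[u32]) -> Result<u32, String> {
        if let Some(missing) = deps.iter().find(|d| !self.tasks.contains_key(*d)) {
            return Err(format!("unknown dependency {missing}"));
        }
        // Only dependencies that have not completed yet block the new task.
        let blocked_by = deps
            .iter()
            .copied()
            .filter(|d| self.tasks[d].status != Status::Completed)
            .collect();
        self.next_id += 1;
        self.tasks.insert(self.next_id, TaskNode { status: Status::Pending, blocked_by });
        Ok(self.next_id)
    }

    /// Mark a task completed and return the ids that became runnable.
    /// Returns `None` when the task does not exist.
    fn complete(&mut self, id: u32) -> Option<Vec<u32>> {
        self.tasks.get_mut(&id)?.status = Status::Completed;
        let mut unblocked = Vec::new();
        for (other, node) in self.tasks.iter_mut() {
            if node.blocked_by.remove(&id) && node.is_runnable() {
                unblocked.push(*other);
            }
        }
        Some(unblocked)
    }
}
```

A `BTreeMap` keeps tasks ordered by id, which lines up with the "list sorted by id" item without a separate sort step.
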
## Integration handler tests added

- [x] All handle_* functions tested via handler integration tests (36 tests)
- [x] new_for_child: depth, id, inbox, escalation queue, parent supervisor, empty scope
- [x] ensure_root_escalation_queue: lazy init, same Arc on repeated calls
- [x] AppState::test_default() helper added for cross-module test construction

## Old code reference
- `src/function/supervisor.rs` — all handler functions
- `src/supervisor/` — Supervisor, EscalationQueue, Inbox, TaskQueue
@@ -0,0 +1,33 @@
# Test Plan: RAG

## Behaviors to test
- [ ] Rag::init creates new RAG with embedding model (requires LLM client)
- [ ] Rag::load loads existing RAG from disk (requires filesystem)
- [ ] Rag::create builds vector store from documents (requires embedding model)
- [ ] Rag::refresh_document_paths updates document list (requires filesystem)
- [ ] RAG search returns relevant embeddings (requires embedding model)
- [x] RAG template contains required placeholders
- [ ] Reranker model applied when configured (requires LLM client)
- [ ] top_k controls number of results (requires embedding model)
- [ ] RAG sources tracked for .sources command (requires full Rag struct)
- [x] exit_rag clears RAG from context (tested in iteration 8)

## Additional behaviors tested

- [x] DocumentId: new/split round-trip, zero/zero, large values
- [x] DocumentId: Debug format ("file-doc"), equality, inequality, ordering
- [x] RagDocument: new with content, default empty
- [x] RagData: new sets all defaults, empty collections
- [x] RagData::get: returns document, None for missing file, None for missing doc index
- [x] RagData::del: removes files + associated vectors, noop for nonexistent
- [x] RagData::add: inserts files, vectors, updates next_file_id
- [x] RagData::build_bm25: empty data returns no results
- [x] RagData::build_bm25: finds documents by keyword (BM25 ranking)
- [x] RAG_TEMPLATE: contains __CONTEXT__, __SOURCES__, __INPUT__
- [x] get_separators: Rust/Python/Markdown return language-specific
- [x] get_separators: unknown extension returns defaults
- [x] get_separators: all 22 known extensions have language-specific separators

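
The `DocumentId` items above (new/split round-trip, "file-doc" Debug format, ordering) suggest an id that packs a file index and a document index into one integer. The sketch below shows one way such an id could work. The `u64` representation, the 32/32 bit split, and the `u32` index types are assumptions for illustration and may differ from the real type.

```rust
use std::fmt;

/// Hypothetical packed id: file index in the high half, document index in the low half.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct DocumentId(u64);

const HALF_BITS: u32 = u64::BITS / 2;
const LOW_MASK: u64 = (1 << HALF_BITS) - 1;

impl DocumentId {
    fn new(file_index: u32, document_index: u32) -> Self {
        DocumentId(((file_index as u64) << HALF_BITS) | document_index as u64)
    }

    /// Inverse of `new`: returns `(file_index, document_index)`.
    fn split(self) -> (u32, u32) {
        ((self.0 >> HALF_BITS) as u32, (self.0 & LOW_MASK) as u32)
    }
}

impl fmt::Debug for DocumentId {
    /// Renders as "file-doc", e.g. `3-7`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (file, doc) = self.split();
        write!(f, "{file}-{doc}")
    }
}
```

Because the file index occupies the high bits, the derived `Ord` sorts ids by file first and by document second, which is the property an ordering test would exercise.
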
## Old code reference
- `src/rag/mod.rs` — Rag struct and methods
- `src/config/request_context.rs` — use_rag, edit_rag_docs, rebuild_rag
@@ -0,0 +1,35 @@
# Test Plan: Tab Completion and Prompt

## Behaviors to test

### Tab completion (repl_complete)
- [ ] .role<TAB> → role names (no hidden files)
- [ ] .agent<TAB> → agent names (no .shared)
- [ ] .session<TAB> → session names
- [ ] .rag<TAB> → RAG names
- [ ] .macro<TAB> → macro names
- [ ] .model<TAB> → model names with descriptions
- [ ] .set <TAB> → setting keys (sorted)
- [ ] .set temperature <TAB> → current value suggestions
- [ ] .set enabled_tools <TAB> → tool names (no internal tools)
- [ ] .set enabled_mcp_servers <TAB> → configured servers + aliases
- [ ] .delete <TAB> → type names
- [ ] .vault <TAB> → subcommands
- [ ] .agent <name> <TAB> → session names for that agent
- [ ] Fuzzy filtering applied to all completions

### Prompt rendering
- [ ] Left prompt shows role/session/agent name
- [ ] Right prompt shows model name
- [ ] Prompt updates after scope transitions
- [ ] Multi-line indicator shown during ::: input

## Status
Most completion logic requires filesystem access for role/session/agent lists.
The `split_line` function has existing tests. Prompt rendering methods are trivial
wrappers around stored strings. Low additional unit-test yield.

## Old code reference
- `src/config/request_context.rs` — repl_complete
- `src/repl/completer.rs` — ReplCompleter (split_line already tested)
- `src/repl/prompt.rs` — ReplPrompt
@@ -0,0 +1,24 @@
# Test Plan: Macros

## Behaviors to test
- [ ] Macro loaded from YAML file (requires filesystem)
- [ ] Macro steps executed sequentially (requires async + RequestContext)
- [ ] Each step runs through run_repl_command (requires async)
- [x] Variable interpolation in macro steps
- [ ] Built-in macros installed on first run (requires filesystem)
- [ ] macro_execute creates isolated RequestContext (requires async)
- [ ] Macro context inherits tool scope from parent (requires async)
- [ ] Macro context has macro_flag set (requires async)

## Additional behaviors tested

- [x] resolve_variables: no variables, required provided, required missing errors
- [x] resolve_variables: default used, default overridden
- [x] resolve_variables: rest captures remaining args, rest with default
- [x] resolve_variables: multiple variables mixed
- [x] usage: no variables, required, optional, rest, rest+default, mixed
- [x] interpolate_command: single, multiple, no vars, missing var passthrough
- [x] YAML deserialization: with variables, with defaults, no variables

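
The `interpolate_command` item above lists four cases: a single variable, multiple variables, no variables, and a missing variable that passes through unchanged. A minimal single-pass sketch of that behavior follows. The `{{name}}` placeholder syntax and the function signature are assumptions for illustration; the real function may use a different delimiter or map type.

```rust
use std::collections::HashMap;

/// Replace each `{{name}}` whose name is in `vars`.
/// Placeholders with unknown names are copied through unchanged.
fn interpolate_command(command: &str, vars: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(command.len());
    let mut rest = command;
    while let Some(start) = rest.find("{{") {
        // Stop at an unclosed placeholder; the tail is appended verbatim below.
        let Some(len) = rest[start + 2..].find("}}") else { break };
        let name = &rest[start + 2..start + 2 + len];
        let end = start + 2 + len + 2; // index just past the closing braces
        out.push_str(&rest[..start]);
        match vars.get(name) {
            Some(value) => out.push_str(value),
            None => out.push_str(&rest[start..end]),
        }
        rest = &rest[end..];
    }
    out.push_str(rest);
    out
}
```

Scanning once from left to right means a substituted value is never re-scanned, so a value that itself contains `{{...}}` is not expanded a second time.
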
## Old code reference
- `src/config/macros.rs` — macro_execute, Macro struct
@@ -0,0 +1,25 @@
# Test Plan: Vault

## Behaviors to test
- [ ] Vault add stores encrypted secret (requires terminal + password file)
- [ ] Vault get decrypts and returns secret (requires password file)
- [ ] Vault update replaces secret value (requires terminal + password file)
- [ ] Vault delete removes secret (requires password file)
- [ ] Vault list shows all secret names (requires password file)
- [ ] Secrets interpolated in MCP config (mcp.json) (requires Vault with secrets)
- [ ] Missing secrets produce warning during MCP init (requires Vault)
- [x] Vault accessible from CLI (flag parsing tested in iteration 10)
- [ ] Vault accessible from REPL (.vault commands) (requires REPL infra)

## Additional behaviors tested

- [x] SECRET_RE matches {{DOUBLE_BRACES}}
- [x] SECRET_RE matches with surrounding text
- [x] SECRET_RE does not match {SINGLE_BRACES}
- [x] SECRET_RE does not match plain text
- [x] SECRET_RE matches with spaces inside braces
- [x] Vault::default() creates instance with no password file

## Old code reference
- `src/vault/mod.rs` — GlobalVault, operations
- `src/mcp/mod.rs` — interpolate_secrets
@@ -0,0 +1,57 @@
# Test Plan: Functions and Tools

## Behaviors to test

### Function declarations
- [ ] Functions::init loads from visible_tools config
- [ ] Tool declarations parsed from bash scripts (argc annotations)
- [ ] Tool declarations parsed from python scripts (docstrings)
- [ ] Tool declarations parsed from typescript (JSDoc + type inference)
- [ ] Each declaration has name, description, parameters
- [ ] Agent tools loaded via Functions::init_agent
- [ ] Global tools loaded via build_global_tool_declarations

### Tool compilation
- [ ] Bash tools compiled to bin directory
- [ ] Python tools compiled to bin directory
- [ ] TypeScript tools compiled to bin directory
- [ ] clear_agent_bin_dir removes old binaries
- [ ] Tool file priority: .sh > .py > .ts > .js

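
The "Tool file priority: .sh > .py > .ts > .js" item above is a first-match lookup over an ordered extension list. One way to make that rule unit-testable without touching the filesystem is to inject the existence check, as sketched below. The `tools` file stem, the function name, and the `exists` callback are hypothetical and chosen only for illustration.

```rust
use std::path::{Path, PathBuf};

/// Extension priority from the checklist, highest first.
const TOOL_EXTENSIONS: [&str; 4] = ["sh", "py", "ts", "js"];

/// Return the highest-priority `tools.<ext>` candidate in `dir` for which
/// `exists` reports true, or `None` when no candidate exists.
fn pick_tools_file(dir: &Path, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    TOOL_EXTENSIONS
        .iter()
        .map(|ext| dir.join(format!("tools.{ext}")))
        .find(|candidate| exists(candidate.as_path()))
}
```

In production code the callback could simply be `|p| p.exists()`; in a test it can be a closure over a fixed list of file names, which removes the "requires filesystem" constraint for this particular rule.
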
### User interaction functions
- [ ] append_user_interaction_functions adds user__ask/confirm/input/checkbox
- [ ] Only appended in REPL mode
- [ ] User interaction tools work at depth 0 (direct prompt)
- [ ] User interaction tools escalate at depth > 0

### MCP meta functions
- [ ] append_mcp_meta_functions adds invoke/search/describe per server
- [ ] Meta functions removed when ToolScope rebuilt without those servers
- [ ] Function names follow mcp_invoke_<server> pattern

### Function selection
- [ ] select_functions filters by role's enabled_tools
- [ ] "all" enables everything
- [ ] Specific tool names enabled selectively
- [ ] mapping_tools aliases resolved
- [ ] Agent functions included when agent active
- [ ] MCP meta functions included when servers active

## Status
- Function declarations, append methods, find/contains tested in iteration 6
- MCP meta functions tested in iterations 5-7
- Function selection tested in iteration 7
- User interaction functions tested in iterations 6-7
- Python parser: extensive existing tests (400+ lines)
- TypeScript parser: extensive existing tests (400+ lines)
- parsers::common::underscore tested in iteration 13
- Functions::init and tool compilation require filesystem

## Additional behaviors tested

- [x] parsers::common::underscore: simple, dashes, spaces, special chars, consecutive, leading/trailing, uppercase, mixed

## Old code reference
- `src/function/mod.rs` — Functions struct, init, init_agent
- `src/config/paths.rs` — agent_functions_file (priority)
- `src/parsers/` — bash, python, typescript parsers