docs: Documentation for the RESTful API POC

2026-05-01 14:45:13 -06:00
parent ca03f6f9d7
commit 7ea3044a37
73 changed files with 2 additions and 2 deletions
@@ -0,0 +1,52 @@
+# Iteration 1 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/01-config-and-appconfig.md`
+
+## Tests created
+
+| File | Test name | What it verifies |
+|---|---|---|
+| `src/config/mod.rs` | `config_defaults_match_expected` | All Config::default() fields match old code values |
+| `src/config/app_config.rs` | `to_app_config_copies_serialized_fields` | to_app_config copies model_id, temperature, top_p, dry_run, stream, save, highlight, compression_threshold, rag_top_k |
+| `src/config/app_config.rs` | `to_app_config_copies_clients` | clients field populated (empty by default) |
+| `src/config/app_config.rs` | `to_app_config_copies_mapping_fields` | mapping_tools and mapping_mcp_servers copied correctly |
+| `src/config/app_config.rs` | `editor_returns_configured_value` | editor() returns configured value |
+| `src/config/app_config.rs` | `editor_falls_back_to_env` | editor() doesn't panic without config |
+| `src/config/app_config.rs` | `light_theme_default_is_false` | light_theme() default |
+| `src/config/app_config.rs` | `sync_models_url_has_default` | sync_models_url() has non-empty default |
+| `src/config/request_context.rs` | `to_request_context_creates_clean_state` | RequestContext starts with clean state (no role/session/agent, empty tool_scope, no agent_runtime) |
+| `src/config/request_context.rs` | `update_app_config_persists_changes` | Dynamic config updates via clone-mutate-replace persist |
+
+**Total: 10 new tests (59 → 69)**
+
+## Bugs discovered
+
+None. The `save` default was `false` in both old and new code
+(my plan file incorrectly said `true` — corrected).
+
+## Observations for future iterations
+
+1. `Config::default().save` is `false`, but plan file 01
+   incorrectly listed it as `true`. The plan file should be
+   updated to reflect the actual default.
+
+2. `AppConfig` does not implement `Default` (no derive).
+   Tests construct it via `Config::default().to_app_config()`.
+   This is fine since that's how it's created in production.
+
+3. The `visible_tools` field computation happens during
+   `Config::init` (not `to_app_config`). Testing the full
+   visible_tools resolution requires integration-level testing
+   with actual tool files. Deferred to plan file 16
+   (functions-and-tools).
+
+4. Testing `Config::init` directly is difficult because it reads
+   from the filesystem, starts MCP servers, etc. The unit tests
+   focus on the conversion paths which are the Phase 1 surface.
+
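The clone-mutate-replace update checked by `update_app_config_persists_changes` can be sketched roughly as below. This is a minimal illustration, not the project's code: the `Arc` wrapper, the two-field `AppConfig`, and the closure-based `update_app_config` signature are all assumptions made for the example.

```rust
use std::sync::Arc;

// Hypothetical stand-in: the real AppConfig has many more fields.
#[derive(Clone, Debug, Default)]
struct AppConfig {
    save: bool,
    dry_run: bool,
}

// Hypothetical stand-in for the real RequestContext.
struct RequestContext {
    app_config: Arc<AppConfig>,
}

impl RequestContext {
    /// Clone the shared config, mutate the copy, then swap the copy in.
    /// Holders of the old Arc keep seeing the old snapshot.
    fn update_app_config(&mut self, mutate: impl FnOnce(&mut AppConfig)) {
        let mut next = (*self.app_config).clone();
        mutate(&mut next);
        self.app_config = Arc::new(next);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn update_persists_and_old_snapshot_is_untouched() {
        let mut ctx = RequestContext { app_config: Arc::new(AppConfig::default()) };
        let before = Arc::clone(&ctx.app_config);
        ctx.update_app_config(|c| c.save = true);
        assert!(ctx.app_config.save);
        assert!(!ctx.app_config.dry_run);
        assert!(!before.save);
    }
}
```

Under these assumptions, a reader holding the previous `Arc` never observes a half-applied update, which is the property the persistence test relies on.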
+## Next iteration
+
+Plan file 02: Roles — role loading, retrieve_role, use_role/exit_role,
+use_prompt, extract_role, one-shot role messages, MCP context switching.
@@ -0,0 +1,86 @@
+# Iteration 10 — Test Implementation Notes
+
+## Plan files addressed
+
+- `docs/testing/plans/09-repl-commands.md` (completed in same session)
+- `docs/testing/plans/10-cli-flags.md`
+
+## Tests created
+
+### src/config/mod.rs (8 new tests — iteration 9)
+
+AssertState::assert tests for all 4 variants + pass/bare.
+
+### src/repl/mod.rs (31 new tests — iteration 9)
+
+REPL_COMMANDS array validation, command state assertions for 13
+specific commands, parse_command edge cases, split_first_arg,
+ReplCommand::is_valid, multiline regex.
+
+### src/cli/mod.rs (31 new tests — iteration 10)
+
+| Test name | What it verifies |
+|---|---|
+| `parse_no_args_defaults` | All flags default unset |
+| `parse_model_flag` | --model value |
+| `parse_model_short_flag` | -m value |
+| `parse_role_flag` | --role value |
+| `parse_session_with_name` | --session value |
+| `parse_agent_flag` | --agent value |
+| `parse_agent_short_flag` | -a value |
+| `parse_execute_flag` | -e flag |
+| `parse_code_flag` | -c flag |
+| `parse_no_stream_flag` | -S flag |
+| `parse_dry_run_flag` | --dry-run flag |
+| `parse_info_flag` | --info flag |
+| `parse_list_flags` | All 6 --list-* flags |
+| `parse_file_flag_single` | Single -f |
+| `parse_file_flag_multiple` | Multiple -f accumulate |
+| `parse_trailing_text` | Trailing args as text vec |
+| `parse_prompt_flag` | --prompt value |
+| `parse_empty_session_flag` | --empty-session flag |
+| `parse_save_session_flag` | --save-session flag |
+| `parse_build_tools_flag` | --build-tools flag |
+| `parse_sync_models_flag` | --sync-models flag |
+| `parse_model_with_role` | -m + -r combined |
+| `parse_agent_with_file_and_text` | -a + -f + text combined |
+| `parse_role_with_session` | -r + -s combined |
+| `cli_text_returns_none_when_no_text_no_stdin` | No input → None |
+| `cli_text_joins_trailing_args` | Args joined with spaces |
+| `parse_add_secret_flag` | --add-secret value |
+| `parse_get_secret_flag` | --get-secret value |
+| `parse_list_secrets_flag` | --list-secrets flag |
+| `parse_rag_flag` | --rag value |
+| `parse_macro_flag` | --macro value |
+
+**Total: 70 new tests across iterations 9+10 (342 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. **Clap parsing is fully testable**: Using `try_parse_from` with
+   synthetic arg arrays, all flag parsing and combinations can be
+   verified without running the actual binary.
+
+2. **Cli::text() has stdin dependency**: When stdin is not a
+   terminal, it reads from stdin. This branch can't be easily
+   unit-tested. The terminal-detection branch (no stdin) is tested.
+
+3. **Prelude is async + filesystem**: apply_prelude needs real role
+   and session files. Deferred to integration tests.
+
+4. **Mode selection is runtime behavior**: The actual mode branching
+   (REPL vs CMD) happens in main.rs based on parsed flags. Testing
+   the flag parsing verifies the inputs to that branching logic.
+
+5. **Exclusive flags**: Vault flags (--add-secret, --get-secret,
+   etc.) are marked `exclusive = true` in clap, meaning they
+   can't be combined with other args. This is enforced by clap.
+
+## Next iteration
+
+Plan file 11: Sub-Agent Spawning — supervisor, child agents,
+escalation, messaging.
@@ -0,0 +1,159 @@
+# Iteration 11 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/11-sub-agent-spawning.md`
+
+## Tests created
+
+### src/supervisor/escalation.rs (11 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `queue_default_has_no_pending` | Default queue empty |
+| `submit_and_has_pending` | Submit makes has_pending true |
+| `submit_returns_id` | Returns the request's id |
+| `take_removes_request` | Take removes and empties queue |
+| `take_nonexistent_returns_none` | Missing id → None |
+| `pending_summary_contains_fields` | Summary has id, agent_id, question |
+| `pending_summary_includes_options_when_present` | Options included |
+| `pending_summary_empty_when_no_requests` | Empty queue → empty summary |
+| `reply_reaches_receiver` | oneshot channel delivers reply |
+| `new_escalation_id_has_prefix` | Starts with "esc_" |
+| `new_escalation_id_unique` | Two calls produce different ids |
+
+### src/supervisor/mailbox.rs (8 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `inbox_new_is_empty` | New inbox drains empty |
+| `inbox_default_is_empty` | Default inbox drains empty |
+| `deliver_and_drain` | Deliver + drain returns message |
+| `drain_empties_inbox` | Second drain returns empty |
+| `drain_orders_shutdown_before_task_before_text` | Priority ordering |
+| `clone_preserves_messages` | Clone has same messages |
+| `clone_is_independent` | Clone doesn't share mutations |
+| `multiple_deliveries` | 5 messages all drained |
+
+### src/supervisor/mod.rs (11 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `supervisor_new_empty` | Initial state: 0 active, correct limits |
+| `supervisor_register_increments_count` | Register increases active_count |
+| `supervisor_register_rejects_at_capacity` | At max → error with "at capacity" |
+| `supervisor_register_rejects_exceeding_depth` | Over max_depth → error |
+| `supervisor_register_allows_at_max_depth` | Exactly max_depth → ok |
+| `supervisor_take_removes_handle` | Take decrements count |
+| `supervisor_take_nonexistent_returns_none` | Missing → None |
+| `supervisor_list_agents` | Lists all registered agent ids/names |
+| `supervisor_inbox_returns_handle_inbox` | Inbox accessor works |
+| `supervisor_task_queue_accessible` | task_queue/task_queue_mut work |
+| `agent_exit_status_equality` | Completed == Completed, != Failed |
+
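The boundary cases in the `supervisor_register_*` tests (at capacity, exactly at max depth, over max depth) can be illustrated with a small sketch. The struct layout, constructor, and error strings below are assumptions for the example; only the "at capacity" wording and the inclusive depth limit come from the table above.

```rust
// Hypothetical stand-in for the real Supervisor; names are assumed.
struct Supervisor {
    max_concurrent: usize,
    max_depth: usize,
    active: Vec<String>,
}

impl Supervisor {
    fn new(max_concurrent: usize, max_depth: usize) -> Self {
        Self { max_concurrent, max_depth, active: Vec::new() }
    }

    /// Rejects a registration when the supervisor is full or the
    /// requested depth is strictly greater than the limit.
    fn register(&mut self, agent_id: &str, depth: usize) -> Result<(), String> {
        if self.active.len() >= self.max_concurrent {
            return Err(format!("supervisor at capacity ({} agents)", self.max_concurrent));
        }
        if depth > self.max_depth {
            return Err(format!("depth {} exceeds max depth {}", depth, self.max_depth));
        }
        self.active.push(agent_id.to_string());
        Ok(())
    }

    fn active_count(&self) -> usize {
        self.active.len()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn register_enforces_both_limits() {
        let mut sup = Supervisor::new(2, 3);
        assert!(sup.register("a", 3).is_ok()); // exactly max_depth is allowed
        assert!(sup.register("b", 4).is_err()); // over max_depth is rejected
        assert!(sup.register("c", 1).is_ok());
        assert!(sup.register("d", 1).unwrap_err().contains("at capacity"));
        assert_eq!(sup.active_count(), 2);
    }
}
```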
+### src/supervisor/taskqueue.rs (10 new tests, 16 total)
+
+| Test name | What it verifies |
+|---|---|
+| `test_fail_sets_status` | fail() sets TaskStatus::Failed |
+| `test_get_returns_none_for_missing` | get() on nonexistent → None |
+| `test_dispatch_agent_stored` | dispatch_agent and prompt captured |
+| `test_claim_blocked_task_fails` | Can't claim blocked task |
+| `test_list_sorted_by_id` | list() returns numeric order |
+| `test_default_is_empty` | TaskQueue::default() empty |
+| `test_dependency_on_nonexistent_task_errors` | Bad dep → error |
+| `test_complete_nonexistent_returns_empty` | Complete unknown → empty |
+| `test_task_node_is_runnable` | Pending + unblocked = runnable |
+| `test_task_node_not_runnable_when_blocked` | Blocked = not runnable |
+
+### src/function/supervisor.rs (36 new handler integration tests)
+
+| Test name | What it verifies |
+|---|---|
+| `handle_list_empty_supervisor` | Empty supervisor → 0 active, empty agents |
+| `handle_list_with_agents` | Registered agents appear in list |
+| `handle_list_no_supervisor_errors` | No supervisor → error |
+| `handle_check_unknown_agent` | Check unknown → error status |
+| `handle_check_pending_agent` | Check running agent → pending status |
+| `handle_cancel_registered_agent` | Cancel removes and signals abort |
+| `handle_cancel_unknown_agent` | Cancel unknown → error status |
+| `handle_cancel_no_supervisor_errors` | No supervisor → error |
+| `handle_send_message_to_registered_agent` | Message delivered to inbox |
+| `handle_send_message_to_unknown_agent` | Unknown agent → error status |
+| `handle_check_inbox_with_messages` | Inbox drains messages with count |
+| `handle_check_inbox_no_inbox` | No inbox → count 0 |
+| `handle_check_inbox_empty_inbox` | Empty inbox → count 0 |
+| `handle_reply_escalation_success` | Reply delivered via oneshot |
+| `handle_reply_escalation_missing_id` | Missing id → error status |
+| `handle_reply_escalation_no_queue_errors` | No queue → error |
+| `handle_task_create_simple` | Simple task created with id |
+| `handle_task_create_with_dependencies` | Task with blocked_by |
+| `handle_task_create_with_dispatch_agent` | Auto-dispatch flag set |
+| `handle_task_create_agent_without_prompt_errors` | Agent without prompt → error |
+| `handle_task_list_empty` | Empty queue → empty tasks array |
+| `handle_task_list_with_tasks` | Tasks listed |
+| `handle_task_complete_unblocks_dependents` | Complete unblocks with newly_runnable |
+| `handle_task_fail_marks_failed` | Fail sets status |
+| `handle_task_fail_reports_blocked_dependents` | Reports blocked deps |
+| `handle_task_fail_missing_task` | Missing task → error status |
+| `dispatch_unknown_action_errors` | Unknown action → error |
+| `dispatch_routes_list` | agent__list → handle_list |
+| `dispatch_routes_task_list` | agent__task_list → handle_task_list |
+| `new_for_child_inherits_escalation_queue` | Shared Arc |
+| `new_for_child_sets_depth_and_id` | Depth and self_agent_id |
+| `new_for_child_has_inbox` | Shared inbox Arc |
+| `new_for_child_inherits_parent_supervisor` | parent_supervisor set |
+| `new_for_child_starts_with_empty_scope` | Empty functions, mcp, role, session |
+| `ensure_root_escalation_queue_creates_on_first_call` | Lazy init |
+| `ensure_root_escalation_queue_returns_same_on_second_call` | Same Arc |
+
+### Infrastructure
+
+- Added `AppState::test_default()` method for cross-module test construction
+- Refactored `input.rs` and `request_context.rs` test helpers to use `test_default()`
+
+**Total: 76 new tests (418 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. **Supervisor.register enforces both capacity and depth**: These
+   are the two runaway safeguards. Both tested at boundaries
+   (at capacity, at max_depth, over max_depth).
+
+2. **EscalationQueue uses oneshot channels**: The reply_tx/rx pair
+   enables async blocking-wait semantics for child agents. The
+   channel delivery is verified end-to-end in the test.
+
+3. **Inbox drain ordering is a priority system**: Shutdown messages
+   come first, then task completions, then text. This ensures
+   lifecycle-critical messages aren't buried under chat.
+
+4. **AgentHandle requires a tokio JoinHandle**: Creating test
+   handles requires a tokio runtime. Used `rt.spawn()` with
+   `mem::forget(rt)` to keep the handle alive. This is a test-only
+   pattern — not ideal but necessary since JoinHandle can't be
+   mocked.
+
+5. **handle_spawn requires real agent config on disk**: This is the
+   only handler that calls Agent::init. All other handlers (list,
+   check, cancel, messaging, tasks, escalation) work with just a
+   RequestContext + Supervisor, which we can construct in tests.
+
+6. **Handler integration tests cover the full dispatch chain**: The
+   tests call handler functions with real RequestContext instances
+   containing real Supervisor/EscalationQueue/Inbox instances. This
+   verifies the JSON arg parsing, supervisor interactions, and
+   response formatting all at once.
+
+7. **AppState::test_default() centralizes test construction**: Added
+   a `#[cfg(test)]` constructor that avoids importing private
+   modules (mcp_factory, rag_cache) from outside the config module.
+
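The drain ordering from observation 3 (shutdown, then task completions, then text) can be sketched with a stable sort on a priority rank. The `Message` variants and the `Vec`-backed `Inbox` here are hypothetical simplifications; the real mailbox may use shared state and richer payloads.

```rust
// Hypothetical message type; the real payloads are likely richer.
#[derive(Clone, Debug, PartialEq)]
enum Message {
    Text(String),
    TaskCompleted(String),
    Shutdown,
}

impl Message {
    /// Lower rank drains first.
    fn rank(&self) -> u8 {
        match self {
            Message::Shutdown => 0,
            Message::TaskCompleted(_) => 1,
            Message::Text(_) => 2,
        }
    }
}

// Simplified inbox backed by a plain Vec (an assumption for this sketch).
#[derive(Clone, Default)]
struct Inbox {
    messages: Vec<Message>,
}

impl Inbox {
    fn deliver(&mut self, msg: Message) {
        self.messages.push(msg);
    }

    /// Empties the inbox and returns its messages ordered by priority.
    /// `sort_by_key` is stable, so arrival order is kept within a rank.
    fn drain(&mut self) -> Vec<Message> {
        let mut out = std::mem::take(&mut self.messages);
        out.sort_by_key(|m| m.rank());
        out
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn drain_orders_by_priority_and_empties() {
        let mut inbox = Inbox::default();
        inbox.deliver(Message::Text("hi".into()));
        inbox.deliver(Message::TaskCompleted("t1".into()));
        inbox.deliver(Message::Shutdown);
        let drained = inbox.drain();
        assert_eq!(drained[0], Message::Shutdown);
        assert_eq!(drained[2], Message::Text("hi".into()));
        assert!(inbox.drain().is_empty());
    }
}
```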
+## Next iteration
+
+Plan file 12: RAG — RAG init/load/search, embeddings, document
+management.
@@ -0,0 +1,71 @@
+# Iteration 12 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/12-rag.md`
+
+## Tests created
+
+### src/rag/mod.rs (22 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `document_id_round_trip` | new(5,17) → split → (5,17) |
+| `document_id_zero_zero` | new(0,0) → split → (0,0) |
+| `document_id_large_values` | new(1000,9999) round-trips |
+| `document_id_debug_format` | Debug produces "3-7" format |
+| `document_id_equality` | Same file+doc → equal |
+| `document_id_inequality` | Different doc → not equal |
+| `document_id_ordering` | (0,1) < (1,0) |
+| `rag_document_new` | Sets page_content, empty metadata |
+| `rag_document_default` | Empty content and metadata |
+| `rag_data_new_defaults` | All fields set correctly |
+| `rag_data_get_returns_document` | Gets by file+doc index |
+| `rag_data_get_returns_none_for_missing_file` | Missing file → None |
+| `rag_data_get_returns_none_for_missing_document` | Missing doc index → None |
+| `rag_data_del_removes_files_and_vectors` | Del removes both |
+| `rag_data_del_nonexistent_is_noop` | Del missing → noop |
+| `rag_data_add_inserts_files_and_vectors` | Add inserts files+vectors, updates next_file_id |
+| `rag_template_contains_placeholders` | __CONTEXT__, __SOURCES__, __INPUT__ present |
+| `get_separators_returns_language_specific` | rs/py/md have language separators |
+| `get_separators_unknown_returns_defaults` | xyz → DEFAULT_SEPARATORS |
+| `get_separators_all_known_extensions` | All 22 known extensions differ from defaults |
+| `rag_data_build_bm25_empty` | Empty data → no search results |
+| `rag_data_build_bm25_finds_documents` | BM25 finds "rust" in first doc |
+
+**Total: 22 new tests (440 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. **Rag struct can't be constructed without an embedding model**:
+   Rag::init requires prompting the user for model selection,
+   Rag::load requires a YAML file on disk, and Rag::create
+   requires pre-built RagData with vectors. All RAG lifecycle
+   operations are I/O-bound.
+
+2. **DocumentId uses bit packing**: file_index in the upper half,
+   document_index in the lower half of a usize. This is tested
+   with round-trip, zero, and large-value cases.
+
+3. **RagData operations (get/del/add) are fully testable**: These
+   are pure data structure operations that don't need I/O. The
+   BM25 search engine can also be built and queried in tests.
+
+4. **The text splitter already has comprehensive tests**: 5 existing
+   tests cover split_text, create_documents, chunk headers,
+   markdown splitting, and HTML splitting. No additional splitter
+   tests needed.
+
+5. **get_separators covers 22 language extensions**: All are
+   verified to return language-specific separators rather than
+   defaults. This ensures the splitter uses appropriate chunk
+   boundaries for each language.
+
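The bit packing described in observation 2 can be sketched as follows. The exact split point (half of `usize::BITS`) and the method names `new`/`split` follow the notes above, but the constants and the `Debug` implementation are a reconstruction for illustration, not the project's source.

```rust
use std::fmt;

// Assumed layout: file_index in the upper half, document_index in the lower half.
const HALF_BITS: u32 = usize::BITS / 2;
const LOW_MASK: usize = (1 << HALF_BITS) - 1;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct DocumentId(usize);

impl DocumentId {
    fn new(file_index: usize, document_index: usize) -> Self {
        // Both indices must fit in half a usize for the round trip to hold.
        debug_assert!(file_index <= LOW_MASK && document_index <= LOW_MASK);
        DocumentId((file_index << HALF_BITS) | (document_index & LOW_MASK))
    }

    fn split(self) -> (usize, usize) {
        (self.0 >> HALF_BITS, self.0 & LOW_MASK)
    }
}

impl fmt::Debug for DocumentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (file_index, document_index) = self.split();
        write!(f, "{}-{}", file_index, document_index)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn round_trip_and_ordering() {
        assert_eq!(DocumentId::new(5, 17).split(), (5, 17));
        assert_eq!(format!("{:?}", DocumentId::new(3, 7)), "3-7");
        assert!(DocumentId::new(0, 1) < DocumentId::new(1, 0));
    }
}
```

Because the file index occupies the high bits, the derived ordering on the packed integer sorts by file first and document second, which matches the `document_id_ordering` case.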
+## Next iteration
+
+Plan file 13: Completions and Prompt — tab completion, prompt
+rendering, highlighter.
@@ -0,0 +1,107 @@
+# Iteration 13 — Test Implementation Notes
+
+## Plan files addressed
+
+- `docs/testing/plans/12-rag.md` (completed in same session)
+- `docs/testing/plans/13-completions-and-prompt.md`
+- `docs/testing/plans/14-macros.md`
+- `docs/testing/plans/15-vault.md`
+- `docs/testing/plans/16-functions-and-tools.md`
+
+## Tests created
+
+### src/rag/mod.rs (22 new tests — iteration 12)
+
+DocumentId round-trip/equality/ordering/debug, RagDocument new/default,
+RagData new/get/del/add/build_bm25, RAG_TEMPLATE placeholders,
+get_separators language mapping.
+
+### src/config/macros.rs (21 new tests — iteration 13)
+
+| Test name | What it verifies |
+|---|---|
+| `resolve_no_variables` | Empty vars → empty output |
+| `resolve_required_variable_provided` | Arg maps to variable |
+| `resolve_required_variable_missing_errors` | Missing required → error |
+| `resolve_default_variable_uses_default` | Default used when no arg |
+| `resolve_default_variable_overridden` | Arg overrides default |
+| `resolve_rest_variable_captures_all_remaining` | Rest joins remaining args |
+| `resolve_rest_variable_with_default` | Rest default used |
+| `resolve_multiple_variables` | Mixed required + default |
+| `usage_no_variables` | Just macro name |
+| `usage_required_variable` | `<name>` format |
+| `usage_optional_variable` | `[name]` format |
+| `usage_rest_variable` | `<name>...` format |
+| `usage_rest_with_default` | `[name]...` format |
+| `usage_mixed_variables` | Mixed format |
+| `interpolate_replaces_variables` | {{name}} → value |
+| `interpolate_multiple_variables` | Multiple replacements |
+| `interpolate_no_variables_passthrough` | No vars → unchanged |
+| `interpolate_variable_not_found_left_as_is` | Missing var → {{name}} kept |
+| `deserialize_macro_from_yaml` | Full YAML with steps + variables |
+| `deserialize_macro_with_defaults` | Variables with defaults + rest |
+| `deserialize_macro_no_variables` | Steps only, empty vars default |
+
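The `interpolate_*` rows above describe plain `{{name}}` substitution where unknown placeholders are left untouched. A minimal sketch of that behavior, assuming a free function taking a `HashMap` (the real method lives on `Macro` and its signature may differ):

```rust
use std::collections::HashMap;

/// Replaces every `{{key}}` whose key is present in `vars`.
/// Placeholders with no matching key are left as-is, without an error.
/// Note: this simple loop assumes values do not themselves contain placeholders.
fn interpolate_command(command: &str, vars: &HashMap<String, String>) -> String {
    let mut output = command.to_string();
    for (key, value) in vars {
        // "{{{{{}}}}}" renders as "{{" + key + "}}".
        output = output.replace(&format!("{{{{{}}}}}", key), value);
    }
    output
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn replaces_known_and_keeps_unknown() {
        let mut vars = HashMap::new();
        vars.insert("name".to_string(), "world".to_string());
        assert_eq!(interpolate_command("echo {{name}}", &vars), "echo world");
        assert_eq!(interpolate_command("echo {{missing}}", &vars), "echo {{missing}}");
    }
}
```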
+### src/vault/mod.rs (6 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `secret_re_matches_double_braces` | {{MY_SECRET}} captured |
+| `secret_re_matches_with_surrounding_text` | Captures in context |
+| `secret_re_no_match_single_braces` | {NOT} not matched |
+| `secret_re_no_match_plain_text` | No match for plain text |
+| `secret_re_matches_with_spaces` | {{ SPACED }} captured |
+| `vault_default_creates_instance` | Default has no password file |
+
+### src/parsers/common.rs (8 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `underscore_simple` | No-op for simple names |
+| `underscore_dashes_to_underscores` | my-func → my_func |
+| `underscore_spaces_to_underscores` | my func → my_func |
+| `underscore_special_chars_removed` | @! → _ |
+| `underscore_consecutive_specials_collapsed` | --- → single _ |
+| `underscore_leading_trailing_stripped` | -name- → name |
+| `underscore_uppercase_lowered` | MyFunc → myfunc |
+| `underscore_mixed` | Get-User Info → get_user_info |
+
+**Total: 57 new tests across iterations 12+13 (475 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations
+
+1. **Macro::resolve_variables has 3 variable modes**: required
+   (no default), optional (with default), and rest (captures
+   remaining args). All three modes tested with multiple
+   combinations.
+
+2. **Macro::interpolate_command is a simple string replacement**:
+   {{key}} → value. Missing keys are left as-is (no error),
+   which is the correct behavior for gradual interpolation.
+
+3. **SECRET_RE uses fancy_regex**: The `{{(.+)}}` pattern requires
+   double braces. Single braces don't match, which prevents false
+   positives on JSON-like content.
+
+4. **Vault operations all require terminal interaction or password
+   file**: add_secret and update_secret prompt for passwords via
+   inquire. get_secret/delete_secret/list_secrets need a tokio
+   runtime + password file. These are integration-test territory.
+
+5. **parsers::common::underscore is more than s/-/_/**: It lowercases,
+   replaces all non-alphanumeric chars with _, collapses consecutive
+   underscores, and strips leading/trailing underscores. Thorough
+   edge cases tested.
+
+6. **Python and TypeScript parsers have excellent existing test
+   suites**: ~400 lines of tests each covering declaration parsing,
+   type inference, docstring extraction. No additional tests needed.
+
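The normalization steps listed in observation 5 can be sketched in a single pass. This is an illustrative reimplementation, not the code in `parsers::common`; in particular, treating only ASCII letters and digits as alphanumeric is an assumption.

```rust
/// Lowercases, maps every non-alphanumeric run to a single `_`,
/// and strips leading and trailing underscores.
fn underscore(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for ch in name.chars() {
        if ch.is_ascii_alphanumeric() {
            out.push(ch.to_ascii_lowercase());
        } else if !out.is_empty() && !out.ends_with('_') {
            // One separator per run; nothing is emitted before the
            // first alphanumeric character.
            out.push('_');
        }
    }
    // A single trailing separator may remain, so strip it.
    out.trim_end_matches('_').to_string()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn matches_documented_cases() {
        assert_eq!(underscore("my-func"), "my_func");
        assert_eq!(underscore("-name-"), "name");
        assert_eq!(underscore("Get-User Info"), "get_user_info");
    }
}
```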
+## Final summary
+
+All 16 plan files have been addressed across iterations 1-13.
+475 total tests, all passing, 0 errors.
@@ -0,0 +1,100 @@
+# Iteration 14 — Integration Test Implementation Notes
+
+## Focus
+
+Filesystem-based integration tests (Tier 1 + Tier 2) for behaviors
+that were previously untestable without real config directories.
+
+## Infrastructure changes
+
+1. **Added `serial_test` dev-dependency** — Env-var-based config dir
+   isolation (`TestConfigDirGuard`) requires serialization to prevent
+   parallel test races. All 25 tests using `TestConfigDirGuard` now
+   use `#[serial]`.
+
+2. **Added `src/test_helpers.rs`** — Shared test utilities module
+   (`#[cfg(test)]`) with `TestConfigDirGuard`, `default_app_state`,
+   `create_test_ctx`, and `run_async` helpers, available to all
+   modules. Not yet used by all modules (existing module-local
+   helpers kept for backward compatibility).
+
+## Tests created
+
+### src/config/request_context.rs (17 new integration tests)
+
+| Test name | What it verifies |
+|---|---|
+| `retrieve_role_from_markdown_file` | Writes .md file, retrieves role with correct name/prompt |
+| `retrieve_role_builtin_exists` | Built-in roles retrievable |
+| `retrieve_role_nonexistent_errors` | Unknown role → error |
+| `retrieve_role_no_model_id_inherits_current_model` | No model_id → uses current model |
+| `list_roles_finds_markdown_files` | .md files listed, .txt ignored |
+| `list_roles_empty_dir` | Empty roles dir → empty list |
+| `session_new_from_ctx_captures_state` | Name captured, starts empty |
+| `session_save_creates_file` | Save creates YAML file on disk |
+| `use_session_errors_when_already_in_session` | Double session → error |
+| `use_session_creates_temp_session` | None → temp session |
+| `use_session_creates_named_session` | Name → named session |
+| `exit_session_roundtrip` | use_session → exit_session → None |
+| `use_role_obj_and_exit_role_full_cycle` | Set role → exit → None |
+| `use_role_obj_twice_replaces_role` | Second role replaces first |
+| `list_macros_finds_yaml_files` | .yaml macro files listed |
+| `list_rags_finds_yaml_files` | .yaml RAG files listed |
+| `list_rags_empty_dir` | Empty RAGs dir → empty list |
+
+### src/config/input.rs (5 new integration tests)
+
+| Test name | What it verifies |
+|---|---|
+| `from_files_loads_single_text_file` | File content + text combined |
+| `from_files_loads_multiple_files` | Multiple files all loaded |
+| `from_files_with_no_paths_just_text` | No files → just text |
+| `from_files_with_external_command` | Backtick command executed |
+| `from_files_nonexistent_file_errors` | Missing file → error |
+
+### Serialization fixes (6 existing tests)
+
+Added `#[serial]` to all `rebuild_tool_scope_*` tests to prevent
+env-var race conditions with filesystem integration tests.
+
+**Total: 22 new tests (497 total in suite)**
+
+## Bugs discovered
+
+1. **Test parallelism race condition with env vars**: The
+   `TestConfigDirGuard` sets a process-global env var. When tests
+   run in parallel, two guards stomp each other's values. Fixed
+   by adding `serial_test` crate and `#[serial]` attribute to all
+   filesystem-dependent tests.
+
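The race described above comes from a guard that writes a process-global environment variable. A rough sketch of the idea, using only the standard library: a global `Mutex` stands in for `#[serial]`, and the guard restores the previous value on drop. The type name `EnvVarGuard`, the lock, and the variable name in the test are hypothetical; the real `TestConfigDirGuard` may look quite different.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::{Mutex, MutexGuard};

// Process-wide lock; a stand-in for what `#[serial]` provides.
static ENV_LOCK: Mutex<()> = Mutex::new(());

// Hypothetical guard: sets an env var and restores the old value on drop.
struct EnvVarGuard {
    key: &'static str,
    previous: Option<OsString>,
    _lock: MutexGuard<'static, ()>,
}

impl EnvVarGuard {
    #[allow(unused_unsafe)] // `set_var` is only `unsafe` from the 2024 edition on
    fn set(key: &'static str, value: &str) -> Self {
        // Hold the lock for the guard's whole lifetime so two guards never overlap.
        let lock = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
        let previous = env::var_os(key);
        unsafe { env::set_var(key, value) };
        Self { key, previous, _lock: lock }
    }
}

impl Drop for EnvVarGuard {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        // Restore the previous state before the lock field is released.
        match self.previous.take() {
            Some(v) => unsafe { env::set_var(self.key, v) },
            None => unsafe { env::remove_var(self.key) },
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn guard_sets_and_restores() {
        let key = "SKETCH_CONFIG_DIR"; // hypothetical variable name
        {
            let _guard = EnvVarGuard::set(key, "/tmp/sketch");
            assert_eq!(env::var(key).unwrap(), "/tmp/sketch");
        }
        assert!(env::var_os(key).is_none());
    }
}
```

A lock inside the guard only protects tests that go through the guard; code that reads the variable without one can still race, which is one reason an attribute such as `#[serial]` on every affected test is the more robust choice.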
+## Observations
+
+1. **Session loading from disk requires Model::retrieve_model**:
+   `Session::load_from_ctx` calls `Model::retrieve_model` to
+   resolve the session's model_id. Without a valid model provider
+   config, this fails. Session loading tests are limited to
+   `new_from_ctx` (creation) and `save` (serialization).
+
+2. **use_session with empty session prompts user**: The Confirm
+   dialog for "incorporate last Q&A?" requires terminal interaction.
+   Tests avoid this by: (a) having no last_message, or (b) using
+   named sessions that already exist on disk.
+
+3. **Input::from_files with external commands works**: The backtick
+   syntax (`` `echo hello` ``) actually runs the command and captures
+   output. This is a real integration test — it runs `/bin/echo`.
+
+4. **Vault CRUD was skipped**: Vault operations require a password
+   file with actual encrypted content via the `gman` crate's
+   `LocalProvider`. The `add_secret` method also prompts for a
+   password via `inquire`. Testing vault requires either mocking
+   the terminal or using `LocalProvider` directly with a pre-created
+   password file — deferred to a future iteration.
+
+## Final counts
+
+| Category | Tests |
+|---|---|
+| Unit tests (iterations 1-13) | 475 |
+| Integration tests (iteration 14) | 22 |
+| **Total** | **497** |
@@ -0,0 +1,71 @@
+# Iteration 2 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/02-roles.md`
+
+## Tests created
+
+### src/config/role.rs (12 new tests, 15 total)
+
+| Test name | What it verifies |
+|---|---|
+| `role_new_parses_prompt` | Role::new extracts prompt text |
+| `role_new_parses_metadata` | Metadata block parses model, temperature, top_p |
+| `role_new_parses_enabled_tools` | enabled_tools from metadata |
+| `role_new_parses_enabled_mcp_servers` | enabled_mcp_servers from metadata |
+| `role_new_no_metadata_has_none_fields` | No metadata → all optional fields None |
+| `role_builtin_shell_loads` | Built-in "shell" role loads |
+| `role_builtin_code_loads` | Built-in "code" role loads |
+| `role_builtin_nonexistent_errors` | Non-existent built-in → error |
+| `role_default_has_empty_fields` | Default role has empty name/prompt |
+| `role_set_model_updates_model` | set_model() changes the model |
+| `role_set_temperature_works` | set_temperature() changes temperature |
+| `role_export_includes_metadata` | export() includes metadata and prompt |
+
+### src/config/request_context.rs (5 new tests, 7 total)
+
+| Test name | What it verifies |
+|---|---|
+| `use_role_obj_sets_role` | use_role_obj sets role on ctx |
+| `exit_role_clears_role` | exit_role clears role from ctx |
+| `use_prompt_creates_temp_role` | use_prompt creates TEMP_ROLE_NAME role |
+| `extract_role_returns_standalone_role` | extract_role returns active role |
+| `extract_role_returns_default_when_nothing_active` | extract_role returns default role |
+
+**Total: 17 new tests (69 → 86)**
+
+## Bugs discovered
+
+None. Role parsing behavior matches between old and new code.
+
+## Observations for future iterations
+
+1. `retrieve_role` (which calls `Model::retrieve_model`) can't be
+   easily unit-tested without a real client config. It depends on
+   having at least one configured client. Deferred to integration
+   testing or plan 08 (RequestContext scope transitions).
+
+2. The `use_role` async method (which calls `rebuild_tool_scope`)
+   requires async test runtime and MCP infrastructure. Deferred to
+   plan 05 (MCP lifecycle) and 08 (RequestContext).
+
+3. `use_role_obj` correctly rejects when agent is active — tested
+   implicitly through the error path, but creating a mock Agent
+   is complex. Noted for plan 04 (agents).
+
+4. The `extract_role` priority order (session > agent > role > default)
+   is an important behavioral contract. Tests verify the role and
+   default cases. Session and agent cases deferred to plans 03, 04.
+
+5. Added `create_test_ctx()` helper to request_context.rs tests.
+   Future iterations should reuse this.
+
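The `extract_role` priority order from observation 4 (session > agent > role > default) can be expressed as an `Option` chain. The types below are simplified, hypothetical stand-ins; the real context holds full session and agent objects rather than pre-converted roles.

```rust
// Hypothetical, simplified stand-in for the real Role.
#[derive(Clone, Debug, Default, PartialEq)]
struct Role {
    name: String,
}

// Hypothetical context: each field is the role that source would yield.
#[derive(Default)]
struct RequestContext {
    session_role: Option<Role>, // e.g. what an active session converts to
    agent_role: Option<Role>,
    role: Option<Role>,
}

impl RequestContext {
    /// Picks the first available role in priority order,
    /// falling back to the default role.
    fn extract_role(&self) -> Role {
        self.session_role
            .clone()
            .or_else(|| self.agent_role.clone())
            .or_else(|| self.role.clone())
            .unwrap_or_default()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn session_wins_over_agent_and_role() {
        let named = |n: &str| Role { name: n.to_string() };
        let mut ctx = RequestContext::default();
        assert_eq!(ctx.extract_role(), Role::default());
        ctx.role = Some(named("role"));
        ctx.agent_role = Some(named("agent"));
        ctx.session_role = Some(named("session"));
        assert_eq!(ctx.extract_role().name, "session");
    }
}
```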
+## Plan file updates
+
+Updated 02-roles.md to mark completed items.
+
+## Next iteration
+
+Plan file 03: Sessions — session create/load/save, compression,
+autoname, carry-over, exit, context switching.
@@ -0,0 +1,76 @@
+# Iteration 3 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/03-sessions.md`
+
+## Tests created
+
+### src/config/session.rs (15 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `session_default_is_empty` | Default session is empty, no name, no role, not dirty |
+| `session_new_from_ctx_captures_save_session` | new_from_ctx captures name, empty, not dirty |
+| `session_set_role_captures_role_info` | set_role copies model_id, temperature, role_name, marks dirty |
+| `session_clear_role` | clear_role removes role_name |
+| `session_guard_empty_passes_when_empty` | guard_empty OK when empty |
+| `session_needs_compression_threshold` | Empty session doesn't need compression |
+| `session_needs_compression_returns_false_when_compressing` | Already compressing → false |
+| `session_needs_compression_returns_false_when_threshold_zero` | Zero threshold → false |
+| `session_set_compressing_flag` | set_compressing toggles flag |
+| `session_set_save_session_this_time` | Doesn't panic |
+| `session_save_session_returns_configured_value` | save_session get/set roundtrip |
+| `session_compress_moves_messages` | compress moves messages to compressed, adds system |
+| `session_is_not_empty_after_compress` | Session with compressed messages is not empty |
+| `session_need_autoname_default_false` | Default session doesn't need autoname |
+| `session_set_autonaming_doesnt_panic` | set_autonaming safe without autoname |
+
+### src/config/request_context.rs (4 new tests, 11 total)
+
+| Test name | What it verifies |
+|---|---|
+| `exit_session_clears_session` | exit_session removes session from ctx |
+| `empty_session_clears_messages` | empty_session keeps session but clears it |
+| `maybe_compress_session_returns_false_when_no_session` | No session → no compression |
+| `maybe_autoname_session_returns_false_when_no_session` | No session → no autoname |
+
+**Total: 19 new tests (86 → 105)**
+
+## Bugs discovered
+
+None. Session behavior matches between old and new code.
+
+## Observations for future iterations
+
+1. `Session::new_from_ctx` and `Session::load_from_ctx` have
+   `#[allow(dead_code)]` annotations — they were bridge methods.
+   Should verify if they're still needed or if the old `Session::new`
+   and `Session::load` (which take `&Config`) should be cleaned up
+   in a future pass.
+
+2. The `compress` method moves messages to `compressed_messages` and
+   adds a single system message with the summary. This is a critical
+   behavioral contract — if the summary format changes, sessions
+   could break.
+
+3. `needs_compression` uses `self.compression_threshold` (session-
+   level) with fallback to the global threshold. This priority
+   (session > global) is important behavior.
+
+4. Session carry-over (the "incorporate last Q&A?" prompt) happens
+   inside `use_session` which is async and involves user interaction
+   (inquire::Confirm). Can't unit test this — needs integration test
+   or manual verification.
+
+5. The `extract_role` test for session-active case should verify that
+   `session.to_role()` is returned. Added note to plan 02.
+
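The threshold fallback from observation 3, together with the "already compressing" and "zero threshold" cases in the table, can be sketched like this. Field names, the token counter, and the `>=` comparison are assumptions for illustration; only the session-over-global priority and the two `false` cases come from the notes.

```rust
// Hypothetical, simplified session state.
struct Session {
    compression_threshold: Option<usize>, // session-level override, if any
    compressing: bool,
    tokens: usize,
}

impl Session {
    /// Uses the session-level threshold when set, otherwise the global one.
    /// A zero threshold disables compression; so does an in-flight compression.
    fn needs_compression(&self, global_threshold: usize) -> bool {
        if self.compressing {
            return false;
        }
        let threshold = self.compression_threshold.unwrap_or(global_threshold);
        threshold > 0 && self.tokens >= threshold
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn session_threshold_overrides_global() {
        let s = Session { compression_threshold: Some(100), compressing: false, tokens: 150 };
        assert!(s.needs_compression(1_000));
        let s = Session { compression_threshold: None, compressing: false, tokens: 150 };
        assert!(!s.needs_compression(1_000));
    }
}
```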
+## Plan file updates
+
+Updated 03-sessions.md to mark completed items.
+
+## Next iteration
+
+Plan file 04: Agents — agent init, tool compilation, variables,
+lifecycle, MCP, RAG, auto-continuation.
@@ -0,0 +1,71 @@
+# Iteration 4 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/04-agents.md`
+
+## Tests created
+
+### src/config/agent.rs (4 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `agent_config_parses_from_yaml` | Full AgentConfig YAML with all fields |
+| `agent_config_defaults` | Minimal AgentConfig gets correct defaults |
+| `agent_config_with_model` | model_id, temperature, top_p from YAML |
+| `agent_config_inject_defaults_true` | inject_todo/spawn_instructions default true |
+
+### src/config/agent_runtime.rs (2 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `agent_runtime_new_defaults` | All fields default correctly |
+| `agent_runtime_builder_pattern` | with_depth, with_parent_supervisor work |
+
+### src/config/request_context.rs (6 new tests, 17 total)
+
+| Test name | What it verifies |
+|---|---|
+| `exit_agent_clears_all_agent_state` | exit_agent clears agent, agent_runtime, rag |
+| `current_depth_returns_zero_without_agent` | Default depth is 0 |
+| `current_depth_returns_agent_runtime_depth` | Depth from agent_runtime |
+| `supervisor_returns_none_without_agent` | No agent → no supervisor |
+| `inbox_returns_none_without_agent` | No agent → no inbox |
+| `root_escalation_queue_returns_none_without_agent` | No agent → no queue |
+
+**Total: 12 new tests (105 → 117)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. `Agent::init` can't be unit tested easily — requires agent config
+   files, tool files on disk. Integration tests with temp directories
+   would be needed for full coverage.
+
+2. AgentConfig default values verified:
+   - `max_concurrent_agents` = 4
+   - `max_agent_depth` = 3
+   - `max_auto_continues` = 10
+   - `inject_todo_instructions` = true
+   - `inject_spawn_instructions` = true
+
+   These are important behavioral contracts.
+
+3. The `exit_agent` test shows that clearing agent state also
+   rebuilds the tool_scope with fresh functions. This is the
+   correct behavior for returning to the global context.
+
+4. Agent variable interpolation (special vars like __os__, __cwd__)
+   happens in Agent::init which is filesystem-dependent. Deferred.
+
+5. `list_agents()` (which filters hidden dirs) is tested via the
+   `.shared` exclusion noted in improvements. Could add a unit test
+   with a temp dir if needed.
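
Observations 1 and 5 both point at temp-directory tests. A rough sketch of what such a test could exercise is shown below, using only the standard library. `list_visible_dirs` is a hypothetical stand-in for `list_agents()`; the assumption here is that "hidden" means a directory name starting with `.` (such as `.shared`), and that plain files are ignored.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for `list_agents()`: returns the names of all
/// non-hidden subdirectories of `root`, sorted for stable assertions.
fn list_visible_dirs(root: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        // Only directories can be agents in this sketch.
        if !entry.file_type()?.is_dir() {
            continue;
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        // Assumed rule: dot-prefixed directories (e.g. `.shared`) are hidden.
        if !name.starts_with('.') {
            names.push(name);
        }
    }
    names.sort();
    Ok(names)
}
```

A test would create a scratch directory under `std::env::temp_dir()`, add `.shared` plus one or two agent directories, call the function, and remove the scratch directory afterwards.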
+
+## Next iteration
+
+Plan file 05: MCP Lifecycle — the most critical test area. McpFactory,
+McpRuntime, spawn_mcp_server, rebuild_tool_scope MCP integration,
+scope transition MCP behavior.
@@ -0,0 +1,129 @@
+# Iteration 5 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/05-mcp-lifecycle.md`
+
+## Tests created
+
+### src/config/mcp_factory.rs (12 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `key_from_stdio_spec_captures_command_args_env` | McpServerKey extracts command, args, env from stdio spec |
+| `key_from_stdio_spec_sorts_args_and_env` | Args and env are sorted for deterministic key hashing |
+| `key_from_stdio_spec_defaults_empty_when_none` | None args/env default to empty vecs |
+| `key_from_remote_http_spec` | Http transport key captures url and transport type |
+| `key_from_remote_sse_spec_with_sorted_headers` | SSE headers sorted for deterministic keys |
+| `key_equality_same_spec_produces_equal_keys` | Same spec → equal keys (sharing contract) |
+| `key_inequality_different_names` | Different server names → different keys |
+| `key_inequality_different_commands` | Different commands → different keys (isolation contract) |
+| `key_env_bool_and_int_coerce_to_string` | JsonField::Bool/Int coerced to String in key |
+| `factory_try_get_active_returns_none_when_empty` | Empty factory returns None |
+| `factory_try_get_active_returns_none_for_unknown_key` | Unknown key returns None |
+| `factory_default_has_empty_active_map` | Default factory has empty internal map |
+
+### src/config/tool_scope.rs (6 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `mcp_runtime_new_is_empty` | New McpRuntime has no servers |
+| `mcp_runtime_default_is_empty` | Default McpRuntime is empty |
+| `mcp_runtime_get_returns_none_for_missing_server` | get() on nonexistent server returns None |
+| `tool_scope_default_has_empty_mcp_runtime` | Default ToolScope has empty MCP runtime |
+| `tool_scope_default_has_empty_functions` | Default ToolScope has no functions |
+| `tool_scope_default_tracker_has_no_loops` | Default ToolScope tracker detects no loops |
+
+### src/mcp/mod.rs (30 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `validate_stdio_with_command_succeeds` | Valid stdio spec passes |
+| `validate_stdio_missing_command_fails` | Stdio without command is rejected |
+| `validate_stdio_with_url_fails` | Stdio with url (remote field) is rejected |
+| `validate_stdio_with_headers_fails` | Stdio with headers (remote field) is rejected |
+| `validate_http_with_url_succeeds` | Valid http spec passes |
+| `validate_http_missing_url_fails` | Http without url is rejected |
+| `validate_http_with_command_fails` | Http with command (stdio field) is rejected |
+| `validate_http_with_args_fails` | Http with args (stdio field) is rejected |
+| `validate_http_with_cwd_fails` | Http with cwd (stdio field) is rejected |
+| `validate_sse_with_url_succeeds` | Valid SSE spec passes |
+| `validate_sse_missing_url_fails` | SSE without url is rejected |
+| `is_remote_true_for_http_and_sse` | Http and SSE are remote transports |
+| `is_remote_false_for_stdio` | Stdio is not remote |
+| `deserialize_stdio_server_from_json` | Full stdio spec from JSON |
+| `deserialize_http_server_from_json` | Http spec with headers from JSON |
+| `deserialize_env_with_mixed_types` | Env with String, Bool, Int values |
+| `deserialize_multiple_servers` | Multiple server entries parsed |
+| `deserialize_empty_servers_map` | Empty mcpServers map parsed |
+| `deserialize_server_with_cwd` | cwd field parsed correctly |
+| `resolve_all_returns_all_configured_servers` | "all" resolves to all config keys |
+| `resolve_comma_separated_returns_matching_servers` | Comma-separated list filters correctly |
+| `resolve_single_server_name` | Single name resolved |
+| `resolve_none_returns_empty` | None enabled → empty list |
+| `resolve_no_config_returns_empty` | No config → empty list |
+| `resolve_nonexistent_server_filtered_out` | Unknown names silently filtered |
+| `resolve_all_nonexistent_returns_empty` | All unknown → empty list |
+| `resolve_trims_whitespace` | Whitespace in comma list trimmed |
+| `registry_default_is_empty` | Default registry: empty, no config, no log |
+| `registry_with_config_reports_config` | Config accessor works |
+| `meta_function_prefixes_are_correct` | mcp_invoke/search/describe prefixes |
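
The `resolve_*` rows in the table above describe a small pure function. The sketch below is one way that behavior could look; it is not the project's `McpRegistry` code. The function name, its signature, and the use of a slice for the configured server names are assumptions made for illustration.

```rust
/// Hypothetical sketch of server ID resolution.
///
/// * `enabled`: the user's setting ("all", a comma-separated list, or None)
/// * `configured`: server names from the MCP config, or None if no config
fn resolve_server_ids(enabled: Option<&str>, configured: Option<&[&str]>) -> Vec<String> {
    // Nothing enabled, or no config at all: nothing to resolve.
    let (enabled, configured) = match (enabled, configured) {
        (Some(e), Some(c)) => (e.trim(), c),
        _ => return Vec::new(),
    };

    // "all" expands to every configured server.
    if enabled == "all" {
        return configured.iter().map(|id| id.to_string()).collect();
    }

    // Otherwise split on commas, trim whitespace, and silently drop
    // names that are not present in the config.
    enabled
        .split(',')
        .map(str::trim)
        .filter(|id| configured.contains(id))
        .map(|id| id.to_string())
        .collect()
}
```

In this sketch the result order follows the user's list rather than the config order; whether the real implementation does the same is not stated in these notes.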
+
+### src/config/request_context.rs (6 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `rebuild_tool_scope_mcp_disabled_skips_servers` | mcp_server_support=false → empty runtime |
+| `rebuild_tool_scope_no_enabled_servers_yields_empty_runtime` | None enabled → empty runtime |
+| `rebuild_tool_scope_no_mcp_config_yields_empty_runtime` | No mcp_config → empty runtime |
+| `rebuild_tool_scope_preserves_tool_tracker` | Tracker survives rebuild |
+| `rebuild_tool_scope_repl_mode_appends_user_interaction_functions` | REPL adds user__ functions |
+| `rebuild_tool_scope_cmd_mode_no_user_interaction_functions` | CMD skips user__ functions |
+
+**Total: 54 new tests (176 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. **ConnectedServer untestable without subprocess**: `ConnectedServer`
+   (= `RunningService<RoleClient, ()>`) cannot be constructed without
+   a real MCP server subprocess. This blocks unit testing for:
+   - McpFactory.acquire() full flow (spawn + insert + Weak sharing)
+   - McpRuntime.insert/get with real handles
+   - McpRuntime.search/describe/invoke (need live tool catalog)
+   - All scope transition tests (role/session/agent MCP start/stop)
+
+   These require integration tests with a mock MCP server binary
+   (e.g., a simple echo server). Recommended for a dedicated
+   integration test iteration.
+
+2. **McpServerKey sorting guarantees sharing correctness**: The
+   sorting of args, env, and headers in McpServerKey::from_spec
+   is critical. Without it, key equality and hashing would depend
+   on the order in which entries are listed or iterated, so
+   equivalent specs could fail to share a server. Tests verify
+   this explicitly.
+
+3. **rebuild_tool_scope has 3 guard clauses that prevent server
+   acquisition**: mcp_server_support=false, mcp_config=None,
+   enabled_mcp_servers=None. All three paths tested.
+
+4. **REPL vs CMD mode differs in user interaction functions**: The
+   `rebuild_tool_scope` method conditionally appends `user__*`
+   functions only in REPL mode. Tested both paths.
+
+5. **McpServer::validate enforces strict transport/field separation**:
+   Stdio servers cannot have url/headers, remote servers cannot have
+   command/args/cwd. This prevents misconfiguration. All cross-field
+   conflict cases tested.
+
+6. **McpRegistry.resolve_server_ids is private** but tested via
+   `#[cfg(test)]` in the same module. It's the core of server ID
+   resolution for "all", comma-separated, and empty cases.
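
Observation 2 is the sharing contract in miniature: two specs that differ only in ordering must produce the same key. The sketch below illustrates the idea with a hypothetical `ServerKey` type. The field set, the `from_parts` constructor, and the use of `String` values for env are assumptions; the real `McpServerKey::from_spec` takes a server spec and also covers remote transports and headers.

```rust
use std::collections::HashMap;

/// Hypothetical, stdio-only stand-in for `McpServerKey`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ServerKey {
    name: String,
    command: String,
    args: Vec<String>,
    env: Vec<(String, String)>,
}

impl ServerKey {
    /// Build a key whose equality and hash do not depend on input order.
    fn from_parts(
        name: &str,
        command: &str,
        args: &[&str],
        env: &HashMap<String, String>,
    ) -> Self {
        // Sort args, mirroring the behavior described in the notes.
        let mut args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
        args.sort();

        // HashMap iteration order is unspecified, so flatten and sort.
        let mut env: Vec<(String, String)> = env
            .iter()
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect();
        env.sort();

        Self {
            name: name.to_string(),
            command: command.to_string(),
            args,
            env,
        }
    }
}
```

Because the key derives `Eq` and `Hash` over the sorted vectors, it can be used directly as a `HashMap` key for a pool of shared server handles.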
+
+## Next iteration
+
+Plan file 06: Tool Evaluation — eval_tool_calls, ToolCall dispatch,
+tool handlers, MCP tool invocation chain (mcp__search, mcp__describe,
+mcp__invoke).
@@ -0,0 +1,96 @@
+# Iteration 6 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/06-tool-evaluation.md`
+
+## Tests created
+
+### src/function/mod.rs (36 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `toolcall_new_sets_fields` | ToolCall::new sets name, arguments, id |
+| `toolcall_default_has_empty_fields` | Default ToolCall has empty/null fields |
+| `toolcall_with_thought_signature` | with_thought_signature sets value |
+| `toolcall_with_thought_signature_none` | with_thought_signature(None) clears |
+| `dedup_removes_duplicate_ids_keeps_last` | Duplicate ids → last occurrence kept |
+| `dedup_keeps_unique_ids` | Unique ids → all kept |
+| `dedup_keeps_calls_without_ids` | No-id calls always kept |
+| `dedup_preserves_last_occurrence_order` | Ordering based on last occurrence position |
+| `dedup_empty_input_returns_empty` | Empty vec → empty result |
+| `dedup_mixed_with_and_without_ids` | Mixed id/no-id dedup behavior |
+| `tracker_default_values` | Default max_repeats=2, chain_len=3 |
+| `tracker_no_loop_on_fresh_tracker` | Fresh tracker returns None |
+| `tracker_no_loop_below_threshold` | Below max_repeats → no loop |
+| `tracker_detects_loop_at_max_repeats` | At max_repeats → loop detected |
+| `tracker_different_args_no_loop` | Different args break loop detection |
+| `tracker_different_names_no_loop` | Different names break loop detection |
+| `tracker_chain_detection` | Chain of identical calls detected |
+| `tracker_record_call_respects_capacity` | Capacity bounded by chain_len * max_repeats |
+| `tracker_loop_message_contains_call_history` | Loop message includes call_history JSON |
+| `prefix_constants_are_correct` | All 6 prefixes: todo__, agent__, user__, mcp_invoke/search/describe |
+| `functions_default_is_empty` | Default Functions has no declarations |
+| `functions_append_todo_adds_declarations` | 5 todo tools: init, add, done, list, clear |
+| `functions_append_supervisor_adds_declarations` | Supervisor: spawn, check, collect, list, cancel, reply |
+| `functions_append_teammate_adds_declarations` | Teammate: send_message, check_inbox |
+| `functions_append_user_interaction_adds_declarations` | User: ask, confirm, input, checkbox |
+| `functions_append_mcp_meta_creates_three_per_server` | 3 MCP meta functions per server |
+| `functions_append_mcp_meta_multiple_servers` | Multiple servers → 3 each |
+| `functions_append_mcp_meta_empty_servers` | Empty servers → no declarations |
+| `functions_find_returns_declaration` | find() returns matching declaration |
+| `functions_find_returns_none_for_missing` | find() returns None for unknown |
+| `functions_contains_true_for_existing` | contains() true for known function |
+| `functions_contains_false_for_missing` | contains() false for unknown |
+| `functions_mcp_invoke_declaration_has_tool_and_arguments_params` | Invoke schema: tool + arguments params |
+| `functions_mcp_search_declaration_has_query_and_top_k_params` | Search schema: query + top_k params |
+| `functions_mcp_describe_declaration_has_tool_param` | Describe schema: tool param |
+| `functions_supervisor_includes_task_queue_tools` | Task queue: create, list, complete, fail |
+| `tool_result_stores_call_and_output` | ToolResult::new stores both fields |
+
+**Total: 36 new tests (212 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. **ToolCall::dedup keeps the LAST occurrence**: The implementation
+   iterates in reverse, then reverses the result to restore the
+   original order, so when duplicate ids exist the last occurrence
+   wins. My initial tests assumed first-wins behavior; this was
+   caught and corrected during the iteration.
+
+2. **ToolCall::eval requires full RequestContext**: The dispatch
+   routing (`agent__*`, `todo__*`, `user__*`, `mcp_*`, shell
+   fallback) cannot be unit-tested because `eval()` takes
+   `&mut RequestContext` which requires an initialized AppState.
+   The prefix routing is verified indirectly through prefix
+   constant tests and function declaration tests.
+
+3. **Functions::init requires filesystem**: It calls
+   `build_global_tool_declarations` which reads tool files from
+   disk. Can't unit-test without a temp directory with actual
+   tool scripts. Function filtering by `enabled_tools` is thus
+   deferred.
+
+4. **All function declaration appenders are fully testable**: The
+   `append_*` methods on Functions work without I/O and produce
+   the exact function declarations the LLM sees. This is the most
+   important behavioral contract to test.
+
+5. **MCP meta function schemas are critical**: The invoke, search,
+   and describe meta functions each have specific parameter schemas
+   (tool+arguments, query+top_k, tool). Tests verify these schemas
+   exist with correct fields and required params.
+
+6. **ToolCallTracker loop detection has two mechanisms**:
+   - Consecutive repeat detection (same call N times in a row)
+   - Chain detection (same call repeated across the last chain_len
+     entries)
+
+   Both are tested independently.
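
The last-occurrence-wins behavior from observation 1 can be sketched as follows. This is an illustration of the reverse-then-reverse technique, not the project's `ToolCall::dedup`: the `Call` struct, its `new` constructor, and the free function `dedup_keep_last` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down stand-in for `ToolCall`.
#[derive(Debug, Clone, PartialEq)]
struct Call {
    name: String,
    id: Option<String>,
}

impl Call {
    fn new(name: &str, id: Option<&str>) -> Self {
        Self {
            name: name.to_string(),
            id: id.map(|s| s.to_string()),
        }
    }
}

/// Keep the last call for each id, keep every call without an id,
/// and preserve the relative order of the survivors.
fn dedup_keep_last(calls: Vec<Call>) -> Vec<Call> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut kept: Vec<Call> = Vec::new();

    // Walking backwards means the first time an id is seen is its
    // last occurrence in the original order.
    for call in calls.into_iter().rev() {
        let is_new = match &call.id {
            Some(id) => seen.insert(id.clone()),
            None => true, // calls without an id are never deduplicated
        };
        if is_new {
            kept.push(call);
        }
    }

    // Undo the reversal so survivors appear in their original order.
    kept.reverse();
    kept
}
```

For an input of `search(a)`, `read(b)`, `note(no id)`, `search_v2(a)`, the survivors are `read`, `note`, `search_v2`: the earlier call with id `a` is dropped and the later one keeps its position.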
+
+## Next iteration
+
+Plan file 07: Input Construction — Input::from_str, from_files,
+field capturing, function selection.
@@ -0,0 +1,97 @@
+# Iteration 7 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/07-input-construction.md`
+
+## Tests created
+
+### src/config/input.rs (31 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `resolve_role_with_explicit_role` | Explicit role returned, with_session/agent false |
+| `resolve_role_without_role_no_session_no_agent` | Default role, both flags false |
+| `resolve_role_without_role_with_session` | with_session true when session present |
+| `resolve_role_explicit_role_overrides_session_flag` | Explicit role forces with_session=false |
+| `resolve_paths_detects_last_reply_syntax` | %% sets with_last_reply=true |
+| `resolve_paths_detects_url` | https:// classified as remote URL |
+| `resolve_paths_detects_external_command` | Backtick-wrapped → external command |
+| `resolve_paths_empty_input` | Empty vec → all empty, no last reply |
+| `resolve_paths_rejects_url_with_glob_suffix` | URL** → error |
+| `resolve_paths_mixed_inputs` | %% + URL + cmd all detected |
+| `input_from_str_captures_text` | Text stored correctly |
+| `input_from_str_with_explicit_role` | Role name captured |
+| `input_from_str_captures_stream_from_config` | stream=false from config |
+| `input_is_empty_with_no_text_and_no_medias` | Empty text + no medias = empty |
+| `input_is_not_empty_with_text` | Text present = not empty |
+| `input_set_text_changes_text` | set_text updates text |
+| `input_text_returns_patched_when_set` | Patched text overrides |
+| `input_clear_patch_restores_original` | clear_patch removes override |
+| `input_set_continue_output_accumulates` | Multiple calls concatenate |
+| `input_set_regenerate_sets_flag_and_clears_tool_calls` | Flag set, tool_calls cleared |
+| `input_summary_truncates_long_text` | >80 chars → truncated with ... |
+| `input_summary_preserves_short_text` | Short text unchanged |
+| `input_raw_with_no_files` | Raw returns just text |
+| `input_render_with_no_medias` | Render returns just text |
+| `input_with_agent_false_when_no_agent` | No agent context → false |
+| `input_session_returns_none_when_with_session_false` | Explicit role → no session access |
+| `input_session_returns_some_when_with_session_true` | Session context → session access |
+| `is_image_recognizes_image_extensions` | png/jpeg/jpg/webp/gif recognized |
+| `is_image_rejects_non_image_extensions` | txt/rs/pdf rejected |
+| `resolve_data_url_returns_path_for_known_hash` | Hash lookup returns path |
+| `resolve_data_url_returns_original_for_non_data_url` | Non-data URL returned as-is |
+
+### src/config/request_context.rs (7 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `select_functions_returns_none_when_no_tools_enabled` | No enabled_tools → None |
+| `select_functions_returns_none_when_function_calling_disabled` | function_calling_support=false → None |
+| `select_functions_all_enabled_tools_returns_all_non_mcp` | "all" → all non-MCP declarations |
+| `select_functions_comma_separated_filters` | Comma list → matching subset |
+| `select_enabled_mcp_servers_returns_empty_when_mcp_disabled` | mcp_server_support=false → empty |
+| `select_enabled_mcp_servers_all_returns_all_mcp_functions` | "all" → all MCP functions |
+| `select_enabled_mcp_servers_comma_filters` | Server name → only that server's 3 functions |
+
+**Total: 38 new tests (250 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. **Input::from_files is async and I/O-heavy**: It fetches URLs,
+   reads files from disk, expands globs, and runs external commands.
+   Full testing requires integration tests with temp files/dirs.
+
+2. **resolve_role with agent**: Testing requires an initialized
+   Agent (which needs config files on disk). The agent path is
+   tested only indirectly, through the existing
+   `exit_agent_clears_all_agent_state` test from iteration 4.
+
+3. **resolve_paths is a pure function**: No I/O, fully testable.
+   It cleanly separates path classification (URL vs local vs cmd
+   vs loader) from actual loading. Good design for testing.
+
+4. **select_functions has complex filtering**: It filters non-MCP
+   declarations by enabled_tools, then adds user__ functions for
+   non-agent contexts, then merges agent-specific functions. The
+   MCP selection mirrors this with MCP-prefixed declarations.
+   Both paths fully tested.
+
+5. **Input captures state at construction time**: All fields
+   (stream_enabled, session, rag, functions) are captured from
+   RequestContext at Input creation. This snapshot-at-creation
+   pattern means the Input is independent of later context changes.
+
+6. **The %% syntax for last-reply carry-over** is detected in
+   resolve_paths (pure function) but the actual last_reply
+   retrieval happens in from_files (async). Tested the detection
+   part.
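
Observations 3 and 6 describe `resolve_paths` as a pure classifier, which makes it easy to model. The sketch below is a hypothetical approximation based only on the test descriptions above: the `ResolvedPaths` struct, its field names, the `classify_paths` function, and the error type are all assumptions, and the loader category mentioned in observation 3 is left out.

```rust
/// Hypothetical result shape for path classification.
#[derive(Debug, Default, PartialEq)]
struct ResolvedPaths {
    with_last_reply: bool,
    remote_urls: Vec<String>,
    external_cmds: Vec<String>,
    local_paths: Vec<String>,
}

fn is_url(s: &str) -> bool {
    s.starts_with("http://") || s.starts_with("https://")
}

/// Classify raw inputs without touching the network or the filesystem.
fn classify_paths(paths: &[&str]) -> Result<ResolvedPaths, String> {
    let mut out = ResolvedPaths::default();
    for &path in paths {
        if path == "%%" {
            // `%%` only sets a flag here; fetching the reply happens later.
            out.with_last_reply = true;
        } else if path.len() >= 2 && path.starts_with('`') && path.ends_with('`') {
            // Backtick-wrapped input is an external command; strip the ticks.
            out.external_cmds.push(path[1..path.len() - 1].to_string());
        } else if is_url(path) {
            // A glob suffix makes no sense on a URL, so reject it.
            if path.ends_with("**") {
                return Err(format!("invalid URL with glob suffix: {path}"));
            }
            out.remote_urls.push(path.to_string());
        } else {
            out.local_paths.push(path.to_string());
        }
    }
    Ok(out)
}
```

Keeping classification separate from loading is what lets the `%%` detection be tested without the async `from_files` path.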
+
+## Next iteration
+
+Plan file 08: Request Context — RequestContext methods, scope
+transitions, state management.
@@ -0,0 +1,69 @@
+# Iteration 8 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/08-request-context.md`
+
+## Tests created
+
+### src/config/request_context.rs (22 new tests, 51 total in file)
+
+| Test name | What it verifies |
+|---|---|
+| `state_empty_context` | Empty context → empty StateFlags |
+| `state_with_role_only` | Role set → ROLE flag |
+| `state_with_empty_session` | Empty session → SESSION_EMPTY flag |
+| `state_flags_combine_role_and_session` | Multiple flags combine correctly |
+| `role_info_errors_when_no_role` | No role → error |
+| `role_info_succeeds_with_role` | Role present → exports prompt |
+| `agent_info_errors_when_no_agent` | No agent → error |
+| `rag_info_errors_when_no_rag` | No RAG → error |
+| `use_role_obj_errors_when_agent_active` | Agent blocks role assignment |
+| `exit_rag_clears_rag` | exit_rag() sets rag to None |
+| `discontinuous_last_message_sets_continuous_false` | Marks last message non-continuous |
+| `discontinuous_last_message_noop_when_none` | No last message → no-op |
+| `before_chat_completion_sets_last_message` | Creates LastMessage with empty output |
+| `role_like_mut_returns_none_when_empty` | No active scope → None |
+| `role_like_mut_returns_role_when_only_role` | Role only → returns role |
+| `role_like_mut_prefers_session_over_role` | Session takes priority |
+| `working_mode_cmd` | CMD mode flags correct |
+| `working_mode_repl` | REPL mode flags correct |
+| `session_file_returns_yaml_path` | Correct .yaml suffix |
+| `session_file_with_subdir` | subdir/name → nested path |
+| `is_compressing_session_false_when_no_session` | No session → false |
+| `is_compressing_session_false_with_default_session` | Default session → false |
+
+**Total: 22 new tests (272 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. **Rag struct has no Default**: Rag requires an AppConfig, name,
+   embedding model, and HNSW index. Can't create test instances
+   without heavy setup. RAG-related state tests (state with RAG,
+   exit_rag with actual RAG) deferred.
+
+2. **role_like_mut priority is session > agent > role > None**:
+   The session-over-role priority is verified. Agent priority
+   can't be easily tested without agent init (filesystem).
+
+3. **StateFlags is a bitflags type**: Tested empty, individual
+   flags (ROLE, SESSION_EMPTY), and combinations. The SESSION
+   flag (non-empty session) requires adding messages to a session
+   which needs more setup — deferred.
+
+4. **info() and sysinfo() require model provider config**: These
+   format system info strings that include model details. Testing
+   requires a valid model provider configuration.
+
+5. **The RequestContext test file now has 51 tests** spanning
+   iterations 1, 3, 4, 5, 7, and 8. It's the most heavily tested
+   module, which matches its role as the central state container.
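
The priority order from observation 2 (session > agent > role > None) can be modeled without any agent initialization. The sketch below is a read-only, hypothetical analogue of `role_like_mut`: the `Ctx` struct, the `ActiveScope` enum, and the use of plain names instead of real role, session, and agent objects are assumptions made for illustration.

```rust
/// Hypothetical context that stores only the names of active scopes.
#[derive(Debug, Default)]
struct Ctx {
    role: Option<String>,
    session: Option<String>,
    agent: Option<String>,
}

/// Which scope currently "owns" role-like settings.
#[derive(Debug, PartialEq)]
enum ActiveScope<'a> {
    Session(&'a str),
    Agent(&'a str),
    Role(&'a str),
}

impl Ctx {
    /// Resolve the active scope with priority session > agent > role.
    fn active_scope(&self) -> Option<ActiveScope<'_>> {
        if let Some(name) = self.session.as_deref() {
            Some(ActiveScope::Session(name))
        } else if let Some(name) = self.agent.as_deref() {
            Some(ActiveScope::Agent(name))
        } else if let Some(name) = self.role.as_deref() {
            Some(ActiveScope::Role(name))
        } else {
            None
        }
    }
}
```

In the real code an active agent blocks role assignment, so some combinations in this model may never occur in practice; the sketch only shows the lookup order.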
+
+## Next iteration
+
+Plan file 09: REPL Commands — REPL command handlers, state
+assertions, argument parsing.
@@ -0,0 +1,90 @@
+# Iteration 9 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/09-repl-commands.md`
+
+## Tests created
+
+### src/config/mod.rs (8 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `assert_state_pass_always_true` | pass() true for all flag combos |
+| `assert_state_bare_only_empty` | bare() only matches empty |
+| `assert_state_true_requires_flag_present` | True requires any match |
+| `assert_state_true_with_multiple_flags_any_match` | OR semantics for True flags |
+| `assert_state_false_requires_flag_absent` | False requires all absent |
+| `assert_state_false_with_multiple_flags` | Multiple False flags all checked |
+| `assert_state_truefalse_requires_true_present_and_false_absent` | Both conditions |
+| `assert_state_equal_exact_match` | Exact flag equality |
+
+### src/repl/mod.rs (31 new tests, 33 total in file)
+
+| Test name | What it verifies |
+|---|---|
+| `repl_commands_has_39_entries` | Array size |
+| `repl_commands_all_start_with_dot` | All commands dotted |
+| `repl_commands_no_empty_descriptions` | All have descriptions |
+| `repl_commands_help_is_always_available` | .help → pass |
+| `repl_commands_exit_is_always_available` | .exit → pass |
+| `repl_commands_info_role_requires_role` | .info role → True(ROLE) |
+| `repl_commands_session_blocked_when_already_in_session` | .session → False(SESSION) |
+| `repl_commands_exit_session_requires_session` | .exit session → True(SESSION) |
+| `repl_commands_exit_agent_requires_agent` | .exit agent → True(AGENT) |
+| `repl_commands_agent_only_when_bare` | .agent → Equal(empty) |
+| `repl_commands_role_blocked_in_session_or_agent` | .role → False(SESSION\|AGENT) |
+| `repl_commands_prompt_blocked_in_session_or_agent` | .prompt → False(SESSION\|AGENT) |
+| `repl_commands_rag_blocked_in_agent` | .rag → False(AGENT) |
+| `repl_commands_starter_requires_agent` | .starter → True(AGENT) |
+| `repl_commands_clear_todo_requires_agent` | .clear todo → True(AGENT) |
+| `repl_commands_edit_role_requires_role_not_session` | .edit role → TrueFalse |
+| `repl_commands_exit_rag_requires_rag_not_agent` | .exit rag → TrueFalse |
+| `parse_command_plain_text_returns_none` | Plain text → None |
+| `parse_command_empty_returns_none` | Empty → None |
+| `parse_command_whitespace_only_returns_none` | Whitespace → None |
+| `parse_command_dot_only` | Single dot → (".", None) |
+| `split_first_arg_none_input` | None → None |
+| `split_first_arg_single_word` | "role" → ("role", None) |
+| `split_first_arg_two_words` | "role x" → ("role", Some("x")) |
+| `split_first_arg_with_extra_spaces` | Extra spaces trimmed |
+| `repl_command_is_valid_pass_always_true` | pass → always valid |
+| `repl_command_is_valid_respects_true` | True → enforced |
+| `repl_command_is_valid_respects_false` | False → enforced |
+| `multiline_regex_captures_content_between_markers` | :::content::: captured |
+| `multiline_regex_does_not_match_single_marker` | Unclosed → no match |
+| `multiline_regex_does_not_match_plain_text` | Plain text → no match |
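
The `split_first_arg_*` rows in the table above suggest a small pure helper. One possible shape is sketched below; the signature and the treatment of an empty or whitespace-only string as `None` are assumptions, since the notes only list the four tested cases.

```rust
/// Hypothetical sketch: split "head rest..." into its first word and
/// the (trimmed) remainder, if any.
fn split_first_arg(args: Option<&str>) -> Option<(&str, Option<&str>)> {
    // `None` input propagates; surrounding whitespace is ignored.
    let args = args?.trim();
    if args.is_empty() {
        // Assumption: an empty argument string behaves like `None`.
        return None;
    }
    match args.split_once(char::is_whitespace) {
        Some((head, rest)) => {
            let rest = rest.trim();
            Some((head, if rest.is_empty() { None } else { Some(rest) }))
        }
        // A single word has no remainder.
        None => Some((args, None)),
    }
}
```

With this shape, `"role x"` yields `("role", Some("x"))` and extra spaces between or around the words are trimmed away.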
+
+**Total: 39 new tests (311 total in suite)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. **AssertState has 4 variants with distinct semantics**:
+   - True: any of the required flags must be present (OR)
+   - False: all of the forbidden flags must be absent (AND)
+   - TrueFalse: the True condition and the False condition must
+     both hold
+   - Equal: exact flag match
+
+   This is a critical invariant for REPL command availability.
+
+2. **The .agent command uses AssertState::bare()** (Equal(empty)),
+   meaning it's only available when NO other scope is active. This
+   is stricter than False — it requires exactly empty state.
+
+3. **All 39 REPL commands** have correct dot prefixes and non-empty
+   descriptions. Verified as structural invariants.
+
+4. **The multiline ::: syntax** is handled by a regex that requires
+   both opening and closing markers. The ReplValidator marks
+   single-marker input as Incomplete for the line editor.
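
The four `AssertState` semantics from observation 1 fit in a few lines. The sketch below is a hypothetical model: it uses a plain `u8` with hand-written constants instead of the real bitflags `StateFlags` type, the method name `holds` is invented, and modeling `pass()` as `False` with no forbidden flags is an assumption (the notes only confirm that `bare()` is `Equal(empty)`).

```rust
/// Hypothetical stand-in for the bitflags-based `StateFlags`.
type StateFlags = u8;
const ROLE: StateFlags = 1 << 0;
const SESSION: StateFlags = 1 << 1;
const AGENT: StateFlags = 1 << 2;

#[derive(Debug, Clone, Copy, PartialEq)]
enum AssertState {
    /// At least one of these flags must be set (OR).
    True(StateFlags),
    /// None of these flags may be set (AND over absence).
    False(StateFlags),
    /// Both conditions at once: (required-any, forbidden-all).
    TrueFalse(StateFlags, StateFlags),
    /// The state must match exactly.
    Equal(StateFlags),
}

impl AssertState {
    /// Always satisfied. Assumed encoding: nothing is forbidden.
    fn pass() -> Self {
        AssertState::False(0)
    }

    /// Satisfied only when no scope is active.
    fn bare() -> Self {
        AssertState::Equal(0)
    }

    /// Does the current state satisfy this assertion?
    fn holds(self, state: StateFlags) -> bool {
        match self {
            AssertState::True(any) => (state & any) != 0,
            AssertState::False(none) => (state & none) == 0,
            AssertState::TrueFalse(any, none) => {
                (state & any) != 0 && (state & none) == 0
            }
            AssertState::Equal(exact) => state == exact,
        }
    }
}
```

Under this model `.role` as `False(SESSION | AGENT)` is available with only a role active, while `.agent` as `bare()` is rejected as soon as any flag is set, which matches the "stricter than False" remark in observation 2.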
+
+5. **Command handler tests** (the actual .role, .session, .agent
+   implementations) require full async RequestContext with
+   filesystem access. These are integration tests and are deferred.
+
+## Next iteration
+
+Check `TEST-IMPLEMENTATION-PLAN.md` for which plan file comes next.