testing

2026-04-15 12:56:00 -06:00
parent ff3419a714
commit 63b6678e73
82 changed files with 14800 additions and 3310 deletions
@@ -0,0 +1,52 @@
+# Iteration 1 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/01-config-and-appconfig.md`
+
+## Tests created
+
+| File | Test name | What it verifies |
+|---|---|---|
+| `src/config/mod.rs` | `config_defaults_match_expected` | All Config::default() fields match old code values |
+| `src/config/app_config.rs` | `to_app_config_copies_serialized_fields` | to_app_config copies model_id, temperature, top_p, dry_run, stream, save, highlight, compression_threshold, rag_top_k |
+| `src/config/app_config.rs` | `to_app_config_copies_clients` | clients field populated (empty by default) |
+| `src/config/app_config.rs` | `to_app_config_copies_mapping_fields` | mapping_tools and mapping_mcp_servers copied correctly |
+| `src/config/app_config.rs` | `editor_returns_configured_value` | editor() returns configured value |
+| `src/config/app_config.rs` | `editor_falls_back_to_env` | editor() doesn't panic without config |
+| `src/config/app_config.rs` | `light_theme_default_is_false` | light_theme() default |
+| `src/config/app_config.rs` | `sync_models_url_has_default` | sync_models_url() has non-empty default |
+| `src/config/request_context.rs` | `to_request_context_creates_clean_state` | RequestContext starts with clean state (no role/session/agent, empty tool_scope, no agent_runtime) |
+| `src/config/request_context.rs` | `update_app_config_persists_changes` | Dynamic config updates via clone-mutate-replace persist |
+
+**Total: 10 new tests (59 → 69)**
+
+## Bugs discovered
+
+None. The `save` default was `false` in both old and new code
+(my plan file incorrectly said `true` — corrected).
+
+## Observations for future iterations
+
+1. `Config::default().save` is `false`, but plan file 01
+   incorrectly listed it as `true`. The plan file should be
+   updated to reflect the actual default.
+
+2. `AppConfig::default()` doesn't exist natively (no derive).
+   Tests construct it via `Config::default().to_app_config()`.
+   This is fine since that's how it's created in production.
+
+3. The `visible_tools` field computation happens during
+   `Config::init` (not `to_app_config`). Testing the full
+   visible_tools resolution requires integration-level testing
+   with actual tool files. Deferred to plan file 16
+   (functions-and-tools).
+
+4. Testing `Config::init` directly is difficult because it reads
+   from the filesystem, starts MCP servers, etc. The unit tests
+   focus on the conversion paths which are the Phase 1 surface.
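
The clone-mutate-replace update that `update_app_config_persists_changes` exercises might look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the two-field `AppConfig`, the `Arc` wrapper, and the closure-based `update_app_config` signature are all hypothetical stand-ins.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; the real AppConfig and RequestContext have many more fields.
#[derive(Clone, Debug, PartialEq)]
struct AppConfig {
    save: bool,
    temperature: Option<f64>,
}

struct RequestContext {
    // Assumption: the config is shared behind an Arc.
    app_config: Arc<AppConfig>,
}

impl RequestContext {
    // Clone the shared config, mutate the copy, then replace the shared handle.
    fn update_app_config(&mut self, mutate: impl FnOnce(&mut AppConfig)) {
        let mut next = (*self.app_config).clone();
        mutate(&mut next);
        self.app_config = Arc::new(next);
    }
}

#[test]
fn update_persists_and_leaves_old_handle_untouched() {
    let original = Arc::new(AppConfig { save: false, temperature: None });
    let mut ctx = RequestContext { app_config: Arc::clone(&original) };

    ctx.update_app_config(|cfg| cfg.temperature = Some(0.7));

    // The context observes the new value.
    assert_eq!(ctx.app_config.temperature, Some(0.7));
    // Anyone still holding the old Arc keeps the old value.
    assert_eq!(original.temperature, None);
}
```

One reason to prefer this shape is that readers holding the previous handle never see a half-applied update.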
+
+## Next iteration
+
+Plan file 02: Roles — role loading, retrieve_role, use_role/exit_role,
+use_prompt, extract_role, one-shot role messages, MCP context switching.
@@ -0,0 +1,71 @@
+# Iteration 2 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/02-roles.md`
+
+## Tests created
+
+### src/config/role.rs (12 new tests, 15 total)
+
+| Test name | What it verifies |
+|---|---|
+| `role_new_parses_prompt` | Role::new extracts prompt text |
+| `role_new_parses_metadata` | Metadata block parses model, temperature, top_p |
+| `role_new_parses_enabled_tools` | enabled_tools from metadata |
+| `role_new_parses_enabled_mcp_servers` | enabled_mcp_servers from metadata |
+| `role_new_no_metadata_has_none_fields` | No metadata → all optional fields None |
+| `role_builtin_shell_loads` | Built-in "shell" role loads |
+| `role_builtin_code_loads` | Built-in "code" role loads |
+| `role_builtin_nonexistent_errors` | Non-existent built-in → error |
+| `role_default_has_empty_fields` | Default role has empty name/prompt |
+| `role_set_model_updates_model` | set_model() changes the model |
+| `role_set_temperature_works` | set_temperature() changes temperature |
+| `role_export_includes_metadata` | export() includes metadata and prompt |
+
+### src/config/request_context.rs (5 new tests, 7 total)
+
+| Test name | What it verifies |
+|---|---|
+| `use_role_obj_sets_role` | use_role_obj sets role on ctx |
+| `exit_role_clears_role` | exit_role clears role from ctx |
+| `use_prompt_creates_temp_role` | use_prompt creates TEMP_ROLE_NAME role |
+| `extract_role_returns_standalone_role` | extract_role returns active role |
+| `extract_role_returns_default_when_nothing_active` | extract_role returns default role |
+
+**Total: 17 new tests (69 → 86)**
+
+## Bugs discovered
+
+None. Role parsing behavior matches between old and new code.
+
+## Observations for future iterations
+
+1. `retrieve_role` (which calls `Model::retrieve_model`) can't be
+   easily unit-tested without a real client config. It depends on
+   having at least one configured client. Deferred to integration
+   testing or plan 08 (RequestContext scope transitions).
+
+2. The `use_role` async method (which calls `rebuild_tool_scope`)
+   requires async test runtime and MCP infrastructure. Deferred to
+   plan 05 (MCP lifecycle) and 08 (RequestContext).
+
+3. `use_role_obj` rejects the call when an agent is active. This is
+   currently covered only implicitly through the error path, since
+   creating a mock Agent is complex. Noted for plan 04 (agents).
+
+4. The `extract_role` priority order (session > agent > role > default)
+   is an important behavioral contract. Tests verify the role and
+   default cases. Session and agent cases deferred to plans 03, 04.
+
+5. Added `create_test_ctx()` helper to request_context.rs tests.
+   Future iterations should reuse this.
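
The `extract_role` priority order (session > agent > role > default) can be sketched as a chain of fallbacks. This is a minimal illustration, not the real implementation: every type here is a simplified, hypothetical stand-in, and `Agent::to_role` in particular is an assumption.

```rust
// Hypothetical, heavily simplified stand-ins for the real types.
#[derive(Clone, Debug, Default, PartialEq)]
struct Role {
    name: String,
    prompt: String,
}

struct Session {
    role: Role,
}

struct Agent {
    role: Role,
}

impl Session {
    fn to_role(&self) -> Role {
        self.role.clone()
    }
}

impl Agent {
    // Assumed helper, mirroring Session::to_role.
    fn to_role(&self) -> Role {
        self.role.clone()
    }
}

#[derive(Default)]
struct RequestContext {
    session: Option<Session>,
    agent: Option<Agent>,
    role: Option<Role>,
}

impl RequestContext {
    // Priority: session > agent > role > default.
    fn extract_role(&self) -> Role {
        if let Some(session) = &self.session {
            session.to_role()
        } else if let Some(agent) = &self.agent {
            agent.to_role()
        } else if let Some(role) = &self.role {
            role.clone()
        } else {
            Role::default()
        }
    }
}

#[test]
fn extract_role_follows_priority_order() {
    let named = |n: &str| Role { name: n.to_string(), prompt: String::new() };
    let mut ctx = RequestContext::default();
    assert_eq!(ctx.extract_role(), Role::default());

    ctx.role = Some(named("role"));
    assert_eq!(ctx.extract_role().name, "role");

    ctx.agent = Some(Agent { role: named("agent") });
    assert_eq!(ctx.extract_role().name, "agent");

    ctx.session = Some(Session { role: named("session") });
    assert_eq!(ctx.extract_role().name, "session");
}
```

Layering the cases one at a time, as the test does, checks each step of the ordering rather than only the two endpoints.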
+
+## Plan file updates
+
+Updated 02-roles.md to mark completed items.
+
+## Next iteration
+
+Plan file 03: Sessions — session create/load/save, compression,
+autoname, carry-over, exit, context switching.
@@ -0,0 +1,76 @@
+# Iteration 3 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/03-sessions.md`
+
+## Tests created
+
+### src/config/session.rs (15 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `session_default_is_empty` | Default session is empty, no name, no role, not dirty |
+| `session_new_from_ctx_captures_save_session` | new_from_ctx captures the name; the session starts empty and not dirty |
+| `session_set_role_captures_role_info` | set_role copies model_id, temperature, role_name, marks dirty |
+| `session_clear_role` | clear_role removes role_name |
+| `session_guard_empty_passes_when_empty` | guard_empty OK when empty |
+| `session_needs_compression_threshold` | Empty session doesn't need compression |
+| `session_needs_compression_returns_false_when_compressing` | Already compressing → false |
+| `session_needs_compression_returns_false_when_threshold_zero` | Zero threshold → false |
+| `session_set_compressing_flag` | set_compressing toggles flag |
+| `session_set_save_session_this_time` | Doesn't panic |
+| `session_save_session_returns_configured_value` | save_session get/set roundtrip |
+| `session_compress_moves_messages` | compress moves messages to compressed_messages and adds a system message |
+| `session_is_not_empty_after_compress` | Session with compressed messages is not empty |
+| `session_need_autoname_default_false` | Default session doesn't need autoname |
+| `session_set_autonaming_doesnt_panic` | set_autonaming safe without autoname |
+
+### src/config/request_context.rs (4 new tests, 11 total)
+
+| Test name | What it verifies |
+|---|---|
+| `exit_session_clears_session` | exit_session removes session from ctx |
+| `empty_session_clears_messages` | empty_session keeps session but clears it |
+| `maybe_compress_session_returns_false_when_no_session` | No session → no compression |
+| `maybe_autoname_session_returns_false_when_no_session` | No session → no autoname |
+
+**Total: 19 new tests (86 → 105)**
+
+## Bugs discovered
+
+None. Session behavior matches between old and new code.
+
+## Observations for future iterations
+
+1. `Session::new_from_ctx` and `Session::load_from_ctx` have
+   `#[allow(dead_code)]` annotations because they were bridge
+   methods. A future pass should verify whether they are still
+   needed, or whether the old `Session::new` and `Session::load`
+   (which take `&Config`) should be cleaned up instead.
+
+2. The `compress` method moves messages to `compressed_messages` and
+   adds a single system message with the summary. This is a critical
+   behavioral contract — if the summary format changes, sessions
+   could break.
+
+3. `needs_compression` uses `self.compression_threshold`
+   (session-level) with fallback to the global threshold. This
+   priority (session > global) is important behavior.
+
+4. Session carry-over (the "incorporate last Q&A?" prompt) happens
+   inside `use_session` which is async and involves user interaction
+   (inquire::Confirm). Can't unit test this — needs integration test
+   or manual verification.
+
+5. The `extract_role` test for session-active case should verify that
+   `session.to_role()` is returned. Added note to plan 02.
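
The compression behavior described in observations 2 and 3 can be sketched as follows. Only the field names `compression_threshold` and `compressed_messages` and the overall behavior come from the notes. The `Message` type, the byte-length size measure (a stand-in for real token counting), the `>=` comparison, and the `global_threshold` parameter are assumptions made for illustration.

```rust
// Hypothetical message type; the real one is richer.
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: &'static str,
    content: String,
}

#[derive(Default)]
struct Session {
    compression_threshold: Option<usize>, // session-level override
    compressing: bool,
    messages: Vec<Message>,
    compressed_messages: Vec<Message>,
}

impl Session {
    // Assumed size measure: total bytes of live message content.
    fn size(&self) -> usize {
        self.messages.iter().map(|m| m.content.len()).sum()
    }

    fn is_empty(&self) -> bool {
        self.messages.is_empty() && self.compressed_messages.is_empty()
    }

    fn needs_compression(&self, global_threshold: usize) -> bool {
        if self.compressing {
            return false; // a compression is already in flight
        }
        // The session-level threshold wins over the global one.
        let threshold = self.compression_threshold.unwrap_or(global_threshold);
        if threshold == 0 {
            return false; // zero disables compression
        }
        self.size() >= threshold
    }

    // Move live messages aside and leave a single system message with the summary.
    fn compress(&mut self, summary: String) {
        self.compressed_messages.append(&mut self.messages);
        self.messages.push(Message { role: "system", content: summary });
    }
}

#[test]
fn session_threshold_overrides_global() {
    let mut s = Session::default();
    assert!(!s.needs_compression(100)); // empty session

    s.messages.push(Message { role: "user", content: "x".repeat(50) });
    assert!(!s.needs_compression(100)); // below the global threshold

    s.compression_threshold = Some(10);
    assert!(s.needs_compression(100)); // session threshold wins

    s.compress("summary".to_string());
    assert_eq!(s.messages.len(), 1);
    assert_eq!(s.messages[0].role, "system");
    assert!(!s.is_empty());
}
```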
+
+## Plan file updates
+
+Updated 03-sessions.md to mark completed items.
+
+## Next iteration
+
+Plan file 04: Agents — agent init, tool compilation, variables,
+lifecycle, MCP, RAG, auto-continuation.
@@ -0,0 +1,71 @@
+# Iteration 4 — Test Implementation Notes
+
+## Plan file addressed
+
+`docs/testing/plans/04-agents.md`
+
+## Tests created
+
+### src/config/agent.rs (4 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `agent_config_parses_from_yaml` | Full AgentConfig YAML with all fields |
+| `agent_config_defaults` | Minimal AgentConfig gets correct defaults |
+| `agent_config_with_model` | model_id, temperature, top_p from YAML |
+| `agent_config_inject_defaults_true` | inject_todo/spawn_instructions default true |
+
+### src/config/agent_runtime.rs (2 new tests)
+
+| Test name | What it verifies |
+|---|---|
+| `agent_runtime_new_defaults` | All fields default correctly |
+| `agent_runtime_builder_pattern` | with_depth, with_parent_supervisor work |
+
+### src/config/request_context.rs (6 new tests, 17 total)
+
+| Test name | What it verifies |
+|---|---|
+| `exit_agent_clears_all_agent_state` | exit_agent clears agent, agent_runtime, rag |
+| `current_depth_returns_zero_without_agent` | Default depth is 0 |
+| `current_depth_returns_agent_runtime_depth` | Depth from agent_runtime |
+| `supervisor_returns_none_without_agent` | No agent → no supervisor |
+| `inbox_returns_none_without_agent` | No agent → no inbox |
+| `root_escalation_queue_returns_none_without_agent` | No agent → no queue |
+
+**Total: 12 new tests (105 → 117)**
+
+## Bugs discovered
+
+None.
+
+## Observations for future iterations
+
+1. `Agent::init` can't be unit tested easily because it requires
+   agent config files and tool files on disk. Integration tests
+   with temp directories would be needed for full coverage.
+
+2. AgentConfig default values verified:
+
+   - `max_concurrent_agents` = 4
+   - `max_agent_depth` = 3
+   - `max_auto_continues` = 10
+   - `inject_todo_instructions` = true
+   - `inject_spawn_instructions` = true
+
+   These are important behavioral contracts.
+
+3. The `exit_agent` test shows that clearing agent state also
+   rebuilds the tool_scope with fresh functions. This is the
+   correct behavior for returning to the global context.
+
+4. Agent variable interpolation (special vars like `__os__`, `__cwd__`)
+   happens in `Agent::init`, which is filesystem-dependent. Deferred.
+
+5. `list_agents()` (which filters hidden dirs) is tested via the
+   `.shared` exclusion noted in improvements. Could add a unit test
+   with a temp dir if needed.
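
The default values in observation 2 and the depth accessor covered by the `current_depth_*` tests can be pinned down in a sketch like the one below. Only the five default values and the names `with_depth` and `current_depth` come from the notes. The struct layouts, the integer types, and the hand-written `Default` impl are assumptions; the real code parses `AgentConfig` from YAML and may supply its defaults differently.

```rust
// Hypothetical layout; only the default values are taken from the notes.
#[derive(Clone, Debug, PartialEq)]
struct AgentConfig {
    max_concurrent_agents: usize,
    max_agent_depth: usize,
    max_auto_continues: usize,
    inject_todo_instructions: bool,
    inject_spawn_instructions: bool,
}

impl Default for AgentConfig {
    fn default() -> Self {
        Self {
            max_concurrent_agents: 4,
            max_agent_depth: 3,
            max_auto_continues: 10,
            inject_todo_instructions: true,
            inject_spawn_instructions: true,
        }
    }
}

// Simplified runtime holding only the nesting depth.
#[derive(Debug, Default)]
struct AgentRuntime {
    current_depth: usize,
}

impl AgentRuntime {
    fn new() -> Self {
        Self::default()
    }

    // Builder-style setter: consumes and returns self.
    fn with_depth(mut self, depth: usize) -> Self {
        self.current_depth = depth;
        self
    }
}

#[derive(Default)]
struct RequestContext {
    agent_runtime: Option<AgentRuntime>,
}

impl RequestContext {
    // Depth is 0 when no agent is active.
    fn current_depth(&self) -> usize {
        self.agent_runtime.as_ref().map_or(0, |rt| rt.current_depth)
    }
}

#[test]
fn defaults_and_depth() {
    let cfg = AgentConfig::default();
    assert_eq!(cfg.max_concurrent_agents, 4);
    assert_eq!(cfg.max_agent_depth, 3);
    assert_eq!(cfg.max_auto_continues, 10);
    assert!(cfg.inject_todo_instructions && cfg.inject_spawn_instructions);

    let mut ctx = RequestContext::default();
    assert_eq!(ctx.current_depth(), 0);
    ctx.agent_runtime = Some(AgentRuntime::new().with_depth(2));
    assert_eq!(ctx.current_depth(), 2);
}
```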
+
+## Next iteration
+
+Plan file 05: MCP Lifecycle — the most critical test area. McpFactory,
+McpRuntime, spawn_mcp_server, rebuild_tool_scope MCP integration,
+scope transition MCP behavior.