5.0 KiB
Iteration 6 — Test Implementation Notes
Plan file addressed
docs/testing/plans/06-tool-evaluation.md
Tests created
src/function/mod.rs (36 new tests)
| Test name | What it verifies |
|---|---|
toolcall_new_sets_fields |
ToolCall::new sets name, arguments, id |
toolcall_default_has_empty_fields |
Default ToolCall has empty/null fields |
toolcall_with_thought_signature |
with_thought_signature sets value |
toolcall_with_thought_signature_none |
with_thought_signature(None) clears |
dedup_removes_duplicate_ids_keeps_last |
Duplicate ids → last occurrence kept |
dedup_keeps_unique_ids |
Unique ids → all kept |
dedup_keeps_calls_without_ids |
No-id calls always kept |
dedup_preserves_last_occurrence_order |
Ordering based on last occurrence position |
dedup_empty_input_returns_empty |
Empty vec → empty result |
dedup_mixed_with_and_without_ids |
Mixed id/no-id dedup behavior |
tracker_default_values |
Default max_repeats=2, chain_len=3 |
tracker_no_loop_on_fresh_tracker |
Fresh tracker returns None |
tracker_no_loop_below_threshold |
Below max_repeats → no loop |
tracker_detects_loop_at_max_repeats |
At max_repeats → loop detected |
tracker_different_args_no_loop |
Different args break loop detection |
tracker_different_names_no_loop |
Different names break loop detection |
tracker_chain_detection |
Chain of identical calls detected |
tracker_record_call_respects_capacity |
Capacity bounded by chain_len * max_repeats |
tracker_loop_message_contains_call_history |
Loop message includes call_history JSON |
prefix_constants_are_correct |
All 6 prefixes: todo__, agent__, user__, mcp_invoke/search/describe |
functions_default_is_empty |
Default Functions has no declarations |
functions_append_todo_adds_declarations |
5 todo tools: init, add, done, list, clear |
functions_append_supervisor_adds_declarations |
Supervisor: spawn, check, collect, list, cancel, reply |
functions_append_teammate_adds_declarations |
Teammate: send_message, check_inbox |
functions_append_user_interaction_adds_declarations |
User: ask, confirm, input, checkbox |
functions_append_mcp_meta_creates_three_per_server |
3 MCP meta functions per server |
functions_append_mcp_meta_multiple_servers |
Multiple servers → 3 each |
functions_append_mcp_meta_empty_servers |
Empty servers → no declarations |
functions_find_returns_declaration |
find() returns matching declaration |
functions_find_returns_none_for_missing |
find() returns None for unknown |
functions_contains_true_for_existing |
contains() true for known function |
functions_contains_false_for_missing |
contains() false for unknown |
functions_mcp_invoke_declaration_has_tool_and_arguments_params |
Invoke schema: tool + arguments params |
functions_mcp_search_declaration_has_query_and_top_k_params |
Search schema: query + top_k params |
functions_mcp_describe_declaration_has_tool_param |
Describe schema: tool param |
functions_supervisor_includes_task_queue_tools |
Task queue: create, list, complete, fail |
tool_result_stores_call_and_output |
ToolResult::new stores both fields |
Total: 36 new tests (212 total in suite)
Bugs discovered
None.
Observations for future iterations
-
ToolCall::dedup keeps the LAST occurrence: The implementation iterates in reverse and reverses again, so when duplicate ids exist, the last occurrence wins. My initial tests assumed first- wins behavior — caught and corrected during the iteration.
-
ToolCall::eval requires full RequestContext: The dispatch routing (
agent__*,todo__*,user__*,mcp_*, shell fallback) cannot be unit-tested becauseeval()takes&mut RequestContextwhich requires an initialized AppState. The prefix routing is verified indirectly through prefix constant tests and function declaration tests. -
Functions::init requires filesystem: It calls
build_global_tool_declarationswhich reads tool files from disk. Can't unit-test without a temp directory with actual tool scripts. Function filtering byenabled_toolsis thus deferred. -
All function declaration appenders are fully testable: The
append_*methods on Functions work without I/O and produce the exact function declarations the LLM sees. This is the most important behavioral contract to test. -
MCP meta function schemas are critical: The invoke, search, and describe meta functions each have specific parameter schemas (tool+arguments, query+top_k, tool). Tests verify these schemas exist with correct fields and required params.
-
ToolCallTracker loop detection has two mechanisms:
- Consecutive repeat detection (same call N times in a row)
- Chain detection (same call repeated across the last chain_len entries) Both are tested independently.
Next iteration
Plan file 07: Input Construction — Input::from_str, from_files, field capturing, function selection.