# Test Plan: Tool Evaluation ## Feature description When the LLM returns tool calls, `eval_tool_calls` dispatches each call to the appropriate handler. Handlers include: shell tools (bash/python/ts scripts), MCP tools, supervisor tools (agent spawn), todo tools, and user interaction tools. ## Behaviors to test ### eval_tool_calls dispatch - [ ] Calls dispatched to correct handler by function name prefix - [ ] Tool results returned for each call - [ ] Multiple concurrent tool calls processed - [ ] Tool call tracker updated (chain length, repeats) - [ ] Root agent (depth 0) checks escalation queue after eval - [ ] Escalation notifications injected into results ### ToolCall::eval routing - [ ] agent__* → handle_supervisor_tool - [ ] todo__* → handle_todo_tool - [ ] user__* → handle_user_tool (depth 0) or escalate (depth > 0) - [ ] mcp_invoke_* → invoke_mcp_tool - [ ] mcp_search_* → search_mcp_tools - [ ] mcp_describe_* → describe_mcp_tool - [ ] Other → shell tool execution ### Shell tool execution - [ ] Tool binary found and executed - [ ] Arguments passed correctly - [ ] Environment variables set (LLM_OUTPUT, etc.) - [ ] Tool output returned as result - [ ] Tool failure → error returned as tool result (not panic) ### Tool call tracking - [ ] Tracker counts consecutive identical calls - [ ] Max repeats triggers warning - [ ] Chain length tracked across turns - [ ] Tracker state preserved across tool-result loops ### Function selection - [ ] select_functions filters by role's enabled_tools - [ ] select_functions includes MCP meta functions for enabled servers - [ ] select_functions includes agent functions when agent active - [ ] "all" enables all functions - [ ] Comma-separated list enables specific functions ## Context switching scenarios - [ ] Tool calls during agent → agent tools available - [ ] Tool calls during role → role tools available - [ ] Tool calls with MCP → MCP invoke/search/describe work - [ ] No agent → no agent__/todo__ tools in declarations ## Old code reference - `src/function/mod.rs` — eval_tool_calls, ToolCall::eval - `src/function/supervisor.rs` — handle_supervisor_tool - `src/function/todo.rs` — handle_todo_tool - `src/function/user_interaction.rs` — handle_user_tool