testing
@@ -0,0 +1,59 @@
# Test Plan: Tool Evaluation
## Feature description
When the LLM returns tool calls, `eval_tool_calls` dispatches each
call to the appropriate handler. Handlers include shell tools
(bash/python/ts scripts), MCP tools, supervisor tools (agent spawn),
todo tools, and user interaction tools.
## Behaviors to test
### eval_tool_calls dispatch
- [ ] Calls dispatched to the correct handler by function name prefix
- [ ] Tool results returned for each call
- [ ] Multiple concurrent tool calls processed
- [ ] Tool call tracker updated (chain length, repeats)
- [ ] Root agent (depth 0) checks escalation queue after eval
- [ ] Escalation notifications injected into results
### ToolCall::eval routing
- [ ] `agent__*` → `handle_supervisor_tool`
- [ ] `todo__*` → `handle_todo_tool`
- [ ] `user__*` → `handle_user_tool` (depth 0) or escalate (depth > 0)
- [ ] `mcp_invoke_*` → `invoke_mcp_tool`
- [ ] `mcp_search_*` → `search_mcp_tools`
- [ ] `mcp_describe_*` → `describe_mcp_tool`
- [ ] Other → shell tool execution
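
The routing table above can be sketched as a pure function, which makes each checklist row a one-line assertion. This is a minimal sketch, not the code in `src/function/mod.rs`: the `Route` enum, the `route` function, and the idea of passing the agent depth as a plain integer are all assumptions made for illustration.

```rust
/// Hypothetical routing targets; each variant names the handler from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    Supervisor,  // handle_supervisor_tool
    Todo,        // handle_todo_tool
    User,        // handle_user_tool (root agent only)
    Escalate,    // user__* call made by a sub-agent
    McpInvoke,   // invoke_mcp_tool
    McpSearch,   // search_mcp_tools
    McpDescribe, // describe_mcp_tool
    Shell,       // fallback: shell tool execution
}

/// Choose a route from the function name prefix and the agent depth (0 = root).
pub fn route(name: &str, depth: usize) -> Route {
    if name.starts_with("agent__") {
        Route::Supervisor
    } else if name.starts_with("todo__") {
        Route::Todo
    } else if name.starts_with("user__") {
        // Only the root agent talks to the user directly.
        if depth == 0 { Route::User } else { Route::Escalate }
    } else if name.starts_with("mcp_invoke_") {
        Route::McpInvoke
    } else if name.starts_with("mcp_search_") {
        Route::McpSearch
    } else if name.starts_with("mcp_describe_") {
        Route::McpDescribe
    } else {
        Route::Shell
    }
}
```

Keeping the prefix decision separate from handler execution means the routing tests need no LLM, MCP server, or shell environment.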
### Shell tool execution
- [ ] Tool binary found and executed
- [ ] Arguments passed correctly
- [ ] Environment variables set (`LLM_OUTPUT`, etc.)
- [ ] Tool output returned as result
- [ ] Tool failure → error returned as tool result (not panic)
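
The "error returned as tool result (not panic)" behavior can be illustrated with `std::process::Command`. This is a rough sketch under stated assumptions: `run_shell_tool` is a hypothetical helper, the result is assumed to be the tool's stdout, and the `error:` prefix is invented here. How the real code uses `LLM_OUTPUT` is not specified in this plan, so the sketch only sets the variable.

```rust
use std::process::Command;

/// Hypothetical helper: run a tool binary and always return a string result.
/// Failures (missing binary, non-zero exit) become an "error: ..." result
/// instead of a panic.
pub fn run_shell_tool(program: &str, args: &[&str], llm_output: &str) -> String {
    let outcome = Command::new(program)
        .args(args)
        // LLM_OUTPUT is named in the plan; its exact meaning is assumed here.
        .env("LLM_OUTPUT", llm_output)
        .output();

    match outcome {
        // The tool ran and exited successfully: return its stdout.
        Ok(out) if out.status.success() => {
            String::from_utf8_lossy(&out.stdout).into_owned()
        }
        // The tool ran but reported failure: surface status and stderr.
        Ok(out) => format!(
            "error: tool exited with {}: {}",
            out.status,
            String::from_utf8_lossy(&out.stderr).trim()
        ),
        // The tool could not be started at all (for example, binary not found).
        Err(e) => format!("error: failed to start tool `{}`: {}", program, e),
    }
}
```

A test for the failure row can then assert on the returned string, for example that a non-existent binary yields a result starting with `error:`.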
### Tool call tracking
- [ ] Tracker counts consecutive identical calls
- [ ] Max repeats triggers warning
- [ ] Chain length tracked across turns
- [ ] Tracker state preserved across tool-result loops
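
To make the tracking rows concrete, here is a small sketch of a tracker that counts consecutive identical calls and the overall chain length. The struct name, its fields, and the rule that the warning fires once the repeat count reaches `max_repeats` are assumptions for illustration; the real tracker may define "identical" and the threshold differently.

```rust
/// Hypothetical tracker: counts consecutive identical calls and total chain length.
pub struct ToolCallTracker {
    last_call: Option<(String, String)>, // (function name, raw arguments)
    repeats: usize,
    chain_len: usize,
    max_repeats: usize,
}

impl ToolCallTracker {
    pub fn new(max_repeats: usize) -> Self {
        Self { last_call: None, repeats: 0, chain_len: 0, max_repeats }
    }

    /// Record one call. Returns true when a repeat warning should be raised.
    pub fn record(&mut self, name: &str, args: &str) -> bool {
        self.chain_len += 1;
        let is_repeat = matches!(
            &self.last_call,
            Some((n, a)) if n.as_str() == name && a.as_str() == args
        );
        if is_repeat {
            self.repeats += 1;
        } else {
            // A different call resets the consecutive-repeat counter.
            self.repeats = 1;
            self.last_call = Some((name.to_owned(), args.to_owned()));
        }
        self.repeats >= self.max_repeats
    }

    pub fn repeats(&self) -> usize {
        self.repeats
    }

    pub fn chain_len(&self) -> usize {
        self.chain_len
    }
}
```

Because the state lives in one value that the caller owns, "preserved across tool-result loops" reduces to reusing the same tracker instance between iterations.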
### Function selection
- [ ] `select_functions` filters by role's `enabled_tools`
- [ ] `select_functions` includes MCP meta functions for enabled servers
- [ ] `select_functions` includes agent functions when agent active
- [ ] `"all"` enables all functions
- [ ] Comma-separated list enables specific functions
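
The `"all"` and comma-separated rows can be exercised against a filter like the one below. It is only a sketch of the `enabled_tools` matching: `select_enabled` is a hypothetical stand-in, not the real `select_functions`, and it ignores the MCP meta functions and agent functions covered by the other rows. Trimming whitespace around list entries is also an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical filter: keep the available function names allowed by `enabled_tools`.
/// "all" keeps everything; otherwise the value is read as a comma-separated list.
pub fn select_enabled<'a>(available: &[&'a str], enabled_tools: &str) -> Vec<&'a str> {
    let spec = enabled_tools.trim();
    if spec == "all" {
        return available.to_vec();
    }
    // Split on commas, tolerate surrounding whitespace, drop empty entries.
    let wanted: HashSet<&str> = spec
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect();
    // Preserve the order of `available` rather than the order of the list.
    available
        .iter()
        .copied()
        .filter(|name| wanted.contains(*name))
        .collect()
}
```

Names in the list that match no available function are silently ignored in this sketch; whether the real code warns about them is worth a separate test row.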
## Context switching scenarios
- [ ] Tool calls during agent → agent tools available
- [ ] Tool calls during role → role tools available
- [ ] Tool calls with MCP → MCP invoke/search/describe work
- [ ] No agent → no `agent__`/`todo__` tools in declarations
## Old code reference
- `src/function/mod.rs`: `eval_tool_calls`, `ToolCall::eval`
- `src/function/supervisor.rs`: `handle_supervisor_tool`
- `src/function/todo.rs`: `handle_todo_tool`
- `src/function/user_interaction.rs`: `handle_user_tool`