coyote/docs/testing/plans/06-tool-evaluation.md

Dark-Alex-17 53eff10d75 test: implemented tests for tool call dispatch and tracking

2026-05-01 10:52:56 -06:00

Test Plan: Tool Evaluation

Feature description

When the LLM returns tool calls, eval_tool_calls dispatches each call to the appropriate handler. Handlers include: shell tools (bash/python/ts scripts), MCP tools, supervisor tools (agent spawn), todo tools, and user interaction tools.

Behaviors to test

eval_tool_calls dispatch

Calls dispatched to correct handler by function name prefix (requires RequestContext)
Tool results returned for each call (requires RequestContext)
Multiple concurrent tool calls processed (requires RequestContext)
Tool call tracker updated (chain length, repeats)
Root agent (depth 0) checks escalation queue after eval (requires RequestContext)
Escalation notifications injected into results (requires RequestContext)

ToolCall::eval routing

agent__* → handle_supervisor_tool (requires RequestContext)
todo__* → handle_todo_tool (requires RequestContext)
user__* → handle_user_tool (depth 0) or escalate (depth > 0) (requires RequestContext)
mcp_invoke_* → invoke_mcp_tool (requires RequestContext + live MCP)
mcp_search_* → search_mcp_tools (requires RequestContext + live MCP)
mcp_describe_* → describe_mcp_tool (requires RequestContext + live MCP)
Other → shell tool execution (requires RequestContext + binary)
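
The routing table above can be sketched as a pure function that needs no RequestContext, which makes the prefix logic testable in isolation. This is a minimal illustration, not the project's code: the `Route` enum and the `route` function are hypothetical names, and only the prefixes and the depth rule for `user__*` are taken from the plan.

```rust
/// Hypothetical routing tags. The real code calls handlers such as
/// `handle_supervisor_tool` directly instead of returning a tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    Supervisor,
    Todo,
    User,
    Escalate,
    McpInvoke,
    McpSearch,
    McpDescribe,
    Shell,
}

/// Picks a route from the function name prefix and the agent depth.
/// Anything without a known prefix falls through to shell execution.
fn route(name: &str, depth: usize) -> Route {
    if name.starts_with("agent__") {
        Route::Supervisor
    } else if name.starts_with("todo__") {
        Route::Todo
    } else if name.starts_with("user__") {
        // Only the root agent (depth 0) talks to the user directly.
        if depth == 0 {
            Route::User
        } else {
            Route::Escalate
        }
    } else if name.starts_with("mcp_invoke_") {
        Route::McpInvoke
    } else if name.starts_with("mcp_search_") {
        Route::McpSearch
    } else if name.starts_with("mcp_describe_") {
        Route::McpDescribe
    } else {
        Route::Shell
    }
}
```

Separating the decision from the side effects this way would let the routing cases be covered by unit tests, leaving only the handler bodies for tests that need a RequestContext or a live MCP server.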

Shell tool execution

Tool binary found and executed (integration test)
Arguments passed correctly (integration test)
Environment variables set (LLM_OUTPUT, etc.) (integration test)
Tool output returned as result (integration test)
Tool failure → error returned as tool result (not panic) (integration test)
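
The shell tool behaviors above could be exercised against a helper shaped roughly like the following. It is a sketch using only `std::process::Command`; `run_tool` is a hypothetical name, the error string format is an assumption, and the value passed for `LLM_OUTPUT` in a test would be a placeholder rather than what the real dispatcher sets.

```rust
use std::process::Command;

/// Runs a tool binary with arguments and extra environment variables.
/// Failures (missing binary, non-zero exit) come back as an error string
/// so the caller can hand them to the LLM as a tool result instead of panicking.
fn run_tool(bin: &str, args: &[&str], envs: &[(&str, &str)]) -> String {
    let output = Command::new(bin)
        .args(args)
        .envs(envs.iter().copied())
        .output();

    match output {
        // Successful run: the tool's stdout is the result.
        Ok(out) if out.status.success() => String::from_utf8_lossy(&out.stdout).into_owned(),
        // The tool ran but reported failure.
        Ok(out) => format!(
            "error: tool exited with {}: {}",
            out.status,
            String::from_utf8_lossy(&out.stderr).trim()
        ),
        // The binary could not be started at all.
        Err(e) => format!("error: failed to start `{bin}`: {e}"),
    }
}
```

An integration test can then point `bin` at `sh` (on a Unix-like system) to check argument passing, environment variables, and both failure paths without depending on a real tool script.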

Tool call tracking

Tracker counts consecutive identical calls
Max repeats triggers warning
Chain length tracked across turns
Tracker state preserved across tool-result loops
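
A tracker with these properties might look like the sketch below. The defaults (max_repeats=2, chain_len=3) come from this plan, but the exact semantics are assumptions made for illustration: here `chain_len` caps the stored history and a loop is reported once the trailing run of identical calls reaches `max_repeats`. The real `ToolCallTracker` may define both differently.

```rust
use std::collections::VecDeque;

/// Illustrative stand-in for the real tracker; field meanings are assumed.
struct ToolCallTracker {
    max_repeats: usize,
    chain_len: usize,
    /// Recent calls as (name, serialized arguments), oldest first.
    history: VecDeque<(String, String)>,
}

impl Default for ToolCallTracker {
    fn default() -> Self {
        Self { max_repeats: 2, chain_len: 3, history: VecDeque::new() }
    }
}

impl ToolCallTracker {
    /// Records a call, dropping the oldest entry once capacity is reached.
    fn record_call(&mut self, name: &str, args: &str) {
        if self.history.len() >= self.chain_len {
            self.history.pop_front();
        }
        self.history.push_back((name.to_string(), args.to_string()));
    }

    /// Counts how many of the most recent calls are identical to the last one.
    fn consecutive_repeats(&self) -> usize {
        let Some(last) = self.history.back() else {
            return 0;
        };
        self.history.iter().rev().take_while(|c| *c == last).count()
    }

    /// Reports a loop when the trailing run of identical calls hits the limit.
    fn is_looping(&self) -> bool {
        self.consecutive_repeats() >= self.max_repeats
    }
}
```

Because the tracker is plain data, keeping one instance alive across tool-result loops is enough to preserve its state between turns.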

Function selection

select_functions filters by role's enabled_tools (requires filesystem)
select_functions includes MCP meta functions for enabled servers
select_functions includes agent functions when agent active (via append tests)
"all" enables all functions (requires filesystem)
Comma-separated list enables specific functions (requires filesystem)
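
The "all" and comma-separated cases can be illustrated with an in-memory filter. This sketch is not the project's `select_functions`, which the plan says needs the filesystem; the signature below, the use of plain strings for function names, and the treatment of a missing or empty setting as "no tools" are all assumptions.

```rust
/// Filters the available function names by an `enabled_tools` setting.
/// `None` or an empty string selects nothing, "all" selects everything,
/// and any other value is read as a comma-separated list of names.
fn select_functions(available: &[&str], enabled_tools: Option<&str>) -> Vec<String> {
    match enabled_tools.map(str::trim) {
        None | Some("") => Vec::new(),
        Some("all") => available.iter().map(|name| name.to_string()).collect(),
        Some(list) => {
            // Tolerate spaces around commas and skip empty entries.
            let wanted: Vec<&str> = list
                .split(',')
                .map(str::trim)
                .filter(|name| !name.is_empty())
                .collect();
            available
                .iter()
                .copied()
                .filter(|name| wanted.contains(name))
                .map(str::to_string)
                .collect()
        }
    }
}
```

The result keeps the order of `available`, so names listed in `enabled_tools` that do not exist are silently ignored rather than reported.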

Context switching scenarios

Tool calls during agent → agent tools available (integration test)
Tool calls during role → role tools available (integration test)
Tool calls with MCP → MCP invoke/search/describe work (integration test)
No agent → no agent__/todo__ tools in declarations (via Functions::default)

Additional behaviors tested (not in original plan)

ToolCall::new sets name, arguments, id correctly
ToolCall::default has empty/null fields
ToolCall::with_thought_signature sets and clears the signature
ToolCall::dedup keeps last occurrence for duplicate ids
ToolCall::dedup keeps all calls without ids
ToolCall::dedup empty input returns empty
ToolCall::dedup handles a mix of calls with and without ids
ToolCallTracker default values (max_repeats=2, chain_len=3)
ToolCallTracker no loop on fresh tracker
ToolCallTracker no loop below threshold
ToolCallTracker different args breaks loop
ToolCallTracker different names breaks loop
ToolCallTracker record_call respects capacity
ToolCallTracker loop message includes call_history
All 6 prefix constants verified
Functions::append_todo adds all 5 todo tools
Functions::append_supervisor adds spawn/check/collect/list/cancel/reply + task queue
Functions::append_teammate adds send_message/check_inbox
Functions::append_user_interaction adds ask/confirm/input/checkbox
Functions::append_mcp_meta creates 3 per server with correct schemas
Functions::append_mcp_meta empty servers → no declarations
Functions::find/contains work correctly
ToolResult::new stores call and output
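
The four `ToolCall::dedup` cases listed above (last occurrence wins for duplicate ids, calls without ids are all kept, empty input, mixed input) can be captured in one small function. This is a sketch under assumptions: the `ToolCall` struct here is a simplified stand-in with string arguments, `dedup` is written as a free function, and placing a surviving call at the position of its last occurrence is a choice made for this example.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the real `ToolCall`; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
    id: Option<String>,
}

/// Removes earlier calls that share an id with a later call.
/// Calls without an id are never treated as duplicates.
fn dedup(calls: Vec<ToolCall>) -> Vec<ToolCall> {
    let mut seen = HashSet::new();
    let mut kept = Vec::with_capacity(calls.len());

    // Walk backwards so the first time an id is seen is its last occurrence.
    for call in calls.into_iter().rev() {
        let is_new = match &call.id {
            Some(id) => seen.insert(id.clone()),
            None => true,
        };
        if is_new {
            kept.push(call);
        }
    }

    // Restore the original relative order.
    kept.reverse();
    kept
}
```

For example, the input `a(id=1), b(no id), c(id=1), d(no id)` reduces to `b, c, d`: the earlier call with id 1 is dropped and both id-less calls survive.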

Old code reference

src/function/mod.rs — eval_tool_calls, ToolCall::eval
src/function/supervisor.rs — handle_supervisor_tool
src/function/todo.rs — handle_todo_tool
src/function/user_interaction.rs — handle_user_tool
