testing
@@ -0,0 +1,62 @@
# Test Plan: Config Loading and AppConfig

## Feature description

Loki loads its configuration from a YAML file (`config.yaml`) into
a `Config` struct, then converts it to `AppConfig` (immutable,
shared) + `RequestContext` (mutable, per-request). The `AppConfig`
holds all serialized fields; `RequestContext` holds runtime state.

## Behaviors to test

### Config loading
- [ ] Config loads from YAML file with all supported fields
- [x] Missing optional fields get correct defaults (config_defaults_match_expected)
- [ ] `model_id` defaults to first available model if empty (requires Config::init, integration test)
- [x] `temperature`, `top_p` default to `None`
- [x] `stream` defaults to `true`
- [x] `save` defaults to `false` (CORRECTED: was listed as true)
- [x] `highlight` defaults to `true`
- [x] `dry_run` defaults to `false`
- [x] `function_calling_support` defaults to `true`
- [x] `mcp_server_support` defaults to `true`
- [x] `compression_threshold` defaults to `4000`
- [ ] `document_loaders` populated from config and defaults (requires Config::init)
- [x] `clients` parsed from config (to_app_config_copies_clients)
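
The default values in the list above can be pinned down in one place. A minimal sketch, assuming a hypothetical `ConfigSketch` struct that mirrors only the fields named in this checklist; the real `Config` struct has more fields and may build its defaults differently:

```rust
/// Hypothetical stand-in for the serialized part of `Config`.
/// Only the fields named in the checklist are modelled here.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigSketch {
    pub temperature: Option<f64>,
    pub top_p: Option<f64>,
    pub stream: bool,
    pub save: bool,
    pub highlight: bool,
    pub dry_run: bool,
    pub function_calling_support: bool,
    pub mcp_server_support: bool,
    pub compression_threshold: usize,
}

impl Default for ConfigSketch {
    fn default() -> Self {
        Self {
            temperature: None,
            top_p: None,
            stream: true,
            save: false, // the checklist corrects an earlier "true"
            highlight: true,
            dry_run: false,
            function_calling_support: true,
            mcp_server_support: true,
            compression_threshold: 4000,
        }
    }
}
```

A test in the style of `config_defaults_match_expected` would then assert each field of `ConfigSketch::default()` against the values listed above.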

### AppConfig conversion
- [x] `to_app_config()` copies all serialized fields correctly
- [x] `clients` field populated on AppConfig
- [ ] `visible_tools` correctly computed from `enabled_tools` config (deferred to plan 16)
- [x] `mapping_tools` correctly parsed
- [x] `mapping_mcp_servers` correctly parsed
- [ ] `user_agent` resolved (auto → crate name/version)

### RequestContext conversion
- [x] `to_request_context()` copies all runtime fields (to_request_context_creates_clean_state)
- [ ] `model` field populated with resolved model (requires Model::retrieve_model)
- [ ] `working_mode` set correctly (Repl vs Cmd)
- [x] `tool_scope` starts with default (empty)
- [x] `agent_runtime` starts as `None`

### AppConfig field accessors
- [x] `editor()` returns configured editor or $EDITOR
- [x] `light_theme()` returns theme flag
- [ ] `render_options()` returns options for markdown rendering
- [x] `sync_models_url()` returns configured or default URL

### Dynamic config updates
- [x] `update_app_config` closure correctly clones and replaces Arc
- [x] Changes to `dry_run`, `stream`, `save` persist across calls
- [x] Changes visible to subsequent `ctx.app.config` reads

## Context switching scenarios
- [ ] AppConfig remains immutable after construction (no field mutation)
- [ ] Multiple RequestContexts can share the same AppState
- [ ] Changing AppConfig fields (via clone-mutate-replace) doesn't
      affect other references to the old Arc
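
The clone-mutate-replace pattern named above can be sketched with the standard library alone. The `AppConfig` and `AppState` shapes below are hypothetical reductions to three fields, and the `update_app_config` signature is an assumption rather than the one in `src/config/request_context.rs`:

```rust
use std::sync::Arc;

/// Hypothetical reduction of `AppConfig` to the fields the checklist names.
#[derive(Debug, Clone, PartialEq)]
pub struct AppConfig {
    pub dry_run: bool,
    pub stream: bool,
    pub save: bool,
}

/// Hypothetical holder for the shared config.
pub struct AppState {
    pub config: Arc<AppConfig>,
}

impl AppState {
    /// Clone the current config, let the closure mutate the copy, then
    /// swap a new `Arc` in. Anyone still holding the old `Arc` keeps
    /// seeing the old values.
    pub fn update_app_config(&mut self, f: impl FnOnce(&mut AppConfig)) {
        let mut next = (*self.config).clone();
        f(&mut next);
        self.config = Arc::new(next);
    }
}
```

A test for the last scenario would keep an `Arc::clone` of the config, call `update_app_config`, and check that the kept clone still reports the old value while a fresh read reports the new one.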

## Old code reference
- `src/config/mod.rs` — `Config` struct, `Config::init`, defaults
- `src/config/bridge.rs` — `to_app_config`, `to_request_context`
- `src/config/app_config.rs` — `AppConfig` struct and methods
@@ -0,0 +1,68 @@
# Test Plan: Roles

## Feature description

Roles define a system prompt + optional model/temperature/MCP config
that customizes LLM behavior. Roles can be built-in or user-defined
(markdown files). Roles are "role-likes" — sessions and agents also
implement the RoleLike trait.

## Behaviors to test

### Role loading
- [x] Built-in roles load correctly (shell, code)
- [ ] User-defined roles load from markdown files (requires filesystem)
- [x] Role parses model_id from metadata
- [x] Role parses temperature, top_p from metadata
- [x] Role parses enabled_tools from metadata
- [x] Role parses enabled_mcp_servers from metadata
- [ ] Role with no model_id inherits current model (requires retrieve_role + client config)
- [ ] Role with no temperature inherits from AppConfig (requires retrieve_role)
- [ ] Role with no top_p inherits from AppConfig (requires retrieve_role)

### retrieve_role
- [ ] Retrieves by name from file system
- [ ] Resolves model via Model::retrieve_model
- [ ] Falls back to current model if role has no model_id
- [ ] Sets temperature/top_p from AppConfig when role doesn't specify
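
The fallback rules for `retrieve_role` reduce to "use the role's value if present, otherwise the ambient one". A rough sketch under that reading; `RoleSketch`, `AppDefaults`, and `resolve_role` are hypothetical names, and the real code resolves a full `Model` rather than a string id:

```rust
/// Hypothetical subset of the metadata a role can carry.
#[derive(Debug, Clone, PartialEq)]
pub struct RoleSketch {
    pub model_id: Option<String>,
    pub temperature: Option<f64>,
    pub top_p: Option<f64>,
}

/// Hypothetical subset of `AppConfig` used for fallbacks.
pub struct AppDefaults {
    pub temperature: Option<f64>,
    pub top_p: Option<f64>,
}

/// Returns (model id, temperature, top_p) after applying the fallbacks:
/// the role's own value wins, otherwise the current model or AppConfig value.
pub fn resolve_role(
    role: &RoleSketch,
    current_model: &str,
    app: &AppDefaults,
) -> (String, Option<f64>, Option<f64>) {
    let model = role
        .model_id
        .clone()
        .unwrap_or_else(|| current_model.to_string());
    (model, role.temperature.or(app.temperature), role.top_p.or(app.top_p))
}
```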

### use_role (scope transition)
- [x] Sets role on RequestContext (use_role_obj_sets_role)
- [ ] Triggers rebuild_tool_scope (async, deferred to plan 05/08)
- [ ] MCP servers start if role has enabled_mcp_servers (deferred to plan 05)
- [ ] MCP meta functions added to function list (deferred to plan 05)
- [ ] Previous role cleared when switching (deferred to plan 08)
- [x] Role-like temperature/top_p take effect (role_set_temperature_works)

### exit_role
- [x] Clears role from RequestContext (exit_role_clears_role)
- [ ] Followed by bootstrap_tools to restore global tool scope (async, deferred)
- [ ] MCP servers from role are stopped (deferred to plan 05)
- [ ] Global MCP servers restored (deferred to plan 05)

### use_prompt (temp role)
- [x] Creates a TEMP_ROLE_NAME role with the prompt text (use_prompt_creates_temp_role)
- [x] Uses current model
- [x] Activates via use_role_obj

### extract_role
- [ ] Returns role from agent if agent active (deferred to plan 04)
- [ ] Returns role from session if session active with role (deferred to plan 03)
- [x] Returns standalone role if active (extract_role_returns_standalone_role)
- [x] Returns default role if none active (extract_role_returns_default_when_nothing_active)
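
The four `extract_role` cases read as a priority chain: agent, then session, then standalone role, then a default. A sketch of that ordering, assuming a hypothetical `Ctx` that stores each candidate as an `Option`; the real `RequestContext` derives the agent and session roles from richer objects:

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Role {
    pub name: String,
}

/// Hypothetical flattening of the state `extract_role` inspects.
#[derive(Default)]
pub struct Ctx {
    pub agent_role: Option<Role>,
    pub session_role: Option<Role>,
    pub role: Option<Role>,
}

/// First match wins: agent, session, standalone role, then a default.
/// The name "default" is a placeholder, not the real default role name.
pub fn extract_role(ctx: &Ctx) -> Role {
    ctx.agent_role
        .clone()
        .or_else(|| ctx.session_role.clone())
        .or_else(|| ctx.role.clone())
        .unwrap_or_else(|| Role { name: "default".to_string() })
}
```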

### One-shot role messages (REPL)
- [ ] `.role coder write hello` sends message with role, then exits role
- [ ] Original state restored after one-shot

## Context switching scenarios
- [ ] Role → different role: old role replaced, MCP swapped
- [ ] Role → session: role cleared, session takes over
- [ ] Role with MCP → exit: MCP servers stop, global MCP restored
- [ ] No MCP → role with MCP: servers start
- [ ] Role with MCP → role without MCP: servers stop

## Old code reference
- `src/config/mod.rs` — `use_role`, `exit_role`, `retrieve_role`
- `src/config/role.rs` — `Role` struct, parsing
- `src/config/request_context.rs` — `use_role`, `exit_role`, `use_prompt`, `retrieve_role`
@@ -0,0 +1,66 @@
# Test Plan: Sessions

## Feature description

Sessions persist conversation history across multiple turns. They
store messages, role context, model info, and optional MCP config.
Sessions can be temporary, named, or auto-named.

## Behaviors to test

### Session creation
- [ ] Temp session created with TEMP_SESSION_NAME
- [ ] Named session created at correct file path
- [ ] New session captures current role via extract_role
- [ ] New session captures save_session from AppConfig
- [ ] Session tracks model_id

### Session loading
- [ ] Named session loads from YAML file
- [ ] Loaded session resolves model via Model::retrieve_model
- [ ] Loaded session restores role_prompt if role exists
- [ ] Auto-named sessions (prefixed `_/`) handled correctly

### Session saving
- [ ] Session saved to correct path
- [ ] Session file contains messages, model_id, role info
- [ ] save_session flag controls whether session is persisted
- [ ] set_save_session_this_time overrides for current turn

### Session lifecycle
- [ ] use_session creates or loads session
- [ ] Already in session → error
- [ ] exit_session saves and clears
- [ ] empty_session clears messages but keeps session active
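
The lifecycle items above describe a small state machine over an optional session. A simplified sketch, assuming hypothetical `Session` and `Ctx` types with no file I/O; here `exit_session` hands the session back to the caller instead of writing it to disk:

```rust
#[derive(Debug, Default, PartialEq)]
pub struct Session {
    pub name: String,
    pub messages: Vec<String>,
}

/// Hypothetical slice of `RequestContext` holding only the session.
#[derive(Debug, Default)]
pub struct Ctx {
    pub session: Option<Session>,
}

impl Ctx {
    /// Starting a session while one is active is an error.
    pub fn use_session(&mut self, name: &str) -> Result<(), String> {
        if self.session.is_some() {
            return Err("already in a session".to_string());
        }
        self.session = Some(Session { name: name.to_string(), messages: Vec::new() });
        Ok(())
    }

    /// Clears the messages but keeps the session active.
    pub fn empty_session(&mut self) {
        if let Some(session) = self.session.as_mut() {
            session.messages.clear();
        }
    }

    /// Clears the active session and returns it so the caller can save it.
    pub fn exit_session(&mut self) -> Option<Session> {
        self.session.take()
    }
}
```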

### Session carry-over
- [ ] New empty session with last_message prompts "incorporate?"
- [ ] If accepted, last Q&A added to session
- [ ] If declined, session starts fresh
- [ ] Only prompts when continuous and output not empty

### Session compression
- [ ] maybe_compress_session returns true when threshold exceeded
- [ ] compress_session reduces message count
- [ ] Compression message shown to user
- [ ] Session usable after compression

### Session autoname
- [ ] maybe_autoname_session returns true for new sessions
- [ ] Auto-naming sets session name based on content
- [ ] Autoname only triggers once per session

### Session info
- [ ] session_info returns formatted session details
- [ ] Shows message count, model, role, tokens

## Context switching scenarios
- [ ] Session → role change: role updated within session
- [ ] Session → exit session: messages saved, state cleared
- [ ] Agent session → exit: agent session cleanup
- [ ] Session with MCP → exit: MCP servers handled

## Old code reference
- `src/config/mod.rs` — `use_session`, `exit_session`, `empty_session`
- `src/config/session.rs` — `Session` struct, new, load, save
- `src/config/request_context.rs` — `use_session`, `exit_session`
@@ -0,0 +1,77 @@
# Test Plan: Agents

## Feature description

Agents combine a role (instructions), tools (bash/python/ts scripts),
optional RAG, optional MCP servers, and optional sub-agent spawning
capability. Agent::init compiles tools, resolves model, loads RAG,
and sets up the agent environment.

## Behaviors to test

### Agent initialization
- [ ] Agent::init loads config.yaml from agent directory
- [ ] Agent tools compiled from tools.sh / tools.py / tools.ts
- [ ] Tool file priority: .sh > .py > .ts > .js
- [ ] Global tools loaded (from global_tools config)
- [ ] Model resolved from agent config or defaults to current
- [ ] Agent with no model_id uses current model
- [ ] Temperature/top_p from agent config applied
- [ ] Dynamic instructions (_instructions function) invoked if configured
- [ ] Static instructions loaded from config
- [ ] Agent variables interpolated into instructions
- [ ] Special variables (__os__, __cwd__, __now__, etc.) interpolated
- [ ] Agent .env file loaded if present
- [ ] Built-in agents installed on first run (skip if exists)
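
The tool file priority item (`.sh > .py > .ts > .js`) is easy to test in isolation. A sketch that picks the winning file from a list of names; `pick_tools_file` is a hypothetical helper, and it assumes the files are all named `tools.<ext>` as in the item above:

```rust
/// Extensions in priority order, highest first, as stated in the checklist.
pub const TOOL_FILE_PRIORITY: [&str; 4] = ["sh", "py", "ts", "js"];

/// Returns the highest-priority `tools.<ext>` name among `present`,
/// or `None` if no tools file is present.
pub fn pick_tools_file<'a>(present: &[&'a str]) -> Option<&'a str> {
    TOOL_FILE_PRIORITY.iter().find_map(|ext| {
        present
            .iter()
            .copied()
            .find(|name| name.strip_prefix("tools.") == Some(*ext))
    })
}
```

Keeping the selection separate from the filesystem lets the priority rule be checked without creating an agent directory.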

### Agent tools
- [ ] Agent-specific tools available as function declarations
- [ ] Global tools (from global_tools) also available
- [ ] Tool binaries built in agent bin directory
- [ ] clear_agent_bin_dir removes old binaries before rebuild
- [ ] Tool declarations include name, description, parameters

### Agent with MCP
- [ ] MCP servers listed in agent config started
- [ ] MCP meta functions (invoke/search/describe) added
- [ ] Agent with MCP but mcp_server_support=false → error
- [ ] MCP servers stopped on agent exit

### Agent with RAG
- [ ] RAG documents loaded from agent config
- [ ] RAG available during agent conversation
- [ ] RAG search results included in context

### Agent sessions
- [ ] Agent session started (temp or named)
- [ ] agent_session config used if no explicit session
- [ ] Agent session variables initialized

### Agent lifecycle
- [ ] use_agent checks function_calling_support
- [ ] use_agent errors if agent already active
- [ ] exit_agent clears agent, session, rag, supervisor
- [ ] exit_agent restores global tool scope

### Auto-continuation
- [ ] Agents with auto_continue=true continue after incomplete todos
- [ ] max_auto_continues limits continuation attempts
- [ ] Continuation prompt sent with todo state
- [ ] clear todo stops continuation

### Conversation starters
- [ ] Starters loaded from agent config
- [ ] .starter lists available starters
- [ ] .starter <n> sends the starter as a message

## Context switching scenarios
- [ ] Agent → exit: tools cleared, MCP stopped, session ended
- [ ] Agent with MCP → exit: MCP servers released, global MCP restored
- [ ] Already in agent → start agent: error
- [ ] Agent with RAG → exit: RAG cleared

## Old code reference
- `src/config/agent.rs` — Agent::init, agent config parsing
- `src/config/mod.rs` — use_agent, exit_agent
- `src/config/request_context.rs` — use_agent, exit_agent
- `src/function/mod.rs` — Functions::init_agent, tool compilation
@@ -0,0 +1,98 @@
# Test Plan: MCP Server Lifecycle

## Feature description

MCP (Model Context Protocol) servers are external tools that run
as subprocesses communicating via stdio. Loki manages their lifecycle
through McpFactory (start/share via Weak dedup) and McpRuntime
(per-scope active server handles). Servers are started/stopped
during scope transitions (role/session/agent enter/exit).

## Behaviors to test

### MCP config loading
- [ ] mcp.json parsed correctly from functions directory
- [ ] Server specs include command, args, env, cwd
- [ ] Vault secrets interpolated in mcp.json
- [ ] Missing secrets reported as warnings
- [ ] McpServersConfig stored on AppState.mcp_config

### McpFactory
- [ ] acquire() spawns new server when none active
- [ ] acquire() returns existing handle via Weak upgrade
- [ ] acquire() spawns fresh when Weak is dead
- [ ] Multiple acquire() calls for same spec share handle
- [ ] Different specs get different handles
- [ ] McpServerKey built correctly from spec (sorted args/env)
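
The McpFactory items describe deduplication through `Weak` references: a live handle is shared, a dead one is respawned. A synchronous sketch of that idea using only `std`; `ServerKey`, `ServerHandle`, and `Factory` are hypothetical stand-ins, no subprocess is spawned, and the real `acquire()` is presumably async:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Hypothetical key; args and env are sorted as the checklist item states,
/// so that equivalent specs compare equal.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ServerKey {
    command: String,
    args: Vec<String>,
    env: Vec<(String, String)>,
}

impl ServerKey {
    pub fn new(command: &str, args: &[&str], env: &[(&str, &str)]) -> Self {
        let mut args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
        args.sort();
        let mut env: Vec<(String, String)> =
            env.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect();
        env.sort();
        Self { command: command.to_string(), args, env }
    }
}

/// Stand-in for a connected server; `id` counts spawns for the tests.
#[derive(Debug)]
pub struct ServerHandle {
    pub id: usize,
}

#[derive(Default)]
pub struct Factory {
    active: HashMap<ServerKey, Weak<ServerHandle>>,
    spawned: usize,
}

impl Factory {
    /// Reuses a live handle if the stored `Weak` still upgrades,
    /// otherwise "spawns" a fresh one and remembers it weakly.
    pub fn acquire(&mut self, key: ServerKey) -> Arc<ServerHandle> {
        if let Some(live) = self.active.get(&key).and_then(|w| w.upgrade()) {
            return live;
        }
        self.spawned += 1;
        let handle = Arc::new(ServerHandle { id: self.spawned });
        self.active.insert(key, Arc::downgrade(&handle));
        handle
    }
}
```

Because the factory only holds `Weak` references, dropping the last `Arc` is what stops a server in this model, which matches the "handle dropped" wording used later in this plan.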

### McpRuntime
- [ ] insert() adds server handle by name
- [ ] get() retrieves handle by name
- [ ] server_names() returns all active names
- [ ] is_empty() correct for empty/non-empty
- [ ] search() finds tools by keyword (BM25 ranking)
- [ ] describe() returns tool input schema
- [ ] invoke() calls tool on server and returns result

### spawn_mcp_server
- [ ] Builds Command from spec (command, args, env, cwd)
- [ ] Creates TokioChildProcess transport
- [ ] Completes rmcp handshake (serve)
- [ ] Returns Arc<ConnectedServer>
- [ ] Log file created when log_path provided

### rebuild_tool_scope (MCP integration)
- [ ] Empty enabled_mcp_servers → no servers acquired
- [ ] "all" → all configured servers acquired
- [ ] Comma-separated list → only listed servers acquired
- [ ] Mapping resolution: alias → actual server key(s)
- [ ] MCP meta functions appended for each started server
- [ ] Old ToolScope dropped (releasing old server handles)
- [ ] Loading spinner shown during acquisition
- [ ] AbortSignal properly threaded through
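
The first four `rebuild_tool_scope` items amount to resolving an `enabled_mcp_servers` string into server names. A sketch of that resolution step; `resolve_servers` is a hypothetical helper, and dropping names that are not configured is an assumption, not documented behaviour:

```rust
use std::collections::{BTreeSet, HashMap};

/// Resolves an `enabled_mcp_servers` value into configured server names.
/// - `None` or empty: no servers
/// - "all": every configured server, in configured order
/// - comma-separated list: listed names, with aliases expanded via `mapping`
/// Names that are not configured are dropped (assumption for this sketch).
pub fn resolve_servers(
    enabled: Option<&str>,
    configured: &[&str],
    mapping: &HashMap<&str, Vec<&str>>,
) -> Vec<String> {
    let enabled = match enabled.map(str::trim) {
        None | Some("") => return Vec::new(),
        Some(value) => value,
    };
    if enabled == "all" {
        return configured.iter().map(|s| s.to_string()).collect();
    }
    // A BTreeSet dedups and gives a stable (sorted) order for the result.
    let mut out = BTreeSet::new();
    for item in enabled.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        match mapping.get(item) {
            Some(targets) => out.extend(targets.iter().map(|t| t.to_string())),
            None => {
                out.insert(item.to_string());
            }
        }
    }
    out.into_iter()
        .filter(|name| configured.iter().any(|c| *c == name.as_str()))
        .collect()
}
```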

### Server lifecycle during scope transitions
- [ ] Enter role with MCP: servers start
- [ ] Exit role: servers stop (handle dropped)
- [ ] Enter role A (MCP-X) → exit → enter role B (MCP-Y):
      X stops, Y starts
- [ ] Enter role with MCP → exit to no MCP: servers stop,
      global MCP restored
- [ ] Start REPL with global MCP → enter agent with different MCP:
      agent MCP takes over
- [ ] Exit agent: agent MCP stops, global MCP restored

### MCP tool invocation chain
- [ ] LLM calls mcp__search_<server> → search results returned
- [ ] LLM calls mcp__describe_<server> tool_name → schema returned
- [ ] LLM calls mcp__invoke_<server> tool args → tool executed
- [ ] Server not found → "MCP server not found in runtime" error
- [ ] Tool not found → appropriate error

### MCP support flag
- [ ] mcp_server_support=false → no MCP servers started
- [ ] mcp_server_support=false + agent with MCP → error (blocks)
- [ ] mcp_server_support=false + role with MCP → warning, continues
- [ ] .set mcp_server_support true → MCP servers start

### MCP in child agents
- [ ] Child agent MCP servers acquired via factory
- [ ] Child agent MCP runtime populated
- [ ] Child agent MCP tool invocations work
- [ ] Child agent exit drops MCP handles

## Context switching scenarios (comprehensive)
- [ ] No MCP → role with MCP → exit role → no MCP
- [ ] Global MCP-A → role MCP-B → exit role → global MCP-A
- [ ] Global MCP-A → agent MCP-B → exit agent → global MCP-A
- [ ] Role MCP-A → session MCP-B (overrides) → exit session
- [ ] Agent MCP → child agent MCP → child exits → parent MCP intact
- [ ] .set enabled_mcp_servers X → .set enabled_mcp_servers Y:
      X released, Y acquired
- [ ] .set enabled_mcp_servers null → all released

## Old code reference
- `src/mcp/mod.rs` — McpRegistry, init, reinit, start/stop
- `src/config/mcp_factory.rs` — McpFactory, acquire, McpServerKey
- `src/config/tool_scope.rs` — ToolScope, McpRuntime
- `src/config/request_context.rs` — rebuild_tool_scope, bootstrap_tools
@@ -0,0 +1,59 @@
# Test Plan: Tool Evaluation

## Feature description

When the LLM returns tool calls, `eval_tool_calls` dispatches each
call to the appropriate handler. Handlers include: shell tools
(bash/python/ts scripts), MCP tools, supervisor tools (agent spawn),
todo tools, and user interaction tools.

## Behaviors to test

### eval_tool_calls dispatch
- [ ] Calls dispatched to correct handler by function name prefix
- [ ] Tool results returned for each call
- [ ] Multiple concurrent tool calls processed
- [ ] Tool call tracker updated (chain length, repeats)
- [ ] Root agent (depth 0) checks escalation queue after eval
- [ ] Escalation notifications injected into results

### ToolCall::eval routing
- [ ] agent__* → handle_supervisor_tool
- [ ] todo__* → handle_todo_tool
- [ ] user__* → handle_user_tool (depth 0) or escalate (depth > 0)
- [ ] mcp_invoke_* → invoke_mcp_tool
- [ ] mcp_search_* → search_mcp_tools
- [ ] mcp_describe_* → describe_mcp_tool
- [ ] Other → shell tool execution
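
The routing table above can be expressed as a pure function from tool name and agent depth to a handler, which makes each row a one-line assertion. A sketch using the prefixes exactly as written in this section; the `Handler` enum and `route` function are hypothetical, and the real `ToolCall::eval` calls the handlers directly:

```rust
/// Hypothetical label for the handler a tool call is routed to.
#[derive(Debug, PartialEq, Eq)]
pub enum Handler {
    Supervisor,
    Todo,
    User,
    Escalate,
    McpInvoke,
    McpSearch,
    McpDescribe,
    Shell,
}

/// Routes by name prefix; `depth` is 0 for the root agent.
pub fn route(name: &str, depth: usize) -> Handler {
    if name.starts_with("agent__") {
        Handler::Supervisor
    } else if name.starts_with("todo__") {
        Handler::Todo
    } else if name.starts_with("user__") {
        // Only the root agent talks to the user; children escalate.
        if depth == 0 { Handler::User } else { Handler::Escalate }
    } else if name.starts_with("mcp_invoke_") {
        Handler::McpInvoke
    } else if name.starts_with("mcp_search_") {
        Handler::McpSearch
    } else if name.starts_with("mcp_describe_") {
        Handler::McpDescribe
    } else {
        Handler::Shell
    }
}
```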

### Shell tool execution
- [ ] Tool binary found and executed
- [ ] Arguments passed correctly
- [ ] Environment variables set (LLM_OUTPUT, etc.)
- [ ] Tool output returned as result
- [ ] Tool failure → error returned as tool result (not panic)

### Tool call tracking
- [ ] Tracker counts consecutive identical calls
- [ ] Max repeats triggers warning
- [ ] Chain length tracked across turns
- [ ] Tracker state preserved across tool-result loops
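
The tracking items suggest a small counter keyed on the last call. A sketch of one possible shape; `ToolCallTracker::record`, the `max_repeats` field, and the "more than max" comparison are assumptions made for illustration, not the tracker's actual API:

```rust
/// Hypothetical tracker: counts consecutive identical calls and total chain length.
#[derive(Debug, Default)]
pub struct ToolCallTracker {
    last: Option<(String, String)>,
    repeats: usize,
    chain_len: usize,
    max_repeats: usize,
}

impl ToolCallTracker {
    pub fn new(max_repeats: usize) -> Self {
        Self { max_repeats, ..Default::default() }
    }

    /// Records one call and returns `true` when the same call has now been
    /// made more than `max_repeats` times in a row (the warning condition).
    pub fn record(&mut self, name: &str, args: &str) -> bool {
        self.chain_len += 1;
        let call = (name.to_string(), args.to_string());
        if self.last.as_ref() == Some(&call) {
            self.repeats += 1;
        } else {
            self.repeats = 1;
            self.last = Some(call);
        }
        self.repeats > self.max_repeats
    }

    pub fn chain_len(&self) -> usize {
        self.chain_len
    }
}
```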

### Function selection
- [ ] select_functions filters by role's enabled_tools
- [ ] select_functions includes MCP meta functions for enabled servers
- [ ] select_functions includes agent functions when agent active
- [ ] "all" enables all functions
- [ ] Comma-separated list enables specific functions

## Context switching scenarios
- [ ] Tool calls during agent → agent tools available
- [ ] Tool calls during role → role tools available
- [ ] Tool calls with MCP → MCP invoke/search/describe work
- [ ] No agent → no agent__/todo__ tools in declarations

## Old code reference
- `src/function/mod.rs` — eval_tool_calls, ToolCall::eval
- `src/function/supervisor.rs` — handle_supervisor_tool
- `src/function/todo.rs` — handle_todo_tool
- `src/function/user_interaction.rs` — handle_user_tool
@@ -0,0 +1,58 @@
# Test Plan: Input Construction

## Feature description

`Input` encapsulates a single chat turn's data: text, files, role,
model, session context, RAG embeddings, and function declarations.
It's constructed at the start of each turn and captures all needed
state from `RequestContext`.

## Behaviors to test

### Input::from_str
- [ ] Creates Input from text string
- [ ] Captures role via resolve_role
- [ ] Captures session from ctx
- [ ] Captures rag from ctx
- [ ] Captures functions via select_functions
- [ ] Captures stream_enabled from AppConfig
- [ ] app_config field set from ctx.app.config
- [ ] Empty text → is_empty() returns true

### Input::from_files
- [ ] Loads file contents
- [ ] Supports multiple files
- [ ] Supports directories (recursive)
- [ ] Supports URLs (fetches content)
- [ ] Supports loader syntax (e.g., jina:url)
- [ ] Last message carry-over (%% syntax)
- [ ] Combines file content with text
- [ ] document_loaders from AppConfig used

### resolve_role
- [ ] Returns provided role if given
- [ ] Extracts role from agent if agent active
- [ ] Extracts role from session if session has role
- [ ] Returns default model-based role otherwise
- [ ] with_session flag set correctly
- [ ] with_agent flag set correctly

### Input methods
- [ ] stream() returns stream_enabled && !model.no_stream()
- [ ] create_client() uses app_config to init client
- [ ] prepare_completion_data() uses captured functions
- [ ] build_messages() uses captured session
- [ ] echo_messages() uses captured session
- [ ] set_regenerate(role) refreshes role
- [ ] use_embeddings() searches RAG if present
- [ ] merge_tool_results() creates continuation input

## Context switching scenarios
- [ ] Input with agent → agent functions selected
- [ ] Input with MCP → MCP meta functions in declarations
- [ ] Input with RAG → embeddings included after use_embeddings
- [ ] Input without session → no session messages in build_messages

## Old code reference
- `src/config/input.rs` — Input struct, from_str, from_files
- `src/config/mod.rs` — select_functions, extract_role
@@ -0,0 +1,69 @@
# Test Plan: RequestContext

## Feature description

`RequestContext` is the per-request mutable state container. It holds
the active model, role, session, agent, RAG, tool scope, and agent
runtime. It provides methods for scope transitions, state queries,
and chat completion lifecycle.

## Behaviors to test

### State management
- [ ] info() returns formatted system info
- [ ] state() returns correct StateFlags combination
- [ ] current_model() returns active model
- [ ] role_info(), session_info(), rag_info(), agent_info() format correctly
- [ ] sysinfo() returns system details
- [ ] working_mode correctly distinguishes Repl vs Cmd

### Scope transitions
- [ ] use_role changes role, rebuilds tool scope
- [ ] use_session creates/loads session, rebuilds tool scope
- [ ] use_agent initializes agent with all subsystems
- [ ] exit_role clears role
- [ ] exit_session saves and clears session
- [ ] exit_agent clears agent, supervisor, rag, session
- [ ] exit_rag clears rag
- [ ] bootstrap_tools rebuilds tool scope with global MCP

### Chat completion lifecycle
- [ ] before_chat_completion sets up for API call
- [ ] after_chat_completion saves messages, updates state
- [ ] discontinuous_last_message marks last message as non-continuous

### ToolScope management
- [ ] rebuild_tool_scope creates fresh Functions
- [ ] rebuild_tool_scope acquires MCP servers via factory
- [ ] rebuild_tool_scope appends user interaction functions in REPL mode
- [ ] rebuild_tool_scope appends MCP meta functions for started servers
- [ ] Tool tracker preserved across scope rebuilds

### AgentRuntime management
- [ ] agent_runtime populated by use_agent
- [ ] agent_runtime cleared by exit_agent
- [ ] Accessor methods (current_depth, supervisor, inbox, etc.) return
      correct values when agent active
- [ ] Accessor methods return defaults when no agent

### Settings update
- [ ] update() handles all .set keys correctly
- [ ] update_app_config() clones and replaces Arc properly
- [ ] delete() handles all delete subcommands

### Session helpers
- [ ] list_sessions() returns session names
- [ ] list_autoname_sessions() returns auto-named sessions
- [ ] session_file() returns correct path
- [ ] save_session() persists session
- [ ] empty_session() clears messages

## Context switching scenarios
- [ ] No state → use_role → exit_role → no state
- [ ] No state → use_agent → exit_agent → no state
- [ ] Role → use_agent (error: agent requires exiting role first)
- [ ] Agent → exit_agent → use_role (clean transition)

## Old code reference
- `src/config/request_context.rs` — all methods
- `src/config/mod.rs` — original Config methods (for parity)
@@ -0,0 +1,61 @@
# Test Plan: REPL Commands

## Feature description

The REPL processes dot-commands (`.role`, `.session`, `.agent`, etc.)
and plain text (chat messages). Each command has state assertions
(e.g., `.info role` requires an active role).

## Behaviors to test

### Command parsing
- [ ] Dot-commands parsed correctly (command + args)
- [ ] Multi-line input (:::) handled
- [ ] Plain text treated as chat message
- [ ] Empty input ignored
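
Three of the four parsing cases (dot-command, plain text, empty input) can be covered by a small pure function. A sketch with a hypothetical `ReplInput` enum and `parse_line` helper; multi-line `:::` input is deliberately left out, and the real REPL parser may split commands differently:

```rust
/// Hypothetical classification of one REPL line.
#[derive(Debug, PartialEq, Eq)]
pub enum ReplInput<'a> {
    Empty,
    Command { name: &'a str, args: Option<&'a str> },
    Chat(&'a str),
}

/// Classifies a single line: blank, `.command [args]`, or chat text.
pub fn parse_line(line: &str) -> ReplInput<'_> {
    let line = line.trim();
    if line.is_empty() {
        return ReplInput::Empty;
    }
    if let Some(rest) = line.strip_prefix('.') {
        // Split the command name from its arguments at the first whitespace.
        let (name, args) = match rest.split_once(char::is_whitespace) {
            Some((name, args)) => (name, Some(args.trim()).filter(|a| !a.is_empty())),
            None => (rest, None),
        };
        return ReplInput::Command { name, args };
    }
    ReplInput::Chat(line)
}
```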

### State assertions (REPL_COMMANDS array)
- [ ] Each command's assert_state enforced correctly
- [ ] Invalid state → command rejected with appropriate error
- [ ] Commands with AssertState::pass() always available

### Command handlers (each one)
- [ ] .help — prints help text
- [ ] .info [subcommand] — displays appropriate info
- [ ] .model <name> — switches model
- [ ] .prompt <text> — sets temp role
- [ ] .role <name> [text] — enters role or one-shot
- [ ] .session [name] — starts/resumes session
- [ ] .agent <name> [session] [key=value] — starts agent
- [ ] .rag [name] — initializes RAG
- [ ] .starter [n] — lists or executes conversation starter
- [ ] .set <key> <value> — updates setting
- [ ] .delete <type> — deletes item
- [ ] .exit [type] — exits scope or REPL
- [ ] .save role/session [name] — saves to file
- [ ] .edit role/session/config/agent-config/rag-docs — opens editor
- [ ] .empty session — clears session
- [ ] .compress session — compresses session
- [ ] .rebuild rag — rebuilds RAG
- [ ] .sources rag — shows RAG sources
- [ ] .copy — copies last response
- [ ] .continue — continues response
- [ ] .regenerate — regenerates response
- [ ] .file <path> [-- text] — includes files
- [ ] .macro <name> [text] — runs/creates macro
- [ ] .authenticate — OAuth flow
- [ ] .vault <cmd> [name] — vault operations
- [ ] .clear todo — clears agent todo

### ask function (chat flow)
- [ ] Input constructed from text
- [ ] Embeddings applied if RAG active
- [ ] Waits for compression to complete
- [ ] before_chat_completion called
- [ ] Streaming vs non-streaming based on config
- [ ] Tool results loop (recursive ask with merged results)
- [ ] after_chat_completion called
- [ ] Auto-continuation for agents with todos

## Old code reference
- `src/repl/mod.rs` — run_repl_command, ask, REPL_COMMANDS
@@ -0,0 +1,56 @@
# Test Plan: CLI Flags

## Feature description

Loki CLI accepts flags for model, role, session, agent, file input,
execution mode, and various info/list commands. Flags determine
the execution path through main.rs.

## Behaviors to test

### Early-exit flags
- [ ] --info prints info and exits
- [ ] --list-models prints models and exits
- [ ] --list-roles prints roles and exits
- [ ] --list-sessions prints sessions and exits
- [ ] --list-agents prints agents and exits
- [ ] --list-rags prints RAGs and exits
- [ ] --list-macros prints macros and exits
- [ ] --sync-models fetches and exits
- [ ] --build-tools (with --agent) builds and exits
- [ ] --authenticate runs OAuth and exits
- [ ] --completions generates shell completions and exits
- [ ] Vault flags (--add/get/update/delete-secret, --list-secrets) and exit

### Mode selection
- [ ] No text/file → REPL mode
- [ ] Text provided → command mode (single-shot)
- [ ] --agent → agent mode
- [ ] --role → role mode
- [ ] --execute (-e) → shell execute mode
- [ ] --code (-c) → code output mode
- [ ] --prompt → temp role mode
- [ ] --macro → macro execution mode

### Flag combinations
- [ ] --model + any mode → model applied
- [ ] --session + --role → session with role
- [ ] --session + --agent → agent with session
- [ ] --agent + --agent-variable → variables set
- [ ] --dry-run + any mode → input shown, no API call
- [ ] --no-stream + any mode → non-streaming response
- [ ] --file + text → file content + text combined
- [ ] --empty-session + --session → fresh session
- [ ] --save-session + --session → force save

### Prelude
- [ ] apply_prelude runs before main execution
- [ ] Prelude "role:name" loads role
- [ ] Prelude "session:name" loads session
- [ ] Prelude "session:role" loads both
- [ ] Prelude skipped if macro_flag set
- [ ] Prelude skipped if state already has role/session/agent
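
The three prelude forms can be told apart by the text before the colon. A sketch under one reading of the list above, namely that `"session:role"` means `<session name>:<role name>` while the literal prefixes `role:` and `session:` select a single item; the `Prelude` enum and `parse_prelude` are hypothetical, and `apply_prelude` may parse the value differently:

```rust
/// Hypothetical parsed form of the prelude setting.
#[derive(Debug, PartialEq, Eq)]
pub enum Prelude<'a> {
    Role(&'a str),
    Session(&'a str),
    SessionWithRole { session: &'a str, role: &'a str },
}

/// Parses "role:<name>", "session:<name>", or "<session>:<role>".
/// Returns `None` when there is no colon or either side is empty.
pub fn parse_prelude(value: &str) -> Option<Prelude<'_>> {
    let (left, right) = value.split_once(':')?;
    if left.is_empty() || right.is_empty() {
        return None;
    }
    Some(match left {
        "role" => Prelude::Role(right),
        "session" => Prelude::Session(right),
        session => Prelude::SessionWithRole { session, role: right },
    })
}
```

Under this reading a session literally named `role` or `session` cannot be combined with a role, which is an edge case a test could pin down once the intended behaviour is confirmed.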

## Old code reference
- `src/cli/mod.rs` — Cli struct, flag definitions
- `src/main.rs` — run(), flag processing, mode branching
@@ -0,0 +1,59 @@
# Test Plan: Sub-Agent Spawning

## Feature description

Agents with can_spawn_agents=true can spawn child agents that run
in parallel as background tokio tasks. Children communicate results
back to the parent via collect/check. Escalation allows children
to request user input through the parent.

## Behaviors to test

### Spawn
- [ ] agent__spawn creates child agent in background
- [ ] Child gets own RequestContext with incremented depth
- [ ] Child gets own session, model, functions
- [ ] Child gets shared root_escalation_queue
- [ ] Child gets inbox for teammate messaging
- [ ] Child MCP servers acquired if configured
- [ ] Max concurrent agents enforced
- [ ] Max depth enforced
- [ ] Agent not found → error
- [ ] can_spawn_agents=false → no spawn tools available

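The limit checks in the spawn list (unknown agent, max depth, max concurrency, incremented depth) can be expressed as one pure function, which keeps them testable without a runtime. This is a sketch under assumptions: `SpawnLimits`, `SpawnError`, and `check_spawn` are hypothetical names, and the order in which the checks run is a guess rather than Loki's documented behavior.

```rust
/// Reasons a spawn request is rejected (hypothetical type).
#[derive(Debug, PartialEq)]
enum SpawnError {
    AgentNotFound(String),
    MaxDepthReached,
    MaxConcurrentReached,
}

/// Configured limits (hypothetical field names).
struct SpawnLimits {
    max_depth: usize,
    max_concurrent: usize,
}

/// Validate a spawn request and return the depth the child would run at.
fn check_spawn(
    limits: &SpawnLimits,
    known_agents: &[&str],
    agent: &str,
    parent_depth: usize,
    running_children: usize,
) -> Result<usize, SpawnError> {
    if !known_agents.contains(&agent) {
        return Err(SpawnError::AgentNotFound(agent.to_string()));
    }
    // The child context carries the parent's depth plus one.
    let child_depth = parent_depth + 1;
    if child_depth > limits.max_depth {
        return Err(SpawnError::MaxDepthReached);
    }
    if running_children >= limits.max_concurrent {
        return Err(SpawnError::MaxConcurrentReached);
    }
    Ok(child_depth)
}
```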
### Collect/Check
- [ ] agent__check returns PENDING or result
- [ ] agent__collect blocks until done, returns output
- [ ] Output summarized when it exceeds the threshold
- [ ] Summarization uses configured model

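The difference between check (non-blocking) and collect (blocking) is easiest to test deterministically when the child is gated on a channel. The sketch below uses a std thread and `std::sync::mpsc` as a stand-in for the tokio tasks the plan describes; `ChildHandle`, `CheckStatus`, and `spawn_child` are hypothetical names, not Loki's supervisor API.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;

/// Result of a non-blocking poll (hypothetical type).
#[derive(Debug, PartialEq)]
enum CheckStatus {
    Pending,
    Done(String),
}

/// Handle to a background child; a std thread stands in for a tokio task.
struct ChildHandle {
    rx: Receiver<String>,
    finished: Option<String>,
}

fn spawn_child<F>(work: F) -> ChildHandle
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send only fails if the parent already dropped the handle.
        let _ = tx.send(work());
    });
    ChildHandle { rx, finished: None }
}

impl ChildHandle {
    /// Non-blocking: Pending until the child has produced its output.
    fn check(&mut self) -> CheckStatus {
        if self.finished.is_none() {
            match self.rx.try_recv() {
                Ok(out) => self.finished = Some(out),
                Err(TryRecvError::Empty) => return CheckStatus::Pending,
                Err(TryRecvError::Disconnected) => {
                    self.finished = Some("child exited without output".to_string())
                }
            }
        }
        CheckStatus::Done(self.finished.clone().unwrap())
    }

    /// Blocking: waits for the child, then returns its output.
    fn collect(&mut self) -> String {
        if self.finished.is_none() {
            let out = self
                .rx
                .recv()
                .unwrap_or_else(|_| "child exited without output".to_string());
            self.finished = Some(out);
        }
        self.finished.clone().unwrap()
    }
}
```

A test can hold the child on a gate channel, assert `Pending`, release the gate, and then assert that collect returns the output.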
### Task queue
- [ ] agent__task_create creates tasks with dependencies
- [ ] agent__task_complete marks done, unblocks dependents
- [ ] Auto-dispatch spawns agent for unblocked tasks
- [ ] agent__task_list shows all tasks with status

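The create/complete/list behaviors amount to a small dependency graph. As a rough model for writing assertions against, the sketch below tracks which tasks become ready when a dependency completes. `TaskQueue`, `TaskStatus`, and the numeric ids here are hypothetical and do not mirror the types in `src/supervisor/`; a dependency on an unknown id simply keeps the task blocked.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq)]
enum TaskStatus {
    Blocked,
    Ready,
    Done,
}

struct Task {
    deps: BTreeSet<u32>,
    status: TaskStatus,
}

#[derive(Default)]
struct TaskQueue {
    tasks: BTreeMap<u32, Task>,
    next_id: u32,
}

impl TaskQueue {
    /// Create a task; it is Ready only if every dependency is already Done.
    fn create(&mut self, deps: &[u32]) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        let deps: BTreeSet<u32> = deps.iter().copied().collect();
        let status = if self.all_done(&deps) {
            TaskStatus::Ready
        } else {
            TaskStatus::Blocked
        };
        self.tasks.insert(id, Task { deps, status });
        id
    }

    fn all_done(&self, deps: &BTreeSet<u32>) -> bool {
        deps.iter()
            .all(|d| matches!(self.tasks.get(d), Some(t) if t.status == TaskStatus::Done))
    }

    /// Mark a task Done and return the ids that became Ready as a result.
    /// An auto-dispatcher would spawn an agent for each returned id.
    fn complete(&mut self, id: u32) -> Vec<u32> {
        match self.tasks.get_mut(&id) {
            Some(t) => t.status = TaskStatus::Done,
            None => return Vec::new(),
        }
        let unblocked: Vec<u32> = self
            .tasks
            .iter()
            .filter(|(_, t)| t.status == TaskStatus::Blocked && self.all_done(&t.deps))
            .map(|(id, _)| *id)
            .collect();
        for id in &unblocked {
            if let Some(t) = self.tasks.get_mut(id) {
                t.status = TaskStatus::Ready;
            }
        }
        unblocked
    }

    /// All tasks with their status, ordered by id.
    fn list(&self) -> Vec<(u32, TaskStatus)> {
        self.tasks.iter().map(|(id, t)| (*id, t.status)).collect()
    }
}
```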
### Escalation
- [ ] Child calls user__ask → escalation created
- [ ] Parent sees pending_escalations notification
- [ ] agent__reply_escalation unblocks child
- [ ] Escalation timeout → fallback message

### Teammate messaging
- [ ] agent__send_message delivers to sibling inbox
- [ ] agent__check_inbox drains messages

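"Drains" in the inbox item implies that a second check returns nothing, which is worth asserting explicitly. A minimal model of that contract, assuming a shared queue behind a mutex, might look like the following; `Inbox` and `Message` here are hypothetical stand-ins, not the types in `src/supervisor/`.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
struct Message {
    from: String,
    body: String,
}

/// Cloning the inbox yields another handle to the same queue,
/// so a sibling can hold one end while the owner drains the other.
#[derive(Clone, Default)]
struct Inbox {
    queue: Arc<Mutex<VecDeque<Message>>>,
}

impl Inbox {
    /// Deliver a message to this inbox.
    fn send(&self, from: &str, body: &str) {
        self.queue.lock().unwrap().push_back(Message {
            from: from.to_string(),
            body: body.to_string(),
        });
    }

    /// Return all queued messages in arrival order and leave the inbox empty.
    fn drain(&self) -> Vec<Message> {
        self.queue.lock().unwrap().drain(..).collect()
    }
}
```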
### Child agent lifecycle
- [ ] run_child_agent loops: create input → call completions → process results
- [ ] Child uses before/after_chat_completion
- [ ] Child tool calls evaluated via eval_tool_calls
- [ ] Child exits cleanly, supervisor cancels on completion

## Context switching scenarios
- [ ] Parent spawns child with MCP → child MCP works independently
- [ ] Parent exits agent → all children cancelled
- [ ] Multiple children share escalation queue correctly

## Old code reference
- `src/function/supervisor.rs` — all handler functions
- `src/supervisor/` — Supervisor, EscalationQueue, Inbox, TaskQueue
@@ -0,0 +1,17 @@
# Test Plan: RAG

## Behaviors to test
- [ ] Rag::init creates new RAG with embedding model
- [ ] Rag::load loads existing RAG from disk
- [ ] Rag::create builds vector store from documents
- [ ] Rag::refresh_document_paths updates document list
- [ ] RAG search returns relevant embeddings
- [ ] RAG template formats context + sources + input
- [ ] Reranker model applied when configured
- [ ] top_k controls number of results
- [ ] RAG sources tracked for .sources command
- [ ] exit_rag clears RAG from context

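For the "search returns relevant embeddings" and "top_k controls number of results" items, a reference ranking helps when checking results against small hand-built vectors. The sketch below is a brute-force cosine-similarity search, which is an assumption made for illustration: Loki's `Rag` may use a different index or a hybrid search, and `cosine` and `search` are hypothetical helpers.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the indices of the `top_k` chunks most similar to `query`, best first.
fn search(query: &[f32], chunks: &[Vec<f32>], top_k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| (i, cosine(query, chunk)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(top_k).map(|(i, _)| i).collect()
}
```

With a query of `[1.0, 0.0]`, a chunk pointing the same way ranks first and an orthogonal chunk ranks last, and `top_k` caps the length of the returned list.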
## Old code reference
- `src/rag/mod.rs` — Rag struct and methods
- `src/config/request_context.rs` — use_rag, edit_rag_docs, rebuild_rag
@@ -0,0 +1,30 @@
# Test Plan: Tab Completion and Prompt

## Behaviors to test

### Tab completion (repl_complete)
- [ ] .role<TAB> → role names (no hidden files)
- [ ] .agent<TAB> → agent names (no .shared)
- [ ] .session<TAB> → session names
- [ ] .rag<TAB> → RAG names
- [ ] .macro<TAB> → macro names
- [ ] .model<TAB> → model names with descriptions
- [ ] .set <TAB> → setting keys (sorted)
- [ ] .set temperature <TAB> → current value suggestions
- [ ] .set enabled_tools <TAB> → tool names (no internal tools)
- [ ] .set enabled_mcp_servers <TAB> → configured servers + aliases
- [ ] .delete <TAB> → type names
- [ ] .vault <TAB> → subcommands
- [ ] .agent <name> <TAB> → session names for that agent
- [ ] Fuzzy filtering applied to all completions

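Several items above combine three rules: hide dot-prefixed entries, apply fuzzy filtering, and return sorted results. A compact model of those rules is sketched below. It assumes a simple in-order subsequence match as the fuzzy rule, which may differ from the matcher `repl_complete` actually uses; `fuzzy_match` and `complete` are hypothetical helpers.

```rust
/// True if every character of `pattern` appears in `candidate` in order,
/// ignoring case. An empty pattern matches everything.
fn fuzzy_match(pattern: &str, candidate: &str) -> bool {
    let mut chars = candidate.chars().flat_map(char::to_lowercase);
    pattern
        .chars()
        .flat_map(char::to_lowercase)
        .all(|p| chars.any(|c| c == p))
}

/// Filter candidates: drop hidden entries (such as `.shared`),
/// keep fuzzy matches, and return them sorted.
fn complete(pattern: &str, candidates: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = candidates
        .iter()
        .filter(|c| !c.starts_with('.'))
        .filter(|c| fuzzy_match(pattern, c))
        .map(|c| c.to_string())
        .collect();
    out.sort();
    out
}
```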
### Prompt rendering
- [ ] Left prompt shows role/session/agent name
- [ ] Right prompt shows model name
- [ ] Prompt updates after scope transitions
- [ ] Multi-line indicator shown during ::: input

## Old code reference
- `src/config/request_context.rs` — repl_complete
- `src/repl/completer.rs` — ReplCompleter
- `src/repl/prompt.rs` — ReplPrompt
@@ -0,0 +1,14 @@
# Test Plan: Macros

## Behaviors to test
- [ ] Macro loaded from YAML file
- [ ] Macro steps executed sequentially
- [ ] Each step runs through run_repl_command
- [ ] Variable interpolation in macro steps
- [ ] Built-in macros installed on first run
- [ ] macro_execute creates isolated RequestContext
- [ ] Macro context inherits tool scope from parent
- [ ] Macro context has macro_flag set

## Old code reference
- `src/config/macros.rs` — macro_execute, Macro struct
@@ -0,0 +1,16 @@
# Test Plan: Vault

## Behaviors to test
- [ ] Vault add stores encrypted secret
- [ ] Vault get decrypts and returns secret
- [ ] Vault update replaces secret value
- [ ] Vault delete removes secret
- [ ] Vault list shows all secret names
- [ ] Secrets interpolated in MCP config (mcp.json)
- [ ] Missing secrets produce warning during MCP init
- [ ] Vault accessible from REPL (.vault commands)
- [ ] Vault accessible from CLI (--add/get/update/delete-secret)

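The two interpolation items (secrets substituted into `mcp.json`, missing secrets reported) can be modeled as a pure function that returns both the rewritten text and the names it could not resolve. The sketch below assumes a `{{NAME}}` placeholder syntax, which is an assumption and not confirmed by this plan; `interpolate_placeholders` is a hypothetical helper and not the `interpolate_secrets` function in `src/mcp/mod.rs`.

```rust
use std::collections::HashMap;

/// Replace each `{{NAME}}` with its secret value.
/// Unknown names are left in place and returned so the caller can warn about them.
fn interpolate_placeholders(text: &str, secrets: &HashMap<String, String>) -> (String, Vec<String>) {
    let mut out = String::new();
    let mut missing = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("{{") {
        // An opening marker without a closing one ends the scan.
        let Some(len) = rest[start + 2..].find("}}") else { break };
        let end = start + 2 + len + 2; // index just past the closing "}}"
        let name = rest[start + 2..start + 2 + len].trim();
        out.push_str(&rest[..start]);
        match secrets.get(name) {
            Some(value) => out.push_str(value),
            None => {
                out.push_str(&rest[start..end]);
                missing.push(name.to_string());
            }
        }
        rest = &rest[end..];
    }
    out.push_str(rest);
    (out, missing)
}
```

A test for the warning path can then assert on the second tuple element instead of capturing log output.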
## Old code reference
- `src/vault/mod.rs` — GlobalVault, operations
- `src/mcp/mod.rs` — interpolate_secrets
@@ -0,0 +1,43 @@
# Test Plan: Functions and Tools

## Behaviors to test

### Function declarations
- [ ] Functions::init loads from visible_tools config
- [ ] Tool declarations parsed from bash scripts (argc annotations)
- [ ] Tool declarations parsed from python scripts (docstrings)
- [ ] Tool declarations parsed from typescript (JSDoc + type inference)
- [ ] Each declaration has name, description, parameters
- [ ] Agent tools loaded via Functions::init_agent
- [ ] Global tools loaded via build_global_tool_declarations

### Tool compilation
- [ ] Bash tools compiled to bin directory
- [ ] Python tools compiled to bin directory
- [ ] TypeScript tools compiled to bin directory
- [ ] clear_agent_bin_dir removes old binaries
- [ ] Tool file priority: .sh > .py > .ts > .js

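The file priority rule (`.sh > .py > .ts > .js`) is straightforward to check once it is isolated from the filesystem. The sketch below takes a list of paths instead of reading a directory. The `tools` file stem and the `pick_tools_file` name are assumptions for illustration and are not taken from `agent_functions_file`.

```rust
use std::path::PathBuf;

/// Extensions in priority order, as listed in the checklist.
const TOOL_EXTENSIONS: [&str; 4] = ["sh", "py", "ts", "js"];

/// Pick the highest-priority `tools.<ext>` file among `existing`.
/// Files with any other stem or extension are ignored.
fn pick_tools_file(existing: &[PathBuf]) -> Option<&PathBuf> {
    TOOL_EXTENSIONS.iter().find_map(|ext| {
        existing.iter().find(|p| {
            p.file_stem().and_then(|s| s.to_str()) == Some("tools")
                && p.extension().and_then(|e| e.to_str()) == Some(*ext)
        })
    })
}
```

Passing the candidates in a deliberately shuffled order confirms that the priority table, and not the input order, decides the result.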
### User interaction functions
- [ ] append_user_interaction_functions adds user__ask/confirm/input/checkbox
- [ ] Only appended in REPL mode
- [ ] User interaction tools work at depth 0 (direct prompt)
- [ ] User interaction tools escalate at depth > 0

### MCP meta functions
- [ ] append_mcp_meta_functions adds invoke/search/describe per server
- [ ] Meta functions removed when ToolScope rebuilt without those servers
- [ ] Function names follow mcp_invoke_<server> pattern

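The naming and removal rules for meta functions can be checked with a pair of pure helpers. Only the `mcp_invoke_<server>` pattern comes from the checklist; the sketch below assumes the search and describe functions follow the same shape, and `mcp_meta_function_names` and `rebuild_meta_functions` are hypothetical helpers rather than Loki's `append_mcp_meta_functions`.

```rust
/// Build the meta function names for one server.
/// Assumption: search/describe mirror the documented `mcp_invoke_<server>` pattern.
fn mcp_meta_function_names(server: &str) -> Vec<String> {
    ["invoke", "search", "describe"]
        .iter()
        .map(|verb| format!("mcp_{verb}_{server}"))
        .collect()
}

/// Rebuild the full meta function list from the currently active servers.
/// Servers absent from `active_servers` contribute no functions.
fn rebuild_meta_functions(active_servers: &[&str]) -> Vec<String> {
    active_servers
        .iter()
        .flat_map(|server| mcp_meta_function_names(server))
        .collect()
}
```

Rebuilding with a shorter server list and asserting that no name ends in the removed server's suffix covers the "removed when ToolScope rebuilt" item.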
### Function selection
- [ ] select_functions filters by role's enabled_tools
- [ ] "all" enables everything
- [ ] Specific tool names enabled selectively
- [ ] mapping_tools aliases resolved
- [ ] Agent functions included when agent active
- [ ] MCP meta functions included when servers active

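The first four selection rules ("all", specific names, alias resolution) can be modeled without a full `Functions` struct. The sketch below assumes `enabled_tools` is a comma-separated string and that each `mapping_tools` alias expands to a comma-separated list of tool names; both formats are assumptions, and `select_tool_names` is a hypothetical helper rather than Loki's `select_functions`.

```rust
use std::collections::{BTreeSet, HashMap};

/// Resolve an `enabled_tools` value against the declared tool names.
/// "all" selects every declared tool; otherwise each item is either a tool
/// name or an alias from `mapping_tools`. Unknown names are ignored.
fn select_tool_names(
    enabled_tools: &str,
    declared: &[&str],
    mapping_tools: &HashMap<String, String>,
) -> BTreeSet<String> {
    if enabled_tools.trim() == "all" {
        return declared.iter().map(|d| d.to_string()).collect();
    }
    let mut selected = BTreeSet::new();
    for item in enabled_tools.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        // An alias expands to its own list; a plain name stands for itself.
        let expansion = mapping_tools.get(item).map(String::as_str).unwrap_or(item);
        for name in expansion.split(',').map(str::trim) {
            if declared.contains(&name) {
                selected.insert(name.to_string());
            }
        }
    }
    selected
}
```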
## Old code reference
- `src/function/mod.rs` — Functions struct, init, init_agent
- `src/config/paths.rs` — agent_functions_file (priority)
- `src/parsers/` — bash, python, typescript parsers