# Phase 1 Flow Test Plan

Comprehensive behavioral verification plan comparing the old codebase
(`~/code/testing/loki` on `develop` branch) against the new Phase 1
codebase (`~/code/loki`). Every test should produce identical behavior
in both codebases unless noted as an intentional improvement.

## How to run

For each test case:
1. Run the test in the OLD codebase (`cd ~/code/testing/loki && cargo run --`)
2. Run the same test in the NEW codebase (`cd ~/code/loki && cargo run --`)
3. Compare output/behavior
4. Mark PASS/FAIL/IMPROVED

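The four steps above can be scripted for tests whose output is deterministic. The following is a minimal Python sketch, not part of either codebase: `run_in`, `classify`, `compare`, and the `improved` flag are hypothetical names introduced here, the `--quiet` flag is added only to reduce Cargo's own log output, and a test is assumed to match when exit code, stdout, and stderr are all identical.

```python
import os
import subprocess

# Checkout locations taken from this plan.
OLD_REPO = os.path.expanduser("~/code/testing/loki")
NEW_REPO = os.path.expanduser("~/code/loki")


def run_in(repo, args, stdin_text=None):
    """Run `cargo run -- <args>` inside `repo`; return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        ["cargo", "run", "--quiet", "--", *args],
        cwd=repo,
        input=stdin_text,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def classify(old_result, new_result, improved=False):
    """Map a pair of results to the PASS / FAIL / IMPROVED marks used in this plan."""
    if old_result == new_result:
        return "PASS"
    # A difference is acceptable only for tests tagged [IMPROVED]; the
    # difference itself still has to be checked against the expected behavior.
    return "IMPROVED" if improved else "FAIL"


def compare(args, improved=False):
    """Run the same arguments in both codebases and return the mark."""
    return classify(run_in(OLD_REPO, args), run_in(NEW_REPO, args), improved)
```

For example, `compare(["--list-roles"])` would cover test 2.3 and `compare(["--list-agents"], improved=True)` test 2.5. Exact comparison does not suit chat tests, since model responses generally vary between runs; those still need manual review.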

Legend:
- `OLD:` = expected behavior from old codebase
- `NEW:` = expected behavior from new codebase (should match unless noted)
- `[IMPROVED]` = intentional behavioral improvement in new code

---

## 1. Build Baseline

| # | Test | Command | Expected |
|---|---|---|---|
| 1.1 | Compile check | `cargo check` | Zero warnings, zero errors |
| 1.2 | Clippy | `cargo clippy` | Zero warnings (excluding pre-existing) |
| 1.3 | Tests | `cargo test` | All tests pass |

---

## 2. CLI — Info and Listing (early-exit paths)

These should produce identical output in both codebases.

| # | Test | Command | Expected |
|---|---|---|---|
| 2.1 | System info | `loki --info` | Prints config paths, model, settings |
| 2.2 | List models | `loki --list-models` | Prints all available model IDs |
| 2.3 | List roles | `loki --list-roles` | Prints role names (no hidden files) |
| 2.4 | List sessions | `loki --list-sessions` | Prints session names |
| 2.5 | List agents | `loki --list-agents` | Prints agent names, no `.shared` [IMPROVED] |
| 2.6 | List RAGs | `loki --list-rags` | Prints RAG names |
| 2.7 | List macros | `loki --list-macros` | Prints macro names |
| 2.8 | Sync models | `loki --sync-models` | Fetches models.yaml, prints status |

---

## 3. CLI — Single-shot Chat

| # | Test | Command | Expected |
|---|---|---|---|
| 3.1 | Basic chat | `loki "What is 2+2?"` | Response printed, exits |
| 3.2 | With role | `loki --role coder "hello"` | Role context applied |
| 3.3 | With prompt | `loki --prompt "you are a pirate" "hello"` | Temp role applied |
| 3.4 | With model | `loki --model <model_id> "hello"` | Uses specified model |
| 3.5 | With session | `loki -s test "hello"` | Session created, message saved |
| 3.6 | Resume session | `loki -s test "what did I say?"` | Session context preserved |
| 3.7 | Dry run | `loki --dry-run "hello"` | Input echoed, no API call |
| 3.8 | No stream | `loki --no-stream "hello"` | Response printed all at once |
| 3.9 | Empty session | `loki -s test --empty-session "hello"` | Session cleared, fresh start |
| 3.10 | Save session | `loki -s test --save-session "hello"` | Forces session save |
| 3.11 | Code mode | `loki -c "fibonacci in python"` | Only code output |

---

## 4. CLI — File Input

| # | Test | Command | Expected |
|---|---|---|---|
| 4.1 | File + text | `loki -f /etc/hostname "summarize"` | File content included |
| 4.2 | File only | `loki -f /etc/hostname` | File sent as input |
| 4.3 | Multiple files | `loki -f /etc/hostname -f /etc/os-release "compare"` | Both files included |
| 4.4 | Stdin pipe | `echo "hello" \| loki "summarize"` | Stdin included |

---

## 5. CLI — Shell Execute

| # | Test | Command | Expected |
|---|---|---|---|
| 5.1 | Generate command | `loki -e "list files in /tmp"` | Shell command generated |
| 5.2 | Describe mode | Press 'd' when prompted | Explanation shown |
| 5.3 | Execute mode | Press 'y' when prompted | Command executed |
| 5.4 | Dry run | `loki -e --dry-run "list files"` | Input shown, no execution |

---

## 6. CLI — Agent (non-interactive)

| # | Test | Command | Expected |
|---|---|---|---|
| 6.1 | Agent chat | `loki -a coder "write hello world in python"` | Agent tools available, response |
| 6.2 | Agent + session | `loki -a coder -s test "hello"` | Agent with specific session |
| 6.3 | Agent variables | `loki -a demo --agent-variable key val "hello"` | Variable injected |
| 6.4 | Agent MCP | `loki -a <mcp-agent> "use the server"` | MCP servers start, tools work |
| 6.5 | Build tools | `loki -a coder --build-tools` | Tools compiled, exits |

---

## 7. CLI — Macros

| # | Test | Command | Expected |
|---|---|---|---|
| 7.1 | Execute macro | `loki --macro generate-commit-message` | Macro executes |

---

## 8. CLI — Vault (early-exit)

| # | Test | Command | Expected |
|---|---|---|---|
| 8.1 | Add secret | `loki --add-secret test-secret` | Prompts for value, saves |
| 8.2 | Get secret | `loki --get-secret test-secret` | Prints decrypted value |
| 8.3 | List secrets | `loki --list-secrets` | Lists all secret names |
| 8.4 | Delete secret | `loki --delete-secret test-secret` | Deletes, confirms |

---

## 9. REPL — Startup and Exit

| # | Test | Steps | Expected |
|---|---|---|---|
| 9.1 | Start REPL | `loki` | Welcome message shown |
| 9.2 | Exit command | Type `.exit` | Clean exit |
| 9.3 | Ctrl+D | Press Ctrl+D | Clean exit |
| 9.4 | Ctrl+C | Press Ctrl+C | Hint message, stays in REPL |
| 9.5 | Prelude role | Set `repl_prelude: "role:coder"` in config, start REPL | Role auto-loaded, prompt changes |
| 9.6 | Prelude session | Set `repl_prelude: "mysession:coder"`, start | Session+role auto-loaded |

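Tests 9.5 and 9.6 use two shapes of `repl_prelude` value. The Python sketch below shows how those two shapes map to a session and a role. It is an illustration only: `parse_prelude` is a hypothetical helper, it handles just the `role:<name>` and `<session>:<role>` forms that appear in the table, and any other forms Loki may accept are out of scope.

```python
def parse_prelude(value):
    """Split a `repl_prelude` string into (session, role); session may be None."""
    head, sep, tail = value.partition(":")
    if not sep or not head or not tail:
        raise ValueError(f"unrecognized prelude: {value!r}")
    if head == "role":
        # `role:<name>` loads only a role (test 9.5).
        return None, tail
    # `<session>:<role>` loads a session together with a role (test 9.6).
    return head, tail
```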
---
## 10. REPL — Basic Chat

| # | Test | Steps | Expected |
|---|---|---|---|
| 10.1 | Chat message | Type `hello` | Response streamed |
| 10.2 | Continue | Type `.continue` after response | Continuation generated |
| 10.3 | Regenerate | Type `.regenerate` | New response generated |
| 10.4 | Copy | Type `.copy` | Last response copied to clipboard |
| 10.5 | Multi-line | Type `:::`, then multi-line, then `:::` | Multi-line sent as one message |
| 10.6 | Empty input | Press Enter on empty line | No action |
| 10.7 | Help | Type `.help` | Help text shown |
| 10.8 | Info | Type `.info` | System info printed |

---

## 11. REPL — Roles

| # | Test | Steps | Expected |
|---|---|---|---|
| 11.1 | Enter role | `.role coder` | Prompt changes, role active |
| 11.2 | One-shot role | `.role coder write hello world` | Response with role, then returns to no-role |
| 11.3 | Role info | `.info role` (while in role) | Role details shown |
| 11.4 | Edit role | `.edit role` (while in role) | Editor opens |
| 11.5 | Save role | `.save role myname` | Role saved to file |
| 11.6 | Exit role | `.exit role` | Prompt resets, role cleared |
| 11.7 | Create new role | `.role newname` (non-existent) | Editor opens for new role |
| 11.8 | Role + MCP | `.role <mcp-role>` | MCP servers start with spinner, tools available |
| 11.9 | Exit role + MCP | `.exit role` (from MCP role) | MCP servers stop, global MCP restored |
| 11.10 | Role in session | `.session test` then `.role coder` | Role applied within session |

---

## 12. REPL — Sessions

| # | Test | Steps | Expected |
|---|---|---|---|
| 12.1 | Temp session | `.session` | Temp session started |
| 12.2 | Named session | `.session mytest` | Named session created/resumed |
| 12.3 | Session info | `.info session` | Session details shown |
| 12.4 | Edit session | `.edit session` | Editor opens |
| 12.5 | Save session | `.save session myname` | Session saved |
| 12.6 | Empty session | `.empty session` | Messages cleared |
| 12.7 | Compress session | `.compress session` | Compression runs with spinner |
| 12.8 | Exit session | `.exit session` | Session exited |
| 12.9 | Carry-over prompt | Send message, then `.session test` | "incorporate last Q&A?" prompt |
| 12.10 | Session + MCP | `.session <mcp-session>` | MCP servers start |
| 12.11 | Already in session | `.session` while in session | Error: "Already in a session" |

---

## 13. REPL — Agents

| # | Test | Steps | Expected |
|---|---|---|---|
| 13.1 | Start agent | `.agent coder` | Tools compiled, prompt changes, agent active |
| 13.2 | Agent + session | `.agent coder mysession` | Agent with specific session |
| 13.3 | Agent variables | `.agent demo key=value` | Variable set, available in tools |
| 13.4 | Agent info | `.info agent` | Agent details shown |
| 13.5 | Starter list | `.starter` | Conversation starters listed |
| 13.6 | Starter select | `.starter 1` | Starter message sent |
| 13.7 | Edit agent config | `.edit agent-config` | Editor opens |
| 13.8 | Exit agent | `.exit agent` | Agent cleared, prompt resets |
| 13.9 | Agent + MCP | `.agent <mcp-agent>` | MCP servers start, tools available |
| 13.10 | MCP disabled | `.agent <mcp-agent>` with `mcp_server_support=false` | Error, agent blocked [IMPROVED] |
| 13.11 | Tool execution | Send message that triggers tool call | Tool executes, result returned |
| 13.12 | Global tools | Agent with `global_tools` configured | Global tools available alongside agent tools |
| 13.13 | Tool file priority | Delete `.ts`, have `.sh` | `.sh` used [IMPROVED] |
| 13.14 | Clear todo | `.clear todo` (in agent with auto-continue) | Todo list cleared |
| 13.15 | Auto-continuation | Agent with `auto_continue=true`, create todos | Agent continues until todos done |
| 13.16 | Already in agent | `.agent coder` while agent active | Error: "Already in an agent" |

---

## 14. REPL — Sub-Agent Spawning and Escalation

| # | Test | Steps | Expected |
|---|---|---|---|
| 14.1 | Spawn sub-agent | Use agent with `can_spawn_agents=true`, trigger spawn | Sub-agent starts in background |
| 14.2 | Check sub-agent | Call `agent__check` with agent ID | Returns PENDING or result |
| 14.3 | Collect sub-agent | Call `agent__collect` with agent ID | Blocks until done, returns output |
| 14.4 | List sub-agents | Call `agent__list` | Shows all spawned agents + status |
| 14.5 | Cancel sub-agent | Call `agent__cancel` with agent ID | Agent cancelled |
| 14.6 | Escalation | Sub-agent calls `user__ask` | Parent gets notification |
| 14.7 | Reply escalation | Parent calls `agent__reply_escalation` | Sub-agent unblocked |
| 14.8 | Max depth | Spawn beyond `max_agent_depth` | Error: "Max agent depth exceeded" |
| 14.9 | Max concurrent | Spawn beyond `max_concurrent_agents` | Error: capacity reached |
| 14.10 | Teammate messaging | Sub-agent sends message to sibling | Message delivered via inbox |

---

## 15. REPL — RAG

| # | Test | Steps | Expected |
|---|---|---|---|
| 15.1 | Init RAG | `.rag <name>` | RAG initialized/loaded |
| 15.2 | RAG info | `.info rag` | RAG details shown |
| 15.3 | RAG sources | `.sources rag` (after a query) | Citation sources listed |
| 15.4 | Edit RAG docs | `.edit rag-docs` | Editor opens |
| 15.5 | Rebuild RAG | `.rebuild rag` | RAG rebuilt |
| 15.6 | Exit RAG | `.exit rag` | RAG cleared |
| 15.7 | RAG embeddings | Send query with RAG active | Embeddings included in context |

---

## 16. REPL — MCP Servers

| # | Test | Steps | Expected |
|---|---|---|---|
| 16.1 | Global MCP start | Start REPL with `enabled_mcp_servers` configured | Servers start |
| 16.2 | MCP search | LLM calls `mcp__search_<server>` | Tools found and ranked |
| 16.3 | MCP describe | LLM calls `mcp__describe_<server>` tool_name | Schema returned |
| 16.4 | MCP invoke | LLM calls `mcp__invoke_<server>` tool args | Tool executed, result returned |
| 16.5 | Change servers | `.set enabled_mcp_servers <other>` | Old stopped, new started |
| 16.6 | Disable MCP | `.set mcp_server_support false` | MCP tools removed |
| 16.7 | Enable MCP | `.set mcp_server_support true` | MCP tools restored |
| 16.8 | Role MCP switch | Enter role with MCP X, exit, enter role with MCP Y | X stops, Y starts |
| 16.9 | Null servers | `.set enabled_mcp_servers null` | All MCP servers stop, tools removed |

---

## 17. REPL — Settings (.set)

| # | Test | Steps | Expected |
|---|---|---|---|
| 17.1 | Temperature | `.set temperature 0.5` | Temperature changed |
| 17.2 | Top-p | `.set top_p 0.9` | Top-p changed |
| 17.3 | Model | `.set model <name>` | Model switched |
| 17.4 | Dry run | `.set dry_run true` | Dry run enabled |
| 17.5 | Stream | `.set stream false` | Streaming disabled |
| 17.6 | Save | `.set save false` | Auto-save disabled |
| 17.7 | Highlight | `.set highlight false` | Syntax highlighting disabled |
| 17.8 | Save session | `.set save_session true` | Session auto-save enabled |
| 17.9 | Null value | `.set temperature null` | Temperature reset to default |
| 17.10 | Compression threshold | `.set compression_threshold 2000` | Threshold changed |
| 17.11 | Max output tokens | `.set max_output_tokens 4096` | Max tokens set |
| 17.12 | Enabled tools | `.set enabled_tools all` | All tools enabled |
| 17.13 | Function calling | `.set function_calling_support false` | Function calling disabled |

---

## 18. REPL — Tab Completion

| # | Test | Steps | Expected |
|---|---|---|---|
| 18.1 | Role completion | `.role<TAB>` | Shows role names |
| 18.2 | Agent completion | `.agent<TAB>` | Shows agent names (no `.shared`) [IMPROVED] |
| 18.3 | Session completion | `.session<TAB>` | Shows session names |
| 18.4 | RAG completion | `.rag<TAB>` | Shows RAG names |
| 18.5 | Macro completion | `.macro<TAB>` | Shows macro names |
| 18.6 | Model completion | `.model<TAB>` | Shows model names with descriptions |
| 18.7 | Set keys | `.set <TAB>` | Shows all setting names |
| 18.8 | Set values | `.set temperature <TAB>` | Shows current/suggested value |
| 18.9 | Enabled tools | `.set enabled_tools <TAB>` | Shows tools (no `user__`/`mcp_`/`todo__`/`agent__`) [IMPROVED] |
| 18.10 | MCP servers | `.set enabled_mcp_servers <TAB>` | Shows configured servers + mappings [IMPROVED] |
| 18.11 | Delete types | `.delete <TAB>` | Shows: role, session, rag, macro, agent-data |
| 18.12 | Vault cmds | `.vault <TAB>` | Shows: add, get, update, delete, list |

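The `[IMPROVED]` completion rows (18.2 and 18.9) both come down to filtering candidates before they are shown. A minimal Python sketch of the expected filtering follows. It is not Loki's implementation: `visible_tools` and `visible_agents` are hypothetical names, and the prefixes are taken verbatim from test 18.9. It can serve as a reference when checking completion output by hand.

```python
# Prefixes named in test 18.9; tools starting with these count as internal.
INTERNAL_TOOL_PREFIXES = ("user__", "mcp_", "todo__", "agent__")


def visible_tools(tool_names):
    """Drop internal tools from the `.set enabled_tools <TAB>` candidates."""
    return [t for t in tool_names if not t.startswith(INTERNAL_TOOL_PREFIXES)]


def visible_agents(agent_names):
    """Drop the `.shared` entry from agent listings (tests 2.5 and 18.2)."""
    return [a for a in agent_names if a != ".shared"]
```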
---
## 19. REPL — Delete

| # | Test | Steps | Expected |
|---|---|---|---|
| 19.1 | Delete role | `.delete role` | Shows role picker, deletes selected |
| 19.2 | Delete session | `.delete session` | Shows session picker, deletes |
| 19.3 | Delete RAG | `.delete rag` | Shows RAG picker, deletes |
| 19.4 | Delete macro | `.delete macro` | Shows macro picker, deletes |
| 19.5 | Delete agent data | `.delete agent-data` | Shows agent picker, deletes data |

---

## 20. REPL — Vault

| # | Test | Steps | Expected |
|---|---|---|---|
| 20.1 | Add secret | `.vault add mysecret` | Prompts for value, saves |
| 20.2 | Get secret | `.vault get mysecret` | Prints decrypted value |
| 20.3 | Update secret | `.vault update mysecret` | Prompts for new value |
| 20.4 | Delete secret | `.vault delete mysecret` | Deletes |
| 20.5 | List secrets | `.vault list` | Lists all secret names |

---

## 21. REPL — Macros and File

| # | Test | Steps | Expected |
|---|---|---|---|
| 21.1 | Execute macro | `.macro generate-commit-message` | Macro runs |
| 21.2 | Create macro | `.macro newname` (non-existent) | Editor opens |
| 21.3 | File include | `.file /etc/hostname -- summarize this` | File included, query sent |
| 21.4 | URL include | `.file https://example.com -- summarize` | URL fetched, content included |

---

## 22. REPL — Edit Commands

| # | Test | Steps | Expected |
|---|---|---|---|
| 22.1 | Edit config | `.edit config` | Config file opens in editor |
| 22.2 | Edit role | `.edit role` (in role) | Role file opens in editor |
| 22.3 | Edit session | `.edit session` (in session) | Session file opens in editor |
| 22.4 | Edit agent config | `.edit agent-config` (in agent) | Agent config opens in editor |
| 22.5 | Edit RAG docs | `.edit rag-docs` (in RAG) | RAG docs opens in editor |

---

## 23. Session Compression and Autoname

| # | Test | Steps | Expected |
|---|---|---|---|
| 23.1 | Auto-compress | Set low `compression_threshold`, send many messages | "Compressing the session." shown |
| 23.2 | Manual compress | `.compress session` | Compression runs with spinner |
| 23.3 | Auto-name | Start temp session, send messages | Session auto-named |

---

## 24. Error Handling

| # | Test | Steps | Expected |
|---|---|---|---|
| 24.1 | Invalid role | `.role nonexistent_role_xxxxxxx` | Error shown, REPL continues |
| 24.2 | Invalid model | `.set model nonexistent_model` | Error shown, REPL continues |
| 24.3 | No session active | `.info session` (no session) | Error or empty |
| 24.4 | No agent active | `.info agent` (no agent) | Error or empty |
| 24.5 | Already in session | `.session` then `.session` again | Error: "Already in a session" |
| 24.6 | Already in agent | `.agent coder` then `.agent coder` | Error: "Already in an agent" |
| 24.7 | Unknown command | `.nonexistent` | Error message shown |
| 24.8 | Tool failure | Trigger tool that fails | Error returned to LLM as tool result |

---

## 25. MCP Lifecycle State Transitions (Critical)

These test the most bug-prone area of the migration.

| # | Test | Steps | Expected |
|---|---|---|---|
| 25.1 | Role A→B MCP swap | Enter role with MCP-A, exit, enter role with MCP-B | A stops, B starts, B tools work |
| 25.2 | Role MCP→no MCP | Enter role with MCP, exit role | MCP stops, global MCP restored |
| 25.3 | No MCP→Role MCP | Start REPL (no MCP), enter role with MCP | MCP starts, tools work |
| 25.4 | Agent MCP lifecycle | Start agent with MCP, use tools, exit agent | Agent MCP starts, works, stops on exit |
| 25.5 | Session MCP | Start session with MCP config | MCP starts for session |
| 25.6 | Global→Agent→Global | Start with global MCP-A, enter agent with MCP-B, exit agent | A→B→A transitions clean |
| 25.7 | MCP mapping resolution | Role has `enabled_mcp_servers: alias`, mapping configured | Alias resolved, correct servers start |
| 25.8 | MCP disabled + agent | Agent requires MCP, `mcp_server_support=false` | Error blocks agent start [IMPROVED] |

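When checking these transitions by hand, it helps to write down which servers should stop and which should start at each step. The Python sketch below models that bookkeeping as a set difference. It rests on assumptions not stated in this plan: a scope's server list fully replaces the previous one, leaving a scope restores the global list (as in 25.2 and 25.6), and servers common to both lists are left running. `plan_transition` and the server names are hypothetical.

```python
def plan_transition(running, wanted):
    """Return (to_stop, to_start) for moving from `running` to `wanted`."""
    running, wanted = set(running), set(wanted)
    return sorted(running - wanted), sorted(wanted - running)


# Test 25.6: global MCP-A, enter an agent that uses MCP-B, then exit the agent.
global_servers = ["mcp-a"]
agent_servers = ["mcp-b"]

enter_agent = plan_transition(global_servers, agent_servers)
exit_agent = plan_transition(agent_servers, global_servers)
print(enter_agent)  # (['mcp-a'], ['mcp-b'])
print(exit_agent)   # (['mcp-b'], ['mcp-a'])
```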
---
## Intentional Improvements (NEW ≠ OLD, by design)

| # | What changed | Old behavior | New behavior |
|---|---|---|---|
| I.1 | Agent list hides `.shared` | `.shared` shown in completions | `.shared` hidden |
| I.2 | Tool file priority | Filesystem order (non-deterministic) | Priority: `.sh` > `.py` > `.ts` > `.js` |
| I.3 | MCP disabled + agent | Warning printed, agent starts anyway | Error, agent blocked |
| I.4 | Role MCP disabled warning | Warning always shown (even if role has no MCP) | Warning only when role actually has MCP |
| I.5 | Enabled tools completions | Shows internal tools (`user__`, `mcp_`, etc.) | Internal tools hidden |
| I.6 | MCP server completions | Only mapping aliases | Both configured servers + aliases |

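Improvement I.2 replaces filesystem order with a fixed extension priority. The selection rule can be sketched in a few lines of Python. This illustrates the expected outcome of test 13.13 and is not the project's own code; `pick_tool_file` and the file names in any call are hypothetical. Because the rank comes from a fixed list, the result does not depend on the order in which the filesystem returns entries.

```python
from pathlib import PurePath

# Priority order from I.2: earlier entries win.
EXTENSION_PRIORITY = [".sh", ".py", ".ts", ".js"]


def pick_tool_file(candidates):
    """Return the candidate with the highest-priority extension, or None."""
    ranked = [
        (EXTENSION_PRIORITY.index(PurePath(name).suffix), name)
        for name in candidates
        if PurePath(name).suffix in EXTENSION_PRIORITY
    ]
    # min() over (rank, name) pairs gives the same answer for any input order.
    return min(ranked)[1] if ranked else None
```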
---
## Test Execution Notes

- Run tests in order; some depend on state from previous tests
  (e.g., session tests create sessions that later tests reference)
- For MCP tests, ensure at least one MCP server is configured in
  `~/.config/loki/functions/mcp.json`
- For agent tests, use built-in agents (`coder`, `demo`, `explore`)
- For sub-agent tests, use the `sisyphus` agent (has `can_spawn_agents`)
- For RAG tests, configure a RAG with test documents
- For vault tests, use temporary secret names to avoid polluting
  the real vault
- Compare error messages between old and new; they may differ
  slightly in wording but should convey the same meaning

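Since error messages may differ in wording, a rough similarity check can triage which pairs need a closer look. The following Python sketch uses the standard library's `difflib`. The `needs_review` helper and the `0.8` threshold are assumptions chosen for illustration, and a similarity score cannot establish that two messages mean the same thing, so flagged and unflagged pairs alike remain subject to the tester's judgment.

```python
import difflib


def normalize(message):
    """Lowercase and collapse whitespace so cosmetic differences are ignored."""
    return " ".join(message.lower().split())


def needs_review(old_message, new_message, threshold=0.8):
    """Return True when the two messages differ enough to warrant a manual look."""
    ratio = difflib.SequenceMatcher(
        None, normalize(old_message), normalize(new_message)
    ).ratio()
    return ratio < threshold
```

Pairs that differ only in case or spacing are not flagged, while unrelated messages such as "Already in a session" and "Max agent depth exceeded" are.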