# Phase 1 Flow Test Plan Comprehensive behavioral verification plan comparing the old codebase (`~/code/testing/loki` on `develop` branch) against the new Phase 1 codebase (`~/code/loki`). Every test should produce identical behavior in both codebases unless noted as an intentional improvement. ## How to run For each test case: 1. Run the test in the OLD codebase (`cd ~/code/testing/loki && cargo run --`) 2. Run the same test in the NEW codebase (`cd ~/code/loki && cargo run --`) 3. Compare output/behavior 4. Mark PASS/FAIL/IMPROVED Legend: - `OLD:` = expected behavior from old codebase - `NEW:` = expected behavior from new codebase (should match unless noted) - `[IMPROVED]` = intentional behavioral improvement in new code --- ## 1. Build Baseline | # | Test | Command | Expected | |---|---|---|---| | 1.1 | Compile check | `cargo check` | Zero warnings, zero errors | | 1.2 | Clippy | `cargo clippy` | Zero warnings (excluding pre-existing) | | 1.3 | Tests | `cargo test` | All tests pass | --- ## 2. CLI — Info and Listing (early-exit paths) These should produce identical output in both codebases. | # | Test | Command | Expected | |---|---|---|---| | 2.1 | System info | `loki --info` | Prints config paths, model, settings | | 2.2 | List models | `loki --list-models` | Prints all available model IDs | | 2.3 | List roles | `loki --list-roles` | Prints role names (no hidden files) | | 2.4 | List sessions | `loki --list-sessions` | Prints session names | | 2.5 | List agents | `loki --list-agents` | Prints agent names, no `.shared` [IMPROVED] | | 2.6 | List RAGs | `loki --list-rags` | Prints RAG names | | 2.7 | List macros | `loki --list-macros` | Prints macro names | | 2.8 | Sync models | `loki --sync-models` | Fetches models.yaml, prints status | --- ## 3. CLI — Single-shot Chat | # | Test | Command | Expected | |---|---|---|---| | 3.1 | Basic chat | `loki "What is 2+2?"` | Response printed, exits | | 3.2 | With role | `loki --role coder "hello"` | Role context applied | | 3.3 | With prompt | `loki --prompt "you are a pirate" "hello"` | Temp role applied | | 3.4 | With model | `loki --model "hello"` | Uses specified model | | 3.5 | With session | `loki -s test "hello"` | Session created, message saved | | 3.6 | Resume session | `loki -s test "what did I say?"` | Session context preserved | | 3.7 | Dry run | `loki --dry-run "hello"` | Input echoed, no API call | | 3.8 | No stream | `loki --no-stream "hello"` | Response printed all at once | | 3.9 | Empty session | `loki -s test --empty-session "hello"` | Session cleared, fresh start | | 3.10 | Save session | `loki -s test --save-session "hello"` | Forces session save | | 3.11 | Code mode | `loki -c "fibonacci in python"` | Only code output | --- ## 4. CLI — File Input | # | Test | Command | Expected | |---|---|---|---| | 4.1 | File + text | `loki -f /etc/hostname "summarize"` | File content included | | 4.2 | File only | `loki -f /etc/hostname` | File sent as input | | 4.3 | Multiple files | `loki -f /etc/hostname -f /etc/os-release "compare"` | Both files included | | 4.4 | Stdin pipe | `echo "hello" \| loki "summarize"` | Stdin included | --- ## 5. CLI — Shell Execute | # | Test | Command | Expected | |---|---|---|---| | 5.1 | Generate command | `loki -e "list files in /tmp"` | Shell command generated | | 5.2 | Describe mode | Press 'd' when prompted | Explanation shown | | 5.3 | Execute mode | Press 'y' when prompted | Command executed | | 5.4 | Dry run | `loki -e --dry-run "list files"` | Input shown, no execution | --- ## 6. CLI — Agent (non-interactive) | # | Test | Command | Expected | |---|---|---|---| | 6.1 | Agent chat | `loki -a coder "write hello world in python"` | Agent tools available, response | | 6.2 | Agent + session | `loki -a coder -s test "hello"` | Agent with specific session | | 6.3 | Agent variables | `loki -a demo --agent-variable key val "hello"` | Variable injected | | 6.4 | Agent MCP | `loki -a "use the server"` | MCP servers start, tools work | | 6.5 | Build tools | `loki -a coder --build-tools` | Tools compiled, exits | --- ## 7. CLI — Macros | # | Test | Command | Expected | |---|---|---|---| | 7.1 | Execute macro | `loki --macro generate-commit-message` | Macro executes | --- ## 8. CLI — Vault (early-exit) | # | Test | Command | Expected | |---|---|---|---| | 8.1 | Add secret | `loki --add-secret test-secret` | Prompts for value, saves | | 8.2 | Get secret | `loki --get-secret test-secret` | Prints decrypted value | | 8.3 | List secrets | `loki --list-secrets` | Lists all secret names | | 8.4 | Delete secret | `loki --delete-secret test-secret` | Deletes, confirms | --- ## 9. REPL — Startup and Exit | # | Test | Steps | Expected | |---|---|---|---| | 9.1 | Start REPL | `loki` | Welcome message shown | | 9.2 | Exit command | Type `.exit` | Clean exit | | 9.3 | Ctrl+D | Press Ctrl+D | Clean exit | | 9.4 | Ctrl+C | Press Ctrl+C | Hint message, stays in REPL | | 9.5 | Prelude role | Set `repl_prelude: "role:coder"` in config, start REPL | Role auto-loaded, prompt changes | | 9.6 | Prelude session | Set `repl_prelude: "mysession:coder"`, start | Session+role auto-loaded | --- ## 10. REPL — Basic Chat | # | Test | Steps | Expected | |---|---|---|---| | 10.1 | Chat message | Type `hello` | Response streamed | | 10.2 | Continue | Type `.continue` after response | Continuation generated | | 10.3 | Regenerate | Type `.regenerate` | New response generated | | 10.4 | Copy | Type `.copy` | Last response copied to clipboard | | 10.5 | Multi-line | Type `:::`, then multi-line, then `:::` | Multi-line sent as one message | | 10.6 | Empty input | Press Enter on empty line | No action | | 10.7 | Help | Type `.help` | Help text shown | | 10.8 | Info | Type `.info` | System info printed | --- ## 11. REPL — Roles | # | Test | Steps | Expected | |---|---|---|---| | 11.1 | Enter role | `.role coder` | Prompt changes, role active | | 11.2 | One-shot role | `.role coder write hello world` | Response with role, then returns to no-role | | 11.3 | Role info | `.info role` (while in role) | Role details shown | | 11.4 | Edit role | `.edit role` (while in role) | Editor opens | | 11.5 | Save role | `.save role myname` | Role saved to file | | 11.6 | Exit role | `.exit role` | Prompt resets, role cleared | | 11.7 | Create new role | `.role newname` (non-existent) | Editor opens for new role | | 11.8 | Role + MCP | `.role ` | MCP servers start with spinner, tools available | | 11.9 | Exit role + MCP | `.exit role` (from MCP role) | MCP servers stop, global MCP restored | | 11.10 | Role in session | `.session test` then `.role coder` | Role applied within session | --- ## 12. REPL — Sessions | # | Test | Steps | Expected | |---|---|---|---| | 12.1 | Temp session | `.session` | Temp session started | | 12.2 | Named session | `.session mytest` | Named session created/resumed | | 12.3 | Session info | `.info session` | Session details shown | | 12.4 | Edit session | `.edit session` | Editor opens | | 12.5 | Save session | `.save session myname` | Session saved | | 12.6 | Empty session | `.empty session` | Messages cleared | | 12.7 | Compress session | `.compress session` | Compression runs with spinner | | 12.8 | Exit session | `.exit session` | Session exited | | 12.9 | Carry-over prompt | Send message, then `.session test` | "incorporate last Q&A?" prompt | | 12.10 | Session + MCP | `.session ` | MCP servers start | | 12.11 | Already in session | `.session` while in session | Error: "Already in a session" | --- ## 13. REPL — Agents | # | Test | Steps | Expected | |---|---|---|---| | 13.1 | Start agent | `.agent coder` | Tools compiled, prompt changes, agent active | | 13.2 | Agent + session | `.agent coder mysession` | Agent with specific session | | 13.3 | Agent variables | `.agent demo key=value` | Variable set, available in tools | | 13.4 | Agent info | `.info agent` | Agent details shown | | 13.5 | Starter list | `.starter` | Conversation starters listed | | 13.6 | Starter select | `.starter 1` | Starter message sent | | 13.7 | Edit agent config | `.edit agent-config` | Editor opens | | 13.8 | Exit agent | `.exit agent` | Agent cleared, prompt resets | | 13.9 | Agent + MCP | `.agent ` | MCP servers start, tools available | | 13.10 | MCP disabled | `.agent ` with mcp_server_support=false | Error, agent blocked [IMPROVED] | | 13.11 | Tool execution | Send message that triggers tool call | Tool executes, result returned | | 13.12 | Global tools | Agent with `global_tools` configured | Global tools available alongside agent tools | | 13.13 | Tool file priority | Delete .ts, have .sh | .sh used [IMPROVED] | | 13.14 | Clear todo | `.clear todo` (in agent with auto-continue) | Todo list cleared | | 13.15 | Auto-continuation | Agent with auto_continue=true, create todos | Agent continues until todos done | | 13.16 | Already in agent | `.agent coder` while agent active | Error: "Already in an agent" | --- ## 14. REPL — Sub-Agent Spawning and Escalation | # | Test | Steps | Expected | |---|---|---|---| | 14.1 | Spawn sub-agent | Use agent with can_spawn_agents=true, trigger spawn | Sub-agent starts in background | | 14.2 | Check sub-agent | Call agent__check with agent ID | Returns PENDING or result | | 14.3 | Collect sub-agent | Call agent__collect with agent ID | Blocks until done, returns output | | 14.4 | List sub-agents | Call agent__list | Shows all spawned agents + status | | 14.5 | Cancel sub-agent | Call agent__cancel with agent ID | Agent cancelled | | 14.6 | Escalation | Sub-agent calls user__ask | Parent gets notification | | 14.7 | Reply escalation | Parent calls agent__reply_escalation | Sub-agent unblocked | | 14.8 | Max depth | Spawn beyond max_agent_depth | Error: "Max agent depth exceeded" | | 14.9 | Max concurrent | Spawn beyond max_concurrent_agents | Error: capacity reached | | 14.10 | Teammate messaging | Sub-agent sends message to sibling | Message delivered via inbox | --- ## 15. REPL — RAG | # | Test | Steps | Expected | |---|---|---|---| | 15.1 | Init RAG | `.rag ` | RAG initialized/loaded | | 15.2 | RAG info | `.info rag` | RAG details shown | | 15.3 | RAG sources | `.sources rag` (after a query) | Citation sources listed | | 15.4 | Edit RAG docs | `.edit rag-docs` | Editor opens | | 15.5 | Rebuild RAG | `.rebuild rag` | RAG rebuilt | | 15.6 | Exit RAG | `.exit rag` | RAG cleared | | 15.7 | RAG embeddings | Send query with RAG active | Embeddings included in context | --- ## 16. REPL — MCP Servers | # | Test | Steps | Expected | |---|---|---|---| | 16.1 | Global MCP start | Start REPL with `enabled_mcp_servers` configured | Servers start | | 16.2 | MCP search | LLM calls `mcp__search_` | Tools found and ranked | | 16.3 | MCP describe | LLM calls `mcp__describe_` tool_name | Schema returned | | 16.4 | MCP invoke | LLM calls `mcp__invoke_` tool args | Tool executed, result returned | | 16.5 | Change servers | `.set enabled_mcp_servers ` | Old stopped, new started | | 16.6 | Disable MCP | `.set mcp_server_support false` | MCP tools removed | | 16.7 | Enable MCP | `.set mcp_server_support true` | MCP tools restored | | 16.8 | Role MCP switch | Enter role with MCP X, exit, enter role with MCP Y | X stops, Y starts | | 16.9 | Null servers | `.set enabled_mcp_servers null` | All MCP servers stop, tools removed | --- ## 17. REPL — Settings (.set) | # | Test | Steps | Expected | |---|---|---|---| | 17.1 | Temperature | `.set temperature 0.5` | Temperature changed | | 17.2 | Top-p | `.set top_p 0.9` | Top-p changed | | 17.3 | Model | `.set model ` | Model switched | | 17.4 | Dry run | `.set dry_run true` | Dry run enabled | | 17.5 | Stream | `.set stream false` | Streaming disabled | | 17.6 | Save | `.set save false` | Auto-save disabled | | 17.7 | Highlight | `.set highlight false` | Syntax highlighting disabled | | 17.8 | Save session | `.set save_session true` | Session auto-save enabled | | 17.9 | Null value | `.set temperature null` | Temperature reset to default | | 17.10 | Compression threshold | `.set compression_threshold 2000` | Threshold changed | | 17.11 | Max output tokens | `.set max_output_tokens 4096` | Max tokens set | | 17.12 | Enabled tools | `.set enabled_tools all` | All tools enabled | | 17.13 | Function calling | `.set function_calling_support false` | Function calling disabled | --- ## 18. REPL — Tab Completion | # | Test | Steps | Expected | |---|---|---|---| | 18.1 | Role completion | `.role` | Shows role names | | 18.2 | Agent completion | `.agent` | Shows agent names (no .shared) [IMPROVED] | | 18.3 | Session completion | `.session` | Shows session names | | 18.4 | RAG completion | `.rag` | Shows RAG names | | 18.5 | Macro completion | `.macro` | Shows macro names | | 18.6 | Model completion | `.model` | Shows model names with descriptions | | 18.7 | Set keys | `.set ` | Shows all setting names | | 18.8 | Set values | `.set temperature ` | Shows current/suggested value | | 18.9 | Enabled tools | `.set enabled_tools ` | Shows tools (no user__/mcp_/todo__/agent__) [IMPROVED] | | 18.10 | MCP servers | `.set enabled_mcp_servers ` | Shows configured servers + mappings [IMPROVED] | | 18.11 | Delete types | `.delete ` | Shows: role, session, rag, macro, agent-data | | 18.12 | Vault cmds | `.vault ` | Shows: add, get, update, delete, list | --- ## 19. REPL — Delete | # | Test | Steps | Expected | |---|---|---|---| | 19.1 | Delete role | `.delete role` | Shows role picker, deletes selected | | 19.2 | Delete session | `.delete session` | Shows session picker, deletes | | 19.3 | Delete RAG | `.delete rag` | Shows RAG picker, deletes | | 19.4 | Delete macro | `.delete macro` | Shows macro picker, deletes | | 19.5 | Delete agent data | `.delete agent-data` | Shows agent picker, deletes data | --- ## 20. REPL — Vault | # | Test | Steps | Expected | |---|---|---|---| | 20.1 | Add secret | `.vault add mysecret` | Prompts for value, saves | | 20.2 | Get secret | `.vault get mysecret` | Prints decrypted value | | 20.3 | Update secret | `.vault update mysecret` | Prompts for new value | | 20.4 | Delete secret | `.vault delete mysecret` | Deletes | | 20.5 | List secrets | `.vault list` | Lists all secret names | --- ## 21. REPL — Macros and File | # | Test | Steps | Expected | |---|---|---|---| | 21.1 | Execute macro | `.macro generate-commit-message` | Macro runs | | 21.2 | Create macro | `.macro newname` (non-existent) | Editor opens | | 21.3 | File include | `.file /etc/hostname -- summarize this` | File included, query sent | | 21.4 | URL include | `.file https://example.com -- summarize` | URL fetched, content included | --- ## 22. REPL — Edit Commands | # | Test | Steps | Expected | |---|---|---|---| | 22.1 | Edit config | `.edit config` | Config file opens in editor | | 22.2 | Edit role | `.edit role` (in role) | Role file opens in editor | | 22.3 | Edit session | `.edit session` (in session) | Session file opens in editor | | 22.4 | Edit agent config | `.edit agent-config` (in agent) | Agent config opens in editor | | 22.5 | Edit RAG docs | `.edit rag-docs` (in RAG) | RAG docs opens in editor | --- ## 23. Session Compression and Autoname | # | Test | Steps | Expected | |---|---|---|---| | 23.1 | Auto-compress | Set low compression_threshold, send many messages | "Compressing the session." shown | | 23.2 | Manual compress | `.compress session` | Compression runs with spinner | | 23.3 | Auto-name | Start temp session, send messages | Session auto-named | --- ## 24. Error Handling | # | Test | Steps | Expected | |---|---|---|---| | 24.1 | Invalid role | `.role nonexistent_role_xxxxxxx` | Error shown, REPL continues | | 24.2 | Invalid model | `.set model nonexistent_model` | Error shown, REPL continues | | 24.3 | No session active | `.info session` (no session) | Error or empty | | 24.4 | No agent active | `.info agent` (no agent) | Error or empty | | 24.5 | Already in session | `.session` then `.session` again | Error: "Already in a session" | | 24.6 | Already in agent | `.agent coder` then `.agent coder` | Error: "Already in an agent" | | 24.7 | Unknown command | `.nonexistent` | Error message shown | | 24.8 | Tool failure | Trigger tool that fails | Error returned to LLM as tool result | --- ## 25. MCP Lifecycle State Transitions (Critical) These test the most bug-prone area of the migration. | # | Test | Steps | Expected | |---|---|---|---| | 25.1 | Role A→B MCP swap | Enter role with MCP-A, exit, enter role with MCP-B | A stops, B starts, B tools work | | 25.2 | Role MCP→no MCP | Enter role with MCP, exit role | MCP stops, global MCP restored | | 25.3 | No MCP→Role MCP | Start REPL (no MCP), enter role with MCP | MCP starts, tools work | | 25.4 | Agent MCP lifecycle | Start agent with MCP, use tools, exit agent | Agent MCP starts, works, stops on exit | | 25.5 | Session MCP | Start session with MCP config | MCP starts for session | | 25.6 | Global→Agent→Global | Start with global MCP-A, enter agent with MCP-B, exit agent | A→B→A transitions clean | | 25.7 | MCP mapping resolution | Role has `enabled_mcp_servers: alias`, mapping configured | Alias resolved, correct servers start | | 25.8 | MCP disabled + agent | Agent requires MCP, mcp_server_support=false | Error blocks agent start [IMPROVED] | --- ## Intentional Improvements (NEW ≠ OLD, by design) | # | What changed | Old behavior | New behavior | |---|---|---|---| | I.1 | Agent list hides `.shared` | `.shared` shown in completions | `.shared` hidden | | I.2 | Tool file priority | Filesystem order (non-deterministic) | Priority: .sh > .py > .ts > .js | | I.3 | MCP disabled + agent | Warning printed, agent starts anyway | Error, agent blocked | | I.4 | Role MCP disabled warning | Warning always shown (even if role has no MCP) | Warning only when role actually has MCP | | I.5 | Enabled tools completions | Shows internal tools (user__, mcp_, etc.) | Internal tools hidden | | I.6 | MCP server completions | Only mapping aliases | Both configured servers + aliases | --- ## Test Execution Notes - Run tests in order — some depend on state from previous tests (e.g., session tests create sessions that later tests reference) - For MCP tests, ensure at least one MCP server is configured in `~/.config/loki/functions/mcp.json` - For agent tests, use built-in agents (coder, demo, explore) - For sub-agent tests, use the sisyphus agent (has can_spawn_agents) - For RAG tests, configure a RAG with test documents - For vault tests, use temporary secret names to avoid polluting the real vault - Compare error messages between old and new — they may differ slightly in wording but should convey the same meaning