Files
loki/docs/PHASE-1-FLOW-TEST-PLAN.md
2026-04-16 10:17:03 -06:00

18 KiB

Phase 1 Flow Test Plan

Comprehensive behavioral verification plan comparing the old codebase (~/code/testing/loki on develop branch) against the new Phase 1 codebase (~/code/loki). Every test should produce identical behavior in both codebases unless noted as an intentional improvement.

How to run

For each test case:

  1. Run the test in the OLD codebase (cd ~/code/testing/loki && cargo run --)
  2. Run the same test in the NEW codebase (cd ~/code/loki && cargo run --)
  3. Compare output/behavior
  4. Mark PASS/FAIL/IMPROVED

Legend:

  • OLD: = expected behavior from old codebase
  • NEW: = expected behavior from new codebase (should match unless noted)
  • [IMPROVED] = intentional behavioral improvement in new code

1. Build Baseline

# Test Command Expected
1.1 Compile check cargo check Zero warnings, zero errors
1.2 Clippy cargo clippy Zero warnings (excluding pre-existing)
1.3 Tests cargo test All tests pass

2. CLI — Info and Listing (early-exit paths)

These should produce identical output in both codebases.

# Test Command Expected
2.1 System info loki --info Prints config paths, model, settings
2.2 List models loki --list-models Prints all available model IDs
2.3 List roles loki --list-roles Prints role names (no hidden files)
2.4 List sessions loki --list-sessions Prints session names
2.5 List agents loki --list-agents Prints agent names, no .shared [IMPROVED]
2.6 List RAGs loki --list-rags Prints RAG names
2.7 List macros loki --list-macros Prints macro names
2.8 Sync models loki --sync-models Fetches models.yaml, prints status

3. CLI — Single-shot Chat

# Test Command Expected
3.1 Basic chat loki "What is 2+2?" Response printed, exits
3.2 With role loki --role coder "hello" Role context applied
3.3 With prompt loki --prompt "you are a pirate" "hello" Temp role applied
3.4 With model loki --model <model_id> "hello" Uses specified model
3.5 With session loki -s test "hello" Session created, message saved
3.6 Resume session loki -s test "what did I say?" Session context preserved
3.7 Dry run loki --dry-run "hello" Input echoed, no API call
3.8 No stream loki --no-stream "hello" Response printed all at once
3.9 Empty session loki -s test --empty-session "hello" Session cleared, fresh start
3.10 Save session loki -s test --save-session "hello" Forces session save
3.11 Code mode loki -c "fibonacci in python" Only code output

4. CLI — File Input

# Test Command Expected
4.1 File + text loki -f /etc/hostname "summarize" File content included
4.2 File only loki -f /etc/hostname File sent as input
4.3 Multiple files loki -f /etc/hostname -f /etc/os-release "compare" Both files included
4.4 Stdin pipe echo "hello" | loki "summarize" Stdin included

5. CLI — Shell Execute

# Test Command Expected
5.1 Generate command loki -e "list files in /tmp" Shell command generated
5.2 Describe mode Press 'd' when prompted Explanation shown
5.3 Execute mode Press 'y' when prompted Command executed
5.4 Dry run loki -e --dry-run "list files" Input shown, no execution

6. CLI — Agent (non-interactive)

# Test Command Expected
6.1 Agent chat loki -a coder "write hello world in python" Agent tools available, response
6.2 Agent + session loki -a coder -s test "hello" Agent with specific session
6.3 Agent variables loki -a demo --agent-variable key val "hello" Variable injected
6.4 Agent MCP loki -a <mcp-agent> "use the server" MCP servers start, tools work
6.5 Build tools loki -a coder --build-tools Tools compiled, exits

7. CLI — Macros

# Test Command Expected
7.1 Execute macro loki --macro generate-commit-message Macro executes

8. CLI — Vault (early-exit)

# Test Command Expected
8.1 Add secret loki --add-secret test-secret Prompts for value, saves
8.2 Get secret loki --get-secret test-secret Prints decrypted value
8.3 List secrets loki --list-secrets Lists all secret names
8.4 Delete secret loki --delete-secret test-secret Deletes, confirms

9. REPL — Startup and Exit

# Test Steps Expected
9.1 Start REPL loki Welcome message shown
9.2 Exit command Type .exit Clean exit
9.3 Ctrl+D Press Ctrl+D Clean exit
9.4 Ctrl+C Press Ctrl+C Hint message, stays in REPL
9.5 Prelude role Set repl_prelude: "role:coder" in config, start REPL Role auto-loaded, prompt changes
9.6 Prelude session Set repl_prelude: "mysession:coder", start Session+role auto-loaded

10. REPL — Basic Chat

# Test Steps Expected
10.1 Chat message Type hello Response streamed
10.2 Continue Type .continue after response Continuation generated
10.3 Regenerate Type .regenerate New response generated
10.4 Copy Type .copy Last response copied to clipboard
10.5 Multi-line Type :::, then multi-line, then ::: Multi-line sent as one message
10.6 Empty input Press Enter on empty line No action
10.7 Help Type .help Help text shown
10.8 Info Type .info System info printed

11. REPL — Roles

# Test Steps Expected
11.1 Enter role .role coder Prompt changes, role active
11.2 One-shot role .role coder write hello world Response with role, then returns to no-role
11.3 Role info .info role (while in role) Role details shown
11.4 Edit role .edit role (while in role) Editor opens
11.5 Save role .save role myname Role saved to file
11.6 Exit role .exit role Prompt resets, role cleared
11.7 Create new role .role newname (non-existent) Editor opens for new role
11.8 Role + MCP .role <mcp-role> MCP servers start with spinner, tools available
11.9 Exit role + MCP .exit role (from MCP role) MCP servers stop, global MCP restored
11.10 Role in session .session test then .role coder Role applied within session

12. REPL — Sessions

# Test Steps Expected
12.1 Temp session .session Temp session started
12.2 Named session .session mytest Named session created/resumed
12.3 Session info .info session Session details shown
12.4 Edit session .edit session Editor opens
12.5 Save session .save session myname Session saved
12.6 Empty session .empty session Messages cleared
12.7 Compress session .compress session Compression runs with spinner
12.8 Exit session .exit session Session exited
12.9 Carry-over prompt Send message, then .session test "incorporate last Q&A?" prompt
12.10 Session + MCP .session <mcp-session> MCP servers start
12.11 Already in session .session while in session Error: "Already in a session"

13. REPL — Agents

# Test Steps Expected
13.1 Start agent .agent coder Tools compiled, prompt changes, agent active
13.2 Agent + session .agent coder mysession Agent with specific session
13.3 Agent variables .agent demo key=value Variable set, available in tools
13.4 Agent info .info agent Agent details shown
13.5 Starter list .starter Conversation starters listed
13.6 Starter select .starter 1 Starter message sent
13.7 Edit agent config .edit agent-config Editor opens
13.8 Exit agent .exit agent Agent cleared, prompt resets
13.9 Agent + MCP .agent <mcp-agent> MCP servers start, tools available
13.10 MCP disabled .agent <mcp-agent> with mcp_server_support=false Error, agent blocked [IMPROVED]
13.11 Tool execution Send message that triggers tool call Tool executes, result returned
13.12 Global tools Agent with global_tools configured Global tools available alongside agent tools
13.13 Tool file priority Delete .ts, have .sh .sh used [IMPROVED]
13.14 Clear todo .clear todo (in agent with auto-continue) Todo list cleared
13.15 Auto-continuation Agent with auto_continue=true, create todos Agent continues until todos done
13.16 Already in agent .agent coder while agent active Error: "Already in an agent"

14. REPL — Sub-Agent Spawning and Escalation

# Test Steps Expected
14.1 Spawn sub-agent Use agent with can_spawn_agents=true, trigger spawn Sub-agent starts in background
14.2 Check sub-agent Call agent__check with agent ID Returns PENDING or result
14.3 Collect sub-agent Call agent__collect with agent ID Blocks until done, returns output
14.4 List sub-agents Call agent__list Shows all spawned agents + status
14.5 Cancel sub-agent Call agent__cancel with agent ID Agent cancelled
14.6 Escalation Sub-agent calls user__ask Parent gets notification
14.7 Reply escalation Parent calls agent__reply_escalation Sub-agent unblocked
14.8 Max depth Spawn beyond max_agent_depth Error: "Max agent depth exceeded"
14.9 Max concurrent Spawn beyond max_concurrent_agents Error: capacity reached
14.10 Teammate messaging Sub-agent sends message to sibling Message delivered via inbox

15. REPL — RAG

# Test Steps Expected
15.1 Init RAG .rag <name> RAG initialized/loaded
15.2 RAG info .info rag RAG details shown
15.3 RAG sources .sources rag (after a query) Citation sources listed
15.4 Edit RAG docs .edit rag-docs Editor opens
15.5 Rebuild RAG .rebuild rag RAG rebuilt
15.6 Exit RAG .exit rag RAG cleared
15.7 RAG embeddings Send query with RAG active Embeddings included in context

16. REPL — MCP Servers

# Test Steps Expected
16.1 Global MCP start Start REPL with enabled_mcp_servers configured Servers start
16.2 MCP search LLM calls mcp__search_<server> Tools found and ranked
16.3 MCP describe LLM calls mcp__describe_<server> tool_name Schema returned
16.4 MCP invoke LLM calls mcp__invoke_<server> tool args Tool executed, result returned
16.5 Change servers .set enabled_mcp_servers <other> Old stopped, new started
16.6 Disable MCP .set mcp_server_support false MCP tools removed
16.7 Enable MCP .set mcp_server_support true MCP tools restored
16.8 Role MCP switch Enter role with MCP X, exit, enter role with MCP Y X stops, Y starts
16.9 Null servers .set enabled_mcp_servers null All MCP servers stop, tools removed

17. REPL — Settings (.set)

# Test Steps Expected
17.1 Temperature .set temperature 0.5 Temperature changed
17.2 Top-p .set top_p 0.9 Top-p changed
17.3 Model .set model <name> Model switched
17.4 Dry run .set dry_run true Dry run enabled
17.5 Stream .set stream false Streaming disabled
17.6 Save .set save false Auto-save disabled
17.7 Highlight .set highlight false Syntax highlighting disabled
17.8 Save session .set save_session true Session auto-save enabled
17.9 Null value .set temperature null Temperature reset to default
17.10 Compression threshold .set compression_threshold 2000 Threshold changed
17.11 Max output tokens .set max_output_tokens 4096 Max tokens set
17.12 Enabled tools .set enabled_tools all All tools enabled
17.13 Function calling .set function_calling_support false Function calling disabled

18. REPL — Tab Completion

# Test Steps Expected
18.1 Role completion .role<TAB> Shows role names
18.2 Agent completion .agent<TAB> Shows agent names (no .shared) [IMPROVED]
18.3 Session completion .session<TAB> Shows session names
18.4 RAG completion .rag<TAB> Shows RAG names
18.5 Macro completion .macro<TAB> Shows macro names
18.6 Model completion .model<TAB> Shows model names with descriptions
18.7 Set keys .set <TAB> Shows all setting names
18.8 Set values .set temperature <TAB> Shows current/suggested value
18.9 Enabled tools .set enabled_tools <TAB> Shows tools (no user__/mcp_/todo__/agent__) [IMPROVED]
18.10 MCP servers .set enabled_mcp_servers <TAB> Shows configured servers + mappings [IMPROVED]
18.11 Delete types .delete <TAB> Shows: role, session, rag, macro, agent-data
18.12 Vault cmds .vault <TAB> Shows: add, get, update, delete, list

19. REPL — Delete

# Test Steps Expected
19.1 Delete role .delete role Shows role picker, deletes selected
19.2 Delete session .delete session Shows session picker, deletes
19.3 Delete RAG .delete rag Shows RAG picker, deletes
19.4 Delete macro .delete macro Shows macro picker, deletes
19.5 Delete agent data .delete agent-data Shows agent picker, deletes data

20. REPL — Vault

# Test Steps Expected
20.1 Add secret .vault add mysecret Prompts for value, saves
20.2 Get secret .vault get mysecret Prints decrypted value
20.3 Update secret .vault update mysecret Prompts for new value
20.4 Delete secret .vault delete mysecret Deletes
20.5 List secrets .vault list Lists all secret names

21. REPL — Macros and File

# Test Steps Expected
21.1 Execute macro .macro generate-commit-message Macro runs
21.2 Create macro .macro newname (non-existent) Editor opens
21.3 File include .file /etc/hostname -- summarize this File included, query sent
21.4 URL include .file https://example.com -- summarize URL fetched, content included

22. REPL — Edit Commands

# Test Steps Expected
22.1 Edit config .edit config Config file opens in editor
22.2 Edit role .edit role (in role) Role file opens in editor
22.3 Edit session .edit session (in session) Session file opens in editor
22.4 Edit agent config .edit agent-config (in agent) Agent config opens in editor
22.5 Edit RAG docs .edit rag-docs (in RAG) RAG docs opens in editor

23. Session Compression and Autoname

# Test Steps Expected
23.1 Auto-compress Set low compression_threshold, send many messages "Compressing the session." shown
23.2 Manual compress .compress session Compression runs with spinner
23.3 Auto-name Start temp session, send messages Session auto-named

24. Error Handling

# Test Steps Expected
24.1 Invalid role .role nonexistent_role_xxxxxxx Error shown, REPL continues
24.2 Invalid model .set model nonexistent_model Error shown, REPL continues
24.3 No session active .info session (no session) Error or empty
24.4 No agent active .info agent (no agent) Error or empty
24.5 Already in session .session then .session again Error: "Already in a session"
24.6 Already in agent .agent coder then .agent coder Error: "Already in an agent"
24.7 Unknown command .nonexistent Error message shown
24.8 Tool failure Trigger tool that fails Error returned to LLM as tool result

25. MCP Lifecycle State Transitions (Critical)

These test the most bug-prone area of the migration.

# Test Steps Expected
25.1 Role A→B MCP swap Enter role with MCP-A, exit, enter role with MCP-B A stops, B starts, B tools work
25.2 Role MCP→no MCP Enter role with MCP, exit role MCP stops, global MCP restored
25.3 No MCP→Role MCP Start REPL (no MCP), enter role with MCP MCP starts, tools work
25.4 Agent MCP lifecycle Start agent with MCP, use tools, exit agent Agent MCP starts, works, stops on exit
25.5 Session MCP Start session with MCP config MCP starts for session
25.6 Global→Agent→Global Start with global MCP-A, enter agent with MCP-B, exit agent A→B→A transitions clean
25.7 MCP mapping resolution Role has enabled_mcp_servers: alias, mapping configured Alias resolved, correct servers start
25.8 MCP disabled + agent Agent requires MCP, mcp_server_support=false Error blocks agent start [IMPROVED]

Intentional Improvements (NEW ≠ OLD, by design)

# What changed Old behavior New behavior
I.1 Agent list hides .shared .shared shown in completions .shared hidden
I.2 Tool file priority Filesystem order (non-deterministic) Priority: .sh > .py > .ts > .js
I.3 MCP disabled + agent Warning printed, agent starts anyway Error, agent blocked
I.4 Role MCP disabled warning Warning always shown (even if role has no MCP) Warning only when role actually has MCP
I.5 Enabled tools completions Shows internal tools (user__, mcp_, etc.) Internal tools hidden
I.6 MCP server completions Only mapping aliases Both configured servers + aliases

Test Execution Notes

  • Run tests in order — some depend on state from previous tests (e.g., session tests create sessions that later tests reference)
  • For MCP tests, ensure at least one MCP server is configured in ~/.config/loki/functions/mcp.json
  • For agent tests, use built-in agents (coder, demo, explore)
  • For sub-agent tests, use the sisyphus agent (has can_spawn_agents)
  • For RAG tests, configure a RAG with test documents
  • For vault tests, use temporary secret names to avoid polluting the real vault
  • Compare error messages between old and new — they may differ slightly in wording but should convey the same meaning