Dark-Alex-17/loki

Alex Clarke d906713d7d

testing

2026-04-16 10:17:03 -06:00

Phase 1 Flow Test Plan

Comprehensive behavioral verification plan comparing the old codebase (~/code/testing/loki, on the develop branch) against the new Phase 1 codebase (~/code/loki). Every test should behave identically in both codebases unless it is marked as an intentional improvement.

How to run

For each test case:

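Run the test in the OLD codebase (cd ~/code/testing/loki && cargo run -- <args>, where <args> is everything after `loki` in the test's command)
Run the same test in the NEW codebase (cd ~/code/loki && cargo run -- <args>)
Compare the output and behavior
Mark the result PASS, FAIL, or IMPROVED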

Legend:

OLD: = expected behavior from old codebase
NEW: = expected behavior from new codebase (should match unless noted)
[IMPROVED] = intentional behavioral improvement in new code
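
For the non-interactive CLI cases, the procedure above could be scripted roughly as follows. This is a minimal sketch and not part of the Loki repository: the helper names `run_case`, `verdict`, and `compare` are hypothetical, the checkout paths are the ones given above, and it assumes that matching exit codes and stdout is an adequate stand-in for "identical behavior". Stderr is ignored because `cargo run` writes its build output there.

```python
import os
import subprocess

# Checkout locations taken from the plan above.
OLD_DIR = os.path.expanduser("~/code/testing/loki")
NEW_DIR = os.path.expanduser("~/code/loki")


def run_case(workdir, command):
    """Run one command in a checkout; return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        command, cwd=workdir, capture_output=True, text=True, check=False
    )
    return proc.returncode, proc.stdout, proc.stderr


def verdict(old, new, improved=False):
    """Map a pair of results onto the PASS/FAIL/IMPROVED marks.

    `improved` is set by the tester for cases tagged [IMPROVED]; a difference
    is only acceptable for those cases.
    """
    same = old[:2] == new[:2]  # compare exit code and stdout, ignore stderr
    if same:
        return "PASS"
    return "IMPROVED" if improved else "FAIL"


def compare(loki_args, improved=False):
    """Run `cargo run -- <loki_args>` in both checkouts and mark the result."""
    command = ["cargo", "run", "--quiet", "--", *loki_args]
    old = run_case(OLD_DIR, command)
    new = run_case(NEW_DIR, command)
    return verdict(old, new, improved)
```

A call such as `compare(["--list-roles"])` would cover case 2.3. An IMPROVED mark from this sketch only says that the outputs differ on a case where a difference is expected; confirming that it is the intended difference remains a manual step, as do all REPL and LLM-response cases.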

1. Build Baseline

#	Test	Command	Expected
1.1	Compile check	`cargo check`	Zero warnings, zero errors
1.2	Clippy	`cargo clippy`	Zero new warnings (pre-existing warnings excluded)
1.3	Tests	`cargo test`	All tests pass

2. CLI — Info and Listing (early-exit paths)

These should produce identical output in both codebases, except where a row is marked [IMPROVED].

#	Test	Command	Expected
2.1	System info	`loki --info`	Prints config paths, model, settings
2.2	List models	`loki --list-models`	Prints all available model IDs
2.3	List roles	`loki --list-roles`	Prints role names (no hidden files)
2.4	List sessions	`loki --list-sessions`	Prints session names
2.5	List agents	`loki --list-agents`	Prints agent names, no `.shared` [IMPROVED]
2.6	List RAGs	`loki --list-rags`	Prints RAG names
2.7	List macros	`loki --list-macros`	Prints macro names
2.8	Sync models	`loki --sync-models`	Fetches models.yaml, prints status

3. CLI — Single-shot Chat

#	Test	Command	Expected
3.1	Basic chat	`loki "What is 2+2?"`	Response printed, exits
3.2	With role	`loki --role coder "hello"`	Role context applied
3.3	With prompt	`loki --prompt "you are a pirate" "hello"`	Temp role applied
3.4	With model	`loki --model <model_id> "hello"`	Uses specified model
3.5	With session	`loki -s test "hello"`	Session created, message saved
3.6	Resume session	`loki -s test "what did I say?"`	Session context preserved
3.7	Dry run	`loki --dry-run "hello"`	Input echoed, no API call
3.8	No stream	`loki --no-stream "hello"`	Response printed all at once
3.9	Empty session	`loki -s test --empty-session "hello"`	Session cleared, fresh start
3.10	Save session	`loki -s test --save-session "hello"`	Forces session save
3.11	Code mode	`loki -c "fibonacci in python"`	Only code output

4. CLI — File Input

#	Test	Command	Expected
4.1	File + text	`loki -f /etc/hostname "summarize"`	File content included
4.2	File only	`loki -f /etc/hostname`	File sent as input
4.3	Multiple files	`loki -f /etc/hostname -f /etc/os-release "compare"`	Both files included
4.4	Stdin pipe	`echo "hello" | loki "summarize"`	Stdin included

5. CLI — Shell Execute

#	Test	Command	Expected
5.1	Generate command	`loki -e "list files in /tmp"`	Shell command generated
5.2	Describe mode	Press 'd' when prompted	Explanation shown
5.3	Execute mode	Press 'y' when prompted	Command executed
5.4	Dry run	`loki -e --dry-run "list files"`	Input shown, no execution

6. CLI — Agent (non-interactive)

#	Test	Command	Expected
6.1	Agent chat	`loki -a coder "write hello world in python"`	Agent tools available, response
6.2	Agent + session	`loki -a coder -s test "hello"`	Agent with specific session
6.3	Agent variables	`loki -a demo --agent-variable key val "hello"`	Variable injected
6.4	Agent MCP	`loki -a <mcp-agent> "use the server"`	MCP servers start, tools work
6.5	Build tools	`loki -a coder --build-tools`	Tools compiled, exits

7. CLI — Macros

#	Test	Command	Expected
7.1	Execute macro	`loki --macro generate-commit-message`	Macro executes

8. CLI — Vault (early-exit)

#	Test	Command	Expected
8.1	Add secret	`loki --add-secret test-secret`	Prompts for value, saves
8.2	Get secret	`loki --get-secret test-secret`	Prints decrypted value
8.3	List secrets	`loki --list-secrets`	Lists all secret names
8.4	Delete secret	`loki --delete-secret test-secret`	Deletes, confirms

9. REPL — Startup and Exit

#	Test	Steps	Expected
9.1	Start REPL	`loki`	Welcome message shown
9.2	Exit command	Type `.exit`	Clean exit
9.3	Ctrl+D	Press Ctrl+D	Clean exit
9.4	Ctrl+C	Press Ctrl+C	Hint message, stays in REPL
9.5	Prelude role	Set `repl_prelude: "role:coder"` in config, start REPL	Role auto-loaded, prompt changes
9.6	Prelude session	Set `repl_prelude: "mysession:coder"`, start	Session+role auto-loaded

10. REPL — Basic Chat

#	Test	Steps	Expected
10.1	Chat message	Type `hello`	Response streamed
10.2	Continue	Type `.continue` after response	Continuation generated
10.3	Regenerate	Type `.regenerate`	New response generated
10.4	Copy	Type `.copy`	Last response copied to clipboard
10.5	Multi-line	Type `:::`, then multi-line, then `:::`	Multi-line sent as one message
10.6	Empty input	Press Enter on empty line	No action
10.7	Help	Type `.help`	Help text shown
10.8	Info	Type `.info`	System info printed

11. REPL — Roles

#	Test	Steps	Expected
11.1	Enter role	`.role coder`	Prompt changes, role active
11.2	One-shot role	`.role coder write hello world`	Response with role, then returns to no-role
11.3	Role info	`.info role` (while in role)	Role details shown
11.4	Edit role	`.edit role` (while in role)	Editor opens
11.5	Save role	`.save role myname`	Role saved to file
11.6	Exit role	`.exit role`	Prompt resets, role cleared
11.7	Create new role	`.role newname` (non-existent)	Editor opens for new role
11.8	Role + MCP	`.role <mcp-role>`	MCP servers start with spinner, tools available
11.9	Exit role + MCP	`.exit role` (from MCP role)	MCP servers stop, global MCP restored
11.10	Role in session	`.session test` then `.role coder`	Role applied within session

12. REPL — Sessions

#	Test	Steps	Expected
12.1	Temp session	`.session`	Temp session started
12.2	Named session	`.session mytest`	Named session created/resumed
12.3	Session info	`.info session`	Session details shown
12.4	Edit session	`.edit session`	Editor opens
12.5	Save session	`.save session myname`	Session saved
12.6	Empty session	`.empty session`	Messages cleared
12.7	Compress session	`.compress session`	Compression runs with spinner
12.8	Exit session	`.exit session`	Session exited
12.9	Carry-over prompt	Send message, then `.session test`	"incorporate last Q&A?" prompt
12.10	Session + MCP	`.session <mcp-session>`	MCP servers start
12.11	Already in session	`.session` while in session	Error: "Already in a session"

13. REPL — Agents

#	Test	Steps	Expected
13.1	Start agent	`.agent coder`	Tools compiled, prompt changes, agent active
13.2	Agent + session	`.agent coder mysession`	Agent with specific session
13.3	Agent variables	`.agent demo key=value`	Variable set, available in tools
13.4	Agent info	`.info agent`	Agent details shown
13.5	Starter list	`.starter`	Conversation starters listed
13.6	Starter select	`.starter 1`	Starter message sent
13.7	Edit agent config	`.edit agent-config`	Editor opens
13.8	Exit agent	`.exit agent`	Agent cleared, prompt resets
13.9	Agent + MCP	`.agent <mcp-agent>`	MCP servers start, tools available
13.10	MCP disabled	`.agent <mcp-agent>` with mcp_server_support=false	Error, agent blocked [IMPROVED]
13.11	Tool execution	Send message that triggers tool call	Tool executes, result returned
13.12	Global tools	Agent with `global_tools` configured	Global tools available alongside agent tools
13.13	Tool file priority	Delete the tool's .ts file, leaving its .sh file	.sh file used [IMPROVED]
13.14	Clear todo	`.clear todo` (in agent with auto-continue)	Todo list cleared
13.15	Auto-continuation	Agent with auto_continue=true, create todos	Agent continues until todos done
13.16	Already in agent	`.agent coder` while agent active	Error: "Already in an agent"

14. REPL — Sub-Agent Spawning and Escalation

#	Test	Steps	Expected
14.1	Spawn sub-agent	Use agent with can_spawn_agents=true, trigger spawn	Sub-agent starts in background
14.2	Check sub-agent	Call agent__check with agent ID	Returns PENDING or result
14.3	Collect sub-agent	Call agent__collect with agent ID	Blocks until done, returns output
14.4	List sub-agents	Call agent__list	Shows all spawned agents + status
14.5	Cancel sub-agent	Call agent__cancel with agent ID	Agent cancelled
14.6	Escalation	Sub-agent calls user__ask	Parent gets notification
14.7	Reply escalation	Parent calls agent__reply_escalation	Sub-agent unblocked
14.8	Max depth	Spawn beyond max_agent_depth	Error: "Max agent depth exceeded"
14.9	Max concurrent	Spawn beyond max_concurrent_agents	Error: capacity reached
14.10	Teammate messaging	Sub-agent sends message to sibling	Message delivered via inbox

15. REPL — RAG

#	Test	Steps	Expected
15.1	Init RAG	`.rag <name>`	RAG initialized/loaded
15.2	RAG info	`.info rag`	RAG details shown
15.3	RAG sources	`.sources rag` (after a query)	Citation sources listed
15.4	Edit RAG docs	`.edit rag-docs`	Editor opens
15.5	Rebuild RAG	`.rebuild rag`	RAG rebuilt
15.6	Exit RAG	`.exit rag`	RAG cleared
15.7	RAG embeddings	Send query with RAG active	Embeddings included in context

16. REPL — MCP Servers

#	Test	Steps	Expected
16.1	Global MCP start	Start REPL with `enabled_mcp_servers` configured	Servers start
16.2	MCP search	LLM calls `mcp__search_<server>`	Tools found and ranked
16.3	MCP describe	LLM calls `mcp__describe_<server>` tool_name	Schema returned
16.4	MCP invoke	LLM calls `mcp__invoke_<server>` tool args	Tool executed, result returned
16.5	Change servers	`.set enabled_mcp_servers <other>`	Old stopped, new started
16.6	Disable MCP	`.set mcp_server_support false`	MCP tools removed
16.7	Enable MCP	`.set mcp_server_support true`	MCP tools restored
16.8	Role MCP switch	Enter role with MCP X, exit, enter role with MCP Y	X stops, Y starts
16.9	Null servers	`.set enabled_mcp_servers null`	All MCP servers stop, tools removed

17. REPL — Settings (.set)

#	Test	Steps	Expected
17.1	Temperature	`.set temperature 0.5`	Temperature changed
17.2	Top-p	`.set top_p 0.9`	Top-p changed
17.3	Model	`.set model <name>`	Model switched
17.4	Dry run	`.set dry_run true`	Dry run enabled
17.5	Stream	`.set stream false`	Streaming disabled
17.6	Save	`.set save false`	Auto-save disabled
17.7	Highlight	`.set highlight false`	Syntax highlighting disabled
17.8	Save session	`.set save_session true`	Session auto-save enabled
17.9	Null value	`.set temperature null`	Temperature reset to default
17.10	Compression threshold	`.set compression_threshold 2000`	Threshold changed
17.11	Max output tokens	`.set max_output_tokens 4096`	Max tokens set
17.12	Enabled tools	`.set enabled_tools all`	All tools enabled
17.13	Function calling	`.set function_calling_support false`	Function calling disabled

18. REPL — Tab Completion

#	Test	Steps	Expected
18.1	Role completion	`.role<TAB>`	Shows role names
18.2	Agent completion	`.agent<TAB>`	Shows agent names (no .shared) [IMPROVED]
18.3	Session completion	`.session<TAB>`	Shows session names
18.4	RAG completion	`.rag<TAB>`	Shows RAG names
18.5	Macro completion	`.macro<TAB>`	Shows macro names
18.6	Model completion	`.model<TAB>`	Shows model names with descriptions
18.7	Set keys	`.set <TAB>`	Shows all setting names
18.8	Set values	`.set temperature <TAB>`	Shows current/suggested value
18.9	Enabled tools	`.set enabled_tools <TAB>`	Shows tools (no user__/mcp_/todo__/agent__) [IMPROVED]
18.10	MCP servers	`.set enabled_mcp_servers <TAB>`	Shows configured servers + mappings [IMPROVED]
18.11	Delete types	`.delete <TAB>`	Shows: role, session, rag, macro, agent-data
18.12	Vault cmds	`.vault <TAB>`	Shows: add, get, update, delete, list
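
To make the expectation in 18.9 concrete, the filter a tester should see applied can be written down as below. This is a sketch of the expected result, not Loki's completion code: the function name is hypothetical, the prefixes are copied from the 18.9 row as written, and `fs_cat`, `web_search`, and `todo__add` in the usage note are made-up tool names.

```python
# Prefixes named in test 18.9 as hidden from `.set enabled_tools <TAB>`.
INTERNAL_PREFIXES = ("user__", "mcp_", "todo__", "agent__")


def visible_tool_completions(tool_names):
    """Return the tool names a tester should expect to be offered."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return [name for name in tool_names if not name.startswith(INTERNAL_PREFIXES)]
```

Given `["fs_cat", "user__ask", "mcp__search_x", "todo__add", "agent__list", "web_search"]`, only `fs_cat` and `web_search` would remain in the expected completion list.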

19. REPL — Delete

#	Test	Steps	Expected
19.1	Delete role	`.delete role`	Shows role picker, deletes selected
19.2	Delete session	`.delete session`	Shows session picker, deletes
19.3	Delete RAG	`.delete rag`	Shows RAG picker, deletes
19.4	Delete macro	`.delete macro`	Shows macro picker, deletes
19.5	Delete agent data	`.delete agent-data`	Shows agent picker, deletes data

20. REPL — Vault

#	Test	Steps	Expected
20.1	Add secret	`.vault add mysecret`	Prompts for value, saves
20.2	Get secret	`.vault get mysecret`	Prints decrypted value
20.3	Update secret	`.vault update mysecret`	Prompts for new value
20.4	Delete secret	`.vault delete mysecret`	Deletes
20.5	List secrets	`.vault list`	Lists all secret names

21. REPL — Macros and File

#	Test	Steps	Expected
21.1	Execute macro	`.macro generate-commit-message`	Macro runs
21.2	Create macro	`.macro newname` (non-existent)	Editor opens
21.3	File include	`.file /etc/hostname -- summarize this`	File included, query sent
21.4	URL include	`.file https://example.com -- summarize`	URL fetched, content included

22. REPL — Edit Commands

#	Test	Steps	Expected
22.1	Edit config	`.edit config`	Config file opens in editor
22.2	Edit role	`.edit role` (in role)	Role file opens in editor
22.3	Edit session	`.edit session` (in session)	Session file opens in editor
22.4	Edit agent config	`.edit agent-config` (in agent)	Agent config opens in editor
22.5	Edit RAG docs	`.edit rag-docs` (in RAG)	RAG docs opens in editor

23. Session Compression and Autoname

#	Test	Steps	Expected
23.1	Auto-compress	Set low compression_threshold, send many messages	"Compressing the session." shown
23.2	Manual compress	`.compress session`	Compression runs with spinner
23.3	Auto-name	Start temp session, send messages	Session auto-named

24. Error Handling

#	Test	Steps	Expected
24.1	Invalid role	`.role nonexistent_role_xxxxxxx`	Error shown, REPL continues
24.2	Invalid model	`.set model nonexistent_model`	Error shown, REPL continues
24.3	No session active	`.info session` (no session)	Error message or empty output
24.4	No agent active	`.info agent` (no agent)	Error message or empty output
24.5	Already in session	`.session` then `.session` again	Error: "Already in a session"
24.6	Already in agent	`.agent coder` then `.agent coder`	Error: "Already in an agent"
24.7	Unknown command	`.nonexistent`	Error message shown
24.8	Tool failure	Trigger tool that fails	Error returned to LLM as tool result

25. MCP Lifecycle State Transitions (Critical)

These tests cover the most bug-prone area of the migration.

#	Test	Steps	Expected
25.1	Role A→B MCP swap	Enter role with MCP-A, exit, enter role with MCP-B	A stops, B starts, B tools work
25.2	Role MCP→no MCP	Enter role with MCP, exit role	MCP stops, global MCP restored
25.3	No MCP→Role MCP	Start REPL (no MCP), enter role with MCP	MCP starts, tools work
25.4	Agent MCP lifecycle	Start agent with MCP, use tools, exit agent	Agent MCP starts, works, stops on exit
25.5	Session MCP	Start session with MCP config	MCP starts for session
25.6	Global→Agent→Global	Start with global MCP-A, enter agent with MCP-B, exit agent	A→B→A transitions clean
25.7	MCP mapping resolution	Role has `enabled_mcp_servers: alias`, mapping configured	Alias resolved, correct servers start
25.8	MCP disabled + agent	Agent requires MCP, mcp_server_support=false	Error blocks agent start [IMPROVED]
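
When recording results for these transitions, it can help to write down beforehand which servers should stop and which should start at each step. The sketch below models that expectation as a plain set difference. It is a tester's aid under stated assumptions, not a description of Loki's implementation: the function name is hypothetical, `MCP-A` and `MCP-B` are the placeholder names from the table, and the plan does not say whether a server shared by both scopes is kept running or restarted.

```python
def mcp_transition(active, target):
    """Return (to_stop, to_start) when moving from `active` to `target` servers."""
    active, target = set(active), set(target)
    # Servers present in both sets appear in neither list (assumed unchanged).
    return sorted(active - target), sorted(target - active)


# Case 25.6: global MCP-A -> agent MCP-B -> back to global MCP-A.
steps = [({"MCP-A"}, {"MCP-B"}), ({"MCP-B"}, {"MCP-A"})]
expected = [mcp_transition(active, target) for active, target in steps]
```

Here `expected` lists, for each step, the servers to watch stopping and then the servers to watch starting, which can be checked against what the REPL reports.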

Intentional Improvements (NEW ≠ OLD, by design)

#	What changed	Old behavior	New behavior
I.1	Agent list hides `.shared`	`.shared` shown in completions	`.shared` hidden
I.2	Tool file priority	Filesystem order (non-deterministic)	Priority: .sh > .py > .ts > .js
I.3	MCP disabled + agent	Warning printed, agent starts anyway	Error, agent blocked
I.4	Role MCP disabled warning	Warning always shown (even if role has no MCP)	Warning only when role actually has MCP
I.5	Enabled tools completions	Shows internal tools (user__, mcp_, etc.)	Internal tools hidden
I.6	MCP server completions	Only mapping aliases	Both configured servers + aliases

Test Execution Notes

Run tests in order: some depend on state from previous tests (e.g., session tests create sessions that later tests reference).
For MCP tests, ensure at least one MCP server is configured in ~/.config/loki/functions/mcp.json.
For agent tests, use built-in agents (coder, demo, explore).
For sub-agent tests, use the sisyphus agent (has can_spawn_agents).
For RAG tests, configure a RAG with test documents.
For vault tests, use temporary secret names to avoid polluting the real vault.
Compare error messages between old and new: they may differ slightly in wording but should convey the same meaning.
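
Since error wording may legitimately differ, one option is to flag only clearly dissimilar message pairs for a closer look. The sketch below uses Python's standard `difflib` for that. It is an illustrative aid only: the function name and the 0.6 threshold are arbitrary assumptions, and textual similarity cannot establish that two messages convey the same meaning, so it narrows down what to read rather than replacing the manual comparison.

```python
from difflib import SequenceMatcher


def needs_review(old_message, new_message, threshold=0.6):
    """Return True when two error messages differ enough to warrant a manual look."""

    def normalize(text):
        # Ignore case and whitespace differences before comparing.
        return " ".join(text.lower().split())

    ratio = SequenceMatcher(
        None, normalize(old_message), normalize(new_message)
    ).ratio()
    return ratio < threshold
```

Identical messages, or messages that differ only in case and spacing, are never flagged; a message that disappears entirely in the new codebase always is.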
