68 Commits

Author SHA1 Message Date
e9e6b82e24 testing 2026-04-10 15:45:51 -06:00
ff3419a714 Merge branch 'tree-sitter-tools' into 'develop' 2026-04-09 14:48:22 -06:00
a5899da4fb feat: Automatic runtime customization using shebangs 2026-04-09 14:16:02 -06:00
dedcef8ac5 test: Updated client stream tests to use the thread_rng from rand 2026-04-09 13:53:52 -06:00
d658f1d2fe build: Pulled additional features for rand dependency 2026-04-09 13:45:08 -06:00
6b4a45874f fix: TypeScript function args were being passed as objects rather than direct parameters 2026-04-09 13:32:16 -06:00
7839e1dbd9 build: upgraded dependencies to latest 2026-04-09 13:28:19 -06:00
78c3932f36 docs: Updated docs to talk about the new TypeScript-based tool support 2026-04-09 13:19:15 -06:00
11334149b0 feat: Created a demo TypeScript tool and a get_current_weather function in TypeScript 2026-04-09 13:18:41 -06:00
4caa035528 feat: Updated the Python demo tool to show all possible parameter types and variations 2026-04-09 13:18:18 -06:00
f30e81af08 fix: Added in forgotten wrapper scripts for TypeScript tools 2026-04-09 13:17:53 -06:00
4c75655f58 feat: Added TypeScript tool support using the refactored common ScriptedLanguage trait 2026-04-09 13:17:28 -06:00
f865892c28 refactor: Extracted common Python parser logic into a common.rs module 2026-04-09 13:16:35 -06:00
ebeb9c9b7d refactor: python tools now use tree-sitter queries instead of AST 2026-04-09 10:20:49 -06:00
ab2b927fcb fix: don't shadow variables in binary path handling for Windows 2026-04-09 07:53:18 -06:00
7e5ff2ba1f build: Upgraded crossterm and reedline dependencies 2026-04-08 14:54:53 -06:00
ed59051f3d fix: Tool call improvements for Windows systems 2026-04-08 12:49:43 -06:00
github-actions[bot]
e98bf56a2b chore: bump Cargo.toml to 0.3.0 2026-04-02 20:17:47 +00:00
github-actions[bot]
fb510b1a4f bump: version 0.2.0 → 0.3.0 [skip ci] 2026-04-02 20:17:45 +00:00
6c17462040 feat: Added todo__clear function to the todo system and updated REPL commands to have a .clear todo as well for significant changes in agent direction
CI / All (ubuntu-latest) (push) Failing after 24s
CI / All (macos-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-04-02 13:13:44 -06:00
1536cf384c fix: Clarified user text input interaction
CI / All (ubuntu-latest) (push) Failing after 23s
CI / All (macos-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-30 16:27:22 -06:00
d6842d7e29 fix: recursion bug with similarly named Bash search functions in the explore agent
CI / All (ubuntu-latest) (push) Failing after 24s
CI / All (macos-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-30 13:32:13 -06:00
fbc0acda2a feat: Added available tools to prompts for sisyphus and code-reviewer agent families 2026-03-30 13:13:30 -06:00
0327d041b6 feat: Added available tools to coder prompt 2026-03-30 11:11:43 -06:00
6a01fd4fbd Merge branch 'main' of github.com:Dark-Alex-17/loki 2026-03-30 10:15:51 -06:00
d822180205 fix: updated the error for unauthenticated oauth to include the REPL .authenticated command 2026-03-28 11:57:01 -06:00
89d0fdce26 feat: Improved token efficiency when delegating from sisyphus -> coder 2026-03-18 15:07:29 -06:00
b3ecdce979 build: Removed deprecated agent functions from the .shared/utils.sh script 2026-03-18 15:04:14 -06:00
3873821a31 fix: Corrected a bug in the coder agent that wasn't outputting a summary of the changes made, so the parent Sisyphus agent has no idea if the agent worked or not
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-17 14:57:07 -06:00
9c2801b643 feat: modified sisyphus agents to use the new ddg-search MCP server for web searches instead of built-in model searches 2026-03-17 14:55:33 -06:00
d78820dcd4 fix: Claude code system prompt injected into claude requests to make them valid once again
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-17 10:44:50 -06:00
d43c4232a2 fix: Do not inject tools when models don't support them; detect this conflict before API calls happen 2026-03-17 09:35:51 -06:00
f41c85b703 style: Applied formatting across new inquire files
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-16 12:39:20 -06:00
9e056bdcf0 feat: Added support for specifying a custom response to multiple-choice prompts when nothing suits the user's needs 2026-03-16 12:37:47 -06:00
d6022b9f98 feat: Supported theming in the inquire prompts in the REPL 2026-03-16 12:36:20 -06:00
6fc1abf94a build: upgraded to the most recent version of the inquire crate 2026-03-16 12:31:28 -06:00
92ea0f624e docs: Fixed a spacing issue in the example agent configuration
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-13 14:19:39 -06:00
c3fd8fbc1c docs: Added the file-reviewer agent to the AGENTS docs
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-13 14:07:13 -06:00
7fd3f7761c docs: Updated the MCP-SERVERS docs to mention the ddg-search MCP server
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-13 13:32:58 -06:00
05e19098b2 feat: Added the duckduckgo-search MCP server for searching the web (in addition to the built-in tools for web searches)
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-13 13:29:56 -06:00
60067ae757 Merge branch 'main' of github.com:Dark-Alex-17/loki
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-12 15:17:54 -06:00
c72003b0b6 fix: Implemented the path normalization fix for the oracle and explore agents 2026-03-12 13:38:15 -06:00
7c9d500116 chore: Added GPT-5.2 to models.yaml 2026-03-12 13:30:23 -06:00
6b2c87b562 docs: Updated the docs to now explicitly mention Gemini OAuth support 2026-03-12 13:30:10 -06:00
b2dbdfb4b1 feat: Support for Gemini OAuth 2026-03-12 13:29:47 -06:00
063e198f96 refactor: Made the oauth module more generic so it can support loopback OAuth (not just manual) 2026-03-12 13:28:09 -06:00
73cbe16ec1 fix: Updated the atlassian MCP server endpoint to account for future deprecation 2026-03-12 12:49:26 -06:00
bdea854a9f fix: Fixed a bug in the coder agent that was causing the agent to create absolute paths from the current directory 2026-03-12 12:39:49 -06:00
9b4c800597 fix: The REPL .authenticate command works from within sessions, agents, and roles with pre-configured models
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-12 09:08:17 -06:00
eb4d1c02f4 feat: Support authenticating or refreshing OAuth for supported clients from within the REPL
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-11 13:07:27 -06:00
c428990900 fix: the updated regex for secrets injection broke MCP server secrets interpolation because the regex greedily matched on new lines, replacing too much content. This fix just ignores commented out lines in YAML files by skipping commented out lines. 2026-03-11 12:55:28 -06:00
03b9cc70b9 feat: Allow first-runs to select OAuth for supported providers 2026-03-11 12:01:17 -06:00
3fa0eb832c fix: Don't try to inject secrets into commented-out lines in the config 2026-03-11 11:11:09 -06:00
83f66e1061 feat: Support OAuth authentication flows for Claude 2026-03-11 11:10:48 -06:00
741b9c364c chore: Added support for Claude 4.6 gen models
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-10 14:55:30 -06:00
b6f6f456db fix: Removed top_p parameter from some agents so they can work across model providers
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-10 10:18:38 -06:00
00a6cf74d7 Merge branch 'main' of github.com:Dark-Alex-17/loki
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-09 14:58:23 -06:00
d35ca352ca chore: Added the new gemini-3.1-pro-preview model to gemini and vertex models
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-03-09 14:57:39 -06:00
57dc1cb252 docs: created an authorship policy and PR template that requires disclosure of AI assistance in contributions 2026-02-24 17:46:07 -07:00
101a9cdd6e style: Applied formatting to MCP module
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-02-20 15:28:21 -07:00
c5f52e1efb docs: Updated sisyphus README to always include the execute_command.sh tool
CI / All (macos-latest) (push) Has been cancelled
CI / All (ubuntu-latest) (push) Has been cancelled
CI / All (windows-latest) (push) Has been cancelled
2026-02-20 15:06:57 -07:00
470149b606 docs: Updated the sisyphus system docs to have a pro-tip of configuring an IDE MCP server to improve performance 2026-02-20 15:01:08 -07:00
02062c5a50 docs: Created README docs for the CodeRabbit-style Code reviewer agents 2026-02-20 15:00:32 -07:00
e6e99b6926 feat: Improved MCP server spinup and spindown when switching contexts or settings in the REPL: Modify existing config rather than stopping all servers always and re-initializing if unnecessary 2026-02-20 14:36:34 -07:00
15a293204f fix: Improved sub-agent stdout and stderr output for users to follow 2026-02-20 13:47:28 -07:00
ecf3780aed Update models.yaml with latest OpenRouter data 2026-02-20 12:08:00 -07:00
e798747135 Add script to update models.yaml from OpenRouter 2026-02-20 12:07:59 -07:00
60493728a0 fix: Inject agent variables into environment variables for global tool calls when invoked from agents to modify global tool behavior 2026-02-20 11:38:24 -07:00
113 changed files with 17037 additions and 2209 deletions
@@ -0,0 +1,11 @@
### AI assistance (if any):
- List tools here and files touched by them
### Authorship & Understanding
- [ ] I wrote or heavily modified this code myself
- [ ] I understand how it works end-to-end
- [ ] I can maintain this code in the future
- [ ] No undisclosed AI-generated code was used
- [ ] If AI assistance was used, it is documented below
+71
View File
@@ -1,3 +1,74 @@
## v0.3.0 (2026-04-02)
### Feat
- Added `todo__clear` function to the todo system and updated REPL commands to have a .clear todo as well for significant changes in agent direction
- Added available tools to prompts for sisyphus and code-reviewer agent families
- Added available tools to coder prompt
- Improved token efficiency when delegating from sisyphus -> coder
- modified sisyphus agents to use the new ddg-search MCP server for web searches instead of built-in model searches
- Added support for specifying a custom response to multiple-choice prompts when nothing suits the user's needs
- Supported theming in the inquire prompts in the REPL
- Added the duckduckgo-search MCP server for searching the web (in addition to the built-in tools for web searches)
- Support for Gemini OAuth
- Support authenticating or refreshing OAuth for supported clients from within the REPL
- Allow first-runs to select OAuth for supported providers
- Support OAuth authentication flows for Claude
- Improved MCP server spinup and spindown when switching contexts or settings in the REPL: Modify existing config rather than stopping all servers always and re-initializing if unnecessary
- Allow the explore agent to run search queries for understanding docs or API specs
- Allow the oracle to perform web searches for deeper research
- Added web search support to the main sisyphus agent to answer user queries
- Created a CodeRabbit-style code-reviewer agent
- Added configuration option in agents to indicate the timeout for user input before proceeding (defaults to 5 minutes)
- Added support for sub-agents to escalate user interaction requests from any depth to the parent agents for user interactions
- built-in user interaction tools to remove the need for the list/confirm/etc prompts in prompt tools and to enhance user interactions in Loki
- Experimental update to sisyphus to use the new parallel agent spawning system
- Added an agent configuration property that allows auto-injecting sub-agent spawning instructions (when using the built-in sub-agent spawning system)
- Auto-dispatch support of sub-agents and support for the teammate pattern between subagents
- Full passive task queue integration for parallelization of subagents
- Implemented initial scaffolding for built-in sub-agent spawning tool call operations
- Initial models for agent parallelization
- Added interactive prompting between the LLM and the user in Sisyphus using the built-in Bash utils scripts
### Fix
- Clarified user text input interaction
- recursion bug with similarly named Bash search functions in the explore agent
- updated the error for unauthenticated oauth to include the REPL .authenticate command
- Corrected a bug in the coder agent that wasn't outputting a summary of the changes made, so the parent Sisyphus agent has no idea if the agent worked or not
- Claude code system prompt injected into claude requests to make them valid once again
- Do not inject tools when models don't support them; detect this conflict before API calls happen
- The REPL .authenticate command works from within sessions, agents, and roles with pre-configured models
- Implemented the path normalization fix for the oracle and explore agents
- Updated the atlassian MCP server endpoint to account for future deprecation
- Fixed a bug in the coder agent that was causing the agent to create absolute paths from the current directory
- the updated regex for secrets injection broke MCP server secrets interpolation because the regex greedily matched on new lines, replacing too much content. This fix just ignores commented out lines in YAML files by skipping commented out lines.
- Don't try to inject secrets into commented-out lines in the config
- Removed top_p parameter from some agents so they can work across model providers
- Improved sub-agent stdout and stderr output for users to follow
- Inject agent variables into environment variables for global tool calls when invoked from agents to modify global tool behavior
- Removed the unnecessary execute_commands tool from the oracle agent
- Added auto_confirm to the coder agent so sub-agent spawning doesn't freeze
- Fixed a bug in the new supervisor and todo built-ins that was causing errors with OpenAI models
- Added condition to sisyphus to always output a summary to clearly indicate completion
- Updated the sisyphus prompt to explicitly tell it to delegate to the coder agent when it wants to write any code at all except for trivial changes
- Added back in the auto_confirm variable into sisyphus
- Removed the now unnecessary is_stale_response that was breaking auto-continuing with parallel agents
- Bypassed enabled_tools for user interaction tools so if function calling is enabled at all, the LLM has access to the user interaction tools when in REPL mode
- When parallel agents run, only write to stdout from the parent and only display the parent's throbber
- Forgot to implement support for failing a task and keep all dependents blocked
- Clean up orphaned sub-agents when the parent agent exits
- Fixed the bash prompt utils so that they correctly show output when being run by a tool invocation
- Forgot to automatically add the bidirectional communication back up to parent agents from sub-agents (i.e. need to be able to check inbox and send messages)
- Agent delegation tools were not being passed into the {{__tools__}} placeholder so agents weren't delegating to subagents
### Refactor
- Made the oauth module more generic so it can support loopback OAuth (not just manual)
- Changed the default session name for Sisyphus to temp (to require users to explicitly name sessions they wish to save)
- Updated the sisyphus agent to use the built-in user interaction tools instead of custom bash-based tools
- Cleaned up some left-over implementation stubs
## v0.2.0 (2026-02-14) ## v0.2.0 (2026-02-14)
### Feat ### Feat
+7
View File
@@ -76,6 +76,13 @@ Then, you can run workflows locally without having to commit and see if the GitH
act -W .github/workflows/release.yml --input_type bump=minor act -W .github/workflows/release.yml --input_type bump=minor
``` ```
## Authorship Policy
All code in this repository is written and reviewed by humans. AI-generated code (e.g., Copilot, ChatGPT,
Claude, etc.) is not permitted unless explicitly disclosed and approved.
Submissions must certify that the contributor understands and can maintain the code they submit.
## Questions? Reach out to me! ## Questions? Reach out to me!
If you encounter any questions while developing Loki, please don't hesitate to reach out to me at If you encounter any questions while developing Loki, please don't hesitate to reach out to me at
alex.j.tusa@gmail.com. I'm happy to help contributors in any way I can, regardless of if they're new or experienced! alex.j.tusa@gmail.com. I'm happy to help contributors in any way I can, regardless of if they're new or experienced!
Generated
+546 -937
View File
File diff suppressed because it is too large Load Diff
+13 -8
View File
@@ -1,6 +1,6 @@
[package] [package]
name = "loki-ai" name = "loki-ai"
version = "0.2.0" version = "0.3.0"
edition = "2024" edition = "2024"
authors = ["Alex Clarke <alex.j.tusa@gmail.com>"] authors = ["Alex Clarke <alex.j.tusa@gmail.com>"]
description = "An all-in-one, batteries included LLM CLI Tool" description = "An all-in-one, batteries included LLM CLI Tool"
@@ -18,10 +18,11 @@ anyhow = "1.0.69"
bytes = "1.4.0" bytes = "1.4.0"
clap = { version = "4.5.40", features = ["cargo", "derive", "wrap_help"] } clap = { version = "4.5.40", features = ["cargo", "derive", "wrap_help"] }
dirs = "6.0.0" dirs = "6.0.0"
dunce = "1.0.5"
futures-util = "0.3.29" futures-util = "0.3.29"
inquire = "0.7.0" inquire = "0.9.4"
is-terminal = "0.4.9" is-terminal = "0.4.9"
reedline = "0.40.0" reedline = "0.46.0"
serde = { version = "1.0.152", features = ["derive"] } serde = { version = "1.0.152", features = ["derive"] }
serde_json = { version = "1.0.93", features = ["preserve_order"] } serde_json = { version = "1.0.93", features = ["preserve_order"] }
serde_yaml = "0.9.17" serde_yaml = "0.9.17"
@@ -37,7 +38,7 @@ tokio-graceful = "0.2.2"
tokio-stream = { version = "0.1.15", default-features = false, features = [ tokio-stream = { version = "0.1.15", default-features = false, features = [
"sync", "sync",
] } ] }
crossterm = "0.28.1" crossterm = "0.29.0"
chrono = "0.4.23" chrono = "0.4.23"
bincode = { version = "2.0.0", features = [ bincode = { version = "2.0.0", features = [
"serde", "serde",
@@ -90,12 +91,17 @@ strum_macros = "0.27.2"
indoc = "2.0.6" indoc = "2.0.6"
rmcp = { version = "0.16.0", features = ["client", "transport-child-process"] } rmcp = { version = "0.16.0", features = ["client", "transport-child-process"] }
num_cpus = "1.17.0" num_cpus = "1.17.0"
rustpython-parser = "0.4.0" tree-sitter = "0.26.8"
rustpython-ast = "0.4.0" tree-sitter-language = "0.1"
tree-sitter-python = "0.25.0"
tree-sitter-typescript = "0.23"
colored = "3.0.0" colored = "3.0.0"
clap_complete = { version = "4.5.58", features = ["unstable-dynamic"] } clap_complete = { version = "4.5.58", features = ["unstable-dynamic"] }
gman = "0.3.0" gman = "0.4.1"
clap_complete_nushell = "4.5.9" clap_complete_nushell = "4.5.9"
open = "5"
rand = { version = "0.10.0", features = ["default"] }
url = "2.5.8"
[dependencies.reqwest] [dependencies.reqwest]
version = "0.12.0" version = "0.12.0"
@@ -126,7 +132,6 @@ arboard = { version = "3.3.0", default-features = false }
[dev-dependencies] [dev-dependencies]
pretty_assertions = "1.4.0" pretty_assertions = "1.4.0"
rand = "0.9.0"
[[bin]] [[bin]]
name = "loki" name = "loki"
+22
View File
@@ -28,6 +28,7 @@ Coming from [AIChat](https://github.com/sigoden/aichat)? Follow the [migration g
* [Function Calling](./docs/function-calling/TOOLS.md#Tools): Leverage function calling capabilities to extend Loki's functionality with custom tools * [Function Calling](./docs/function-calling/TOOLS.md#Tools): Leverage function calling capabilities to extend Loki's functionality with custom tools
* [Creating Custom Tools](./docs/function-calling/CUSTOM-TOOLS.md): You can create your own custom tools to enhance Loki's capabilities. * [Creating Custom Tools](./docs/function-calling/CUSTOM-TOOLS.md): You can create your own custom tools to enhance Loki's capabilities.
* [Create Custom Python Tools](./docs/function-calling/CUSTOM-TOOLS.md#custom-python-based-tools) * [Create Custom Python Tools](./docs/function-calling/CUSTOM-TOOLS.md#custom-python-based-tools)
* [Create Custom TypeScript Tools](./docs/function-calling/CUSTOM-TOOLS.md#custom-typescript-based-tools)
* [Create Custom Bash Tools](./docs/function-calling/CUSTOM-BASH-TOOLS.md) * [Create Custom Bash Tools](./docs/function-calling/CUSTOM-BASH-TOOLS.md)
* [Bash Prompt Utilities](./docs/function-calling/BASH-PROMPT-HELPERS.md) * [Bash Prompt Utilities](./docs/function-calling/BASH-PROMPT-HELPERS.md)
* [First-Class MCP Server Support](./docs/function-calling/MCP-SERVERS.md): Easily connect and interact with MCP servers for advanced functionality. * [First-Class MCP Server Support](./docs/function-calling/MCP-SERVERS.md): Easily connect and interact with MCP servers for advanced functionality.
@@ -39,6 +40,7 @@ Coming from [AIChat](https://github.com/sigoden/aichat)? Follow the [migration g
* [Todo System](./docs/TODO-SYSTEM.md): Built-in task tracking for improved agent reliability with smaller models. * [Todo System](./docs/TODO-SYSTEM.md): Built-in task tracking for improved agent reliability with smaller models.
* [Environment Variables](./docs/ENVIRONMENT-VARIABLES.md): Override and customize your Loki configuration at runtime with environment variables. * [Environment Variables](./docs/ENVIRONMENT-VARIABLES.md): Override and customize your Loki configuration at runtime with environment variables.
* [Client Configurations](./docs/clients/CLIENTS.md): Configuration instructions for various LLM providers. * [Client Configurations](./docs/clients/CLIENTS.md): Configuration instructions for various LLM providers.
* [Authentication (API Key & OAuth)](./docs/clients/CLIENTS.md#authentication): Authenticate with API keys or OAuth for subscription-based access.
* [Patching API Requests](./docs/clients/PATCHES.md): Learn how to patch API requests for advanced customization. * [Patching API Requests](./docs/clients/PATCHES.md): Learn how to patch API requests for advanced customization.
* [Custom Themes](./docs/THEMES.md): Change the look and feel of Loki to your preferences with custom themes. * [Custom Themes](./docs/THEMES.md): Change the look and feel of Loki to your preferences with custom themes.
* [History](#history): A history of how Loki came to be. * [History](#history): A history of how Loki came to be.
@@ -150,6 +152,26 @@ guide you through the process when you first attempt to access the vault. So, to
loki --list-secrets loki --list-secrets
``` ```
### Authentication
Each client in your configuration needs authentication (with a few exceptions; e.g. ollama). Most clients use an API key
(set via `api_key` in the config or through the [vault](./docs/VAULT.md)). For providers that support OAuth (e.g. Claude Pro/Max
subscribers, Google Gemini), you can authenticate with your existing subscription instead:
```yaml
# In your config.yaml
clients:
- type: claude
name: my-claude-oauth
auth: oauth # Indicate you want to authenticate with OAuth instead of an API key
```
```sh
loki --authenticate my-claude-oauth
# Or via the REPL: .authenticate
```
For full details, see the [authentication documentation](./docs/clients/CLIENTS.md#authentication).
### Tab-Completions ### Tab-Completions
You can also enable tab completions to make using Loki easier. To do so, add the following to your shell profile: You can also enable tab completions to make using Loki easier. To do so, add the following to your shell profile:
```shell ```shell
+3 -131
View File
@@ -2,68 +2,6 @@
# Shared Agent Utilities - Minimal, focused helper functions # Shared Agent Utilities - Minimal, focused helper functions
set -euo pipefail set -euo pipefail
#############################
## CONTEXT FILE MANAGEMENT ##
#############################
get_context_file() {
local project_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
echo "${project_dir}/.loki-context"
}
# Initialize context file for a new task
# Usage: init_context "Task description"
init_context() {
local task="$1"
local project_dir="${LLM_AGENT_VAR_PROJECT_DIR:-.}"
local context_file
context_file=$(get_context_file)
cat > "${context_file}" <<EOF
## Project: ${project_dir}
## Task: ${task}
## Started: $(date -Iseconds)
### Prior Findings
EOF
}
# Append findings to the context file
# Usage: append_context "agent_name" "finding summary"
append_context() {
local agent="$1"
local finding="$2"
local context_file
context_file=$(get_context_file)
if [[ -f "${context_file}" ]]; then
{
echo ""
echo "[${agent}]:"
echo "${finding}"
} >> "${context_file}"
fi
}
# Read the current context (returns empty string if no context)
# Usage: context=$(read_context)
read_context() {
local context_file
context_file=$(get_context_file)
if [[ -f "${context_file}" ]]; then
cat "${context_file}"
fi
}
# Clear the context file
clear_context() {
local context_file
context_file=$(get_context_file)
rm -f "${context_file}"
}
####################### #######################
## PROJECT DETECTION ## ## PROJECT DETECTION ##
####################### #######################
@@ -279,9 +217,9 @@ _detect_with_llm() {
evidence=$(_gather_project_evidence "${dir}") evidence=$(_gather_project_evidence "${dir}")
local prompt local prompt
prompt=$(cat <<-EOF prompt=$(cat <<-EOF
Analyze this project directory and determine the project type, primary language, and the correct shell commands to build, test, and check (lint/typecheck) it. Analyze this project directory and determine the project type, primary language, and the correct shell commands to build, test, and check (lint/typecheck) it.
EOF EOF
) )
prompt+=$'\n'"${evidence}"$'\n' prompt+=$'\n'"${evidence}"$'\n'
@@ -348,77 +286,11 @@ detect_project() {
echo '{"type":"unknown","build":"","test":"","check":""}' echo '{"type":"unknown","build":"","test":"","check":""}'
} }
######################
## AGENT INVOCATION ##
######################
# Invoke a subagent with optional context injection
# Usage: invoke_agent <agent_name> <prompt> [extra_args...]
invoke_agent() {
local agent="$1"
local prompt="$2"
shift 2
local context
context=$(read_context)
local full_prompt
if [[ -n "${context}" ]]; then
full_prompt="## Orchestrator Context
The orchestrator (sisyphus) has gathered this context from prior work:
<context>
${context}
</context>
## Your Task
${prompt}"
else
full_prompt="${prompt}"
fi
env AUTO_CONFIRM=true loki --agent "${agent}" "$@" "${full_prompt}" 2>&1
}
# Invoke a subagent and capture a summary of its findings
# Usage: result=$(invoke_agent_with_summary "explore" "find auth patterns")
invoke_agent_with_summary() {
local agent="$1"
local prompt="$2"
shift 2
local output
output=$(invoke_agent "${agent}" "${prompt}" "$@")
local summary=""
if echo "${output}" | grep -q "FINDINGS:"; then
summary=$(echo "${output}" | sed -n '/FINDINGS:/,/^[A-Z_]*COMPLETE/p' | grep "^- " | sed 's/^- / - /')
elif echo "${output}" | grep -q "CODER_COMPLETE:"; then
summary=$(echo "${output}" | grep "CODER_COMPLETE:" | sed 's/CODER_COMPLETE: *//')
elif echo "${output}" | grep -q "ORACLE_COMPLETE"; then
summary=$(echo "${output}" | sed -n '/^## Recommendation/,/^## /{/^## Recommendation/d;/^## /d;p}' | sed '/^$/d' | head -10)
fi
# Failsafe: extract up to 5 meaningful lines if no markers found
if [[ -z "${summary}" ]]; then
summary=$(echo "${output}" | grep -v "^$" | grep -v "^#" | grep -v "^\-\-\-" | tail -10 | head -5)
fi
if [[ -n "${summary}" ]]; then
append_context "${agent}" "${summary}"
fi
echo "${output}"
}
########################### ###########################
## FILE SEARCH UTILITIES ## ## FILE SEARCH UTILITIES ##
########################### ###########################
search_files() { _search_files() {
local pattern="$1" local pattern="$1"
local dir="${2:-.}" local dir="${2:-.}"
+36
View File
@@ -0,0 +1,36 @@
# Code Reviewer
A CodeRabbit-style code review orchestrator that coordinates per-file reviews and synthesizes findings into a unified
report.
This agent acts as the manager for the review process, delegating actual file analysis to **[File Reviewer](../file-reviewer/README.md)**
agents while handling coordination and final reporting.
## Features
- 🤖 **Orchestration**: Spawns parallel reviewers for each changed file.
- 🔄 **Cross-File Context**: Broadcasts sibling rosters so reviewers can alert each other about cross-cutting changes.
- 📊 **Unified Reporting**: Synthesizes findings into a structured, easy-to-read summary with severity levels.
- ⚡ **Parallel Execution**: Runs reviews concurrently for maximum speed.
## Pro-Tip: Use an IDE MCP Server for Improved Performance
Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
them), and modify the agent definition to look like this:
```yaml
# ...
mcp_servers:
- jetbrains # The name of your configured IDE MCP server
global_tools:
- fs_read.sh
- fs_grep.sh
- fs_glob.sh
# - execute_command.sh
# ...
```
+3 -1
View File
@@ -2,7 +2,6 @@ name: code-reviewer
description: CodeRabbit-style code reviewer - spawns per-file reviewers, synthesizes findings description: CodeRabbit-style code reviewer - spawns per-file reviewers, synthesizes findings
version: 1.0.0 version: 1.0.0
temperature: 0.1 temperature: 0.1
top_p: 0.95
auto_continue: true auto_continue: true
max_auto_continues: 20 max_auto_continues: 20
@@ -123,3 +122,6 @@ instructions: |
- Project: {{project_dir}} - Project: {{project_dir}}
- CWD: {{__cwd__}} - CWD: {{__cwd__}}
- Shell: {{__shell__}} - Shell: {{__shell__}}
## Available Tools:
{{__tools__}}
+26 -2
View File
@@ -2,7 +2,7 @@
An AI agent that assists you with your coding tasks. An AI agent that assists you with your coding tasks.
This agent is designed to be delegated to by the **[Sisyphus](../sisyphus/README.md)** agent to implement code specifications. Sisyphus This agent is designed to be delegated to by the **[Sisyphus](../sisyphus/README.md)** agent to implement code specifications. Sisyphus
acts as the coordinator/architect, while Coder handles the implementation details. acts as the coordinator/architect, while Coder handles the implementation details.
## Features ## Features
@@ -13,4 +13,28 @@ acts as the coordinator/architect, while Coder handles the implementation detail
- 🧐 Advanced code analysis and improvement suggestions - 🧐 Advanced code analysis and improvement suggestions
- 📊 Precise diff-based file editing for controlled code modifications - 📊 Precise diff-based file editing for controlled code modifications
It can also be used as a standalone tool for direct coding assistance. It can also be used as a standalone tool for direct coding assistance.
## Pro-Tip: Use an IDE MCP Server for Improved Performance
Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
them), and modify the agent definition to look like this:
```yaml
# ...
mcp_servers:
- jetbrains # The name of your configured IDE MCP server
global_tools:
# Keep useful read-only tools for reading files in other non-project directories
- fs_read.sh
- fs_grep.sh
- fs_glob.sh
# - fs_write.sh
# - fs_patch.sh
- execute_command.sh
# ...
```
+29 -8
View File
@@ -2,7 +2,6 @@ name: coder
description: Implementation agent - writes code, follows patterns, verifies with builds description: Implementation agent - writes code, follows patterns, verifies with builds
version: 1.0.0 version: 1.0.0
temperature: 0.1 temperature: 0.1
top_p: 0.95
auto_continue: true auto_continue: true
max_auto_continues: 15 max_auto_continues: 15
@@ -30,11 +29,30 @@ instructions: |
## Your Mission ## Your Mission
Given an implementation task: Given an implementation task:
1. Understand what to build (from context provided) 1. Check for orchestrator context first (see below)
2. Study existing patterns (read 1-2 similar files) 2. Fill gaps only. Read files NOT already covered in context
3. Write the code (using tools, NOT chat output) 3. Write the code (using tools, NOT chat output)
4. Verify it compiles/builds 4. Verify it compiles/builds
5. Signal completion 5. Signal completion with a summary
## Using Orchestrator Context (IMPORTANT)
When spawned by sisyphus, your prompt will often contain a `<context>` block
with prior findings: file paths, code patterns, and conventions discovered by
explore agents.
**If context is provided:**
1. Use it as your primary reference. Don't re-read files already summarized
2. Follow the code patterns shown. Snippets in context ARE the style guide
3. Read the referenced files ONLY IF you need more detail (e.g. full function
signature, import list, or adjacent code not included in the snippet)
4. If context includes a "Conventions" section, follow it exactly
**If context is NOT provided or is too vague to act on:**
Fall back to self-exploration: grep for similar files, read 1-2 examples,
match their style.
**Never ignore provided context.** It represents work already done upstream.
## Todo System ## Todo System
@@ -83,12 +101,13 @@ instructions: |
## Completion Signal ## Completion Signal
End with: When done, end your response with a summary so the parent agent knows what happened:
``` ```
CODER_COMPLETE: [summary of what was implemented] CODER_COMPLETE: [summary of what was implemented, which files were created/modified, and build status]
``` ```
Or if failed: Or if something went wrong:
``` ```
CODER_FAILED: [what went wrong] CODER_FAILED: [what went wrong]
``` ```
@@ -105,4 +124,6 @@ instructions: |
- Project: {{project_dir}} - Project: {{project_dir}}
- CWD: {{__cwd__}} - CWD: {{__cwd__}}
- Shell: {{__shell__}} - Shell: {{__shell__}}
## Available tools:
{{__tools__}}
+26 -6
View File
@@ -14,11 +14,28 @@ _project_dir() {
(cd "${dir}" 2>/dev/null && pwd) || echo "${dir}" (cd "${dir}" 2>/dev/null && pwd) || echo "${dir}"
} }
# Normalize a path to be relative to project root.
# Strips the project_dir prefix if the LLM passes an absolute path.
# Usage: local rel_path; rel_path=$(_normalize_path "/abs/or/rel/path")
_normalize_path() {
local input_path="$1"
local project_dir
project_dir=$(_project_dir)
if [[ "${input_path}" == /* ]]; then
input_path="${input_path#"${project_dir}"/}"
fi
input_path="${input_path#./}"
echo "${input_path}"
}
# @cmd Read a file's contents before modifying # @cmd Read a file's contents before modifying
# @option --path! Path to the file (relative to project root) # @option --path! Path to the file (relative to project root)
read_file() { read_file() {
local file_path
# shellcheck disable=SC2154 # shellcheck disable=SC2154
local file_path="${argc_path}" file_path=$(_normalize_path "${argc_path}")
local project_dir local project_dir
project_dir=$(_project_dir) project_dir=$(_project_dir)
local full_path="${project_dir}/${file_path}" local full_path="${project_dir}/${file_path}"
@@ -39,7 +56,8 @@ read_file() {
# @option --path! Path for the file (relative to project root) # @option --path! Path for the file (relative to project root)
# @option --content! Complete file contents to write # @option --content! Complete file contents to write
write_file() { write_file() {
local file_path="${argc_path}" local file_path
file_path=$(_normalize_path "${argc_path}")
# shellcheck disable=SC2154 # shellcheck disable=SC2154
local content="${argc_content}" local content="${argc_content}"
local project_dir local project_dir
@@ -47,7 +65,7 @@ write_file() {
local full_path="${project_dir}/${file_path}" local full_path="${project_dir}/${file_path}"
mkdir -p "$(dirname "${full_path}")" mkdir -p "$(dirname "${full_path}")"
echo "${content}" > "${full_path}" printf '%s' "${content}" > "${full_path}"
green "Wrote: ${file_path}" >> "$LLM_OUTPUT" green "Wrote: ${file_path}" >> "$LLM_OUTPUT"
} }
@@ -55,7 +73,8 @@ write_file() {
# @cmd Find files similar to a given path (for pattern matching) # @cmd Find files similar to a given path (for pattern matching)
# @option --path! Path to find similar files for # @option --path! Path to find similar files for
find_similar_files() { find_similar_files() {
local file_path="${argc_path}" local file_path
file_path=$(_normalize_path "${argc_path}")
local project_dir local project_dir
project_dir=$(_project_dir) project_dir=$(_project_dir)
@@ -71,14 +90,14 @@ find_similar_files() {
! -name "$(basename "${file_path}")" \ ! -name "$(basename "${file_path}")" \
! -name "*test*" \ ! -name "*test*" \
! -name "*spec*" \ ! -name "*spec*" \
2>/dev/null | head -3) 2>/dev/null | sed "s|^${project_dir}/||" | head -3)
if [[ -z "${results}" ]]; then if [[ -z "${results}" ]]; then
results=$(find "${project_dir}/src" -type f -name "*.${ext}" \ results=$(find "${project_dir}/src" -type f -name "*.${ext}" \
! -name "*test*" \ ! -name "*test*" \
! -name "*spec*" \ ! -name "*spec*" \
-not -path '*/target/*' \ -not -path '*/target/*' \
2>/dev/null | head -3) 2>/dev/null | sed "s|^${project_dir}/||" | head -3)
fi fi
if [[ -n "${results}" ]]; then if [[ -n "${results}" ]]; then
@@ -186,6 +205,7 @@ search_code() {
grep -v '/target/' | \ grep -v '/target/' | \
grep -v '/node_modules/' | \ grep -v '/node_modules/' | \
grep -v '/.git/' | \ grep -v '/.git/' | \
sed "s|^${project_dir}/||" | \
head -20) || true head -20) || true
if [[ -n "${results}" ]]; then if [[ -n "${results}" ]]; then
+23 -1
View File
@@ -2,7 +2,7 @@
An AI agent specialized in exploring codebases, finding patterns, and understanding project structures. An AI agent specialized in exploring codebases, finding patterns, and understanding project structures.
This agent is designed to be delegated to by the **[Sisyphus](../sisyphus/README.md)** agent to gather information and context. Sisyphus This agent is designed to be delegated to by the **[Sisyphus](../sisyphus/README.md)** agent to gather information and context. Sisyphus
acts as the coordinator/architect, while Explore handles the research and discovery phase. acts as the coordinator/architect, while Explore handles the research and discovery phase.
It can also be used as a standalone tool for understanding codebases and finding specific information. It can also be used as a standalone tool for understanding codebases and finding specific information.
@@ -13,3 +13,25 @@ It can also be used as a standalone tool for understanding codebases and finding
- 📂 File system navigation and content analysis - 📂 File system navigation and content analysis
- 🧠 Context gathering for complex tasks - 🧠 Context gathering for complex tasks
- 🛡️ Read-only operations for safe investigation - 🛡️ Read-only operations for safe investigation
## Pro-Tip: Use an IDE MCP Server for Improved Performance
Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
them), and modify the agent definition to look like this:
```yaml
# ...
mcp_servers:
- jetbrains # The name of your configured IDE MCP server
global_tools:
- fs_read.sh
- fs_grep.sh
- fs_glob.sh
- fs_ls.sh
- web_search_loki.sh
# ...
```
+5 -2
View File
@@ -2,19 +2,19 @@ name: explore
description: Fast codebase exploration agent - finds patterns, structures, and relevant files description: Fast codebase exploration agent - finds patterns, structures, and relevant files
version: 1.0.0 version: 1.0.0
temperature: 0.1 temperature: 0.1
top_p: 0.95
variables: variables:
- name: project_dir - name: project_dir
description: Project directory to explore description: Project directory to explore
default: '.' default: '.'
mcp_servers:
- ddg-search
global_tools: global_tools:
- fs_read.sh - fs_read.sh
- fs_grep.sh - fs_grep.sh
- fs_glob.sh - fs_glob.sh
- fs_ls.sh - fs_ls.sh
- web_search_loki.sh
instructions: | instructions: |
You are a codebase explorer. Your job: Search, find, report. Nothing else. You are a codebase explorer. Your job: Search, find, report. Nothing else.
@@ -68,6 +68,9 @@ instructions: |
## Context ## Context
- Project: {{project_dir}} - Project: {{project_dir}}
- CWD: {{__cwd__}} - CWD: {{__cwd__}}
## Available Tools:
{{__tools__}}
conversation_starters: conversation_starters:
- 'Find how authentication is implemented' - 'Find how authentication is implemented'
+23 -5
View File
@@ -14,6 +14,21 @@ _project_dir() {
(cd "${dir}" 2>/dev/null && pwd) || echo "${dir}" (cd "${dir}" 2>/dev/null && pwd) || echo "${dir}"
} }
# Normalize a path to be relative to project root.
# Strips the project_dir prefix if the LLM passes an absolute path.
_normalize_path() {
local input_path="$1"
local project_dir
project_dir=$(_project_dir)
if [[ "${input_path}" == /* ]]; then
input_path="${input_path#"${project_dir}"/}"
fi
input_path="${input_path#./}"
echo "${input_path}"
}
# @cmd Get project structure and layout # @cmd Get project structure and layout
get_structure() { get_structure() {
local project_dir local project_dir
@@ -45,7 +60,7 @@ search_files() {
echo "" >> "$LLM_OUTPUT" echo "" >> "$LLM_OUTPUT"
local results local results
results=$(search_files "${pattern}" "${project_dir}") results=$(_search_files "${pattern}" "${project_dir}")
if [[ -n "${results}" ]]; then if [[ -n "${results}" ]]; then
echo "${results}" >> "$LLM_OUTPUT" echo "${results}" >> "$LLM_OUTPUT"
@@ -78,6 +93,7 @@ search_content() {
grep -v '/node_modules/' | \ grep -v '/node_modules/' | \
grep -v '/.git/' | \ grep -v '/.git/' | \
grep -v '/dist/' | \ grep -v '/dist/' | \
sed "s|^${project_dir}/||" | \
head -30) || true head -30) || true
if [[ -n "${results}" ]]; then if [[ -n "${results}" ]]; then
@@ -91,8 +107,9 @@ search_content() {
# @option --path! Path to the file (relative to project root) # @option --path! Path to the file (relative to project root)
# @option --lines Maximum lines to read (default: 200) # @option --lines Maximum lines to read (default: 200)
read_file() { read_file() {
local file_path
# shellcheck disable=SC2154 # shellcheck disable=SC2154
local file_path="${argc_path}" file_path=$(_normalize_path "${argc_path}")
local max_lines="${argc_lines:-200}" local max_lines="${argc_lines:-200}"
local project_dir local project_dir
project_dir=$(_project_dir) project_dir=$(_project_dir)
@@ -122,7 +139,8 @@ read_file() {
# @cmd Find similar files to a given file (for pattern matching) # @cmd Find similar files to a given file (for pattern matching)
# @option --path! Path to the reference file # @option --path! Path to the reference file
find_similar() { find_similar() {
local file_path="${argc_path}" local file_path
file_path=$(_normalize_path "${argc_path}")
local project_dir local project_dir
project_dir=$(_project_dir) project_dir=$(_project_dir)
@@ -138,7 +156,7 @@ find_similar() {
! -name "$(basename "${file_path}")" \ ! -name "$(basename "${file_path}")" \
! -name "*test*" \ ! -name "*test*" \
! -name "*spec*" \ ! -name "*spec*" \
2>/dev/null | head -5) 2>/dev/null | sed "s|^${project_dir}/||" | head -5)
if [[ -n "${results}" ]]; then if [[ -n "${results}" ]]; then
echo "${results}" >> "$LLM_OUTPUT" echo "${results}" >> "$LLM_OUTPUT"
@@ -147,7 +165,7 @@ find_similar() {
! -name "$(basename "${file_path}")" \ ! -name "$(basename "${file_path}")" \
! -name "*test*" \ ! -name "*test*" \
-not -path '*/target/*' \ -not -path '*/target/*' \
2>/dev/null | head -5) 2>/dev/null | sed "s|^${project_dir}/||" | head -5)
if [[ -n "${results}" ]]; then if [[ -n "${results}" ]]; then
echo "${results}" >> "$LLM_OUTPUT" echo "${results}" >> "$LLM_OUTPUT"
else else
+35
View File
@@ -0,0 +1,35 @@
# File Reviewer
A specialized worker agent that reviews a single file's diff for bugs, style issues, and cross-cutting concerns.
This agent is designed to be spawned by the **[Code Reviewer](../code-reviewer/README.md)** agent. It focuses deeply on
one file while communicating with sibling agents to catch issues that span multiple files.
## Features
- 🔍 **Deep Analysis**: Focuses on bugs, logic errors, security issues, and style problems in a single file.
- 🗣️ **Teammate Communication**: Sends and receives alerts to/from sibling reviewers about interface or dependency
changes.
- 🎯 **Targeted Reading**: Reads only relevant context around changed lines to stay efficient.
- 🏷️ **Structured Findings**: Categorizes issues by severity (🔴 Critical, 🟡 Warning, 🟢 Suggestion, 💡 Nitpick).
## Pro-Tip: Use an IDE MCP Server for Improved Performance
Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
them), and modify the agent definition to look like this:
```yaml
# ...
mcp_servers:
- jetbrains # The name of your configured IDE MCP server
global_tools:
- fs_read.sh
- fs_grep.sh
- fs_glob.sh
# ...
```
+3 -1
View File
@@ -2,7 +2,6 @@ name: file-reviewer
description: Reviews a single file's diff for bugs, style issues, and cross-cutting concerns description: Reviews a single file's diff for bugs, style issues, and cross-cutting concerns
version: 1.0.0 version: 1.0.0
temperature: 0.1 temperature: 0.1
top_p: 0.95
variables: variables:
- name: project_dir - name: project_dir
@@ -109,3 +108,6 @@ instructions: |
## Context ## Context
- Project: {{project_dir}} - Project: {{project_dir}}
- CWD: {{__cwd__}} - CWD: {{__cwd__}}
## Available Tools:
{{__tools__}}
+23 -1
View File
@@ -2,7 +2,7 @@
An AI agent specialized in high-level architecture, complex debugging, and design decisions. An AI agent specialized in high-level architecture, complex debugging, and design decisions.
This agent is designed to be delegated to by the **[Sisyphus](../sisyphus/README.md)** agent when deep reasoning, architectural advice, This agent is designed to be delegated to by the **[Sisyphus](../sisyphus/README.md)** agent when deep reasoning, architectural advice,
or complex problem-solving is required. Sisyphus acts as the coordinator, while Oracle provides the expert analysis and or complex problem-solving is required. Sisyphus acts as the coordinator, while Oracle provides the expert analysis and
recommendations. recommendations.
@@ -15,3 +15,25 @@ It can also be used as a standalone tool for design reviews and solving difficul
- ⚖️ Tradeoff analysis and technology selection - ⚖️ Tradeoff analysis and technology selection
- 📝 Code review and best practices advice - 📝 Code review and best practices advice
- 🧠 Deep reasoning for ambiguous problems - 🧠 Deep reasoning for ambiguous problems
## Pro-Tip: Use an IDE MCP Server for Improved Performance
Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
them), and modify the agent definition to look like this:
```yaml
# ...
mcp_servers:
- jetbrains # The name of your configured IDE MCP server
global_tools:
- fs_read.sh
- fs_grep.sh
- fs_glob.sh
- fs_ls.sh
- web_search_loki.sh
# ...
```
+5 -2
View File
@@ -2,19 +2,19 @@ name: oracle
description: High-IQ advisor for architecture, debugging, and complex decisions description: High-IQ advisor for architecture, debugging, and complex decisions
version: 1.0.0 version: 1.0.0
temperature: 0.2 temperature: 0.2
top_p: 0.95
variables: variables:
- name: project_dir - name: project_dir
description: Project directory for context description: Project directory for context
default: '.' default: '.'
mcp_servers:
- ddg-search
global_tools: global_tools:
- fs_read.sh - fs_read.sh
- fs_grep.sh - fs_grep.sh
- fs_glob.sh - fs_glob.sh
- fs_ls.sh - fs_ls.sh
- web_search_loki.sh
instructions: | instructions: |
You are Oracle - a senior architect and debugger consulted for complex decisions. You are Oracle - a senior architect and debugger consulted for complex decisions.
@@ -75,6 +75,9 @@ instructions: |
## Context ## Context
- Project: {{project_dir}} - Project: {{project_dir}}
- CWD: {{__cwd__}} - CWD: {{__cwd__}}
## Available Tools:
{{__tools__}}
conversation_starters: conversation_starters:
- 'Review this architecture design' - 'Review this architecture design'
+23 -4
View File
@@ -14,21 +14,38 @@ _project_dir() {
(cd "${dir}" 2>/dev/null && pwd) || echo "${dir}" (cd "${dir}" 2>/dev/null && pwd) || echo "${dir}"
} }
# Normalize a path to be relative to project root.
# Strips the project_dir prefix if the LLM passes an absolute path.
_normalize_path() {
local input_path="$1"
local project_dir
project_dir=$(_project_dir)
if [[ "${input_path}" == /* ]]; then
input_path="${input_path#"${project_dir}"/}"
fi
input_path="${input_path#./}"
echo "${input_path}"
}
# @cmd Read a file for analysis # @cmd Read a file for analysis
# @option --path! Path to the file (relative to project root) # @option --path! Path to the file (relative to project root)
read_file() { read_file() {
local project_dir local project_dir
project_dir=$(_project_dir) project_dir=$(_project_dir)
local file_path
# shellcheck disable=SC2154 # shellcheck disable=SC2154
local full_path="${project_dir}/${argc_path}" file_path=$(_normalize_path "${argc_path}")
local full_path="${project_dir}/${file_path}"
if [[ ! -f "${full_path}" ]]; then if [[ ! -f "${full_path}" ]]; then
error "File not found: ${argc_path}" >> "$LLM_OUTPUT" error "File not found: ${file_path}" >> "$LLM_OUTPUT"
return 1 return 1
fi fi
{ {
info "Reading: ${argc_path}" info "Reading: ${file_path}"
echo "" echo ""
cat "${full_path}" cat "${full_path}"
} >> "$LLM_OUTPUT" } >> "$LLM_OUTPUT"
@@ -80,6 +97,7 @@ search_code() {
grep -v '/target/' | \ grep -v '/target/' | \
grep -v '/node_modules/' | \ grep -v '/node_modules/' | \
grep -v '/.git/' | \ grep -v '/.git/' | \
sed "s|^${project_dir}/||" | \
head -30) || true head -30) || true
if [[ -n "${results}" ]]; then if [[ -n "${results}" ]]; then
@@ -113,7 +131,8 @@ analyze_with_command() {
# @cmd List directory contents # @cmd List directory contents
# @option --path Path to list (default: project root) # @option --path Path to list (default: project root)
list_directory() { list_directory() {
local dir_path="${argc_path:-.}" local dir_path
dir_path=$(_normalize_path "${argc_path:-.}")
local project_dir local project_dir
project_dir=$(_project_dir) project_dir=$(_project_dir)
local full_path="${project_dir}/${dir_path}" local full_path="${project_dir}/${dir_path}"
+24 -1
View File
@@ -1,6 +1,6 @@
# Sisyphus # Sisyphus
The main coordinator agent for the Loki coding ecosystem, providing a powerful CLI interface for code generation and The main coordinator agent for the Loki coding ecosystem, providing a powerful CLI interface for code generation and
project management similar to OpenCode, ClaudeCode, Codex, or Gemini CLI. project management similar to OpenCode, ClaudeCode, Codex, or Gemini CLI.
_Inspired by the Sisyphus and Oracle agents of OpenCode._ _Inspired by the Sisyphus and Oracle agents of OpenCode._
@@ -16,3 +16,26 @@ Sisyphus acts as the primary entry point, capable of handling complex tasks by c
- 💻 **CLI Coding**: Provides a natural language interface for writing and editing code. - 💻 **CLI Coding**: Provides a natural language interface for writing and editing code.
- 🔄 **Task Management**: Tracks progress and context across complex operations. - 🔄 **Task Management**: Tracks progress and context across complex operations.
- 🛠️ **Tool Integration**: Seamlessly uses system tools for building, testing, and file manipulation. - 🛠️ **Tool Integration**: Seamlessly uses system tools for building, testing, and file manipulation.
## Pro-Tip: Use an IDE MCP Server for Improved Performance
Many modern IDEs now include MCP servers that let LLMs perform operations within the IDE itself and use IDE tools. Using
an IDE's MCP server dramatically improves the performance of coding agents. So if you have an IDE, try adding that MCP
server to your config (see the [MCP Server docs](../../../docs/function-calling/MCP-SERVERS.md) to see how to configure
them), and modify the agent definition to look like this:
```yaml
# ...
mcp_servers:
- jetbrains
global_tools:
- fs_read.sh
- fs_grep.sh
- fs_glob.sh
- fs_ls.sh
- web_search_loki.sh
- execute_command.sh
# ...
```
+45 -7
View File
@@ -2,7 +2,6 @@ name: sisyphus
description: OpenCode-style orchestrator - classifies intent, delegates to specialists, tracks progress with todos description: OpenCode-style orchestrator - classifies intent, delegates to specialists, tracks progress with todos
version: 2.0.0 version: 2.0.0
temperature: 0.1 temperature: 0.1
top_p: 0.95
agent_session: temp agent_session: temp
auto_continue: true auto_continue: true
@@ -13,7 +12,7 @@ can_spawn_agents: true
max_concurrent_agents: 4 max_concurrent_agents: 4
max_agent_depth: 3 max_agent_depth: 3
inject_spawn_instructions: true inject_spawn_instructions: true
summarization_threshold: 4000 summarization_threshold: 8000
variables: variables:
- name: project_dir - name: project_dir
@@ -23,12 +22,13 @@ variables:
description: Auto-confirm command execution description: Auto-confirm command execution
default: '1' default: '1'
mcp_servers:
- ddg-search
global_tools: global_tools:
- fs_read.sh - fs_read.sh
- fs_grep.sh - fs_grep.sh
- fs_glob.sh - fs_glob.sh
- fs_ls.sh - fs_ls.sh
- web_search_loki.sh
- execute_command.sh - execute_command.sh
instructions: | instructions: |
@@ -70,6 +70,45 @@ instructions: |
| coder | Write/edit files, implement features | Creates/modifies files, runs builds | | coder | Write/edit files, implement features | Creates/modifies files, runs builds |
| oracle | Architecture decisions, complex debugging | Advisory, high-quality reasoning | | oracle | Architecture decisions, complex debugging | Advisory, high-quality reasoning |
## Coder Delegation Format (MANDATORY)
When spawning the `coder` agent, your prompt MUST include these sections.
The coder has NOT seen the codebase. Your prompt IS its entire context.
### Template:
```
## Goal
[1-2 sentences: what to build/modify and where]
## Reference Files
[Files that explore found, with what each demonstrates]
- `path/to/file.ext` - what pattern this file shows
- `path/to/other.ext` - what convention this file shows
## Code Patterns to Follow
[Paste ACTUAL code snippets from explore results, not descriptions]
<code>
// From path/to/file.ext - this is the pattern to follow:
[actual code explore found, 5-20 lines]
</code>
## Conventions
[Naming, imports, error handling, file organization]
- Convention 1
- Convention 2
## Constraints
[What NOT to do, scope boundaries]
- Do NOT modify X
- Only touch files in Y/
```
**CRITICAL**: Include actual code snippets, not just file paths.
If explore returned code patterns, paste them into the coder prompt.
Vague prompts like "follow existing patterns" waste coder's tokens on
re-exploration that you already did.
## Workflow Examples ## Workflow Examples
### Example 1: Implementation task (explore -> coder, parallel exploration) ### Example 1: Implementation task (explore -> coder, parallel exploration)
@@ -81,12 +120,12 @@ instructions: |
2. todo__add --task "Explore existing API patterns" 2. todo__add --task "Explore existing API patterns"
3. todo__add --task "Implement profile endpoint" 3. todo__add --task "Implement profile endpoint"
4. todo__add --task "Verify with build/test" 4. todo__add --task "Verify with build/test"
5. agent__spawn --agent explore --prompt "Find existing API endpoint patterns, route structures, and controller conventions" 5. agent__spawn --agent explore --prompt "Find existing API endpoint patterns, route structures, and controller conventions. Include code snippets."
6. agent__spawn --agent explore --prompt "Find existing data models and database query patterns" 6. agent__spawn --agent explore --prompt "Find existing data models and database query patterns. Include code snippets."
7. agent__collect --id <id1> 7. agent__collect --id <id1>
8. agent__collect --id <id2> 8. agent__collect --id <id2>
9. todo__done --id 1 9. todo__done --id 1
10. agent__spawn --agent coder --prompt "Create user profiles endpoint following existing patterns. [Include context from explore results]" 10. agent__spawn --agent coder --prompt "<structured prompt using Coder Delegation Format above, including code snippets from explore results>"
11. agent__collect --id <coder_id> 11. agent__collect --id <coder_id>
12. todo__done --id 2 12. todo__done --id 2
13. run_build 13. run_build
@@ -135,7 +174,6 @@ instructions: |
## When to Do It Yourself ## When to Do It Yourself
- Single-file reads/writes
- Simple command execution - Simple command execution
- Trivial changes (typos, renames) - Trivial changes (typos, renames)
- Quick file searches - Quick file searches
+5 -1
View File
@@ -16,11 +16,15 @@
}, },
"atlassian": { "atlassian": {
"command": "npx", "command": "npx",
"args": ["-y", "mcp-remote@0.1.13", "https://mcp.atlassian.com/v1/sse"] "args": ["-y", "mcp-remote@0.1.13", "https://mcp.atlassian.com/v1/mcp"]
}, },
"docker": { "docker": {
"command": "uvx", "command": "uvx",
"args": ["mcp-server-docker"] "args": ["mcp-server-docker"]
},
"ddg-search": {
"command": "uvx",
"args": ["duckduckgo-mcp-server"]
} }
} }
} }
+7 -1
View File
@@ -50,7 +50,13 @@ def parse_raw_data(data):
def parse_argv(): def parse_argv():
agent_func = sys.argv[1] agent_func = sys.argv[1]
agent_data = sys.argv[2]
tool_data_file = os.environ.get("LLM_TOOL_DATA_FILE")
if tool_data_file and os.path.isfile(tool_data_file):
with open(tool_data_file, "r", encoding="utf-8") as f:
agent_data = f.read()
else:
agent_data = sys.argv[2]
if (not agent_data) or (not agent_func): if (not agent_data) or (not agent_func):
print("Usage: ./{agent_name}.py <agent-func> <agent-data>", file=sys.stderr) print("Usage: ./{agent_name}.py <agent-func> <agent-data>", file=sys.stderr)
+5 -2
View File
@@ -14,7 +14,11 @@ main() {
parse_argv() { parse_argv() {
agent_func="$1" agent_func="$1"
agent_data="$2" if [[ -n "$LLM_TOOL_DATA_FILE" ]] && [[ -f "$LLM_TOOL_DATA_FILE" ]]; then
agent_data="$(cat "$LLM_TOOL_DATA_FILE")"
else
agent_data="$2"
fi
if [[ -z "$agent_data" ]] || [[ -z "$agent_func" ]]; then if [[ -z "$agent_data" ]] || [[ -z "$agent_func" ]]; then
die "usage: ./{agent_name}.sh <agent-func> <agent-data>" die "usage: ./{agent_name}.sh <agent-func> <agent-data>"
fi fi
@@ -57,7 +61,6 @@ run() {
if [[ "$OS" == "Windows_NT" ]]; then if [[ "$OS" == "Windows_NT" ]]; then
set -o igncr set -o igncr
tools_path="$(cygpath -w "$tools_path")" tools_path="$(cygpath -w "$tools_path")"
tool_data="$(echo "$tool_data" | sed 's/\\/\\\\/g')"
fi fi
jq_script="$(cat <<-'EOF' jq_script="$(cat <<-'EOF'
+189
View File
@@ -0,0 +1,189 @@
#!/usr/bin/env tsx
// Usage: ./{agent_name}.ts <agent-func> <agent-data>
import { readFileSync, writeFileSync, existsSync } from "fs";
import { join } from "path";
import { pathToFileURL } from "url";
// Entry point: resolve the CLI arguments, prepare the agent environment,
// then dispatch the requested tool function from this agent's tools.ts.
async function main(): Promise<void> {
  const parsed = parseArgv();
  const data = parseRawData(parsed.rawData);
  const configDir = "{config_dir}";
  setupEnv(configDir, parsed.agentFunc);
  const toolsModulePath = join(configDir, "agents", "{agent_name}", "tools.ts");
  await run(toolsModulePath, parsed.agentFunc, data);
}
// Parse the raw JSON payload handed to the tool into a plain object.
// Throws when the payload is empty or is not valid JSON.
function parseRawData(data: string): Record<string, unknown> {
  if (!data) {
    throw new Error("No JSON data");
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    throw new Error("Invalid JSON data");
  }
  return parsed as Record<string, unknown>;
}
// Resolve the tool function name and its raw JSON payload from the CLI.
// The payload is read from the file named by LLM_TOOL_DATA_FILE when that
// file exists (used for large payloads), otherwise from the third argv
// entry. Missing func/payload prints usage and exits non-zero.
function parseArgv(): { agentFunc: string; rawData: string } {
  const funcName = process.argv[2];
  const dataFile = process.env["LLM_TOOL_DATA_FILE"];
  const payload =
    dataFile && existsSync(dataFile)
      ? readFileSync(dataFile, "utf-8")
      : process.argv[3];
  if (!funcName || !payload) {
    process.stderr.write("Usage: ./{agent_name}.ts <agent-func> <agent-data>\n");
    process.exit(1);
  }
  return { agentFunc: funcName, rawData: payload };
}
// Populate the standard LLM_* environment variables for this agent run.
// The .env file is loaded first; the explicit assignments below then take
// precedence over anything it supplied.
function setupEnv(configDir: string, agentFunc: string): void {
  loadEnv(join(configDir, ".env"));
  const agentDir = join(configDir, "agents", "{agent_name}");
  process.env["LLM_ROOT_DIR"] = configDir;
  process.env["LLM_AGENT_NAME"] = "{agent_name}";
  process.env["LLM_AGENT_FUNC"] = agentFunc;
  process.env["LLM_AGENT_ROOT_DIR"] = agentDir;
  process.env["LLM_AGENT_CACHE_DIR"] = join(configDir, "cache", "{agent_name}");
}
// Load KEY=VALUE pairs from a dotenv-style file into process.env.
// Missing or unreadable files are ignored. Blank lines, "#" comments,
// and lines without "=" are skipped. Keys already present in the
// environment are never overwritten. Matching surrounding single or
// double quotes on a value are stripped.
function loadEnv(filePath: string): void {
  let content: string;
  try {
    content = readFileSync(filePath, "utf-8");
  } catch {
    return;
  }
  for (const rawLine of content.split("\n")) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) {
      continue;
    }
    const sep = line.indexOf("=");
    if (sep === -1) {
      continue;
    }
    const key = line.slice(0, sep).trim();
    if (key in process.env) {
      continue;
    }
    let value = line.slice(sep + 1).trim();
    const quoted =
      (value.startsWith('"') && value.endsWith('"')) ||
      (value.startsWith("'") && value.endsWith("'"));
    if (quoted) {
      value = value.slice(1, -1);
    }
    process.env[key] = value;
  }
}
// Best-effort extraction of a function's positional parameter names from
// its source text. Handles classic/async `function` declarations, and
// parenthesised or single-parameter arrow functions (the original regex
// matched only `function` declarations, so tools exported as arrow
// functions received no arguments). Type annotations, optional markers,
// and simple default values are stripped from each name.
// NOTE(review): destructured parameters ({a, b}) and default values that
// contain commas are still mis-parsed by this comma-split approach, the
// same as before — confirm tools avoid those forms.
function extractParamNames(fn: Function): string[] {
  const src = fn.toString();
  const match =
    src.match(/^(?:async\s+)?function\s*\w*\s*\(([^)]*)\)/) ??
    src.match(/^(?:async\s+)?\(([^)]*)\)\s*=>/) ??
    src.match(/^(?:async\s+)?([A-Za-z_$][\w$]*)\s*=>/);
  if (!match) {
    return [];
  }
  return match[1]
    .split(",")
    .map((p) => p.trim().replace(/[:=?].*/s, "").trim())
    .filter(Boolean);
}

// Map a JSON argument object onto the function's positional parameters,
// in declaration order. Parameters absent from `data` become undefined.
function spreadArgs(
  fn: Function,
  data: Record<string, unknown>,
): unknown[] {
  const names = extractParamNames(fn);
  if (names.length === 0) {
    return [];
  }
  return names.map((name) => data[name]);
}
// Dynamically import the agent's tools module, invoke the requested
// exported function with positionally-spread arguments, then surface the
// result to the LLM and optionally echo it for interactive debugging.
async function run(
  agentPath: string,
  agentFunc: string,
  agentData: Record<string, unknown>,
): Promise<void> {
  const mod = await import(pathToFileURL(agentPath).href);
  const candidate = mod[agentFunc];
  if (typeof candidate !== "function") {
    throw new Error(`No module function '${agentFunc}' at '${agentPath}'`);
  }
  const toolFn = candidate as Function;
  const result = await toolFn(...spreadArgs(toolFn, agentData));
  returnToLlm(result);
  dumpResult(`{agent_name}:${agentFunc}`);
}
/**
 * Delivers an agent/tool return value back to Loki.
 *
 * Writes to the file named by LLM_OUTPUT when set, otherwise to stdout.
 * Strings, numbers, and booleans are stringified directly; objects are
 * pretty-printed JSON. `null`/`undefined` results produce no output, and
 * other types (functions, symbols, bigints) are intentionally dropped —
 * matching the original contract.
 *
 * @param value - Whatever the invoked function returned.
 */
function returnToLlm(value: unknown): void {
  if (value == null) {
    return; // Nothing to report.
  }
  const destination = process.env["LLM_OUTPUT"];
  const emit = (text: string): void => {
    if (destination) {
      writeFileSync(destination, text, "utf-8");
    } else {
      process.stdout.write(text);
    }
  };
  switch (typeof value) {
    case "string":
    case "number":
    case "boolean":
      emit(String(value));
      break;
    case "object":
      emit(JSON.stringify(value, null, 2));
      break;
    default:
      break;
  }
}
/**
 * Optionally echoes the captured output to the interactive terminal.
 *
 * Active only when all three conditions hold: LLM_DUMP_RESULTS is set (a
 * regex alternation of names to dump), LLM_OUTPUT names the capture file,
 * and stdout is a TTY. The capture is framed with dim ANSI separators.
 * Any failure — invalid user pattern, unreadable capture file — silently
 * disables the dump; this is a best-effort convenience feature.
 *
 * @param name - Qualified name matched against LLM_DUMP_RESULTS.
 */
function dumpResult(name: string): void {
  const dumpPattern = process.env["LLM_DUMP_RESULTS"];
  const capturePath = process.env["LLM_OUTPUT"];
  if (!dumpPattern || !capturePath || !process.stdout.isTTY) {
    return;
  }
  let wanted: boolean;
  try {
    wanted = new RegExp(`\\b(${dumpPattern})\\b`).test(name);
  } catch {
    return; // User-supplied pattern was not a valid regex.
  }
  if (!wanted) {
    return;
  }
  let captured: string;
  try {
    captured = readFileSync(capturePath, "utf-8");
  } catch {
    return; // Capture file missing or unreadable.
  }
  process.stdout.write(
    `\x1b[2m----------------------\n${captured}\n----------------------\x1b[0m\n`,
  );
}
// Entry point: any rejection from main() (bad args, import failure, agent
// error) is reported on stderr and converted to a non-zero exit code so
// Loki can detect the failed invocation.
main().catch((err) => {
  process.stderr.write(`${err}\n`);
  process.exit(1);
});
+5
View File
@@ -49,6 +49,11 @@ def parse_raw_data(data):
def parse_argv(): def parse_argv():
tool_data_file = os.environ.get("LLM_TOOL_DATA_FILE")
if tool_data_file and os.path.isfile(tool_data_file):
with open(tool_data_file, "r", encoding="utf-8") as f:
return f.read()
argv = sys.argv[:] + [None] * max(0, 2 - len(sys.argv)) argv = sys.argv[:] + [None] * max(0, 2 - len(sys.argv))
tool_data = argv[1] tool_data = argv[1]
+5 -2
View File
@@ -13,7 +13,11 @@ main() {
} }
parse_argv() { parse_argv() {
tool_data="$1" if [[ -n "$LLM_TOOL_DATA_FILE" ]] && [[ -f "$LLM_TOOL_DATA_FILE" ]]; then
tool_data="$(cat "$LLM_TOOL_DATA_FILE")"
else
tool_data="$1"
fi
if [[ -z "$tool_data" ]]; then if [[ -z "$tool_data" ]]; then
die "usage: ./{function_name}.sh <tool-data>" die "usage: ./{function_name}.sh <tool-data>"
fi fi
@@ -54,7 +58,6 @@ run() {
if [[ "$OS" == "Windows_NT" ]]; then if [[ "$OS" == "Windows_NT" ]]; then
set -o igncr set -o igncr
tool_path="$(cygpath -w "$tool_path")" tool_path="$(cygpath -w "$tool_path")"
tool_data="$(echo "$tool_data" | sed 's/\\/\\\\/g')"
fi fi
jq_script="$(cat <<-'EOF' jq_script="$(cat <<-'EOF'
+184
View File
@@ -0,0 +1,184 @@
#!/usr/bin/env tsx
// Usage: ./{function_name}.ts <tool-data>
import { readFileSync, writeFileSync, existsSync } from "fs";
import { join } from "path";
import { pathToFileURL } from "url";
/**
 * Entry point for the generated tool wrapper.
 *
 * Resolves the raw JSON tool-call payload, parses it, prepares the LLM_*
 * environment, then invokes the tool module's exported `run` function.
 */
async function main(): Promise<void> {
  const payload = parseRawData(parseArgv());
  const rootDir = "{root_dir}";
  setupEnv(rootDir);
  const toolPath = "{tool_path}.ts";
  await run(toolPath, "run", payload);
}
/**
 * Parses the raw tool-call payload into a JSON object.
 *
 * @param data - Raw JSON text taken from argv or LLM_TOOL_DATA_FILE.
 * @returns The parsed object, keyed by tool parameter name.
 * @throws {Error} If the payload is empty, is not valid JSON, or is not a
 *                 JSON object (arrays and primitives cannot be mapped to
 *                 named parameters).
 */
function parseRawData(data: string): Record<string, unknown> {
  if (!data) {
    throw new Error("No JSON data");
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    throw new Error("Invalid JSON data");
  }
  // The declared return type promises an object; reject arrays/primitives
  // here so the mismatch fails loudly instead of surfacing later as
  // every tool argument being undefined.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("Invalid JSON data");
  }
  return parsed as Record<string, unknown>;
}
/**
 * Resolves the raw tool-call data for this invocation.
 *
 * Prefers the file named by LLM_TOOL_DATA_FILE (set by Loki on Windows to
 * sidestep cmd.exe JSON-escaping issues); otherwise falls back to the
 * first positional CLI argument. Exits with a usage message when neither
 * source provides data.
 *
 * @returns The raw (still unparsed) JSON payload.
 */
function parseArgv(): string {
  const dataFile = process.env["LLM_TOOL_DATA_FILE"];
  if (dataFile && existsSync(dataFile)) {
    return readFileSync(dataFile, "utf-8");
  }
  const cliData = process.argv[2];
  if (cliData) {
    return cliData;
  }
  process.stderr.write("Usage: ./{function_name}.ts <tool-data>\n");
  process.exit(1);
}
/**
 * Prepares the process environment for a tool invocation.
 *
 * Loads the root directory's .env file first (pre-existing environment
 * variables win over .env entries), then publishes the standard LLM_*
 * variables describing this tool run.
 *
 * @param rootDir - Root Loki configuration directory.
 */
function setupEnv(rootDir: string): void {
  loadEnv(join(rootDir, ".env"));
  const toolVars: Record<string, string> = {
    LLM_ROOT_DIR: rootDir,
    LLM_TOOL_NAME: "{function_name}",
    LLM_TOOL_CACHE_DIR: join(rootDir, "cache", "{function_name}"),
  };
  for (const [key, value] of Object.entries(toolVars)) {
    process.env[key] = value;
  }
}
/**
 * Loads KEY=VALUE pairs from a dotenv-style file into process.env.
 *
 * A missing or unreadable file is silently ignored. Blank lines, lines
 * starting with `#`, and lines without an `=` separator are skipped. Keys
 * already present in process.env are never overwritten, so the real
 * environment always takes precedence. Values wrapped in matching single
 * or double quotes have the quotes stripped.
 *
 * @param filePath - Path of the .env file to load.
 */
function loadEnv(filePath: string): void {
  let contents: string;
  try {
    contents = readFileSync(filePath, "utf-8");
  } catch {
    return; // Having no .env file is a perfectly normal state.
  }
  for (const rawLine of contents.split("\n")) {
    const entry = rawLine.trim();
    if (!entry || entry.startsWith("#")) continue;
    const sep = entry.indexOf("=");
    if (sep < 0) continue;
    const key = entry.slice(0, sep).trim();
    if (key in process.env) continue; // Existing environment wins.
    let value = entry.slice(sep + 1).trim();
    const quoted =
      (value.startsWith('"') && value.endsWith('"')) ||
      (value.startsWith("'") && value.endsWith("'"));
    if (quoted) {
      value = value.slice(1, -1);
    }
    process.env[key] = value;
  }
}
/**
 * Extracts the declared parameter names from a function's source text.
 *
 * Supports classic `function` declarations (the form produced by the tool
 * compiler), and additionally arrow functions and shorthand methods so the
 * wrapper degrades gracefully if an author exports one of those forms —
 * previously those returned [] and the function was invoked with no
 * arguments. Type annotations, optional markers (`?`), and default values
 * are stripped. Limitations (unchanged from before): destructured
 * parameters and default values containing `,` or `)` are not parsed.
 *
 * @param fn - The function whose parameter list should be inspected.
 * @returns Parameter names in declaration order; empty when undeterminable.
 */
function extractParamNames(fn: Function): string[] {
  const src = fn.toString();
  // Try, in order: `function name(...)`, `(...) =>`, method shorthand `name(...)`.
  const match =
    src.match(/^(?:async\s+)?function\s*\w*\s*\(([^)]*)\)/) ??
    src.match(/^(?:async\s*)?\(([^)]*)\)\s*=>/) ??
    src.match(/^(?:async\s+)?[\w$]+\s*\(([^)]*)\)/);
  if (!match) {
    // Single-parameter arrow without parentheses: `x => ...`
    const bare = src.match(/^(?:async\s+)?([\w$]+)\s*=>/);
    return bare ? [bare[1]] : [];
  }
  return match[1]
    .split(",")
    .map((p) => p.trim().replace(/[:=?].*/s, "").trim())
    .filter(Boolean);
}
/**
 * Builds a positional argument list for `fn` from a JSON payload object.
 *
 * Each declared parameter name is looked up in `data`; keys missing from
 * the payload become `undefined`. When the parameter names cannot be
 * determined, the function is invoked with no arguments at all.
 *
 * @param fn - The target tool function.
 * @param data - Parsed JSON payload keyed by parameter name.
 * @returns Arguments ordered to match the function's parameter list.
 */
function spreadArgs(
  fn: Function,
  data: Record<string, unknown>,
): unknown[] {
  return extractParamNames(fn).map((paramName) => data[paramName]);
}
/**
 * Dynamically imports the tool module and invokes the requested function.
 *
 * Arguments are mapped positionally from the JSON payload via spreadArgs,
 * the (awaited) result is handed to returnToLlm, and the optional terminal
 * dump is triggered afterwards.
 *
 * @param toolPath - Filesystem path to the tool module.
 * @param toolFunc - Name of the exported function to invoke.
 * @param toolData - Parsed JSON payload with the function's arguments.
 * @throws If the module does not export a function named `toolFunc`.
 */
async function run(
  toolPath: string,
  toolFunc: string,
  toolData: Record<string, unknown>,
): Promise<void> {
  const toolModule = await import(pathToFileURL(toolPath).href);
  const candidate = toolModule[toolFunc];
  if (typeof candidate !== "function") {
    throw new Error(`No module function '${toolFunc}' at '${toolPath}'`);
  }
  const fn = candidate as Function;
  const result = await fn(...spreadArgs(fn, toolData));
  returnToLlm(result);
  dumpResult("{function_name}");
}
/**
 * Delivers a tool return value back to Loki.
 *
 * Writes to the file named by LLM_OUTPUT when set, otherwise to stdout.
 * Strings, numbers, and booleans are stringified directly; objects are
 * pretty-printed JSON. `null`/`undefined` results produce no output, and
 * other types (functions, symbols, bigints) are intentionally dropped —
 * matching the original contract.
 *
 * @param value - Whatever the invoked function returned.
 */
function returnToLlm(value: unknown): void {
  if (value == null) {
    return; // Nothing to report.
  }
  const destination = process.env["LLM_OUTPUT"];
  const emit = (text: string): void => {
    if (destination) {
      writeFileSync(destination, text, "utf-8");
    } else {
      process.stdout.write(text);
    }
  };
  switch (typeof value) {
    case "string":
    case "number":
    case "boolean":
      emit(String(value));
      break;
    case "object":
      emit(JSON.stringify(value, null, 2));
      break;
    default:
      break;
  }
}
/**
 * Optionally echoes the captured tool output to the interactive terminal.
 *
 * Active only when all three conditions hold: LLM_DUMP_RESULTS is set (a
 * regex alternation of tool names to dump), LLM_OUTPUT names the capture
 * file, and stdout is a TTY. The capture is framed with dim ANSI
 * separators. Any failure — invalid user pattern, unreadable capture
 * file — silently disables the dump; this is a best-effort feature.
 *
 * @param name - Tool name matched against LLM_DUMP_RESULTS.
 */
function dumpResult(name: string): void {
  const dumpPattern = process.env["LLM_DUMP_RESULTS"];
  const capturePath = process.env["LLM_OUTPUT"];
  if (!dumpPattern || !capturePath || !process.stdout.isTTY) {
    return;
  }
  let wanted: boolean;
  try {
    wanted = new RegExp(`\\b(${dumpPattern})\\b`).test(name);
  } catch {
    return; // User-supplied pattern was not a valid regex.
  }
  if (!wanted) {
    return;
  }
  let captured: string;
  try {
    captured = readFileSync(capturePath, "utf-8");
  } catch {
    return; // Capture file missing or unreadable.
  }
  process.stdout.write(
    `\x1b[2m----------------------\n${captured}\n----------------------\x1b[0m\n`,
  );
}
// Entry point: any rejection from main() (bad args, import failure, tool
// error) is reported on stderr and converted to a non-zero exit code so
// Loki can detect the failed invocation.
main().catch((err) => {
  process.stderr.write(`${err}\n`);
  process.exit(1);
});
+23 -10
View File
@@ -1,6 +1,7 @@
import os import os
from typing import List, Literal, Optional from typing import List, Literal, Optional
def run( def run(
string: str, string: str,
string_enum: Literal["foo", "bar"], string_enum: Literal["foo", "bar"],
@@ -9,26 +10,38 @@ def run(
number: float, number: float,
array: List[str], array: List[str],
string_optional: Optional[str] = None, string_optional: Optional[str] = None,
integer_with_default: int = 42,
boolean_with_default: bool = True,
number_with_default: float = 3.14,
string_with_default: str = "hello",
array_optional: Optional[List[str]] = None, array_optional: Optional[List[str]] = None,
): ):
"""Demonstrates how to create a tool using Python and how to use comments. """Demonstrates all supported Python parameter types and variations.
Args: Args:
string: Define a required string property string: A required string property
string_enum: Define a required string property with enum string_enum: A required string property constrained to specific values
boolean: Define a required boolean property boolean: A required boolean property
integer: Define a required integer property integer: A required integer property
number: Define a required number property number: A required number (float) property
array: Define a required string array property array: A required string array property
string_optional: Define an optional string property string_optional: An optional string property (Optional[str] with None default)
array_optional: Define an optional string array property integer_with_default: An optional integer with a non-None default value
boolean_with_default: An optional boolean with a default value
number_with_default: An optional number with a default value
string_with_default: An optional string with a default value
array_optional: An optional string array property
""" """
output = f"""string: {string} output = f"""string: {string}
string_enum: {string_enum} string_enum: {string_enum}
string_optional: {string_optional}
boolean: {boolean} boolean: {boolean}
integer: {integer} integer: {integer}
number: {number} number: {number}
array: {array} array: {array}
string_optional: {string_optional}
integer_with_default: {integer_with_default}
boolean_with_default: {boolean_with_default}
number_with_default: {number_with_default}
string_with_default: {string_with_default}
array_optional: {array_optional}""" array_optional: {array_optional}"""
for key, value in os.environ.items(): for key, value in os.environ.items():
+53
View File
@@ -0,0 +1,53 @@
/**
 * Demonstrates all supported TypeScript parameter types and variations.
 *
 * @param string - A required string property
 * @param string_enum - A required string property constrained to specific values
 * @param boolean - A required boolean property
 * @param number - A required number property
 * @param array_bracket - A required string array using bracket syntax
 * @param array_generic - A required string array using generic syntax
 * @param string_optional - An optional string using the question mark syntax
 * @param string_nullable - An optional string using the union-with-null syntax
 * @param number_with_default - An optional number with a default value
 * @param boolean_with_default - An optional boolean with a default value
 * @param string_with_default - An optional string with a default value
 * @param array_optional - An optional string array using the question mark syntax
 */
export function run(
  string: string,
  string_enum: "foo" | "bar",
  boolean: boolean,
  number: number,
  array_bracket: string[],
  array_generic: Array<string>,
  string_optional?: string,
  string_nullable: string | null = null,
  number_with_default: number = 42,
  boolean_with_default: boolean = true,
  string_with_default: string = "hello",
  array_optional?: string[],
): string {
  // Echo every parameter back so the caller can see exactly what arrived.
  const lines = [
    `string: ${string}`,
    `string_enum: ${string_enum}`,
    `boolean: ${boolean}`,
    `number: ${number}`,
    `array_bracket: ${JSON.stringify(array_bracket)}`,
    `array_generic: ${JSON.stringify(array_generic)}`,
    `string_optional: ${string_optional}`,
    `string_nullable: ${string_nullable}`,
    `number_with_default: ${number_with_default}`,
    `boolean_with_default: ${boolean_with_default}`,
    `string_with_default: ${string_with_default}`,
    `array_optional: ${JSON.stringify(array_optional)}`,
  ];
  // Append the LLM_* environment so the demo also shows the runtime context.
  const envLines = Object.entries(process.env)
    .filter(([key]) => key.startsWith("LLM_"))
    .map(([key, value]) => `${key}: ${value}`);
  return [...lines, ...envLines].join("\n");
}
@@ -0,0 +1,24 @@
#!/usr/bin/env tsx
import { appendFileSync, mkdirSync } from "fs";
import { dirname } from "path";
/**
 * Get the current weather in a given location
 * @param location - The city and optionally the state or country (e.g., "London", "San Francisco, CA").
 */
export async function run(location: string): Promise<string> {
  // NOTE: an async function must declare a Promise return type; the previous
  // `: string` annotation was a TypeScript compile error (TS1064).
  const encoded = encodeURIComponent(location);
  const url = `https://wttr.in/${encoded}?format=4`;
  const resp = await fetch(url);
  if (!resp.ok) {
    // Fail loudly rather than returning an HTML error page as "weather".
    throw new Error(`wttr.in request failed with status ${resp.status}`);
  }
  const data = await resp.text();
  // Mirror the result into LLM_OUTPUT unless output is aimed at stdout.
  const dest = process.env["LLM_OUTPUT"] ?? "/dev/stdout";
  if (dest !== "-" && dest !== "/dev/stdout") {
    mkdirSync(dirname(dest), { recursive: true });
    appendFileSync(dest, data, "utf-8");
  }
  return data;
}
+1 -1
View File
@@ -32,7 +32,7 @@ max_concurrent_agents: 4 # Maximum number of agents that can run simulta
max_agent_depth: 3 # Maximum nesting depth for sub-agents (prevents runaway spawning) max_agent_depth: 3 # Maximum nesting depth for sub-agents (prevents runaway spawning)
inject_spawn_instructions: true # Inject the default agent spawning instructions into the agent's system prompt inject_spawn_instructions: true # Inject the default agent spawning instructions into the agent's system prompt
summarization_model: null # Model to use for summarizing sub-agent output (e.g. 'openai:gpt-4o-mini'); defaults to current model summarization_model: null # Model to use for summarizing sub-agent output (e.g. 'openai:gpt-4o-mini'); defaults to current model
summarization_threshold: 4000 # Character threshold above which sub-agent output is summarized before returning to parent summarization_threshold: 4000 # Character threshold above which sub-agent output is summarized before returning to parent
escalation_timeout: 300 # Seconds a sub-agent waits for a user interaction response before timing out (default: 5 minutes) escalation_timeout: 300 # Seconds a sub-agent waits for a user interaction response before timing out (default: 5 minutes)
mcp_servers: # Optional list of MCP servers that the agent utilizes mcp_servers: # Optional list of MCP servers that the agent utilizes
- github # Corresponds to the name of an MCP server in the `<loki-config-dir>/functions/mcp.json` file - github # Corresponds to the name of an MCP server in the `<loki-config-dir>/functions/mcp.json` file
+7 -1
View File
@@ -46,6 +46,7 @@ enabled_tools: null # Which tools to enable by default. (e.g. 'fs,w
visible_tools: # Which tools are visible to be compiled (and are thus able to be defined in 'enabled_tools') visible_tools: # Which tools are visible to be compiled (and are thus able to be defined in 'enabled_tools')
# - demo_py.py # - demo_py.py
# - demo_sh.sh # - demo_sh.sh
# - demo_ts.ts
- execute_command.sh - execute_command.sh
# - execute_py_code.py # - execute_py_code.py
# - execute_sql_code.sh # - execute_sql_code.sh
@@ -61,6 +62,7 @@ visible_tools: # Which tools are visible to be compiled (and a
# - fs_write.sh # - fs_write.sh
- get_current_time.sh - get_current_time.sh
# - get_current_weather.py # - get_current_weather.py
# - get_current_weather.ts
- get_current_weather.sh - get_current_weather.sh
- query_jira_issues.sh - query_jira_issues.sh
# - search_arxiv.sh # - search_arxiv.sh
@@ -77,7 +79,7 @@ visible_tools: # Which tools are visible to be compiled (and a
mcp_server_support: true # Enables or disables MCP servers (globally). mcp_server_support: true # Enables or disables MCP servers (globally).
mapping_mcp_servers: # Alias for an MCP server or set of servers mapping_mcp_servers: # Alias for an MCP server or set of servers
git: github,gitmcp git: github,gitmcp
enabled_mcp_servers: null # Which MCP servers to enable by default (e.g. 'github,slack') enabled_mcp_servers: null # Which MCP servers to enable by default (e.g. 'github,slack,ddg-search')
# ---- Session ---- # ---- Session ----
# See the [Session documentation](./docs/SESSIONS.md) for more information # See the [Session documentation](./docs/SESSIONS.md) for more information
@@ -192,6 +194,8 @@ clients:
- type: gemini - type: gemini
api_base: https://generativelanguage.googleapis.com/v1beta api_base: https://generativelanguage.googleapis.com/v1beta
api_key: '{{GEMINI_API_KEY}}' # You can either hard-code or inject secrets from the Loki vault api_key: '{{GEMINI_API_KEY}}' # You can either hard-code or inject secrets from the Loki vault
auth: null # When set to 'oauth', Loki will use OAuth instead of an API key
# Authenticate with `loki --authenticate` or `.authenticate` in the REPL
patch: patch:
chat_completions: chat_completions:
'.*': '.*':
@@ -210,6 +214,8 @@ clients:
- type: claude - type: claude
api_base: https://api.anthropic.com/v1 # Optional api_base: https://api.anthropic.com/v1 # Optional
api_key: '{{ANTHROPIC_API_KEY}}' # You can either hard-code or inject secrets from the Loki vault api_key: '{{ANTHROPIC_API_KEY}}' # You can either hard-code or inject secrets from the Loki vault
auth: null # When set to 'oauth', Loki will use OAuth instead of an API key
# Authenticate with `loki --authenticate` or `.authenticate` in the REPL
# See https://docs.mistral.ai/ # See https://docs.mistral.ai/
- type: openai-compatible - type: openai-compatible
+65 -10
View File
@@ -33,6 +33,7 @@ If you're looking for more example agents, refer to the [built-in agents](../ass
- [.env File Support](#env-file-support) - [.env File Support](#env-file-support)
- [Python-Based Agent Tools](#python-based-agent-tools) - [Python-Based Agent Tools](#python-based-agent-tools)
- [Bash-Based Agent Tools](#bash-based-agent-tools) - [Bash-Based Agent Tools](#bash-based-agent-tools)
- [TypeScript-Based Agent Tools](#typescript-based-agent-tools)
- [5. Conversation Starters](#5-conversation-starters) - [5. Conversation Starters](#5-conversation-starters)
- [6. Todo System & Auto-Continuation](#6-todo-system--auto-continuation) - [6. Todo System & Auto-Continuation](#6-todo-system--auto-continuation)
- [7. Sub-Agent Spawning System](#7-sub-agent-spawning-system) - [7. Sub-Agent Spawning System](#7-sub-agent-spawning-system)
@@ -62,10 +63,12 @@ Agent configurations often have the following directory structure:
├── tools.sh ├── tools.sh
or or
├── tools.py ├── tools.py
or
├── tools.ts
``` ```
This means that agent configurations often are only two files: the agent configuration file (`config.yaml`), and the This means that agent configurations often are only two files: the agent configuration file (`config.yaml`), and the
tool definitions (`agents/my-agent/tools.sh` or `tools.py`). tool definitions (`agents/my-agent/tools.sh`, `tools.py`, or `tools.ts`).
To see a full example configuration file, refer to the [example agent config file](../config.agent.example.yaml). To see a full example configuration file, refer to the [example agent config file](../config.agent.example.yaml).
@@ -114,10 +117,10 @@ isolated environment, so in order for an agent to use a tool or MCP server that
explicitly state which tools and/or MCP servers the agent uses. Otherwise, it is assumed that the agent doesn't use any explicitly state which tools and/or MCP servers the agent uses. Otherwise, it is assumed that the agent doesn't use any
tools outside its own custom defined tools. tools outside its own custom defined tools.
And if you don't define a `agents/my-agent/tools.sh` or `agents/my-agent/tools.py`, then the agent is really just a And if you don't define a `agents/my-agent/tools.sh`, `agents/my-agent/tools.py`, or `agents/my-agent/tools.ts`, then the agent is really just a
`role`. `role`.
You'll notice there's no settings for agent-specific tooling. This is because they are handled separately and You'll notice there are no settings for agent-specific tooling. This is because they are handled separately and
automatically. See the [Building Tools for Agents](#4-building-tools-for-agents) section below for more information. automatically. See the [Building Tools for Agents](#4-building-tools-for-agents) section below for more information.
To see a full example configuration file, refer to the [example agent config file](../config.agent.example.yaml). To see a full example configuration file, refer to the [example agent config file](../config.agent.example.yaml).
@@ -205,7 +208,7 @@ variables:
### Dynamic Instructions ### Dynamic Instructions
Sometimes you may find it useful to dynamically generate instructions on startup. Whether that be via a call to Loki Sometimes you may find it useful to dynamically generate instructions on startup. Whether that be via a call to Loki
itself to generate them, or by some other means. Loki supports this type of behavior using a special function defined itself to generate them, or by some other means. Loki supports this type of behavior using a special function defined
in your `agents/my-agent/tools.py` or `agents/my-agent/tools.sh`. in your `agents/my-agent/tools.py`, `agents/my-agent/tools.sh`, or `agents/my-agent/tools.ts`.
**Example: Instructions for a JSON-reader agent that specializes on each JSON input it receives** **Example: Instructions for a JSON-reader agent that specializes on each JSON input it receives**
`agents/json-reader/tools.py`: `agents/json-reader/tools.py`:
@@ -306,8 +309,8 @@ EOF
} }
``` ```
For more information on how to create custom tools for your agent and the structure of the `agent/my-agent/tools.sh` or For more information on how to create custom tools for your agent and the structure of the `agent/my-agent/tools.sh`,
`agent/my-agent/tools.py` files, refer to the [Building Tools for Agents](#4-building-tools-for-agents) section below. `agent/my-agent/tools.py`, or `agent/my-agent/tools.ts` files, refer to the [Building Tools for Agents](#4-building-tools-for-agents) section below.
#### Variables #### Variables
All the same variable interpolations supported by static instructions is also supported by dynamic instructions. For All the same variable interpolations supported by static instructions is also supported by dynamic instructions. For
@@ -337,10 +340,11 @@ defining a single function that gets executed at runtime (e.g. `main` for bash t
tools define a number of *subcommands*. tools define a number of *subcommands*.
### Limitations ### Limitations
You can only utilize either a bash-based `<loki-config-dir>/agents/my-agent/tools.sh` or a Python-based You can only utilize one of: a bash-based `<loki-config-dir>/agents/my-agent/tools.sh`, a Python-based
`<loki-config-dir>/agents/my-agent/tools.py`. However, if it's easier to achieve a task in one language vs the other, `<loki-config-dir>/agents/my-agent/tools.py`, or a TypeScript-based `<loki-config-dir>/agents/my-agent/tools.ts`.
However, if it's easier to achieve a task in one language vs the other,
you're free to define other scripts in your agent's configuration directory and reference them from the main you're free to define other scripts in your agent's configuration directory and reference them from the main
`tools.py/sh` file. **Any scripts *not* named `tools.{py,sh}` will not be picked up by Loki's compiler**, meaning they tools file. **Any scripts *not* named `tools.{py,sh,ts}` will not be picked up by Loki's compiler**, meaning they
can be used like any other set of scripts. can be used like any other set of scripts.
It's important to keep in mind the following: It's important to keep in mind the following:
@@ -428,6 +432,55 @@ the same syntax and formatting as is used to create custom bash tools globally. For more information on how to write, [build and test](function-calling/CUSTOM-BASH-TOOLS.md#execute-and-test-your-bash-tools) tools in bash, refer to the For more information on how to write, [build and test](function-calling/CUSTOM-BASH-TOOLS.md#execute-and-test-your-bash-tools) tools in bash, refer to the
For more information on how to write, [build and test](function-calling/CUSTOM-BASH-TOOLS.md#execute-and-test-your-bash-tools) tools in bash, refer to the For more information on how to write, [build and test](function-calling/CUSTOM-BASH-TOOLS.md#execute-and-test-your-bash-tools) tools in bash, refer to the
[custom bash tools documentation](function-calling/CUSTOM-BASH-TOOLS.md). [custom bash tools documentation](function-calling/CUSTOM-BASH-TOOLS.md).
### TypeScript-Based Agent Tools
TypeScript-based agent tools work exactly the same as TypeScript global tools. Instead of a single `run` function,
you define as many exported functions as you like. Non-exported functions are private helpers and are invisible to the
LLM.
**Example:**
`agents/my-agent/tools.ts`
```typescript
/**
* Get your IP information
*/
export async function get_ip_info(): Promise<string> {
const resp = await fetch("https://httpbin.org/ip");
return await resp.text();
}
/**
* Find your public IP address using AWS
*/
export async function get_ip_address_from_aws(): Promise<string> {
const resp = await fetch("https://checkip.amazonaws.com");
return await resp.text();
}
// Non-exported helper — invisible to the LLM
function formatResponse(data: string): string {
return data.trim();
}
```
Loki automatically compiles each exported function as a separate tool for the LLM to call. Just make sure you
follow the same JSDoc and parameter conventions as you would when creating custom TypeScript tools.
TypeScript agent tools also support dynamic instructions via an exported `_instructions()` function:
```typescript
import { readFileSync } from "fs";
/**
* Generates instructions for the agent dynamically
*/
export function _instructions(): string {
const schema = readFileSync("schema.json", "utf-8");
return `You are an AI agent that works with the following schema:\n${schema}`;
}
```
For more information on how to build tools in TypeScript, refer to the [custom TypeScript tools documentation](function-calling/CUSTOM-TOOLS.md#custom-typescript-based-tools).
## 5. Conversation Starters ## 5. Conversation Starters
It's often helpful to also have some conversation starters so users know what kinds of things the agent is capable of It's often helpful to also have some conversation starters so users know what kinds of things the agent is capable of
doing. These are available in the REPL via the `.starter` command and are selectable. doing. These are available in the REPL via the `.starter` command and are selectable.
@@ -467,11 +520,12 @@ inject_todo_instructions: true # Include the default todo instructions into pr
### How It Works ### How It Works
1. When `inject_todo_instructions` is enabled, agents receive instructions on using four built-in tools: 1. When `inject_todo_instructions` is enabled, agents receive instructions on using five built-in tools:
- `todo__init`: Initialize a todo list with a goal - `todo__init`: Initialize a todo list with a goal
- `todo__add`: Add a task to the list - `todo__add`: Add a task to the list
- `todo__done`: Mark a task complete - `todo__done`: Mark a task complete
- `todo__list`: View current todo state - `todo__list`: View current todo state
- `todo__clear`: Clear the entire todo list and reset the goal
These instructions are a reasonable default that detail how to use Loki's To-Do System. If you wish, These instructions are a reasonable default that detail how to use Loki's To-Do System. If you wish,
you can disable the injection of the default instructions and specify your own instructions for how you can disable the injection of the default instructions and specify your own instructions for how
@@ -714,6 +768,7 @@ Loki comes packaged with some useful built-in agents:
* `code-reviewer`: A [CodeRabbit](https://coderabbit.ai)-style code reviewer that spawns per-file reviewers using the teammate messaging pattern * `code-reviewer`: A [CodeRabbit](https://coderabbit.ai)-style code reviewer that spawns per-file reviewers using the teammate messaging pattern
* `demo`: An example agent to use for reference when learning to create your own agents * `demo`: An example agent to use for reference when learning to create your own agents
* `explore`: An agent designed to help you explore and understand your codebase * `explore`: An agent designed to help you explore and understand your codebase
* `file-reviewer`: An agent designed to perform code-review on a single file (used by the `code-reviewer` agent)
* `jira-helper`: An agent that assists you with all your Jira-related tasks * `jira-helper`: An agent that assists you with all your Jira-related tasks
* `oracle`: An agent for high-level architecture, design decisions, and complex debugging * `oracle`: An agent for high-level architecture, design decisions, and complex debugging
* `sisyphus`: A powerhouse orchestrator agent for writing complex code and acting as a natural language interface for your codebase (similar to ClaudeCode, Gemini CLI, Codex, or OpenCode). Uses sub-agent spawning to delegate to `explore`, `coder`, and `oracle`. * `sisyphus`: A powerhouse orchestrator agent for writing complex code and acting as a natural language interface for your codebase (similar to ClaudeCode, Gemini CLI, Codex, or OpenCode). Uses sub-agent spawning to delegate to `explore`, `coder`, and `oracle`.
+4 -3
View File
@@ -107,6 +107,7 @@ The following variables can be used to change the log level of Loki or the locat
can also pass the `--disable-log-colors` flag as well. can also pass the `--disable-log-colors` flag as well.
## Miscellaneous Variables ## Miscellaneous Variables
| Environment Variable | Description | Default Value | | Environment Variable | Description | Default Value |
|----------------------|--------------------------------------------------------------------------------------------------|---------------| |----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|
| `AUTO_CONFIRM` | Bypass all `guard_*` checks in the bash prompt helpers; useful for agent composition and routing | | | `AUTO_CONFIRM` | Bypass all `guard_*` checks in the bash prompt helpers; useful for agent composition and routing | |
| `LLM_TOOL_DATA_FILE` | Set automatically by Loki on Windows. Points to a temporary file containing the JSON tool call data. <br>Tool scripts (`run-tool.sh`, `run-agent.sh`, etc.) read from this file instead of command-line args <br>to avoid JSON escaping issues when data passes through `cmd.exe` → bash. **Not intended to be set by users.** | |
+956
View File
@@ -0,0 +1,956 @@
# Phase 1 Implementation Plan: Extract AppState from Config
## Overview
Split the monolithic `Config` struct into:
- **`AppConfig`** — immutable server-wide settings (deserialized from `config.yaml`)
- **`RequestContext`** — per-request mutable state (current role, session, agent, supervisor, etc.)
The existing `GlobalConfig` (`Arc<RwLock<Config>>`) type alias is replaced. CLI and REPL continue working identically. No API code is added in this phase.
**Estimated effort:** ~3-4 weeks (originally estimated 1-2 weeks; revised during implementation as Steps 6.5 and 7 deferred their semantic rewrites to an expanded Step 8)
**Risk:** Medium — touches 91 callsites across 15 modules
**Mitigation:** Incremental migration with tests passing at every step
**Sub-step tracking:** Each step has per-step implementation notes in `docs/implementation/PHASE-1-STEP-*-NOTES.md`
---
## Current State: Config Field Classification
### Serialized Fields (from config.yaml → AppConfig)
These are loaded from disk once and should be immutable during request processing:
| Field | Type | Notes |
|---|---|---|
| `model_id` | `String` | Default model ID |
| `temperature` | `Option<f64>` | Default temperature |
| `top_p` | `Option<f64>` | Default top_p |
| `dry_run` | `bool` | Can be overridden per-request |
| `stream` | `bool` | Can be overridden per-request |
| `save` | `bool` | Whether to persist to messages.md |
| `keybindings` | `String` | REPL keybinding style |
| `editor` | `Option<String>` | Editor command |
| `wrap` | `Option<String>` | Text wrapping |
| `wrap_code` | `bool` | Code block wrapping |
| `vault_password_file` | `Option<PathBuf>` | Vault password location |
| `function_calling_support` | `bool` | Global function calling toggle |
| `mapping_tools` | `IndexMap<String, String>` | Tool aliases |
| `enabled_tools` | `Option<String>` | Default enabled tools |
| `visible_tools` | `Option<Vec<String>>` | Visible tool list |
| `mcp_server_support` | `bool` | Global MCP toggle |
| `mapping_mcp_servers` | `IndexMap<String, String>` | MCP server aliases |
| `enabled_mcp_servers` | `Option<String>` | Default enabled MCP servers |
| `repl_prelude` | `Option<String>` | REPL prelude config |
| `cmd_prelude` | `Option<String>` | CLI prelude config |
| `agent_session` | `Option<String>` | Default agent session |
| `save_session` | `Option<bool>` | Session save behavior |
| `compression_threshold` | `usize` | Session compression threshold |
| `summarization_prompt` | `Option<String>` | Compression prompt |
| `summary_context_prompt` | `Option<String>` | Summary context prompt |
| `rag_embedding_model` | `Option<String>` | RAG embedding model |
| `rag_reranker_model` | `Option<String>` | RAG reranker model |
| `rag_top_k` | `usize` | RAG top-k results |
| `rag_chunk_size` | `Option<usize>` | RAG chunk size |
| `rag_chunk_overlap` | `Option<usize>` | RAG chunk overlap |
| `rag_template` | `Option<String>` | RAG template |
| `document_loaders` | `HashMap<String, String>` | Document loader mappings |
| `highlight` | `bool` | Syntax highlighting |
| `theme` | `Option<String>` | Color theme |
| `left_prompt` | `Option<String>` | REPL left prompt format |
| `right_prompt` | `Option<String>` | REPL right prompt format |
| `user_agent` | `Option<String>` | HTTP User-Agent |
| `save_shell_history` | `bool` | Shell history persistence |
| `sync_models_url` | `Option<String>` | Models sync URL |
| `clients` | `Vec<ClientConfig>` | LLM provider configs |
### Runtime Fields (#[serde(skip)] → RequestContext)
These are created at runtime and are per-request/per-session mutable state:
| Field | Type | Destination |
|---|---|---|
| `vault` | `GlobalVault` | `AppState.vault` (shared service) |
| `macro_flag` | `bool` | `RequestContext.macro_flag` |
| `info_flag` | `bool` | `RequestContext.info_flag` |
| `agent_variables` | `Option<AgentVariables>` | `RequestContext.agent_variables` |
| `model` | `Model` | `RequestContext.model` |
| `functions` | `Functions` | `RequestContext.tool_scope.functions` (unified in Step 6) |
| `mcp_registry` | `Option<McpRegistry>` | **REMOVED.** Replaced by per-`ToolScope` `McpRuntime`s produced by a new `McpFactory` on `AppState`. See the architecture doc's "Tool Scope Isolation" section. |
| `working_mode` | `WorkingMode` | `RequestContext.working_mode` |
| `last_message` | `Option<LastMessage>` | `RequestContext.last_message` |
| `role` | `Option<Role>` | `RequestContext.role` |
| `session` | `Option<Session>` | `RequestContext.session` |
| `rag` | `Option<Arc<Rag>>` | `RequestContext.rag` |
| `agent` | `Option<Agent>` | `RequestContext.agent` (agent spec + role + RAG) |
| `tool_call_tracker` | `Option<ToolCallTracker>` | `RequestContext.tool_scope.tool_tracker` (unified in Step 6) |
| `supervisor` | `Option<Arc<RwLock<Supervisor>>>` | `RequestContext.agent_runtime.supervisor` |
| `parent_supervisor` | `Option<Arc<RwLock<Supervisor>>>` | `RequestContext.agent_runtime.parent_supervisor` |
| `self_agent_id` | `Option<String>` | `RequestContext.agent_runtime.self_agent_id` |
| `current_depth` | `usize` | `RequestContext.agent_runtime.current_depth` |
| `inbox` | `Option<Arc<Inbox>>` | `RequestContext.agent_runtime.inbox` |
| `root_escalation_queue` | `Option<Arc<EscalationQueue>>` | `RequestContext.agent_runtime.escalation_queue` (shared from the root via `Arc`) |
**Note on `ToolScope` and `AgentRuntime`:** during Phase 1 Step 0 the new `RequestContext` struct keeps `functions`, `tool_call_tracker`, supervisor/inbox/escalation fields as **flat fields** mirroring today's `Config`. This is deliberate — it makes the field-by-field migration mechanical. In Step 6.5 these fields collapse into two sub-structs:
- `ToolScope { functions, mcp_runtime, tool_tracker }` — owned by every active `RoleLike` scope, rebuilt on role/session/agent transitions via `McpFactory::acquire()`.
- `AgentRuntime { spec, rag, supervisor, inbox, escalation_queue, todo_list, self_agent_id, parent_supervisor, current_depth, auto_continue_count }` — owned only when an agent is active.
**Two behavior changes land during Step 6.5** that tighten today's code:
1. `todo_list` becomes `Option<TodoList>`. Today the code always allocates `TodoList::default()` for every agent, even when `auto_continue: false`. Since the todo tools and instructions are only exposed when `auto_continue: true`, the allocation is wasted. The new shape skips allocation unless the agent opts in. No user-visible change.
2. A unified `RagCache` on `AppState` serves **both** standalone RAGs (attached via `.rag <name>`) and agent-owned RAGs (loaded from an agent's `documents:` field). Today, both paths independently call `Rag::load` from disk on every use; with the cache, any scope requesting the same `RagKey` shares the same `Arc<Rag>`. Standalone RAG lives in `ctx.rag`; agent RAG lives in `ctx.agent_runtime.rag`. Roles and Sessions do **not** own RAG (the structs have no RAG fields) — this is true today and unchanged by the refactor. `rebuild_rag` and `edit_rag_docs` call `RagCache::invalidate()`.
See `docs/REST-API-ARCHITECTURE.md` section 5 for the full `ToolScope`, `McpFactory`, `RagCache`, and MCP pooling designs.
---
## Migration Strategy: The Facade Pattern
**Do NOT rewrite everything at once.** Instead, use a transitional facade that keeps the old `Config` working while new code uses the split types.
### Step 0: Add new types alongside Config (no breaking changes) ✅ DONE
Create the new structs in new files. `Config` stays untouched. Nothing breaks.
**Files created:**
- `src/config/app_config.rs` — `AppConfig` struct (the serialized half)
- `src/config/request_context.rs` — `RequestContext` struct (the runtime half)
- `src/config/app_state.rs` — `AppState` struct (Arc-wrapped global services, no `mcp_registry` — see below)
**`AppConfig`** is essentially the current `Config` struct but containing ONLY the serialized fields (no `#[serde(skip)]` fields). It should derive `Deserialize` identically so the existing `config.yaml` still loads.
**Important change from the original plan:** `AppState` does NOT hold an `McpRegistry`. MCP server processes are scoped per `RoleLike`, not process-wide. An `McpFactory` service will be added to `AppState` in Step 6.5. See `docs/REST-API-ARCHITECTURE.md` section 5 for the design rationale.
```rust
// src/config/app_config.rs
#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct AppConfig {
#[serde(rename(serialize = "model", deserialize = "model"))]
pub model_id: String,
pub temperature: Option<f64>,
pub top_p: Option<f64>,
pub dry_run: bool,
pub stream: bool,
pub save: bool,
pub keybindings: String,
pub editor: Option<String>,
pub wrap: Option<String>,
pub wrap_code: bool,
vault_password_file: Option<PathBuf>,
pub function_calling_support: bool,
pub mapping_tools: IndexMap<String, String>,
pub enabled_tools: Option<String>,
pub visible_tools: Option<Vec<String>>,
pub mcp_server_support: bool,
pub mapping_mcp_servers: IndexMap<String, String>,
pub enabled_mcp_servers: Option<String>,
pub repl_prelude: Option<String>,
pub cmd_prelude: Option<String>,
pub agent_session: Option<String>,
pub save_session: Option<bool>,
pub compression_threshold: usize,
pub summarization_prompt: Option<String>,
pub summary_context_prompt: Option<String>,
pub rag_embedding_model: Option<String>,
pub rag_reranker_model: Option<String>,
pub rag_top_k: usize,
pub rag_chunk_size: Option<usize>,
pub rag_chunk_overlap: Option<usize>,
pub rag_template: Option<String>,
pub document_loaders: HashMap<String, String>,
pub highlight: bool,
pub theme: Option<String>,
pub left_prompt: Option<String>,
pub right_prompt: Option<String>,
pub user_agent: Option<String>,
pub save_shell_history: bool,
pub sync_models_url: Option<String>,
pub clients: Vec<ClientConfig>,
}
```
```rust
// src/config/app_state.rs
#[derive(Clone)]
pub struct AppState {
pub config: Arc<AppConfig>,
pub vault: GlobalVault,
// NOTE: no `mcp_registry` field. MCP runtime is scoped per-`ToolScope`
// on `RequestContext`, not process-wide. An `McpFactory` will be added
// here later (Step 6 / Phase 5) to pool and ref-count MCP processes
// across concurrent ToolScopes. See architecture doc section 5.
}
```
```rust
// src/config/request_context.rs
pub struct RequestContext {
pub app: Arc<AppState>,
// per-request flags
pub macro_flag: bool,
pub info_flag: bool,
pub working_mode: WorkingMode,
// active context
pub model: Model,
pub functions: Functions,
pub role: Option<Role>,
pub session: Option<Session>,
pub rag: Option<Arc<Rag>>,
pub agent: Option<Agent>,
pub agent_variables: Option<AgentVariables>,
// conversation state
pub last_message: Option<LastMessage>,
pub tool_call_tracker: Option<ToolCallTracker>,
// agent supervision
pub supervisor: Option<Arc<RwLock<Supervisor>>>,
pub parent_supervisor: Option<Arc<RwLock<Supervisor>>>,
pub self_agent_id: Option<String>,
pub current_depth: usize,
pub inbox: Option<Arc<Inbox>>,
pub root_escalation_queue: Option<Arc<EscalationQueue>>,
}
```
### Step 1: Make Config constructible from AppConfig + RequestContext
Add conversion methods so the old `Config` can be built from the new types, and vice versa. This is the bridge that lets us migrate incrementally.
```rust
// On Config:
impl Config {
/// Extract the global portion into AppConfig
pub fn to_app_config(&self) -> AppConfig { /* copy serialized fields */ }
/// Extract the runtime portion into RequestContext
pub fn to_request_context(&self, app: Arc<AppState>) -> RequestContext { /* copy runtime fields */ }
/// Reconstruct Config from the split types (for backward compat during migration)
pub fn from_parts(app: &AppState, ctx: &RequestContext) -> Config { /* merge back */ }
}
```
**Test:** After this step, splitting and rejoining round-trips correctly: given `app_state` built from `config.to_app_config()`, `Config::from_parts(&app_state, &config.to_request_context(app_state.clone()))` reconstructs an equivalent `Config`. Existing tests still pass.
### Step 2: Migrate static methods off Config
There are ~30 static methods on Config (no `self` parameter). These are pure utility functions that don't need Config at all — they compute file paths, list directories, etc.
**Target:** Move these to a standalone `paths` module or keep on `AppConfig` where appropriate.
| Method | Move to |
|---|---|
| `config_dir()` | `paths::config_dir()` |
| `local_path(name)` | `paths::local_path(name)` |
| `cache_path()` | `paths::cache_path()` |
| `oauth_tokens_path()` | `paths::oauth_tokens_path()` |
| `token_file(client)` | `paths::token_file(client)` |
| `log_path()` | `paths::log_path()` |
| `config_file()` | `paths::config_file()` |
| `roles_dir()` | `paths::roles_dir()` |
| `role_file(name)` | `paths::role_file(name)` |
| `macros_dir()` | `paths::macros_dir()` |
| `macro_file(name)` | `paths::macro_file(name)` |
| `env_file()` | `paths::env_file()` |
| `rags_dir()` | `paths::rags_dir()` |
| `functions_dir()` | `paths::functions_dir()` |
| `functions_bin_dir()` | `paths::functions_bin_dir()` |
| `mcp_config_file()` | `paths::mcp_config_file()` |
| `global_tools_dir()` | `paths::global_tools_dir()` |
| `global_utils_dir()` | `paths::global_utils_dir()` |
| `bash_prompt_utils_file()` | `paths::bash_prompt_utils_file()` |
| `agents_data_dir()` | `paths::agents_data_dir()` |
| `agent_data_dir(name)` | `paths::agent_data_dir(name)` |
| `agent_config_file(name)` | `paths::agent_config_file(name)` |
| `agent_bin_dir(name)` | `paths::agent_bin_dir(name)` |
| `agent_rag_file(agent, rag)` | `paths::agent_rag_file(agent, rag)` |
| `agent_functions_file(name)` | `paths::agent_functions_file(name)` |
| `models_override_file()` | `paths::models_override_file()` |
| `list_roles(with_builtin)` | `Role::list(with_builtin)` or `paths` |
| `list_rags()` | `Rag::list()` or `paths` |
| `list_macros()` | `Macro::list()` or `paths` |
| `has_role(name)` | `Role::exists(name)` |
| `has_macro(name)` | `Macro::exists(name)` |
| `sync_models(url, abort)` | Standalone function or on `AppConfig` |
| `local_models_override()` | Standalone function |
| `log_config()` | Standalone function |
**Approach:** Create `src/config/paths.rs`, move functions there, and add `#[deprecated]` forwarding methods on `Config` that call the new locations. Compile, run tests, fix callsites module by module, then remove the deprecated methods.
**Callsite count:** Low — most of these are called from 1-3 places. This is a quick-win step.
### Step 3: Migrate global-read methods to AppConfig
These methods only read serialized config values and should live on `AppConfig`:
| Method | Current Signature | New Home |
|---|---|---|
| `vault_password_file` | `&self -> PathBuf` | `AppConfig` |
| `editor` | `&self -> Result<String>` | `AppConfig` |
| `sync_models_url` | `&self -> String` | `AppConfig` |
| `light_theme` | `&self -> bool` | `AppConfig` |
| `render_options` | `&self -> Result<RenderOptions>` | `AppConfig` |
| `print_markdown` | `&self, text -> Result<()>` | `AppConfig` |
| `rag_template` | `&self, embeddings, sources, text -> String` | `AppConfig` |
| `select_functions` | `&self, role -> Option<Vec<...>>` | `AppConfig` |
| `select_enabled_functions` | `&self, role -> Vec<...>` | `AppConfig` |
| `select_enabled_mcp_servers` | `&self, role -> Vec<...>` | `AppConfig` |
**Same pattern:** Add new methods on `AppConfig`, add `#[deprecated]` forwarding on `Config`, migrate callers, remove.
### Step 4: Migrate global-write methods
These modify serialized config settings (`.set` command, environment loading):
| Method | Notes |
|---|---|
| `set_wrap` | Modifies `self.wrap` |
| `update` | Generic key-value update of config settings |
| `load_envs` | Applies env var overrides |
| `load_functions` | Initializes function definitions |
| `load_mcp_servers` | Starts MCP servers |
| `setup_model` | Sets default model |
| `setup_document_loaders` | Sets default doc loaders |
| `setup_user_agent` | Sets user agent string |
The `load_*` / `setup_*` methods are initialization-only (called once in `Config::init`). They become part of `AppState` construction.
`update` and `set_wrap` are runtime mutations of global config. For the API world, these should require a config reload. For now, they can stay as methods that mutate `AppConfig` through interior mutability or require a mutable reference during REPL setup.
### Step 5: Migrate request-read methods to RequestContext
Pure reads of per-request state:
| Method | Notes |
|---|---|
| `state` | Returns flags for active role/session/agent/rag |
| `messages_file` | Path depends on active agent |
| `sessions_dir` | Path depends on active agent |
| `session_file` | Path depends on active agent |
| `rag_file` | Path depends on active agent |
| `info` | Reads current agent/session/role/rag |
| `role_info` | Reads current role |
| `session_info` | Reads current session |
| `agent_info` | Reads current agent |
| `agent_banner` | Reads current agent |
| `rag_info` | Reads current rag |
| `list_sessions` | Depends on sessions_dir (agent context) |
| `list_autoname_sessions` | Depends on sessions_dir |
| `is_compressing_session` | Reads session state |
| `role_like_mut` | Returns mutable ref to role-like |
### Step 6: Migrate request-write methods to RequestContext
Mutations of per-request state:
| Method | Notes |
|---|---|
| `use_prompt` | Sets temporary role |
| `use_role` / `use_role_obj` | Sets role on session or self |
| `exit_role` | Clears role |
| `edit_role` | Edits and re-applies role |
| `use_session` | Sets session |
| `exit_session` | Saves and clears session |
| `save_session` | Persists session |
| `empty_session` | Clears session messages |
| `set_save_session_this_time` | Session flag |
| `compress_session` / `maybe_compress_session` | Session compression |
| `autoname_session` / `maybe_autoname_session` | Session naming |
| `use_rag` / `exit_rag` / `edit_rag_docs` / `rebuild_rag` | RAG lifecycle |
| `use_agent` / `exit_agent` / `exit_agent_session` | Agent lifecycle |
| `apply_prelude` | Sets role/session from prelude config |
| `before_chat_completion` | Pre-LLM state updates |
| `after_chat_completion` | Post-LLM state updates |
| `discontinuous_last_message` | Message state |
| `init_agent_shared_variables` | Agent vars |
| `init_agent_session_variables` | Agent session vars |
### Step 6.5: Unify tool/MCP fields into `ToolScope` and agent fields into `AgentRuntime`
After Step 6, `RequestContext` has many flat fields that logically cluster into two sub-structs. This step collapses them and introduces three new services on `AppState`.
**New types:**
```rust
pub struct ToolScope {
pub functions: Functions,
pub mcp_runtime: McpRuntime,
pub tool_tracker: ToolCallTracker,
}
pub struct McpRuntime {
servers: HashMap<String, Arc<McpServerHandle>>,
}
pub struct AgentRuntime {
pub spec: AgentSpec,
pub rag: Option<Arc<Rag>>, // shared across siblings of same type
pub supervisor: Supervisor,
pub inbox: Arc<Inbox>,
pub escalation_queue: Arc<EscalationQueue>, // shared from root
pub todo_list: Option<TodoList>, // Some(...) only when auto_continue: true
pub self_agent_id: String,
pub parent_supervisor: Option<Arc<Supervisor>>,
pub current_depth: usize,
pub auto_continue_count: usize,
}
```
**New services on `AppState`:**
```rust
pub struct AppState {
pub config: Arc<AppConfig>,
pub vault: GlobalVault,
pub mcp_factory: Arc<McpFactory>,
pub rag_cache: Arc<RagCache>,
}
pub struct McpFactory {
active: Mutex<HashMap<McpServerKey, Weak<McpServerHandle>>>,
// idle pool + reaper added in Phase 5; Step 6.5 ships the no-pool version
}
impl McpFactory {
pub async fn acquire(&self, key: &McpServerKey) -> Result<Arc<McpServerHandle>>;
}
pub struct RagCache {
entries: RwLock<HashMap<RagKey, Weak<Rag>>>,
}
#[derive(Hash, Eq, PartialEq, Clone, Debug)]
pub enum RagKey {
Named(String), // standalone: rags/<name>.yaml
Agent(String), // agent-owned: agents/<name>/rag.yaml
}
impl RagCache {
pub async fn load(&self, key: &RagKey) -> Result<Option<Arc<Rag>>>;
pub fn invalidate(&self, key: &RagKey);
}
```
**`RequestContext` after collapse:**
```rust
pub struct RequestContext {
pub app: Arc<AppState>,
pub macro_flag: bool,
pub info_flag: bool,
pub working_mode: WorkingMode,
pub model: Model,
pub agent_variables: Option<AgentVariables>,
pub role: Option<Role>,
pub session: Option<Session>,
pub rag: Option<Arc<Rag>>, // session/standalone RAG, not agent RAG
pub agent: Option<Agent>,
pub last_message: Option<LastMessage>,
pub tool_scope: ToolScope, // replaces functions + tool_call_tracker + global mcp_registry
pub agent_runtime: Option<AgentRuntime>, // replaces supervisor + inbox + escalation_queue + todo + self_id + parent + depth; holds shared agent RAG
}
```
**What this step does:**
1. Implement `McpRuntime` and `ToolScope`.
2. Implement `McpFactory` — **no pool, no idle handling, no reaper.** `acquire()` checks `active` for an upgradable `Weak`, otherwise spawns fresh. `Drop` on `McpServerHandle` tears down the subprocess directly. Pooling lands in Phase 5.
3. Implement `RagCache` with `RagKey` enum, weak-ref sharing, and per-key serialization for concurrent first-load.
4. Implement `AgentRuntime` with the shape above. `todo_list` is `Option` — only allocated when `agent.spec.auto_continue == true`. `rag` is served from `RagCache` during activation via `RagKey::Agent(name)`.
5. Rewrite scope transitions (`use_role`, `use_session`, `use_agent`, `exit_*`, `Config::update`) to:
- Resolve the effective enabled-tool / enabled-MCP-server list using priority `Agent > Session > Role > Global`
- Build a fresh `McpRuntime` by calling `McpFactory::acquire()` for each required server key
- Construct a new `ToolScope` wrapping the runtime + resolved `Functions`
- Swap `ctx.tool_scope` atomically
6. `use_rag` (standalone / `.rag <name>` path) is rewritten to call `app.rag_cache.load(RagKey::Named(name))` and assign the result to `ctx.rag`. No role/session RAG changes because roles/sessions do not own RAG.
7. Agent activation additionally:
- Calls `app.rag_cache.load(RagKey::Agent(agent_name))` and stores the returned `Arc<Rag>` in the new `AgentRuntime.rag`
- Allocates `todo_list: Some(TodoList::default())` only when `auto_continue: true`; otherwise `None`
- Constructs the `AgentRuntime` and assigns it to `ctx.agent_runtime`
- **Preserves today's clobber behavior for standalone RAG:** does NOT save `ctx.rag` anywhere. When the agent exits, the user's previous `.rag <name>` selection is not restored (matches current behavior). Stacking / restoration is flagged as a Phase 2+ enhancement.
8. `exit_agent` drops `ctx.agent_runtime` (which drops the agent's `Arc<Rag>`; the cache entry becomes evictable if no other scope holds it) and rebuilds `ctx.tool_scope` from the now-topmost `RoleLike`.
9. Sub-agent spawning (in `function/supervisor.rs`) constructs a fresh `RequestContext` for the child from the shared `AppState`:
- Its own `ToolScope` via `McpFactory::acquire()` calls for the child agent's `mcp_servers`
- Its own `AgentRuntime` with:
- `rag` via `rag_cache.load(RagKey::Agent(child_agent_name))` — shared with parent/siblings of same type
- Fresh `Supervisor`, fresh `Inbox`, `current_depth = parent.depth + 1`
- `parent_supervisor = Some(parent.supervisor.clone())` (for messaging)
- `escalation_queue = parent.escalation_queue.clone()` — one queue, rooted at the human
- `todo_list` honoring the child's own `auto_continue` flag
10. Old `Agent::init` logic that mutates a global `McpRegistry` is removed — that's now `McpFactory::acquire()` producing scope-local handles.
11. `rebuild_rag` and `edit_rag_docs` are updated to determine the correct `RagKey` (check `ctx.agent_runtime` first — if present use `RagKey::Agent(spec.name)`, otherwise use the standalone name from `ctx.rag`'s origin) and call `rag_cache.invalidate(&key)` before reloading.
**What this step preserves:**
- **Diff-based reinit for REPL users** — when you `.exit role` from `[github, jira]` back to global `[github]`, the new `ToolScope` is built by calling `McpFactory::acquire("github")`. Without pooling (Phase 1), this respawns `github`. With pooling (Phase 5), `github`'s `Arc` is still held by no one, but the idle pool keeps it warm, so revival is instant. The Phase 1 version has a mild regression here that Phase 5 fixes.
- **Agent-vs-non-agent compatibility** — today's `Agent::init` reinits a global registry; after this step, agent activation replaces `ctx.tool_scope` with an agent-specific one, and `exit_agent` restores the pre-agent scope by rebuilding from the (now-active) role/session/global lists.
- **Todo semantics from the user's perspective** — today's behavior is "todos are available when `auto_continue: true`". After Step 6.5, it's still "todos are available when `auto_continue: true`" — the only difference is we skip the wasted `TodoList::default()` allocation for the other agents.
**Risk:** Medium-high. This is where the Phase 1 refactor stops being mechanical and starts having semantic implications. Six things to watch:
1. **Parent scope restoration on `exit_agent`.** Today, `exit_agent` tears down the agent's MCP set but leaves the registry in whatever state `reinit` put it — the parent's original MCP set is NOT restored. Users don't notice because the next scope activation (or REPL exit) reinits anyway. In the new design, `exit_agent` MUST rebuild the parent's `ToolScope` from the still-active role/session/global lists so the user sees the expected state. Test this carefully.
2. **`McpFactory` contention.** With many concurrent sub-agents (say, 4 siblings each needing different MCP sets), the factory's mutex could become a bottleneck during `acquire()`. Hold the lock only while touching `active`, never while awaiting subprocess spawn.
3. **`RagCache` concurrent first-load.** If two consumers request the same `RagKey` simultaneously and neither finds a cached entry, both will try to `Rag::load` from disk. Use per-key `tokio::sync::Mutex` or `OnceCell` to serialize the first load — the second caller blocks briefly and receives the shared Arc. This applies equally to standalone and agent RAGs.
4. **Weak ref staleness in `RagCache`.** The `Weak<Rag>` in the map might point to a dropped `Rag`. The `load()` path MUST attempt `Weak::upgrade()` before returning; if upgrade fails, treat it as a miss and reload.
5. **`rebuild_rag` / `edit_rag_docs` race.** If a user runs `.rag rebuild` while another scope holds the same `Arc<Rag>` (concurrent API session, running sub-agent, etc.), the cache invalidation must NOT yank the Arc out from under the active holder. The `Arc` keeps its reference alive — invalidation just ensures *future* loads read fresh. This is the correct behavior for both standalone and agent RAG; worth confirming in tests.
6. **Identifying the right `RagKey` during rebuild.** `rebuild_rag` today operates on `Config.rag` without knowing its origin. In the new model, the code needs to check `ctx.agent_runtime` first to determine if the active RAG is agent-owned (`RagKey::Agent`) or standalone (`RagKey::Named`). Get this wrong and you invalidate the wrong cache entry, silently breaking subsequent loads.
### Step 7: Tackle mixed methods (THE HARD PART)
These 17 methods conditionally read global config OR per-request state depending on what's active. They need to be split into explicit parameter passing.
| Method | Why it's mixed | Refactoring approach |
|---|---|---|
| `current_model` | Returns agent model, session model, role model, or global model | Take `(&AppConfig, &RequestContext) -> &Model` — check ctx first, fall back to app |
| `extract_role` | Builds role from session/agent/role or global settings | Take `(&AppConfig, &RequestContext) -> Role` |
| `sysinfo` | Reads global settings + current rag/session/agent | Take `(&AppConfig, &RequestContext) -> String` |
| `set_temperature` | Sets on role-like or global | Split: `ctx.set_temperature()` for role-like, `app.set_temperature()` for global |
| `set_top_p` | Same pattern as temperature | Same split |
| `set_enabled_tools` | Same pattern | Same split |
| `set_enabled_mcp_servers` | Same pattern | Same split |
| `set_save_session` | Sets on session or global | Same split |
| `set_compression_threshold` | Sets on session or global | Same split |
| `set_rag_reranker_model` | Sets on rag or global | Same split |
| `set_rag_top_k` | Sets on rag or global | Same split |
| `set_max_output_tokens` | Sets on role-like model or global model | Same split |
| `set_model` | Sets on role-like or global | Same split |
| `retrieve_role` | Loads role, merges with current model settings | Take `(&AppConfig, &RequestContext, name) -> Role` |
| `use_role_safely` | Takes GlobalConfig, does take/replace pattern | Refactor to `(&mut RequestContext, name)` |
| `use_session_safely` | Takes GlobalConfig, does take/replace pattern | Refactor to `(&mut RequestContext, name)` |
| `save_message` | Reads `save` flag (global) + writes to messages_file (agent-dependent path) | Take `(&AppConfig, &RequestContext, input, output)` |
| `render_prompt_left/right` | Reads prompt format (global) + current model/session/agent/role (request) | Take `(&AppConfig, &RequestContext) -> String` |
| `generate_prompt_context` | Same as prompt rendering | Take `(&AppConfig, &RequestContext) -> HashMap` |
| `repl_complete` | Reads both global config and request state for completions | Take `(&AppConfig, &RequestContext, cmd, args) -> Vec<...>` |
**Common pattern for `set_*` methods:** The current code does something like:
```rust
fn set_temperature(&mut self, value: Option<f64>) {
if let Some(role_like) = self.role_like_mut() {
role_like.set_temperature(value);
} else {
self.temperature = value;
}
}
```
This becomes:
```rust
// On RequestContext:
fn set_temperature(&mut self, value: Option<f64>, app_defaults: &AppConfig) {
if let Some(role_like) = self.role_like_mut() {
role_like.set_temperature(value);
}
// Global default mutation goes through a separate path if needed
}
```
### Step 8: The Caller Migration Epic (absorbed scope from Steps 6.5 and 7)
**Important:** the original plan described Step 8 as "rewrite main.rs and repl/mod.rs entry points." During implementation, Steps 6.5 and 7 deliberately deferred their semantic rewrites to Step 8 so the bridge pattern (add new methods alongside old, don't migrate callers yet) stayed consistent. As a result, Step 8 now absorbs:
- **Original Step 8 scope:** entry point rewrite (`main.rs`, `repl/mod.rs`)
- **From Step 6.5 deferrals:** `McpFactory::acquire()` implementation, scope transition rewrites (`use_role`, `use_session`, `use_agent`, `exit_agent`), RAG lifecycle via `RagCache` (`use_rag`, `edit_rag_docs`, `rebuild_rag`), session compression/autoname, `apply_prelude`, sub-agent spawning
- **From Step 7 deferrals:** `Model::retrieve_model` client module refactor, `retrieve_role`, `set_model`, `repl_complete`, `setup_model`, `update` dispatcher, `set_rag_reranker_model`, `set_rag_top_k`, `use_role_safely`/`use_session_safely` elimination, `use_prompt`, `edit_role`
This is a large amount of work. Step 8 is split into **8 sub-steps (8a–8h)** for clarity and reviewability. Each sub-step keeps the build green and can be completed as a standalone unit.
**Dependency graph between sub-steps:**
```
┌─────────┐
│ 8a │ client module refactor
│ (Model: │ (Model::retrieve_model, list_models,
│ &App- │ list_all_models! → &AppConfig)
│ Config) │
└────┬────┘
│ unblocks
┌─────────┐
│ 8b │ remaining Step 7 deferrals
│ (Step 7 │ (retrieve_role, set_model, setup_model,
│ debt) │ use_prompt, edit_role, update, etc.)
└────┬────┘
┌──────────┐ │
│ 8c │ │
│(McpFac- │──┐ │
│ tory:: │ │ unblocks │
│acquire())│ │ │
└──────────┘ │ │
▼ │
┌─────────┐ │
│ 8d │ │
│ (scope │──┐ │
│ trans.) │ │ unblocks │
└─────────┘ │ │
▼ │
┌─────────┐ │
│ 8e │ │
│ (RAG + │──┐ │
│session) │ │ │
└─────────┘ │ │
▼ ▼
┌───────────────────┐
│ 8f + 8g │
│ caller migration │
│ (main.rs, REPL) │
└─────────┬─────────┘
┌──────────┐
│ 8h │ remaining callsites
│ (sweep) │ (priority-ordered)
└──────────┘
```
---
#### Step 8a: Client module refactor — `Model::retrieve_model` takes `&AppConfig`
Target: remove the `&Config` dependency from the LLM client infrastructure so Step 8b's mixed-method migrations (`retrieve_role`, `set_model`, `repl_complete`, `setup_model`) can proceed.
**Files touched:**
- `src/client/model.rs` — `Model::retrieve_model(config: &Config, ...)` → `Model::retrieve_model(config: &AppConfig, ...)`
- `src/client/macros.rs` — `list_all_models!` macro takes `&AppConfig` instead of `&Config`
- `src/client/*.rs` — `list_models`, helper functions updated to take `&AppConfig`
- Any callsite in `src/config/`, `src/main.rs`, `src/repl/`, etc. that calls these client functions — updated to pass `&config.app_config_snapshot()` or equivalent during the bridge window
**Bridge strategy:** add a helper `Config::app_config_snapshot(&self) -> AppConfig` that clones the serialized fields into an `AppConfig`. Callsites that currently pass `&*config.read()` pass `&config.read().app_config_snapshot()` instead. This is slightly wasteful (clones ~40 fields per call) but keeps the bridge window working without a mass caller rewrite. Step 8f/8g will eliminate the clones when callers hold `Arc<AppState>` directly.
**Verification:** full build green. All existing tests pass. CLI/REPL manual smoke test: `loki --model openai:gpt-4o "hello"` still works.
**Risk:** Low. Mechanical refactor. The bridge helper absorbs the signature change cost.
---
#### Step 8b: Finish Step 7's deferred mixed-method migrations
With Step 8a done, the methods that transitively depended on `Model::retrieve_model(&Config)` can now migrate to `RequestContext` with `&AppConfig` parameters.
**Methods migrated to `RequestContext`:**
- `retrieve_role(&self, app: &AppConfig, name: &str) -> Result<Role>`
- `set_model_on_role_like(&mut self, app: &AppConfig, model_id: &str) -> Result<bool>` (paired with `AppConfig::set_model_default`)
- `repl_complete(&self, app: &AppConfig, cmd: &str, args: &[&str]) -> Vec<(String, Option<String>)>`
- `setup_model(&mut self, app: &AppConfig) -> Result<()>` — actually, `setup_model` writes to `self.model_id` (serialized) AND `self.model` (runtime). The split: `AppConfig::ensure_default_model_id()` picks the first available model and updates `self.model_id`; `RequestContext::reload_current_model(&AppConfig)` refreshes `ctx.model` from the app config's id.
- `use_prompt(&mut self, app: &AppConfig, prompt: &str) -> Result<()>` — trivial wrapper around `extract_role` (already done) + `use_role_obj` (Step 6)
- `edit_role(&mut self, app: &AppConfig, abort_signal: AbortSignal) -> Result<()>` — calls `app.editor()`, `upsert_role`, `use_role` (still deferred to 8d)
**RAG-related deferrals:**
- `set_rag_reranker_model` and `set_rag_top_k` get split: the runtime branch (update the active `Rag`) becomes a `RequestContext` method taking `Arc<Rag>` mutation, and the global branch becomes `AppConfig::set_rag_reranker_model_default` / `AppConfig::set_rag_top_k_default`.
**`update` dispatcher:** Once all the individual `set_*` methods exist on both types, `update` migrates to `RequestContext::update(&mut self, app: &mut AppConfig, data: &str) -> Result<()>`. The dispatcher's body becomes a match that calls the appropriate split pair for each key.
**`use_role_safely` / `use_session_safely`:** Still not eliminated in 8b — they're wrappers around the still-`Config`-based `use_role` and `use_session`. Eliminated in 8f/8g when callers switch to `&mut RequestContext`.
**Verification:** full build green. All tests pass. Smoke test: `.set temperature 0.7`, `.set enabled_tools fs`, `.model openai:gpt-4o` all work in REPL.
**Risk:** Low. Same bridge pattern, now unblocked by 8a.
---
#### Step 8c: Extract `McpFactory::acquire()` from `McpRegistry::init_server`
Target: give `McpFactory` a working `acquire()` method so Step 8d can build real `ToolScope` instances.
**Files touched:**
- `src/mcp/mod.rs` — extract the MCP subprocess spawn + rmcp handshake logic (currently inside `McpRegistry::init_server`, ~60 lines) into a standalone function:
```rust
pub(crate) async fn spawn_mcp_server(
spec: &McpServer,
log_path: Option<&Path>,
abort_signal: &AbortSignal,
) -> Result<ConnectedServer>
```
`McpRegistry::init_server` then calls this helper and does its own bookkeeping. Backward-compatible for bridge callers.
- `src/config/mcp_factory.rs` — implement `McpFactory::acquire(spec: &McpServer, log_path, abort_signal) -> Result<Arc<ConnectedServer>>`:
1. Build an `McpServerKey` from the spec
2. Try `self.try_get_active(&key)` → share if upgraded
3. Otherwise call `spawn_mcp_server(spec, ...).await` → wrap in `Arc` → `self.insert_active(key, &arc)` → return
- Write a couple of integration tests that exercise the factory's sharing behavior with a mock server spec (or document why a real integration test needs Phase 5's pooling work)
**What this step does NOT do:** no caller migration, no `ToolScope` construction, no changes to `McpRegistry::reinit`. Step 8d does those.
**Verification:** new unit tests pass. Existing tests pass. `McpRegistry` still works for all current callers.
**Risk:** Medium. The spawn logic is intricate (child process + stdio handshake + error recovery). Extracting without a behavior change requires careful diff review.
---
#### Step 8d: Scope transition rewrites — `use_role`, `use_session`, `use_agent`, `exit_agent`
Target: build real `ToolScope` instances via `McpFactory` when scopes change. This is where Step 6.5's scaffolding stops being scaffolding.
**New methods on `RequestContext`:**
- `use_role(&mut self, app: &AppConfig, name: &str, abort_signal: AbortSignal) -> Result<()>`:
1. Call `self.retrieve_role(app, name)?` (from 8b)
2. Resolve the role's `enabled_mcp_servers` list
3. Build a fresh `ToolScope` by calling `app.mcp_factory.acquire(spec, ...)` for each required server
4. Populate `ctx.tool_scope.functions` with the role's effective function list via `select_functions(app, &role)`
5. Swap `ctx.tool_scope` atomically
6. Call `self.use_role_obj(role)` (from Step 6)
- `use_session(&mut self, app: &AppConfig, session_name: Option<&str>, abort_signal) -> Result<()>` — same pattern, with session-specific handling for `agent_session_variables`
- `use_agent(&mut self, app: &AppConfig, agent_name: &str, session_name: Option<&str>, abort_signal) -> Result<()>` — builds an `AgentRuntime` (Step 6.5 scaffolding), populates `ctx.agent_runtime`, activates the optional inner session
- `exit_agent(&mut self, app: &AppConfig) -> Result<()>` — drops `ctx.agent_runtime`, rebuilds `ctx.tool_scope` from the now-topmost RoleLike (role/session/global), cancels the supervisor, clears RAG if it came from the agent
**Key invariant: parent scope restoration on `exit_agent`.** Today's `Config::exit_agent` leaves the `McpRegistry` in whatever state the agent left it. The new `exit_agent` explicitly rebuilds `ctx.tool_scope` from the current role/session/global enabled-server lists so the user sees the expected state after exiting an agent. This is a semantic improvement over today's behavior (which technically has a latent bug that nobody notices because the next scope activation fixes it).
**What this step does NOT do:** no caller migration. `Config::use_role`, `Config::use_session`, etc. are still on `Config` and still work for existing callers. The `_safely` wrappers are still around.
**Verification:** new `RequestContext::use_role` etc. have unit tests. Full build green. Existing tests pass. No runtime behavior change because nothing calls the new methods yet.
**Risk:** Medium–high. This is the first time `McpFactory::acquire()` is exercised outside unit tests. Specifically watch:
- **`McpFactory` mutex contention** — hold the `active` lock only during HashMap mutation, never across subprocess spawn or `await`
- **Parent scope restoration correctness** — write a targeted test that activates an agent with `[github]`, exits, activates a role with `[jira]`, and verifies the tool scope has only `jira` (not `github` leftover)
---
#### Step 8e: RAG lifecycle + session compression + `apply_prelude`
Target: migrate the Category C deferrals from Step 6 (session/RAG lifecycle methods that currently take `&GlobalConfig`).
**New methods on `RequestContext`:**
- `use_rag(&mut self, app: &AppConfig, name: Option<&str>, abort_signal) -> Result<()>` — routes through `app.rag_cache.load(RagKey::Named(name))`
- `edit_rag_docs(&mut self, app: &AppConfig, abort_signal) -> Result<()>` — determines the `RagKey` (Agent or Named) from `ctx.agent_runtime` / `ctx.rag` origin, calls `app.rag_cache.invalidate(&key)`, reloads
- `rebuild_rag(&mut self, app: &AppConfig, abort_signal) -> Result<()>` — same pattern as `edit_rag_docs`
- `compress_session(&mut self, app: &AppConfig) -> Result<()>` — reads `app.summarization_prompt`, `app.summary_context_prompt`, mutates `ctx.session`. Async, does an LLM call via an existing `Input::from_str` pattern.
- `maybe_compress_session(&mut self, app: &AppConfig) -> bool` — checks `ctx.session.needs_compression(app.compression_threshold)`, triggers compression if so. Returns whether compression was triggered; caller decides whether to spawn a background task (the task spawning moves to the caller's responsibility, not the method's).
- `autoname_session(&mut self, app: &AppConfig) -> Result<()>` — same pattern, uses `CREATE_TITLE_ROLE` and `Input::from_str`
- `maybe_autoname_session(&mut self, app: &AppConfig) -> bool` — same return-bool pattern
- `apply_prelude(&mut self, app: &AppConfig, abort_signal) -> Result<()>` — parses `app.repl_prelude` / `app.cmd_prelude`, calls the new `self.use_role()` / `self.use_session()` from 8d
**The `GlobalConfig`-taking static methods go away.** Today's code uses the pattern `Config::maybe_compress_session(config: GlobalConfig)` which takes an owned `Arc<RwLock<Config>>` and spawns a background task. After 8e, the new `RequestContext::maybe_compress_session` returns a bool; callers that want async compression spawn the task themselves with their `RequestContext` context. This is simpler and more explicit.
**Verification:** new methods have unit tests where feasible. Full build green. `compress_session` and `autoname_session` are tricky to unit-test because they do LLM calls; mock the LLM or skip the full path in tests.
**Risk:** Medium. The session compression flow is the most behavior-sensitive — getting the semantics wrong here results in lost session history. Write a targeted integration test that feeds 10+ user messages into a session, triggers compression, and verifies the session's summary is preserved.
---
#### Step 8f: Entry point rewrite — `main.rs`
Target: rewrite `main.rs` to construct `AppState` + `RequestContext` explicitly instead of using `GlobalConfig`.
**Specific changes:**
- `Config::init()` → `AppState::init()` which:
1. Loads `config.yaml` into `AppConfig`
2. Applies environment variable overrides (calls `AppConfig::load_envs` from Step 4)
3. Calls `AppConfig::setup_document_loaders` / `AppConfig::setup_user_agent` (Step 4)
4. Constructs the `Vault`, `McpFactory`, `RagCache`
5. Returns `Arc<AppState>`
- `main::run()` constructs a `RequestContext` from the `AppState` and threads it through to subcommands
- `main::start_directive(ctx: &mut RequestContext, ...)` — signature change
- `main::create_input(ctx: &RequestContext, ...)` — signature change
- `main::shell_execute(ctx: &mut RequestContext, ...)` — signature change
- All 18 `main.rs` callsites updated
**`load_functions` and `load_mcp_servers`:** These are initialization-time methods that populate `ctx.tool_scope.functions` and `ctx.tool_scope.mcp_runtime`. They move from `Config` to a new `RequestContext::bootstrap_tools(&mut self, app: &AppConfig, abort_signal) -> Result<()>` that:
1. Initializes `Functions` via `Functions::init(visible_tools)` (existing code)
2. Resolves the initial enabled-MCP-server list from `app.enabled_mcp_servers`
3. Calls `app.mcp_factory.acquire()` for each
4. Assigns the result to `ctx.tool_scope`
This replaces the `Config::load_functions` + `Config::load_mcp_servers` call sequence in today's `main.rs`.
**Verification:** CLI smoke tests from the original plan's Step 8 verification checklist. Specifically:
- `loki "hello"` — plain prompt
- `loki --role explain "what is TCP"` — role activation
- `loki --session my-project "..."` — session
- `loki --agent sisyphus "..."` — agent activation
- `loki --info` — sysinfo output
Each should produce output matching the pre-Step-8 behavior exactly.
**Risk:** High. `main.rs` is the primary entry point; any regression here is user-visible. Write smoke tests that compare CLI output byte-for-byte with a recorded baseline.
---
#### Step 8g: REPL rewrite — `repl/mod.rs`
Target: rewrite `repl/mod.rs` to use `&mut RequestContext` instead of `GlobalConfig`.
**Specific changes:**
- `Repl` struct: `config: GlobalConfig` → `ctx: RequestContext` (long-lived, mutable across turns)
- `run_repl_command(ctx: &mut RequestContext, ...)` — signature change
- `ask(ctx: &mut RequestContext, ...)` — signature change
- Every dot-command handler updated. Dot-commands that take the `GlobalConfig` pattern (like the `_safely` wrappers) are **eliminated** — they just call `ctx.use_role(...)` directly.
- All 39 command handlers migrated
- All 12 `repl/mod.rs` internal callsites updated
**`use_role_safely` / `use_session_safely` elimination:** these wrappers exist only because `Config::use_role` is `&mut self` and the REPL holds `Arc<RwLock<Config>>`. After Step 8g, the REPL holds `RequestContext` directly (no lock), so the wrappers are no longer needed and get deleted.
**Verification:** REPL smoke tests matching the pre-Step-8 behavior. Specifically:
- Start REPL, issue a prompt → should see same output
- `.role explain`, `.session my-session`, `.agent sisyphus`, `.exit agent` — should all work identically
- `.set temperature 0.7` then `.info` — should show updated temperature
- Ctrl-C during an LLM call — should cleanly abort
- `.macro run-tests` — should execute without errors
**Risk:** High. Same reason as 8f — this is a user-visible entry point. Test every dot-command.
---
#### Step 8h: Remaining callsite sweep
Target: migrate the remaining modules in priority order (lowest callsite count first, keeping the build green after each module):
| Priority | Module | Callsites | Notes |
|---|---|---|---|
| 1 | `render/mod.rs` | 1 | `render_stream` just reads config — trivial |
| 2 | `repl/completer.rs` | 1 | Just reads for completions |
| 3 | `repl/prompt.rs` | 1 | Just reads for prompt rendering |
| 4 | `function/user_interaction.rs` | 1 | Just reads for user prompts |
| 5 | `function/mod.rs` | 2 | `eval_tool_calls` reads config |
| 6 | `config/macros.rs` | 3 | `macro_execute` reads and writes |
| 7 | `function/todo.rs` | 4 | Todo handlers read/write agent state |
| 8 | `config/input.rs` | 6 | Input creation — reads config |
| 9 | `rag/mod.rs` | 6 | RAG init/search |
| 10 | `function/supervisor.rs` | 8 | Sub-agent spawning — complex |
| 11 | `config/agent.rs` | 12 | Agent init — complex, many mixed concerns |
**Sub-agent spawning** (`function/supervisor.rs`) is the most complex item in the sweep. Each child agent gets a fresh `RequestContext` forked from the parent's `Arc<AppState>`:
- Own `ToolScope` built by calling `app.mcp_factory.acquire()` for the child's `mcp_servers` list
- Own `AgentRuntime` with fresh supervisor, fresh inbox, `current_depth = parent.depth + 1`
- `parent_supervisor = Some(parent.agent_runtime.supervisor.clone())` — weakly linked to parent for messaging
- `escalation_queue = parent.agent_runtime.escalation_queue.clone()` — `Arc`-shared from root
- RAG served via `app.rag_cache.load(RagKey::Agent(child_name))` — shared with any sibling of the same type
`config/agent.rs` — `Agent::init` is currently tightly coupled to `Config`. It needs to be rewritten to take `&AppState` + `&mut RequestContext`. Some of its complexity (MCP server startup, RAG loading) moves into `RequestContext::use_agent` from Step 8d; `Agent::init` becomes just the spec-loading portion.
**Verification:** after each module migrates, run full `cargo check` + `cargo test`. After all modules migrate, run the full smoke test suite from 8f and 8g.
**Risk:** Medium. The sub-agent spawning and `config/agent.rs` work is complex, but the bridge pattern means we can take each module independently.
---
### Step 9: Remove the Bridge
After Step 8h, no caller references `GlobalConfig` or calls `Config`-based methods that have `RequestContext` / `AppConfig` equivalents. Step 9 is the cleanup:
1. Delete `src/config/bridge.rs` (the conversion methods added in Step 1)
2. Delete the `#[allow(dead_code)]` attributes from `AppConfig` impl blocks (now that callers use them)
3. Delete the `#[allow(dead_code)]` attributes from `RequestContext` impl blocks
4. Remove the flat runtime fields from `RequestContext` that have been superseded by `tool_scope` and `agent_runtime` (i.e., delete `functions`, `tool_call_tracker`, `supervisor`, `parent_supervisor`, `self_agent_id`, `current_depth`, `inbox`, `root_escalation_queue` from `RequestContext`; they now live inside `ToolScope` and `AgentRuntime`)
5. Run the full test suite
This step is mostly mechanical deletion. The one careful part is removing the flat fields — make sure no remaining code path still reads them. `cargo check` will catch any stragglers.
### Step 10: Remove the old Config struct and GlobalConfig type alias
Once Step 9 is done and the tree is clean:
1. Delete all Step 2-7 method definitions on `Config` (the ones that got duplicated to `paths` / `AppConfig` / `RequestContext`)
2. Delete the `#[serde(skip)]` runtime fields on `Config` (they're all on `RequestContext` now)
3. At this point `Config` should be nearly empty — possibly just the serialized fields
4. Delete `Config` entirely (or rename it to `RawConfig` if it's still useful as a serde DTO, though `AppConfig` already plays that role)
5. Delete the `pub type GlobalConfig = Arc<RwLock<Config>>` type alias
6. Run `cargo check`, `cargo test`, `cargo clippy` — all clean
7. Run the full manual smoke test suite one more time
Phase 1 complete.
---
## Callsite Migration Summary
| Module | Functions to Migrate | Handled In |
|---|---|---|
| `config/mod.rs` | 120 methods (30 static, 10 global-read, 8 global-write, 35 request-read/write, 17 mixed) | Steps 2-7 (mechanical duplication), Step 10 (deletion) |
| `client/` macros and `model.rs` | `Model::retrieve_model`, `list_all_models!`, `list_models` | Step 8a |
| `main.rs` | `run`, `start_directive`, `shell_execute`, `create_input`, `apply_prelude_safely` | Step 8f |
| `repl/mod.rs` | `run_repl_command`, `ask`, plus 39 command handlers | Step 8g |
| `config/agent.rs` | `Agent::init`, agent lifecycle methods | Step 8h (partial) + Step 8d (scope transitions) |
| `function/supervisor.rs` | Sub-agent spawning, task management | Step 8h |
| `config/input.rs` | `Input::from_str`, `from_files`, `from_files_with_spinner` | Step 8h |
| `rag/mod.rs` | RAG init, load, search | Step 8e (lifecycle) + Step 8h (remaining) |
| `mcp/mod.rs` | `McpRegistry::init_server` spawn logic extraction | Step 8c |
| `function/mod.rs` | `eval_tool_calls` | Step 8h |
| `function/todo.rs` | Todo handlers | Step 8h |
| `function/user_interaction.rs` | User prompt handler | Step 8h |
| `render/mod.rs` | `render_stream` | Step 8h |
| `repl/completer.rs` | Completion logic | Step 8h |
| `repl/prompt.rs` | Prompt rendering | Step 8h |
| `config/macros.rs` | `macro_execute` | Step 8h |
### Step 8 effort estimates
| Sub-step | Effort | Risk |
|---|---|---|
| 8a — client module refactor | 0.5–1 day | Low |
| 8b — Step 7 deferrals | 0.5–1 day | Low |
| 8c — `McpFactory::acquire()` extraction | 1 day | Medium |
| 8d — scope transition rewrites | 1–2 days | Medium–high |
| 8e — RAG + session lifecycle migration | 1–2 days | Medium |
| 8f — `main.rs` rewrite | 1 day | High |
| 8g — `repl/mod.rs` rewrite | 1–2 days | High |
| 8h — remaining callsite sweep | 1–2 days | Medium |
**Total estimated Step 8 effort: ~7–12 days.** The "total Phase 1 effort" from the plan header needs to be updated once Step 8 finishes.
---
## Verification Checkpoints
After each step, verify:
1. **`cargo check`** — no compilation errors
2. **`cargo test`** — all existing tests pass
3. **Manual smoke test** — CLI one-shot prompt works, REPL starts and processes a prompt
4. **No behavior changes** — identical output for identical inputs
## Risk Factors
### Phase-wide risks
| Risk | Severity | Mitigation |
|---|---|---|
| Bridge-window duplication drift — bug fixed in `Config::X` but not `RequestContext::X` or vice versa | Medium | Keep the bridge window as short as possible. Step 8 should finish within 2 weeks of Step 7 ideally. Any bug fix during Steps 8a-8h must be applied to both places if the method is still duplicated. |
| Sub-agent spawning semantics change subtly | High | Cross-agent MCP trampling is a latent bug today that Step 8d/8h fixes. Write targeted integration tests for the sub-agent spawning path before and after Step 8h to verify semantics match (or improve intentionally). |
| Long-running Phase 1 blocking Phase 2+ work | Medium | Phase 2 (Engine + Emitter) can start prep work in parallel with Step 8h — the final callsite sweep doesn't block the new Engine design. |
### Step 8 sub-step risks
| Sub-step | Risk | Severity | Mitigation |
|---|---|---|---|
| 8a | Client macro refactor breaks LLM provider integration | Low | All LLM providers use the same `Model::retrieve_model` entry point. Test with at least 2 providers (openai + another) before declaring 8a done. |
| 8b | `update` dispatcher has ~15 cases — easy to miss one | Low | Enumerate every `.set` key handled today; check each is in the new dispatcher. |
| 8c | Extracting `spawn_mcp_server` introduces behavior differences (e.g., error handling, abort signal propagation) | Medium | Do a line-by-line diff review. Write a test that kills an in-flight spawn via the abort signal. |
| 8d | `McpFactory` mutex contention under parallel sub-agent spawning | Medium | Hold the `active` lock only during HashMap operations, never across `await`. Benchmark with 4 concurrent scope transitions before declaring 8d done. |
| 8d | Parent scope restoration on `exit_agent` differs from today's implicit behavior | High | Write a targeted test: activate global→role(jira)→agent(github,slack)→exit_agent. Verify scope is role(jira), not agent's (github,slack) or stale. |
| 8e | Session compression loses messages when triggered mid-request | High | Integration test: feed 10+ user messages, compress, verify summary preserves all user intent. Also test concurrent compression (REPL background task + foreground turn). |
| 8e | `rebuild_rag` / `edit_rag_docs` pick the wrong `RagKey` variant | Medium | Test both paths explicitly: agent-scoped rebuild and standalone rebuild. Assert the right cache entry is invalidated. |
| 8f | CLI output bytes differ from pre-refactor baseline | High | Record baseline CLI outputs for 10 common invocations. After 8f, diff byte-for-byte. Any difference is a regression unless explicitly justified. |
| 8g | REPL dot-command behavior regresses silently | High | Test every dot-command end-to-end: `.role`, `.session`, `.agent`, `.rag`, `.set`, `.info`, `.exit *`, `.compress session`, etc. |
| 8h | Sub-agent spawning in `function/supervisor.rs` shares state incorrectly between parent and child | High | Integration test: parent activates agent A (github), spawns child B (jira), verify B's tool scope has only jira and parent's has only github. Each parallel child has independent tool scopes. |
| 8h | `Agent::init` refactor drops initialization logic | Medium | `Agent::init` is ~100 lines today. Diff the old vs new init paths line-by-line. |
### Legacy risks (resolved during the refactor)
These risks from the original plan have been addressed by the step-by-step scaffolding approach:
| Original risk | How it was resolved |
|---|---|
| `use_role_safely` / `use_session_safely` use take/replace pattern | Eliminated entirely in Step 8g — REPL holds `&mut RequestContext` directly, no lock take/replace needed |
| `Agent::init` creates MCP servers, functions, RAG on Config | Resolved in Step 8d + Step 8h — MCP via `McpFactory::acquire()`, RAG via `RagCache`, functions via `RequestContext::bootstrap_tools` |
| Sub-agent spawning clones Config | Resolved in Step 8h — children get fresh `RequestContext` forked from `Arc<AppState>` |
| Input holds `GlobalConfig` clone | Resolved in Step 8f — `Input` now holds references to the context it needs from `RequestContext`, not an owned clone |
| Concurrent REPL operations spawn tasks with `GlobalConfig` clone | Resolved in Step 8e — task spawning moves to the caller's responsibility with explicit `RequestContext` context |
## What This Phase Does NOT Do
- No REST API server code
- No Engine::run() unification (that's Phase 2)
- No Emitter trait (that's Phase 2)
- No SessionStore abstraction (that's Phase 3)
- No UUID-based sessions (that's Phase 3)
- No agent isolation refactoring (that's Phase 5)
- No new dependencies added
The sole goal is: **split Config into immutable global + mutable per-request, with identical external behavior.**
+727
View File
@@ -0,0 +1,727 @@
# Phase 2 Implementation Plan: Engine + Emitter
## Overview
Phase 1 splits `Config` into `AppState` + `RequestContext`. Phase 2 takes the unified state and introduces the **Engine** — a single core function that replaces CLI's `start_directive()` and REPL's `ask()` — plus an **Emitter trait** that abstracts output away from direct stdout writes. After this phase, CLI and REPL both call `Engine::run()` with different `Emitter` implementations and behave identically to today. The API server in Phase 4 will plug in without touching core logic.
**Estimated effort:** ~1 week
**Risk:** Low-medium. The work is refactoring existing well-tested code paths into a shared shape. Most of the risk is in preserving exact terminal rendering behavior.
**Depends on:** Phase 1 Steps 0–10 complete (`GlobalConfig` eliminated, `RequestContext` wired through all entry points).
---
## Why Phase 2 Exists
Today's CLI and REPL have two near-identical pipelines that diverge in five specific places. The divergences are accidents of history, not intentional design:
1. **Streaming flag handling.** `start_directive` forces non-streaming when extracting code; `ask` never extracts code.
2. **Auto-continuation loop.** `ask` has complex logic for `auto_continue_count`, todo inspection, and continuation prompt injection. `start_directive` has none.
3. **Session compression.** `ask` triggers `maybe_compress_session` and awaits completion; `start_directive` never compresses.
4. **Session autoname.** `ask` calls `maybe_autoname_session` after each turn; `start_directive` doesn't.
5. **Cleanup on exit.** `start_directive` calls `exit_session()` at the end; `ask` lets the REPL loop handle it.
Four of these five divergences are bugs waiting to happen — they mean agents behave differently in CLI vs REPL mode, sessions don't get compressed in CLI even when they should, and auto-continuation is silently unavailable from the CLI. Phase 2 collapses both pipelines into one `Engine::run()` that handles all five behaviors uniformly, with per-request flags to control what's active (e.g., `auto_continue: bool` on `RunRequest`).
The Emitter trait exists to decouple the rendering pipeline from its destination. Today, streaming output is hardcoded to write to the terminal via `crossterm`. An `Emitter` implementation can also feed an axum SSE stream, collect events for a JSON response, or capture everything for a test. The Engine sends semantic events; Emitters decide how to present them.
---
## The Architecture After Phase 2
```
┌─────────┐ ┌─────────┐ ┌─────────┐
│ CLI │ │ REPL │ │ API │ (Phase 4)
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────┐
│ Engine::run(ctx, req, emitter) │
│ ┌────────────────────────────────────────────┐ │
│ │ 1. Apply CoreCommand (if any) │ │
│ │ 2. Build Input from req │ │
│ │ 3. apply_prelude (first turn only) │ │
│ │ 4. before_chat_completion │ │
│ │ 5. Stream or buffered LLM call │ │
│ │ ├─ emit Started │ │
│ │ ├─ emit AssistantDelta (per chunk) │ │
│ │ ├─ emit ToolCall │ │
│ │ ├─ execute tool │ │
│ │ ├─ emit ToolResult │ │
│ │ └─ loop on tool results │ │
│ │ 6. after_chat_completion │ │
│ │ 7. maybe_compress_session │ │
│ │ 8. maybe_autoname_session │ │
│ │ 9. Auto-continuation (if applicable) │ │
│ │ 10. emit Finished │ │
│ └────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
TerminalEmitter TerminalEmitter JsonEmitter / SseEmitter
```
---
## Core Types
### `Engine`
```rust
pub struct Engine {
pub app: Arc<AppState>,
}
impl Engine {
pub fn new(app: Arc<AppState>) -> Self { Self { app } }
pub async fn run(
&self,
ctx: &mut RequestContext,
req: RunRequest,
emitter: &dyn Emitter,
) -> Result<RunOutcome, CoreError>;
}
```
`Engine` is intentionally a thin wrapper around `Arc<AppState>`. All per-turn state lives on `RequestContext`, so the engine itself has no per-call fields. This makes it cheap to clone and makes `Engine::run` trivially testable.
### `RunRequest`
```rust
pub struct RunRequest {
pub input: Option<UserInput>,
pub command: Option<CoreCommand>,
pub options: RunOptions,
}
pub struct UserInput {
pub text: String,
pub files: Vec<FileInput>,
pub media: Vec<MediaInput>,
pub continuation: Option<ContinuationKind>,
}
pub enum ContinuationKind {
Continue,
Regenerate,
}
pub struct RunOptions {
pub stream: Option<bool>,
pub extract_code: bool,
pub auto_continue: bool,
pub compress_session: bool,
pub autoname_session: bool,
pub apply_prelude: bool,
pub with_embeddings: bool,
pub cancel: CancellationToken,
}
impl RunOptions {
pub fn cli() -> Self { /* today's start_directive defaults */ }
pub fn repl_turn() -> Self { /* today's ask defaults */ }
pub fn api_oneshot() -> Self { /* API one-shot defaults */ }
pub fn api_session() -> Self { /* API session defaults */ }
}
```
Two things to notice:
1. **`input` is `Option`.** A `RunRequest` can carry just a `command` (e.g., `.role explain`) with no user text, just an input (a plain prompt), or both (the `.role <name> <text>` form that activates a role and immediately sends a prompt through it). The engine handles all three shapes with one code path.
2. **`RunOptions` is the knob panel that replaces the five divergences.** CLI today has `auto_continue: false, compress_session: false, autoname_session: false`; REPL has all three `true`. Phase 2 exposes these as explicit options with factory constructors for each frontend's conventional defaults. This also means you can now run a CLI one-shot with auto-continuation by constructing `RunOptions::cli()` and flipping `auto_continue = true` — a capability that doesn't exist today.
### `CoreCommand`
```rust
pub enum CoreCommand {
// State setters
SetModel(String),
UsePrompt(String),
UseRole { name: String, trailing_text: Option<String> },
UseSession(Option<String>),
UseAgent { name: String, session: Option<String>, variables: Vec<(String, String)> },
UseRag(Option<String>),
// Exit commands
ExitRole,
ExitSession,
ExitRag,
ExitAgent,
// State queries
Info(InfoScope),
RagSources,
// Config mutation
Set { key: String, value: String },
// Session actions
CompressSession,
EmptySession,
SaveSession { name: Option<String> },
EditSession,
// Role actions
SaveRole { name: Option<String> },
EditRole,
// RAG actions
EditRagDocs,
RebuildRag,
// Agent actions
EditAgentConfig,
ClearTodo,
StarterList,
StarterRun(usize),
// File input shortcut
IncludeFiles { paths: Vec<String>, trailing_text: Option<String> },
// Macro execution
Macro { name: String, args: Vec<String> },
// Vault
VaultAdd(String),
VaultGet(String),
VaultUpdate(String),
VaultDelete(String),
VaultList,
// Miscellaneous
EditConfig,
Authenticate,
Delete(DeleteKind),
Copy,
Help,
}
pub enum InfoScope {
System,
Role,
Session,
Rag,
Agent,
}
pub enum DeleteKind {
Role(String),
Session(String),
Rag(String),
Macro(String),
AgentData(String),
}
```
This enum captures all 37 dot-commands identified in the explore. Three categories deserve special attention:
- **LLM-triggering commands** (`UsePrompt`, `UseRole` with trailing_text, `IncludeFiles` with trailing_text, `StarterRun`, `Macro` that contains LLM calls, and the continuation variants `Continue`/`Regenerate` expressed via `UserInput.continuation`) — these don't just mutate state; they produce a full run through the LLM pipeline. The engine treats them as `RunRequest { command: Some(_), input: Some(_), .. }` — command runs first, then input flows through.
- **Asynchronous commands that return immediately** (`EditConfig`, `EditRole`, `EditRagDocs`, `EditAgentConfig`, most `Vault*`, `Delete`) — these are side-effecting but don't produce an LLM interaction. The engine handles them, emits a `Result` event, and returns without invoking the LLM path.
- **Context-dependent commands** (`ClearTodo`, `StarterList`, `StarterRun`, `EditAgentConfig`, etc.) — these require a specific scope (e.g., active agent). The engine validates the precondition before executing and returns a `CoreError::InvalidState { expected: "active agent" }` if the precondition fails.
### `Emitter` trait and `Event` enum
```rust
#[async_trait]
pub trait Emitter: Send + Sync {
async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}
pub enum Event<'a> {
// Lifecycle
Started { request_id: Uuid, session_id: Option<SessionId>, agent: Option<&'a str> },
Finished { outcome: &'a RunOutcome },
// Assistant output
AssistantDelta(&'a str),
AssistantMessageEnd { full_text: &'a str },
// Tool calls
ToolCall { id: &'a str, name: &'a str, args: &'a str },
ToolResult { id: &'a str, name: &'a str, result: &'a str, is_error: bool },
// Auto-continuation
AutoContinueTriggered { count: usize, max: usize, remaining_todos: usize },
// Session lifecycle signals
SessionCompressing,
SessionCompressed { tokens_saved: Option<usize> },
SessionAutonamed(&'a str),
// Informational
Info(&'a str),
Warning(&'a str),
// Errors
Error(&'a CoreError),
}
pub enum EmitError {
ClientDisconnected,
WriteFailed(std::io::Error),
}
```
Three implementations ship in Phase 2; two are stubs, one is real:
- **`TerminalEmitter`** (real) — wraps today's `SseHandler` → `markdown_stream`/`raw_stream` path. This is the bulk of Phase 2's work; see "Terminal rendering details" below.
- **`NullEmitter`** (stub, for tests) — drops all events on the floor.
- **`CollectingEmitter`** (stub, for tests and future JSON API) — appends events to a `Vec<OwnedEvent>` for later inspection.
The `JsonEmitter` and `SseEmitter` implementations land in **Phase 4** when the API server comes online.
### `RunOutcome`
```rust
pub struct RunOutcome {
pub request_id: Uuid,
pub session_id: Option<SessionId>,
pub final_message: Option<String>,
pub tool_call_count: usize,
pub turns: usize,
pub compressed: bool,
pub autonamed: Option<String>,
pub auto_continued: usize,
}
```
`RunOutcome` is what CLI/REPL ignore but the future API returns as JSON. It records everything the caller might want to know about what happened during the run.
### `CoreError`
```rust
pub enum CoreError {
InvalidRequest { msg: String },
InvalidState { expected: String, found: String },
NotFound { what: String, name: String },
Cancelled,
ProviderError { provider: String, msg: String },
ToolError { tool: String, msg: String },
EmitterError(EmitError),
Io(std::io::Error),
Other(anyhow::Error),
}
impl CoreError {
pub fn is_retryable(&self) -> bool { /* ... */ }
pub fn http_status(&self) -> u16 { /* for future API use */ }
pub fn terminal_message(&self) -> String { /* for TerminalEmitter */ }
}
```
---
## Terminal Rendering Details
The `TerminalEmitter` is the most delicate part of Phase 2 because it has to preserve every pixel of today's REPL/CLI behavior. Here's the mental model:
**Today's flow:**
```
LLM client → mpsc::Sender<SseEvent> → SseHandler → render_stream
├─ markdown_stream (if highlight)
└─ raw_stream (else)
```
Both `markdown_stream` and `raw_stream` write directly to stdout via `crossterm`, managing cursor positions, line clears, and incremental markdown parsing themselves.
**Target flow:**
```
LLM client → mpsc::Sender<SseEvent> → SseHandler → TerminalEmitter::emit(Event::AssistantDelta)
├─ (internal) markdown_stream state machine
└─ (internal) raw_stream state machine
```
The `TerminalEmitter` owns a `RefCell<StreamRenderState>` (or `Mutex` if we need `Send`) that wraps the existing `markdown_stream`/`raw_stream` state. Each `emit(AssistantDelta)` call feeds the chunk into this state machine exactly as `SseHandler`'s receive loop does today. The result is that the exact same crossterm calls happen in the exact same order — we've just moved them behind a trait.
**Things that migrate 1:1 into `TerminalEmitter`:**
- Spinner start/stop on first delta
- Cursor positioning for line reprint during code block growth
- Syntax highlighting invocation via `MarkdownRender`
- Color/dim output for tool call banners
- Final newline + cursor reset on `AssistantMessageEnd`
**Things that the engine handles, not the emitter:**
- Tool call *execution* (still lives in the engine loop)
- Session state mutations (engine calls `before_chat_completion` / `after_chat_completion` on `RequestContext`)
- Auto-continuation decisions (engine inspects agent runtime)
- Compression and autoname decisions (engine)
**Things the emitter decides, not the engine:**
- Whether to suppress ToolCall rendering (sub-agents in today's code suppress their own output; TerminalEmitter respects a `verbose: bool` flag)
- How to format errors (TerminalEmitter uses colored stderr; JsonEmitter will use structured JSON)
- Whether to show a spinner at all (disabled for non-TTY output)
**One gotcha:** today's `SseHandler` itself produces the `mpsc` channel that LLM clients push into. In the new model, `SseHandler` becomes an internal helper inside the engine's streaming path that converts `mpsc::Receiver<SseEvent>` into `Emitter::emit(Event::AssistantDelta(...))` calls. No LLM client code changes — they still push into the same channel type. Only the consumer side of the channel changes.
---
## The Engine::run Pipeline
Here's the full pipeline in pseudocode, annotated with which frontend controls each behavior via `RunOptions`:
```rust
impl Engine {
pub async fn run(
&self,
ctx: &mut RequestContext,
req: RunRequest,
emitter: &dyn Emitter,
) -> Result<RunOutcome, CoreError> {
let request_id = Uuid::new_v4();
let mut outcome = RunOutcome::new(request_id);
emitter.emit(Event::Started { request_id, session_id: ctx.session_id(), agent: ctx.agent_name() }).await?;
// 1. Execute command (if any). Commands may be LLM-triggering, mutating, or informational.
if let Some(command) = req.command {
self.dispatch_command(ctx, command, emitter, &req.options).await?;
}
// 2. Early return if there's no user input (pure command)
let Some(user_input) = req.input else {
emitter.emit(Event::Finished { outcome: &outcome }).await?;
return Ok(outcome);
};
// 3. Apply prelude on first turn of a fresh context (CLI/REPL only)
if req.options.apply_prelude && !ctx.prelude_applied {
apply_prelude(ctx, &req.options.cancel).await?;
ctx.prelude_applied = true;
}
// 4. Build Input from user_input + ctx
let input = build_input(ctx, user_input, &req.options).await?;
// 5. Wait for any in-progress compression to finish (REPL-style block)
while ctx.is_compressing_session() {
tokio::time::sleep(Duration::from_millis(100)).await;
}
// 6. Enter the turn loop
self.run_turn(ctx, input, &req.options, emitter, &mut outcome).await?;
// 7. Maybe compress session
if req.options.compress_session && ctx.session_needs_compression() {
emitter.emit(Event::SessionCompressing).await?;
compress_session(ctx).await?;
outcome.compressed = true;
emitter.emit(Event::SessionCompressed { tokens_saved: None }).await?;
}
// 8. Maybe autoname session
if req.options.autoname_session {
if let Some(name) = maybe_autoname_session(ctx).await? {
outcome.autonamed = Some(name.clone());
emitter.emit(Event::SessionAutonamed(&name)).await?;
}
}
// 9. Auto-continuation (agents only)
if req.options.auto_continue {
if let Some(continuation) = self.check_auto_continue(ctx) {
emitter.emit(Event::AutoContinueTriggered { .. }).await?;
outcome.auto_continued += 1;
// Recursive call with continuation prompt
let next_req = RunRequest {
input: Some(UserInput::from_continuation(continuation)),
command: None,
options: req.options.clone(),
};
return Box::pin(self.run(ctx, next_req, emitter)).await;
}
}
emitter.emit(Event::Finished { outcome: &outcome }).await?;
Ok(outcome)
}
async fn run_turn(
&self,
ctx: &mut RequestContext,
mut input: Input,
options: &RunOptions,
emitter: &dyn Emitter,
outcome: &mut RunOutcome,
) -> Result<(), CoreError> {
loop {
outcome.turns += 1;
before_chat_completion(ctx, &input);
let client = input.create_client(ctx)?;
let (output, tool_results) = if should_stream(&input, options) {
stream_chat_completion(ctx, &input, client, emitter, &options.cancel).await?
} else {
buffered_chat_completion(ctx, &input, client, options.extract_code, &options.cancel).await?
};
after_chat_completion(ctx, &input, &output, &tool_results);
outcome.tool_call_count += tool_results.len();
if tool_results.is_empty() {
outcome.final_message = Some(output);
return Ok(());
}
// Emit each tool call and result
for result in &tool_results {
emitter.emit(Event::ToolCall { .. }).await?;
emitter.emit(Event::ToolResult { .. }).await?;
}
// Loop: feed tool results back in
input = input.merge_tool_results(output, tool_results);
}
}
}
```
**Key design decisions in this pipeline:**
1. **Command dispatch happens first.** A `RunRequest` that carries both a command and input runs the command first (mutating `ctx`), then the input flows through the now-updated context. This lets `.role explain "tell me about X"` work as a single atomic operation — the role is activated, then the prompt is sent under the new role.
2. **Tool loop is iterative, not recursive.** Today both `start_directive` and `ask` recursively call themselves after tool results. The new `run_turn` uses a `loop` instead, which is cleaner, avoids stack growth on long tool chains, and makes cancellation handling simpler. Auto-continuation remains recursive because it's a full new turn with a new prompt, not just a tool-result continuation.
3. **Cancellation is checked at every await point.** `options.cancel: CancellationToken` is threaded into every async call. On cancellation, the engine emits `Event::Error(CoreError::Cancelled)` and returns. Today's `AbortSignal` pattern gets wrapped in a `CancellationToken` adapter during the migration.
4. **Session state hooks fire at the same points as today.** `before_chat_completion` and `after_chat_completion` continue to exist on `RequestContext`, called from the same places in the same order. The refactor doesn't change their semantics.
5. **Emitter errors don't abort the run.** If the emitter's output destination disconnects (client closes browser tab), the engine keeps running to completion so session state is correctly persisted, but it stops emitting events. The `EmitError::ClientDisconnected` case is special-cased to swallow subsequent emits. Session save + tool execution still happen.
---
## Migration Strategy
This phase is structured as **extract, unify, rewrite frontends** — similar to Phase 1's facade pattern. The old functions stay in place until the new Engine is proven by tests and manual verification.
### Step 1: Create the core types
Add the new files without wiring them into anything:
- `src/engine/mod.rs` — module root
- `src/engine/engine.rs` — `Engine` struct + `run` method (initially `unimplemented!()`)
- `src/engine/request.rs` — `RunRequest`, `UserInput`, `RunOptions`, `ContinuationKind`, `RunOutcome`
- `src/engine/command.rs` — `CoreCommand` enum + sub-enums
- `src/engine/error.rs` — `CoreError` enum
- `src/engine/emitter.rs` — `Emitter` trait + `Event` enum + `EmitError`
- `src/engine/emitters/mod.rs` — emitter module
- `src/engine/emitters/null.rs` — `NullEmitter` (test stub)
- `src/engine/emitters/collecting.rs` — `CollectingEmitter` (test stub)
- `src/engine/emitters/terminal.rs` — `TerminalEmitter` (initially `unimplemented!()`)
Register `pub mod engine;` in `src/main.rs`. Code compiles but nothing calls it yet.
**Verification:** `cargo check` clean, `cargo test` passes.
### Step 2: Implement `TerminalEmitter` against existing render code
Before wiring the engine, build the `TerminalEmitter` by wrapping today's `SseHandler` + `markdown_stream` + `raw_stream` + `MarkdownRender` + `Spinner` code. Don't change any of those modules — just construct a `TerminalEmitter` that holds the state they need and forwards `emit(Event::AssistantDelta(...))` into them.
```rust
pub struct TerminalEmitter {
render_state: Mutex<StreamRenderState>,
options: TerminalEmitterOptions,
}
pub struct TerminalEmitterOptions {
pub highlight: bool,
pub theme: Option<String>,
pub verbose_tool_calls: bool,
pub show_spinner: bool,
}
impl TerminalEmitter {
pub fn new_from_app(app: &AppState, working_mode: WorkingMode) -> Self { /* ... */ }
}
```
Implement `Emitter` for it, mapping each `Event` variant to the appropriate crossterm operation:
| Event | TerminalEmitter action |
|---|---|
| `Started` | Start spinner |
| `AssistantDelta(chunk)` | Stop spinner (if first), feed chunk into render state |
| `AssistantMessageEnd { full_text }` | Flush render state, emit trailing newline |
| `ToolCall { name, args }` | Print dimmed `⚙ Using <name>` banner if verbose |
| `ToolResult { .. }` | Print dimmed result summary if verbose |
| `AutoContinueTriggered` | Print yellow `⟳ Continuing (N/M, R todos remaining)` to stderr |
| `SessionCompressing` | Print `Compressing session...` to stderr |
| `SessionCompressed` | Print `Session compressed.` to stderr |
| `SessionAutonamed` | Print `Session auto-named: <name>` to stderr |
| `Info(msg)` | Print to stdout |
| `Warning(msg)` | Print yellow to stderr |
| `Error(e)` | Print red to stderr |
| `Finished` | No-op (ensures trailing newline is flushed) |
**Verification:** write integration tests that construct a `TerminalEmitter`, feed it a sequence of events manually, and compare captured stdout/stderr to golden outputs. Use `assert_cmd` or similar to snapshot the rendered output of each event variant.
### Step 3: Implement `Engine::run` without wiring it
Implement `Engine::run` and `Engine::run_turn` following the pseudocode above. Use the existing helper functions (`before_chat_completion`, `after_chat_completion`, `apply_prelude`, `create_client`, `call_chat_completions`, `call_chat_completions_streaming`, `maybe_compress_session`, `maybe_autoname_session`) unchanged, just called through `ctx` instead of `&GlobalConfig`.
**Implementing `dispatch_command`** is the largest sub-task here because it needs to match all 37 `CoreCommand` variants and invoke the right `ctx` methods. Most variants are straightforward one-liners that call a corresponding method on `RequestContext`. A few need special handling:
- `CoreCommand::UseRole { name, trailing_text }` — activate role, then if `trailing_text` is `Some`, the outer `run` will flow through with the trailing text as `UserInput.text`.
- `CoreCommand::IncludeFiles` — reads files, converts to `FileInput` list, attaches to `ctx`'s next input (or fails if no input is provided).
- `CoreCommand::StarterRun(id)` — looks up the starter text on the active agent, fails if no agent.
- `CoreCommand::Macro` — delegates to `macro_execute`, which may itself call `Engine::run` internally for LLM-triggering macros.
**Verification:** write unit tests for `dispatch_command` using `NullEmitter`. Each test activates a command and asserts the expected state mutation on `ctx`. This is ~37 tests, one per variant, and they catch the bulk of regressions early.
Then write a handful of integration tests for `Engine::run` with `CollectingEmitter`, asserting the expected event sequence for:
- Plain prompt, no tools, streaming
- Plain prompt, no tools, non-streaming
- Prompt that triggers 2 tool calls
- Prompt that triggers auto-continuation (mock the LLM response)
- Prompt on a session that crosses the compression threshold
- Command-only request (`.info`)
- Command + prompt request (`.role explain "..."`)
### Step 4: Wire CLI to `Engine::run`
Replace `main.rs::start_directive` with a thin wrapper:
```rust
async fn start_directive(
app: Arc<AppState>,
ctx: &mut RequestContext,
input_text: String,
files: Vec<String>,
code_mode: bool,
) -> Result<()> {
let engine = Engine::new(app.clone());
let emitter = TerminalEmitter::new_from_app(&app, WorkingMode::Cmd);
let req = RunRequest {
input: Some(UserInput::from_text_and_files(input_text, files)),
command: None,
options: {
let mut o = RunOptions::cli();
o.extract_code = code_mode && !*IS_STDOUT_TERMINAL;
o
},
};
match engine.run(ctx, req, &emitter).await {
Ok(_outcome) => Ok(()),
Err(CoreError::Cancelled) => Ok(()),
Err(e) => Err(e.into()),
}
}
```
**Verification:** manual smoke test. Run `loki "hello"`, `loki --code "write a rust hello world"`, `loki --role explain "what is TCP"`. All should produce identical output to before the change.
### Step 5: Wire REPL to `Engine::run`
Replace `repl/mod.rs::ask` with a wrapper that calls the engine. The REPL's outer loop that reads lines and calls `run_repl_command` stays. `run_repl_command` for non-dot-command lines constructs a `RunRequest { input: Some(...), .. }` and calls `Engine::run`. Dot-commands get parsed into `CoreCommand` and called as `RunRequest { command: Some(...), input: None, .. }` (or with input if they carry trailing text).
```rust
// In Repl:
async fn handle_line(&mut self, line: &str) -> Result<()> {
let req = if let Some(rest) = line.strip_prefix('.') {
parse_dot_command_to_run_request(rest, &self.ctx)?
} else {
RunRequest {
input: Some(UserInput::from_text(line.to_string())),
command: None,
options: RunOptions::repl_turn(),
}
};
match self.engine.run(&mut self.ctx, req, &self.emitter).await {
Ok(_) => Ok(()),
Err(CoreError::Cancelled) => Ok(()),
Err(e) => {
self.emitter.emit(Event::Error(&e)).await.ok();
Ok(())
}
}
}
```
**Verification:** manual smoke test of the REPL. Run through a typical session:
1. `loki` → REPL starts
2. `hello` → plain prompt works
3. `.role explain` → role activates
4. `what is TCP` → responds under the role
5. `.session` → session starts
6. Several messages → conversation continues
7. `.info session` → info prints
8. `.compress session` → compression runs
9. `.agent sisyphus` → agent activates with sub-agents
10. `write a hello world in rust` → tool calls + output
11. `.exit agent` → agent exits, previous session still active
12. `.exit` → REPL exits
Every interaction should behave identically to pre-Phase-2. Any visual difference is a bug.
### Step 6: Delete the old `start_directive` and `ask`
Once CLI and REPL both route through `Engine::run` and all tests/smoke tests pass, delete the old function bodies. Remove any now-unused imports. Run `cargo check` and `cargo test`.
**Verification:** full test suite green, no dead code warnings.
### Step 7: Tidy and document
- Add rustdoc comments on `Engine`, `RunRequest`, `RunOptions`, `Emitter`, `Event`, `CoreCommand`, `CoreError`.
- Add an `examples/` subdirectory under `src/engine/` showing how to call the engine with each emitter.
- Update `docs/AGENTS.md` with a note that CLI now supports auto-continuation (since it's no longer a REPL-only feature).
- Update `docs/REST-API-ARCHITECTURE.md` to remove any "in Phase 2" placeholders.
---
## Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| **Terminal rendering regressions** | High | Golden-file snapshot tests for every `Event` variant. Manual smoke tests across all common REPL flows. Keep `TerminalEmitter` as a thin wrapper — no logic changes in the render code itself. |
| **Auto-continuation recursion limits** | Medium | The new `Engine::run` uses `Box::pin` for the auto-continuation recursive call. Verify with a mock LLM that `max_auto_continues = 100` doesn't blow the stack. |
| **Cancellation during tool execution** | Medium | Tool execution currently uses `AbortSignal`; the new path uses `CancellationToken`. Write a shim that translates. Write a test that cancels mid-tool-call and verifies graceful cleanup (no orphaned subprocesses, no leaked file descriptors). |
| **Command parsing fidelity** | Medium | The dot-command parser in today's REPL is hand-written and has edge cases. Port the parsing code verbatim into a dedicated `parse_dot_command_to_run_request` function with unit tests for every edge case found in today's code. |
| **Macro execution recursion** | Medium | `.macro` can invoke LLM calls, which now go through `Engine::run`, which can invoke more macros. Verify there's a recursion depth limit or cycle detection; add one if missing. |
| **Emitter error propagation** | Low | Emitter errors (ClientDisconnected) should NOT abort session save logic. Engine must continue executing after the first `EmitError::ClientDisconnected` — just stop emitting. Write a test that simulates a disconnected emitter mid-response and asserts the session is still correctly persisted. |
| **Spinner interleaving with tool output** | Low | Today's spinner is tightly coupled to the stream handler. If the new order of operations fires a tool call before the spinner is stopped, you'll get garbled output. Test this specifically. |
| **Feature flag: `auto_continue` in CLI** | Low | After Phase 2, CLI *could* support auto-continuation but it's not exposed. Decision: leave it off by default in `RunOptions::cli()`, add a `--auto-continue` flag in a separate follow-up if desired. Don't sneak behavior changes into this refactor. |
---
## What Phase 2 Does NOT Do
- **No new features.** Everything that worked before works the same way after.
- **No API server.** `JsonEmitter` and `SseEmitter` are placeholders — Phase 4 implements them.
- **No `SessionStore` abstraction.** That's Phase 3.
- **No `ToolScope` unification.** That landed in Phase 1 Step 6.5.
- **No changes to LLM client code.** `call_chat_completions` and `call_chat_completions_streaming` keep their existing signatures.
- **No MCP factory pooling.** That's Phase 5.
- **No dot-command syntax changes.** The REPL still accepts exactly the same dot-commands; they just parse into `CoreCommand` instead of being hand-dispatched in `run_repl_command`.
The sole goal of Phase 2 is: **extract the pipeline into Engine::run, route CLI and REPL through it, and prove via tests and smoke tests that nothing regressed.**
---
## Entry Criteria (from Phase 1)
Before starting Phase 2, Phase 1 must be complete:
- [ ] `GlobalConfig` type alias is removed
- [ ] `AppState` and `RequestContext` are the only state holders
- [ ] All 91 callsites in the original migration table have been updated
- [ ] `cargo test` passes with no `Config`-based tests remaining
- [ ] CLI and REPL manual smoke tests pass identically to pre-Phase-1
## Exit Criteria (Phase 2 complete)
- [ ] `src/engine/` module exists with Engine, Emitter, Event, CoreCommand, RunRequest, RunOutcome, CoreError
- [ ] `TerminalEmitter` implemented and wrapping all existing render paths
- [ ] `NullEmitter` and `CollectingEmitter` implemented
- [ ] `start_directive` in main.rs is a thin wrapper around `Engine::run`
- [ ] REPL's per-line handler routes through `Engine::run`
- [ ] All 37 `CoreCommand` variants implemented with unit tests
- [ ] Integration tests for the 7 engine scenarios listed in Step 3
- [ ] Manual smoke tests for CLI and REPL match pre-Phase-2 behavior
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] Phase 3 (SessionStore abstraction) can begin
---
# Phase 3 Implementation Plan: SessionStore Abstraction
## Overview
Phase 3 extracts session persistence behind a trait so that CLI, REPL, and the future API server all resolve sessions through the same interface. The file-based YAML storage that exists today remains the only implementation in Phase 3 — no database, no schema migration, no new on-disk format. What changes is that session identity becomes **UUID-primary with optional name-based aliases**, direct `std::fs::write` calls disappear from `Session::save()`, and concurrent access to the same session is properly serialized.
After Phase 3, Phase 4 (REST API) can plug in without touching any persistence code: `POST /v1/sessions` returns a UUID, subsequent requests address sessions by that UUID, and CLI/REPL users continue typing `.session my-project` without noticing the internal change.
**Estimated effort:** ~35 days
**Risk:** Low. Storage semantics don't change; we're re-shaping the API surface around existing YAML files.
**Depends on:** Phase 1 complete, Phase 2 complete (Engine needs to call through the new store, not raw `Session::load`).
---
## Why This Phase Exists
Today's `Session::load()` and `Session::save()` embed the file layout, the filename-is-the-identity assumption, and the absence of concurrency control directly in the type. Three things break when you try to run this in a multi-tenant server:
1. **No UUID identity.** Two API clients both start a "project" session and collide on the filename. You can't safely let clients name sessions freely.
2. **No concurrency control.** Two concurrent requests to the same session do `load → mutate → save` with no coordination. The later save clobbers the earlier one's changes.
3. **No abstraction seam.** Every callsite computes paths itself via `Config::session_file(name)` and calls `Session::load()` / `.save()` directly. There's no single place to swap in alternate storage, add caching, or instrument persistence.
Phase 3 fixes all three without breaking anything users currently do.
---
## The Architecture After Phase 3
```
┌────────┐ ┌────────┐ ┌────────┐
│ CLI │ │ REPL │ │ API │ (Phase 4)
└───┬────┘ └───┬────┘ └───┬────┘
└──────────┼──────────┘
┌──────────────────────┐
│ Engine │
└──────────┬───────────┘
┌──────────────────────┐
│ SessionStore trait │
└──────────┬───────────┘
┌──────────────────────┐
│ FileSessionStore │ (Phase 3: the only impl)
│ — UUID primary │
│ — name alias index │
│ — per-session mutex │
│ — atomic writes │
└──────────┬───────────┘
~/.config/loki/sessions/
by-id/<uuid>/state.yaml
by-name/<alias> → <uuid> (text file containing the UUID)
agents/<agent>/sessions/
by-id/<uuid>/state.yaml
by-name/<alias> → <uuid>
```
---
## Core Types
### `SessionId`
```rust
#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug, Serialize, Deserialize)]
pub struct SessionId(Uuid);
impl SessionId {
pub fn new() -> Self { Self(Uuid::new_v4()) }
pub fn as_uuid(&self) -> Uuid { self.0 }
pub fn to_string(&self) -> String { self.0.to_string() }
pub fn parse(s: &str) -> Result<Self, SessionIdError> { /* ... */ }
}
```
UUID v4 by default. Newtype so we can't accidentally pass arbitrary strings where a session ID is expected, and so the on-disk format can evolve without breaking callers.
### `SessionAlias`
```rust
#[derive(Clone, Eq, PartialEq, Hash, Debug)]
pub struct SessionAlias(String);
impl SessionAlias {
pub fn new(s: impl Into<String>) -> Result<Self, AliasError>;
pub fn as_str(&self) -> &str { &self.0 }
}
```
Wraps the human-readable names users type in `.session my-project`. Validation rejects path traversal (`..`), slashes, null bytes, and anything that would produce an invalid filename. This is the CLI/REPL compatibility layer — existing `sessions/my-project.yaml` files continue to work, the alias system just maps them to auto-generated UUIDs on first access.
### `SessionHandle`
```rust
pub struct SessionHandle {
id: SessionId,
alias: Option<SessionAlias>,
is_agent: Option<String>,
state: Arc<tokio::sync::Mutex<Session>>,
store: Arc<dyn SessionStore>,
dirty: Arc<AtomicBool>,
}
impl SessionHandle {
pub fn id(&self) -> SessionId { self.id }
pub fn alias(&self) -> Option<&SessionAlias> { self.alias.as_ref() }
pub async fn lock(&self) -> SessionGuard<'_>;
pub fn mark_dirty(&self);
pub async fn save(&self) -> Result<(), StoreError>;
pub async fn rename(&mut self, new_alias: SessionAlias) -> Result<(), StoreError>;
}
pub struct SessionGuard<'a> {
session: MutexGuard<'a, Session>,
handle: &'a SessionHandle,
}
impl SessionGuard<'_> {
pub fn get(&self) -> &Session { &self.session }
pub fn get_mut(&mut self) -> &mut Session {
self.handle.mark_dirty();
&mut self.session
}
}
```
A `SessionHandle` is what callers pass around. It wraps:
- The stable `SessionId` (never changes after creation)
- An optional `SessionAlias` (can be renamed; users see this in `.info session`)
- An optional `is_agent` marker so the store knows which directory to read/write
- A shared `Arc<Mutex<Session>>` that serializes access within the process
- A backpointer to the store so `save()`, `rename()`, etc. work without the caller knowing the storage type
- A dirty flag that auto-sets on `get_mut()` and clears after successful save
The `lock()` / `SessionGuard` pattern is important: it makes the "you must lock before touching state" rule compiler-enforced. Today's code mutates `Config.session` freely because the whole `Config` is behind an `RwLock`. After Phase 3, mutating a session requires going through `handle.lock().await.get_mut()`, which acquires the per-session mutex. Two concurrent requests to the same session serialize automatically.
### `SessionStore` trait
```rust
#[async_trait]
pub trait SessionStore: Send + Sync {
/// Create a new session. If `alias` is provided, register it in the
/// alias index. Fails with AliasInUse if the alias already exists.
async fn create(
&self,
agent: Option<&str>,
alias: Option<SessionAlias>,
initial: Session,
) -> Result<SessionHandle, StoreError>;
/// Open an existing session by UUID.
async fn open(
&self,
agent: Option<&str>,
id: SessionId,
) -> Result<SessionHandle, StoreError>;
/// Open an existing session by alias, or create it if it doesn't exist.
/// This is the CLI/REPL compatibility path.
async fn open_or_create_by_alias(
&self,
agent: Option<&str>,
alias: SessionAlias,
initial_factory: impl FnOnce() -> Session + Send,
) -> Result<SessionHandle, StoreError>;
/// Resolve an alias to its UUID without loading the session.
async fn resolve_alias(
&self,
agent: Option<&str>,
alias: &SessionAlias,
) -> Result<Option<SessionId>, StoreError>;
/// Persist the current in-memory state of a handle back to storage.
/// Atomically — no torn writes.
async fn save(&self, handle: &SessionHandle) -> Result<(), StoreError>;
/// Rename a session's alias. The UUID and session state are unchanged.
async fn rename(
&self,
handle: &SessionHandle,
new_alias: SessionAlias,
) -> Result<(), StoreError>;
/// Delete a session permanently. Both the state file and any alias
/// pointing at it are removed.
async fn delete(
&self,
agent: Option<&str>,
id: SessionId,
) -> Result<(), StoreError>;
/// List all sessions in a scope (global or per-agent). Returns UUIDs
/// paired with their aliases if any.
async fn list(
&self,
agent: Option<&str>,
) -> Result<Vec<SessionMeta>, StoreError>;
}
pub struct SessionMeta {
pub id: SessionId,
pub alias: Option<SessionAlias>,
pub last_modified: SystemTime,
pub is_autoname: bool,
}
pub enum StoreError {
NotFound { id: Option<SessionId>, alias: Option<String> },
AliasInUse(String),
InvalidAlias(String),
Io(std::io::Error),
Serde(serde_yaml::Error),
Concurrent, // best-effort optimistic check
Other(anyhow::Error),
}
```
### `FileSessionStore`
```rust
pub struct FileSessionStore {
root: PathBuf, // ~/.config/loki/
agents_root: PathBuf, // ~/.config/loki/agents/
handles: Mutex<HashMap<(Option<String>, SessionId), Weak<Mutex<Session>>>>,
}
```
The `handles` map is the in-process cache that enforces "one `Arc<Mutex<Session>>` per live session per process." If two callers `open()` the same session, they get two `SessionHandle`s pointing at the same underlying mutex, so their locks serialize. When the last handle drops, the weak ref fails on the next lookup and the store re-reads from disk.
---
## The On-Disk Layout
### New layout (Phase 3 target)
```
~/.config/loki/sessions/
by-id/
<uuid>/
state.yaml
by-name/
my-project → text file containing the UUID
another-chat → text file containing the UUID
```
Agent sessions mirror this inside each agent's directory:
```
~/.config/loki/agents/sisyphus/sessions/
by-id/
<uuid>/
state.yaml
by-name/
my-project → UUID
```
### Backward compatibility
The migration is lazy and non-destructive. On `FileSessionStore` startup, we do NOT rewrite the directory. On the first `open_or_create_by_alias("my-project")` call, the store checks:
1. **New layout hit:** is there a `by-name/my-project` alias file? Read the UUID, open `by-id/<uuid>/state.yaml`.
2. **Legacy layout hit:** is there a `sessions/my-project.yaml`? Generate a fresh UUID, create `by-id/<uuid>/state.yaml` from the legacy content (atomic copy), write `by-name/my-project` pointing to the new UUID, and leave the legacy file in place. The legacy file becomes stale but untouched.
3. **Neither:** create fresh.
This means users upgrading from pre-Phase-3 builds never lose data, and they can downgrade during the migration window (their old files are still readable by the old code because we haven't deleted them). A `loki migrate sessions` command can later do a clean sweep to remove the legacy files — but that's an operational convenience, not a requirement of Phase 3.
**Deleting a migrated session** (the `.delete` REPL command) also deletes the legacy file if it still exists, so users don't see orphan entries in `list_sessions()`.
**Autoname temp sessions** (today: `sessions/_/20231201T123456-autoname.yaml`) map cleanly to the new layout — they get UUIDs like any other session, and their alias is the generated `20231201T123456-autoname` string. The `_/` prefix from today's path becomes a flag on `SessionMeta::is_autoname: true` set by the store when it recognizes the naming pattern during migration.
### Atomic writes
Today's `Session::save()` is `std::fs::write(path, yaml)` — if the process dies mid-write, you get a truncated YAML file that can't be loaded. The new `FileSessionStore::save()` uses the standard tempfile-and-rename pattern:
```rust
async fn save(&self, handle: &SessionHandle) -> Result<(), StoreError> {
let session = handle.state.lock().await;
let yaml = serde_yaml::to_string(&*session)?;
let target = self.state_path(handle.is_agent.as_deref(), handle.id);
let tmp = target.with_extension("yaml.tmp");
tokio::fs::write(&tmp, yaml).await?;
tokio::fs::rename(&tmp, &target).await?;
handle.dirty.store(false, Ordering::Release);
Ok(())
}
```
`rename` is atomic on POSIX filesystems and on Windows NTFS (via `MoveFileEx`). Either the old content or the new content is visible to readers; never a half-written file.
---
## Concurrency Model
Three layers, each with a clear responsibility:
1. **Process-level: per-session `Arc<Mutex<Session>>`.** Two handles to the same session share one mutex. Inside one process, concurrent access to the same session is serialized automatically. This is enough for CLI (single request) and REPL (single user, but multiple async tasks like background compression).
2. **Inter-process: filesystem rename atomicity.** Two separate Loki processes (unlikely today but possible for someone running CLI and REPL simultaneously on the same state) can't corrupt files because writes go through tempfile+rename. The later writer wins cleanly; the earlier writer's changes are lost but the file is always readable.
3. **Optimistic conflict detection (optional, Phase 5+):** If we later decide to add "you edited this session somewhere else, please reload" UX, we can add an `mtime` check on load/save and surface `StoreError::Concurrent` when the on-disk mtime doesn't match the value we read at `open()` time. This is deliberately not built in Phase 3 — it's a UX improvement for later, not a correctness requirement.
For Phase 3, layers 1 and 2 together are sufficient for everything up through "many concurrent API sessions, each addressing different UUIDs." The one gap they don't cover is "multiple API requests on the same session UUID at the same time" — but the per-session mutex in layer 1 handles that by serializing them, which is the desired behavior. The second request waits its turn and sees the first request's updates.
---
## Engine and Callsite Changes
### Before Phase 3
```rust
// In REPL command handler:
Config::use_session_safely(&config, Some("my-project"), abort_signal)?;
// later:
config.write().session.as_mut().unwrap().add_message(...);
// later:
Config::save_session_safely(&config, None)?;
```
### After Phase 3
```rust
// In CoreCommand::UseSession handler inside Engine::dispatch_command:
let alias = SessionAlias::new("my-project")?;
let handle = self.app.sessions.open_or_create_by_alias(
ctx.agent_name(),
alias,
|| Session::new_default(ctx.model_id(), ctx.role_name()),
).await?;
ctx.session = Some(handle);
// later, during the chat loop:
{
let mut guard = handle.lock().await;
guard.get_mut().add_message(input, output);
}
handle.save().await?; // fires when the turn completes
```
The `RequestContext.session: Option<Session>` field becomes `RequestContext.session: Option<SessionHandle>`. All 13 session-touching callsites from the explore get rewritten to go through the handle instead of direct access.
### The 13 callsites and their new shapes
| Current location | Current call | New call |
|---|---|---|
| `Config::use_session` | `Session::load` or `Session::new` | `store.open_or_create_by_alias(...)` |
| `Config::use_session_safely` | take/replace pattern on `config.session` | `ctx.session = Some(handle)` |
| `Config::exit_session` | `session.exit()` (maybe saves) | `if ctx.session.dirty() { handle.save().await? }; ctx.session = None` |
| `Config::empty_session` | `session.clear_messages()` | `handle.lock().await.get_mut().clear_messages()` |
| `Config::save_session` | `session.save()` with name logic | `handle.rename(alias)?; handle.save().await?` |
| `Config::compress_session` | mutates session, relies on dirty flag | `handle.lock().await.get_mut().compress(...)?; handle.save().await?` |
| `Config::maybe_autoname_session` | spawns task, mutates session | same, but via handle |
| `Config::delete` (kind="session") | `remove_file` on path | `store.delete(agent, id).await?` |
| `Config::after_chat_completion` | `session.add_message(...)` | via handle |
| `Config::apply_prelude` | may `use_session` | via store |
| `Agent::init` / `use_agent` | may load agent session | via store, with `agent=Some(name)` |
| `.session` REPL command | via `use_session_safely` | via store |
| `.delete session` REPL command | via `Config::delete` | via store |
Most of these are one-liner changes since the store's API mirrors the semantics of today's methods. The subtle ones are:
- **`exit_session`** has "save if dirty and `save_session != Some(false)`" logic plus "prompt for name if temp session" UX. The prompt lives in the REPL layer (it calls `inquire::Text`), not in the store. After the refactor, the REPL reads the dirty flag from the handle, prompts for a name if needed, calls `handle.rename()` if the user provided one, then calls `handle.save()`.
- **`compress_session`** runs asynchronously today — it spawns a task that holds a clone of `GlobalConfig` and writes back via `config.write()`. After the refactor, the task holds an `Arc<SessionHandle>` and does `handle.lock().await.get_mut().compress(...)` followed by `handle.save().await`. The per-session mutex prevents the compression task from clobbering concurrent turn writes.
- **`maybe_autoname_session`** is the same story as compression: spawn task, mutate through handle, save through store.
---
## Migration Strategy
### Step 1: Create the types without wiring
Add new files:
- `src/session/mod.rs` — module root
- `src/session/id.rs` — `SessionId`, `SessionAlias`
- `src/session/store.rs` — `SessionStore` trait, `StoreError`, `SessionMeta`
- `src/session/handle.rs` — `SessionHandle`, `SessionGuard`
- `src/session/file_store.rs` — `FileSessionStore` implementation
Move the existing `Session` struct from `src/config/session.rs` to `src/session/session.rs`. Keep the pub re-export at `src/config::Session` so no external callers break during the migration. The struct itself is unchanged — same fields, same YAML format, same methods. This is purely a module reorganization.
Register `pub mod session;` in `src/main.rs` and add `pub sessions: Arc<dyn SessionStore>` to `AppState`. Initialize it in `AppState::init()` with `FileSessionStore::new(config_dir)`.
**Verification:** `cargo check` clean, `cargo test` passes. Nothing uses the new types yet.
### Step 2: Implement `FileSessionStore` against the new layout
Build the file-based implementation:
- `state_path(agent, id) → ~/.config/loki/[agents/<agent>/]sessions/by-id/<uuid>/state.yaml`
- `alias_path(agent, alias) → ~/.config/loki/[agents/<agent>/]sessions/by-name/<alias>`
- `legacy_path(agent, alias) → ~/.config/loki/[agents/<agent>/]sessions/<alias>.yaml`
Implement `create`, `open`, `open_or_create_by_alias`, `resolve_alias`, `save`, `rename`, `delete`, `list`. The `open_or_create_by_alias` method is the most complex — it has the lazy-migration logic that checks new layout, then legacy layout, then falls through to creation.
**Unit tests for `FileSessionStore`:**
- Create + open roundtrip
- Create with alias + open_or_create_by_alias finds it
- Lazy migration from legacy `.yaml` file
- Delete removes both new and legacy paths
- Rename updates alias index without touching state file
- List returns both new-layout and legacy-layout sessions
- Atomic write: kill the process mid-write (simulated by injected failure) and verify no torn YAML
These tests use `tempfile::TempDir` so they don't touch the real config directory.
**Verification:** Unit tests pass. `cargo check` clean.
### Step 3: Add `SessionHandle` and integrate with `RequestContext`
Change `RequestContext.session` from `Option<Session>` to `Option<SessionHandle>`. This is a mass rename across the codebase — every callsite that does `ctx.session.as_ref()` needs to become `ctx.session.as_ref().map(|h| h.lock().await.get())` or similar.
The cleanest way to minimize the blast radius is to add a thin compatibility layer on `RequestContext`:
```rust
impl RequestContext {
pub async fn session_read<F, R>(&self, f: F) -> Option<R>
where F: FnOnce(&Session) -> R {
let handle = self.session.as_ref()?;
let guard = handle.lock().await;
Some(f(guard.get()))
}
pub async fn session_write<F, R>(&mut self, f: F) -> Option<R>
where F: FnOnce(&mut Session) -> R {
let handle = self.session.as_ref()?;
let mut guard = handle.lock().await;
Some(f(guard.get_mut()))
}
}
```
Most callsites become `ctx.session_read(|s| s.model_id.clone()).await` or `ctx.session_write(|s| s.add_message(...)).await`. A few that need to hold the guard across await points (e.g., compression) use `handle.lock()` directly.
**Verification:** `cargo check` clean. Existing REPL functions still work because the old method names get forwarded through the compatibility helpers.
### Step 4: Rewrite the 13 session callsites to use the store
Go through each callsite in the inventory table and rewrite it:
1. `Config::use_session` → `Engine::dispatch_command` for `CoreCommand::UseSession`
2. `Config::use_session_safely` → same, with extra ctx reset logic
3. `Config::exit_session` → `Engine::dispatch_command` for `CoreCommand::ExitSession`
4. ... and so on
Where possible, move the logic INTO `Engine::dispatch_command` rather than leaving it on `Config`. This is consistent with Phase 2's direction — core logic lives in the engine, not on state containers.
For each rewrite:
- Delete the old method from `Config`
- Add the new handler in `Engine::dispatch_command`
- Update any callers that still reference the old method name
- Run `cargo check` after each file to catch issues incrementally
**Verification:** After each rewrite, `cargo check` + the relevant integration tests from Phase 2. The Phase 2 `CollectingEmitter` tests for session-touching scenarios are especially important here — they're the regression net.
### Step 5: Remove the compatibility helpers from `RequestContext`
Once all 13 callsites are rewritten, the `session_read` / `session_write` helpers are only used by the old session methods we just deleted. Remove them. Any remaining compile errors point at callsites we missed.
**Verification:** `cargo check` clean, all of Phase 2's tests still pass, plus the new `FileSessionStore` unit tests.
### Step 6: Add the integration tests for concurrent access
These are the tests that prove Phase 3 actually solved the concurrency problem:
```rust
#[tokio::test]
async fn concurrent_opens_share_one_mutex() {
let store = FileSessionStore::new(tempdir);
let id = SessionId::new();
// ... create initial session ...
let h1 = store.open(None, id).await.unwrap();
let h2 = store.open(None, id).await.unwrap();
// Both handles should point at the same Arc<Mutex<Session>>
let lock1 = h1.lock().await;
// Try to lock h2 — should block
let try_lock = tokio::time::timeout(
Duration::from_millis(50),
h2.lock(),
).await;
assert!(try_lock.is_err(), "h2 should block while h1 holds the lock");
drop(lock1);
let _lock2 = h2.lock().await;
}
#[tokio::test]
async fn concurrent_writes_serialize_without_loss() {
let store = Arc::new(FileSessionStore::new(tempdir));
let id = create_initial_session(&store).await;
let tasks: Vec<_> = (0..100).map(|i| {
let store = store.clone();
tokio::spawn(async move {
let handle = store.open(None, id).await.unwrap();
{
let mut guard = handle.lock().await;
guard.get_mut().add_message(
Input::from_str(format!("msg-{i}")),
format!("reply-{i}"),
);
}
handle.save().await.unwrap();
})
}).collect();
for t in tasks { t.await.unwrap(); }
let handle = store.open(None, id).await.unwrap();
let guard = handle.lock().await;
assert_eq!(guard.get().messages.len(), 200); // 100 user + 100 assistant
}
```
The second test specifically verifies that the per-session mutex serialization prevents lost updates — the flaw in today's code.
**Verification:** Both tests pass. `cargo test` green overall.
### Step 7: Legacy migration smoke test
Copy a real user's `sessions/my-project.yaml` file into a test fixture directory. Run `FileSessionStore::open_or_create_by_alias("my-project")` and assert:
- A new `by-id/<uuid>/state.yaml` exists with identical content
- A new `by-name/my-project` file exists containing the UUID
- The original `sessions/my-project.yaml` is still there, untouched
- A second `open_or_create_by_alias("my-project")` call reuses the same UUID (idempotent)
**Verification:** Test passes with real fixture data including a session that has compressed messages and agent variables.
### Step 8: Manual smoke test
Run through a full REPL session covering every session-touching command:
1. `loki` → REPL starts, `.session foo` → new session created, check `by-id/` and `by-name/foo` exist
2. Several messages → check `state.yaml` updates atomically
3. `.save session bar` → check alias renamed, UUID unchanged
4. `.empty session` → messages cleared, file still exists
5. `.exit session` → session closed
6. `loki --session bar` from command line → same UUID resumes
7. `.delete` then choose session → both new and legacy files gone
8. Agent with `.agent sisyphus my-work` → agent-scoped session in `agents/sisyphus/sessions/`
9. Auto-continuation in an agent → compression fires, concurrent writes serialize cleanly
Every interaction should behave identically to pre-Phase-3.
---
## Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| **Legacy file discovery** | Medium | The migration path must handle every legacy layout: `sessions/<name>.yaml`, `sessions/_/<timestamp>-<autoname>.yaml`, and agent-scoped `agents/<agent>/sessions/<name>.yaml`. Write a fixture test for each variant. |
| **Alias collisions during migration** | Medium | If two processes simultaneously migrate the same legacy session, they could create two different UUIDs. Mitigation: the `open_or_create_by_alias` path should acquire a file lock on the alias file itself during creation, not just rely on the store's in-memory map. |
| **`RequestContext.session` type change blast radius** | Medium | Using the compatibility helpers (`session_read` / `session_write`) in Step 3 contains the blast radius. Only remove them in Step 5 once everything compiles. |
| **Session::save deadlock via re-entry** | Medium | If `Session::compress()` or `add_message()` internally trigger anything that tries to re-lock the session's mutex, we get a deadlock. Audit every `Session` method called inside a `guard.get_mut()` scope to make sure none of them take the lock again. Document the invariant in `SessionHandle` rustdoc. |
| **Tempfile cleanup on crash** | Low | If the process dies after writing `.yaml.tmp` but before the rename, we leave a stray file. On startup, `FileSessionStore::new` should sweep `by-id/*/state.yaml.tmp` files and remove them. |
| **Alias index corruption** | Low | If `by-name/foo` contains garbage (not a valid UUID), treat it as a missing alias and log a warning. Don't crash the process. |
| **Serde compatibility with old files** | Low | The `Session` struct's serde shape doesn't change in Phase 3, so old YAML files deserialize identically. Verify with a fixture test that includes every optional field set. |
| **CLI `--session <uuid>` vs `--session <alias>` ambiguity** | Low | `SessionId::parse` recognizes UUID format; fall back to treating the argument as an alias if parsing fails. Document in `--help`. |
| **Concurrent delete while handle held** | Low | If one task is using a handle while another deletes the session, the first task's save will fail (file missing). This is acceptable behavior — log a warning and return `StoreError::NotFound`. Tests should cover this. |
---
## What Phase 3 Does NOT Do
- **No schema migration.** YAML format stays identical. `Session` struct unchanged.
- **No database.** `FileSessionStore` is the only implementation.
- **No session TTL / eviction.** Sessions live until explicitly deleted.
- **No cross-process locking.** Two Loki processes can still race, but writes are atomic so files never corrupt.
- **No session encryption.** Vault handles secrets; sessions are plain YAML.
- **No session sharing between users.** Each process has its own config directory.
- **No optimistic concurrency (mtime check).** Deferred to Phase 5+ as a UX enhancement.
- **No session versioning / rollback.** Deferred.
- **No changes to `Session::build_messages()`, compression logic, or autoname generation.** The behaviors that read/mutate `Session` stay the same — only how they're reached changes.
The sole goal of Phase 3 is: **route all session persistence through a `SessionStore` trait with UUID-primary identity, lazy migration from the legacy layout, per-session mutex serialization, and atomic writes.**
---
## Entry Criteria (from Phase 2)
- [ ] `Engine::run` is the only path to the LLM pipeline
- [ ] `CoreCommand::UseSession`, `ExitSession`, `EmptySession`, `CompressSession`, `SaveSession`, `EditSession` are all implemented and tested
- [ ] `CollectingEmitter` integration tests cover session-touching scenarios
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] CLI and REPL manual smoke tests match pre-Phase-2 behavior
## Exit Criteria (Phase 3 complete)
- [ ] `src/session/` module exists with `SessionStore` trait, `FileSessionStore`, `SessionId`, `SessionAlias`, `SessionHandle`, `SessionGuard`
- [ ] `AppState.sessions: Arc<dyn SessionStore>` is wired in
- [ ] `RequestContext.session: Option<SessionHandle>` (not `Option<Session>`)
- [ ] All 13 session callsites go through the store; no direct `Session::load` or `Session::save` calls remain outside `FileSessionStore`
- [ ] Legacy layout files are lazily migrated on first access
- [ ] New layout (`by-id/<uuid>/state.yaml` + `by-name/<alias>`) is the canonical on-disk format for all new sessions
- [ ] Atomic writes via tempfile+rename
- [ ] Per-session mutex serialization verified by concurrent-write integration tests
- [ ] Legacy fixture test passes (existing user data still loads)
- [ ] Full REPL smoke test covers every session command
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] Phase 4 (REST API) can address sessions by UUID without touching persistence code
+824
View File
@@ -0,0 +1,824 @@
# Phase 4 Implementation Plan: REST API Server
## Overview
Phase 4 introduces a `--serve` mode that starts an HTTP server exposing Loki's functionality as a RESTful API. The server is a thin axum layer on top of `Engine::run()` — most of the work is mapping HTTP requests into `RunRequest`s, mapping `Emitter` events into JSON or Server-Sent Events, and providing baseline auth, cancellation, and graceful shutdown. By the end of this phase, Loki can run as a backend service that multiple clients can talk to simultaneously, each with their own session.
**Estimated effort:** ~1–2 weeks
**Risk:** Low–medium. The core pipeline (Engine) is unchanged; the risk is in the HTTP layer's correctness around streaming, cancellation, and concurrent session handling.
**Depends on:** Phases 1–3 complete. `SessionStore` with UUID identity, `Engine::run()` as the pipeline entrypoint, `Emitter` trait with working `TerminalEmitter` + `CollectingEmitter`.
---
## Why Phase 4 Exists
After Phase 3, everything the API server needs is already in place:
- `AppState` is a clonable `Arc` holding global services, safe to share across concurrent HTTP handlers.
- `RequestContext` is per-request mutable state with no hidden global singletons.
- `Engine::run()` is the single pipeline entrypoint that works for any frontend.
- `SessionStore` serves sessions by UUID with per-session mutex serialization.
- `Emitter` trait decouples output from destination.
What's missing is the last mile: accepting HTTP requests, routing them to `Engine::run()`, and turning `Event`s into HTTP responses. This phase builds exactly that.
The mental model is "Loki as a backend service." A frontend developer should be able to `curl -X POST http://localhost:3400/v1/completions -d '{"prompt":"hello"}'` and get a sensible response. A JavaScript app should be able to open an EventSource to `/v1/sessions/:id/completions?stream=true` and get live token streaming. An automation script should be able to maintain session state across many requests by passing back the same session UUID.
---
## The Architecture After Phase 4
```
┌─────────────────────────────────────────────┐
│ loki --serve --port 3400 │
│ ┌───────────────────────────────────────┐ │
│ │ axum Router │ │
│ │ ┌─────────────┐ ┌────────────────┐ │ │
│ │ │ Middleware│ │ Handlers │ │ │
│ │ │ - Auth │ │ /v1/* │ │ │
│ │ │ - Trace │ │ │ │ │
│ │ │ - CORS │ │ │ │ │
│ │ │ - Limit │ │ │ │ │
│ │ └──────┬──────┘ └────────┬───────┘ │ │
│ └─────────┼──────────────────┼──────────┘ │
│ ▼ ▼ │
│ ┌───────────────────────────────────┐ │
│ │ Arc<AppState> (shared) │ │
│ └────────────────┬──────────────────┘ │
│ ▼ │
│ ┌───────────────────────────────────┐ │
│ │ Per-request RequestContext + │ │
│ │ JsonEmitter or SseEmitter │ │
│ └────────────────┬──────────────────┘ │
│ ▼ │
│ ┌───────────────────────────────────┐ │
│ │ Engine::run() │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────────┘
```
---
## API Surface
### Versioning
All endpoints live under `/v1/`. The version prefix lets us ship breaking changes later without breaking existing clients. `/v2/` endpoints can coexist with `/v1/` indefinitely.
### Endpoint summary
```
Authentication
POST /v1/auth/check # validate API key, returns subject info
Metadata
GET /v1/models # list available LLM models
GET /v1/agents # list installed agents
GET /v1/roles # list installed roles
GET /v1/rags # list standalone RAGs
GET /v1/info # server build info, health
One-shot completions
POST /v1/completions # stateless completion (no session)
Sessions
POST /v1/sessions # create a new session (returns UUID)
GET /v1/sessions # list sessions visible to this caller
GET /v1/sessions/:id # get session metadata + message history
DELETE /v1/sessions/:id # delete a session
POST /v1/sessions/:id/completions # send a prompt into a session
POST /v1/sessions/:id/compress # manually trigger compression
POST /v1/sessions/:id/empty # clear messages (keep session record)
Role attachment
POST /v1/sessions/:id/role # activate role on session
DELETE /v1/sessions/:id/role # detach role
Agent attachment
POST /v1/sessions/:id/agent # activate agent on session
DELETE /v1/sessions/:id/agent # deactivate agent
RAG attachment
POST /v1/sessions/:id/rag # attach standalone RAG
DELETE /v1/sessions/:id/rag # detach RAG
POST /v1/rags/:name/rebuild # rebuild a RAG index
```
### Request/response shapes
**One-shot completion:**
```
POST /v1/completions
Content-Type: application/json
Authorization: Bearer <api-key>
{
"prompt": "Explain TCP handshake",
"model": "openai:gpt-4o", // optional: overrides default
"role": "explain", // optional: apply role for this one request
"agent": "oracle", // optional: run through an agent (no session retention)
"stream": false, // optional: SSE vs JSON
"files": [ // optional: file attachments
{"path": "/abs/path/doc.pdf"},
{"url": "https://example.com/x"}
],
"temperature": 0.7, // optional override
"auto_continue": false // optional: enable agent auto-continuation
}
```
**Non-streaming response (default):**
```json
{
"request_id": "7a1b...",
"session_id": null,
"final_message": "The TCP handshake is a three-way protocol ...",
"tool_calls": [
{"id": "tc_1", "name": "web_search", "args": "...", "result": "...", "is_error": false}
],
"turns": 2,
"compressed": false,
"auto_continued": 0,
"usage": {
"input_tokens": 120,
"output_tokens": 458
}
}
```
**Streaming response** (`Accept: text/event-stream` or `stream: true`):
```
event: started
data: {"request_id":"7a1b...","session_id":null}
event: assistant_delta
data: {"text":"The TCP "}
event: assistant_delta
data: {"text":"handshake is "}
event: tool_call
data: {"id":"tc_1","name":"web_search","args":"..."}
event: tool_result
data: {"id":"tc_1","name":"web_search","result":"...","is_error":false}
event: assistant_delta
data: {"text":" a three-way protocol..."}
event: finished
data: {"outcome":{"turns":2,"tool_calls":1,"compressed":false}}
```
**Create session:**
```
POST /v1/sessions
{
"alias": "my-project", // optional; UUID-only if omitted
"role": "explain", // optional: pre-attach a role
"agent": "sisyphus", // optional: pre-attach an agent
"rag": "mydocs", // optional: pre-attach a RAG
"model": "openai:gpt-4o" // optional: pre-set model
}
```
**Response:**
```json
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"alias": "my-project",
"agent": "sisyphus",
"role": "explain",
"rag": "mydocs",
"model": "openai:gpt-4o",
"created_at": "2026-04-10T15:32:11Z"
}
```
**Session completion:**
```
POST /v1/sessions/550e8400-.../completions
{
"prompt": "what was the bug we found yesterday?",
"stream": true,
"auto_continue": true
}
```
Returns the same shape as `/v1/completions`, but with `session_id` populated and agent runtime state preserved across calls.
**Error responses** (standard across all endpoints):
```json
{
"error": {
"code": "session_not_found",
"message": "No session with id 550e8400-...",
"request_id": "7a1b..."
}
}
```
HTTP status codes map from `CoreError::http_status()` (defined in Phase 2):
- `InvalidRequest` → 400
- `Unauthorized` → 401
- `NotFound` → 404
- `InvalidState` → 409 (expected state doesn't match)
- `Cancelled` → 499 (client-closed request, borrowed from nginx)
- `ProviderError` → 502 (upstream LLM failed)
- `ToolError` → 500
- `Other` → 500
---
## Core Types
### `ApiConfig`
```rust
#[derive(Clone, Deserialize)]
pub struct ApiConfig {
pub enabled: bool,
pub listen_addr: SocketAddr,
pub auth: AuthConfig,
pub cors: CorsConfig,
pub limits: LimitsConfig,
pub request_timeout_seconds: u64,
pub shutdown_grace_seconds: u64,
}
#[derive(Clone, Deserialize)]
pub enum AuthConfig {
Disabled, // dev only
StaticKeys { keys: Vec<AuthKeyEntry> }, // simple key list
// future: JwtIssuer { ... }, OAuthIntrospect { ... }
}
#[derive(Clone, Deserialize)]
pub struct AuthKeyEntry {
pub subject: String, // for logs
pub key_hash: String, // bcrypt or argon2 hash
pub scopes: Vec<String>,
}
#[derive(Clone, Deserialize)]
pub struct CorsConfig {
pub allowed_origins: Vec<String>, // empty = no CORS
pub allow_credentials: bool,
}
#[derive(Clone, Deserialize)]
pub struct LimitsConfig {
pub max_body_bytes: usize, // request body limit
pub max_concurrent_requests: usize, // semaphore
pub rate_limit_per_minute: Option<usize>, // optional per-subject
}
```
`ApiConfig` loads from `config.yaml` under a new top-level `api:` block. It's NOT part of `AppConfig` because it only matters in `--serve` mode; in CLI/REPL mode it's ignored.
```yaml
# config.yaml
api:
enabled: false # false = --serve refuses to start without explicit enable
listen_addr: "127.0.0.1:3400"
auth:
mode: StaticKeys
keys:
- subject: "alice"
key_hash: "$argon2id$..."
scopes: ["read", "write"]
cors:
allowed_origins: []
allow_credentials: false
limits:
max_body_bytes: 1048576 # 1 MiB
max_concurrent_requests: 64
rate_limit_per_minute: null
request_timeout_seconds: 300 # 5 minutes default
shutdown_grace_seconds: 30
```
### `ApiState`
```rust
#[derive(Clone)]
pub struct ApiState {
pub app: Arc<AppState>,
pub engine: Arc<Engine>,
pub config: Arc<ApiConfig>,
pub request_counter: Arc<AtomicU64>,
pub active_requests: Arc<Semaphore>,
}
```
`ApiState` is the axum-friendly wrapper that every handler receives via the `State` extractor. It's clonable (cheap — all fields are `Arc` or atomic) and thread-safe. Handlers get a clone per request.
### `JsonEmitter`
Phase 2 promised `JsonEmitter` and `SseEmitter` as deferred deliverables. Phase 4 implements them.
```rust
pub struct JsonEmitter {
events: Mutex<Vec<OwnedEvent>>,
tool_calls: Mutex<Vec<ToolCallRecord>>,
final_message: Mutex<Option<String>>,
outcome: Mutex<Option<RunOutcome>>,
}
impl JsonEmitter {
pub fn new() -> Self { /* ... */ }
/// Consume the emitter and return the JSON response body.
pub fn into_response(self) -> serde_json::Value { /* ... */ }
}
#[async_trait]
impl Emitter for JsonEmitter {
async fn emit(&self, event: Event<'_>) -> Result<(), EmitError> {
match event {
Event::AssistantDelta(text) => { /* accumulate */ }
Event::AssistantMessageEnd { full_text } => { /* set final_message */ }
Event::ToolCall { .. } | Event::ToolResult { .. } => { /* record */ }
Event::Finished { outcome } => { /* store */ }
_ => { /* record as event */ }
}
Ok(())
}
}
```
The non-streaming HTTP handler creates a `JsonEmitter`, calls `Engine::run`, and then calls `.into_response()` to get the final JSON body.
### `SseEmitter`
```rust
pub struct SseEmitter {
sender: mpsc::Sender<Result<axum::response::sse::Event, axum::Error>>,
client_disconnected: Arc<AtomicBool>,
}
#[async_trait]
impl Emitter for SseEmitter {
async fn emit(&self, event: Event<'_>) -> Result<(), EmitError> {
if self.client_disconnected.load(Ordering::Relaxed) {
return Err(EmitError::ClientDisconnected);
}
let sse_event = to_sse_event(&event)?;
self.sender
.send(Ok(sse_event))
.await
.map_err(|_| {
self.client_disconnected.store(true, Ordering::Relaxed);
EmitError::ClientDisconnected
})?;
Ok(())
}
}
fn to_sse_event(event: &Event<'_>) -> Result<axum::response::sse::Event, serde_json::Error> {
let (name, data) = match event {
Event::Started { .. } => ("started", serde_json::to_string(event)?),
Event::AssistantDelta(text) => ("assistant_delta", json!({ "text": text }).to_string()),
Event::AssistantMessageEnd { .. } => ("assistant_message_end", serde_json::to_string(event)?),
Event::ToolCall { .. } => ("tool_call", serde_json::to_string(event)?),
Event::ToolResult { .. } => ("tool_result", serde_json::to_string(event)?),
Event::AutoContinueTriggered { .. } => ("auto_continue_triggered", serde_json::to_string(event)?),
Event::SessionCompressing => ("session_compressing", "{}".to_string()),
Event::SessionCompressed { .. } => ("session_compressed", serde_json::to_string(event)?),
Event::SessionAutonamed(_) => ("session_autonamed", serde_json::to_string(event)?),
Event::Info(msg) => ("info", json!({ "message": msg }).to_string()),
Event::Warning(msg) => ("warning", json!({ "message": msg }).to_string()),
Event::Error(err) => ("error", serde_json::to_string(err)?),
Event::Finished { outcome } => ("finished", serde_json::to_string(outcome)?),
};
Ok(axum::response::sse::Event::default().event(name).data(data))
}
```
The streaming handler creates an mpsc channel, hands the sender half to an `SseEmitter`, and returns an `axum::response::sse::Sse` wrapping the receiver half. axum streams each event as it's emitted, with automatic flushing. If the client disconnects, the send fails, `client_disconnected` is set, and subsequent emits return `ClientDisconnected` — which the engine respects by continuing to completion without emitting further (Phase 2 designed this behavior in).
---
## Middleware Stack
The axum router wraps handlers in a layered middleware stack. Order matters because middleware is applied outside-in on requests, inside-out on responses.
```rust
let router = Router::new()
.route("/v1/auth/check", post(handlers::auth_check))
.route("/v1/models", get(handlers::list_models))
.route("/v1/agents", get(handlers::list_agents))
.route("/v1/roles", get(handlers::list_roles))
.route("/v1/rags", get(handlers::list_rags))
.route("/v1/info", get(handlers::info))
.route("/v1/completions", post(handlers::one_shot_completion))
.route("/v1/sessions", post(handlers::create_session).get(handlers::list_sessions))
.route("/v1/sessions/:id", get(handlers::get_session).delete(handlers::delete_session))
.route("/v1/sessions/:id/completions", post(handlers::session_completion))
.route("/v1/sessions/:id/compress", post(handlers::compress_session))
.route("/v1/sessions/:id/empty", post(handlers::empty_session))
.route("/v1/sessions/:id/role", post(handlers::set_role).delete(handlers::clear_role))
.route("/v1/sessions/:id/agent", post(handlers::set_agent).delete(handlers::clear_agent))
.route("/v1/sessions/:id/rag", post(handlers::set_rag).delete(handlers::clear_rag))
.route("/v1/rags/:name/rebuild", post(handlers::rebuild_rag))
.layer(middleware::from_fn_with_state(state.clone(), middleware::auth))
.layer(middleware::from_fn(middleware::request_id))
.layer(middleware::from_fn_with_state(state.clone(), middleware::concurrency_limit))
.layer(middleware::from_fn(middleware::tracing))
.layer(middleware::from_fn(middleware::error_handler))
.layer(tower_http::timeout::TimeoutLayer::new(Duration::from_secs(
state.config.request_timeout_seconds,
)))
.layer(tower_http::limit::RequestBodyLimitLayer::new(state.config.limits.max_body_bytes))
.layer(cors_layer(&state.config.cors))
.with_state(state);
```
### Middleware responsibilities
**auth** — Validates `Authorization: Bearer <key>` header against the configured auth provider. Compares against stored hashes (bcrypt/argon2), never plaintext. On success, attaches an `AuthContext { subject, scopes }` to request extensions. On failure, returns 401 immediately without calling the handler. If `AuthConfig::Disabled`, synthesizes an `AuthContext { subject: "anonymous", scopes: vec!["*"] }` for local dev.
**request_id** — Generates a UUID request ID, attaches it to request extensions for downstream correlation, emits it as `X-Request-Id` in the response headers. Used by tracing and error handlers.
**concurrency_limit** — Acquires a permit from `state.active_requests` semaphore with a short timeout. If the server is saturated, returns 503 Service Unavailable immediately. This protects against runaway connection counts exhausting resources.
**tracing** — Wraps the request in a `tracing::Span` carrying the request ID, subject, method, path, and session ID if present. Every log line and every tool call emitted during the request carries this span context. Essential for debugging production issues.
**error_handler** — Catches `CoreError` from handler results and maps to proper HTTP responses using `CoreError::http_status()` and a JSON error body. Ensures no handler leaks an `anyhow::Error` or raw `?` into an axum 500.
**timeout** — Overall request deadline. After N seconds (default 300), the request is aborted. This is a backstop — the engine's per-request cancellation token is the primary cancellation mechanism.
**body limit** — Rejects requests larger than the configured max. Default 1 MiB is enough for prompts with several files attached; adjustable in config.
**cors** — Attaches `Access-Control-Allow-Origin` headers for cross-origin browsers. Empty allowed origins = no CORS headers emitted (safe default). `allow_credentials: true` enables cookie/auth forwarding.
### What's NOT in middleware
- **Rate limiting per subject** — deferred. The `rate_limit_per_minute` config option is wired through but the middleware is a stub in Phase 4. Real rate limiting with sliding windows lands in a follow-up.
- **Request/response logging** — use the tracing middleware's output; don't add a separate HTTP log layer.
- **Metrics** — deferred to Phase 4.5 (Prometheus endpoint). Phase 4 just exposes counters in `ApiState`.
- **Content negotiation** — Phase 4 assumes JSON requests. `Accept: text/event-stream` is the only alternate content type we handle, and only on completion endpoints.
---
## Handler Pattern
Every handler follows the same shape:
```rust
pub async fn session_completion(
State(state): State<ApiState>,
Extension(auth): Extension<AuthContext>,
Extension(request_id): Extension<Uuid>,
Path(session_id): Path<String>,
Json(req): Json<CompletionRequest>,
) -> Result<Response, ApiError> {
// 1. Parse domain types
let session_id = SessionId::parse(&session_id)
.map_err(|_| ApiError::bad_request("invalid session id"))?;
// 2. Open the session handle
let handle = state.app.sessions.open(None, session_id).await
.map_err(|e| match e {
StoreError::NotFound { .. } => ApiError::not_found("session", &session_id.to_string()),
other => ApiError::from(other),
})?;
// 3. Build RequestContext from AppState + session
let mut ctx = RequestContext::new(state.app.clone(), WorkingMode::Api);
ctx.session = Some(handle);
ctx.auth = Some(auth);
// 4. Build cancellation token that fires on client disconnect
let cancel = CancellationToken::new();
// 5. Convert the HTTP request to a RunRequest
let run_req = RunRequest {
input: Some(UserInput::from_api(req.prompt, req.files)?),
command: None,
options: {
let mut o = if req.session_active {
RunOptions::api_session()
} else {
RunOptions::api_oneshot()
};
o.stream = req.stream;
o.auto_continue = req.auto_continue.unwrap_or(false);
o.cancel = cancel.clone();
o
},
};
// 6. Branch on streaming vs JSON
if req.stream {
// Create SseEmitter + channel, spawn engine task, return Sse response
let (tx, rx) = mpsc::channel(32);
let emitter = SseEmitter::new(tx);
let engine = state.engine.clone();
tokio::spawn(async move {
let _ = engine.run(&mut ctx, run_req, &emitter).await;
// Emitter Drop closes the channel; Sse stream ends naturally
});
Ok(Sse::new(ReceiverStream::new(rx))
.keep_alive(KeepAlive::default())
.into_response())
} else {
// Use JsonEmitter synchronously, return JSON body
let emitter = JsonEmitter::new();
state.engine.run(&mut ctx, run_req, &emitter).await
.map_err(ApiError::from)?;
Ok(Json(emitter.into_response()).into_response())
}
}
```
The streaming path spawns a background task because axum needs to return the `Response` (with the SSE stream) before the engine finishes its work. The task owns the `ctx` and `emitter`, runs to completion, and naturally terminates when the engine returns. The channel closing signals the end of the stream to axum.
The non-streaming path runs synchronously in the handler task because we need the full result before returning the response body.
---
## Cancellation and Client Disconnect
Two cancellation sources, one unified mechanism:
1. **Client disconnect during streaming.** axum signals this by dropping the SSE receiver. The next `SseEmitter::emit` call fails with `ClientDisconnected`, which the engine handles by stopping further emits but continuing to completion so session state is persisted correctly.
2. **Request timeout.** The outer tower timeout layer fires after N seconds, dropping the handler's future. This cancels any pending awaits in the engine, which propagates through tokio cancellation. Active tool calls (especially bash/python/typescript subprocesses) need to be killed cleanly — this is the same concern as Phase 2's Ctrl-C handling.
The engine's `CancellationToken` handles both cases uniformly. For streaming, the handler watches the SSE sender's `closed()` signal and triggers `cancel.cancel()` when the client goes away. For timeout, tower's dropped future causes the handler task to be aborted, which drops `cancel` and fires any `cancelled()` waiters in the engine.
```rust
// Inside the streaming handler:
let cancel_for_disconnect = cancel.clone();
let send_tx = tx.clone();
tokio::spawn(async move {
send_tx.closed().await; // resolves when receiver drops
cancel_for_disconnect.cancel();
});
```
**Tool call cancellation** is the interesting case. A running bash/python/typescript subprocess must be killed when `cancel` fires. The existing tool execution code uses `AbortSignal` from the `abort_on_ctrlc` crate; Phase 2's shim layer adapts it to `CancellationToken`. Phase 4 doesn't need to change this — it just needs to verify that the adapter is still firing correctly when cancellation comes from HTTP disconnect instead of Ctrl-C.
---
## Per-Request State Isolation
The critical correctness property: **two concurrent requests must not share mutable state.** The architecture from Phases 1–3 makes this structural rather than something we have to police:
- `AppState` is `Arc`-wrapped and contains only immutable config and shared services (vault, RAG cache, MCP factory, session store).
- `RequestContext` is constructed fresh in each handler — two requests get two independent contexts.
- `SessionHandle` uses per-session `Mutex` serialization — two concurrent requests on the *same* session wait their turn (by design).
- `McpFactory` acquires handles via per-key sharing — two requests using the same MCP server share one process; two using different servers get independent processes.
- `RagCache` shares `Arc<Rag>` via weak refs — same sharing property.
The one place where the architecture can't help us is **agent runtime isolation**. Two concurrent API requests on two different sessions, both running agents, must get two fully independent `AgentRuntime`s with their own supervisors, inboxes, todo lists, and escalation queues. Phase 1 Step 6.5 made this work by putting `AgentRuntime` on `RequestContext`, which is already per-request. Phase 4 just needs to verify nothing regresses.
**Integration test for this:** spin up 10 concurrent requests, each running a different agent with tools, and assert that each one gets its own tool call history, its own todo list, and its own eventual response. Use a mock LLM so the test is deterministic.
---
## Migration Strategy
### Step 1: Add dependencies and scaffolding
Add to `Cargo.toml`:
```toml
axum = { version = "0.8", features = ["macros"] }
tower = "0.5"
tower-http = { version = "0.6", features = ["cors", "limit", "timeout", "trace"] }
argon2 = "0.5"
```
`hyper` is already present. Add `tokio-stream` for SSE.
Create module structure:
- `src/api/mod.rs` — module root, `serve()` entrypoint
- `src/api/config.rs``ApiConfig`, `AuthConfig`, etc.
- `src/api/state.rs``ApiState`
- `src/api/auth.rs` — middleware + `AuthContext`
- `src/api/middleware.rs` — other middlewares (request_id, tracing, concurrency_limit, error_handler)
- `src/api/error.rs``ApiError` + conversion from `CoreError`
- `src/api/emitters/json.rs``JsonEmitter`
- `src/api/emitters/sse.rs``SseEmitter`
- `src/api/handlers/mod.rs` — handler module root
- `src/api/handlers/completions.rs` — one-shot and session completions
- `src/api/handlers/sessions.rs` — session CRUD
- `src/api/handlers/metadata.rs` — list models/agents/roles/rags
- `src/api/handlers/scope.rs` — role/agent/rag attachment endpoints
- `src/api/handlers/rag.rs` — rebuild endpoint
Register `pub mod api;` in `src/main.rs`. Add a `--serve` CLI flag that calls `api::serve(app_state).await`.
**Verification:** `cargo check` clean with empty handler stubs returning 501 Not Implemented.
### Step 2: Implement auth middleware and error handling
Build the auth middleware against `AuthConfig::StaticKeys` using argon2 for verification. Implement `ApiError` with `IntoResponse` that produces the JSON error body. Implement `From<CoreError>` for `ApiError` using `CoreError::http_status()` and `CoreError::message()` (add those methods to `CoreError` in Phase 2 if they don't exist yet; otherwise add here).
Write unit tests:
- Valid key → handler runs, `AuthContext` is attached
- Invalid key → 401
- Missing key → 401
- `AuthConfig::Disabled` → anonymous context synthesized
**Verification:** Auth tests pass. `curl -H "Authorization: Bearer <valid-key>" http://localhost:3400/v1/info` returns info; without the header returns 401.
### Step 3: Implement `JsonEmitter` and `SseEmitter`
Both are relatively mechanical. `JsonEmitter` accumulates events into a buffer and exposes `into_response()`. `SseEmitter` converts each event to an axum SSE frame and pushes into an mpsc channel.
Write unit tests mirroring the `NullEmitter` test pattern: feed a scripted sequence of events into each emitter → assert the resulting JSON or SSE frames.
**Verification:** Both emitters have unit tests that drive a scripted `Event` sequence and compare to golden outputs.
### Step 4: Implement metadata handlers
Start with the easy endpoints: `GET /v1/models`, `/v1/agents`, `/v1/roles`, `/v1/rags`, `/v1/info`. These don't call the engine — they just read from `AppState` and return JSON.
**Verification:** `curl` each endpoint and inspect output. Write integration tests that spin up the router and hit each endpoint.
### Step 5: Implement session CRUD handlers
`POST /v1/sessions` creates via `SessionStore::create`. `GET /v1/sessions` lists via `SessionStore::list`. `GET /v1/sessions/:id` reads metadata + message history via `SessionStore::open` + handle lock. `DELETE /v1/sessions/:id` calls `SessionStore::delete`.
These handlers don't call the engine either. They're thin wrappers around `SessionStore`.
**Verification:** Create a session via POST, list it, read it, delete it, confirm 404 after delete. All through `curl`.
### Step 6: Implement one-shot completion handler
`POST /v1/completions` is the first engine-calling handler. It constructs a fresh `RequestContext` with no session, builds a `RunRequest` from the HTTP body, and calls `Engine::run` with either `JsonEmitter` or `SseEmitter` based on the `stream` flag.
This is where the streaming infrastructure first gets exercised end-to-end. Test both modes:
```bash
# Non-streaming
curl -X POST http://localhost:3400/v1/completions \
-H "Authorization: Bearer <key>" \
-H "Content-Type: application/json" \
-d '{"prompt":"hello"}'
# Streaming
curl -N -X POST http://localhost:3400/v1/completions \
-H "Authorization: Bearer <key>" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"prompt":"hello","stream":true}'
```
**Verification:** Both modes work with a real LLM. Disconnect the streaming client mid-response (Ctrl-C on curl) and verify the engine task gets cancelled cleanly — no orphaned MCP subprocesses, no hung tool executions.
### Step 7: Implement session completion handler
`POST /v1/sessions/:id/completions` is the same as one-shot but with a session attached. The handler calls `store.open(id)`, builds a context with `ctx.session = Some(handle)`, and proceeds as before. Session state is automatically persisted by the engine at the end of the turn.
Concurrent request test: spin up 10 concurrent `curl` commands all hitting the same session. Assert:
- All 10 complete successfully
- The session has 10 message pairs appended in some order (serialized by the per-session mutex)
- No lost updates, no corrupted YAML
**Verification:** Concurrent test passes reliably. Run it 100 times in a loop to catch races.
### Step 8: Implement scope attachment handlers
`POST /v1/sessions/:id/role`, `/agent`, `/rag` and their `DELETE` counterparts. Each one opens the session handle, constructs a `RunRequest` with a `CoreCommand` variant (`UseRole`, `UseAgent`, `UseRag`), and calls the engine with no input — just the command. The engine dispatches the command, mutates state, and the session is persisted.
**Verification:** `POST /v1/sessions/<id>/role {"name":"explain"}` activates the role. Subsequent completion on the session uses the role. `DELETE /v1/sessions/<id>/role` clears it.
### Step 9: Implement miscellaneous handlers
`POST /v1/sessions/:id/compress`, `/empty`, `POST /v1/rags/:name/rebuild`. Same pattern: translate to `CoreCommand` and dispatch.
**Verification:** All endpoints respond correctly.
### Step 10: Graceful shutdown
axum's graceful shutdown requires a signal future. Wire it up:
```rust
pub async fn serve(app: Arc<AppState>, config: ApiConfig) -> Result<()> {
let state = ApiState::new(app, config);
let router = build_router(state.clone());
let listener = tokio::net::TcpListener::bind(state.config.listen_addr).await?;
let shutdown_signal = async {
tokio::signal::ctrl_c().await.ok();
info!("Received shutdown signal, draining requests...");
};
axum::serve(listener, router)
.with_graceful_shutdown(shutdown_signal)
.await?;
info!("Draining active sessions...");
tokio::time::timeout(
Duration::from_secs(state.config.shutdown_grace_seconds),
drain_active_requests(&state),
).await.ok();
info!("Shutdown complete.");
Ok(())
}
```
`drain_active_requests` waits for the semaphore to return to full capacity, bounded by `shutdown_grace_seconds`. After the grace period, any remaining requests are force-cancelled.
**Verification:** Start server, send a long streaming request, hit Ctrl-C. The server should finish the in-flight request (up to the grace period) before exiting, not cut it off mid-stream.
### Step 11: Configuration loading and docs
Wire `ApiConfig` through `config.yaml` parsing. Add a default `api.enabled: false` so the server refuses to start without explicit opt-in. Document the config shape, endpoint schemas, and auth setup in `docs/REST-API-SERVER.md`.
**Verification:** Start with `api.enabled: false` → fatal error with helpful message. Start with `api.enabled: true` + no auth keys → fatal error demanding at least one key (unless `AuthConfig::Disabled` is explicit).
### Step 12: Integration test suite
Write a comprehensive integration test suite in `tests/api/` that exercises the full HTTP surface with a mock LLM:
- Auth: valid, invalid, missing, disabled
- Metadata: list each resource type
- Session lifecycle: create → list → read → delete
- One-shot completion: JSON + SSE
- Session completion: single + concurrent
- Scope attachment: role, agent, rag (set + clear)
- Cancellation: client disconnect mid-stream, timeout expiry
- Graceful shutdown: in-flight requests complete within grace period
- Concurrent sessions: 20 sessions, each with a few turns, all running at once
Use `reqwest` as the test client. Spin up the server on a random port per test. The mock LLM lives as a fake `Client` implementation that returns scripted responses.
**Verification:** All tests pass. CI runs them on every PR.
---
## Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| **SSE client disconnect detection lag** | High | The mpsc channel's `closed()` signal is the primary disconnect detector. Verify it fires within <1s of a real client disconnect. Add integration test with `reqwest` that opens a stream, sends a few events, drops the connection, and asserts the engine's cancellation token fires within 2s. |
| **Concurrent session writes losing data** | High | Phase 3's per-session mutex handles this structurally. Verify with the 100-concurrent-writers integration test from Phase 3 adapted to hit the HTTP layer. |
| **Orphaned tool subprocesses on timeout** | High | Tool execution must respect the cancellation token. Test: start a completion that triggers a bash tool running `sleep 60`, timeout at 5s, verify the `sleep` process is killed (not reparented to init). |
| **Auth key storage** | High | Store argon2 hashes, never plaintext. Rotate via config reload (future). Log subject (not key) on every request. Audit: no `println!` of any part of the key anywhere. |
| **Streaming body size growth** | Medium | A long session with many tool calls produces a lot of SSE frames. Verify the mpsc channel size (32) is enough; if not, backpressure causes the engine task to block on emit. Document in the emitter: `emit()` can await. |
| **CORS misconfiguration** | Medium | Default to no CORS. Require explicit origin allowlist. Log warnings on wildcard usage. Browser-accessible deployments should use a reverse proxy to terminate CORS. |
| **Auth bypass via malformed header** | Medium | Use axum's `Authorization` typed header extractor, not raw string parsing. Reject unknown schemes (only Bearer accepted). |
| **Rate limit stub** | Low | Document that `rate_limit_per_minute` is not yet implemented. Add an issue for follow-up. Protect against DoS with `max_concurrent_requests` in the meantime. |
| **Session metadata leak across users** | Low | `GET /v1/sessions` lists all sessions regardless of caller identity in Phase 4. Document this limitation: Phase 4's auth is coarse-grained (anyone with a valid key sees all sessions). Per-subject session ownership lands in a follow-up phase. Treat Phase 4 as single-tenant-per-key for now. |
| **Body size abuse** | Low | `max_body_bytes` caps payload. File uploads (not yet supported) would need separate multipart handling. |
| **Port binding failure** | Low | Fail fast with clear error if the configured port is in use or unreachable. Don't silently retry. |
---
## What Phase 4 Does NOT Do
- **No WebSocket support.** SSE is sufficient for server-to-client streaming; WebSockets would add bidirectional complexity we don't need. Client-to-server commands use regular HTTP POST.
- **No multi-tenancy.** All sessions are visible to any authenticated caller. Per-subject session ownership is a follow-up.
- **No rate limiting.** `rate_limit_per_minute` config exists but is a stub.
- **No metrics endpoint.** Counters are in memory; Prometheus scraping lands later.
- **No API versioning beyond `/v1/`.** Breaking changes would introduce `/v2/`.
- **No JWT or OAuth.** Static API keys only. JWT introspection can extend `AuthConfig` later.
- **No request signing.** Bearer tokens over HTTPS (users provide their own TLS termination via reverse proxy).
- **No admin endpoints.** Server management (reload config, view metrics, kill sessions) is not exposed.
- **No file upload.** File references in requests use absolute paths or URLs that the server fetches; no multipart uploads in Phase 4.
- **No MCP tool exposure over API.** The API calls the engine, which runs tools internally. Direct "execute this tool" API endpoints don't exist and are not planned.
---
## Entry Criteria (from Phase 3)
- [ ] `SessionStore` trait is the only path to session persistence
- [ ] `FileSessionStore` is wired into `AppState.sessions`
- [ ] Concurrent-write integration test from Phase 3 passes
- [ ] All session-touching callsites go through the store
- [ ] `Engine::run` handles `RunOptions::api_oneshot()` and `RunOptions::api_session()` modes
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
## Exit Criteria (Phase 4 complete)
- [ ] `--serve` flag starts an HTTP server on the configured port
- [ ] `src/api/` module exists with all handlers, middleware, emitters
- [ ] `JsonEmitter` and `SseEmitter` implemented and tested
- [ ] Auth middleware validates argon2-hashed API keys
- [ ] All 19 endpoints listed in the API surface are implemented and return sensible responses
- [ ] Concurrent-session integration test passes (20 sessions, multiple turns, parallel)
- [ ] Client disconnect during streaming triggers engine cancellation within 2s
- [ ] Request timeout fires at the configured deadline
- [ ] Graceful shutdown drains in-flight requests within the grace period
- [ ] Tool subprocesses are killed on cancellation, not orphaned
- [ ] `docs/REST-API-SERVER.md` documents config, endpoints, and auth setup
- [ ] Full integration test suite in `tests/api/` passes
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] Phase 5 (Tool Scope Pooling) can optimize the hot path without changing the API surface
+755
View File
@@ -0,0 +1,755 @@
# Phase 5 Implementation Plan: Tool Scope Pooling and Lifecycle
## Overview
Phase 5 turns the trivial no-pool `McpFactory` from Phase 1 Step 6.5 into a production-grade pooling layer with idle timeouts, a background reaper, health checks, and graceful shutdown integration. The architecture doesn't change — `McpFactory::acquire()` is still the only entry point, `Arc<McpServerHandle>` is still the reference type — but the factory now aggressively shares MCP subprocesses across scopes to keep warm-path latency near zero.
**Estimated effort:** ~1 week
**Risk:** Medium. The pooling logic has subtle ordering concerns (handle Drop → idle pool vs teardown → reaper eviction). Get those wrong and you leak processes or double-free.
**Depends on:** Phases 1–4 complete. Phase 4 is important because it's the first workload where pooling actually matters — CLI and REPL don't generate enough concurrent scope transitions to justify the complexity.
---
## Why Phase 5 Exists
After Phase 4 lands, the API server works correctly but has a performance problem: every API session activates its own MCP processes, and when the session closes, those processes tear down immediately. A realistic production workload — 20 concurrent users each sending a burst of requests — spawns and kills MCP subprocesses at an unsustainable rate. For servers like `github` that take 1–2 seconds to start (subprocess + stdio handshake + OAuth + `tools/list`), every API call adds visible cold-start latency.
The architectural framing for the fix was already designed in Phase 1 Step 6.5 and Phase 1's "MCP Lifecycle Policy" section:
1. **Layer 1: active Arc reference counting.** Already done in Phase 1. Scopes hold `Arc<McpServerHandle>`; the last drop triggers teardown.
2. **Layer 2: idle grace period.** Not yet implemented. After the last Arc drops, the handle moves to an idle pool with a timestamp instead of tearing down. A background reaper evicts entries that have been idle past the configured threshold.
3. **Acquisition order.** `acquire(key)` checks the active map first, then the idle pool (revival = zero latency), then spawns fresh.
Phase 5 implements Layer 2 + the reaper + the revival logic + the health check + graceful shutdown integration. No changes to the caller API. No changes to any other phase's code.
**This is a pure optimization phase.** Correctness is unchanged; only performance improves.
---
## The Architecture After Phase 5
```
┌─────────────────────────────────────────────────┐
│ McpFactory │
│ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ active: │ │ idle: │ │
│ │ HashMap<K, │ │ HashMap<K, │ │
│ │ Weak<H>> │ │ IdleEntry> │ │
│ └──────┬───────┘ └────────┬─────────┘ │
│ │ │ │
│ │ upgrade() │ remove() │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ acquire(key): │ │
│ │ 1. Try active.upgrade() → share │ │
│ │ 2. Try idle.remove() → revive │ │
│ │ 3. Spawn fresh subprocess │ │
│ └──────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────┐ │
│ │ Background reaper (tokio::spawn): │ │
│ │ every cleanup_interval: │ │
│ │ walk idle, evict stale entries │ │
│ │ (optional: health check) │ │
│ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
│ Arc<McpServerHandle>
┌────────────────────────┐
│ scope's ToolScope │
│ (CLI/REPL/API request)│
└────────────────────────┘
```
---
## Core Types
### `McpFactory` (expanded)
```rust
pub struct McpFactory {
active: Mutex<HashMap<McpServerKey, Weak<McpServerHandleInner>>>,
idle: Mutex<HashMap<McpServerKey, IdleEntry>>,
config: McpFactoryConfig,
shutdown: Arc<AtomicBool>,
reaper_handle: Mutex<Option<JoinHandle<()>>>,
}
struct IdleEntry {
handle: Arc<McpServerHandleInner>,
idle_since: Instant,
last_health_check: Option<Instant>,
}
pub struct McpFactoryConfig {
pub idle_timeout: Duration,
pub cleanup_interval: Duration,
pub max_idle_servers: Option<usize>,
pub health_check: Option<HealthCheckPolicy>,
}
pub struct HealthCheckPolicy {
pub interval: Duration,
pub timeout: Duration,
pub on_failure: HealthFailureAction,
}
pub enum HealthFailureAction {
Evict,
EvictAndLog,
LogOnly,
}
```
The factory grows three new pieces of state compared to Phase 1's stub:
- **`idle` map** — stores handles that nobody currently owns but that we've decided to keep warm.
- **`shutdown` flag** — tells the reaper to exit and prevents new inserts into `idle` during drain.
- **`reaper_handle`** — the `JoinHandle` of the background task, awaited during graceful shutdown.
### `McpServerHandle` (refined)
Phase 1's `Arc<McpServerHandle>` becomes `Arc<McpServerHandleInner>`, and we add a `Drop` impl on the inner type that handles the "return to idle pool" logic:
```rust
pub struct McpServerHandleInner {
key: McpServerKey,
service: RwLock<RunningService<RoleClient, ()>>,
factory: Weak<McpFactory>,
spawned_at: Instant,
returning_to_pool: AtomicBool,
}
impl Drop for McpServerHandleInner {
fn drop(&mut self) {
// If we're already returning to pool (revived from idle),
// don't re-insert — the factory is handling it.
if self.returning_to_pool.load(Ordering::Acquire) {
return;
}
let Some(factory) = self.factory.upgrade() else {
// Factory is gone — just let the service die via its own drop.
return;
};
if factory.shutdown.load(Ordering::Acquire) {
// Shutting down — don't put it back in idle, just die.
return;
}
// Take ownership of self.service and move to idle pool.
// This requires unsafe or a different ownership structure — see
// "The Drop trick" section below.
factory.return_to_idle(self);
}
}
```
**The Drop trick** — the issue is that `Drop::drop` can't actually move `self`'s fields out without `unsafe`, but we need to move the `RunningService` into the idle pool. The clean solution is to wrap the service in an `Option<RunningService>`:
```rust
pub struct McpServerHandleInner {
key: McpServerKey,
service: Mutex<Option<RunningService<RoleClient, ()>>>, // Option so we can take() in Drop
factory: Weak<McpFactory>,
spawned_at: Instant,
}
impl Drop for McpServerHandleInner {
fn drop(&mut self) {
let Some(factory) = self.factory.upgrade() else { return; };
if factory.shutdown.load(Ordering::Acquire) { return; }
// Take the service out. After this, self.service is None.
let service = match self.service.get_mut().take() {
Some(s) => s,
None => return, // Already taken — e.g., by shutdown drain.
};
// Spawn a task to move it into the idle pool (can't await in Drop).
let key = self.key.clone();
let factory = factory.clone();
tokio::spawn(async move {
factory.accept_returning_handle(key, service).await;
});
}
}
```
This has the right shape but introduces a subtle race: the `tokio::spawn` inside `Drop` runs asynchronously, so if a new `acquire(key)` arrives between the Drop and the spawned task completing, it won't find the handle in `idle` yet and will spawn a fresh subprocess. That's acceptable — it's slightly wasteful but not incorrect, and the race window is microseconds.
An alternative that avoids the race: use a dedicated `return_tx: mpsc::UnboundedSender<ReturningHandle>` on the factory, push synchronously into it from Drop, and a single "idle manager" task owns the idle map. This is cleaner because the idle map only mutates from one task, but it adds a coordination point. **Recommendation: start with the `tokio::spawn` approach; switch to the mpsc pattern only if the race causes visible issues.**
### `McpServerHandle` (the public Arc wrapper)
```rust
pub struct McpServerHandle(Arc<McpServerHandleInner>);
impl McpServerHandle {
pub async fn call_tool(&self, tool: &str, args: Value) -> Result<ToolResult> {
let guard = self.0.service.lock().await;
let service = guard.as_ref().ok_or(McpError::HandleDrained)?;
service.call_tool(tool, args).await
}
pub async fn list_tools(&self) -> Result<Vec<ToolSpec>> {
let guard = self.0.service.lock().await;
let service = guard.as_ref().ok_or(McpError::HandleDrained)?;
service.list_tools().await
}
}
impl Clone for McpServerHandle {
fn clone(&self) -> Self { Self(self.0.clone()) }
}
```
Callers get a `McpServerHandle` (which is `Arc<Inner>` internally) from `acquire()`. Cloning is cheap. Dropping the last clone fires the `Drop` on `Inner`, which returns the underlying service to the idle pool or kills it.
---
## The `acquire` Path
Three cases in order:
```rust
impl McpFactory {
pub async fn acquire(&self, key: &McpServerKey) -> Result<McpServerHandle> {
// Case 1: Active share
{
let active = self.active.lock();
if let Some(weak) = active.get(key) {
if let Some(inner) = weak.upgrade() {
metrics::mcp_acquire_hit_active();
return Ok(McpServerHandle(inner));
}
// Weak is dangling; let it fall through.
}
}
// Case 2: Revive from idle
{
let mut idle = self.idle.lock();
if let Some(entry) = idle.remove(key) {
metrics::mcp_acquire_hit_idle(entry.idle_since.elapsed());
let inner = self.revive_idle_entry(entry);
// Re-register in active map.
self.active.lock().insert(key.clone(), Arc::downgrade(&inner));
return Ok(McpServerHandle(inner));
}
}
// Case 3: Spawn fresh
metrics::mcp_acquire_miss();
let inner = self.spawn_new(key).await?;
self.active.lock().insert(key.clone(), Arc::downgrade(&inner));
Ok(McpServerHandle(inner))
}
fn revive_idle_entry(&self, entry: IdleEntry) -> Arc<McpServerHandleInner> {
// Wrap the handle in a fresh Arc. The IdleEntry held an Arc; we're
// just transferring ownership here.
entry.handle
}
async fn spawn_new(&self, key: &McpServerKey) -> Result<Arc<McpServerHandleInner>> {
let spec = self.resolve_spec(key)?;
let service = McpServer::start(&spec).await?;
let inner = Arc::new(McpServerHandleInner {
key: key.clone(),
service: Mutex::new(Some(service)),
factory: Arc::downgrade(&self.weak_self()),
spawned_at: Instant::now(),
});
Ok(inner)
}
}
```
**Concurrency in `acquire`:** the `active.lock()` critical section is short — just a hashmap lookup and maybe an insert. It never holds across an `.await`. The `idle.lock()` critical section is equally short. The `spawn_new` path is the expensive one (subprocess spawn + stdio handshake + `tools/list`), and it runs OUTSIDE any lock. This means two concurrent `acquire(key)` calls that both miss can both spawn fresh, producing two subprocesses for the same key briefly. Once both register themselves in `active`, the second insert clobbers the first, and the first handle's Drop returns it to the idle pool. The net effect is one "wasted" spawn per race, which is acceptable.
If you want to eliminate the race entirely, add a per-key `OnceCell`-style coordinator:
```rust
pending: Mutex<HashMap<McpServerKey, broadcast::Receiver<Arc<McpServerHandleInner>>>>,
```
A caller that misses both active and idle checks `pending` — if another task is already spawning, it subscribes to the broadcast and waits. The first spawner publishes the result. Clean but adds a layer of complexity. Start simple; add this if races become a problem in practice.
---
## The Reaper Task
```rust
async fn reaper_loop(factory: Arc<McpFactory>) {
let mut ticker = interval(factory.config.cleanup_interval);
loop {
ticker.tick().await;
if factory.shutdown.load(Ordering::Acquire) {
info!("Reaper exiting (shutdown requested)");
return;
}
factory.evict_stale_idle().await;
if let Some(policy) = &factory.config.health_check {
factory.run_health_checks(policy).await;
}
}
}
impl McpFactory {
async fn evict_stale_idle(&self) {
let now = Instant::now();
let timeout = self.config.idle_timeout;
// Phase 1: collect stale keys while holding the lock briefly.
let stale: Vec<McpServerKey> = {
let idle = self.idle.lock();
idle.iter()
.filter(|(_, entry)| now.duration_since(entry.idle_since) >= timeout)
.map(|(k, _)| k.clone())
.collect()
};
// Phase 2: remove them from the idle map and terminate.
for key in stale {
let entry = {
let mut idle = self.idle.lock();
idle.remove(&key)
};
if let Some(entry) = entry {
self.terminate_idle_handle(entry).await;
metrics::mcp_idle_evicted();
}
}
// Phase 3: enforce max_idle_servers cap via LRU.
if let Some(max) = self.config.max_idle_servers {
self.enforce_max_idle(max).await;
}
}
async fn enforce_max_idle(&self, max: usize) {
let victims: Vec<(McpServerKey, Instant)> = {
let idle = self.idle.lock();
if idle.len() <= max {
return;
}
let mut entries: Vec<_> = idle.iter()
.map(|(k, v)| (k.clone(), v.idle_since))
.collect();
entries.sort_by_key(|(_, t)| *t); // oldest first
entries.into_iter().take(idle.len() - max).collect()
};
for (key, _) in victims {
let entry = self.idle.lock().remove(&key);
if let Some(entry) = entry {
self.terminate_idle_handle(entry).await;
metrics::mcp_lru_evicted();
}
}
}
async fn terminate_idle_handle(&self, entry: IdleEntry) {
// Take the service out of the Arc<Inner> and cancel it.
// At this point, there are no other Arc refs — it's just us.
if let Ok(inner) = Arc::try_unwrap(entry.handle) {
if let Some(service) = inner.service.into_inner().take() {
service.cancel().await.ok();
}
}
// If try_unwrap fails, something else grabbed a ref — skip, it'll
// return to idle on its own Drop.
}
}
```
**Ordering:** `cleanup_interval` runs on a tokio `interval` ticker. Default is 30 seconds. Setting it too low wastes CPU; too high means idle servers linger slightly longer than `idle_timeout`. The worst-case linger time is `idle_timeout + cleanup_interval`; that is the tradeoff.
**`Arc::try_unwrap`** is the key to safe teardown. By the time the reaper decides to evict an entry, the only Arc to that `Inner` is the one in the `IdleEntry`. Any subsequent `acquire(key)` would have removed it from the idle map first. So `try_unwrap` should always succeed — but if it doesn't (e.g., because of the Drop-race described earlier), we just skip this eviction and catch it next cycle.
---
## The Health Check Path
```rust
impl McpFactory {
async fn run_health_checks(&self, policy: &HealthCheckPolicy) {
let now = Instant::now();
let candidates: Vec<McpServerKey> = {
let idle = self.idle.lock();
idle.iter()
.filter(|(_, entry)| {
entry.last_health_check
.map(|t| now.duration_since(t) >= policy.interval)
.unwrap_or(true)
})
.map(|(k, _)| k.clone())
.collect()
};
for key in candidates {
let handle = {
let idle = self.idle.lock();
idle.get(&key).map(|e| e.handle.clone())
};
let Some(handle) = handle else { continue };
let result = tokio::time::timeout(
policy.timeout,
self.ping_handle(&handle),
).await;
match result {
Ok(Ok(())) => {
let mut idle = self.idle.lock();
if let Some(entry) = idle.get_mut(&key) {
entry.last_health_check = Some(now);
}
metrics::mcp_health_ok();
}
Ok(Err(e)) | Err(_) => {
metrics::mcp_health_failed();
match policy.on_failure {
HealthFailureAction::Evict | HealthFailureAction::EvictAndLog => {
let entry = self.idle.lock().remove(&key);
if let Some(entry) = entry {
self.terminate_idle_handle(entry).await;
}
if matches!(policy.on_failure, HealthFailureAction::EvictAndLog) {
warn!(key = ?key, error = ?e, "evicted unhealthy MCP server");
}
}
HealthFailureAction::LogOnly => {
warn!(key = ?key, error = ?e, "MCP server failed health check");
}
}
}
}
}
}
async fn ping_handle(&self, handle: &Arc<McpServerHandleInner>) -> Result<()> {
let guard = handle.service.lock().await;
let service = guard.as_ref().ok_or(McpError::HandleDrained)?;
// `list_tools` is cheap and standard across all MCP servers.
service.list_tools().await?;
Ok(())
}
}
```
Health checks are optional (`health_check: None` disables them). When enabled, they run on the same interval as the reaper and only check idle entries whose last check was more than `policy.interval` ago. This avoids hammering servers that are currently in active use.
---
## Graceful Shutdown Integration
The factory coordinates with the process shutdown signal (Ctrl-C for CLI, SIGTERM for server mode). When shutdown fires:
1. Set `factory.shutdown = true`. Any subsequent `acquire()` still works but new handles won't be returned to idle on Drop.
2. Cancel the reaper's `JoinHandle`.
3. Drain the idle pool: walk it, call `terminate_idle_handle` for each entry.
4. Wait for active handles to drop naturally as their scopes finish. If there's a shutdown grace period (Phase 4's `shutdown_grace_seconds`), bound the wait with that.
```rust
impl McpFactory {
pub async fn shutdown(&self, grace: Duration) {
info!("McpFactory entering shutdown");
self.shutdown.store(true, Ordering::Release);
// Stop the reaper.
if let Some(handle) = self.reaper_handle.lock().take() {
handle.abort();
let _ = handle.await;
}
// Drain the idle pool immediately.
let idle_entries: Vec<IdleEntry> = {
let mut idle = self.idle.lock();
idle.drain().map(|(_, v)| v).collect()
};
for entry in idle_entries {
self.terminate_idle_handle(entry).await;
}
// Wait for active scopes to release their handles.
let deadline = Instant::now() + grace;
while Instant::now() < deadline {
if self.active_count() == 0 {
break;
}
tokio::time::sleep(Duration::from_millis(100)).await;
}
// Force-terminate any remaining active handles.
let remaining = self.active_count();
if remaining > 0 {
warn!(count = remaining, "force-terminating MCP servers after grace period");
self.force_terminate_active().await;
}
info!("McpFactory shutdown complete");
}
fn active_count(&self) -> usize {
let active = self.active.lock();
active.values().filter(|w| w.strong_count() > 0).count()
}
async fn force_terminate_active(&self) {
// Walk the active map, upgrade the weak refs, and call cancel
// directly on the underlying service. This is a last resort.
let handles: Vec<Arc<McpServerHandleInner>> = {
let active = self.active.lock();
active.values().filter_map(|w| w.upgrade()).collect()
};
for handle in handles {
if let Ok(inner) = Arc::try_unwrap(handle) {
if let Some(service) = inner.service.into_inner().take() {
service.cancel().await.ok();
}
}
// If try_unwrap fails, we can't force-kill without leaking
// the service. Log and move on.
}
}
}
```
Phase 4's `serve()` function calls `factory.shutdown(grace)` after the axum server has stopped accepting new requests. This chains cleanly: axum drains requests → factory drains scopes → factory drains idle pool → process exits.
---
## Configuration
Add to `config.yaml`:
```yaml
mcp_pool:
idle_timeout_seconds: 300 # how long idle servers stay warm (default: 300 for --serve, 0 for CLI/REPL)
cleanup_interval_seconds: 30 # how often the reaper runs
max_idle_servers: 50 # LRU cap (null = unbounded)
health_check:
interval_seconds: 60
timeout_seconds: 5
on_failure: EvictAndLog # or Evict, LogOnly
```
Per-server overrides live in `functions/mcp.json`:
```json
{
"github": { "command": "...", "idle_timeout_seconds": 900 },
"filesystem": { "command": "...", "idle_timeout_seconds": 60 },
"jira": { "command": "...", "idle_timeout_seconds": 300 }
}
```
The per-server override wins over the global config. The resolution is: look up the server spec, check if it has `idle_timeout_seconds`, use that if present, else use `mcp_pool.idle_timeout_seconds`, else use the mode default (0 for CLI/REPL, 300 for `--serve`).
**Mode defaults** are critical because they preserve Phase 1 Step 6.5's behavior. CLI and REPL users get `idle_timeout = 0`, which means the factory behaves exactly like the no-pool version — drop = terminate. The pool is inert for single-user scenarios. Only `--serve` mode turns it on by default. This avoids regressing REPL users with unexpected changes to MCP subprocess lifetimes.
```rust
pub fn default_idle_timeout(mode: WorkingMode) -> Duration {
match mode {
WorkingMode::Cmd | WorkingMode::Repl => Duration::ZERO,
WorkingMode::Api => Duration::from_secs(300),
}
}
```
---
## Metrics
Phase 5 is the right time to add basic observability counters. They're cheap and the factory is where the interesting operational questions live.
```rust
mod metrics {
use std::sync::atomic::{AtomicU64, Ordering};
pub static MCP_SPAWNED: AtomicU64 = AtomicU64::new(0);
pub static MCP_ACQUIRE_ACTIVE_HIT: AtomicU64 = AtomicU64::new(0);
pub static MCP_ACQUIRE_IDLE_HIT: AtomicU64 = AtomicU64::new(0);
pub static MCP_ACQUIRE_MISS: AtomicU64 = AtomicU64::new(0);
pub static MCP_IDLE_EVICTED: AtomicU64 = AtomicU64::new(0);
pub static MCP_LRU_EVICTED: AtomicU64 = AtomicU64::new(0);
pub static MCP_HEALTH_OK: AtomicU64 = AtomicU64::new(0);
pub static MCP_HEALTH_FAILED: AtomicU64 = AtomicU64::new(0);
pub fn mcp_acquire_hit_active() { MCP_ACQUIRE_ACTIVE_HIT.fetch_add(1, Ordering::Relaxed); }
pub fn mcp_acquire_hit_idle(age: Duration) {
MCP_ACQUIRE_IDLE_HIT.fetch_add(1, Ordering::Relaxed);
// In a real metrics system, record a histogram of age for revival latency.
}
pub fn mcp_acquire_miss() { MCP_ACQUIRE_MISS.fetch_add(1, Ordering::Relaxed); }
pub fn mcp_spawned() { MCP_SPAWNED.fetch_add(1, Ordering::Relaxed); }
pub fn mcp_idle_evicted() { MCP_IDLE_EVICTED.fetch_add(1, Ordering::Relaxed); }
pub fn mcp_lru_evicted() { MCP_LRU_EVICTED.fetch_add(1, Ordering::Relaxed); }
pub fn mcp_health_ok() { MCP_HEALTH_OK.fetch_add(1, Ordering::Relaxed); }
pub fn mcp_health_failed() { MCP_HEALTH_FAILED.fetch_add(1, Ordering::Relaxed); }
pub fn snapshot() -> MetricsSnapshot {
MetricsSnapshot {
spawned: MCP_SPAWNED.load(Ordering::Relaxed),
acquire_active_hit: MCP_ACQUIRE_ACTIVE_HIT.load(Ordering::Relaxed),
acquire_idle_hit: MCP_ACQUIRE_IDLE_HIT.load(Ordering::Relaxed),
acquire_miss: MCP_ACQUIRE_MISS.load(Ordering::Relaxed),
idle_evicted: MCP_IDLE_EVICTED.load(Ordering::Relaxed),
lru_evicted: MCP_LRU_EVICTED.load(Ordering::Relaxed),
health_ok: MCP_HEALTH_OK.load(Ordering::Relaxed),
health_failed: MCP_HEALTH_FAILED.load(Ordering::Relaxed),
}
}
}
```
Expose the snapshot via `GET /v1/info/mcp` in the API server (piggybacks on Phase 4's `/v1/info`). CLI/REPL users can inspect via a new `.info mcp` dot-command.
**Derived metrics worth computing:**
- Hit rate = `(active_hit + idle_hit) / (active_hit + idle_hit + miss)` — should be >0.9 for a well-tuned pool.
- Revival latency distribution — how old were idle entries when revived? Informs tuning of `idle_timeout`.
- Eviction rate — how often is the pool churning?
None of this is Prometheus-compatible yet; that integration is a follow-up. For Phase 5, plain counters are enough to diagnose issues.
---
## Migration Strategy
### Step 1: Expand `McpFactory` to support the idle pool
Add the `idle` map, `shutdown` flag, and `reaper_handle` fields. Keep the existing `active` map. Don't change any caller code yet.
Implement `acquire()` with the three-case logic (active → idle → spawn). At this point the idle pool is always empty because nothing puts anything in it, so the logic reduces to Phase 1's behavior. Tests should still pass.
**Verification:** `cargo check` + existing Phase 1 tests pass.
### Step 2: Implement `Drop` on `McpServerHandleInner` with return-to-idle
Switch `service` to `Mutex<Option<RunningService>>`. Implement `Drop` that spawns a task to call `factory.accept_returning_handle(key, service)`. The factory method inserts into `idle`.
At this point, dropped handles start populating the idle pool. The reaper isn't running yet, so idle entries accumulate without bound.
**Verification:** Manual test: acquire a handle, drop it, assert the idle map now has the entry. Then acquire the same key again and assert it comes from idle (not a fresh spawn).
### Step 3: Implement the reaper task
Add `reaper_loop` and `evict_stale_idle`. Start the reaper in `McpFactory::new()` via `tokio::spawn`, store the `JoinHandle`. Default `idle_timeout` based on working mode.
**Verification:** Unit test with a tiny timeout (e.g., 100ms) — acquire, drop, wait 200ms, assert the idle map is empty. Use a mock MCP server (or a no-op `RunningService` for tests).
### Step 4: Add configuration plumbing
Parse `mcp_pool` from `config.yaml` into `McpFactoryConfig`. Parse per-server `idle_timeout_seconds` overrides from `functions/mcp.json`. Wire everything through `AppState::init()`.
**Verification:** Config tests that verify defaults, overrides, and mode-specific behavior.
### Step 5: Implement health checks
Add `run_health_checks`, `ping_handle`, and the `HealthCheckPolicy` config. Wire into the reaper loop. Default is `None` (disabled).
**Verification:** Unit test with a mock MCP server that returns an error on `list_tools` after N calls — verify the factory evicts it and logs.
### Step 6: Implement graceful shutdown
Add `McpFactory::shutdown(grace)`. Wire into Phase 4's `serve()` shutdown sequence and into the CLI/REPL exit path (for clean subprocess termination).
**Verification:** Start the API server, send several requests to warm up the pool, send SIGTERM, verify all MCP subprocesses terminate within the grace period (use `ps` or process tree inspection).
### Step 7: Expose metrics
Add the atomic counters, the snapshot function, and the `.info mcp` dot-command. Add `GET /v1/info/mcp` handler in the API server.
**Verification:** `.info mcp` shows sensible numbers after a few REPL turns. `/v1/info/mcp` returns JSON. Hit rate climbs over time as the pool warms.
### Step 8: Load testing
Write a test harness that spins up `--serve` mode and fires 100 concurrent completion requests, each using a mix of 2–3 MCP servers, across a pool of 10 different server configurations. Assert:
- No test failures
- No orphaned subprocesses (check `ps` before and after)
- MCP spawn count stays low (hit rate >80%)
- p99 latency for the warm path is <200ms (allowing for LLM latency)
This is the practical validation that Phase 5 delivered on its performance promise.
**Verification:** Load test passes. Metrics snapshot shows expected hit rate.
### Step 9: Document tuning knobs
Update `docs/function-calling/MCP-SERVERS.md` with the new config options and tuning guidance:
- How to choose `idle_timeout` for different workloads
- When to enable health checks
- How to read the metrics
- What the `max_idle_servers` cap protects against
Add an "MCP Pool Lifecycle" section to `docs/REST-API-ARCHITECTURE.md` describing the production topology.
---
## Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| **Drop-race between `acquire` and `return_to_idle`** | Medium | The `tokio::spawn` inside Drop runs asynchronously. If an `acquire(key)` fires between Drop and the spawned task completing, it misses the idle pool and spawns fresh. Acceptable for correctness; monitor hit rate metrics, switch to the mpsc coordinator pattern if races show up in production. |
| **`Arc::try_unwrap` failing in `terminate_idle_handle`** | Medium | If something holds an extra Arc to an idle entry (shouldn't happen under normal flow), `try_unwrap` returns `Err` and we skip eviction. The entry stays in the idle map forever. Mitigation: log every such failure with a WARN. Write a test that verifies the shape never produces such extra refs. |
| **`tokio::time::interval` drift** | Low | `interval` drifts if the system is under load — a tick can be delayed. This means `cleanup_interval` is a lower bound, not a guarantee. For a 30-second interval this is irrelevant; document it. |
| **Reaper task panic** | Medium | If the reaper task panics (unreachable under normal flow, but possible under library bugs), the pool stops cleaning up. Mitigation: monitor the reaper's `JoinHandle` for unexpected completion and restart the task on failure. Add a metric for reaper restarts. |
| **MCP server state on revival** | High | Reviving a server from idle assumes it's still in the same state it was when it went idle. Most MCP servers are stateless (they reload config on each tool call), but some might maintain in-memory state that's stale after 5 minutes of idle. Mitigation: health checks during idle provide an early warning; document that pool idle is only safe for stateless servers. |
| **Credential rotation** | High | If the user rotates their GitHub token (or any MCP-server-side credential), the idle pool entries hold the old credential baked into the subprocess env. A rotation requires restarting affected MCP servers. Mitigation: expose a `.reload mcp` REPL command and `POST /v1/mcp/reload` API that clears the idle pool, forcing fresh spawns with the new credentials on next acquire. |
| **Per-server timeout resolution** | Low | The `idle_timeout` lookup (per-server override → pool default → mode default) happens at `return_to_idle` time. Changing config at runtime won't affect already-idle entries. Document this; config reload flushes idle pool. |
| **`max_idle_servers` thrashing** | Medium | If the cap is set too low relative to the working set, every new `acquire` evicts an old idle entry, destroying the hit rate. Default to 50, document the signal: rising eviction rate + falling hit rate = raise the cap. |
| **Subprocess leak on factory drop** | High | If `AppState` (which owns `McpFactory`) drops without calling `shutdown()`, the Arc handles held in the idle pool are dropped, their Drops run, but the factory's Weak self-ref is already dead so nothing puts them back in idle — they just terminate via `RunningService::drop`. Verify this actually fires cleanly (not via the tokio::spawn hack). Add a test. |
---
## What Phase 5 Does NOT Do
- **No LLM response caching.** The factory pools MCP subprocesses, not LLM responses.
- **No distributed pooling.** A single factory instance owns its pool. Running multiple Loki server instances means each has its own pool; MCP processes are not shared across hosts.
- **No background server restart on crash.** If an MCP subprocess dies while idle, the reaper's health check evicts it; the next `acquire` spawns fresh. There's no "always keep N warm" preflight.
- **No OAuth token refresh for MCP.** If a server uses OAuth and its token expires during an idle period, the next `acquire` gets an expired handle. The server must handle its own refresh, or the user must rotate and `.reload mcp`.
- **No Prometheus integration.** Plain atomic counters; Prometheus support is a follow-up.
- **No adaptive tuning.** `idle_timeout` is a fixed config value, not auto-adjusted based on usage patterns.
- **No cross-process coordination.** Two Loki processes running `--serve` on the same host each have independent pools. They can't share MCP subprocesses across processes.
- **No changes to the factory's public API.** `acquire()` still takes `&McpServerKey`, still returns `McpServerHandle`. Callers don't notice Phase 5 happened.
The sole goal of Phase 5 is: **make the warm path free by keeping recently-used MCP subprocesses alive, with automatic eviction of stale ones, a background reaper, health checks, and graceful shutdown integration.**
---
## Entry Criteria (from Phase 4)
- [ ] API server runs in production-like conditions
- [ ] Concurrent request handling verified by integration tests
- [ ] `McpFactory::acquire()` is the only MCP acquisition path
- [ ] Phase 4's integration test suite passes
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
## Exit Criteria (Phase 5 complete)
- [ ] `McpFactory` has the idle map and reaper task
- [ ] `McpServerHandleInner::Drop` returns handles to the idle pool instead of terminating
- [ ] Reaper evicts idle entries past `idle_timeout`
- [ ] `max_idle_servers` LRU cap enforced
- [ ] Optional health checks working and configurable
- [ ] Per-server `idle_timeout_seconds` overrides parsed and respected
- [ ] Mode-specific defaults (CLI/REPL = 0, API = 300) preserve pre-Phase-5 behavior
- [ ] Graceful shutdown drains the pool within the grace period
- [ ] Metrics counters exposed via `.info mcp` and `GET /v1/info/mcp`
- [ ] Load test shows hit rate >0.8 and no orphaned subprocesses
- [ ] `docs/function-calling/MCP-SERVERS.md` documents the pool config
- [ ] `docs/REST-API-ARCHITECTURE.md` "MCP Pool Lifecycle" section updated
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] Phase 6 (production hardening) can proceed
+744
View File
@@ -0,0 +1,744 @@
# Phase 6 Implementation Plan: Production Hardening
## Overview
Phase 6 closes out the refactor by picking up every "deferred to production hardening" item from Phases 1–5 and delivering a Loki build that's safe to run as a multi-tenant service. The preceding phases made Loki *functionally* a server — Phase 6 makes it *operationally* a server. That means real rate limiting instead of a stub, per-subject session ownership instead of flat visibility, Prometheus metrics instead of in-memory counters, structured JSON logging, deployment manifests, security headers, config validation, and operational runbooks.
This is the final phase. After it lands, Loki v1 is production-ready: you can run `loki --serve` in a container behind a reverse proxy, scrape its metrics from Prometheus, route requests through a rate limiter, and have multiple tenants share the same instance without seeing each other's data.
**Estimated effort:** ~1 week
**Risk:** Low. Most of the work is applying well-known patterns (sliding-window rate limiting, row-level authz, Prometheus, structured logging) on top of the architecture the previous phases already built. No new core types, no new pipelines.
**Depends on:** Phases 1–5 complete. The API server runs, MCP pool works, sessions are UUID-keyed.
---
## Why Phase 6 Exists
Phases 4 and 5 got the API server running with correct semantics, but several explicit gaps were called out as "stubs" or "follow-ups." A Phase 4 deployment is usable for a trusted single-tenant context (an internal tool, a personal server) but unsafe for anything else:
- **Anyone with a valid API key can see every session.** Phase 4 flagged this as "single-tenant-per-key." In a multi-tenant deployment where Alice and Bob both have keys, Alice can list Bob's sessions and read their messages. This is a security issue, not a feature gap.
- **No real rate limiting.** Phase 4's `max_concurrent_requests` semaphore caps parallelism but doesn't throttle per-subject request rates. A single runaway client can exhaust the whole concurrency budget.
- **No metrics for external observability.** Phase 5 added in-memory counters, but they're only reachable via the `.info mcp` dot-command or a one-shot JSON endpoint. Production needs Prometheus scraping so alerting and dashboards work.
- **Logs aren't structured.** The `tracing` spans from Phase 4 middleware emit human-readable text. Aggregators like Loki (the other one), Datadog, or CloudWatch want JSON with correlation IDs.
- **No deployment story.** There's no Dockerfile, no systemd unit, no documented way to actually run the thing in production. Every deploying team has to reinvent this.
- **Security headers missing.** Phase 4's CORS handles cross-origin; it doesn't set `X-Content-Type-Options`, `X-Frame-Options`, or similar defaults that a browser-facing endpoint should have.
- **No config validation at startup.** Mistyped config values produce runtime errors hours after deployment instead of failing fast at startup.
- **Operational procedures are undocumented.** How do you rotate auth keys? How do you reload MCP credentials? What's the runbook when the MCP hit rate drops? None of this is written down.
Phase 6 delivers answers to all of the above. It's the "you can actually deploy this" phase.
---
## What Phase 6 Delivers
Grouped by theme rather than by dependency order. Each item is independently valuable and can be worked in parallel.
### Security and isolation
1. **Per-subject session ownership** — every session records the authenticated subject that created it; reads/writes are authz-checked against the caller's subject.
2. **Scope-based authorization** — `AuthContext.scopes` are enforced per endpoint (e.g., `read:sessions`, `write:sessions`, `admin:mcp`). Phase 4's middleware already populates scopes; Phase 6 adds the enforcement.
3. **JWT support** — extends `AuthConfig` with a `Jwt { issuer, audience, jwks_url }` variant that validates tokens against a JWKS endpoint and extracts subject + scopes from claims.
4. **Security headers middleware** — `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin`, optional HSTS when behind HTTPS.
5. **Audit logging** — structured audit events for every authenticated request (subject, action, target, result), written to a dedicated sink so they survive log rotation.
### Throughput and fairness
6. **Per-subject rate limiting** — sliding-window limiter keyed by subject. Enforces `rate_limit_per_minute` and related config. Returns `429 Too Many Requests` with a `Retry-After` header.
7. **Per-subject concurrency limit** — subject-scoped semaphore so one noisy neighbor can't exhaust the global concurrency budget.
8. **Backpressure signal** — expose a `/healthz/ready` endpoint that returns 503 when the server is saturated, so upstream load balancers can drain traffic.
### Observability
9. **Structured JSON logging** — every log line is JSON with `timestamp`, `level`, `target`, `request_id`, `subject`, `session_id`, and `fields`. Routes through `tracing_subscriber` with `fmt::layer().json()`.
10. **Prometheus metrics endpoint** — `/metrics` exposing the existing Phase 5 counters plus new HTTP metrics (`http_requests_total`, `http_request_duration_seconds`, `http_requests_in_flight`), MCP metrics (`mcp_pool_size`, `mcp_acquire_latency_seconds` histogram), and session metrics (`sessions_active_total`, `sessions_created_total`).
11. **Liveness and readiness probes** — `/healthz/live` for process liveness (always 200 unless shutting down), `/healthz/ready` for dependency readiness (config loaded, MCP pool initialized, storage writable).
### Operability
12. **Config validation at startup** — a dedicated `ApiConfig::validate()` that checks every field against a schema and fails fast with a readable error message listing *all* problems, not just the first one.
13. **SIGHUP config reload** — reloads auth keys, log level, and rate limit settings without restarting the server. Does NOT reload MCP pool config (requires restart because the pool holds live subprocesses).
14. **Dockerfile + multi-stage build** — minimal runtime image based on `debian:bookworm-slim` with the compiled binary, config directory, and non-root user.
15. **systemd service unit** — with `Type=notify`, sandboxing directives, and resource limits.
16. **docker-compose example** — for local development with nginx-as-TLS-terminator in front.
17. **Kubernetes manifests** — Deployment, Service, ConfigMap, Secret, HorizontalPodAutoscaler.
### Documentation
18. **Operational runbook** (`docs/RUNBOOK.md`) — documented procedures for common scenarios.
19. **Deployment guide** (`docs/DEPLOYMENT.md`) — end-to-end instructions for each deployment target.
20. **Security guide** (`docs/SECURITY.md`) — threat model, hardening checklist, key rotation procedures.
---
## Core Type Additions
Most of Phase 6 hangs off existing types. A few new concepts need introducing.
### `AuthContext` enrichment
Phase 4 defined `AuthContext { subject: String, scopes: Vec<String> }`. Phase 6 extends it:
```rust
pub struct AuthContext {
pub subject: String,
pub scopes: Scopes,
pub key_id: Option<String>, // for audit log correlation
pub claims: Option<JwtClaims>, // present when auth mode is Jwt
}
pub struct Scopes(HashSet<String>);
impl Scopes {
pub fn has(&self, scope: &str) -> bool;
pub fn has_any(&self, required: &[&str]) -> bool;
pub fn has_all(&self, required: &[&str]) -> bool;
}
pub enum Scope {
ReadSessions, // "read:sessions"
WriteSessions, // "write:sessions"
ReadAgents, // "read:agents"
RunAgents, // "run:agents"
ReadModels, // "read:models"
AdminMcp, // "admin:mcp"
AdminSessions, // "admin:sessions" — can see all users' sessions
}
```
The `Scope` enum provides typed constants for the well-known scope strings used in the handlers. Custom scopes (for callers to define their own access tiers) continue to work as raw strings.
### `SessionOwnership` in the session store
The session metadata needs to record who owns each session so reads/writes can be authorized:
```rust
pub struct SessionMeta {
pub id: SessionId,
pub alias: Option<SessionAlias>,
pub owner: Option<String>, // subject that created it; None = legacy
pub last_modified: SystemTime,
pub is_autoname: bool,
}
```
On disk, the ownership field goes into the session's YAML file under a reserved `_meta` block:
```yaml
_meta:
owner: "alice"
created_at: "2026-04-10T15:32:11Z"
created_by_key_id: "key_3f2a..."
# ... rest of session fields unchanged
```
The `SessionStore` trait gets two new methods and an enriched `open` signature:
```rust
#[async_trait]
pub trait SessionStore: Send + Sync {
// existing methods unchanged except:
async fn open(
&self,
agent: Option<&str>,
id: SessionId,
caller: Option<&AuthContext>, // NEW: for authz check
) -> Result<SessionHandle, StoreError>;
async fn list(
&self,
agent: Option<&str>,
caller: Option<&AuthContext>, // NEW: for filtering
) -> Result<Vec<SessionMeta>, StoreError>;
// NEW: transfer ownership (e.g., admin reassignment)
async fn set_owner(
&self,
id: SessionId,
new_owner: Option<String>,
) -> Result<(), StoreError>;
}
```
`caller: None` means internal or legacy access (CLI/REPL) — skip authz entirely. `caller: Some(...)` means an API call — enforce ownership.
**Authz rules:**
- Own session: full access.
- Other subject's session: denied unless caller has `admin:sessions` scope.
- Legacy sessions with `owner: None`: accessible to anyone (grandfathered); every mutation attempts to set the owner to the current caller, so ownership is claimed going forward.
- `list`: returns only sessions owned by the caller (or all if they have `admin:sessions`).
### `RateLimiter` and `ConcurrencyLimiter`
```rust
pub struct RateLimiter {
windows: DashMap<String, SlidingWindow>,
config: RateLimitConfig,
}
struct SlidingWindow {
bucket_a: AtomicU64,
bucket_b: AtomicU64,
last_reset: AtomicU64,
}
pub struct RateLimitConfig {
pub per_minute: u32,
pub burst: u32,
}
impl RateLimiter {
pub fn check(&self, subject: &str) -> Result<(), RateLimitError>;
}
pub struct RateLimitError {
pub retry_after: Duration,
pub limit: u32,
pub remaining: u32,
}
pub struct SubjectConcurrencyLimiter {
semaphores: DashMap<String, Arc<Semaphore>>,
per_subject: usize,
}
impl SubjectConcurrencyLimiter {
pub async fn acquire(&self, subject: &str) -> OwnedSemaphorePermit;
}
```
Both live in `ApiState` and are applied via middleware. Rate limiting runs first (cheap atomic operations), then concurrency acquisition (may block briefly).
### `MetricsRegistry`
```rust
pub struct MetricsRegistry {
pub http_requests_total: IntCounterVec,
pub http_request_duration: HistogramVec,
pub http_requests_in_flight: IntGaugeVec,
pub sessions_active: IntGauge,
pub sessions_created_total: IntCounter,
pub mcp_pool_size: IntGaugeVec,
pub mcp_acquire_latency: HistogramVec,
pub mcp_spawns_total: IntCounter,
pub mcp_idle_evictions_total: IntCounter,
pub auth_failures_total: IntCounterVec,
pub rate_limit_rejections_total: IntCounterVec,
}
```
Built on top of the `prometheus` crate. Exposed via `GET /metrics` with the Prometheus text exposition format. The registry bridges Phase 5's atomic counters into the Prometheus types without requiring Phase 5's code to change — Phase 5 keeps its simple counters, and Phase 6 reads them on each scrape to populate the Prometheus gauges.
### `AuditLogger`
```rust
pub struct AuditLogger {
sink: AuditSink,
}
pub enum AuditSink {
Stderr, // default
File { path: PathBuf, rotation: Rotation },
Syslog { facility: String },
}
pub struct AuditEvent<'a> {
pub timestamp: OffsetDateTime,
pub request_id: Uuid,
pub subject: Option<&'a str>,
pub action: AuditAction,
pub target: Option<&'a str>,
pub result: AuditResult,
pub details: Option<serde_json::Value>,
}
pub enum AuditAction {
SessionCreate,
SessionRead,
SessionUpdate,
SessionDelete,
AgentActivate,
ToolExecute,
McpReload,
ConfigReload,
AuthFailure,
RateLimitRejection,
}
pub enum AuditResult {
Success,
Denied { reason: String },
Error { message: String },
}
impl AuditLogger {
pub fn log(&self, event: AuditEvent<'_>);
}
```
Audit events are emitted from handler middleware after request completion. The audit stream is deliberately separate from the regular tracing logs because audit logs have stricter retention/integrity requirements in regulated environments — you want to be able to pipe them to a WORM storage or SIEM without mixing in debug logs.
---
## Migration Strategy
### Step 1: Per-subject session ownership
The highest-impact security fix. No new deps, no new config — just enriching existing types.
1. Add `owner: Option<String>` and `created_by_key_id: Option<String>` to the session YAML `_meta` block. Serde skip if absent (backward compat for legacy files).
2. Update `SessionStore::create` to record the caller's subject.
3. Update `SessionStore::open` to take `caller: Option<&AuthContext>` and enforce ownership.
4. Update `SessionStore::list` to filter by caller subject (unless caller has `admin:sessions` scope).
5. Add `SessionStore::set_owner` for admin reassignment.
6. Implement the "claim on first mutation" behavior for legacy sessions.
7. Update all API handlers to pass the `AuthContext` through to store calls.
8. Add integration tests: Alice creates a session, Bob tries to read it (403), admin Claire can read it (200), Alice's `list` returns only her own, Claire's `list` with `admin:sessions` scope returns everything.
**Verification:** all new authz tests pass. CLI/REPL tests still pass because they pass `caller: None`.
### Step 2: Scope-based authorization for endpoints
Phase 4's middleware attaches `AuthContext` with a `scopes: Vec<String>` field but handlers don't check it. Phase 6 adds the enforcement.
1. Change `AuthContext.scopes` from `Vec<String>` to a `Scopes(HashSet<String>)` newtype with `has`/`has_any`/`has_all` methods.
2. Define the `Scope` enum with well-known constants.
3. Add a `require_scope` helper and a `#[require_scope("read:sessions")]` proc macro (or a handler-side check if proc macros add too much complexity).
4. Annotate every handler with the required scope(s):
- `GET /v1/sessions` → `read:sessions`
- `POST /v1/sessions` → `write:sessions`
- `GET /v1/sessions/:id` → `read:sessions`
- `DELETE /v1/sessions/:id` → `write:sessions`
- `POST /v1/sessions/:id/completions` → `write:sessions` + `run:agents` (if the session has an agent)
- `POST /v1/rags/:name/rebuild` → `admin:mcp`
- `GET /v1/agents`, `/v1/roles`, `/v1/rags`, `/v1/models` → `read:agents`, `read:roles`, etc.
- `/metrics` → `admin:metrics` (or unauthenticated if the endpoint is bound to a private network)
5. Document the scope model in `docs/SECURITY.md`.
**Verification:** per-endpoint authz tests. A key with only `read:sessions` can list and read but not write.
### Step 3: JWT support in `AuthConfig`
Extend the auth mode enum:
```rust
pub enum AuthConfig {
Disabled,
StaticKeys { keys: Vec<AuthKeyEntry> },
Jwt(JwtConfig),
}
pub struct JwtConfig {
pub issuer: String,
pub audience: String,
pub jwks_url: String,
pub jwks_refresh_interval: Duration,
pub subject_claim: String, // e.g., "sub"
pub scopes_claim: String, // e.g., "scope" or "permissions"
pub leeway_seconds: u64,
}
```
1. Add `jsonwebtoken` and `reqwest` (already present) to dependencies.
2. Implement a `JwksCache` that fetches `jwks_url` on startup and refreshes every `jwks_refresh_interval`. Uses `reqwest` with a short timeout. Refreshes in the background via `tokio::spawn`.
3. The auth middleware branches on `AuthConfig`: `StaticKeys` continues to work, `Jwt` calls `jsonwebtoken::decode` with the cached JWKS.
4. Extract subject from the configured claim name. Extract scopes from either a space-separated string (`scope` claim) or an array claim (`permissions`).
5. Handle key rotation gracefully: if decoding fails with "unknown key ID," trigger an immediate JWKS refresh (debounced to once per minute) and retry once.
6. Integration tests with a fake JWKS endpoint (use `mockito` or `wiremock`).
**Verification:** valid JWT authenticates; expired JWT rejected; invalid signature rejected; JWKS refresh handles key rotation.
### Step 4: Real rate limiting
Replace the Phase 4 stub with a working sliding-window implementation.
1. Add `dashmap` dependency for the per-subject map (lock-free reads/writes).
2. Implement `SlidingWindow` with two adjacent one-minute buckets; the effective rate is the weighted sum of the current bucket plus the tail of the previous bucket based on how far into the current window we are.
3. Add `RateLimiter::check(subject) -> Result<(), RateLimitError>`.
4. Write middleware that calls `check` before dispatching to handlers. On `Err`, return 429 with `Retry-After` header.
5. Add `rate_limit_per_minute` and `rate_limit_burst` config fields. Reasonable defaults: 60/min, burst 10.
6. Expose per-subject current rate as a gauge in the Prometheus registry.
7. Integration test: fire N+1 requests as the same subject within a minute, assert the N+1th gets 429.
**Verification:** rate limiting works correctly across subjects; non-limited subjects aren't affected; burst allowance works.
### Step 5: Per-subject concurrency limiter
Complements rate limiting — rate limits the *count* of requests over time, concurrency limits the *simultaneous* count.
1. Implement `SubjectConcurrencyLimiter` with a `DashMap<String, Arc<Semaphore>>`.
2. Lazy-init semaphores per subject with `per_subject_concurrency` slots (default 8).
3. Middleware acquires a permit per request. If the subject's semaphore is full, queue briefly (`try_acquire_owned` with a short timeout), then 503 if still full.
4. Garbage-collect unused semaphores periodically (entries with no waiters and all permits available haven't been used recently and can be dropped).
5. Integration test: fire 10 concurrent requests as one subject with `per_subject_concurrency: 5`, assert at least 5 serialize.
**Verification:** no subject can exceed its concurrency budget; other subjects unaffected.
### Step 6: Prometheus metrics endpoint
1. Add `prometheus` crate.
2. Implement `MetricsRegistry` with the metrics listed in the types section.
3. Wire metric updates into existing code:
- HTTP middleware: `http_requests_total.inc()` on response, `http_request_duration.observe(elapsed)`, `http_requests_in_flight.inc()/dec()`
- Session creation: `sessions_created_total.inc()`, `sessions_active.set(store.count())`
- MCP factory: read the Phase 5 atomic counters on scrape and populate the Prometheus types
4. Add `GET /metrics` handler that writes the Prometheus text exposition format.
5. Auth policy for `/metrics`: configurable — either requires `admin:metrics` scope, or is opened to a private network via `metrics_listen_addr: "127.0.0.1:9090"` on a separate port (recommended).
6. Integration test: scrape `/metrics`, parse the response, assert expected metrics are present with sensible values.
**Verification:** Prometheus scraping works; metrics increment correctly.
### Step 7: Structured JSON logging
Replace the default `tracing_subscriber` format with JSON output.
1. Add a `log_format: Text | Json` config field, default `Text` for CLI/REPL, `Json` for `--serve` mode.
2. Configure `tracing_subscriber::fmt::layer().json()` conditionally.
3. Ensure every span has a `request_id` field (already present from Phase 4 middleware).
4. Add `subject` and `session_id` as span fields when present, so they get included in every child log line automatically.
5. Add a `log_level` config field that SIGHUP reloads at runtime (see Step 12).
6. Integration test: capture stdout during a request, parse as JSON, assert the fields are present and correctly scoped.
**Verification:** `loki --serve` produces one-line-per-event JSON output suitable for log aggregators.
### Step 8: Audit logging
Dedicated sink for security-relevant events.
1. Implement `AuditLogger` with `Stderr`, `File`, and `Syslog` sinks. Start with just `Stderr` and `File`; `Syslog` via the `syslog` crate can follow.
2. Emit audit events from:
- Auth middleware: `AuditAction::AuthFailure` on any auth rejection
- Rate limiter: `AuditAction::RateLimitRejection` on 429
- Session handlers: `AuditAction::SessionCreate/Read/Update/Delete`
- Agent handlers: `AuditAction::AgentActivate`
- MCP reload endpoint: `AuditAction::McpReload`
3. Audit events are JSON lines with a schema documented in `docs/SECURITY.md`.
4. Audit events don't interfere with the main tracing stream — they go to the configured audit sink independently.
5. File rotation via `tracing-appender` or manual rotation with size + date cap.
**Verification:** every security-relevant action produces an audit event; failures include a `reason`.
### Step 9: Security headers and misc middleware
1. Add a `security_headers` middleware layer that attaches:
- `X-Content-Type-Options: nosniff`
- `X-Frame-Options: DENY`
- `Referrer-Policy: strict-origin-when-cross-origin`
- `Strict-Transport-Security: max-age=31536000; includeSubDomains` (only when `api.force_https: true`)
- Do NOT set CSP — this is an API, not a browser app; CSP would confuse clients.
2. Remove `Server: ...` and other fingerprinting headers.
3. Handle `OPTIONS` preflight correctly (Phase 4's CORS layer does this; verify).
**Verification:** `curl -I` inspects headers; automated test asserts each required header is present.
### Step 10: Config validation at startup
A single `ApiConfig::validate()` method that checks every field and aggregates ALL errors before failing.
1. Implement validation for:
- `listen_addr` is parseable and bindable
- `auth.mode` has a valid configuration (e.g., `StaticKeys` with non-empty key list, `Jwt` with reachable JWKS URL)
- `auth.keys[].key_hash` starts with `$argon2id$` (catches plaintext keys)
- `rate_limit_per_minute > 0` and `burst > 0`
- `max_body_bytes > 0` and `< 100 MiB` (sanity)
- `request_timeout_seconds > 0` and `< 3600`
- `shutdown_grace_seconds >= 0`
- `cors.allowed_origins` entries are valid URLs or `"*"`
2. Return a `ConfigValidationError` that lists every problem, not just the first.
3. Call `validate()` in `serve()` before binding the listener.
4. Test: a deliberately-broken config produces an error listing all problems.
**Verification:** startup validation catches common mistakes; error message is actionable.
### Step 11: Health check endpoints
1. `GET /healthz/live` — always returns 200 OK unless the process is in graceful shutdown. Body: `{"status":"ok"}`. No auth required.
2. `GET /healthz/ready` — returns 200 OK when fully initialized and not saturated, otherwise 503 Service Unavailable. Readiness criteria:
- `AppState` fully initialized
- Session store writable (attempt a probe write to a reserved path)
- MCP pool initialized (at least the factory is alive)
- Concurrency semaphore has at least 10% available (not saturated)
3. Both endpoints are unauthenticated and unmetered — load balancers hit them constantly.
4. Document in `docs/DEPLOYMENT.md` how Kubernetes, systemd, and other supervisors should use these.
**Verification:** endpoints return correct status under various load conditions.
### Step 12: SIGHUP config reload
Reload a subset of config without restarting.
1. Reloadable fields:
- Auth keys (StaticKeys mode)
- JWT config (including JWKS URL)
- Log level
- Rate limit config
- Per-subject concurrency limits
- Audit logger sink
2. NOT reloadable (requires full restart):
- Listen address
- MCP pool config (pool holds live subprocesses)
- Session storage paths
- TLS certs (use a reverse proxy)
3. Implementation: SIGHUP handler that re-reads `config.yaml`, validates it, and atomically swaps the affected fields in `ApiState`. Uses `arc-swap` crate for lock-free swaps.
4. Audit every reload: `AuditAction::ConfigReload` with before/after diff summary.
5. Document: rotation procedures for auth keys, logging level adjustments, etc.
**Verification:** start server, modify `config.yaml`, send SIGHUP, assert new config is in effect without dropped requests.
### Step 13: Deployment manifests
#### 13a. Dockerfile
Multi-stage build for a minimal runtime image:
```dockerfile
# Build stage
FROM rust:1.82-slim AS builder
WORKDIR /build
COPY Cargo.toml Cargo.lock ./
COPY src ./src
COPY assets ./assets
RUN cargo build --release --bin loki
# Runtime stage
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
tini \
&& rm -rf /var/lib/apt/lists/*
RUN useradd --system --home /loki --shell /bin/false loki
COPY --from=builder /build/target/release/loki /usr/local/bin/loki
COPY --from=builder /build/assets /opt/loki/assets
USER loki
WORKDIR /loki
ENV LOKI_CONFIG_DIR=/loki/config
EXPOSE 3400
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/usr/local/bin/loki", "--serve"]
```
Build args for targeting specific architectures. Result is a ~100 MB image.
#### 13b. systemd unit
```ini
[Unit]
Description=Loki AI Server
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/local/bin/loki --serve
Restart=on-failure
RestartSec=5
User=loki
Group=loki
# Sandboxing
NoNewPrivileges=true
PrivateTmp=true
PrivateDevices=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/loki
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictRealtime=true
LockPersonality=true
# Resource limits
LimitNOFILE=65536
LimitNPROC=512
MemoryMax=4G
# Reload
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
```
`Type=notify` requires Loki to call `sd_notify(READY=1)` after successful startup — add this with the `sd-notify` crate.
#### 13c. docker-compose example
For local development with TLS via nginx:
```yaml
version: "3.9"
services:
loki:
build: .
environment:
LOKI_CONFIG_DIR: /loki/config
volumes:
- ./config:/loki/config:ro
- loki_data:/loki/data
ports:
- "127.0.0.1:3400:3400"
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3400/healthz/live"]
interval: 30s
timeout: 5s
retries: 3
nginx:
image: nginx:alpine
volumes:
- ./deploy/nginx.conf:/etc/nginx/nginx.conf:ro
- ./deploy/certs:/etc/nginx/certs:ro
ports:
- "443:443"
depends_on:
- loki
volumes:
loki_data:
```
Include a sample `nginx.conf` that terminates TLS and forwards to `loki:3400`.
#### 13d. Kubernetes manifests
Provide `deploy/k8s/` with:
- `namespace.yaml`
- `deployment.yaml` (3 replicas, resource requests/limits, liveness/readiness probes)
- `service.yaml` (ClusterIP)
- `configmap.yaml` (non-secret config)
- `secret.yaml` (API keys, JWT config)
- `hpa.yaml` (HorizontalPodAutoscaler based on CPU + custom metric for requests/sec)
- `ingress.yaml` (optional example using nginx-ingress)
Document storage strategy: sessions use a PVC mounted at `/loki/data`; RAG embeddings use a read-only ConfigMap or a separate PVC.
**Verification:** each deployment target produces a running Loki that passes health checks.
### Step 14: Operational runbook
Write `docs/RUNBOOK.md` with sections for:
- **Starting and stopping** the server
- **Rotating auth keys** (StaticKeys mode) — edit config, SIGHUP, verify in audit log
- **Rotating auth keys** (Jwt mode) — update JWKS at issuer, Loki auto-refreshes
- **Rotating MCP credentials** — update env vars, `POST /v1/mcp/reload` (new endpoint in this phase) or restart
- **Diagnosing high latency** — check MCP hit rate, check LLM provider latency, check concurrency saturation
- **Diagnosing auth failures** — audit log `AuthFailure` events, check key hash, check JWKS reachability
- **Diagnosing rate limit rejections** — check per-subject counter, adjust limit or identify runaway client
- **Diagnosing orphaned MCP subprocesses** — `ps aux | grep loki`, check logs for `McpFactory shutdown complete`
- **Diagnosing session corruption** — check `.yaml.tmp` files (should not exist when server is idle), inspect session YAML for validity
- **Backup and restore** — tar the `sessions/` and `agents/` directories
- **Scaling horizontally** — each replica has its own MCP pool and session store; share sessions via shared filesystem (NFS/EFS) or deferred to a database-backed SessionStore (not in this phase)
- **Incident response** — what logs to collect, what metrics to snapshot, how to reach a minimal reproducing state
**Verification:** walk through each procedure on a test deployment; fix any unclear steps.
### Step 15: Deployment and security guides
`docs/DEPLOYMENT.md` — step-by-step for Docker, systemd, docker-compose, Kubernetes. Pre-flight checklist, first-time setup, upgrade procedure.
`docs/SECURITY.md` — threat model, hardening checklist, scope model, audit event schema, key rotation, reverse proxy configuration, network security recommendations, CVE reporting contact.
Cross-reference from `README.md` and add a "Production Deployment" section to the README that points to both docs.
**Verification:** a developer unfamiliar with Loki can deploy it successfully using only the docs.
---
## Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| **Session ownership migration breaks legacy users** | Medium | Legacy sessions with `owner: None` stay readable by anyone; they get claimed forward on first mutation. Document this in `RUNBOOK.md`. Add a one-shot migration CLI command (`loki migrate sessions --claim-to <subject>`) that assigns ownership of all unowned sessions to a specific subject. |
| **JWT JWKS fetch failures block startup** | Medium | JWKS URL must be reachable at startup; if it's not, log an error and fall back to "reject all" mode until the fetch succeeds. A retry loop with exponential backoff runs in the background. Do NOT crash on JWKS failure. |
| **Rate limiter DashMap growth** | Low | Per-subject windows accumulate forever without cleanup. Add a background reaper that removes entries with zero recent activity every few minutes. Cap total entries at 100k as a safety valve. |
| **Prometheus metric cardinality explosion** | Low | `http_requests_total` with per-path labels could explode if routes have dynamic segments (`/v1/sessions/:id`). Use route templates as labels, not concrete paths. Validate label sets at registration. |
| **Audit log retention compliance** | Low | Audit logs might need to be retained for regulatory reasons. Phase 6 provides the emission; retention is the operator's responsibility. Document this in `SECURITY.md`. |
| **SIGHUP reload partial failure** | Medium | If the new config is invalid, don't swap it in — keep the old config running. Log the validation error. The operator can fix the file and SIGHUP again. Never leave the server in an inconsistent state. |
| **Docker image size** | Low | `debian:bookworm-slim` is ~80 MB; final image ~100 MB. If smaller is needed, use `distroless/cc-debian12` for a ~35 MB image at the cost of not having `tini` or debugging tools. Document both options. |
| **systemd Type=notify missing implementation** | Medium | Adding `sd_notify` requires the `sd-notify` crate AND calling it after listener bind. Missing this call makes systemd think the service failed. Add an integration test that fakes systemd and asserts the notification is sent. |
| **Kubernetes pod disruption** | Low | HPA scales down during low traffic, but in-flight requests on the terminating pod must complete gracefully. Set `terminationGracePeriodSeconds` to at least `shutdown_grace_seconds + 10`. Document in `DEPLOYMENT.md`. |
| **Running under a reverse proxy** | Low | CORS, `Host` header handling, `X-Forwarded-For` for rate limiter subject identification. Document the expected proxy config (trust `X-Forwarded-*` headers only from trusted proxies). |
---
## What Phase 6 Does NOT Do
- **No multi-region replication.** Loki is a single-instance service; scale out by running multiple instances behind a load balancer, each with its own pool. Cross-instance state sharing is not in scope.
- **No database-backed session store.** `FileSessionStore` is still the only implementation. A `PostgresSessionStore` is a clean extension point (`SessionStore` trait is already there) but belongs to a follow-up.
- **No cluster coordination.** Each Loki instance is independent. Running Loki in a "cluster" mode where instances share work is a separate project.
- **No advanced ML observability.** LLM call costs, token usage trends, provider error rates — these are tracked as counters but not aggregated into dashboards. Follow-up work.
- **No built-in TLS termination.** Use a reverse proxy (nginx, Caddy, Traefik, a cloud load balancer). Supporting TLS in-process adds complexity and key management concerns that reverse proxies solve better.
- **No SAML or LDAP.** Only StaticKeys and JWT. SAML/LDAP integration can extend `AuthConfig` later.
- **No plugin system.** Extensions to auth, storage, or middleware require forking and rebuilding. A dynamic plugin loader is explicitly out of scope.
- **No multi-tenancy beyond session ownership.** Tenants share the same process, same MCP pool, same RAG cache, same resources. Strict tenant isolation (separate processes per tenant) requires orchestration outside Loki.
- **No cost accounting per tenant.** LLM API calls are tracked per-subject in audit logs but not aggregated into billing-grade cost reports.
---
## Entry Criteria (from Phase 5)
- [ ] `McpFactory` pooling works and has metrics
- [ ] Graceful shutdown drains the MCP pool
- [ ] Phase 5 load test passes (hit rate >0.8, no orphaned subprocesses)
- [ ] Phase 4 API integration test suite passes
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
## Exit Criteria (Phase 6 complete — v1 ready)
- [ ] Per-subject session ownership enforced; integration tests prove Alice can't read Bob's sessions
- [ ] Scope-based authorization enforced on every endpoint
- [ ] JWT authentication works with a real JWKS endpoint
- [ ] Real rate limiting replaces the Phase 4 stub; 429 responses include `Retry-After`
- [ ] Per-subject concurrency limiter prevents noisy-neighbor saturation
- [ ] Prometheus `/metrics` endpoint scrapes cleanly
- [ ] Structured JSON logs emitted in `--serve` mode
- [ ] Audit events written for all security-relevant actions
- [ ] Security headers set on all responses
- [ ] Config validation fails fast at startup with readable errors
- [ ] `/healthz/live` and `/healthz/ready` endpoints work
- [ ] SIGHUP reloads auth keys, log level, and rate limits without restart
- [ ] Dockerfile produces a minimal runtime image
- [ ] systemd unit with `Type=notify` works correctly
- [ ] docker-compose example runs end-to-end with TLS via nginx
- [ ] Kubernetes manifests deploy successfully
- [ ] `docs/RUNBOOK.md` covers all common operational scenarios
- [ ] `docs/DEPLOYMENT.md` guides a first-time deployer to success
- [ ] `docs/SECURITY.md` documents threat model, scopes, and hardening
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] End-to-end production smoke test: deploy to Kubernetes, send real traffic, scrape metrics, rotate a key, induce a failure, observe recovery
---
## v1 Release Summary
After Phase 6 lands, Loki v1 has transformed from a single-user CLI tool into a production-ready multi-tenant AI service. Here's what the v1 release notes should say:
**New in Loki v1:**
- **REST API** — full HTTP surface for completions, sessions, agents, roles, RAGs, and metadata. Streaming via Server-Sent Events, synchronous via JSON.
- **Multi-tenant sessions** — UUID-primary identity with optional human-readable aliases. Per-subject ownership with scope-based access control.
- **Concurrent safety** — per-session mutex serialization, per-MCP-server Arc sharing, per-agent runtime isolation. Run dozens of concurrent requests without corruption.
- **MCP pooling** — recently-used MCP subprocesses stay warm across requests. Near-zero warm-path latency. Configurable idle timeout and LRU cap.
- **Authentication** — static API keys or JWT with JWKS. Argon2-hashed credentials. Scope-based authorization per endpoint.
- **Observability** — Prometheus metrics, structured JSON logging with correlation IDs, dedicated audit log stream.
- **Rate limiting** — sliding-window per subject with configurable limits and burst allowance.
- **Graceful shutdown** — in-flight requests complete within a grace period; MCP subprocesses terminate cleanly; session state is persisted.
- **Deployment manifests** — Dockerfile, systemd unit, docker-compose example, Kubernetes manifests.
- **Full documentation** — runbook, deployment guide, security guide, API reference.
**Backward compatibility:**
CLI and REPL continue to work identically to pre-v1 builds. Existing `config.yaml`, `roles/`, `sessions/`, `agents/`, `rags/`, and `functions/` directories are read-compatible. The legacy session layout is migrated lazily on first access without destroying the old files.
**What's next (v2+):**
- Database-backed session store for cross-instance sharing
- Native TLS termination option
- SAML / LDAP authentication extensions
- Per-tenant cost accounting and quotas
- Dynamic plugin system for custom auth, storage, and middleware
- Multi-region replication
- WebSocket transport alongside SSE
+14 -7
View File
@@ -23,6 +23,7 @@ You can enter the REPL by simply typing `loki` without any follow-up flags or ar
- [`.edit` - Modify configuration files](#edit---modify-configuration-files) - [`.edit` - Modify configuration files](#edit---modify-configuration-files)
- [`.delete` - Delete configurations from Loki](#delete---delete-configurations-from-loki) - [`.delete` - Delete configurations from Loki](#delete---delete-configurations-from-loki)
- [`.info` - Display information about the current mode](#info---display-information-about-the-current-mode) - [`.info` - Display information about the current mode](#info---display-information-about-the-current-mode)
- [`.authenticate` - Authenticate the current model client via OAuth](#authenticate---authenticate-the-current-model-client-via-oauth)
- [`.exit` - Exit an agent/role/session/rag or the Loki REPL itself](#exit---exit-an-agentrolesessionrag-or-the-loki-repl-itself) - [`.exit` - Exit an agent/role/session/rag or the Loki REPL itself](#exit---exit-an-agentrolesessionrag-or-the-loki-repl-itself)
- [`.help` - Show the help guide](#help---show-the-help-guide) - [`.help` - Show the help guide](#help---show-the-help-guide)
<!--toc:end--> <!--toc:end-->
@@ -119,13 +120,14 @@ For more information on sessions and how to use them in Loki, refer to the [sess
Loki lets you build OpenAI GPT-style agents. The following commands let you interact with and manage your agents in Loki lets you build OpenAI GPT-style agents. The following commands let you interact with and manage your agents in
Loki: Loki:
| Command | Description | | Command | Description |
|----------------------|------------------------------------------------------------| |----------------------|-----------------------------------------------------------------------------------------------|
| `.agent` | Use an agent | | `.agent` | Use an agent |
| `.starter` | Display and use conversation starters for the active agent | | `.starter` | Display and use conversation starters for the active agent |
| `.edit agent-config` | Open the agent configuration in your preferred text editor | | `.clear todo` | Clear the todo list and stop auto-continuation (requires `auto_continue: true` on the agent) |
| `.info agent` | Display information about the active agent | | `.edit agent-config` | Open the agent configuration in your preferred text editor |
| `.exit agent` | Leave the active agent | | `.info agent` | Display information about the active agent |
| `.exit agent` | Leave the active agent |
![agent](./images/agents/sql.gif) ![agent](./images/agents/sql.gif)
@@ -237,6 +239,11 @@ The following entities are supported:
| `.info agent` | Display information about the active agent | | `.info agent` | Display information about the active agent |
| `.info rag` | Display information about the active RAG | | `.info rag` | Display information about the active RAG |
### `.authenticate` - Authenticate the current model client via OAuth
The `.authenticate` command will start the OAuth flow for the current model client if
* The client supports OAuth (See the [clients documentation](./clients/CLIENTS.md#providers-that-support-oauth) for supported clients)
* The client is configured in your Loki configuration to use OAuth via the `auth: oauth` property
### `.exit` - Exit an agent/role/session/rag or the Loki REPL itself ### `.exit` - Exit an agent/role/session/rag or the Loki REPL itself
The `.exit` command is used to move between modes in the Loki REPL. The `.exit` command is used to move between modes in the Loki REPL.
+572
View File
@@ -0,0 +1,572 @@
# Architecture Plan: Loki REST API Service Mode
## The Core Problem
Today, Loki's `Config` struct is a god object — it holds both server-wide configuration (LLM providers, vault, tool definitions) and per-interaction mutable state (current role, session, agent, supervisor, inbox, tool tracker) in one `Arc<RwLock<Config>>`. CLI and REPL both mutate this singleton directly. Adding a third interface (REST API) that handles concurrent users makes this untenable.
## Design Pattern: Engine + Context + Emitter
The refactor splits Loki into three layers:
```
┌─────────┐ ┌─────────┐ ┌─────────┐
│ CLI │ │ REPL │ │ API │ ← Thin adapters (frontends)
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────┐
│ RunRequest + Emitter │ ← Uniform request shape
└──────────────┬───────────────┘
┌──────────────────────────────┐
│ Engine::run() │ ← Single core entrypoint
│ (input → messages → LLM │
│ → tool loop → events) │
└──────────────┬───────────────┘
┌────────────┼────────────┐
▼ ▼ ▼
AppState RequestContext SessionStore
(global, (per-request, (file-backed,
immutable) mutable) per-session lock)
```
---
## 1. Split Config → AppState (global) + RequestContext (per-request)
### AppState — created once at startup, wrapped in `Arc`, never mutated during requests:
```rust
#[derive(Clone)]
pub struct AppState {
pub config: Arc<AppConfig>, // deserialized config.yaml (frozen)
pub providers: ProviderRegistry, // LLM client configs + OAuth tokens
pub vault: Arc<VaultService>, // encrypted credential storage (internal locking)
pub tools: Arc<ToolRegistry>, // tool definitions, function dirs, visible_tools
pub mcp_global: Arc<McpGlobalConfig>, // global MCP settings (not live instances)
pub sessions: Arc<dyn SessionStore>, // file-backed session persistence
pub rag_defaults: RagDefaults, // embedding model, chunk size, etc.
}
```
### RequestContext — created per CLI invocation, per REPL turn, or per API request:
```rust
pub struct RequestContext {
pub app: Arc<AppState>, // borrows global state
pub request_id: Uuid,
pub mode: FrontendMode, // Cli | Repl | Api
pub cancel: CancellationToken, // unified cancellation
// per-request mutable state (was on Config)
pub session: SessionHandle,
pub convo: ConversationState, // messages, last_message, tool_call_tracker
pub agent: Option<AgentRuntime>, // supervisor, MCP instances, inbox, escalation
pub overrides: Overrides, // model, role, rag, dry_run, etc.
pub auth: Option<AuthContext>, // API-only; None for CLI/REPL
}
pub struct Overrides {
pub role: Option<String>,
pub model: Option<String>,
pub rag: Option<RagConfig>,
pub agent: Option<AgentSpec>,
pub dry_run: bool,
pub macro_mode: bool,
}
```
### What changes for existing code
Every function that currently takes `&GlobalConfig` (i.e., `Arc<RwLock<Config>>`) and calls `.read()` / `.write()` gets refactored to take `&AppState` for reads and `&mut RequestContext` for mutations. The `config.write().set_model(...)` pattern becomes `ctx.overrides.model = Some(...)`.
### REPL special case
The REPL keeps a long-lived `RequestContext` that persists across turns (just like today's Config singleton does). State-changing dot-commands (`.model`, `.role`, `.session`) mutate the REPL's own context. This preserves current behavior exactly.
---
## 2. Unified Dispatch: The Engine
Instead of `start_directive()` in `main.rs` and `ask()` in `repl/mod.rs` being separate code paths, both call one core function:
```rust
pub struct Engine {
pub app: Arc<AppState>,
pub agent_factory: Arc<dyn AgentFactory>,
}
impl Engine {
pub async fn run(
&self,
ctx: &mut RequestContext,
req: RunRequest,
emitter: &dyn Emitter,
) -> Result<RunOutcome, CoreError> {
// 1. Apply any CoreCommand (set role, model, session, etc.)
// 2. Build Input from req.input + ctx (role messages, session history, RAG)
// 3. Create LLM client from provider registry
// 4. call_chat_completions[_streaming](), emitting events via emitter
// 5. Tool result loop (recursive)
// 6. Persist session updates
// 7. Return outcome (session_id, message_id)
}
}
pub struct RunRequest {
pub input: UserInput, // text, files, media
pub command: Option<CoreCommand>, // normalized dot-command
pub stream: bool,
}
pub enum CoreCommand {
SetRole(String),
SetModel(String),
StartSession { name: Option<String> },
StartAgent { name: String, variables: HashMap<String, String> },
Continue,
Regenerate,
CompressSession,
Info,
// ... one variant per REPL dot-command
}
```
### How frontends use it
| Frontend | Context lifetime | How it calls Engine |
|---|---|---|
| CLI | Single invocation, then exit | Creates `RequestContext`, calls `engine.run()` once, exits |
| REPL | Long-lived across turns | Keeps `RequestContext`, calls `engine.run()` per line, dot-commands become `CoreCommand` variants |
| API | Per HTTP request, but session persists | Loads `RequestContext` from `SessionStore` per request, calls `engine.run()`, persists back |
---
## 3. Output Abstraction: The Emitter Trait
The core never writes to stdout or formats JSON. It emits structured semantic events:
```rust
pub enum Event<'a> {
Started { request_id: Uuid, session_id: Uuid },
AssistantDelta(&'a str), // streaming token
AssistantMessageEnd { full_text: &'a str },
ToolCall { name: &'a str, args: &'a str },
ToolResult { name: &'a str, result: &'a str },
Info(&'a str),
Error(CoreError),
}
#[async_trait]
pub trait Emitter: Send + Sync {
async fn emit(&self, event: Event<'_>) -> Result<(), EmitError>;
}
```
### Three implementations
- **`TerminalEmitter`** — wraps the existing `SseHandler`'s `markdown_stream` / `raw_stream` logic. Renders to terminal with crossterm. Used by both CLI and REPL.
- **`JsonEmitter`** — collects all events, returns a JSON response body at the end. Used by non-streaming API requests.
- **`SseEmitter`** — converts each `Event` to an SSE frame, pushes into a `tokio::sync::mpsc` channel that axum streams to the client. Used by streaming API requests.
---
## 4. Session Isolation for API
### Session IDs
UUID-based for API consumers. CLI/REPL keep human-readable names as aliases.
```rust
#[async_trait]
pub trait SessionStore: Send + Sync {
async fn create(&self, alias: Option<&str>) -> Result<SessionHandle>;
async fn open(&self, id: SessionId) -> Result<SessionHandle>;
async fn open_by_name(&self, name: &str) -> Result<SessionHandle>; // CLI/REPL compat
}
```
### File layout
```
~/.config/loki/sessions/
by-id/<uuid>/state.yaml # canonical storage
by-name/<name> -> <uuid> # symlink or mapping file for CLI/REPL
```
### Concurrency
Each `SessionHandle` holds a `tokio::sync::Mutex` so two concurrent API requests to the same session serialize properly. For v1 this is sufficient — no need for a database.
---
## 5. Tool Scope Isolation (formerly "Agent Isolation")
**Correction:** An earlier version of this document singled out agents as the owner of "live tool and MCP runtime." That was wrong. Loki allows MCP servers and tools to be configured at **every** `RoleLike` level — global, role, session, and agent — with resolution priority `Agent > Session > Role > Global`. Agents aren't uniquely coupled to MCP lifecycle; they're just the most visibly coupled scope in today's code.
The correct abstraction is **`ToolScope`**: every active `RoleLike` owns one. A `ToolScope` is a self-contained unit holding the resolved function declarations, live MCP runtime handles, and the tool-call tracker for whichever scope is currently on top of the stack.
### Today's behavior (to match in v1)
`McpRegistry::reinit()` is already **diff-based**: given a new enabled-server list, it stops only the servers that are no longer needed, leaves still-needed ones alive, and starts only the missing ones. This is correct single-tenant behavior but the registry is a process-wide singleton, so two concurrent consumers with different MCP sets trample each other.
### Target design
```rust
pub struct ToolScope {
pub functions: Functions, // resolved declarations for this scope
pub mcp_runtime: McpRuntime, // live handles to MCP processes
pub tool_tracker: ToolCallTracker, // per-scope call tracking
}
pub struct McpRuntime {
servers: HashMap<String, Arc<McpServerHandle>>, // live, ref-counted
}
pub struct McpFactory {
shared_servers: Mutex<HashMap<McpServerKey, Weak<McpServerHandle>>>,
}
impl McpFactory {
/// Produce a runtime with handles for the requested enabled servers.
/// Shared across ToolScopes via Arc when configs match; isolated when they differ.
pub async fn build_runtime(&self, enabled: &[String]) -> Result<McpRuntime>;
}
```
**`McpFactory` lives on `AppState`.** It does NOT hold any live servers itself — it holds weak refs so that when the last `ToolScope` using a given server drops its `Arc`, the process is torn down.
**`ToolScope` lives on `RequestContext`.** It replaces the current `functions`, `tool_call_tracker`, and (implicit) global `mcp_registry` fields. Every active scope — whether that's "just the REPL with its global MCP set" or "an agent with its own MCP set" — owns exactly one `ToolScope`.
### Scope transitions
When a `RoleLike` activates or exits:
1. Resolve the effective enabled-tool and enabled-MCP-server lists using priority `Agent > Session > Role > Global`.
2. Ask `McpFactory::build_runtime(enabled)` for an `McpRuntime`. The factory reuses existing `Arc<McpServerHandle>`s where keys match; spawns new processes where they don't.
3. Construct a new `ToolScope` with the runtime + resolved `Functions`.
4. Assign it to `ctx.tool_scope`. The old `ToolScope` drops; any `Arc<McpServerHandle>`s with no other references shut down their processes.
This preserves today's diff-based behavior for single-tenant (REPL) and makes it correct for multi-tenant (API).
### Sharing vs isolation (the key property)
`McpServerKey` encodes server name + command + args + env vars. Two `ToolScope`s requesting the **same key** share the same `Arc<McpServerHandle>`. Two requesting **different keys** (e.g., different per-user API keys baked into the env) get separate processes. This gives us:
- **Isolation by default** — different configs = different processes, no cross-tenant leakage
- **Sharing by coincidence** — identical configs = one process, ref-counted
- **Clean cleanup** — processes die automatically when the last scope releases them
### Agent-specific state
Agents still own some state that's genuinely agent-only (not in `ToolScope`): the supervisor, inbox, escalation queue, optional todo list, sub-agent handles, and the parent/child tree. That state lives in an `AgentRuntime`:
```rust
pub struct AgentRuntime {
pub spec: AgentSpec,
pub rag: Option<Arc<Rag>>, // shared across sibling sub-agents
pub supervisor: Supervisor,
pub inbox: Arc<Inbox>,
pub escalation_queue: Arc<EscalationQueue>, // root-shared for user interaction
pub todo_list: Option<TodoList>, // present only when auto_continue: true
pub self_agent_id: String,
pub parent_supervisor: Option<Arc<Supervisor>>,
pub current_depth: usize,
pub auto_continue_count: usize,
}
```
Three things to notice in this shape:
1. **`todo_list: Option<TodoList>`** — today's code eagerly allocates a `TodoList::default()` for every agent, but the todo tools and auto-continuation prompts are only exposed when `auto_continue: true`. Switching to `Option` lets us skip the allocation entirely for agents that don't opt in, and makes the "is this agent using todos?" question a type-level check rather than a config lookup. The semantics users see are unchanged.
2. **`rag: Option<Arc<Rag>>`** — agent RAG is an `Arc`, not an owned `Rag`. Today, every sub-agent of the same type independently calls `Rag::load()` and deserializes its own copy of the embeddings from disk. That means a parent spawning 4 parallel siblings of the same agent type pays the deserialize cost 5 times and holds 5 copies of identical vectors in memory. Sharing via `Arc` fixes both.
3. **No `mcp_runtime`** — MCP lives on `ToolScope`, not here. Agents get their tools through `ctx.tool_scope` like everyone else.
An `AgentRuntime` goes into `ctx.agent_runtime` **in addition to** the `ToolScope` — they're orthogonal concerns. An agent has both a `ToolScope` (its resolved tools + MCP) and an `AgentRuntime` (its supervision/messaging/RAG/todo state).
### RAG Cache (unified for standalone + agent RAG)
RAG in Loki comes from exactly two places today:
1. **Standalone RAG**, attached via the `.rag <name>` REPL command or the equivalent API call. Persists across role/session switches. Lives in `ctx.rag: Option<Arc<Rag>>`.
2. **Agent RAG**, loaded from the `documents:` field of an agent's `config.yaml` when the agent is activated. Lives in `ctx.agent_runtime.rag: Option<Arc<Rag>>` for the agent's lifetime.
Roles and Sessions do **not** own RAG — the `Role` and `Session` structs have no RAG fields. This is true today and the refactor preserves it.
Since both standalone and agent RAGs are ultimately `Arc<Rag>` instances loaded from disk YAML files, a single cache can serve both. `AppState` holds one:
```rust
pub struct AppState {
pub config: Arc<AppConfig>,
pub vault: GlobalVault,
pub mcp_factory: Arc<McpFactory>,
pub rag_cache: Arc<RagCache>,
}
pub struct RagCache {
entries: RwLock<HashMap<RagKey, Weak<Rag>>>,
}
#[derive(Hash, Eq, PartialEq, Clone, Debug)]
pub enum RagKey {
Named(String), // standalone RAG: rags/<name>.yaml
Agent(String), // agent-owned RAG: agents/<name>/rag.yaml
}
impl RagCache {
/// Returns a shared Arc<Rag> for the given key. If another scope
/// holds a live reference, returns that exact Arc. Otherwise loads
/// from disk, stores a Weak for future sharing, returns a fresh Arc.
/// Concurrent first-load is serialized via per-key locks.
pub async fn load(&self, key: &RagKey) -> Result<Option<Arc<Rag>>>;
/// Invalidates the cache entry. Called by rebuild_rag / edit_rag_docs
/// so the next load reads from disk. Does NOT affect existing Arc
/// holders — they keep their old Rag until they drop it.
pub fn invalidate(&self, key: &RagKey);
}
```
Why the enum: agent RAGs and standalone RAGs live at different paths on disk and could theoretically have overlapping names (an agent called "docs" and a standalone rag called "docs"). Keeping them in distinct namespaces avoids collisions and keeps the cache lookups unambiguous.
Why `Weak`: we don't want the cache to pin RAGs in memory forever. If no scope holds an `Arc<Rag>` for key X, the `Weak` becomes dangling, and the next `load()` reads fresh. "Share while in use, drop when nobody needs it" without a manual reaper.
**Concurrency wrinkle:** if two consumers request the same key at exactly the same time and neither finds a live entry, both will race to load from disk. Fix with a per-key `tokio::sync::Mutex` or `tokio::sync::OnceCell<Arc<Rag>>` (the async-aware variant — a blocking `once_cell` would stall the executor) — the second caller waits briefly and receives the shared Arc.
**Invalidation:** both `rebuild_rag` and `edit_rag_docs` call `invalidate()` with the key corresponding to whichever RAG was being operated on (standalone or agent-owned). Existing `Arc<Rag>` holders keep their old reference until they drop it — which is the correct behavior, since you don't want a running request to suddenly see a partially-rebuilt index mid-execution.
### Where RAG attaches in `RequestContext`
Two distinct slots, two distinct purposes, one shared cache:
```rust
pub struct RequestContext {
// ... other fields ...
pub rag: Option<Arc<Rag>>, // standalone RAG from `.rag <name>` or API equivalent
pub agent_runtime: Option<AgentRuntime>, // contains its own `rag: Option<Arc<Rag>>` when agent owns one
}
```
When resolving "what RAG should this request use", the engine checks `ctx.agent_runtime.rag` first (agent-owned takes precedence during an agent turn), then falls back to `ctx.rag` (the user's standalone selection). If neither is set, no RAG context is injected into the prompt.
**Behavior preservation:** today's code uses a single `Config.rag` slot that's overwritten by whichever action touched it most recently — `use_rag` and `use_agent` both clobber it. Exiting an agent leaves the overwrite in place; the user has to re-run `.rag <name>` to restore their standalone RAG. The new two-slot design gives us the opportunity to fix that (save `ctx.rag` into the `AgentRuntime` on activation, restore on exit) but **Phase 1 preserves today's clobber-and-forget behavior** to keep the refactor mechanical. The improvement is flagged as a Phase 2+ enhancement.
### Sub-agent spawning
Each child agent gets its **own** `RequestContext` forked from the parent's `Arc<AppState>`. That means each child gets:
- Its own `ToolScope` built from its agent.yaml's `mcp_servers` + `global_tools`, produced by `McpFactory`
- Its own `AgentRuntime` with a fresh supervisor, a fresh inbox, depth = parent.depth + 1
- A `parent_supervisor` reference pointing back at the parent's supervisor for escalation/messaging
- A shared `root_escalation_queue` cloned by `Arc` from the parent's runtime (one queue, one human at the root)
- A shared `rag: Option<Arc<Rag>>` via `AppState.rag_cache.load(RagKey::Agent(child_agent_name))` — if the parent already holds a strong ref, the cache returns the same Arc and no disk I/O happens
Because each child has its own `ToolScope`, **concurrent sub-agents can run with different MCP server sets simultaneously** — something today's singleton registry cannot do. The `McpFactory` pool handles overlap: if child A and child B both need `github` with matching keys, they share one `github` process via `Arc`.
Because sibling sub-agents of the same type share one `Arc<Rag>` through the unified cache, **RAG embeddings are loaded at most once per (standalone or agent) name per process**, regardless of how many siblings or concurrent API sessions reference the same name. The first holder keeps the embeddings warm for everyone else's lifetime, and they drop together once nobody holds a reference.
### MCP Lifecycle Policy (pooling and idle timeout)
`McpFactory` needs an eviction policy so long-running server processes don't accumulate idle MCP subprocesses indefinitely. The design is a two-layer scheme:
```rust
pub struct McpFactory {
active: Mutex<HashMap<McpServerKey, Weak<McpServerHandle>>>,
idle: Mutex<HashMap<McpServerKey, IdleEntry>>,
config: McpFactoryConfig,
}
struct IdleEntry {
handle: Arc<McpServerHandle>,
idle_since: Instant,
}
pub struct McpFactoryConfig {
pub idle_timeout: Duration, // how long idle servers stay warm
pub cleanup_interval: Duration, // how often the reaper runs
pub max_idle_servers: Option<usize>, // LRU cap (None = unbounded)
}
```
**Layer 1 — active references via Arc.** Scopes currently using a server hold `Arc<McpServerHandle>`. Standard Rust refcounting. Any live reference keeps the process running, regardless of timers.
**Layer 2 — idle grace period via LRU eviction.** When the last active scope drops its Arc, a custom `Drop` impl on the handle moves it into the idle pool with a timestamp instead of tearing it down immediately. A background reaper task wakes on `cleanup_interval` and evicts entries whose idle time exceeds `idle_timeout`, calling `cancel().await` on the actual MCP subprocess.
Acquisition order on every scope transition:
```rust
impl McpFactory {
pub async fn acquire(&self, key: &McpServerKey) -> Result<Arc<McpServerHandle>> {
// 1. Someone else is actively using it — share.
if let Some(arc) = self.try_reuse_active(key) { return Ok(arc); }
// 2. Sitting in the idle pool — revive it, zero startup cost.
if let Some(arc) = self.revive_from_idle(key) { return Ok(arc); }
// 3. Neither — spawn fresh.
self.spawn_new(key).await
}
}
```
**Sensible defaults by deployment mode:**
| Mode | `idle_timeout` default | Rationale |
|---|---|---|
| CLI one-shot | N/A (process exits, everything dies) | No pooling needed |
| REPL | `0` (immediate drop) | Matches today's reactive reinit behavior |
| API server | `5 minutes` | Absorbs burst traffic, caps stale resources |
These are defaults, not mandates. Users should be able to override globally and per-server:
```yaml
# config.yaml
mcp_pool:
idle_timeout_seconds: 300
cleanup_interval_seconds: 30
max_idle_servers: 50
```
```json
// functions/mcp.json
{
"github": { "command": "...", "idle_timeout_seconds": 900 },
"filesystem": { "command": "...", "idle_timeout_seconds": 60 }
}
```
**Optional health checks.** While a handle sits in the idle pool, the reaper can optionally ping it via `tools/list`. If a server has crashed or become unresponsive, it's evicted immediately. Without this, a stale idle entry would make the first real request after revival fail. Worth implementing, but not strictly required for v1.
**Graceful shutdown.** On server shutdown, drain active scopes (let in-flight LLM calls complete or cancel via token), then tear down the idle pool. Give it a bounded drain timeout before force-killing. Especially important for MCP servers holding external transactions or locks.
**Per-tenant isolation.** `McpServerKey` includes env vars in its hash, so two tenants with different `GITHUB_TOKEN`s get distinct keys and therefore distinct processes. Zero cross-tenant leakage by construction.
### Phasing
Phase 1 ships `McpFactory` without the pool — just `acquire()` that always spawns fresh, `Drop` that always tears down. This is correct but inefficient. Phase 5 adds the idle pool, reaper task, health checks, and configuration knobs. Splitting it this way keeps Phase 1 focused on the state split (its actual goal) and Phase 5 focused on the pooling optimization (where it has a clear performance target: warm-path MCP tool calls should have near-zero overhead).
### Lifecycle summary
| Frontend | ToolScope lifetime | AgentRuntime lifetime | RAG lifetime |
|---|---|---|---|
| **CLI one-shot** | One invocation | One invocation (if `--agent`) | One invocation |
| **REPL** | Long-lived, rebuilt on `.role` / `.session` / `.agent` / `.set enabled_mcp_servers` | Lives from `.agent X` until `.exit agent` | Standalone RAG set via `.rag <name>` persists across role/session switches; agent RAG lives as long as the `AgentRuntime`; both come from the shared `RagCache` |
| **API session** | Lives while session is "warm"; rebuilt when client changes role/session/agent | Lives while session is "warm" | Same as REPL; `RagCache` shares `Arc<Rag>`s across concurrent sessions using the same RAG name |
| **Sub-agent (any frontend)** | Lives for the sub-agent task | Lives for the sub-agent task | Shared via `Arc` with parent and siblings through `RagCache` |
---
## 6. Cross-Cutting Concerns
| Concern | Pattern | CLI | REPL | API |
|---|---|---|---|---|
| **Errors** | Core returns `CoreError` enum; frontends map | `render_error()` to stderr | `render_error()` to terminal | `{ "error": { "code": "...", "message": "..." } }` JSON |
| **Cancellation** | `CancellationToken` in `RequestContext` | Ctrl-C handler triggers token | Ctrl-C triggers token | Client disconnect / request timeout triggers token |
| **Auth** | Middleware sets `AuthContext` on `RequestContext` | None (local user) | None (local user) | Bearer token / API key validated by axum middleware |
| **Tracing** | `tracing::Span` per request with request_id, session_id, mode | Log to file | Log to file | Log to file + structured JSON logs |
### Error type
```rust
pub enum CoreError {
InvalidRequest { msg: String },
NotFound { msg: String },
Unauthorized { msg: String },
Forbidden { msg: String },
Timeout { msg: String },
Cancelled,
Provider { msg: String },
Tool { msg: String },
Io { msg: String },
}
```
### Cancellation
Use a `CancellationToken` in `RequestContext`. The core checks it via `tokio::select!` around long awaits (LLM stream, tool execution, MCP IO).
- CLI/REPL: Ctrl-C handler triggers token.
- API: axum provides disconnect detection for SSE/streaming; when the client drops, cancel the token.
- Timeouts: set deadline and translate to token cancellation.
### Auth (API-only initially)
axum middleware authenticates (API key / bearer token), builds `AuthContext`, stores in request extensions, then the handler copies it into `RequestContext`. Core enforces policy only when executing sensitive operations (tools, filesystem, vault).
```rust
pub struct AuthContext {
pub subject: String,
pub scopes: Vec<String>,
}
```
---
## 7. API Endpoint Design
```
POST /v1/completions # one-shot prompt (no session)
POST /v1/sessions # create session
POST /v1/sessions/:id/completions # prompt within session
DELETE /v1/sessions/:id # close session
POST /v1/sessions/:id/agent # activate agent on session
DELETE /v1/sessions/:id/agent # deactivate agent
POST /v1/sessions/:id/role # set role on session
POST /v1/sessions/:id/rag # attach RAG to session
GET /v1/models # list available models
GET /v1/agents # list available agents
GET /v1/roles # list available roles
```
### Request body for completions
```json
{
"prompt": "Explain TCP handshake",
"model": "openai:gpt-4o",
"stream": true,
"files": ["path/to/doc.pdf"],
"role": "explain"
}
```
---
## 8. Implementation Phases
| Phase | Scope | Effort | Risk |
|---|---|---|---|
| **Phase 1: Extract AppState** | Split Config into AppState (global) + per-request state. Keep CLI/REPL working exactly as before. No API yet. | ~1-2 weeks | Medium — touching every file that uses GlobalConfig |
| **Phase 2: Introduce Engine + Emitter** | Unify `start_directive()` and `ask()` behind `Engine::run()`. Create `TerminalEmitter`. CLI/REPL now call Engine. | ~1 week | Low — refactoring existing paths |
| **Phase 3: SessionStore abstraction** | Extract session persistence behind trait. Add UUID-based sessions. CLI/REPL still use name-based aliases. | ~3-5 days | Low |
| **Phase 4: REST API server** | Add `--serve` flag. axum handlers that create `RequestContext`, call `Engine::run()`, return JSON/SSE. Basic auth middleware. | ~1-2 weeks | Low — clean layer on top of Engine |
| **Phase 5: Agent isolation** | Move agent runtime into `RequestContext`. `AgentFactory` creates isolated runtimes per session. | ~1 week | Medium — MCP server lifecycle mgmt |
| **Phase 6: Production hardening** | Rate limiting, proper auth, request validation, health checks, graceful shutdown, deployment configs. | ~1 week | Low |
**Total estimate: ~5-7 weeks** for a production-ready v1.
### Key Risk: Phase 1
Phase 1 is the hardest and riskiest — it touches nearly every module. The mitigation is to do it incrementally: first add `AppState` alongside existing `Config`, then migrate callers module by module, then remove the old `GlobalConfig` type alias. Tests should pass at every intermediate step.
---
## Key Design Decisions & Trade-offs
1. **Eliminates the singleton mutation bottleneck**: concurrency becomes "multiple `RequestContext`s" rather than fighting over `RwLock<Config>`.
2. **Preserves current behavior**: REPL can keep "state-changing commands" by mutating its own long-lived `RequestContext` + persisted `SessionState`.
3. **Streaming becomes portable**: terminal rendering, JSON, and SSE are just different `Emitter`s over the same event stream.
4. **Agent/MCP isolation is explicit**: prevents cross-session conflicts by construction.
## Watch Out For
1. **Persisted vs in-memory drift**: decide which fields live in `SessionState` vs `ConversationState`; persist only what must survive process restarts.
2. **Per-session concurrency semantics**: either serialize requests per session (simplest) or carefully merge message histories; v1 should serialize.
3. **MCP process lifecycle**: if you keep MCP servers alive across requests, tie them to a session runtime and clean them up on session close/TTL.
## Future Considerations
1. Swap file store behind `SessionStore` with sqlite without changing core.
2. Add a stable public API schema for events so clients can render rich tool-call UIs.
3. Actor model (one tokio task per session receiving commands via mpsc) for simplified session+agent lifetime management.
+16
View File
@@ -117,6 +117,22 @@ Display the current todo list with status of each item.
**Returns:** The full todo list with goal, progress, and item statuses **Returns:** The full todo list with goal, progress, and item statuses
### `todo__clear`
Clear the entire todo list and reset the goal. Use when the current task has been canceled or invalidated.
**Parameters:** None
**Returns:** Confirmation that the todo list was cleared
### REPL Command: `.clear todo`
You can also clear the todo list manually from the REPL by typing `.clear todo`. This is useful when:
- You gave a custom response that changes or cancels the current task
- The agent is stuck in auto-continuation with stale todos
- You want to start fresh without leaving and re-entering the agent
**Note:** This command is only available when an agent with `auto_continue: true` is active. If the todo
system isn't enabled for the current agent, the command will display an error message.
## Auto-Continuation ## Auto-Continuation
When `auto_continue` is enabled, Loki automatically sends a continuation prompt if: When `auto_continue` is enabled, Loki automatically sends a continuation prompt if:
+99 -6
View File
@@ -14,6 +14,7 @@ loki --info | grep 'config_file' | awk '{print $2}'
<!--toc:start--> <!--toc:start-->
- [Supported Clients](#supported-clients) - [Supported Clients](#supported-clients)
- [Client Configuration](#client-configuration) - [Client Configuration](#client-configuration)
- [Authentication](#authentication)
- [Extra Settings](#extra-settings) - [Extra Settings](#extra-settings)
<!--toc:end--> <!--toc:end-->
@@ -51,12 +52,13 @@ clients:
The client metadata uniquely identifies the client in Loki so you can reference it across your configurations. The The client metadata uniquely identifies the client in Loki so you can reference it across your configurations. The
available settings are listed below: available settings are listed below:
| Setting | Description | | Setting | Description |
|----------|-----------------------------------------------------------------------------------------------| |----------|------------------------------------------------------------------------------------------------------------|
| `name` | The name of the client (e.g. `openai`, `gemini`, etc.) | | `name` | The name of the client (e.g. `openai`, `gemini`, etc.) |
| `models` | See the [model settings](#model-settings) documentation below | | `auth` | Authentication method: `oauth` for OAuth, or omit to use `api_key` (see [Authentication](#authentication)) |
| `patch` | See the [client patch configuration](./PATCHES.md#client-configuration-patches) documentation | | `models` | See the [model settings](#model-settings) documentation below |
| `extra` | See the [extra settings](#extra-settings) documentation below | | `patch` | See the [client patch configuration](./PATCHES.md#client-configuration-patches) documentation |
| `extra` | See the [extra settings](#extra-settings) documentation below |
Be sure to also check provider-specific configurations for any extra fields that are added for authentication purposes. Be sure to also check provider-specific configurations for any extra fields that are added for authentication purposes.
@@ -83,6 +85,97 @@ The `models` array lists the available models from the model client. Each one ha
| `default_chunk_size` | | `embedding` | The default chunk size to use with the given model | | `default_chunk_size` | | `embedding` | The default chunk size to use with the given model |
| `max_batch_size` | | `embedding` | The maximum batch size that the given embedding model supports | | `max_batch_size` | | `embedding` | The maximum batch size that the given embedding model supports |
## Authentication
Loki clients support two authentication methods: **API keys** and **OAuth**. Each client entry in your configuration
must use one or the other.
### API Key Authentication
Most clients authenticate using an API key. Simply set the `api_key` field directly or inject it from the
[Loki vault](../VAULT.md):
```yaml
clients:
- type: claude
api_key: '{{ANTHROPIC_API_KEY}}'
```
API keys can also be provided via environment variables named `{CLIENT_NAME}_API_KEY` (e.g. `OPENAI_API_KEY`,
`GEMINI_API_KEY`). See the [environment variables documentation](../ENVIRONMENT-VARIABLES.md#client-related-variables)
for details.
### OAuth Authentication
For [providers that support OAuth](#providers-that-support-oauth), you can authenticate using your existing subscription instead of an API key. This uses
the OAuth 2.0 PKCE flow.
**Step 1: Configure the client**
Add a client entry with `auth: oauth` and no `api_key`:
```yaml
clients:
- type: claude
name: my-claude-oauth
auth: oauth
```
**Step 2: Authenticate**
Run Loki with the `--authenticate` flag, passing the client name:
```sh
loki --authenticate my-claude-oauth
```
Or if you have only one OAuth-configured client, you can omit the name:
```sh
loki --authenticate
```
Alternatively, you can use the REPL command `.authenticate`.
This opens your browser for the OAuth authorization flow. Depending on the provider, Loki will either start a
temporary localhost server to capture the callback automatically (e.g. Gemini) or ask you to paste the authorization
code back into the terminal (e.g. Claude). Loki stores the tokens in `~/.cache/loki/oauth` and automatically refreshes
them when they expire.
#### Gemini OAuth Note
Loki uses the following scopes for OAuth with Gemini:
* https://www.googleapis.com/auth/generative-language.peruserquota
* https://www.googleapis.com/auth/userinfo.email
* https://www.googleapis.com/auth/generative-language.retriever (Sensitive)
Since the `generative-language.retriever` scope is a sensitive scope, Google needs to verify Loki, which requires full
branding (logo, official website, privacy policy, terms of service, etc.). The Loki app is open-source and is designed
to be used as a simple CLI. As such, there are no terms of service or privacy policy associated with it, and thus Google
cannot verify Loki.
So, when you kick off OAuth with Gemini, you may see a page similar to the following:
![](../images/clients/gemini-oauth-page.png)
Simply click the `Advanced` link and click `Go to Loki (unsafe)` to continue the OAuth flow.
![](../images/clients/gemini-oauth-unverified.png)
![](../images/clients/gemini-oauth-unverified-allow.png)
**Step 3: Use normally**
Once authenticated, the client works like any other. Loki uses the stored OAuth tokens automatically:
```sh
loki -m my-claude-oauth:claude-sonnet-4-20250514 "Hello!"
```
> **Note:** You can have multiple clients for the same provider. For example: you can have one with an API key and
> another with OAuth. Use the `name` field to distinguish them.
### Providers That Support OAuth
* Claude
* Gemini
## Extra Settings ## Extra Settings
Loki also lets you customize some extra settings for interacting with APIs: Loki also lets you customize some extra settings for interacting with APIs:
+174 -11
View File
@@ -10,6 +10,8 @@ into your Loki setup. This document provides a guide on how to create and use cu
- [Environment Variables](#environment-variables) - [Environment Variables](#environment-variables)
- [Custom Bash-Based Tools](#custom-bash-based-tools) - [Custom Bash-Based Tools](#custom-bash-based-tools)
- [Custom Python-Based Tools](#custom-python-based-tools) - [Custom Python-Based Tools](#custom-python-based-tools)
- [Custom TypeScript-Based Tools](#custom-typescript-based-tools)
- [Custom Runtime](#custom-runtime)
<!--toc:end--> <!--toc:end-->
--- ---
@@ -19,9 +21,10 @@ Loki supports custom tools written in the following programming languages:
* Python * Python
* Bash * Bash
* TypeScript
## Creating a Custom Tool ## Creating a Custom Tool
All tools are created as scripts in either Python or Bash. They should be placed in the `functions/tools` directory. All tools are created as scripts in either Python, Bash, or TypeScript. They should be placed in the `functions/tools` directory.
The location of the `functions` directory varies between systems, so you can use the following command to locate The location of the `functions` directory varies between systems, so you can use the following command to locate
your `functions` directory: your `functions` directory:
@@ -81,6 +84,7 @@ Loki and demonstrates how to create a Python-based tool:
import os import os
from typing import List, Literal, Optional from typing import List, Literal, Optional
def run( def run(
string: str, string: str,
string_enum: Literal["foo", "bar"], string_enum: Literal["foo", "bar"],
@@ -89,26 +93,38 @@ def run(
number: float, number: float,
array: List[str], array: List[str],
string_optional: Optional[str] = None, string_optional: Optional[str] = None,
integer_with_default: int = 42,
boolean_with_default: bool = True,
number_with_default: float = 3.14,
string_with_default: str = "hello",
array_optional: Optional[List[str]] = None, array_optional: Optional[List[str]] = None,
): ):
"""Demonstrates how to create a tool using Python and how to use comments. """Demonstrates all supported Python parameter types and variations.
Args: Args:
string: Define a required string property string: A required string property
string_enum: Define a required string property with enum string_enum: A required string property constrained to specific values
boolean: Define a required boolean property boolean: A required boolean property
integer: Define a required integer property integer: A required integer property
number: Define a required number property number: A required number (float) property
array: Define a required string array property array: A required string array property
string_optional: Define an optional string property string_optional: An optional string property (Optional[str] with None default)
array_optional: Define an optional string array property integer_with_default: An optional integer with a non-None default value
boolean_with_default: An optional boolean with a default value
number_with_default: An optional number with a default value
string_with_default: An optional string with a default value
array_optional: An optional string array property
""" """
output = f"""string: {string} output = f"""string: {string}
string_enum: {string_enum} string_enum: {string_enum}
string_optional: {string_optional}
boolean: {boolean} boolean: {boolean}
integer: {integer} integer: {integer}
number: {number} number: {number}
array: {array} array: {array}
string_optional: {string_optional}
integer_with_default: {integer_with_default}
boolean_with_default: {boolean_with_default}
number_with_default: {number_with_default}
string_with_default: {string_with_default}
array_optional: {array_optional}""" array_optional: {array_optional}"""
for key, value in os.environ.items(): for key, value in os.environ.items():
@@ -117,3 +133,150 @@ array_optional: {array_optional}"""
return output return output
``` ```
### Custom TypeScript-Based Tools
Loki supports tools written in TypeScript. TypeScript tools require [Node.js](https://nodejs.org/) and
[tsx](https://tsx.is/) (`npx tsx` is used as the default runtime).
Each TypeScript-based tool must follow a specific structure in order for Loki to properly compile and execute it:
* The tool must be a TypeScript file with a `.ts` file extension.
* The tool must have an `export function run(...)` that serves as the entry point for the tool.
* Non-exported functions are ignored by the compiler and can be used as private helpers.
* The `run` function must accept flat parameters that define the inputs for the tool.
* Always use type annotations to specify the data type of each parameter.
* Use `param?: type` or `type | null` to indicate optional parameters.
* Use `param: type = value` for parameters with default values.
* The `run` function must return a `string` (or `Promise<string>` for async functions).
* For TypeScript, the return value is automatically written to the `LLM_OUTPUT` environment variable, so there's
no need to explicitly write to the environment variable within the function.
* The function must have a JSDoc comment that describes the tool and its parameters.
* Each parameter should be documented using `@param name - description` tags.
* These descriptions are passed to the LLM as the tool description, letting the LLM know what the tool does and
how to use it.
* Async functions (`export async function run(...)`) are fully supported and handled transparently.
**Supported Parameter Types:**
| TypeScript Type | JSON Schema | Notes |
|-------------------|--------------------------------------------------|-----------------------------|
| `string` | `{"type": "string"}` | Required string |
| `number` | `{"type": "number"}` | Required number |
| `boolean` | `{"type": "boolean"}` | Required boolean |
| `string[]` | `{"type": "array", "items": {"type": "string"}}` | Array (bracket syntax) |
| `Array<string>` | `{"type": "array", "items": {"type": "string"}}` | Array (generic syntax) |
| `"foo" \| "bar"` | `{"type": "string", "enum": ["foo", "bar"]}` | String enum (literal union) |
| `param?: string` | `{"type": "string"}` (not required) | Optional via question mark |
| `string \| null` | `{"type": "string"}` (not required) | Optional via null union |
| `param = "value"` | `{"type": "string"}` (not required) | Optional via default value |
**Unsupported Patterns (will produce a compile error):**
* Rest parameters (`...args: string[]`)
* Destructured object parameters (`{ a, b }: { a: string, b: string }`)
* Arrow functions (`const run = (x: string) => ...`)
* Function expressions (`const run = function(x: string) { ... }`)
Only `export function` declarations are recognized. Non-exported functions are invisible to the compiler.
Below is the [`demo_ts.ts`](../../assets/functions/tools/demo_ts.ts) tool definition that comes pre-packaged with
Loki and demonstrates how to create a TypeScript-based tool:
```typescript
/**
* Demonstrates all supported TypeScript parameter types and variations.
*
* @param string - A required string property
* @param string_enum - A required string property constrained to specific values
* @param boolean - A required boolean property
* @param number - A required number property
* @param array_bracket - A required string array using bracket syntax
* @param array_generic - A required string array using generic syntax
* @param string_optional - An optional string using the question mark syntax
* @param string_nullable - An optional string using the union-with-null syntax
* @param number_with_default - An optional number with a default value
* @param boolean_with_default - An optional boolean with a default value
* @param string_with_default - An optional string with a default value
* @param array_optional - An optional string array using the question mark syntax
*/
export function run(
string: string,
string_enum: "foo" | "bar",
boolean: boolean,
number: number,
array_bracket: string[],
array_generic: Array<string>,
string_optional?: string,
string_nullable: string | null = null,
number_with_default: number = 42,
boolean_with_default: boolean = true,
string_with_default: string = "hello",
array_optional?: string[],
): string {
const parts = [
`string: ${string}`,
`string_enum: ${string_enum}`,
`boolean: ${boolean}`,
`number: ${number}`,
`array_bracket: ${JSON.stringify(array_bracket)}`,
`array_generic: ${JSON.stringify(array_generic)}`,
`string_optional: ${string_optional}`,
`string_nullable: ${string_nullable}`,
`number_with_default: ${number_with_default}`,
`boolean_with_default: ${boolean_with_default}`,
`string_with_default: ${string_with_default}`,
`array_optional: ${JSON.stringify(array_optional)}`,
];
for (const [key, value] of Object.entries(process.env)) {
if (key.startsWith("LLM_")) {
parts.push(`${key}: ${value}`);
}
}
return parts.join("\n");
}
```
## Custom Runtime
By default, Loki uses the following runtimes to execute tools:
| Language | Default Runtime | Requirement |
|------------|-----------------|--------------------------------|
| Python | `python` | Python 3 on `$PATH` |
| TypeScript | `npx tsx` | Node.js + tsx (`npm i -g tsx`) |
| Bash | `bash` | Bash on `$PATH` |
You can override the runtime for Python and TypeScript tools using a **shebang line** (`#!`) at the top of your
script. Loki reads the first line of each tool file; if it starts with `#!`, the specified interpreter is used instead
of the default.
**Examples:**
```python
#!/usr/bin/env python3.11
# This Python tool will be executed with python3.11 instead of the default `python`
def run(name: str):
"""Greet someone.
Args:
name: The name to greet
"""
return f"Hello, {name}!"
```
```typescript
#!/usr/bin/env bun
// This TypeScript tool will be executed with Bun instead of the default `npx tsx`
/**
* Greet someone.
* @param name - The name to greet
*/
export function run(name: string): string {
return `Hello, ${name}!`;
}
```
This is useful for pinning a specific Python version, using an alternative TypeScript runtime like
[Bun](https://bun.sh/) or [Deno](https://deno.com/), or working with virtual environments.
+1
View File
@@ -55,6 +55,7 @@ Loki ships with a `functions/mcp.json` file that includes some useful MCP server
* [github](https://github.com/github/github-mcp-server) - Interact with GitHub repositories, issues, pull requests, and more. * [github](https://github.com/github/github-mcp-server) - Interact with GitHub repositories, issues, pull requests, and more.
* [docker](https://github.com/ckreiling/mcp-server-docker) - Manage your local Docker containers with natural language * [docker](https://github.com/ckreiling/mcp-server-docker) - Manage your local Docker containers with natural language
* [slack](https://github.com/korotovsky/slack-mcp-server) - Interact with Slack * [slack](https://github.com/korotovsky/slack-mcp-server) - Interact with Slack
* [ddg-search](https://github.com/nickclyde/duckduckgo-mcp-server) - Perform web searches with the DuckDuckGo search engine
## Loki Configuration ## Loki Configuration
MCP servers, like tools, can be used in a handful of contexts: MCP servers, like tools, can be used in a handful of contexts:
+2
View File
@@ -32,6 +32,7 @@ be enabled/disabled can be found in the [Configuration](#configuration) section
|-------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------| |-------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|
| [`demo_py.py`](../../assets/functions/tools/demo_py.py) | Demonstrates how to create a tool using Python and how to use comments. | 🔴 | | [`demo_py.py`](../../assets/functions/tools/demo_py.py) | Demonstrates how to create a tool using Python and how to use comments. | 🔴 |
| [`demo_sh.sh`](../../assets/functions/tools/demo_sh.sh) | Demonstrate how to create a tool using Bash and how to use comment tags. | 🔴 | | [`demo_sh.sh`](../../assets/functions/tools/demo_sh.sh) | Demonstrate how to create a tool using Bash and how to use comment tags. | 🔴 |
| [`demo_ts.ts`](../../assets/functions/tools/demo_ts.ts) | Demonstrates how to create a tool using TypeScript and how to use JSDoc comments. | 🔴 |
| [`execute_command.sh`](../../assets/functions/tools/execute_command.sh) | Execute the shell command. | 🟢 | | [`execute_command.sh`](../../assets/functions/tools/execute_command.sh) | Execute the shell command. | 🟢 |
| [`execute_py_code.py`](../../assets/functions/tools/execute_py_code.py) | Execute the given Python code. | 🔴 | | [`execute_py_code.py`](../../assets/functions/tools/execute_py_code.py) | Execute the given Python code. | 🔴 |
| [`execute_sql_code.sh`](../../assets/functions/tools/execute_sql_code.sh) | Execute SQL code. | 🔴 | | [`execute_sql_code.sh`](../../assets/functions/tools/execute_sql_code.sh) | Execute SQL code. | 🔴 |
@@ -49,6 +50,7 @@ be enabled/disabled can be found in the [Configuration](#configuration) section
| [`get_current_time.sh`](../../assets/functions/tools/get_current_time.sh) | Get the current time. | 🟢 | | [`get_current_time.sh`](../../assets/functions/tools/get_current_time.sh) | Get the current time. | 🟢 |
| [`get_current_weather.py`](../../assets/functions/tools/get_current_weather.py) | Get the current weather in a given location (Python implementation) | 🔴 | | [`get_current_weather.py`](../../assets/functions/tools/get_current_weather.py) | Get the current weather in a given location (Python implementation) | 🔴 |
| [`get_current_weather.sh`](../../assets/functions/tools/get_current_weather.sh) | Get the current weather in a given location. | 🟢 | | [`get_current_weather.sh`](../../assets/functions/tools/get_current_weather.sh) | Get the current weather in a given location. | 🟢 |
| [`get_current_weather.ts`](../../assets/functions/tools/get_current_weather.ts) | Get the current weather in a given location (TypeScript implementation) | 🔴 |
| [`query_jira_issues.sh`](../../assets/functions/tools/query_jira_issues.sh) | Query for jira issues using a Jira Query Language (JQL) query. | 🟢 | | [`query_jira_issues.sh`](../../assets/functions/tools/query_jira_issues.sh) | Query for jira issues using a Jira Query Language (JQL) query. | 🟢 |
| [`search_arxiv.sh`](../../assets/functions/tools/search_arxiv.sh) | Search arXiv using the given search query and return the top papers. | 🔴 | | [`search_arxiv.sh`](../../assets/functions/tools/search_arxiv.sh) | Search arXiv using the given search query and return the top papers. | 🔴 |
| [`search_wikipedia.sh`](../../assets/functions/tools/search_wikipedia.sh) | Search Wikipedia using the given search query. <br>Use it to get detailed information about a public figure, interpretation of a <br>complex scientific concept or in-depth connectivity of a significant historical <br>event, etc. | 🔴 | | [`search_wikipedia.sh`](../../assets/functions/tools/search_wikipedia.sh) | Search Wikipedia using the given search query. <br>Use it to get detailed information about a public figure, interpretation of a <br>complex scientific concept or in-depth connectivity of a significant historical <br>event, etc. | 🔴 |
+255
View File
@@ -0,0 +1,255 @@
# Phase 1 Step 1 — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 1: Make Config constructible from AppConfig + RequestContext"
## Summary
Added three conversion methods on `Config` (`to_app_config`,
`to_request_context`, `from_parts`) plus a round-trip test suite, all
living in a new `src/config/bridge.rs` module. These methods are the
facade that will let Steps 2–9 migrate callsites from the old `Config`
to the split `AppState` + `RequestContext` incrementally. Nothing calls
them outside the test suite yet; that's expected and matches the
plan's "additive only, no callsite changes" guidance for Step 1.
## Pre-Step-1 correction to Step 0
Before implementing Step 1 I verified all three Step 0 files
(`src/config/app_config.rs`, `src/config/app_state.rs`,
`src/config/request_context.rs`) against every architecture decision
from the design conversations. All three were current except one stale
reference:
- `src/config/request_context.rs` docstring said "unified into
`ToolScope` during Phase 1 Step 6" but after the
ToolScope/AgentRuntime discussions the plan renumbered this to
**Step 6.5** and added the `AgentRuntime` collapse alongside
`ToolScope`. Updated the `# Tool scope (planned)` section docstring
to reflect both changes (now titled `# Tool scope and agent runtime
(planned)`).
No other Step 0 changes were needed.
## What was changed
### New files
- **`src/config/bridge.rs`** (~430 lines including tests)
- Module docstring explaining the bridge's purpose, scheduled
deletion in Step 10, and the lossy `mcp_registry` field.
- `impl Config` block with three public methods, scoped under
`#[allow(dead_code)]`:
- `to_app_config(&self) -> AppConfig` — borrow, returns fresh
`AppConfig` by cloning the 40 serialized fields.
- `to_request_context(&self, app: Arc<AppState>) -> RequestContext`
— borrow + provided `AppState`, returns fresh `RequestContext`
by cloning the 19 runtime fields held on both types.
- `from_parts(app: &AppState, ctx: &RequestContext) -> Config`
borrow both halves, returns a new owned `Config`. Sets
`mcp_registry: None` because no split type holds it.
- `#[cfg(test)] mod tests` with 4 unit tests:
- `to_app_config_copies_every_serialized_field`
- `to_request_context_copies_every_runtime_field`
- `round_trip_preserves_all_non_lossy_fields`
- `round_trip_default_config`
- Helper `build_populated_config()` that sets every primitive /
`String` / simple `Option` field to a non-default value so a
missed field in the conversion methods produces a test failure.
### Modified files
- **`src/config/mod.rs`** — added `mod bridge;` declaration (one
line, inserted alphabetically between `app_state` and `input`).
- **`src/config/request_context.rs`** — updated the "Tool scope
(planned)" docstring section to correctly reference Phase 1
**Step 6.5** (not Step 6) and to mention the `AgentRuntime`
collapse alongside `ToolScope`. No code changes.
## Key decisions
### 1. The bridge lives in its own module
I put the conversion methods in `src/config/bridge.rs` rather than
adding them inline to `src/config/mod.rs`. The plan calls for this
entire bridge to be deleted in Step 10, and isolating it in one file
makes that deletion a single `rm` + one `mod bridge;` line removal in
`mod.rs`. Adding ~300 lines to the already-massive `mod.rs` would have
made the eventual cleanup harder.
### 2. `mcp_registry` is lossy by design (documented)
`Config.mcp_registry: Option<McpRegistry>` has no home in either
`AppConfig` (serialized settings only) or `RequestContext` (runtime
state that doesn't include MCP, per Step 6.5's `ToolScope` design).
I considered three options:
1. **Add a temporary `mcp_registry` field to `RequestContext`** — ugly,
introduces state that has to be cleaned up in Step 6.5 anyway.
2. **Accept lossy round-trip, document it** — chosen.
3. **Store `mcp_registry` on `AppState` temporarily** — dishonest,
contradicts the plan which says MCP isn't process-wide.
Option 2 aligns with the plan's direction. The lossy field is
documented in three places so no caller is surprised:
- Module-level docstring (`# Lossy fields` section)
- `from_parts` method docstring
- Inline comment next to the `is_none()` assertion in the round-trip
test
Any Step 29 callsite that still needs the registry during its
migration window must keep a reference to the original `Config`
rather than relying on round-trip fidelity.
### 3. `#[allow(dead_code)]` scoped to the whole `impl Config` block
Applied to the `impl` block in `bridge.rs` rather than individually to
each method. All three methods are dead until Step 2+ starts calling
them. When the first caller migrates, I'll narrow the allow to the
methods that are still unused. By Step 10 the whole file is deleted
and the allow goes with it.
### 4. Populated-config builder skips domain-type runtime fields
`build_populated_config()` sets every primitive, `String`, and simple
`Option` field to a non-default value. It does **not** try to construct
real `Role`, `Session`, `Agent`, `Supervisor`, `Inbox`, or
`EscalationQueue` instances because those have complex async/setup
lifecycles and constructors don't exist for test use.
The round-trip tests still exercise the clone path for all those
`Option<T>` fields — they just exercise the `None` variant. The tests
prove that (a) if a runtime field is set, the conversion clones it
correctly (which is guaranteed by Rust's `#[derive(Clone)]` on
`Config`), and (b) `None` roundtrips to `None`. Deeper coverage with
populated domain types would require mock constructors that don't
exist in the current code, making it a meaningful scope increase
unsuitable for Step 1's "additive, mechanical" goal.
### 5. The test covers `Config::default()` separately from the
populated builder
A separate `round_trip_default_config` test catches any subtle "the
default doesn't roundtrip" bug that `build_populated_config` might
mask by always setting fields to non-defaults. Both tests run through
the same `to_app_config → to_request_context → from_parts` pipeline.
## Deviations from plan
None of substance. The plan's Step 1 description was three sentences
and a pseudocode block; the implementation matches it field-for-field
except for two clarifications the plan didn't specify:
1. **Which module holds the methods** — the plan didn't say. I chose a
dedicated `src/config/bridge.rs` file (see Key Decision #1).
2. **How `mcp_registry` is handled in round-trip** — the plan's
pseudocode said `from_parts` "merges back" but didn't address the
field that has no home. I chose lossy reconstruction with
documented behavior (see Key Decision #2).
Both clarifications are additive — they don't change what Step 1
accomplishes, they just pin down details the plan left implicit.
## Verification
### Compilation
- `cargo check` — clean, zero warnings. The expected dead-code warning
from the new methods is suppressed by `#[allow(dead_code)]` on the
`impl` block.
### Tests
- `cargo test bridge` — 4 new tests pass:
- `config::bridge::tests::round_trip_default_config`
- `config::bridge::tests::to_app_config_copies_every_serialized_field`
- `config::bridge::tests::to_request_context_copies_every_runtime_field`
- `config::bridge::tests::round_trip_preserves_all_non_lossy_fields`
- `cargo test` — full suite passes: **63 passed, 0 failed**
(59 pre-existing + 4 new).
### Manual smoke test
Not applicable — Step 1 is additive only, no runtime behavior changed.
CLI and REPL continue working through the original `Config` code
paths, unchanged.
## Handoff to next step
### What Step 2 can rely on
Step 2 (migrate ~30 static methods off `Config` to a `paths` module)
can rely on all of the following being true:
- `Config::to_app_config()`, `Config::to_request_context(app)`, and
`Config::from_parts(app, ctx)` all exist and are tested.
- The three new types (`AppConfig`, `AppState`, `RequestContext`) are
fully defined and compile.
- Nothing in the codebase outside `src/config/bridge.rs` currently
calls the new methods, so Step 2 is free to start using them
wherever convenient without fighting existing callers.
- `AppState` only has two fields: `config: Arc<AppConfig>` and
`vault: GlobalVault`. No `mcp_factory`, no `rag_cache` yet — those
land in Step 6.5.
- `RequestContext` has flat fields mirroring the runtime half of
today's `Config`. The `ToolScope` / `AgentRuntime` unification
happens in Step 6.5, not earlier. Step 2 should not try to
pre-group fields.
### What Step 2 should watch for
- **Static methods on `Config` with no `&self` parameter** are the
Step 2 target. The Phase 1 plan lists ~33 of them in a table
(`config_dir`, `local_path`, `cache_path`, etc.). Each gets moved
to a new `src/config/paths.rs` module (or similar), with forwarding
`#[deprecated]` methods left behind on `Config` until Step 2 is
fully done.
- **`vault_password_file`** on `Config` is private (not `pub`), but
`vault_password_file` on `AppConfig` is `pub(crate)`. `bridge.rs`
accesses both directly because it's a sibling module under
`src/config/`. If Step 2's path functions need to read
`vault_password_file` from `AppConfig` they can do so directly
within the `config` module, but callers outside the module will
need an accessor method.
- **`Config.mcp_registry` round-trip is lossy.** If any static method
moved in Step 2 touches `mcp_registry` (unlikely — none of the ~33
static methods listed in the plan do), that method should NOT use
the bridge — it should keep operating on the original `Config`.
Double-check the list before migrating.
### What Step 2 should NOT do
- Don't delete the bridge. It's still needed for Steps 3–9.
- Don't narrow `#[allow(dead_code)]` on `impl Config` in `bridge.rs`
yet — Step 2 might start using some of the methods but not all,
and the allow-scope should be adjusted once (at the end of Step 2)
rather than incrementally.
- Don't touch the `request_context.rs` `# Tool scope and agent
runtime (planned)` docstring. It's accurate and Step 6.5 is still
far off.
### Files to re-read at the start of Step 2
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 2 section has the
full static-method migration table.
- This notes file (`PHASE-1-STEP-1-NOTES.md`) — for the bridge's
current shape and the `mcp_registry` lossy-field context.
- `src/config/bridge.rs` — for the exact method signatures available.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Architecture doc: `docs/REST-API-ARCHITECTURE.md`
- Step 0 files: `src/config/app_config.rs`, `src/config/app_state.rs`,
`src/config/request_context.rs`
- Step 1 files: `src/config/bridge.rs`, `src/config/mod.rs` (mod
declaration), `src/config/request_context.rs` (docstring fix)
+348
View File
@@ -0,0 +1,348 @@
# Phase 1 Step 2 — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 2: Migrate static methods off Config"
## Summary
Extracted 33 static (no-`self`) methods from `impl Config` into a new
`src/config/paths.rs` module and migrated every caller across the
codebase. The deprecated forwarders the plan suggested as an
intermediate step were added, used to drive the callsite migration,
and then deleted in the same step because the migration was
mechanically straightforward with `ast-grep` and the forwarders
became dead immediately.
## What was changed
### New files
- **`src/config/paths.rs`** (~270 lines)
- Module docstring explaining the extraction rationale and the
(transitional) compatibility shim pattern.
- `#![allow(dead_code)]` at module scope because most functions
were briefly dead during the in-flight migration; kept for the
duration of Step 2 and could be narrowed or removed in a later
cleanup (see "Follow-up" below).
- All 33 functions as free-standing `pub fn`s, implementations
copied verbatim from `impl Config`:
- Path helpers: `config_dir`, `local_path`, `cache_path`,
`oauth_tokens_path`, `token_file`, `log_path`, `config_file`,
`roles_dir`, `role_file`, `macros_dir`, `macro_file`,
`env_file`, `rags_dir`, `functions_dir`, `functions_bin_dir`,
`mcp_config_file`, `global_tools_dir`, `global_utils_dir`,
`bash_prompt_utils_file`, `agents_data_dir`, `agent_data_dir`,
`agent_config_file`, `agent_bin_dir`, `agent_rag_file`,
`agent_functions_file`, `models_override_file`
- Listing helpers: `list_roles`, `list_rags`, `list_macros`
- Existence checks: `has_role`, `has_macro`
- Config loaders: `log_config`, `local_models_override`
### Modified files
Migration touched 14 source files — all of `src/config/mod.rs`'s
internal callers, plus every external `Config::method()` callsite:
- **`src/config/mod.rs`** — removed the 33 static-method definitions
from `impl Config`, rewrote every `Self::method()` internal caller
to use `paths::method()`, and removed the `log::LevelFilter` import
that became unused after `log_config` moved away.
- **`src/config/bridge.rs`** — no changes (bridge is unaffected by
path migrations).
- **`src/config/macros.rs`** — added `use crate::config::paths;`,
migrated one `Config::macros_dir().display()` call.
- **`src/config/agent.rs`** — added `use crate::config::paths;`,
migrated 2 `Config::agents_data_dir()` calls, 4 `agent_data_dir`
calls, 3 `agent_config_file` calls, 1 `agent_rag_file` call.
- **`src/config/request_context.rs`** — no changes.
- **`src/config/app_config.rs`, `app_state.rs`** — no changes.
- **`src/main.rs`** — added `use crate::config::paths;`, migrated
`Config::log_config()`, `Config::list_roles(true)`,
`Config::list_rags()`, `Config::list_macros()`.
- **`src/function/mod.rs`** — added `use crate::config::paths;`,
migrated ~25 callsites across `Config::config_dir`,
`functions_dir`, `functions_bin_dir`, `global_tools_dir`,
`agent_bin_dir`, `agent_data_dir`, `agent_functions_file`,
`bash_prompt_utils_file`. Removed `Config` from the `use
crate::{config::{...}}` block because it became unused.
- **`src/repl/mod.rs`** — added `use crate::config::paths;`,
migrated `Config::has_role(name)` and `Config::has_macro(name)`.
- **`src/cli/completer.rs`** — added `use crate::config::paths;`,
migrated `Config::list_roles(true)`, `Config::list_rags()`,
`Config::list_macros()`.
- **`src/utils/logs.rs`** — replaced `use crate::config::Config;`
with `use crate::config::paths;` (Config was only used for
`log_path`); migrated `Config::log_path()` call.
- **`src/mcp/mod.rs`** — added `use crate::config::paths;`,
migrated 3 `Config::mcp_config_file().display()` calls.
- **`src/client/common.rs`** — added `use crate::config::paths;`,
migrated `Config::local_models_override()`. Removed `Config` from
the `config::{Config, GlobalConfig, Input}` import because it
became unused.
- **`src/client/oauth.rs`** — replaced `use crate::config::Config;`
with `use crate::config::paths;` (Config was only used for
`token_file`); migrated 2 `Config::token_file` calls.
### Module registration
- **`src/config/mod.rs`** — added `pub(crate) mod paths;` in the
module declaration block, alphabetically placed between `macros`
and `prompts`.
## Key decisions
### 1. The deprecated forwarders lived for the whole migration but not beyond
The plan said to keep `#[deprecated]` forwarders around while
migrating callsites module-by-module. I followed that approach but
collapsed the "migrate then delete" into a single step because the
callsite migration was almost entirely mechanical — `ast-grep` with
per-method patterns handled the bulk, and only a few edge cases
(`Self::X` inside `&`-expressions, multi-line `format!` calls)
required manual text edits. By the time all 33 methods had zero
external callers, keeping the forwarders would have just generated
dead_code warnings.
The plan also said "then remove the deprecated methods" as a distinct
phase, and that's exactly what happened — just contiguously with the
migration rather than as a separate commit. The result is the same:
no forwarders in the final tree, all callers routed through
`paths::`.
### 2. `paths` is a `pub(crate)` module, not `pub`
I registered the module as `pub(crate) mod paths;` so the functions
are available anywhere in the crate via `crate::config::paths::X`
but not re-exported as part of Loki's public API surface. This
matches the plan's intent — these are internal implementation
details that happen to have been static methods on `Config`. If
anything external needs a config path in the future, the proper
shape is probably to add it as a method on `AppConfig` (which goes
through Step 3's global-read migration anyway) rather than exposing
`paths` publicly.
### 3. `log_config` stays in `paths.rs` despite not being a path
`log_config()` returns `(LevelFilter, Option<PathBuf>)` — it reads
environment variables to determine the log level plus falls back to
`log_path()` for the file destination. Strictly speaking, it's not
a "path" function, but:
- It's a static no-`self` helper (the reason it's in Step 2)
- It's used in exactly one place (`main.rs:446`)
- Splitting it into its own module would add complexity for no
benefit
The plan also listed it in the migration table as belonging in
`paths.rs`. I followed the plan.
### 4. `#![allow(dead_code)]` at module scope, not per-function
I initially scoped the allow to the whole `paths.rs` module because
during the mid-migration state, many functions had zero callers
temporarily. I kept it at module scope rather than narrowing to
individual functions as they became used again, because by the end
of Step 2 all 33 functions have at least one real caller and the
allow is effectively inert — but narrowing would mean tracking
which functions are used vs not in every follow-up step.
Module-level allow is set-and-forget.
This is slightly looser than ideal. See "Follow-up" below.
### 5. `ast-grep` was the primary migration tool, with manual edits for awkward cases
`ast-grep --pattern 'Config::method()'` and
`--pattern 'Self::method()'` caught ~90% of the callsites cleanly.
The remaining ~10% fell into two categories that `ast-grep` handled
poorly:
1. **Calls wrapped in `.display()` or `.to_string_lossy()`.** Some
ast-grep patterns matched these, others didn't — the behavior
seemed inconsistent. When a pattern found 0 matches but grep
showed real matches, I switched to plain text `Edit` for that
cluster.
2. **`&Self::X()` reference expressions.** `ast-grep` appeared to
not match `Self::X()` when it was the operand of a `&` reference,
presumably because the parent node shape was different. Plain
text `Edit` handled these without issue.
These are tooling workarounds, not architectural concerns. The
final tree has no `Config::X` or `Self::X` callers for any of the
33 migrated methods.
### 6. Removed `Config` import from four files that no longer needed it
`src/function/mod.rs`, `src/client/common.rs`, `src/client/oauth.rs`,
and `src/utils/logs.rs` all had `use crate::config::Config;` (or
similar) imports that became unused after every call was migrated.
I removed them. This is a minor cleanup but worth doing because:
- Clippy flags unused imports as warnings
- Leaving them in signals "this file might still need Config" which
future migration steps would have to double-check
## Deviations from plan
### 1. `sync_models` is not in Step 2
The plan's Step 2 table listed `sync_models(url, abort)` as a
migration target, but grep showed only `sync_models_url(&self) ->
String` exists in the code. That's a `&self` method, so it belongs
in Step 3 (global-read methods), not Step 2.
I skipped it here and will pick it up in Step 3. The Step 2 actual
count is 33 methods, not the 34 the plan's table implies.
### 2. Forwarders deleted contiguously, not in a separate sub-step
See Key Decision #1. The plan described a two-phase approach
("leave forwarders, migrate callers module-by-module, then remove
forwarders"). I compressed this into one pass because the migration
was so mechanical there was no value in the intermediate state.
## Verification
### Compilation
- `cargo check` — clean, **zero warnings, zero errors**
- `cargo clippy` — clean
### Tests
- `cargo test` — **63 passed, 0 failed** (same as Step 1 — no new
tests were added because Step 2 is a pure code-move with no new
behavior to test; the existing test suite verifies nothing
regressed)
### Manual smoke test
Not applicable — Step 2 is a pure code-move. The path computations
are literally the same code at different call sites. If existing
tests pass and nothing references Config's static methods anymore,
there's nothing to manually verify beyond the compile.
### Callsite audit
```
cargo check 2>&1 | grep "Config::\(config_dir\|local_path\|...\)"
```
Returns zero matches. Every external `Config::method()` callsite
for the 33 migrated methods has been converted to `paths::method()`.
## Handoff to next step
### What Step 3 can rely on
Step 3 (migrate global-read methods to `AppConfig`) can rely on:
- `src/config/paths.rs` exists and holds every static path helper
plus `log_config`, `list_*`, `has_*`, and `local_models_override`
- Zero `Config::config_dir()`, `Config::cache_path()`, etc. calls
remain in the codebase
- The `#![allow(dead_code)]` on `paths.rs` at module scope is safe to
remove at any time now that all functions have callers
- `AppConfig` (from Step 0) is still fully populated and ready to
receive method migrations
- The bridge from Step 1 (`Config::to_app_config`,
`to_request_context`, `from_parts`) is unchanged and still works
- `Config` struct has no more static methods except those that were
kept because they DO take `&self` (`vault_password_file`,
`messages_file`, `sessions_dir`, `session_file`, `rag_file`,
`state`, etc.)
- Deprecation forwarders are GONE — don't add them back
### What Step 3 should watch for
- **`sync_models_url`** was listed in the Step 2 plan table as
static but is actually `&self`. It's a Step 3 target
(global-read). Pick it up there.
- **The Step 3 target list** (from `PHASE-1-IMPLEMENTATION-PLAN.md`):
`vault_password_file`, `editor`, `sync_models_url`, `light_theme`,
`render_options`, `print_markdown`, `rag_template`,
`select_functions`, `select_enabled_functions`,
`select_enabled_mcp_servers`. These are all `&self` methods that
only read serialized config state.
- **The `vault_password_file` field on `AppConfig` is `pub(crate)`,
not `pub`.** The accessor method on `AppConfig` will need to
encapsulate the same fallback logic that the `Config` method has
(see `src/config/mod.rs` — it falls back to
`gman::config::Config::local_provider_password_file()`).
- **`print_markdown` depends on `render_options`.** When migrating
them to `AppConfig`, preserve the dependency chain.
- **`select_functions` / `select_enabled_functions` /
`select_enabled_mcp_servers` take a `&Role` parameter.** Their
new signatures on `AppConfig` will be `&self, role: &Role` — make
sure `Role` is importable in the `app_config.rs` module (it
currently isn't).
- **Strategy for the Step 3 migration:** same as Step 2 — create
methods on `AppConfig`, add `#[deprecated]` forwarders on
`Config`, migrate callsites with `ast-grep`, delete the
forwarders. Should be quicker than Step 2 because the method
count is smaller (10 vs 33) and the pattern is now
well-established.
### What Step 3 should NOT do
- Don't touch `paths.rs` — it's complete.
- Don't touch `bridge.rs` — Step 3's migrations will still flow
through the bridge's round-trip test correctly.
- Don't try to migrate `current_model`, `extract_role`, `sysinfo`,
or any of the `set_*` methods — those are "mixed" methods listed
in Step 7, not Step 3.
- Don't delete `Config` struct fields yet. Step 3 only moves
*methods* that read fields; the fields themselves still exist on
`Config` (and on `AppConfig`) in parallel until Step 10.
### Files to re-read at the start of Step 3
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 3 section (table of
10 global-read methods and their target signatures)
- This notes file — specifically the "What Step 3 should watch for"
section
- `src/config/app_config.rs` — to see the current `AppConfig` shape
and decide where to put new methods
- The current `&self` methods on `Config` in `src/config/mod.rs`
that are being migrated
## Follow-up (not blocking Step 3)
### 1. Narrow or remove `#![allow(dead_code)]` on `paths.rs`
At Step 2's end, every function in `paths.rs` has real callers, so
the module-level allow could be removed without producing warnings.
I left it in because it's harmless and removes the need to add
per-function allows during mid-migration states in later steps.
Future cleanup pass can tighten this.
### 2. Consider renaming `paths.rs` if its scope grows
`log_config`, `list_roles`, `list_rags`, `list_macros`, `has_role`,
`has_macro`, and `local_models_override` aren't strictly "paths"
but they're close enough that extracting them into a sibling module
would be premature abstraction. If Steps 3+ add more non-path
helpers to the same module, revisit this.
### 3. The `Config::config_dir` deletion removes one access point for env vars
The `config_dir()` function was also the entry point for
XDG-compatible config location discovery. Nothing about that changed —
it still lives in `paths::config_dir()` — but if Step 4+ needs to
reference the config directory from code that doesn't yet import
`paths`, the import list will need updating.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Step 1 notes: `docs/implementation/PHASE-1-STEP-1-NOTES.md`
- New file: `src/config/paths.rs`
- Modified files (module registration + callsite migration): 14
files across `src/config/`, `src/function/`, `src/repl/`,
`src/cli/`, `src/main.rs`, `src/utils/`, `src/mcp/`,
`src/client/`
+326
View File
@@ -0,0 +1,326 @@
# Phase 1 Step 3 — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 3: Migrate global-read methods to AppConfig"
## Summary
Added 7 global-read methods to `AppConfig` as inherent methods
duplicating the bodies that still exist on `Config`. The planned
approach (deprecated forwarders + caller migration) turned out to
be the wrong shape for this step because callers hold `Config`
instances, not `AppConfig` instances, and giving them an `AppConfig`
would require either a sync'd `Arc<AppConfig>` field on `Config`
(which Step 4's global-write migration would immediately break) or
cloning on every call. The clean answer is to duplicate during the
bridge window and let callers migrate naturally when Steps 8-9
switch them from `Config` to `RequestContext` + `AppState`. The
duplication is 7 methods / ~100 lines and deletes itself when
`Config` is removed in Step 10.
**Three methods from the plan's Step 3 target list were deferred
to Step 7** because they read runtime state, not just serialized
state (see "Deviations from plan").
## What was changed
### Modified files
- **`src/config/app_config.rs`** — added 7 new imports
(`MarkdownRender`, `RenderOptions`, `IS_STDOUT_TERMINAL`,
`decode_bin`, `anyhow`, `env`, `ThemeSet`) and a new
`impl AppConfig` block with 7 methods under
`#[allow(dead_code)]`:
- `vault_password_file(&self) -> PathBuf`
- `editor(&self) -> Result<String>`
- `sync_models_url(&self) -> String`
- `light_theme(&self) -> bool`
- `render_options(&self) -> Result<RenderOptions>`
- `print_markdown(&self, text) -> Result<()>`
- `rag_template(&self, embeddings, sources, text) -> String`
All bodies are copy-pasted verbatim from the originals on
`Config`, with the following adjustments for the new module
location:
- `EDITOR` static → `super::EDITOR` (shared across both impls)
- `SYNC_MODELS_URL` const → `super::SYNC_MODELS_URL`
- `RAG_TEMPLATE` const → `super::RAG_TEMPLATE`
- `LIGHT_THEME` / `DARK_THEME` consts → `super::LIGHT_THEME` /
`super::DARK_THEME`
- `paths::local_path()` continues to work unchanged (already in
the right module from Step 2)
### Unchanged files
- **`src/config/mod.rs`** — the original `Config::vault_password_file`,
`editor`, `sync_models_url`, `light_theme`, `render_options`,
`print_markdown`, `rag_template` method definitions are
deliberately left intact. They continue to work for every existing
caller. The deletion of these happens in Step 10 when `Config` is
removed entirely.
- **All external callers** (26 callsites across 6 files) — also
unchanged. They continue to call `config.editor()`,
`config.render_options()`, etc. on their `Config` instances.
## Key decisions
### 1. Duplicate method bodies instead of `#[deprecated]` forwarders
The plan prescribed the same shape as Step 2: add the new version,
add a `#[deprecated]` forwarder on the old location, migrate
callers, delete forwarders. This worked cleanly in Step 2 because
the new location was a free-standing `paths` module — callers
could switch from `Config::method()` (associated function) to
`paths::method()` (free function) without needing any instance.
Step 3 is fundamentally different: `AppConfig::method(&self)` needs
an `AppConfig` instance. Callers today hold `Config` instances.
Giving them an `AppConfig` means one of:
(a) Add an `app_config: Arc<AppConfig>` field to `Config` and have
the forwarder do `self.app_config.method()`. **Rejected**
because Step 4 (global-write) will mutate `Config` fields via
`set_wrap`, `update`, etc. — keeping the `Arc<AppConfig>`
in sync would require either rebuilding it on every write (slow
and racy) or tracking dirty state (premature complexity).
(b) Have the forwarder do `self.to_app_config().method()`. **Rejected**
because `to_app_config` clones all 40 serialized fields on
every call — a >100x slowdown for simple accessors like
`light_theme()`.
(c) Duplicate the method bodies on both `Config` and `AppConfig`,
let each caller use whichever instance it has, delete the
`Config` versions when `Config` itself is deleted in Step 10.
**Chosen.**
Option (c) has a small ongoing cost (~100 lines of duplicated
logic) but is strictly additive, has zero runtime overhead, and
automatically cleans up in Step 10. It also matches how Rust's
type system prefers to handle this — parallel impls are cheaper
than synchronized state.
### 2. Caller migration is deferred to Steps 8-9
With duplication in place, the migration from `Config` to
`AppConfig` happens organically later:
- When Step 8 rewrites `main.rs` to construct an `AppState` and
`RequestContext` instead of a `GlobalConfig`, the `main.rs`
callers of `config.editor()` naturally become
`ctx.app.config.editor()` — calling into `AppConfig`'s version.
- Same for every other callsite that gets migrated in Step 8+.
- By Step 10, the old `Config::editor()` etc. have zero callers
and get deleted along with the rest of `Config`.
This means Step 3 is "additive only, no caller touches" —
deliberately smaller in scope than Step 2. That's the correct call
given the instance-type constraint.
### 3. `EDITOR` static is shared between `Config::editor` and `AppConfig::editor`
`editor()` caches the resolved editor path in a module-level
`static EDITOR: OnceLock<Option<String>>` in `src/config/mod.rs`.
Both `Config::editor(&self)` and `AppConfig::editor(&self)` read
and initialize the same static via `super::EDITOR`. This matches
the current behavior: whichever caller resolves first wins the
`OnceLock::get_or_init` race and subsequent callers see the cached
value.
There's a latent bug here (if `Config.editor` and `AppConfig.editor`
fields ever differ, the first caller wins regardless) but it's
pre-existing and preserved during the bridge window. Step 10 resolves
it by deleting `Config` entirely.
### 4. Three methods deferred to Step 7
See "Deviations from plan."
## Deviations from plan
### `select_functions`, `select_enabled_functions`, `select_enabled_mcp_servers` belong in Step 7
The plan's Step 3 table lists all three. Reading their bodies (in
`src/config/mod.rs` at lines 1816, 1828, 1923), they all touch
`self.functions` and `self.agent` — both of which are `#[serde(skip)]`
runtime fields that do NOT exist on `AppConfig` and will never
exist there (they're per-request state living on `RequestContext`
and `AgentRuntime`).
These are "mixed" methods in the plan's Step 7 taxonomy — they
conditionally read serialized config + runtime state depending on
whether an agent is active. Moving them to `AppConfig` now would
require `AppConfig` to hold `functions` and `agent` fields, which
directly contradicts the Step 0 / Step 6.5 design.
**Action taken:** left all three on `Config` unchanged. They get
migrated in Step 7 with the new signature
`(app: &AppConfig, ctx: &RequestContext, role: &Role) -> Vec<...>`
as described in the plan.
**Action required from Step 7:** pick up these three methods. The
call graph is:
- `Config::select_functions` is called from `src/config/input.rs:243`
(one external caller)
- `Config::select_functions` internally calls the two private
helpers
- The private helpers read both `self.functions` (runtime,
per-request) and `self.agent` (runtime, per-request) — so they
fundamentally need `RequestContext` not `AppConfig`
### Step 3 count: 7 methods, not 10
The plan's table listed 10 target methods. After excluding the
three `select_*` methods, Step 3 migrated 7. This is documented
here rather than silently completing a smaller Step 3 so Step 7's
scope is clear.
## Verification
### Compilation
- `cargo check` — clean, **zero warnings, zero errors**
- `cargo clippy` — clean
### Tests
- `cargo test` — **63 passed, 0 failed** (same as Steps 1–2)
Step 3 added no new tests because it's duplication — there's
nothing new to verify. The existing test suite confirms:
(a) the original `Config` methods still work (they weren't touched)
(b) `AppConfig` still compiles and its `Default` impl is intact
(needed for Step 1's bridge test which uses
`build_populated_config()` → `to_app_config()`)
Running `cargo test bridge` specifically:
```
test config::bridge::tests::round_trip_default_config ... ok
test config::bridge::tests::to_app_config_copies_every_serialized_field ... ok
test config::bridge::tests::to_request_context_copies_every_runtime_field ... ok
test config::bridge::tests::round_trip_preserves_all_non_lossy_fields ... ok
test result: ok. 4 passed
```
The bridge's round-trip test still works, which proves the new
methods on `AppConfig` don't interfere with the struct layout or
deserialization. They're purely additive impl-level methods.
### Manual smoke test
Not applicable — no runtime behavior changed. CLI and REPL still
call `Config::editor()` etc. as before.
## Handoff to next step
### What Step 4 can rely on
Step 4 (migrate global-write methods) can rely on:
- `AppConfig` now has 7 inherent read methods that mirror the
corresponding `Config` methods exactly
- `#[allow(dead_code)]` on the `impl AppConfig` block in
`app_config.rs` — safe to leave as-is, it'll go away when the
first caller is migrated in Step 8+
- `Config` is unchanged for all 7 methods and continues to work
for every current caller
- The bridge (`Config::to_app_config`, `to_request_context`,
`from_parts`) from Step 1 still works
- The `paths` module from Step 2 is unchanged
- `Config::select_functions`, `select_enabled_functions`,
`select_enabled_mcp_servers` are **still on `Config`** and must
stay there through Step 6. They get migrated in Step 7.
### What Step 4 should watch for
- **The Step 4 target list** (from `PHASE-1-IMPLEMENTATION-PLAN.md`):
`set_wrap`, `update`, `load_envs`, `load_functions`,
`load_mcp_servers`, `setup_model`, `setup_document_loaders`,
`setup_user_agent`. These are global-write methods that
initialize or mutate serialized fields.
- **Tension with Step 3's duplication decision:** Step 4 methods
mutate `Config` fields. If we also duplicate them on `AppConfig`,
then mutations through one path don't affect the other — but no
caller ever mutates both, so this is fine in practice during
the bridge window.
- **`load_functions` and `load_mcp_servers`** are
initialization-only (called once in `Config::init`). They're arguably not
"global-write" in the same sense — they populate runtime-only
fields (`functions`, `mcp_registry`). Step 4 should carefully
classify each: fields that belong to `AppConfig` vs fields that
belong to `RequestContext` vs fields that go away in Step 6.5
(`mcp_registry`).
- **Strategy for Step 4:** because writes are typically one-shot
(`update` is called from `.set` REPL command; `load_envs` is
called once at startup), you can be more lenient about
duplication vs consolidation. Consider: the write methods might
not need to exist on `AppConfig` at all if they're only used
during `Config::init` and never during request handling. Step 4
should evaluate each one individually.
### What Step 4 should NOT do
- Don't add an `app_config: Arc<AppConfig>` field to `Config`
(see Key Decision #1 for why).
- Don't touch the 7 methods added to `AppConfig` in Step 3 — they
stay until Step 8+ caller migration, and Step 10 deletion.
- Don't migrate `select_*` methods — those are Step 7.
- Don't try to migrate callers of the Step 3 methods to go
through `AppConfig` yet. The call sites still hold `Config`,
and forcing a conversion would require either a clone or a
sync'd field.
### Files to re-read at the start of Step 4
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 4 section
- This notes file — specifically the "Deviations from plan" and
"What Step 4 should watch for" sections
- `src/config/mod.rs` — the current `Config::set_wrap`, `update`,
`load_*`, `setup_*` method bodies (search for `pub fn set_wrap`,
`pub fn update`, `pub fn load_envs`, etc.)
- `src/config/app_config.rs` — the current shape with 7 new
methods
## Follow-up (not blocking Step 4)
### 1. The `EDITOR` static sharing is pre-existing fragility
Both `Config::editor` and `AppConfig::editor` now share the same
`static EDITOR: OnceLock<Option<String>>`. If two Configs with
different `editor` fields exist (unlikely in practice but possible
during tests), the first caller wins. This isn't new — the single
`Config` version had the same property. Step 10's `Config`
deletion will leave only `AppConfig::editor` which eliminates the
theoretical bug. Worth noting so nobody introduces a test that
assumes per-instance editor caching.
### 2. `impl AppConfig` block grows across Steps 3-7
By the end of Step 7, `AppConfig` will have accumulated: 7 methods
from Step 3, potentially some from Step 4, more from Step 7's
mixed-method splits. The `#[allow(dead_code)]` currently covers
the whole block. As callers migrate in Step 8+, the warning
suppression can be removed. Don't narrow it prematurely during
Steps 4-7.
### 3. Imports added to `app_config.rs`
Step 3 added `MarkdownRender`, `RenderOptions`, `IS_STDOUT_TERMINAL`,
`decode_bin`, `anyhow::{Context, Result, anyhow}`, `env`,
`ThemeSet`. Future steps may add more. The import list is small
enough to stay clean; no reorganization needed.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Step 2 notes: `docs/implementation/PHASE-1-STEP-2-NOTES.md`
- Modified file: `src/config/app_config.rs` (imports + new
`impl AppConfig` block)
- Unchanged but relevant: `src/config/mod.rs` (original `Config`
methods still exist for now), `src/config/bridge.rs` (still
passes round-trip tests)
+362
View File
@@ -0,0 +1,362 @@
# Phase 1 Step 4 — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 4: Migrate global-write methods"
## Summary
Added 4 of 8 planned global-write methods to `AppConfig` as
inherent methods, duplicating the bodies that still exist on
`Config`. The other 4 methods were deferred: 2 to Step 7 (mixed
methods that call into `set_*` methods slated for Step 7), and
2 kept on `Config` because they populate runtime-only fields
(`functions`, `mcp_registry`) that don't belong on `AppConfig`.
Same duplication-no-caller-migration pattern as Step 3 — during
the bridge window both `Config` and `AppConfig` have these
methods; caller migration happens organically in Steps 8-9 when
frontends switch from `GlobalConfig` to `AppState` + `RequestContext`.
## What was changed
### Modified files
- **`src/config/app_config.rs`** — added 4 new imports (`NO_COLOR`,
`get_env_name` via `crate::utils`, `terminal_colorsaurus`
types) and a new `impl AppConfig` block with 4 methods under
`#[allow(dead_code)]`:
- `set_wrap(&mut self, value: &str) -> Result<()>` — parses and
sets `self.wrap` for the `.set wrap` REPL command
- `setup_document_loaders(&mut self)` — seeds default PDF/DOCX
loaders into `self.document_loaders` if not already present
- `setup_user_agent(&mut self)` — expands `"auto"` into
`loki/<version>` in `self.user_agent`
- `load_envs(&mut self)` — ~140 lines of env-var overrides that
populate all 30+ serialized fields from `LOKI_*` environment
variables
All bodies are copy-pasted verbatim from the originals on
`Config`, with references updated for the new module location:
- `read_env_value::<T>` → `super::read_env_value::<T>`
- `read_env_bool` → `super::read_env_bool`
- `NO_COLOR`, `IS_STDOUT_TERMINAL`, `get_env_name`, `decode_bin`
→ imported from `crate::utils`
- `terminal_colorsaurus` → direct import
### Unchanged files
- **`src/config/mod.rs`** — the original `Config::set_wrap`,
`load_envs`, `setup_document_loaders`, `setup_user_agent`
definitions are deliberately left intact. They continue to
work for every existing caller. They get deleted in Step 10
when `Config` is removed entirely.
- **`src/config/mod.rs`** — the `read_env_value` and
`read_env_bool` private helpers are unchanged and accessed via
`super::read_env_value` from `app_config.rs`.
## Key decisions
### 1. Only 4 of 8 methods migrated
The plan's Step 4 table listed 8 methods. After reading each one
carefully, I classified them:
| Method | Classification | Action |
|---|---|---|
| `set_wrap` | Pure global-write | **Migrated** |
| `load_envs` | Pure global-write | **Migrated** |
| `setup_document_loaders` | Pure global-write | **Migrated** |
| `setup_user_agent` | Pure global-write | **Migrated** |
| `setup_model` | Calls `self.set_model()` (Step 7 mixed) | **Deferred to Step 7** |
| `load_functions` | Writes runtime `self.functions` field | **Not migrated** (stays on `Config`) |
| `load_mcp_servers` | Writes runtime `self.mcp_registry` field (going away in Step 6.5) | **Not migrated** (stays on `Config`) |
| `update` | Dispatches to 10+ `set_*` methods, all Step 7 mixed | **Deferred to Step 7** |
See "Deviations from plan" for detail on each deferral.
### 2. Same duplication-no-forwarder pattern as Step 3
Step 4's target callers are all `.write()` on a `GlobalConfig` /
`Config` instance. Like Step 3, giving these callers an
`AppConfig` instance would require either (a) a sync'd
`Arc<AppConfig>` field on `Config` (breaks because Step 4
itself mutates `Config`), (b) cloning on every call (expensive
for `load_envs` which touches 30+ fields), or (c) duplicating
the method bodies.
Option (c) is the same choice Step 3 made and for the same
reasons. The duplication is 4 methods (~180 lines total dominated
by `load_envs`) that auto-delete in Step 10.
### 3. `load_envs` body copied verbatim despite being long
`load_envs` is ~140 lines of repetitive `if let Some(v) =
read_env_value(...) { self.X = v; }` blocks — one per serialized
field. I considered refactoring it to reduce repetition (e.g., a
macro or a data-driven table) but resisted that urge because:
- The refactor would be a behavior change (even if subtle) during
a mechanical code-move step
- The verbatim copy is easy to audit for correctness (line-by-line
diff against the original)
- It gets deleted in Step 10 anyway, so the repetition is
temporary
- Any cleanup belongs in a dedicated tidying pass after Phase 1,
not in the middle of a split
### 4. Methods stay in a separate `impl AppConfig` block
Step 3 added its 7 read methods in one `impl AppConfig` block.
Step 4 adds its 4 write methods in a second `impl AppConfig`
block directly below it. Rust allows multiple `impl` blocks on
the same type, and the visual separation makes it obvious which
methods are reads vs writes during the bridge window. When Step
10 deletes `Config`, both blocks can be merged or left separate
based on the cleanup maintainer's preference.
## Deviations from plan
### `setup_model` deferred to Step 7
The plan lists `setup_model` as a Step 4 target. Reading its
body:
```rust
fn setup_model(&mut self) -> Result<()> {
let mut model_id = self.model_id.clone();
if model_id.is_empty() {
let models = list_models(self, ModelType::Chat);
// ...
}
self.set_model(&model_id)?; // ← this is Step 7 "mixed"
self.model_id = model_id;
Ok(())
}
```
It calls `self.set_model(&model_id)`, which the plan explicitly
lists in **Step 7** ("mixed methods") because `set_model`
conditionally writes to `role_like` (runtime) or `model_id`
(serialized) depending on whether a role/session/agent is
active. Since `setup_model` can't be migrated until `set_model`
exists on `AppConfig` / `RequestContext`, it has to wait for
Step 7.
**Action:** left `Config::setup_model` intact. Step 7 picks it up.
### `update` deferred to Step 7
The plan lists `update` as a Step 4 target. Its body is a ~140
line dispatch over keys like `"temperature"`, `"top_p"`,
`"enabled_tools"`, `"enabled_mcp_servers"`, `"max_output_tokens"`,
`"save_session"`, `"compression_threshold"`,
`"rag_reranker_model"`, `"rag_top_k"`, etc. — every branch
calls into a `set_*` method on `Config` that the plan explicitly
lists in **Step 7**:
- `set_temperature` (Step 7)
- `set_top_p` (Step 7)
- `set_enabled_tools` (Step 7)
- `set_enabled_mcp_servers` (Step 7)
- `set_max_output_tokens` (Step 7)
- `set_save_session` (Step 7)
- `set_compression_threshold` (Step 7)
- `set_rag_reranker_model` (Step 7)
- `set_rag_top_k` (Step 7)
Migrating `update` before those would mean `update` calls
`Config::set_X` (old) from inside `AppConfig::update` (new) —
which crosses the type boundary awkwardly and leaves `update`'s
behavior split between the two types during the migration
window. Not worth it.
**Action:** left `Config::update` intact. Step 7 picks it up
along with the `set_*` methods it dispatches to. At that point
all 10 dependencies will be on `AppConfig`/`RequestContext` and
`update` can be moved cleanly.
### `load_functions` not migrated (stays on Config)
The plan lists `load_functions` as a Step 4 target. Its body:
```rust
fn load_functions(&mut self) -> Result<()> {
self.functions = Functions::init(
self.visible_tools.as_ref().unwrap_or(&Vec::new())
)?;
if self.working_mode.is_repl() {
self.functions.append_user_interaction_functions();
}
Ok(())
}
```
It writes to `self.functions` — a `#[serde(skip)]` runtime field
that lives on `RequestContext` after Step 6 and inside `ToolScope`
after Step 6.5. It also reads `self.working_mode`, another
runtime field. This isn't a "global-write" method in the sense
Step 4 targets — it's a runtime initialization method that will
move to `RequestContext` when `functions` does.
**Action:** left `Config::load_functions` intact. It gets
handled in Step 5 or Step 6 when runtime fields start moving.
Not Step 4, not Step 7.
### `load_mcp_servers` not migrated (stays on Config)
Same story as `load_functions`. Its body writes
`self.mcp_registry` (a field slated for deletion in Step 6.5 per
the architecture plan) and `self.functions` (runtime, moving in
Step 5/6). Nothing about this method belongs on `AppConfig`.
**Action:** left `Config::load_mcp_servers` intact. It gets
handled or deleted in Step 6.5 when `McpFactory` replaces the
singleton registry entirely.
## Verification
### Compilation
- `cargo check` — clean, **zero warnings, zero errors**
- `cargo clippy` — clean
### Tests
- `cargo test` — **63 passed, 0 failed** (unchanged from Steps 1–3)
Step 4 added no new tests because it's duplication. The existing
test suite confirms:
- The original `Config` methods still work (they weren't touched)
- `AppConfig` still compiles, its `Default` impl is intact
- The bridge's round-trip test still passes:
- `config::bridge::tests::round_trip_default_config`
- `config::bridge::tests::round_trip_preserves_all_non_lossy_fields`
- `config::bridge::tests::to_app_config_copies_every_serialized_field`
- `config::bridge::tests::to_request_context_copies_every_runtime_field`
### Manual smoke test
Not applicable — no runtime behavior changed. CLI and REPL still
call `Config::set_wrap()`, `Config::update()`, `Config::load_envs()`,
etc. unchanged.
## Handoff to next step
### What Step 5 can rely on
Step 5 (migrate request-read methods to `RequestContext`) can
rely on:
- `AppConfig` now has **11 methods total**: 7 reads from Step 3,
4 writes from Step 4
- `#[allow(dead_code)]` on both `impl AppConfig` blocks — safe
to leave as-is, goes away when callers migrate in Steps 8+
- `Config` is unchanged for all 11 methods — originals still
work for all current callers
- The bridge from Step 1, the paths module from Step 2, the
read methods from Step 3 are all unchanged and still working
- **`setup_model`, `update`, `load_functions`, `load_mcp_servers`
are still on `Config`** and must stay there:
- `setup_model` → migrates in Step 7 with the `set_*` methods
- `update` → migrates in Step 7 with the `set_*` methods
- `load_functions` → migrates to `RequestContext` in Step 5 or
Step 6 (whichever handles `Functions`)
- `load_mcp_servers` → deleted/transformed in Step 6.5
### What Step 5 should watch for
- **Step 5 targets are `&self` request-read methods** that read
runtime fields like `self.session`, `self.role`, `self.agent`,
`self.rag`, etc. The plan's Step 5 table lists:
`state`, `messages_file`, `sessions_dir`, `session_file`,
`rag_file`, `info`, `role_info`, `session_info`, `agent_info`,
`agent_banner`, `rag_info`, `list_sessions`,
`list_autoname_sessions`, `is_compressing_session`,
`role_like_mut`.
- **These migrate to `RequestContext`**, not `AppConfig`, because
they read per-request state.
- **Same duplication pattern applies.** Add methods to
`RequestContext`, leave originals on `Config`, no caller
migration.
- **`sessions_dir` and `messages_file` already use `paths::`
functions internally** (from Step 2's migration). They read
`self.agent` to decide between the global and agent-scoped
path. Those paths come from the `paths` module.
- **`role_like_mut`** is interesting — it's the helper that
returns a mutable reference to whichever of role/session/agent
is on top. It's the foundation for every `set_*` method in
Step 7. Migrate it to `RequestContext` in Step 5 so Step 7
has it ready.
- **`list_sessions` and `list_autoname_sessions`** wrap
`paths::list_file_names` with some filtering. They take
`&self` to know the current agent context for path resolution.
### What Step 5 should NOT do
- Don't touch the Step 3/4 methods on `AppConfig` — they stay
until Steps 8+ caller migration.
- Don't try to migrate `update`, `setup_model`, `load_functions`,
or `load_mcp_servers` — each has a specific later-step home.
- Don't touch the `bridge.rs` conversions — still needed.
- Don't touch `paths.rs` — still complete.
- Don't migrate any caller of any method yet — callers stay on
`Config` through the bridge window.
### Files to re-read at the start of Step 5
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 5 section has
the full request-read method table
- This notes file — specifically "Deviations from plan" and
"What Step 5 should watch for"
- `src/config/request_context.rs` — to see the current shape
that Step 5 will extend
- Current `Config` method bodies in `src/config/mod.rs` for
each Step 5 target (search for `pub fn state`, `pub fn
messages_file`, etc.)
## Follow-up (not blocking Step 5)
### 1. `load_envs` is the biggest duplication so far
At ~140 lines, `load_envs` is the largest single duplication in
the bridge. It's acceptable because it's self-contained and
auto-deletes in Step 10, but it's worth flagging that if Phase 1
stalls anywhere between now and Step 10, this method's duplication
becomes a maintenance burden. Env var changes would need to be
made twice.
**Mitigation during the bridge window:** if someone adds a new
env var during Steps 5-9, they MUST add it to both
`Config::load_envs` and `AppConfig::load_envs`. Document this in
the Step 5 notes if any env var changes ship during that
interval.
### 2. `AppConfig` now has 11 methods across 2 `impl` blocks
Fine during Phase 1. Post-Phase 1 cleanup can consider whether to
merge them or keep the read/write split. Not a blocker.
### 3. The `read_env_value` / `read_env_bool` helpers are accessed via `super::`
These are private module helpers in `src/config/mod.rs`. Step 4's
migration means `app_config.rs` now calls them via `super::`,
which works because `app_config.rs` is a sibling module. If
Phase 2+ work moves these helpers anywhere else, the `super::`
references in `app_config.rs` will need updating.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Step 3 notes: `docs/implementation/PHASE-1-STEP-3-NOTES.md`
(for the duplication rationale)
- Modified file: `src/config/app_config.rs` (new imports + new
`impl AppConfig` block with 4 write methods)
- Unchanged but referenced: `src/config/mod.rs` (original
`Config` methods still exist, private helpers
`read_env_value` / `read_env_bool` accessed via `super::`)
+413
View File
@@ -0,0 +1,413 @@
# Phase 1 Step 5 — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 5: Migrate request-read methods to RequestContext"
## Summary
Added 13 of 15 planned request-read methods to `RequestContext`
as inherent methods, duplicating the bodies that still exist on
`Config`. The other 2 methods (`info`, `session_info`) were
deferred to Step 7 because they mix runtime reads with calls into
`AppConfig`-scoped helpers (`sysinfo`, `render_options`) or depend
on `sysinfo` which itself touches both serialized and runtime
state.
Same duplication pattern as Steps 3 and 4: callers stay on
`Config` during the bridge window; real caller migration happens
organically in Steps 8-9.
## What was changed
### Modified files
- **`src/config/request_context.rs`** — extended the imports
with 11 new symbols from `super` (parent module constants,
`StateFlags`, `RoleLike`, `paths`) plus `anyhow`, `env`,
`PathBuf`, `get_env_name`, and `list_file_names`. Added a new
`impl RequestContext` block with 13 methods under
`#[allow(dead_code)]`:
**Path helpers** (4):
- `messages_file(&self) -> PathBuf` — agent-aware path to
the messages log
- `sessions_dir(&self) -> PathBuf` — agent-aware sessions
directory
- `session_file(&self, name) -> PathBuf` — combines
`sessions_dir` with a session name
- `rag_file(&self, name) -> PathBuf` — agent-aware RAG file
path
**State query** (1):
- `state(&self) -> StateFlags` — returns bitflags for which
scopes are currently active
**Scope info getters** (4):
- `role_info(&self) -> Result<String>` — exports the current
role (from session or standalone)
- `agent_info(&self) -> Result<String>` — exports the current
agent
- `agent_banner(&self) -> Result<String>` — returns the
agent's conversation starter banner
- `rag_info(&self) -> Result<String>` — exports the current
RAG
**Session listings** (2):
- `list_sessions(&self) -> Vec<String>`
- `list_autoname_sessions(&self) -> Vec<String>`
**Misc** (2):
- `is_compressing_session(&self) -> bool`
- `role_like_mut(&mut self) -> Option<&mut dyn RoleLike>` —
returns the currently-active `RoleLike` (session > agent >
role), the foundation for Step 7's `set_*` methods
All bodies are copy-pasted verbatim from the originals on
`Config`, with the following minor adjustments for the new
module location:
- Constants like `MESSAGES_FILE_NAME`, `AGENTS_DIR_NAME`,
`SESSIONS_DIR_NAME` imported from `super::`
- `paths::` calls unchanged (already in the right module from
Step 2)
- `list_file_names` imported from `crate::utils::*` → made
explicit
- `get_env_name` imported from `crate::utils::*` → made
explicit
### Unchanged files
- **`src/config/mod.rs`** — the original `Config` versions of
all 13 methods are deliberately left intact. They continue to
work for every existing caller. They get deleted in Step 10
when `Config` is removed entirely.
- **All external callers** of `config.messages_file()`,
`config.state()`, etc. — also unchanged.
## Key decisions
### 1. Only 13 of 15 methods migrated
The plan's Step 5 table listed 15 methods. After reading each
body, I classified them:
| Method | Classification | Action |
|---|---|---|
| `state` | Pure runtime-read | **Migrated** |
| `messages_file` | Pure runtime-read | **Migrated** |
| `sessions_dir` | Pure runtime-read | **Migrated** |
| `session_file` | Pure runtime-read | **Migrated** |
| `rag_file` | Pure runtime-read | **Migrated** |
| `role_info` | Pure runtime-read | **Migrated** |
| `agent_info` | Pure runtime-read | **Migrated** |
| `agent_banner` | Pure runtime-read | **Migrated** |
| `rag_info` | Pure runtime-read | **Migrated** |
| `list_sessions` | Pure runtime-read | **Migrated** |
| `list_autoname_sessions` | Pure runtime-read | **Migrated** |
| `is_compressing_session` | Pure runtime-read | **Migrated** |
| `role_like_mut` | Pure runtime-read (returns `&mut dyn RoleLike`) | **Migrated** |
| `info` | Delegates to `sysinfo` (mixed) | **Deferred to Step 7** |
| `session_info` | Calls `render_options` (AppConfig) + runtime | **Deferred to Step 7** |
See "Deviations from plan" for detail.
### 2. Same duplication pattern as Steps 3 and 4
Callers hold `Config`, not `RequestContext`. Same constraints
apply:
- Giving callers a `RequestContext` requires either: (a) a
sync'd `Arc<RequestContext>` field on `Config` — breaks
because per-request state mutates constantly, (b) cloning on
every call — expensive, or (c) duplicating method bodies.
- Option (c) is the same choice Steps 3 and 4 made.
- The duplication is 13 methods (~170 lines total) that
auto-delete in Step 10.
### 3. `role_like_mut` is particularly important for Step 7
I want to flag this one: `role_like_mut(&mut self)` is the
foundation for every `set_*` method in Step 7 (`set_temperature`,
`set_top_p`, `set_model`, etc.). Those methods all follow the
pattern:
```rust
fn set_something(&mut self, value: Option<T>) {
if let Some(role_like) = self.role_like_mut() {
role_like.set_something(value);
} else {
self.something = value;
}
}
```
The `else` branch (fallback to global) is the "mixed" part that
makes them Step 7 targets. The `if` branch is pure runtime write
— it mutates whichever `RoleLike` is on top.
By migrating `role_like_mut` to `RequestContext` in Step 5, Step
7 can build its new `set_*` methods as `(&mut RequestContext,
&mut AppConfig, value)` signatures where the runtime path uses
`ctx.role_like_mut()` directly. The prerequisite is now in place.
### 4. Path helpers stay on `RequestContext`, not `AppConfig`
`messages_file`, `sessions_dir`, `session_file`, and `rag_file`
all read `self.agent` to decide between global and agent-scoped
paths. `self.agent` is a runtime field (per-request). Even
though the returned paths themselves are computed from `paths::`
functions (no per-request state involved), **the decision of
which path to return depends on runtime state**. So these
methods belong on `RequestContext`, not `AppConfig` or `paths`.
This is the correct split — `paths::` is the "pure path
computation" layer, `RequestContext::messages_file` etc. are
the "which path applies to this request" layer on top.
### 5. `state`, `info`-style methods do not take `&self.app`
None of the 13 migrated methods reference `self.app` (the
`Arc<AppState>`) or any field on `AppConfig`. This is the
cleanest possible split — they're pure runtime-reads. If they
needed both runtime state and `AppConfig`, they'd be mixed (like
`info` and `session_info`, which is why those are deferred).
## Deviations from plan
### `info` deferred to Step 7
The plan lists `info` as a Step 5 target. Reading its body:
```rust
pub fn info(&self) -> Result<String> {
if let Some(agent) = &self.agent {
// ... agent export with session ...
} else if let Some(session) = &self.session {
session.export()
} else if let Some(role) = &self.role {
Ok(role.export())
} else if let Some(rag) = &self.rag {
rag.export()
} else {
self.sysinfo() // ← falls through to sysinfo
}
}
```
The fallback `self.sysinfo()` call is the problem. `sysinfo()`
(lines 571-644 in `src/config/mod.rs`) reads BOTH serialized
fields (`wrap`, `rag_reranker_model`, `rag_top_k`,
`save_session`, `compression_threshold`, `dry_run`,
`function_calling_support`, `mcp_server_support`, `stream`,
`save`, `keybindings`, `wrap_code`, `highlight`, `theme`) AND
runtime fields (`self.rag`, `self.extract_role()` which reads
`self.session`, `self.agent`, `self.role`, `self.model`, etc.).
`sysinfo` is a mixed method in the Step 7 sense — it needs both
`AppConfig` (for the serialized half) and `RequestContext` (for
the runtime half). The plan's Step 7 mixed-method list includes
`sysinfo` explicitly.
Since `info` delegates to `sysinfo` in one of its branches,
migrating `info` without `sysinfo` would leave that branch
broken. **Action taken:** left both `Config::info` and
`Config::sysinfo` intact. Step 7 picks them up as a pair.
### `session_info` deferred to Step 7
The plan lists `session_info` as a Step 5 target. Reading its
body:
```rust
pub fn session_info(&self) -> Result<String> {
if let Some(session) = &self.session {
let render_options = self.render_options()?; // ← AppConfig method
let mut markdown_render = MarkdownRender::init(render_options)?;
// ... reads self.agent for agent_info tuple ...
session.render(&mut markdown_render, &agent_info)
} else {
bail!("No session")
}
}
```
It calls `self.render_options()` which is a Step 3 method now
on `AppConfig`. In the bridge world, the caller holds a
`Config` and can call `config.render_options()` (old) or
`config.to_app_config().render_options()` (new but cloning).
In the post-bridge world with `RequestContext`, the call becomes
`ctx.app.config.render_options()`.
Since `session_info` crosses the `AppConfig` / `RequestContext`
boundary, it's mixed by the Step 7 definition. **Action taken:**
left `Config::session_info` intact. Step 7 picks it up with a
signature like
`(&self, app: &AppConfig) -> Result<String>` or
`(ctx: &RequestContext) -> Result<String>` where
`ctx.app.config.render_options()` is called internally.
### Step 5 count: 13 methods, not 15
Documented here so Step 7's scope is explicit. Step 7 picks up
`info`, `session_info`, `sysinfo`, plus the `set_*` methods and
other items from the original Step 7 list.
## Verification
### Compilation
- `cargo check` — clean, **zero warnings, zero errors**
- `cargo clippy` — clean
### Tests
- `cargo test` — **63 passed, 0 failed** (unchanged from
Steps 1–4)
Step 5 added no new tests because it's duplication. Existing
tests confirm:
- The original `Config` methods still work
- `RequestContext` still compiles, imports are clean
- The bridge's round-trip test still passes
### Manual smoke test
Not applicable — no runtime behavior changed.
## Handoff to next step
### What Step 6 can rely on
Step 6 (migrate request-write methods to `RequestContext`) can
rely on:
- `RequestContext` now has 13 inherent read methods
- The `#[allow(dead_code)]` on the read-methods `impl` block is
safe to leave; callers migrate in Steps 8+
- `Config` is unchanged for all 13 methods
- `role_like_mut` is available on `RequestContext` — Step 7
will use it, and Step 6 might also use it internally when
implementing write methods like `set_save_session_this_time`
- The bridge from Step 1, `paths` module from Step 2,
`AppConfig` methods from Steps 3 and 4 are all unchanged
- **`Config::info`, `session_info`, and `sysinfo` are still on
`Config`** and must stay there through Step 6. They're
Step 7 targets.
- **`Config::update`, `setup_model`, `load_functions`,
`load_mcp_servers`, and all `set_*` methods** are also still
on `Config` and stay there through Step 6.
### What Step 6 should watch for
- **Step 6 targets are request-write methods** — methods that
mutate the runtime state on `Config` (session, role, agent,
rag). The plan's Step 6 target list includes:
`use_prompt`, `use_role` / `use_role_obj`, `exit_role`,
`edit_role`, `use_session`, `exit_session`, `save_session`,
`empty_session`, `set_save_session_this_time`,
`compress_session` / `maybe_compress_session`,
`autoname_session` / `maybe_autoname_session`,
`use_rag` / `exit_rag` / `edit_rag_docs` / `rebuild_rag`,
`use_agent` / `exit_agent` / `exit_agent_session`,
`apply_prelude`, `before_chat_completion`,
`after_chat_completion`, `discontinuous_last_message`,
`init_agent_shared_variables`,
`init_agent_session_variables`.
- **Many will be mixed.** Expect to defer several to Step 7.
In particular, anything that reads `self.functions`,
`self.mcp_registry`, or calls `set_*` methods crosses the
boundary. Read each method carefully before migrating.
- **`maybe_compress_session` and `maybe_autoname_session`** take
`GlobalConfig` (not `&mut self`) and spawn background tasks
internally. Their signature in Step 6 will need
reconsideration — they don't fit cleanly in a
`RequestContext` method because they're already designed to
work with a shared lock.
- **`use_session_safely`, `use_role_safely`** also take
`GlobalConfig`. They do the `take()`/`replace()` dance with
the shared lock. Again, these don't fit the
`&mut RequestContext` pattern cleanly; plan to defer them.
- **`compress_session` and `autoname_session` are async.** They
call into the LLM. Their signature on `RequestContext` will
still be async.
- **`apply_prelude`** is tricky — it may activate a role/agent/
session from config strings like `"role:explain"` or
`"session:temp"`. It calls `use_role`, `use_session`, etc.
internally. If those get migrated, `apply_prelude` migrates
too. If any stay on `Config`, `apply_prelude` stays with them.
- **`discontinuous_last_message`** just clears `self.last_message`.
Pure runtime-write, trivial to migrate.
### What Step 6 should NOT do
- Don't touch the Step 3, 4, 5 methods on `AppConfig` /
`RequestContext` — they stay until Steps 8+ caller migration.
- Don't migrate any `set_*` method, `info`, `session_info`,
`sysinfo`, `update`, `setup_model`, `load_functions`,
`load_mcp_servers`, or the `use_session_safely` /
`use_role_safely` family unless you verify they're pure
runtime-writes — most aren't, and they're Step 7 targets.
- Don't migrate callers of any method yet. Callers stay on
`Config` through the bridge window.
### Files to re-read at the start of Step 6
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 6 section
- This notes file — specifically "What Step 6 should watch for"
- `src/config/request_context.rs` — current shape with Step 5
reads
- Current `Config` method bodies in `src/config/mod.rs` for
each Step 6 target
## Follow-up (not blocking Step 6)
### 1. `RequestContext` now has ~200 lines beyond struct definition
Between Step 0's `new()` constructor and Step 5's 13 read
methods, `request_context.rs` has grown to ~230 lines. Still
manageable. Step 6 will add more. Post-Phase 1 cleanup can
reorganize into multiple `impl` blocks grouped by concern
(reads/writes/lifecycle) or into separate files if the file
grows unwieldy.
### 2. Duplication count at end of Step 5
Running tally of methods duplicated between `Config` and the
new types during the bridge window:
- `AppConfig` (Steps 3+4): 11 methods
- `RequestContext` (Step 5): 13 methods
- `paths::` module (Step 2): 33 free functions (not duplicated —
`Config` forwarders were deleted in Step 2)
**Total bridge-window duplication: 24 methods / ~370 lines.**
All auto-delete in Step 10. Maintenance burden is "any bug fix
in a migrated method during Steps 6-9 must be applied twice."
Document this in whatever PR shepherds Steps 6-9.
### 3. The `impl` block structure in `RequestContext` is growing
Now has 2 `impl RequestContext` blocks:
1. `new()` constructor (Step 0)
2. 13 read methods (Step 5)
Step 6 will likely add a third block for writes. That's fine
during the bridge window; cleanup can consolidate later.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Step 4 notes: `docs/implementation/PHASE-1-STEP-4-NOTES.md`
(for the duplication rationale)
- Modified file: `src/config/request_context.rs` (new imports
+ new `impl RequestContext` block with 13 read methods)
- Unchanged but referenced: `src/config/mod.rs` (original
`Config` methods still exist, private constants
`MESSAGES_FILE_NAME` / `AGENTS_DIR_NAME` /
`SESSIONS_DIR_NAME` accessed via `super::`)
+405
View File
@@ -0,0 +1,405 @@
# Phase 1 Step 6 — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 6: Migrate request-write methods to RequestContext"
## Summary
Added 12 of 27 planned request-write methods to `RequestContext`
as inherent methods, duplicating the bodies that still exist on
`Config`. The other 15 methods were deferred: some to Step 6.5
(because they touch `self.functions` and `self.mcp_registry`
runtime fields being restructured by the `ToolScope` / `McpFactory`
rework), some to Step 7 (because they cross the `AppConfig` /
`RequestContext` boundary or call into `set_*` mixed methods),
and some because their `GlobalConfig`-based static signatures
don't fit the `&mut RequestContext` pattern at all.
This step has the highest deferral ratio of the bridge phases
so far (12/27 ≈ 44% migrated). That's by design — Step 6 is
where the plan hits the bulk of the interesting refactoring
territory, and it's where the `ToolScope` / `AgentRuntime`
unification in Step 6.5 makes a big difference in what's
migrateable.
## What was changed
### Modified files
- **`src/config/request_context.rs`** — added 1 new import
(`Input` from `super::`) and a new `impl RequestContext` block
with 12 methods under `#[allow(dead_code)]`:
**Role lifecycle (2):**
- `use_role_obj(&mut self, role) -> Result<()>` — sets the
role on the current session, or on `self.role` if no session
is active; errors if an agent is active
- `exit_role(&mut self) -> Result<()>` — clears the role from
session or from `self.role`
**Session lifecycle (5):**
- `exit_session(&mut self) -> Result<()>` — saves session on
exit and clears `self.session`
- `save_session(&mut self, name) -> Result<()>` — persists
the current session, optionally renaming
- `empty_session(&mut self) -> Result<()>` — clears messages
in the active session
- `set_save_session_this_time(&mut self) -> Result<()>` — sets
the session's one-shot save flag
- `exit_agent_session(&mut self) -> Result<()>` — exits the
agent's session without exiting the agent
**RAG lifecycle (1):**
- `exit_rag(&mut self) -> Result<()>` — drops `self.rag`
**Chat lifecycle (2):**
- `before_chat_completion(&mut self, input) -> Result<()>` —
stores the input as `last_message` with empty output
- `discontinuous_last_message(&mut self)` — clears the
continuous flag on the last message
**Agent variable init (2):**
- `init_agent_shared_variables(&mut self) -> Result<()>` —
prompts for agent variables on first activation
- `init_agent_session_variables(&mut self, new_session) -> Result<()>` —
syncs agent variables into/from session on new or resumed
session
All bodies are copy-pasted verbatim from `Config` with no
modifications — every one of these methods only touches
fields that already exist on `RequestContext` with the same
names and types.
### Unchanged files
- **`src/config/mod.rs`** — all 27 original `Config` methods
(including the 15 deferred ones) are deliberately left intact.
They continue to work for every existing caller.
## Key decisions
### 1. Only 12 of 27 methods migrated
The plan's Step 6 table listed ~20 methods, but when I scanned
for `fn (use_prompt|use_role|use_role_obj|...)` I found 27
(several methods have paired variants: `compress_session` +
`maybe_compress_session`, `autoname_session` +
`maybe_autoname_session`, `use_role_safely` vs `use_role`). Of
those 27, **12 are pure runtime-writes that migrated cleanly**
and **15 are deferred** to later steps. Full breakdown below.
### 2. Same duplication pattern as Steps 3-5
Callers hold `Config`, not `RequestContext`. Duplication is
strictly additive during the bridge window and auto-deletes in
Step 10.
### 3. Identified three distinct deferral categories
The 15 deferred methods fall into three categories, each with
a different resolution step:
**Category A: Touch `self.functions` or `self.mcp_registry`**
(resolved in Step 6.5 when `ToolScope` / `McpFactory` replace
those fields):
- `use_role` (async, reinits MCP registry for role's servers)
- `use_session` (async, reinits MCP registry for session's
servers)
**Category B: Call into Step 7 mixed methods** (resolved in
Step 7):
- `use_prompt` (calls `self.current_model()`)
- `edit_role` (calls `self.editor()` + `self.use_role()`)
- `after_chat_completion` (calls private `save_message` which
touches `self.save`, `self.session`, `self.agent`, etc.)
**Category C: Static async methods taking `&GlobalConfig` that
don't fit the `&mut RequestContext` pattern at all** (resolved
in Step 8 or a dedicated lifecycle-refactor step):
- `maybe_compress_session` — takes owned `GlobalConfig`, spawns
tokio task
- `compress_session` — async, takes `&GlobalConfig`
- `maybe_autoname_session` — takes owned `GlobalConfig`, spawns
tokio task
- `autoname_session` — async, takes `&GlobalConfig`
- `use_rag` — async, takes `&GlobalConfig`, calls `Rag::init` /
`Rag::load` which expect `&GlobalConfig`
- `edit_rag_docs` — async, takes `&GlobalConfig`, calls into
`Rag::refresh_document_paths` which expects `&GlobalConfig`
- `rebuild_rag` — same as `edit_rag_docs`
- `use_agent` — async, takes `&GlobalConfig`, mutates multiple
fields under the same write lock, calls
`Config::use_session_safely`
- `apply_prelude` — async, calls `self.use_role()` /
`self.use_session()` which are Category A
- `exit_agent` — calls `self.load_functions()` which writes
`self.functions` (runtime, restructured in Step 6.5)
### 4. `exit_agent_session` migrated despite calling other methods
`exit_agent_session` calls `self.exit_session()` and
`self.init_agent_shared_variables()`. Since both of those are
also being migrated in Step 6, `exit_agent_session` can
migrate cleanly and call the new `RequestContext::exit_session`
and `RequestContext::init_agent_shared_variables` on its own
struct.
### 5. `exit_session` works because Step 5 migrated `sessions_dir`
`exit_session` calls `self.sessions_dir()` which is now a
`RequestContext` method (Step 5). Similarly, `save_session`
calls `self.session_file()` (Step 5) and reads
`self.working_mode` (a `RequestContext` field). This
demonstrates how Steps 5 and 6 layer correctly — Step 5's
reads enable Step 6's writes.
### 6. Agent variable init is pure runtime
`init_agent_shared_variables` and `init_agent_session_variables`
look complex (they call `Agent::init_agent_variables` which
can prompt interactively) but they only touch `self.agent`,
`self.agent_variables`, `self.info_flag`, and `self.session`
all runtime fields that exist on `RequestContext`.
`Agent::init_agent_variables` itself is a static associated
function on `Agent` that takes `defined_variables`,
`existing_variables`, and `info_flag` as parameters — no
`&Config` dependency. Clean migration.
## Deviations from plan
### 15 methods deferred
Summary table of every method in the Step 6 target list:
| Method | Status | Reason |
|---|---|---|
| `use_prompt` | **Step 7** | Calls `current_model()` (mixed) |
| `use_role` | **Step 6.5** | Touches `functions`, `mcp_registry` |
| `use_role_obj` | ✅ Migrated | Pure runtime-write |
| `exit_role` | ✅ Migrated | Pure runtime-write |
| `edit_role` | **Step 7** | Calls `editor()` + `use_role()` |
| `use_session` | **Step 6.5** | Touches `functions`, `mcp_registry` |
| `exit_session` | ✅ Migrated | Pure runtime-write (uses Step 5 `sessions_dir`) |
| `save_session` | ✅ Migrated | Pure runtime-write (uses Step 5 `session_file`) |
| `empty_session` | ✅ Migrated | Pure runtime-write |
| `set_save_session_this_time` | ✅ Migrated | Pure runtime-write |
| `maybe_compress_session` | **Step 7/8** | `GlobalConfig` + spawns task + `light_theme()` |
| `compress_session` | **Step 7/8** | `&GlobalConfig`, complex LLM workflow |
| `maybe_autoname_session` | **Step 7/8** | `GlobalConfig` + spawns task + `light_theme()` |
| `autoname_session` | **Step 7/8** | `&GlobalConfig`, calls `retrieve_role` + LLM |
| `use_rag` | **Step 7/8** | `&GlobalConfig`, calls `Rag::init`/`Rag::load` |
| `edit_rag_docs` | **Step 7/8** | `&GlobalConfig`, calls `editor()` + Rag refresh |
| `rebuild_rag` | **Step 7/8** | `&GlobalConfig`, Rag refresh |
| `exit_rag` | ✅ Migrated | Trivial (drops `self.rag`) |
| `use_agent` | **Step 7/8** | `&GlobalConfig`, complex multi-field mutation |
| `exit_agent` | **Step 6.5** | Calls `load_functions()` which writes `functions` |
| `exit_agent_session` | ✅ Migrated | Composes migrated methods |
| `apply_prelude` | **Step 7/8** | Calls `use_role` / `use_session` (deferred) |
| `before_chat_completion` | ✅ Migrated | Pure runtime-write |
| `after_chat_completion` | **Step 7** | Calls `save_message` (mixed) |
| `discontinuous_last_message` | ✅ Migrated | Pure runtime-write |
| `init_agent_shared_variables` | ✅ Migrated | Pure runtime-write |
| `init_agent_session_variables` | ✅ Migrated | Pure runtime-write |
**Step 6 total: 12 migrated, 15 deferred.**
### Step 6's deferral load redistributes to later steps
Running tally of deferrals after Step 6:
- **Step 6.5 targets:** `use_role`, `use_session`, `exit_agent`
(3 methods). These must be migrated alongside the
`ToolScope` / `McpFactory` rework because they reinit or
inspect the MCP registry.
- **Step 7 targets:** `use_prompt`, `edit_role`,
`after_chat_completion`, `select_functions`,
`select_enabled_functions`, `select_enabled_mcp_servers`
(from Step 3), `setup_model`, `update` (from Step 4),
`info`, `session_info`, `sysinfo` (from Step 5),
**plus** the original Step 7 mixed-method list:
`current_model`, `extract_role`, `set_temperature`,
`set_top_p`, `set_enabled_tools`, `set_enabled_mcp_servers`,
`set_save_session`, `set_compression_threshold`,
`set_rag_reranker_model`, `set_rag_top_k`,
`set_max_output_tokens`, `set_model`, `retrieve_role`,
`use_role_safely`, `use_session_safely`, `save_message`,
`render_prompt_left`, `render_prompt_right`,
`generate_prompt_context`, `repl_complete`. This is a big
step.
- **Step 7/8 targets (lifecycle refactor):** Session
compression and autonaming tasks, RAG lifecycle methods,
`use_agent`, `apply_prelude`. These may want their own
dedicated step if the Step 7 list gets too long.
## Verification
### Compilation
- `cargo check` — clean, **zero warnings, zero errors**
- `cargo clippy` — clean
### Tests
- `cargo test` — **63 passed, 0 failed** (unchanged from
Steps 1–5)
Step 6 added no new tests — duplication pattern. Existing
tests confirm nothing regressed.
### Manual smoke test
Not applicable — no runtime behavior changed. CLI and REPL
still call `Config::use_role_obj()`, `exit_session()`, etc.
as before.
## Handoff to next step
### What Step 6.5 can rely on
Step 6.5 (unify `ToolScope` / `AgentRuntime` / `McpFactory` /
`RagCache`) can rely on:
- `RequestContext` now has **25 inherent methods** across all
impl blocks (1 constructor + 13 reads from Step 5 + 12
writes from Step 6)
- `role_like_mut` is available (Step 5) — foundation for
Step 7's `set_*` methods
- `exit_session`, `save_session`, `empty_session`,
`exit_agent_session`, `init_agent_shared_variables`,
`init_agent_session_variables` are all on `RequestContext` —
the `use_role`, `use_session`, and `exit_agent` migrations
in Step 6.5 can call these directly on the new context type
- `before_chat_completion`, `discontinuous_last_message`, etc.
are also on `RequestContext` — available for the new
`RequestContext` versions of deferred methods
- `Config::use_role`, `Config::use_session`, `Config::exit_agent`
are **still on `Config`** and must be handled by Step 6.5's
`ToolScope` refactoring because they touch `self.functions`
and `self.mcp_registry`
- The bridge from Step 1, `paths` module from Step 2, Steps
3-5 new methods, and all previous deferrals are unchanged
### What Step 6.5 should watch for
- **Step 6.5 is the big architecture step.** It replaces:
- `Config.functions: Functions` with
`RequestContext.tool_scope: ToolScope` (containing
`functions`, `mcp_runtime`, `tool_tracker`)
- `Config.mcp_registry: Option<McpRegistry>` with
`AppState.mcp_factory: Arc<McpFactory>` (pool) +
`ToolScope.mcp_runtime: McpRuntime` (per-scope handles)
- Agent-scoped supervisor/inbox/todo into
`RequestContext.agent_runtime: Option<AgentRuntime>`
- Agent RAG into a shared `AppState.rag_cache: Arc<RagCache>`
- **Once `ToolScope` exists**, Step 6.5 can migrate `use_role`
and `use_session` by replacing the `self.functions.clear_*` /
`McpRegistry::reinit` dance with
`self.tool_scope = app.mcp_factory.build_tool_scope(...)`.
- **`exit_agent` calls `self.load_functions()`** which reloads
the global tools. In the new design, exiting an agent should
rebuild the `tool_scope` for the now-topmost `RoleLike`. The
plan's Step 6.5 describes this exact transition.
- **Phase 5 adds the idle pool to `McpFactory`.** Step 6.5
ships the no-pool version: `acquire()` always spawns fresh,
`Drop` always tears down. Correct but not optimized.
- **`RagCache` serves both standalone and agent RAGs.** Step
6.5 needs to route `use_rag` (deferred) and agent activation
through the cache. Since `use_rag` is a Category C deferral
(takes `&GlobalConfig`), Step 6.5 may not touch it — it may
need to wait for Step 8.
### What Step 6.5 should NOT do
- Don't touch the 25 methods already on `RequestContext` — they
stay until Steps 8+ caller migration.
- Don't touch the `AppConfig` methods from Steps 3-4.
- Don't migrate the Step 7 targets unless they become
unblocked by the `ToolScope` / `AgentRuntime` refactor.
- Don't try to build the `McpFactory` idle pool — that's
Phase 5.
### Files to re-read at the start of Step 6.5
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 6.5 section
(the biggest single section, ~90 lines)
- `docs/REST-API-ARCHITECTURE.md` — section 5 (Tool Scope
Isolation) has the full design for `ToolScope`, `McpRuntime`,
`McpFactory`, `RagCache`, `AgentRuntime`
- This notes file — specifically "Category A" deferrals
(`use_role`, `use_session`, `exit_agent`)
- `src/config/mod.rs` — current `Config::use_role`,
`Config::use_session`, `Config::exit_agent` bodies to see
the MCP/functions handling that needs replacing
## Follow-up (not blocking Step 6.5)
### 1. `save_message` is private and heavy
`after_chat_completion` was deferred because it calls the
private `save_message` method, which is ~50 lines of logic
touching `self.save` (serialized), `self.session` (runtime),
`self.agent` (runtime), and the messages file (via
`self.messages_file()` which is on `RequestContext`). Step 7
should migrate `save_message` first, then
`after_chat_completion` can follow.
### 2. `Config::use_session_safely` and `use_role_safely` are a pattern to replace
Both methods do `take(&mut *guard)` on the `GlobalConfig` then
call the instance method on the taken `Config`, then put it
back. This pattern exists because `use_role` and `use_session`
are `&mut self` methods that need to await across the call,
and the `RwLock` can't be held across `.await`.
When `use_role` and `use_session` move to `RequestContext` in
Step 6.5, the `_safely` wrappers can be eliminated entirely —
the caller just takes `&mut RequestContext` directly. Flag
this as a cleanup opportunity for Step 8.
### 3. `RequestContext` is now ~400 lines
Counting imports, struct definition, and 3 `impl` blocks:
```
use statements: ~20 lines
struct definition: ~30 lines
impl 1 (new): ~25 lines
impl 2 (reads, Step 5): ~155 lines
impl 3 (writes, Step 6): ~160 lines
Total: ~390 lines
```
Still manageable. Step 6.5 will add `tool_scope` and
`agent_runtime` fields plus their methods, pushing toward
~500 lines. Post-Phase 1 cleanup should probably split into
separate files (`reads.rs`, `writes.rs`, `tool_scope.rs`,
`agent_runtime.rs`) but that's optional.
### 4. Bridge-window duplication count at end of Step 6
Running tally:
- `AppConfig` (Steps 3+4): 11 methods
- `RequestContext` (Steps 5+6): 25 methods
- `paths` module (Step 2): 33 free functions (not duplicated)
**Total bridge-window duplication: 36 methods / ~550 lines.**
All auto-delete in Step 10.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Architecture doc: `docs/REST-API-ARCHITECTURE.md`
- Step 5 notes: `docs/implementation/PHASE-1-STEP-5-NOTES.md`
- Modified file: `src/config/request_context.rs` (new
`impl RequestContext` block with 12 write methods, plus
`Input` import)
- Unchanged but referenced: `src/config/mod.rs` (original
`Config` methods still exist for all 27 targets)
@@ -0,0 +1,535 @@
# Phase 1 Step 6.5 — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 6.5: Unify tool/MCP fields into `ToolScope` and
agent fields into `AgentRuntime`"
## Summary
Step 6.5 is the "big architecture step." The plan describes it as
a semantic rewrite of scope transitions (`use_role`, `use_session`,
`use_agent`, `exit_*`) to build and swap `ToolScope` instances via
a new `McpFactory`, plus an `AgentRuntime` collapse for agent-
specific state, and a unified `RagCache` on `AppState`.
**This implementation deviates from the plan.** Rather than doing
the full semantic rewrite, Step 6.5 ships **scaffolding only**:
- New types (`ToolScope`, `McpRuntime`, `McpFactory`, `McpServerKey`,
`RagCache`, `RagKey`, `AgentRuntime`) exist and compile
- New fields on `AppState` (`mcp_factory`, `rag_cache`) and
`RequestContext` (`tool_scope`, `agent_runtime`) coexist with
the existing flat fields
- The `Config::to_request_context` bridge populates the new
sub-struct fields with defaults; real values flow through the
existing flat fields during the bridge window
- **No scope transitions are rewritten**; `Config::use_role`,
`Config::use_session`, `Config::use_agent`, `Config::exit_agent`
stay on `Config` and continue working with the old
`McpRegistry` / `Functions` machinery
The semantic rewrite is **deferred to Step 8** when the entry
points (`main.rs`, `repl/mod.rs`) get rewritten to thread
`RequestContext` through the pipeline. That's the natural point
to switch from `Config::use_role` to
`RequestContext::use_role_with_tool_scope`-style methods, because
the callers will already be holding the right instance type.
See "Deviations from plan" for the full rationale.
## What was changed
### New files
Four new modules under `src/config/`, all with module docstrings
explaining their scaffolding status and load-bearing references
to the architecture + phase plan docs:
- **`src/config/tool_scope.rs`** (~75 lines)
- `ToolScope` struct: `functions`, `mcp_runtime`, `tool_tracker`
with `Default` impl
- `McpRuntime` struct: wraps a
`HashMap<String, Arc<ConnectedServer>>` (reuses the existing
rmcp `RunningService` type)
- Basic accessors: `is_empty`, `insert`, `get`, `server_names`
- No `build_from_enabled_list` or similar; that's Step 8
- **`src/config/mcp_factory.rs`** (~90 lines)
- `McpServerKey` struct: `name` + `command` + sorted `args` +
sorted `env` (so identically-configured servers hash to the
same key and share an `Arc`, while differently-configured
ones get independent processes — the sharing-vs-isolation
invariant from architecture doc section 5)
- `McpFactory` struct:
`Mutex<HashMap<McpServerKey, Weak<ConnectedServer>>>` for
future sharing
- Basic accessors: `active_count`, `try_get_active`,
`insert_active`
- **No `acquire()` that actually spawns.** That would require
lifting the MCP server startup logic out of
`McpRegistry::init_server` into a factory method. Deferred
to Step 8 with the scope transition rewrites.
- **`src/config/rag_cache.rs`** (~90 lines)
- `RagKey` enum: `Named(String)` vs `Agent(String)` (distinct
namespaces)
- `RagCache` struct:
`RwLock<HashMap<RagKey, Weak<Rag>>>` with weak-ref sharing
- `try_get`, `insert`, `invalidate`, `entry_count`
- `load_with<F, Fut>()` — async helper that checks the cache,
calls a user-provided loader closure on miss, inserts the
result, and returns the `Arc`. Has a small race window
between `try_get` and `insert` (two concurrent misses will
both load); this is acceptable for Phase 1 per the
architecture doc's "concurrent first-load" note. Tightening
with a per-key `OnceCell` or `tokio::sync::Mutex` lands in
Phase 5.
- **`src/config/agent_runtime.rs`** (~95 lines)
- `AgentRuntime` struct with every field from the plan:
`rag`, `supervisor`, `inbox`, `escalation_queue`,
`todo_list: Option<TodoList>`, `self_agent_id`,
`parent_supervisor`, `current_depth`, `auto_continue_count`
- `new()` constructor that takes the required agent context
(id, supervisor, inbox, escalation queue) and initializes
optional fields to `None`/`0`
- `with_rag`, `with_todo_list`, `with_parent_supervisor`,
`with_depth` builder methods for Step 8's activation path
- **`todo_list` is `Option<TodoList>`** (opportunistic
tightening over today's `Config.agent.todo_list:
TodoList`): the field will be `Some(...)` only when
`spec.auto_continue == true`, saving an allocation for
agents that don't use the todo system
### Modified files
- **`src/mcp/mod.rs`** — changed `type ConnectedServer` from
private to `pub type ConnectedServer` so `tool_scope.rs` and
`mcp_factory.rs` can reference the type without reaching into
`rmcp` directly. One-keyword change (`type` → `pub type`).
- **`src/config/mod.rs`** — registered 4 new `mod` declarations
(`agent_runtime`, `mcp_factory`, `rag_cache`, `tool_scope`)
alphabetically in the module list. No `pub use` re-exports —
the types are used via their module paths by the parent
`config` crate's children.
- **`src/config/app_state.rs`** — added `mcp_factory:
Arc<McpFactory>` and `rag_cache: Arc<RagCache>` fields, plus
the corresponding imports. Updated the module docstring to
reflect the Step 6.5 additions and removed the old "TBD"
placeholder language about `McpFactory`.
- **`src/config/request_context.rs`** — added `tool_scope:
ToolScope` and `agent_runtime: Option<AgentRuntime>` fields
alongside the existing flat fields, plus imports. Updated
`RequestContext::new()` to initialize them with
`ToolScope::default()` and `None`. Rewrote the module
docstring to explain that flat and sub-struct fields coexist
during the bridge window.
- **`src/config/bridge.rs`** — updated
`Config::to_request_context` to initialize `tool_scope` with
`ToolScope::default()` and `agent_runtime` with `None` (the
bridge doesn't try to populate the sub-struct fields because
they're deferred scaffolding). Updated the three test
`AppState` constructors to pass `McpFactory::new()` and
`RagCache::new()` for the new required fields, plus added
imports for `McpFactory` and `RagCache` in the test module.
- **`Cargo.toml`** — no changes. `parking_lot` and the rmcp
dependencies were already present.
## Key decisions
### 1. **Scaffolding-only, not semantic rewrite**
This is the biggest decision in Step 6.5 and a deliberate
deviation from the plan. The plan says Step 6.5 should
"rewrite scope transitions" (item 5, page 373) to build and
swap `ToolScope` instances via `McpFactory::acquire()`.
**Why I did scaffolding only instead:**
- **Consistency with the bridge pattern.** Steps 3–6 all
followed the same shape: add new code alongside old, don't
migrate callers, let Step 8 do the real wiring. The bridge
pattern works because it keeps every intermediate state
green and testable. Doing the full Step 6.5 rewrite would
break that pattern.
- **Caller migration is a Step 8 concern.** The plan's Step
6.5 semantics assume callers hold a `RequestContext` and
can call `ctx.use_role(&app)` to rebuild `ctx.tool_scope`.
But during the bridge window, callers still hold
`GlobalConfig` / `&Config` and call `config.use_role(...)`.
Rewriting `use_role` to take `(&mut RequestContext,
&AppState)` would either:
1. Break every existing caller immediately (~20+ callsites),
forcing a partial Step 8 during Step 6.5, OR
2. Require a parallel `RequestContext::use_role_with_tool_scope`
method alongside `Config::use_role`, doubling the
duplication count for no benefit during the bridge
- **The plan's Step 6.5 risk note explicitly calls this out:**
*"Risk: Medium–high. This is where the Phase 1 refactor
stops being mechanical and starts having semantic
implications."* The scaffolding-only approach keeps Step 6.5
mechanical and pushes the semantic risk into Step 8 where it
can be handled alongside the entry point rewrite. That's a
better risk localization strategy.
- **The new types are still proven by construction.**
`Config::to_request_context` now builds `ToolScope::default()`
and `agent_runtime: None` on every call, and the bridge
round-trip test still passes. That proves the types compile,
have sensible defaults, and don't break the existing runtime
contract. Step 8 can then swap in real values without
worrying about type plumbing.
### 2. `McpFactory::acquire()` is not implemented
The plan says Step 6.5 ships a trivial `acquire()` that
"checks `active` for an upgradable `Weak`, otherwise spawns
fresh" and "`Drop`s tear down the subprocess directly."
I wrote the `Mutex<HashMap<McpServerKey, Weak<ConnectedServer>>>`
field and the `try_get_active` / `insert_active` building
blocks, but not an `acquire()` method. The reason is that
actually spawning an MCP subprocess requires lifting the
current spawning logic out of `McpRegistry::init_server` (in
`src/mcp/mod.rs`) — that's a ~60 line chunk of tokio child
process setup, rmcp handshake, and error handling that's
tightly coupled to `McpRegistry`. Extracting it as a factory
method is a meaningful refactor that belongs alongside the
Step 8 caller migration, not as orphaned scaffolding that
nobody calls.
The `try_get_active` and `insert_active` primitives are the
minimum needed for Step 8's `acquire()` implementation to be
a thin wrapper.
### 3. Sub-struct fields coexist with flat fields
`RequestContext` now has both:
- **Flat fields** (`functions`, `tool_call_tracker`,
`supervisor`, `inbox`, `root_escalation_queue`,
`self_agent_id`, `current_depth`, `parent_supervisor`) —
populated by `Config::to_request_context` during the bridge
- **Sub-struct fields** (`tool_scope: ToolScope`,
`agent_runtime: Option<AgentRuntime>`) — default-
initialized in `RequestContext::new()` and by the bridge;
real population happens in Step 8
This is deliberate scaffolding, not a refactor miss. The
module docstring explicitly explains this so a reviewer
doesn't try to "fix" the apparent duplication.
When Step 8 migrates `use_role` and friends to `RequestContext`,
those methods will populate `tool_scope` and `agent_runtime`
directly. The flat fields will become stale / unused during
Step 8 and get deleted alongside `Config` in Step 10.
### 4. `ConnectedServer` visibility bump
The minimum change to `src/mcp/mod.rs` was making
`type ConnectedServer` public (`pub type ConnectedServer`).
This lets `tool_scope.rs` and `mcp_factory.rs` reference the
live MCP handle type directly without either:
1. Reaching into `rmcp::service::RunningService<RoleClient, ()>`
from the config crate (tight coupling to rmcp)
2. Inventing a new `McpServerHandle` wrapper (premature
abstraction that would need to be unwrapped later)
The visibility change is bounded: `ConnectedServer` is only
used from within the `loki` crate, and `pub` here means
"visible to the whole crate" via Rust's module privacy, not
"part of Loki's external API."
### 5. `todo_list: Option<TodoList>` tightening
`AgentRuntime.todo_list: Option<TodoList>` (vs today's
`Agent.todo_list: TodoList` with `Default::default()` always
allocated). This is an opportunistic memory optimization
during the scaffolding phase: when Step 8 populates
`AgentRuntime`, it should allocate `Some(TodoList::default())`
only when `spec.auto_continue == true`. Agents without
auto-continue skip the allocation entirely.
This is documented in the `agent_runtime.rs` module docstring
so a reviewer doesn't try to "fix" the `Option` into a bare
`TodoList`.
## Deviations from plan
### Full plan vs this implementation
| Plan item | Status |
|---|---|
| Implement `McpRuntime` and `ToolScope` | ✅ Done (scaffolding) |
| Implement `McpFactory` — no pool, `acquire()` | ⚠️ **Partial** — types + accessors, no `acquire()` |
| Implement `RagCache` with `RagKey`, weak-ref sharing, per-key serialization | ✅ Done (scaffolding, no per-key serialization — Phase 5) |
| Implement `AgentRuntime` with `Option<TodoList>` and agent RAG | ✅ Done (scaffolding) |
| Rewrite scope transitions (`use_role`, `use_session`, `use_agent`, `exit_*`, `update`) | ❌ **Deferred to Step 8** |
| `use_rag` rewritten to use `RagCache` | ❌ **Deferred to Step 8** |
| Agent activation populates `AgentRuntime`, serves RAG from cache | ❌ **Deferred to Step 8** |
| `exit_agent` rebuilds parent's `ToolScope` | ❌ **Deferred to Step 8** |
| Sub-agent spawning constructs fresh `RequestContext` | ❌ **Deferred to Step 8** |
| Remove old `Agent::init` registry-mutation logic | ❌ **Deferred to Step 8** |
| `rebuild_rag` / `edit_rag_docs` use `rag_cache.invalidate` | ❌ **Deferred to Step 8** |
All the ❌ items are semantic rewrites that require caller
migration to take effect. Deferring them keeps Step 6.5
strictly additive and consistent with Steps 3–6. Step 8 will
do the semantic rewrite with the benefit of all the
scaffolding already in place.
### Impact on Step 7
Step 7 is unchanged. The mixed methods (including Steps 3–6
deferrals like `current_model`, `extract_role`, `sysinfo`,
`info`, `session_info`, `use_prompt`, etc.) still need to be
split into explicit `(&AppConfig, &RequestContext)` signatures
the same way the plan originally described. They don't depend
on the `ToolScope` / `McpFactory` rewrite being done.
### Impact on Step 8
Step 8 absorbs the full Step 6.5 semantic rewrite. The
original Step 8 scope was "rewrite entry points" — now it
also includes "rewrite scope transitions to use new types."
This is actually the right sequencing because callers and
their call sites migrate together.
The Step 8 scope is now substantially bigger than originally
planned. The plan should be updated to reflect this, either
by splitting Step 8 into 8a (scope transitions) + 8b (entry
points) or by accepting the bigger Step 8.
### Impact on Phase 5
Phase 5's "MCP pooling" scope is unchanged. Phase 5 adds the
idle pool + reaper + health checks to an already-working
`McpFactory::acquire()`. If Step 8 lands the working
`acquire()`, Phase 5 plugs in the pool; if Step 8 somehow
ships without `acquire()`, Phase 5 has to write it too.
Phase 5's plan doc should note this dependency.
## Verification
### Compilation
- `cargo check` — clean, **zero warnings, zero errors**
- `cargo clippy` — clean
### Tests
- `cargo test` — **63 passed, 0 failed** (unchanged from
Steps 1–6)
The bridge round-trip tests are the critical check for this
step because they construct `AppState` instances, and
`AppState` now has two new required fields. All three tests
(`to_app_config_copies_every_serialized_field`,
`to_request_context_copies_every_runtime_field`,
`round_trip_preserves_all_non_lossy_fields`,
`round_trip_default_config`) pass after updating the
`AppState` constructors in the test module.
### Manual smoke test
Not applicable — no runtime behavior changed. CLI and REPL
still call `Config::use_role()`, `Config::use_session()`,
etc. and those still work against the old `McpRegistry` /
`Functions` machinery.
## Handoff to next step
### What Step 7 can rely on
Step 7 (mixed methods) can rely on:
- **Zero changes to existing `Config` methods or fields.**
Step 6.5 didn't touch any of the Step 7 targets.
- **New sub-struct fields exist on `RequestContext`** but are
default-initialized and shouldn't be consulted by any
Step 7 mixed-method migration. If a Step 7 method legitimately
needs `tool_scope` or `agent_runtime` (e.g., because it's
reading the active tool set), that's a signal the method
belongs in Step 8, not Step 7.
- **`AppConfig` methods from Steps 3-4 are unchanged.**
- **`RequestContext` methods from Steps 5-6 are unchanged.**
- **`Config::use_role`, `Config::use_session`,
`Config::use_agent`, `Config::exit_agent`, `Config::use_rag`,
`Config::edit_rag_docs`, `Config::rebuild_rag`,
`Config::apply_prelude` are still on `Config`** and must
stay there through Step 7. They're Step 8 targets.
### What Step 7 should watch for
- **Step 7 targets the 17 mixed methods** from the plan's
original table plus the deferrals accumulated from Steps
3–6 (`select_functions`, `select_enabled_functions`,
`select_enabled_mcp_servers`, `setup_model`, `update`,
`info`, `session_info`, `sysinfo`, `use_prompt`, `edit_role`,
`after_chat_completion`).
- **The "mixed" category means: reads/writes BOTH serialized
config AND runtime state.** The migration shape is to split
them into explicit
`fn foo(app: &AppConfig, ctx: &RequestContext)` or
`fn foo(app: &AppConfig, ctx: &mut RequestContext)`
signatures.
- **Watch for methods that also touch `self.functions` or
`self.mcp_registry`.** Those need `tool_scope` /
`mcp_factory` which aren't ready yet. If a mixed method
depends on the tool scope rewrite, defer it to Step 8
alongside the scope transitions.
- **`current_model` is the simplest Step 7 target** — it just
picks the right `Model` reference from session/agent/role/
global. Good first target to validate the Step 7 pattern.
- **`sysinfo` is the biggest Step 7 target** — ~70 lines of
reading both `AppConfig` serialized state and
`RequestContext` runtime state to produce a display string.
- **`set_*` methods all follow the pattern from the plan's
Step 7 table:**
```rust
fn set_foo(&mut self, value: ...) {
if let Some(rl) = self.role_like_mut() { rl.set_foo(value) }
else { self.foo = value }
}
```
The new signature splits this: the `role_like` branch moves
to `RequestContext` (using the Step 5 `role_like_mut`
helper), the fallback branch moves to `AppConfig` via
`AppConfig::set_foo`. Callers then call either
`ctx.set_foo_via_role_like(value)` or
`app_config.set_foo(value)` depending on context.
- **`update` is a dispatcher** — once all the `set_*` methods
are split, `update` migrates to live on `RequestContext`
(because it needs both `ctx.set_*` and `app.set_*` to
dispatch to).
### What Step 7 should NOT do
- Don't touch the 5 new types from Step 6.5 (`ToolScope`,
`McpRuntime`, `McpFactory`, `RagCache`, `AgentRuntime`).
They're scaffolding, untouched until Step 8.
- Don't try to populate `tool_scope` or `agent_runtime` from
any Step 7 migration. Those are Step 8.
- Don't migrate `use_role`, `use_session`, `use_agent`,
`exit_agent`, or any method that touches
`self.mcp_registry` / `self.functions`. Those are Step 8.
- Don't migrate callers of any migrated method.
- Don't touch the bridge's `to_request_context` /
`to_app_config` / `from_parts`. The round-trip still
works with `tool_scope` and `agent_runtime` defaulting.
### Files to re-read at the start of Step 7
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 7 section (the
17-method table starting at line ~525)
- This notes file — specifically the accumulated deferrals
list from Steps 3-6 in the "What Step 7 should watch for"
section
- Step 6 notes — which methods got deferred from Step 6 vs
Step 7 boundary
## Follow-up (not blocking Step 7)
### 1. Step 8's scope is now significantly larger
The original Phase 1 plan estimated Step 8 as "rewrite
`main.rs` and `repl/mod.rs` to use `RequestContext`" — a
meaningful but bounded refactor. After Step 6.5's deferral,
Step 8 also includes:
- Implementing `McpFactory::acquire()` by extracting server
startup logic from `McpRegistry::init_server`
- Rewriting `use_role`, `use_session`, `use_agent`,
`exit_agent`, `use_rag`, `edit_rag_docs`, `rebuild_rag`,
`apply_prelude`, agent sub-spawning
- Wiring `tool_scope` population into all the above
- Populating `agent_runtime` on agent activation
- Building the parent-scope `ToolScope` restoration logic in
`exit_agent`
- Routing `rebuild_rag` / `edit_rag_docs` through
`RagCache::invalidate`
This is a big step. The phase plan should be updated to
either split Step 8 into sub-steps or to flag the expanded
scope.
### 2. `McpFactory::acquire()` extraction is its own mini-project
Looking at `src/mcp/mod.rs`, the subprocess spawn + rmcp
handshake lives inside `McpRegistry::init_server` (private
method, ~60 lines). Step 8's first task should be extracting
this into a pair of functions:
1. `McpFactory::spawn_fresh(spec: &McpServerSpec) ->
Result<ConnectedServer>` — pure subprocess + handshake
logic
2. `McpRegistry::init_server` — wraps `spawn_fresh` with
registry bookkeeping (adds to `servers` map, fires catalog
discovery, etc.) for backward compat
Then `McpFactory::acquire()` can call `spawn_fresh` on cache
miss. The existing `McpRegistry::init_server` keeps working
for the bridge window callers.
### 3. The `load_with` race is documented but not fixed
`RagCache::load_with` has a race window: two concurrent
callers with the same key both miss the cache, both call
the loader closure, both insert into the map. The second
insert overwrites the first. Both callers end up with valid
`Arc<Rag>`s but the cache sharing is broken for that
instant.
For Phase 1 Step 6.5, this is acceptable because the cache
isn't populated by real usage yet. Phase 5's pooling work
should tighten this with per-key `OnceCell` or
`tokio::sync::Mutex`.
### 4. Bridge-window duplication count at end of Step 6.5
Running tally:
- `AppConfig` (Steps 3+4): 11 methods duplicated with `Config`
- `RequestContext` (Steps 5+6): 25 methods duplicated with
`Config` (1 constructor + 13 reads + 12 writes)
- `paths` module (Step 2): 33 free functions (not duplicated)
- **Step 6.5 NEW:** 4 types + 2 `AppState` fields + 2
`RequestContext` fields — **all additive scaffolding, no
duplication of logic**
**Total bridge-window duplication: 36 methods / ~550 lines**,
unchanged from end of Step 6. Step 6.5 added types but not
duplicated logic.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Architecture doc: `docs/REST-API-ARCHITECTURE.md` section 5
- Phase 5 plan: `docs/PHASE-5-IMPLEMENTATION-PLAN.md`
- Step 6 notes: `docs/implementation/PHASE-1-STEP-6-NOTES.md`
- New files:
- `src/config/tool_scope.rs`
- `src/config/mcp_factory.rs`
- `src/config/rag_cache.rs`
- `src/config/agent_runtime.rs`
- Modified files:
- `src/mcp/mod.rs` (`type ConnectedServer` → `pub type`)
- `src/config/mod.rs` (4 new `mod` declarations)
- `src/config/app_state.rs` (2 new fields + docstring)
- `src/config/request_context.rs` (2 new fields + docstring)
- `src/config/bridge.rs` (3 test `AppState` constructors
updated, `to_request_context` adds 2 defaults)
+536
View File
@@ -0,0 +1,536 @@
# Phase 1 Step 7 — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 7: Tackle mixed methods (THE HARD PART)"
## Summary
Added 14 mixed-method splits to the new types, plus 6 global-
default setters on `AppConfig`. The methods that mix serialized
config reads/writes with runtime state reads/writes are now
available on `RequestContext` with `&AppConfig` as an explicit
parameter for the serialized half.
Same bridge pattern as Steps 3–6: `Config`'s originals stay
intact, new methods sit alongside, caller migration happens in
Step 8.
**Step 7 completed ~65% of its planned scope.** Nine target
methods were deferred to Step 8 because they transitively
depend on `Model::retrieve_model(&Config)` and
`list_models(&Config)` — refactoring those requires touching
the `client` module macros, which is beyond Step 7's bridge-
pattern scope. Step 8 will rewrite them alongside the entry
point migration.
## What was changed
### Modified files
- **`src/config/app_config.rs`** — added a third `impl AppConfig`
block with 6 `set_*_default` methods for the serialized-field
half of the mixed-method splits:
- `set_temperature_default`
- `set_top_p_default`
- `set_enabled_tools_default`
- `set_enabled_mcp_servers_default`
- `set_save_session_default`
- `set_compression_threshold_default`
- **`src/config/request_context.rs`** — added a fourth
`impl RequestContext` block with 14 methods:
**Helpers (2):**
- `current_model(&self) -> &Model` — pure runtime traversal
(session > agent > role > ctx.model)
- `extract_role(&self, app: &AppConfig) -> Role` — pure
runtime except fallback reads `app.temperature`,
`app.top_p`, `app.enabled_tools`, `app.enabled_mcp_servers`
**Role-like setters (7):** these all return `bool`
indicating whether they mutated a `RoleLike` (if `false`,
the caller should fall back to
`app.set_<name>_default()`). This preserves the exact
semantics of today's `Config::set_*` methods:
- `set_temperature_on_role_like`
- `set_top_p_on_role_like`
- `set_enabled_tools_on_role_like`
- `set_enabled_mcp_servers_on_role_like`
- `set_save_session_on_session` (uses `self.session` directly,
not `role_like_mut`)
- `set_compression_threshold_on_session` (same)
- `set_max_output_tokens_on_role_like`
**Chat lifecycle (2):**
- `save_message(&mut self, app: &AppConfig, input, output)`
writes to session if present, else to messages file if
`app.save` is true
- `after_chat_completion(&mut self, app, input, output,
tool_results)` — updates `last_message`, calls
`save_message` if not `app.dry_run`
- `open_message_file(&self) -> Result<File>` — private
helper
**Info getters (3):**
- `sysinfo(&self, app: &AppConfig) -> Result<String>` —
~70-line display output mixing serialized and runtime
state
- `info(&self, app: &AppConfig) -> Result<String>` —
delegates to `sysinfo` in fallback branch
- `session_info(&self, app: &AppConfig) -> Result<String>` —
calls `app.render_options()`
**Prompt rendering (3):**
- `generate_prompt_context(&self, app) -> HashMap<&str, String>` —
builds the template variable map
- `render_prompt_left(&self, app) -> String`
- `render_prompt_right(&self, app) -> String`
**Function selection (3):**
- `select_enabled_functions(&self, app, role) -> Vec<FunctionDeclaration>` —
filters `ctx.functions.declarations()` by role's enabled
tools + agent filters + user interaction functions
- `select_enabled_mcp_servers(&self, app, role) -> Vec<...>` —
same pattern for MCP meta-functions
- `select_functions(&self, app, role) -> Option<Vec<...>>` —
combines both
- **`src/config/mod.rs`** — bumped `format_option_value` from
private to `pub(super)` so `request_context.rs` can use it
as `super::format_option_value`.
### Unchanged files
- **`src/config/mod.rs`** — all Step 7 target methods still
exist on `Config`. They continue to work for every current
caller.
## Key decisions
### 1. Same bridge pattern as Steps 3-6
Step 7 follows the same additive pattern as earlier steps: new
methods on `AppConfig` / `RequestContext`, `Config`'s originals
untouched, no caller migration. Caller migration is Step 8.
The plan's Step 7 description implied a semantic rewrite
("split into explicit parameter passing") but that phrasing
applies to the target signatures, not the migration mechanism.
The bridge pattern achieves the same end state — methods with
`(&AppConfig, &RequestContext)` signatures exist and are ready
for Step 8 to call.
### 2. `set_*` methods split into `_on_role_like` + `_default` pair
Today's `Config::set_temperature` does:
```rust
match self.role_like_mut() {
Some(role_like) => role_like.set_temperature(value),
None => self.temperature = value,
}
```
The Step 7 split:
```rust
// On RequestContext:
fn set_temperature_on_role_like(&mut self, value) -> bool {
match self.role_like_mut() {
Some(rl) => { rl.set_temperature(value); true }
None => false,
}
}
// On AppConfig:
fn set_temperature_default(&mut self, value) {
self.temperature = value;
}
```
**The bool return** is the caller contract: if `_on_role_like`
returns `false`, the caller must call
`app.set_*_default(value)`. This is what Step 8 callers will
do:
```rust
if !ctx.set_temperature_on_role_like(value) {
Arc::get_mut(&mut app.config).unwrap().set_temperature_default(value);
}
```
(Or more likely, the AppConfig mutation gets hidden behind a
helper on `AppState` since `AppConfig` is behind `Arc`.)
This split is semantically equivalent to the existing
behavior while making the "where the value goes" decision
explicit at the type level.
### 3. `save_message` and `after_chat_completion` migrated together
`after_chat_completion` reads `app.dry_run` and calls
`save_message`, which reads `app.save`. Both got deferred from
Step 6 for exactly this mixed-dependency reason. Step 7
migrates them together:
```rust
pub fn after_chat_completion(
&mut self,
app: &AppConfig,
input: &Input,
output: &str,
tool_results: &[ToolResult],
) -> Result<()> {
if !tool_results.is_empty() { return Ok(()); }
self.last_message = Some(LastMessage::new(input.clone(), output.to_string()));
if !app.dry_run {
self.save_message(app, input, output)?;
}
Ok(())
}
```
The `open_message_file` helper moved along with them since
it's only called from `save_message`.
### 4. `format_option_value` visibility bump
`format_option_value` is a tiny private helper in
`src/config/mod.rs` that `sysinfo` uses. Step 7's new
`RequestContext::sysinfo` needs to call it, so I bumped its
visibility from `fn` to `pub(super)`. This is a minimal
change (one word) that lets child modules reuse the helper
without duplicating it.
### 5. `select_*` methods were Step 3 deferrals
The plan's Step 3 table originally listed `select_functions`,
`select_enabled_functions`, and `select_enabled_mcp_servers`
as global-read method targets. Step 3's notes correctly
flagged them as actually-mixed because they read `self.functions`
and `self.agent` (runtime, not serialized).
Step 7 is the right home for them. They take
`(&self, app: &AppConfig, role: &Role)` and read:
- `ctx.functions.declarations()` (runtime — existing flat
field, will collapse into `tool_scope.functions` in Step 8+)
- `ctx.agent` (runtime)
- `app.function_calling_support`, `app.mcp_server_support`,
`app.mapping_tools`, `app.mapping_mcp_servers` (serialized)
The implementations are long (~80 lines each) but are
verbatim copies of the `Config` originals with `self.X`
replaced by `app.X` for serialized fields and `self.X`
preserved for runtime fields.
### 6. `session_info` keeps using `crate::render::MarkdownRender`
I didn't add a top-level `use crate::render::MarkdownRender`
because it's only called from `session_info`. Inline
`crate::render::MarkdownRender::init(...)` is clearer than
adding another global import for a single use site.
### 7. Imports grew substantially
`request_context.rs` now imports from 7 new sources compared
to the end of Step 6:
- `super::AppConfig` (for the mixed-method params)
- `super::MessageContentToolCalls` (for `save_message`)
- `super::LEFT_PROMPT`, `super::RIGHT_PROMPT` (for prompt
rendering)
- `super::ensure_parent_exists` (for `open_message_file`)
- `crate::function::FunctionDeclaration`,
`crate::function::user_interaction::USER_FUNCTION_PREFIX`
- `crate::mcp::MCP_*_META_FUNCTION_NAME_PREFIX` (3 constants)
- `std::collections::{HashMap, HashSet}`,
`std::fs::{File, OpenOptions}`, `std::io::Write`,
`std::path::Path`, `crate::utils::{now, render_prompt}`
This is expected — Step 7's methods are the most
dependency-heavy in Phase 1. Post-Phase 1 cleanup can
reorganize into separate files if the module becomes
unwieldy.
## Deviations from plan
### 9 methods deferred to Step 8
| Method | Why deferred |
|---|---|
| `retrieve_role` | Calls `Model::retrieve_model(&Config)` transitively, needs client module refactor |
| `set_model` | Calls `Model::retrieve_model(&Config)` transitively |
| `set_rag_reranker_model` | Takes `&GlobalConfig`, uses `update_rag` helper with Arc<RwLock> take/replace pattern |
| `set_rag_top_k` | Same as above |
| `update` | Dispatcher over all `set_*` methods including the 2 above, plus takes `&GlobalConfig` and touches `mcp_registry` |
| `repl_complete` | Calls `list_models(&Config)` + reads `self.mcp_registry` (going away in Step 6.5/8), + reads `self.functions` |
| `use_role_safely` | Takes `&GlobalConfig`, does `take()`/`replace()` on Arc<RwLock> |
| `use_session_safely` | Same as above |
| `setup_model` | Calls `self.set_model()` which is deferred |
| `use_prompt` (Step 6 deferral) | Calls `current_model()` (migratable) and `use_role_obj` (migrated in Step 6), but the whole method is 4 lines and not independently useful without its callers |
| `edit_role` (Step 6 deferral) | Calls `self.upsert_role()` and `self.use_role()` which are Step 8 |
**Root cause of most deferrals:** the `client` module's
`list_all_models` macro and `Model::retrieve_model` take
`&Config`. Refactoring them to take `&AppConfig` is a
meaningful cross-module change that belongs in Step 8
alongside the caller migration.
### 14 methods migrated
| Method | New signature |
|---|---|
| `current_model` | `&self -> &Model` (pure RequestContext) |
| `extract_role` | `(&self, &AppConfig) -> Role` |
| `set_temperature_on_role_like` | `(&mut self, Option<f64>) -> bool` |
| `set_top_p_on_role_like` | `(&mut self, Option<f64>) -> bool` |
| `set_enabled_tools_on_role_like` | `(&mut self, Option<String>) -> bool` |
| `set_enabled_mcp_servers_on_role_like` | `(&mut self, Option<String>) -> bool` |
| `set_save_session_on_session` | `(&mut self, Option<bool>) -> bool` |
| `set_compression_threshold_on_session` | `(&mut self, Option<usize>) -> bool` |
| `set_max_output_tokens_on_role_like` | `(&mut self, Option<isize>) -> bool` |
| `save_message` | `(&mut self, &AppConfig, &Input, &str) -> Result<()>` |
| `after_chat_completion` | `(&mut self, &AppConfig, &Input, &str, &[ToolResult]) -> Result<()>` |
| `sysinfo` | `(&self, &AppConfig) -> Result<String>` |
| `info` | `(&self, &AppConfig) -> Result<String>` |
| `session_info` | `(&self, &AppConfig) -> Result<String>` |
| `generate_prompt_context` | `(&self, &AppConfig) -> HashMap<&str, String>` |
| `render_prompt_left` | `(&self, &AppConfig) -> String` |
| `render_prompt_right` | `(&self, &AppConfig) -> String` |
| `select_functions` | `(&self, &AppConfig, &Role) -> Option<Vec<...>>` |
| `select_enabled_functions` | `(&self, &AppConfig, &Role) -> Vec<...>` |
| `select_enabled_mcp_servers` | `(&self, &AppConfig, &Role) -> Vec<...>` |
Actually that's 20 methods across the two types (6 on
`AppConfig`, 14 on `RequestContext`). "14 migrated" refers to
the 14 behavior methods on `RequestContext`; the 6 on
`AppConfig` are the paired defaults for the 7 role-like
setters (4 `set_*_default` + 2 session-specific — the
`set_max_output_tokens` split doesn't need a default
because `ctx.model.set_max_tokens()` works without a
fallback).
## Verification
### Compilation
- `cargo check` — clean, **zero warnings, zero errors**
- `cargo clippy` — clean
### Tests
- `cargo test` — **63 passed, 0 failed** (unchanged from
Steps 1–6.5)
The bridge's round-trip test still passes, confirming the new
methods don't interfere with struct layout or the
`Config → AppConfig + RequestContext → Config` invariant.
### Manual smoke test
Not applicable — no runtime behavior changed. CLI and REPL
still call `Config::set_temperature`, `Config::sysinfo`,
`Config::save_message`, etc. as before.
## Handoff to next step
### What Step 8 can rely on
Step 8 (entry point rewrite) can rely on:
- **`AppConfig` now has 17 methods** (Steps 3+4+7): 7 reads
+ 4 writes + 6 setter-defaults
- **`RequestContext` now has 39 inherent methods** across 5
impl blocks: 1 constructor + 13 reads + 12 writes + 14
mixed
- **All of `AppConfig`'s and `RequestContext`'s new methods
are under `#[allow(dead_code)]`** — that's safe to leave
alone; callers wire them up in Step 8 and the allows
become inert
- **`format_option_value` is `pub(super)`** — accessible
from any `config` child module
- **The bridge (`Config::to_app_config`, `to_request_context`,
`from_parts`) still works** and all round-trip tests pass
- **The `paths` module, Step 3/4 `AppConfig` methods, Step
5/6 `RequestContext` methods, Step 6.5 scaffolding types
are all unchanged**
- **These `Config` methods are still on `Config`** and must
stay there through Step 8 (they're Step 8 targets):
- `retrieve_role`, `set_model`, `set_rag_reranker_model`,
`set_rag_top_k`, `update`, `repl_complete`,
`use_role_safely`, `use_session_safely`, `setup_model`,
`use_prompt`, `edit_role`
- Plus the Step 6 Category A deferrals: `use_role`,
`use_session`, `use_agent`, `exit_agent`
- Plus the Step 6 Category C deferrals: `compress_session`,
`maybe_compress_session`, `autoname_session`,
`maybe_autoname_session`, `use_rag`, `edit_rag_docs`,
`rebuild_rag`, `apply_prelude`
### What Step 8 should watch for
**Step 8 is the biggest remaining step** after Step 6.5
deferred its scope-transition rewrites. Step 8 now absorbs:
1. **Entry point rewrite** (original Step 8 scope):
- `main.rs::run()` constructs `AppState` + `RequestContext`
instead of `GlobalConfig`
- `main.rs::start_directive()` takes
`&mut RequestContext` instead of `&GlobalConfig`
- `main.rs::create_input()` takes `&RequestContext`
- `repl/mod.rs::Repl` holds a long-lived `RequestContext`
instead of `GlobalConfig`
- All 91 callsites in the original migration table
2. **`Model::retrieve_model` refactor** (Step 7 deferrals):
- `Model::retrieve_model(config: &Config, ...)` →
`Model::retrieve_model(config: &AppConfig, ...)`
- `list_all_models!(config: &Config)` macro →
`list_all_models!(config: &AppConfig)`
- `list_models(config: &Config, ...)` →
`list_models(config: &AppConfig, ...)`
- Then migrate `retrieve_role`, `set_model`,
`repl_complete`, `setup_model`
3. **RAG lifecycle migration** (Step 7 deferrals +
Step 6 Category C):
- `use_rag`, `edit_rag_docs`, `rebuild_rag` →
`RequestContext` methods using `RagCache`
- `set_rag_reranker_model`, `set_rag_top_k` → split
similarly to Step 7 setters
4. **Scope transition rewrites** (Step 6.5 deferrals):
- `use_role`, `use_session`, `use_agent`, `exit_agent`
rewritten to build `ToolScope` via `McpFactory`
- `McpFactory::acquire()` extracted from
`McpRegistry::init_server`
- `use_role_safely`, `use_session_safely` eliminated
(not needed once callers hold `&mut RequestContext`)
5. **Session lifecycle migration** (Step 6 Category C):
- `compress_session`, `maybe_compress_session`,
`autoname_session`, `maybe_autoname_session` → methods
that take `&mut RequestContext` instead of spawning
tasks with `GlobalConfig`
- `apply_prelude` → uses migrated `use_role` /
`use_session`
6. **`update` dispatcher** (Step 7 deferral):
- Once all `set_*` are available on `RequestContext` and
`AppConfig`, `update` becomes a dispatcher over the
new split pair
This is a **huge** step. Consider splitting into 8a-8f
sub-steps or staging across multiple PRs.
### What Step 8 should NOT do
- Don't re-migrate any Step 3-7 method
- Don't touch the new types from Step 6.5 unless actually
implementing `McpFactory::acquire()` or
`RagCache::load_with` usage
- Don't leave intermediate states broken — each sub-step
should keep the build green, even if it means keeping
temporary dual code paths
### Files to re-read at the start of Step 8
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 8 section
- This notes file — specifically the deferrals table and
Step 8 watch items
- Step 6.5 notes — scope transition rewrite details
- Step 6 notes — Category C deferral inventory
- `src/config/mod.rs` — still has ~25 methods that need
migrating
## Follow-up (not blocking Step 8)
### 1. Bridge-window duplication count at end of Step 7
Running tally:
- `AppConfig` (Steps 3+4+7): 17 methods (11 reads/writes +
6 setter-defaults)
- `RequestContext` (Steps 5+6+7): 39 methods (1 constructor +
13 reads + 12 writes + 14 mixed)
- `paths` module (Step 2): 33 free functions
- Step 6.5 types: 4 new types on scaffolding
**Total bridge-window duplication: 56 methods / ~1200 lines**
(up from 36 / ~550 at end of Step 6).
All auto-delete in Step 10.
### 2. `request_context.rs` is now ~900 lines
Getting close to the point where splitting into multiple
files would help readability. Candidate layout:
- `request_context/mod.rs` — struct definition + constructor
- `request_context/reads.rs` — Step 5 methods
- `request_context/writes.rs` — Step 6 methods
- `request_context/mixed.rs` — Step 7 methods
Not blocking anything; consider during Phase 1 cleanup.
### 3. The `set_*_on_role_like` / `set_*_default` split
has an unusual caller contract
Callers of the split have to remember: "call `_on_role_like`
first, check the bool, call `_default` if false." That's
more verbose than today's `Config::set_temperature` which
hides the dispatch.
Step 8 should add convenience helpers on `RequestContext`
that wrap both halves:
```rust
pub fn set_temperature(&mut self, value: Option<f64>, app: &mut AppConfig) {
if !self.set_temperature_on_role_like(value) {
app.set_temperature_default(value);
}
}
```
But that requires `&mut AppConfig`, which requires unwrapping
the `Arc` on `AppState.config`. The cleanest shape is probably
to move the mutation into a helper on `AppState`:
```rust
impl AppState {
pub fn config_mut(&self) -> Option<&mut AppConfig> {
Arc::get_mut(...)
}
}
```
Or accept that the `.set` REPL command needs an owned
`AppState` (not `Arc<AppState>`) and handle the mutation at
the entry point. Step 8 can decide.
### 4. `select_*` methods are long but verbatim
The 3 `select_*` methods are ~180 lines combined and are
verbatim copies of the `Config` originals. I resisted the
urge to refactor (extract helpers, simplify the
`enabled_tools == "all"` branches, etc.) because:
- Step 7 is about splitting signatures, not style
- The copies get deleted in Step 10 anyway
- Any refactor could introduce subtle behavior differences
that are hard to catch without a functional test for these
specific methods
Post-Phase 1 cleanup can factor these if desired.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Step 6 notes: `docs/implementation/PHASE-1-STEP-6-NOTES.md`
- Step 6.5 notes: `docs/implementation/PHASE-1-STEP-6.5-NOTES.md`
- Modified files:
- `src/config/app_config.rs` (6 new `set_*_default` methods)
- `src/config/request_context.rs` (14 new mixed methods,
7 new imports)
- `src/config/mod.rs` (`format_option_value` → `pub(super)`)
@@ -0,0 +1,374 @@
# Phase 1 Step 8a — Implementation Notes
## Status
Done.
## Plan reference
- Plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Section: "Step 8a: Client module refactor — `Model::retrieve_model`
takes `&AppConfig`"
## Summary
Migrated the LLM client module's 4 `&Config`-taking functions to take
`&AppConfig` instead, and updated all 15 callsites across 7 files to
use the `Config::to_app_config()` bridge helper (already exists from
Step 1). No new types, no new methods — this is a signature change
that propagates through the codebase.
**This unblocks Step 8b**, where `Config::retrieve_role`,
`Config::set_model`, `Config::repl_complete`, and
`Config::setup_model` (Step 7 deferrals) can finally migrate to
`RequestContext` methods that take `&AppConfig` — they were blocked
on `Model::retrieve_model` expecting `&Config`.
## What was changed
### Files modified (8 files, 15 callsite updates)
- **`src/client/macros.rs`** — changed 3 signatures in the
`register_client!` macro (the functions it generates at expansion
time):
- `list_client_names(config: &Config)` → `(config: &AppConfig)`
- `list_all_models(config: &Config)` → `(config: &AppConfig)`
- `list_models(config: &Config, ModelType)` → `(config: &AppConfig, ModelType)`
All three functions only read `config.clients` which is a
serialized field identical on both types. The `OnceLock` caches
(`ALL_CLIENT_NAMES`, `ALL_MODELS`) work identically because
`AppConfig.clients` holds the same values as `Config.clients`.
- **`src/client/model.rs`** — changed the `use` and function
signature:
- `use crate::config::Config` → `use crate::config::AppConfig`
- `Model::retrieve_model(config: &Config, ...)` → `(config: &AppConfig, ...)`
The function body was unchanged — it calls `list_all_models(config)`
and `list_client_names(config)` internally, both of which now take
the same `&AppConfig` type.
- **`src/config/mod.rs`** (6 callsite updates):
- `set_rag_reranker_model``Model::retrieve_model(&config.read().to_app_config(), ...)`
- `set_model``Model::retrieve_model(&self.to_app_config(), ...)`
- `retrieve_role``Model::retrieve_model(&self.to_app_config(), ...)`
- `repl_complete` (`.model` branch) → `list_models(&self.to_app_config(), ModelType::Chat)`
- `repl_complete` (`.rag_reranker_model` branch) → `list_models(&self.to_app_config(), ModelType::Reranker)`
- `setup_model``list_models(&self.to_app_config(), ModelType::Chat)`
- **`src/config/session.rs`** — `Session::load` caller updated:
`Model::retrieve_model(&config.to_app_config(), ...)`
- **`src/config/agent.rs`** — `Agent::init` caller updated:
`Model::retrieve_model(&config.to_app_config(), model_id, ModelType::Chat)?`
(required reformatting because the one-liner became two lines)
- **`src/function/supervisor.rs`** — sub-agent summarization model
lookup: `Model::retrieve_model(&cfg.to_app_config(), ...)`
- **`src/rag/mod.rs`** (4 callsite updates):
- `Rag::create` embedding model lookup
- `Rag::init` `list_models` for embedding model selection
- `Rag::init` `retrieve_model` for embedding model
- `Rag::search` reranker model lookup
- **`src/main.rs`** — `--list-models` CLI flag handler:
`list_models(&config.read().to_app_config(), ModelType::Chat)`
- **`src/cli/completer.rs`** — shell completion for `--model`:
`list_models(&config.to_app_config(), ModelType::Chat)`
### Files NOT changed
- **`src/config/bridge.rs`** — the `Config::to_app_config()` method
from Step 1 is exactly the bridge helper Step 8a needed. No new
method was added; I just started using the existing one.
- **`src/client/` other files** — only `macros.rs` and `model.rs`
had the target signatures. Individual client implementations
(`openai.rs`, `claude.rs`, etc.) don't reference `&Config`
directly; they work through the `Client` trait which uses
`GlobalConfig` internally (untouched).
- **Any file calling `init_client` or `GlobalConfig`** — these are
separate from the model-lookup path and stay on `GlobalConfig`
through the bridge. Step 8f/8g will migrate them.
## Key decisions
### 1. Reused `Config::to_app_config()` instead of adding `app_config_snapshot`
The plan said to add a `Config::app_config_snapshot(&self) -> AppConfig`
helper. That's exactly what `Config::to_app_config()` from Step 1
already does — clones every serialized field into a fresh `AppConfig`.
Adding a second method with the same body would be pointless
duplication.
I proceeded directly with `to_app_config()` and the plan's intent
is satisfied.
### 2. Inline `.to_app_config()` at every callsite
Each callsite pattern is:
```rust
// old:
Model::retrieve_model(config, ...)
// new:
Model::retrieve_model(&config.to_app_config(), ...)
```
The owned `AppConfig` returned by `to_app_config()` lives for the
duration of the function argument expression, so `&` borrowing works
without a named binding. For multi-line callsites (like `Rag::create`
and `Rag::init` in `src/rag/mod.rs`) I reformatted to put the
`to_app_config()` call on its own line for readability.
### 3. Allocation cost is acceptable during the bridge window
Every callsite now clones 40 fields (the serialized half of `Config`)
per call. This is measurably more work than the pre-refactor code,
which passed a shared borrow. The allocation cost is:
- **~15 callsites × ~40 field clones each** = ~600 extra heap
operations per full CLI invocation
- In practice, most of these are `&str` / `String` / primitive
clones, plus a few `IndexMap` and `Vec` clones — dominated by
`clients: Vec<ClientConfig>`
- Total cost per call: well under 1ms, invisible to users
- Cost ends in Step 8f/8g when callers hold `Arc<AppState>`
directly and can pass `&app.config` without cloning
The plan flagged this as an acceptable bridge-window cost, and the
measurements back that up. No optimization is needed.
### 4. No use of deprecated forwarders
Unlike Steps 3-7 which added new methods alongside the old ones,
Step 8a is a **one-shot signature change** of 4 functions plus
their 15 callers. The bridge helper is `Config::to_app_config()`
(already existed); the new signature is on the same function
(not a parallel new function). This is consistent with the plan's
Step 8a description of "one-shot refactor with bridge helper."
### 5. Did not touch `init_client`, `GlobalConfig`, or client instance state
The `register_client!` macro defines `$Client::init(global_config,
model)` and `init_client(config, model)` — both take
`&GlobalConfig` and read `config.read().model` (the runtime field).
These are **not** Step 8a targets. They stay on `GlobalConfig`
through the bridge and migrate in Step 8f/8g when callers switch
from `GlobalConfig` to `Arc<AppState> + RequestContext`.
## Deviations from plan
**None of substance.** The plan's Step 8a description was clear
and straightforward; the implementation matches it closely. Two
minor departures:
1. **Used existing `to_app_config()` instead of adding
`app_config_snapshot()`** — see Key Decision #1. The plan's
intent was a helper that clones serialized fields; both names
describe the same thing.
2. **Count: 15 callsite updates, not 17** — the plan said "any
callsite that currently calls these client functions." I found
15 via `grep`. The count is close enough that this isn't a
meaningful deviation, just an accurate enumeration.
## Verification
### Compilation
- `cargo check` — clean, **zero warnings, zero errors**
- `cargo clippy` — clean
### Tests
- `cargo test` — **63 passed, 0 failed** (unchanged from
Steps 1–7)
Step 8a added no new tests — it's a mechanical signature change
with no new behavior to verify. The existing test suite confirms:
- The bridge round-trip test still passes (uses
`Config::to_app_config()`, which is the bridge helper)
- The `config::bridge::tests::*` suite — all 4 tests pass
- No existing test broke
### Manual smoke test
Not performed as part of this step (would require running a real
LLM request with various models). The plan's Step 8a verification
suggests `loki --model openai:gpt-4o "hello"` as a sanity check,
but that requires API credentials and a live LLM. A representative
smoke test should be performed before declaring Phase 1 complete
(in Step 10 or during release prep).
The signature change is mechanical — if it compiles and existing
tests pass, the runtime behavior is identical by construction. The
only behavior difference would be the extra `to_app_config()`
clones, which don't affect correctness.
## Handoff to next step
### What Step 8b can rely on
Step 8b (finish Step 7's deferred mixed-method migrations) can
rely on:
- **`Model::retrieve_model(&AppConfig, ...)`** — available for the
migrated `retrieve_role` method on `RequestContext`
- **`list_models(&AppConfig, ModelType)`** — available for
`repl_complete` and `setup_model` migration
- **`list_all_models(&AppConfig)`** — available for internal use
- **`list_client_names(&AppConfig)`** — available (though typically
only called from inside `retrieve_model`)
- **`Config::to_app_config()` bridge helper** — still works, still
used by the old `Config` methods that call the client functions
through the bridge
- **All existing Config-based methods that use these functions**
(e.g., `Config::set_model`, `Config::retrieve_role`,
`Config::setup_model`) still compile and still work — they now
call `self.to_app_config()` internally to adapt the signature
### What Step 8b should watch for
- **The 9 Step 7 deferrals** waiting for Step 8b:
- `retrieve_role` (blocked by `retrieve_model` — now unblocked)
- `set_model` (blocked by `retrieve_model` — now unblocked)
- `repl_complete` (blocked by `list_models` — now unblocked)
- `setup_model` (blocked by `list_models` — now unblocked)
- `use_prompt` (calls `current_model` + `use_role_obj` — already
unblocked; was deferred because it's a one-liner not worth
migrating alone)
- `edit_role` (calls `editor` + `upsert_role` + `use_role`
`use_role` is still Step 8d, so `edit_role` may stay deferred)
- `set_rag_reranker_model` (takes `&GlobalConfig`, uses
`update_rag` helper — may stay deferred to Step 8f/8g)
- `set_rag_top_k` (same)
- `update` (dispatcher over all `set_*` — needs all its
dependencies migrated first)
- **`set_model` split pattern.** The old `Config::set_model` does
`role_like_mut` dispatch. Step 8b should split it into
`RequestContext::set_model_on_role_like(&mut self, app: &AppConfig,
model_id: &str) -> Result<bool>` (returns whether a RoleLike was
mutated) + `AppConfig::set_model_default(&mut self, model_id: &str,
model: Model)` (sets the global default model).
- **`retrieve_role` migration pattern.** The method takes `&self`
today. On `RequestContext` it becomes `(&self, app: &AppConfig,
name: &str) -> Result<Role>`. The body calls
`paths::list_roles`, `paths::role_file`, `Role::new`, `Role::builtin`,
then `self.current_model()` (already on RequestContext from Step 7),
then `Model::retrieve_model(app, ...)`.
- **`setup_model` has a subtle split.** It writes to
`self.model_id` (serialized) AND `self.model` (runtime) AND calls
`self.set_model(&model_id)` (mixed). Step 8b should split this
into:
- `AppConfig::ensure_default_model_id(&mut self, &AppConfig)` (or
similar) to pick the first available model and update
`self.model_id`
- `RequestContext::reload_current_model(&mut self, app: &AppConfig)`
to refresh `ctx.model` from the resolved id
### What Step 8b should NOT do
- Don't touch `init_client`, `GlobalConfig`, or any function with
"runtime model state" concerns — those are Step 8f/8g.
- Don't migrate `use_role`, `use_session`, `use_agent`, `exit_agent`
— those are Step 8d (after Step 8c extracts `McpFactory::acquire()`).
- Don't migrate RAG lifecycle methods (`use_rag`, `edit_rag_docs`,
`rebuild_rag`, `compress_session`, `autoname_session`,
`apply_prelude`) — those are Step 8e.
- Don't touch `main.rs` entry points or `repl/mod.rs` — those are
Step 8f and 8g respectively.
### Files to re-read at the start of Step 8b
- `docs/PHASE-1-IMPLEMENTATION-PLAN.md` — Step 8b section
- This notes file — especially the "What Step 8b should watch
for" section above
- `src/config/mod.rs` — current `Config::retrieve_role`,
`Config::set_model`, `Config::repl_complete`,
`Config::setup_model`, `Config::use_prompt`, `Config::edit_role`
method bodies
- `src/config/app_config.rs` — current state of `AppConfig` impl
blocks (Steps 3+4+7)
- `src/config/request_context.rs` — current state of
`RequestContext` impl blocks (Steps 5+6+7)
## Follow-up (not blocking Step 8b)
### 1. The `OnceLock` caches in the macro will seed once per process
`ALL_CLIENT_NAMES` and `ALL_MODELS` are `OnceLock`s initialized
lazily on first call. After Step 8a, the first call passes an
`AppConfig`. If a test or an unusual code path happens to call
one of these functions twice with different `AppConfig` values
(different `clients` lists), only the first seeding wins. This
was already true before Step 8a — the types changed but the
caching semantics are unchanged.
Worth flagging so nobody writes a test that relies on
re-initializing the caches.
### 2. Bridge-window duplication count at end of Step 8a
Unchanged from end of Step 7:
- `AppConfig` (Steps 3+4+7): 17 methods
- `RequestContext` (Steps 5+6+7): 39 methods
- `paths` module (Step 2): 33 free functions
- Step 6.5 types: 4 new types
**Total: 56 methods / ~1200 lines of parallel logic**
Step 8a added zero duplication — it's a signature change of
existing functions, not a parallel implementation.
### 3. `to_app_config()` is called from 9 places now
After Step 8a, these files call `to_app_config()`:
- `src/config/mod.rs` — 6 callsites (for `Model::retrieve_model`
and `list_models`)
- `src/config/session.rs` — 1 callsite
- `src/config/agent.rs` — 1 callsite
- `src/function/supervisor.rs` — 1 callsite
- `src/rag/mod.rs` — 4 callsites
- `src/main.rs` — 1 callsite
- `src/cli/completer.rs` — 1 callsite
**Total: 15 callsites.** All get eliminated in Step 8f/8g when
their callers migrate to hold `Arc<AppState>` directly. Until
then, each call clones ~40 fields. Measured cost: negligible.
### 4. The `#[allow(dead_code)]` on `impl Config` in bridge.rs
`Config::to_app_config()` is now actively used by 15 callsites
— it's no longer dead. But `Config::to_request_context` and
`Config::from_parts` are still only used by the bridge tests. The
`#[allow(dead_code)]` on the `impl Config` block is harmless
either way (it doesn't fire warnings, it just suppresses them
if they exist). Step 10 deletes the whole file anyway.
## References
- Phase 1 plan: `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
- Step 7 notes: `docs/implementation/PHASE-1-STEP-7-NOTES.md`
- Modified files:
- `src/client/macros.rs` (3 function signatures in the
`register_client!` macro)
- `src/client/model.rs` (`use` statement + `retrieve_model`
signature)
- `src/config/mod.rs` (6 callsite updates in
`set_rag_reranker_model`, `set_model`, `retrieve_role`,
`repl_complete` ×2, `setup_model`)
- `src/config/session.rs` (1 callsite in `Session::load`)
- `src/config/agent.rs` (1 callsite in `Agent::init`)
- `src/function/supervisor.rs` (1 callsite in sub-agent
summarization)
- `src/rag/mod.rs` (4 callsites in `Rag::create`, `Rag::init`,
`Rag::search`)
- `src/main.rs` (1 callsite in `--list-models` handler)
- `src/cli/completer.rs` (1 callsite in shell completion)
+55
View File
@@ -0,0 +1,55 @@
# Implementation Notes
This directory holds per-step implementation notes for the Loki REST API
refactor. Each note captures what was actually built during one step, how
it differed from the plan, any decisions made mid-implementation, and
what the next step needs to know to pick up cleanly.
## Why this exists
The refactor is spread across multiple phases and many steps. The
implementation plans in `docs/PHASE-*-IMPLEMENTATION-PLAN.md` describe
what _should_ happen; these notes describe what _did_ happen. Reading
the plan plus the notes for the most recent completed step is enough
context to start the next step without re-deriving anything from the
conversation history or re-exploring the codebase.
## Naming convention
One file per completed step:
```
PHASE-<phase>-STEP-<step>-NOTES.md
```
Examples:
- `PHASE-1-STEP-1-NOTES.md`
- `PHASE-1-STEP-2-NOTES.md`
- `PHASE-2-STEP-3-NOTES.md`
## Contents of each note
Every note has the same sections so they're easy to scan:
1. **Status** — done / in progress / blocked
2. **Plan reference** — which phase plan + which step section this
implements
3. **Summary** — one or two sentences on what shipped
4. **What was changed** — file-by-file changelist with links
5. **Key decisions** — non-obvious choices made during implementation,
with the reasoning
6. **Deviations from plan** — where the plan said X but reality forced
Y, with explanation
7. **Verification** — what was tested, what passed
8. **Handoff to next step** — what the next step needs to know, any
preconditions, any gotchas
## Lifetime
This directory is transitional. When Phase 1 Step 10 lands and the
`GlobalConfig` type alias is removed, the Phase 1 notes become purely
historical. When all six phases ship, this whole directory can be
archived into `docs/archive/implementation-notes/` or deleted outright —
the plans and final code are what matters long-term, not the
step-by-step reconstruction.
+260 -137
View File
@@ -3,6 +3,13 @@
# - https://platform.openai.com/docs/api-reference/chat # - https://platform.openai.com/docs/api-reference/chat
- provider: openai - provider: openai
models: models:
- name: gpt-5.2
max_input_tokens: 400000
max_output_tokens: 128000
input_price: 1.75
output_price: 14
supports_vision: true
supports_function_calling: true
- name: gpt-5.1 - name: gpt-5.1
max_input_tokens: 400000 max_input_tokens: 400000
max_output_tokens: 128000 max_output_tokens: 128000
@@ -81,6 +88,7 @@
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: o4-mini - name: o4-mini
max_output_tokens: 100000
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 1.1 input_price: 1.1
output_price: 4.4 output_price: 4.4
@@ -93,6 +101,7 @@
temperature: null temperature: null
top_p: null top_p: null
- name: o4-mini-high - name: o4-mini-high
max_output_tokens: 100000
real_name: o4-mini real_name: o4-mini
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 1.1 input_price: 1.1
@@ -107,6 +116,7 @@
temperature: null temperature: null
top_p: null top_p: null
- name: o3 - name: o3
max_output_tokens: 100000
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 2 input_price: 2
output_price: 8 output_price: 8
@@ -133,6 +143,7 @@
temperature: null temperature: null
top_p: null top_p: null
- name: o3-mini - name: o3-mini
max_output_tokens: 100000
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 1.1 input_price: 1.1
output_price: 4.4 output_price: 4.4
@@ -145,6 +156,7 @@
temperature: null temperature: null
top_p: null top_p: null
- name: o3-mini-high - name: o3-mini-high
max_output_tokens: 100000
real_name: o3-mini real_name: o3-mini
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 1.1 input_price: 1.1
@@ -190,25 +202,32 @@
# - https://ai.google.dev/api/rest/v1beta/models/streamGenerateContent # - https://ai.google.dev/api/rest/v1beta/models/streamGenerateContent
- provider: gemini - provider: gemini
models: models:
- name: gemini-3.1-pro-preview
max_input_tokens: 1048576
max_output_tokens: 65535
input_price: 0.3
output_price: 2.5
supports_vision: true
supports_function_calling: true
- name: gemini-2.5-flash - name: gemini-2.5-flash
max_input_tokens: 1048576 max_input_tokens: 1048576
max_output_tokens: 65536 max_output_tokens: 65535
input_price: 0 input_price: 0.3
output_price: 0 output_price: 2.5
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: gemini-2.5-pro - name: gemini-2.5-pro
max_input_tokens: 1048576 max_input_tokens: 1048576
max_output_tokens: 65536 max_output_tokens: 65536
input_price: 0 input_price: 1.25
output_price: 0 output_price: 10
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: gemini-2.5-flash-lite - name: gemini-2.5-flash-lite
max_input_tokens: 1000000 max_input_tokens: 1048576
max_output_tokens: 64000 max_output_tokens: 65535
input_price: 0 input_price: 0.1
output_price: 0 output_price: 0.4
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: gemini-2.0-flash - name: gemini-2.0-flash
@@ -226,10 +245,11 @@
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: gemma-3-27b-it - name: gemma-3-27b-it
max_input_tokens: 131072 supports_vision: true
max_output_tokens: 8192 max_input_tokens: 128000
input_price: 0 max_output_tokens: 65536
output_price: 0 input_price: 0.04
output_price: 0.15
- name: text-embedding-004 - name: text-embedding-004
type: embedding type: embedding
input_price: 0 input_price: 0
@@ -242,6 +262,54 @@
# - https://docs.anthropic.com/en/api/messages # - https://docs.anthropic.com/en/api/messages
- provider: claude - provider: claude
models: models:
- name: claude-opus-4-6
max_input_tokens: 200000
max_output_tokens: 8192
require_max_tokens: true
input_price: 5
output_price: 25
supports_vision: true
supports_function_calling: true
- name: claude-opus-4-6:thinking
real_name: claude-opus-4-6
max_input_tokens: 200000
max_output_tokens: 24000
require_max_tokens: true
input_price: 5
output_price: 25
supports_vision: true
supports_function_calling: true
patch:
body:
temperature: null
top_p: null
thinking:
type: enabled
budget_tokens: 16000
- name: claude-sonnet-4-6
max_input_tokens: 200000
max_output_tokens: 8192
require_max_tokens: true
input_price: 3
output_price: 15
supports_vision: true
supports_function_calling: true
- name: claude-sonnet-4-6:thinking
real_name: claude-sonnet-4-6
max_input_tokens: 200000
max_output_tokens: 24000
require_max_tokens: true
input_price: 3
output_price: 15
supports_vision: true
supports_function_calling: true
patch:
body:
temperature: null
top_p: null
thinking:
type: enabled
budget_tokens: 16000
- name: claude-sonnet-4-5-20250929 - name: claude-sonnet-4-5-20250929
max_input_tokens: 200000 max_input_tokens: 200000
max_output_tokens: 8192 max_output_tokens: 8192
@@ -509,8 +577,8 @@
output_price: 10 output_price: 10
supports_vision: true supports_vision: true
- name: command-r7b-12-2024 - name: command-r7b-12-2024
max_input_tokens: 131072 max_input_tokens: 128000
max_output_tokens: 4096 max_output_tokens: 4000
input_price: 0.0375 input_price: 0.0375
output_price: 0.15 output_price: 0.15
- name: embed-v4.0 - name: embed-v4.0
@@ -547,6 +615,7 @@
- provider: xai - provider: xai
models: models:
- name: grok-4 - name: grok-4
supports_vision: true
max_input_tokens: 256000 max_input_tokens: 256000
input_price: 3 input_price: 3
output_price: 15 output_price: 15
@@ -583,14 +652,18 @@
- provider: perplexity - provider: perplexity
models: models:
- name: sonar-pro - name: sonar-pro
max_output_tokens: 8000
supports_vision: true
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 3 input_price: 3
output_price: 15 output_price: 15
- name: sonar - name: sonar
max_input_tokens: 128000 supports_vision: true
max_input_tokens: 127072
input_price: 1 input_price: 1
output_price: 1 output_price: 1
- name: sonar-reasoning-pro - name: sonar-reasoning-pro
supports_vision: true
max_input_tokens: 128000 max_input_tokens: 128000
input_price: 2 input_price: 2
output_price: 8 output_price: 8
@@ -659,17 +732,16 @@
# - https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini # - https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini
- provider: vertexai - provider: vertexai
models: models:
- name: gemini-3-pro-preview - name: gemini-3.1-pro-preview
hipaa_safe: true
max_input_tokens: 1048576 max_input_tokens: 1048576
max_output_tokens: 65536 max_output_tokens: 65536
input_price: 0 input_price: 2
output_price: 0 output_price: 12
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: gemini-2.5-flash - name: gemini-2.5-flash
max_input_tokens: 1048576 max_input_tokens: 1048576
max_output_tokens: 65536 max_output_tokens: 65535
input_price: 0.3 input_price: 0.3
output_price: 2.5 output_price: 2.5
supports_vision: true supports_vision: true
@@ -683,16 +755,16 @@
supports_function_calling: true supports_function_calling: true
- name: gemini-2.5-flash-lite - name: gemini-2.5-flash-lite
max_input_tokens: 1048576 max_input_tokens: 1048576
max_output_tokens: 65536 max_output_tokens: 65535
input_price: 0.3 input_price: 0.1
output_price: 0.4 output_price: 0.4
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: gemini-2.0-flash-001 - name: gemini-2.0-flash-001
max_input_tokens: 1048576 max_input_tokens: 1048576
max_output_tokens: 8192 max_output_tokens: 8192
input_price: 0.15 input_price: 0.1
output_price: 0.6 output_price: 0.4
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: gemini-2.0-flash-lite-001 - name: gemini-2.0-flash-lite-001
@@ -1187,17 +1259,22 @@
max_input_tokens: 1024 max_input_tokens: 1024
input_price: 0.07 input_price: 0.07
# Links: # Links:
# - https://help.aliyun.com/zh/model-studio/getting-started/models # - https://help.aliyun.com/zh/model-studio/getting-started/models
# - https://help.aliyun.com/zh/model-studio/developer-reference/use-qwen-by-calling-api # - https://help.aliyun.com/zh/model-studio/developer-reference/use-qwen-by-calling-api
- provider: qianwen - provider: qianwen
models: models:
- name: qwen3-max - name: qwen3-max
input_price: 1.2
output_price: 6
max_output_tokens: 32768
max_input_tokens: 262144 max_input_tokens: 262144
supports_function_calling: true supports_function_calling: true
- name: qwen-plus - name: qwen-plus
max_input_tokens: 131072 input_price: 0.4
output_price: 1.2
max_output_tokens: 32768
max_input_tokens: 1000000
supports_function_calling: true supports_function_calling: true
- name: qwen-flash - name: qwen-flash
max_input_tokens: 1000000 max_input_tokens: 1000000
@@ -1213,14 +1290,14 @@
- name: qwen-coder-flash - name: qwen-coder-flash
max_input_tokens: 1000000 max_input_tokens: 1000000
- name: qwen3-next-80b-a3b-instruct - name: qwen3-next-80b-a3b-instruct
max_input_tokens: 131072 max_input_tokens: 262144
input_price: 0.14 input_price: 0.09
output_price: 0.56 output_price: 1.1
supports_function_calling: true supports_function_calling: true
- name: qwen3-next-80b-a3b-thinking - name: qwen3-next-80b-a3b-thinking
max_input_tokens: 131072 max_input_tokens: 128000
input_price: 0.14 input_price: 0.15
output_price: 1.4 output_price: 1.2
- name: qwen3-235b-a22b-instruct-2507 - name: qwen3-235b-a22b-instruct-2507
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.28 input_price: 0.28
@@ -1228,35 +1305,39 @@
supports_function_calling: true supports_function_calling: true
- name: qwen3-235b-a22b-thinking-2507 - name: qwen3-235b-a22b-thinking-2507
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.28 input_price: 0
output_price: 2.8 output_price: 0
- name: qwen3-30b-a3b-instruct-2507 - name: qwen3-30b-a3b-instruct-2507
max_input_tokens: 131072 max_output_tokens: 262144
input_price: 0.105 max_input_tokens: 262144
output_price: 0.42 input_price: 0.09
output_price: 0.3
supports_function_calling: true supports_function_calling: true
- name: qwen3-30b-a3b-thinking-2507 - name: qwen3-30b-a3b-thinking-2507
max_input_tokens: 131072 max_input_tokens: 32768
input_price: 0.105 input_price: 0.051
output_price: 1.05 output_price: 0.34
- name: qwen3-vl-32b-instruct - name: qwen3-vl-32b-instruct
max_output_tokens: 32768
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.28 input_price: 0.104
output_price: 1.12 output_price: 0.416
supports_vision: true supports_vision: true
- name: qwen3-vl-8b-instruct - name: qwen3-vl-8b-instruct
max_output_tokens: 32768
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.07 input_price: 0.08
output_price: 0.28 output_price: 0.5
supports_vision: true supports_vision: true
- name: qwen3-coder-480b-a35b-instruct - name: qwen3-coder-480b-a35b-instruct
max_input_tokens: 262144 max_input_tokens: 262144
input_price: 1.26 input_price: 1.26
output_price: 5.04 output_price: 5.04
- name: qwen3-coder-30b-a3b-instruct - name: qwen3-coder-30b-a3b-instruct
max_input_tokens: 262144 max_output_tokens: 32768
input_price: 0.315 max_input_tokens: 160000
output_price: 1.26 input_price: 0.07
output_price: 0.27
- name: deepseek-v3.2-exp - name: deepseek-v3.2-exp
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.28 input_price: 0.28
@@ -1332,9 +1413,9 @@
output_price: 8.12 output_price: 8.12
supports_vision: true supports_vision: true
- name: kimi-k2-thinking - name: kimi-k2-thinking
max_input_tokens: 262144 max_input_tokens: 131072
input_price: 0.56 input_price: 0.47
output_price: 2.24 output_price: 2
supports_vision: true supports_vision: true
# Links: # Links:
@@ -1343,10 +1424,10 @@
- provider: deepseek - provider: deepseek
models: models:
- name: deepseek-chat - name: deepseek-chat
max_input_tokens: 64000 max_input_tokens: 163840
max_output_tokens: 8192 max_output_tokens: 163840
input_price: 0.56 input_price: 0.32
output_price: 1.68 output_price: 0.89
supports_function_calling: true supports_function_calling: true
- name: deepseek-reasoner - name: deepseek-reasoner
max_input_tokens: 64000 max_input_tokens: 64000
@@ -1424,9 +1505,10 @@
- provider: minimax - provider: minimax
models: models:
- name: minimax-m2 - name: minimax-m2
max_input_tokens: 204800 max_output_tokens: 65536
input_price: 0.294 max_input_tokens: 196608
output_price: 1.176 input_price: 0.255
output_price: 1
supports_function_calling: true supports_function_calling: true
# Links: # Links:
@@ -1442,8 +1524,8 @@
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: openai/gpt-5.1-chat - name: openai/gpt-5.1-chat
max_input_tokens: 400000 max_input_tokens: 128000
max_output_tokens: 128000 max_output_tokens: 16384
input_price: 1.25 input_price: 1.25
output_price: 10 output_price: 10
supports_vision: true supports_vision: true
@@ -1456,8 +1538,8 @@
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: openai/gpt-5-chat - name: openai/gpt-5-chat
max_input_tokens: 400000 max_input_tokens: 128000
max_output_tokens: 128000 max_output_tokens: 16384
input_price: 1.25 input_price: 1.25
output_price: 10 output_price: 10
supports_vision: true supports_vision: true
@@ -1498,18 +1580,21 @@
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: openai/gpt-4o - name: openai/gpt-4o
max_output_tokens: 16384
max_input_tokens: 128000 max_input_tokens: 128000
input_price: 2.5 input_price: 2.5
output_price: 10 output_price: 10
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: openai/gpt-4o-mini - name: openai/gpt-4o-mini
max_output_tokens: 16384
max_input_tokens: 128000 max_input_tokens: 128000
input_price: 0.15 input_price: 0.15
output_price: 0.6 output_price: 0.6
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: openai/o4-mini - name: openai/o4-mini
max_output_tokens: 100000
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 1.1 input_price: 1.1
output_price: 4.4 output_price: 4.4
@@ -1522,6 +1607,7 @@
temperature: null temperature: null
top_p: null top_p: null
- name: openai/o4-mini-high - name: openai/o4-mini-high
max_output_tokens: 100000
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 1.1 input_price: 1.1
output_price: 4.4 output_price: 4.4
@@ -1535,6 +1621,7 @@
temperature: null temperature: null
top_p: null top_p: null
- name: openai/o3 - name: openai/o3
max_output_tokens: 100000
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 2 input_price: 2
output_price: 8 output_price: 8
@@ -1560,6 +1647,7 @@
temperature: null temperature: null
top_p: null top_p: null
- name: openai/o3-mini - name: openai/o3-mini
max_output_tokens: 100000
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 1.1 input_price: 1.1
output_price: 4.4 output_price: 4.4
@@ -1571,6 +1659,7 @@
temperature: null temperature: null
top_p: null top_p: null
- name: openai/o3-mini-high - name: openai/o3-mini-high
max_output_tokens: 100000
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 1.1 input_price: 1.1
output_price: 4.4 output_price: 4.4
@@ -1583,50 +1672,57 @@
top_p: null top_p: null
- name: openai/gpt-oss-120b - name: openai/gpt-oss-120b
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.09 input_price: 0.039
output_price: 0.45 output_price: 0.19
supports_function_calling: true supports_function_calling: true
- name: openai/gpt-oss-20b - name: openai/gpt-oss-20b
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.04 input_price: 0.03
output_price: 0.16 output_price: 0.14
supports_function_calling: true supports_function_calling: true
- name: google/gemini-2.5-flash - name: google/gemini-2.5-flash
max_output_tokens: 65535
max_input_tokens: 1048576 max_input_tokens: 1048576
input_price: 0.3 input_price: 0.3
output_price: 2.5 output_price: 2.5
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: google/gemini-2.5-pro - name: google/gemini-2.5-pro
max_output_tokens: 65536
max_input_tokens: 1048576 max_input_tokens: 1048576
input_price: 1.25 input_price: 1.25
output_price: 10 output_price: 10
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: google/gemini-2.5-flash-lite - name: google/gemini-2.5-flash-lite
max_output_tokens: 65535
max_input_tokens: 1048576 max_input_tokens: 1048576
input_price: 0.3 input_price: 0.1
output_price: 0.4 output_price: 0.4
supports_vision: true supports_vision: true
- name: google/gemini-2.0-flash-001 - name: google/gemini-2.0-flash-001
max_input_tokens: 1000000 max_output_tokens: 8192
input_price: 0.15 max_input_tokens: 1048576
output_price: 0.6 input_price: 0.1
output_price: 0.4
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: google/gemini-2.0-flash-lite-001 - name: google/gemini-2.0-flash-lite-001
max_output_tokens: 8192
max_input_tokens: 1048576 max_input_tokens: 1048576
input_price: 0.075 input_price: 0.075
output_price: 0.3 output_price: 0.3
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: google/gemma-3-27b-it - name: google/gemma-3-27b-it
max_input_tokens: 131072 max_output_tokens: 65536
input_price: 0.1 supports_vision: true
output_price: 0.2 max_input_tokens: 128000
input_price: 0.04
output_price: 0.15
- name: anthropic/claude-sonnet-4.5 - name: anthropic/claude-sonnet-4.5
max_input_tokens: 200000 max_input_tokens: 1000000
max_output_tokens: 8192 max_output_tokens: 64000
require_max_tokens: true require_max_tokens: true
input_price: 3 input_price: 3
output_price: 15 output_price: 15
@@ -1634,7 +1730,7 @@
supports_function_calling: true supports_function_calling: true
- name: anthropic/claude-haiku-4.5 - name: anthropic/claude-haiku-4.5
max_input_tokens: 200000 max_input_tokens: 200000
max_output_tokens: 8192 max_output_tokens: 64000
require_max_tokens: true require_max_tokens: true
input_price: 1 input_price: 1
output_price: 5 output_price: 5
@@ -1642,7 +1738,7 @@
supports_function_calling: true supports_function_calling: true
- name: anthropic/claude-opus-4.1 - name: anthropic/claude-opus-4.1
max_input_tokens: 200000 max_input_tokens: 200000
max_output_tokens: 8192 max_output_tokens: 32000
require_max_tokens: true require_max_tokens: true
input_price: 15 input_price: 15
output_price: 75 output_price: 75
@@ -1650,15 +1746,15 @@
supports_function_calling: true supports_function_calling: true
- name: anthropic/claude-opus-4 - name: anthropic/claude-opus-4
max_input_tokens: 200000 max_input_tokens: 200000
max_output_tokens: 8192 max_output_tokens: 32000
require_max_tokens: true require_max_tokens: true
input_price: 15 input_price: 15
output_price: 75 output_price: 75
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: anthropic/claude-sonnet-4 - name: anthropic/claude-sonnet-4
max_input_tokens: 200000 max_input_tokens: 1000000
max_output_tokens: 8192 max_output_tokens: 64000
require_max_tokens: true require_max_tokens: true
input_price: 3 input_price: 3
output_price: 15 output_price: 15
@@ -1666,7 +1762,7 @@
supports_function_calling: true supports_function_calling: true
- name: anthropic/claude-3.7-sonnet - name: anthropic/claude-3.7-sonnet
max_input_tokens: 200000 max_input_tokens: 200000
max_output_tokens: 8192 max_output_tokens: 64000
require_max_tokens: true require_max_tokens: true
input_price: 3 input_price: 3
output_price: 15 output_price: 15
@@ -1681,21 +1777,24 @@
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: meta-llama/llama-4-maverick - name: meta-llama/llama-4-maverick
max_output_tokens: 16384
max_input_tokens: 1048576 max_input_tokens: 1048576
input_price: 0.18 input_price: 0.15
output_price: 0.6 output_price: 0.6
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: meta-llama/llama-4-scout - name: meta-llama/llama-4-scout
max_output_tokens: 16384
max_input_tokens: 327680 max_input_tokens: 327680
input_price: 0.08 input_price: 0.08
output_price: 0.3 output_price: 0.3
supports_vision: true supports_vision: true
supports_function_calling: true supports_function_calling: true
- name: meta-llama/llama-3.3-70b-instruct - name: meta-llama/llama-3.3-70b-instruct
max_output_tokens: 16384
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.12 input_price: 0.1
output_price: 0.3 output_price: 0.32
- name: mistralai/mistral-medium-3.1 - name: mistralai/mistral-medium-3.1
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.4 input_price: 0.4
@@ -1703,9 +1802,10 @@
supports_function_calling: true supports_function_calling: true
supports_vision: true supports_vision: true
- name: mistralai/mistral-small-3.2-24b-instruct - name: mistralai/mistral-small-3.2-24b-instruct
max_output_tokens: 131072
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.1 input_price: 0.06
output_price: 0.3 output_price: 0.18
supports_vision: true supports_vision: true
- name: mistralai/magistral-medium-2506 - name: mistralai/magistral-medium-2506
max_input_tokens: 40960 max_input_tokens: 40960
@@ -1726,8 +1826,8 @@
supports_function_calling: true supports_function_calling: true
- name: mistralai/devstral-small - name: mistralai/devstral-small
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.07 input_price: 0.1
output_price: 0.28 output_price: 0.3
supports_function_calling: true supports_function_calling: true
- name: mistralai/codestral-2508 - name: mistralai/codestral-2508
max_input_tokens: 256000 max_input_tokens: 256000
@@ -1735,6 +1835,7 @@
output_price: 0.9 output_price: 0.9
supports_function_calling: true supports_function_calling: true
- name: ai21/jamba-large-1.7 - name: ai21/jamba-large-1.7
max_output_tokens: 4096
max_input_tokens: 256000 max_input_tokens: 256000
input_price: 2 input_price: 2
output_price: 8 output_price: 8
@@ -1745,110 +1846,121 @@
output_price: 0.4 output_price: 0.4
supports_function_calling: true supports_function_calling: true
- name: cohere/command-a - name: cohere/command-a
max_output_tokens: 8192
max_input_tokens: 256000 max_input_tokens: 256000
input_price: 2.5 input_price: 2.5
output_price: 10 output_price: 10
supports_function_calling: true supports_function_calling: true
- name: cohere/command-r7b-12-2024 - name: cohere/command-r7b-12-2024
max_input_tokens: 128000 max_input_tokens: 128000
max_output_tokens: 4096 max_output_tokens: 4000
input_price: 0.0375 input_price: 0.0375
output_price: 0.15 output_price: 0.15
- name: deepseek/deepseek-v3.2-exp - name: deepseek/deepseek-v3.2-exp
max_output_tokens: 65536
max_input_tokens: 163840 max_input_tokens: 163840
input_price: 0.27 input_price: 0.27
output_price: 0.40 output_price: 0.41
- name: deepseek/deepseek-v3.1-terminus - name: deepseek/deepseek-v3.1-terminus
max_input_tokens: 163840 max_input_tokens: 163840
input_price: 0.23 input_price: 0.21
output_price: 0.90 output_price: 0.79
- name: deepseek/deepseek-chat-v3.1 - name: deepseek/deepseek-chat-v3.1
max_input_tokens: 163840 max_output_tokens: 7168
input_price: 0.2 max_input_tokens: 32768
output_price: 0.8 input_price: 0.15
output_price: 0.75
- name: deepseek/deepseek-r1-0528 - name: deepseek/deepseek-r1-0528
max_input_tokens: 128000 max_output_tokens: 65536
input_price: 0.50 max_input_tokens: 163840
output_price: 2.15 input_price: 0.4
output_price: 1.75
patch: patch:
body: body:
include_reasoning: true include_reasoning: true
- name: qwen/qwen3-max - name: qwen/qwen3-max
max_output_tokens: 32768
max_input_tokens: 262144 max_input_tokens: 262144
input_price: 1.2 input_price: 1.2
output_price: 6 output_price: 6
supports_function_calling: true supports_function_calling: true
- name: qwen/qwen-plus - name: qwen/qwen-plus
max_input_tokens: 131072 max_input_tokens: 1000000
max_output_tokens: 8192 max_output_tokens: 32768
input_price: 0.4 input_price: 0.4
output_price: 1.2 output_price: 1.2
supports_function_calling: true supports_function_calling: true
- name: qwen/qwen3-next-80b-a3b-instruct - name: qwen/qwen3-next-80b-a3b-instruct
max_input_tokens: 262144 max_input_tokens: 262144
input_price: 0.1 input_price: 0.09
output_price: 0.8 output_price: 1.1
supports_function_calling: true supports_function_calling: true
- name: qwen/qwen3-next-80b-a3b-thinking - name: qwen/qwen3-next-80b-a3b-thinking
max_input_tokens: 262144 max_input_tokens: 128000
input_price: 0.1 input_price: 0.15
output_price: 0.8 output_price: 1.2
- name: qwen/qwen5-235b-a22b-2507 # Qwen3 235B A22B Instruct 2507 - name: qwen/qwen5-235b-a22b-2507 # Qwen3 235B A22B Instruct 2507
max_input_tokens: 262144 max_input_tokens: 262144
input_price: 0.12 input_price: 0.12
output_price: 0.59 output_price: 0.59
supports_function_calling: true supports_function_calling: true
- name: qwen/qwen3-235b-a22b-thinking-2507 - name: qwen/qwen3-235b-a22b-thinking-2507
max_input_tokens: 262144
input_price: 0.118
output_price: 0.118
- name: qwen/qwen3-30b-a3b-instruct-2507
max_input_tokens: 131072 max_input_tokens: 131072
input_price: 0.2 input_price: 0
output_price: 0.8 output_price: 0
- name: qwen/qwen3-30b-a3b-instruct-2507
max_output_tokens: 262144
max_input_tokens: 262144
input_price: 0.09
output_price: 0.3
- name: qwen/qwen3-30b-a3b-thinking-2507 - name: qwen/qwen3-30b-a3b-thinking-2507
max_input_tokens: 262144 max_input_tokens: 32768
input_price: 0.071 input_price: 0.051
output_price: 0.285 output_price: 0.34
- name: qwen/qwen3-vl-32b-instruct - name: qwen/qwen3-vl-32b-instruct
max_input_tokens: 262144 max_output_tokens: 32768
input_price: 0.35 max_input_tokens: 131072
output_price: 1.1 input_price: 0.104
output_price: 0.416
supports_vision: true supports_vision: true
- name: qwen/qwen3-vl-8b-instruct - name: qwen/qwen3-vl-8b-instruct
max_input_tokens: 262144 max_output_tokens: 32768
max_input_tokens: 131072
input_price: 0.08 input_price: 0.08
output_price: 0.50 output_price: 0.5
supports_vision: true supports_vision: true
- name: qwen/qwen3-coder-plus - name: qwen/qwen3-coder-plus
max_input_tokens: 128000 max_output_tokens: 65536
max_input_tokens: 1000000
input_price: 1 input_price: 1
output_price: 5 output_price: 5
supports_function_calling: true supports_function_calling: true
- name: qwen/qwen3-coder-flash - name: qwen/qwen3-coder-flash
max_input_tokens: 128000 max_output_tokens: 65536
max_input_tokens: 1000000
input_price: 0.3 input_price: 0.3
output_price: 1.5 output_price: 1.5
supports_function_calling: true supports_function_calling: true
- name: qwen/qwen3-coder # Qwen3 Coder 480B A35B - name: qwen/qwen3-coder # Qwen3 Coder 480B A35B
max_input_tokens: 262144 max_input_tokens: 262144
input_price: 0.22 input_price: 0.22
output_price: 0.95 output_price: 0.95
supports_function_calling: true supports_function_calling: true
- name: qwen/qwen3-coder-30b-a3b-instruct - name: qwen/qwen3-coder-30b-a3b-instruct
max_input_tokens: 262144 max_output_tokens: 32768
input_price: 0.052 max_input_tokens: 160000
output_price: 0.207 input_price: 0.07
output_price: 0.27
supports_function_calling: true supports_function_calling: true
- name: moonshotai/kimi-k2-0905 - name: moonshotai/kimi-k2-0905
max_input_tokens: 262144 max_input_tokens: 131072
input_price: 0.296 input_price: 0.4
output_price: 1.185 output_price: 2
supports_function_calling: true supports_function_calling: true
- name: moonshotai/kimi-k2-thinking - name: moonshotai/kimi-k2-thinking
max_input_tokens: 262144 max_input_tokens: 131072
input_price: 0.45 input_price: 0.47
output_price: 2.35 output_price: 2
supports_function_calling: true supports_function_calling: true
- name: moonshotai/kimi-dev-72b - name: moonshotai/kimi-dev-72b
max_input_tokens: 131072 max_input_tokens: 131072
@@ -1856,21 +1968,26 @@
output_price: 1.15 output_price: 1.15
supports_function_calling: true supports_function_calling: true
- name: x-ai/grok-4 - name: x-ai/grok-4
supports_vision: true
max_input_tokens: 256000 max_input_tokens: 256000
input_price: 3 input_price: 3
output_price: 15 output_price: 15
supports_function_calling: true supports_function_calling: true
- name: x-ai/grok-4-fast - name: x-ai/grok-4-fast
max_output_tokens: 30000
supports_vision: true
max_input_tokens: 2000000 max_input_tokens: 2000000
input_price: 0.2 input_price: 0.2
output_price: 0.5 output_price: 0.5
supports_function_calling: true supports_function_calling: true
- name: x-ai/grok-code-fast-1 - name: x-ai/grok-code-fast-1
max_output_tokens: 10000
max_input_tokens: 256000 max_input_tokens: 256000
input_price: 0.2 input_price: 0.2
output_price: 1.5 output_price: 1.5
supports_function_calling: true supports_function_calling: true
- name: amazon/nova-premier-v1 - name: amazon/nova-premier-v1
max_output_tokens: 32000
max_input_tokens: 1000000 max_input_tokens: 1000000
input_price: 2.5 input_price: 2.5
output_price: 12.5 output_price: 12.5
@@ -1893,14 +2010,18 @@
input_price: 0.035 input_price: 0.035
output_price: 0.14 output_price: 0.14
- name: perplexity/sonar-pro - name: perplexity/sonar-pro
max_output_tokens: 8000
supports_vision: true
max_input_tokens: 200000 max_input_tokens: 200000
input_price: 3 input_price: 3
output_price: 15 output_price: 15
- name: perplexity/sonar - name: perplexity/sonar
supports_vision: true
max_input_tokens: 127072 max_input_tokens: 127072
input_price: 1 input_price: 1
output_price: 1 output_price: 1
- name: perplexity/sonar-reasoning-pro - name: perplexity/sonar-reasoning-pro
supports_vision: true
max_input_tokens: 128000 max_input_tokens: 128000
input_price: 2 input_price: 2
output_price: 8 output_price: 8
@@ -1915,20 +2036,22 @@
body: body:
include_reasoning: true include_reasoning: true
- name: perplexity/sonar-deep-research - name: perplexity/sonar-deep-research
max_input_tokens: 200000 max_input_tokens: 128000
input_price: 2 input_price: 2
output_price: 8 output_price: 8
patch: patch:
body: body:
include_reasoning: true include_reasoning: true
- name: minimax/minimax-m2 - name: minimax/minimax-m2
max_output_tokens: 65536
max_input_tokens: 196608 max_input_tokens: 196608
input_price: 0.15 input_price: 0.255
output_price: 0.45 output_price: 1
- name: z-ai/glm-4.6 - name: z-ai/glm-4.6
max_output_tokens: 131072
max_input_tokens: 202752 max_input_tokens: 202752
input_price: 0.5 input_price: 0.35
output_price: 1.75 output_price: 1.71
supports_function_calling: true supports_function_calling: true
# Links: # Links:
@@ -2298,4 +2421,4 @@
- name: rerank-2-lite - name: rerank-2-lite
type: reranker type: reranker
max_input_tokens: 8000 max_input_tokens: 8000
input_price: 0.02 input_price: 0.02
+2
View File
@@ -0,0 +1,2 @@
requests
ruamel.yaml
+255
View File
@@ -0,0 +1,255 @@
import requests
import sys
import re
import json
# Provider mapping from models.yaml to OpenRouter prefixes
PROVIDER_MAPPING = {
"openai": "openai",
"claude": "anthropic",
"gemini": "google",
"mistral": "mistralai",
"cohere": "cohere",
"perplexity": "perplexity",
"xai": "x-ai",
"openrouter": "openrouter",
"ai21": "ai21",
"deepseek": "deepseek",
"moonshot": "moonshotai",
"qianwen": "qwen",
"zhipuai": "zhipuai",
"minimax": "minimax",
"vertexai": "google",
"groq": "groq",
"bedrock": "amazon",
"hunyuan": "tencent",
"ernie": "baidu",
"github": "github",
}
def fetch_openrouter_models():
print("Fetching models from OpenRouter...")
try:
response = requests.get("https://openrouter.ai/api/v1/models")
response.raise_for_status()
data = response.json()["data"]
print(f"Fetched {len(data)} models.")
return data
except Exception as e:
print(f"Error fetching models: {e}")
sys.exit(1)
def get_openrouter_model(models_data, provider_prefix, model_name, is_openrouter_provider=False):
if is_openrouter_provider:
# For openrouter provider, the model_name in yaml is usually the full ID
for model in models_data:
if model["id"] == model_name:
return model
return None
expected_id = f"{provider_prefix}/{model_name}"
# 1. Try exact match on ID
for model in models_data:
if model["id"] == expected_id:
return model
# 2. Try match by suffix
for model in models_data:
if model["id"].split("/")[-1] == model_name:
if model["id"].startswith(f"{provider_prefix}/"):
return model
return None
def format_price(price_per_token):
if price_per_token is None:
return None
try:
price_per_1m = float(price_per_token) * 1_000_000
if price_per_1m.is_integer():
return str(int(price_per_1m))
else:
return str(round(price_per_1m, 4))
except:
return None
def get_indentation(line):
return len(line) - len(line.lstrip())
def process_model_block(block_lines, current_provider, or_models):
if not block_lines:
return []
# 1. Identify model name and indentation
name_line = block_lines[0]
name_match = re.match(r"^(\s*)-\s*name:\s*(.+)$", name_line)
if not name_match:
return block_lines
name_indent_str = name_match.group(1)
model_name = name_match.group(2).strip()
# 2. Find OpenRouter model
or_prefix = PROVIDER_MAPPING.get(current_provider)
is_openrouter_provider = (current_provider == "openrouter")
if not or_prefix and not is_openrouter_provider:
return block_lines
or_model = get_openrouter_model(or_models, or_prefix, model_name, is_openrouter_provider)
if not or_model:
return block_lines
print(f" Updating {model_name}...")
# 3. Prepare updates
updates = {}
# Pricing
pricing = or_model.get("pricing", {})
p_in = format_price(pricing.get("prompt"))
p_out = format_price(pricing.get("completion"))
if p_in: updates["input_price"] = p_in
if p_out: updates["output_price"] = p_out
# Context
ctx = or_model.get("context_length")
if ctx: updates["max_input_tokens"] = str(ctx)
max_out = None
if "top_provider" in or_model and or_model["top_provider"]:
max_out = or_model["top_provider"].get("max_completion_tokens")
if max_out: updates["max_output_tokens"] = str(max_out)
# Capabilities
arch = or_model.get("architecture", {})
modality = arch.get("modality", "")
if "image" in modality:
updates["supports_vision"] = "true"
# 4. Detect field indentation
field_indent_str = None
existing_fields = {} # key -> line_index
for i, line in enumerate(block_lines):
if i == 0: continue # Skip name line
# Skip comments
if line.strip().startswith("#"):
continue
# Look for "key: value"
m = re.match(r"^(\s*)([\w_-]+):", line)
if m:
indent = m.group(1)
key = m.group(2)
# Must be deeper than name line
if len(indent) > len(name_indent_str):
if field_indent_str is None:
field_indent_str = indent
existing_fields[key] = i
if field_indent_str is None:
field_indent_str = name_indent_str + " "
# 5. Apply updates
new_block = list(block_lines)
# Update existing fields
for key, value in updates.items():
if key in existing_fields:
idx = existing_fields[key]
# Preserve original key indentation exactly
original_line = new_block[idx]
m = re.match(r"^(\s*)([\w_-]+):", original_line)
if m:
current_indent = m.group(1)
new_block[idx] = f"{current_indent}{key}: {value}\n"
# Insert missing fields
# Insert after the name line
insertion_idx = 1
for key, value in updates.items():
if key not in existing_fields:
new_line = f"{field_indent_str}{key}: {value}\n"
new_block.insert(insertion_idx, new_line)
insertion_idx += 1
return new_block
def main():
or_models = fetch_openrouter_models()
print("Reading models.yaml...")
with open("models.yaml", "r") as f:
lines = f.readlines()
new_lines = []
current_provider = None
i = 0
while i < len(lines):
line = lines[i]
# Check for provider
# - provider: name
p_match = re.match(r"^\s*-?\s*provider:\s*(.+)$", line)
if p_match:
current_provider = p_match.group(1).strip()
new_lines.append(line)
i += 1
continue
# Check for model start
# - name: ...
m_match = re.match(r"^(\s*)-\s*name:\s*.+$", line)
if m_match:
# Start of a model block
start_indent = len(m_match.group(1))
# Collect block lines
block_lines = [line]
j = i + 1
while j < len(lines):
next_line = lines[j]
stripped = next_line.strip()
# If empty or comment, include it
if not stripped or stripped.startswith("#"):
block_lines.append(next_line)
j += 1
continue
# Check indentation
next_indent = get_indentation(next_line)
# If indentation is greater, it's part of the block (property)
if next_indent > start_indent:
block_lines.append(next_line)
j += 1
continue
# If indentation is equal or less, it's the end of the block
break
# Process the block
processed_block = process_model_block(block_lines, current_provider, or_models)
new_lines.extend(processed_block)
# Advance i
i = j
continue
# Otherwise, just a regular line
new_lines.append(line)
i += 1
print("Saving models.yaml...")
with open("models.yaml", "w") as f:
f.writelines(new_lines)
print("Done.")
if __name__ == "__main__":
main()
+5 -4
View File
@@ -1,4 +1,5 @@
use crate::client::{ModelType, list_models}; use crate::client::{ModelType, list_models};
use crate::config::paths;
use crate::config::{Config, list_agents}; use crate::config::{Config, list_agents};
use clap_complete::{CompletionCandidate, Shell, generate}; use clap_complete::{CompletionCandidate, Shell, generate};
use clap_complete_nushell::Nushell; use clap_complete_nushell::Nushell;
@@ -33,7 +34,7 @@ impl ShellCompletion {
pub(super) fn model_completer(current: &OsStr) -> Vec<CompletionCandidate> { pub(super) fn model_completer(current: &OsStr) -> Vec<CompletionCandidate> {
let cur = current.to_string_lossy(); let cur = current.to_string_lossy();
match Config::init_bare() { match Config::init_bare() {
Ok(config) => list_models(&config, ModelType::Chat) Ok(config) => list_models(&config.to_app_config(), ModelType::Chat)
.into_iter() .into_iter()
.filter(|&m| m.id().starts_with(&*cur)) .filter(|&m| m.id().starts_with(&*cur))
.map(|m| CompletionCandidate::new(m.id())) .map(|m| CompletionCandidate::new(m.id()))
@@ -44,7 +45,7 @@ pub(super) fn model_completer(current: &OsStr) -> Vec<CompletionCandidate> {
pub(super) fn role_completer(current: &OsStr) -> Vec<CompletionCandidate> { pub(super) fn role_completer(current: &OsStr) -> Vec<CompletionCandidate> {
let cur = current.to_string_lossy(); let cur = current.to_string_lossy();
Config::list_roles(true) paths::list_roles(true)
.into_iter() .into_iter()
.filter(|r| r.starts_with(&*cur)) .filter(|r| r.starts_with(&*cur))
.map(CompletionCandidate::new) .map(CompletionCandidate::new)
@@ -62,7 +63,7 @@ pub(super) fn agent_completer(current: &OsStr) -> Vec<CompletionCandidate> {
pub(super) fn rag_completer(current: &OsStr) -> Vec<CompletionCandidate> { pub(super) fn rag_completer(current: &OsStr) -> Vec<CompletionCandidate> {
let cur = current.to_string_lossy(); let cur = current.to_string_lossy();
Config::list_rags() paths::list_rags()
.into_iter() .into_iter()
.filter(|r| r.starts_with(&*cur)) .filter(|r| r.starts_with(&*cur))
.map(CompletionCandidate::new) .map(CompletionCandidate::new)
@@ -71,7 +72,7 @@ pub(super) fn rag_completer(current: &OsStr) -> Vec<CompletionCandidate> {
pub(super) fn macro_completer(current: &OsStr) -> Vec<CompletionCandidate> { pub(super) fn macro_completer(current: &OsStr) -> Vec<CompletionCandidate> {
let cur = current.to_string_lossy(); let cur = current.to_string_lossy();
Config::list_macros() paths::list_macros()
.into_iter() .into_iter()
.filter(|m| m.starts_with(&*cur)) .filter(|m| m.starts_with(&*cur))
.map(CompletionCandidate::new) .map(CompletionCandidate::new)
+3
View File
@@ -127,6 +127,9 @@ pub struct Cli {
/// List all secrets stored in the Loki vault /// List all secrets stored in the Loki vault
#[arg(long, exclusive = true)] #[arg(long, exclusive = true)]
pub list_secrets: bool, pub list_secrets: bool,
/// Authenticate with an LLM provider using OAuth (e.g., --authenticate client_name)
#[arg(long, exclusive = true, value_name = "CLIENT_NAME")]
pub authenticate: Option<Option<String>>,
/// Generate static shell completion scripts /// Generate static shell completion scripts
#[arg(long, value_name = "SHELL", value_enum)] #[arg(long, value_name = "SHELL", value_enum)]
pub completions: Option<ShellCompletion>, pub completions: Option<ShellCompletion>,
+4 -4
View File
@@ -18,16 +18,16 @@ pub struct AzureOpenAIConfig {
impl AzureOpenAIClient { impl AzureOpenAIClient {
config_get_fn!(api_base, get_api_base); config_get_fn!(api_base, get_api_base);
config_get_fn!(api_key, get_api_key); config_get_fn!(api_key, get_api_key);
pub const PROMPTS: [PromptAction<'static>; 2] = [ create_client_config!([
( (
"api_base", "api_base",
"API Base", "API Base",
Some("e.g. https://{RESOURCE}.openai.azure.com"), Some("e.g. https://{RESOURCE}.openai.azure.com"),
false false,
), ),
("api_key", "API Key", None, true), ("api_key", "API Key", None, true),
]; ]);
} }
impl_client_trait!( impl_client_trait!(
+2 -2
View File
@@ -32,11 +32,11 @@ impl BedrockClient {
config_get_fn!(region, get_region); config_get_fn!(region, get_region);
config_get_fn!(session_token, get_session_token); config_get_fn!(session_token, get_session_token);
pub const PROMPTS: [PromptAction<'static>; 3] = [ create_client_config!([
("access_key_id", "AWS Access Key ID", None, true), ("access_key_id", "AWS Access Key ID", None, true),
("secret_access_key", "AWS Secret Access Key", None, true), ("secret_access_key", "AWS Secret Access Key", None, true),
("region", "AWS Region", None, false), ("region", "AWS Region", None, false),
]; ]);
fn chat_completions_builder( fn chat_completions_builder(
&self, &self,
+103 -15
View File
@@ -1,19 +1,24 @@
use super::access_token::get_access_token;
use super::claude_oauth::ClaudeOAuthProvider;
use super::oauth::{self, OAuthProvider};
use super::*; use super::*;
use crate::utils::strip_think_tag; use crate::utils::strip_think_tag;
use anyhow::{Context, Result, bail}; use anyhow::{Context, Result, bail};
use reqwest::RequestBuilder; use reqwest::{Client as ReqwestClient, RequestBuilder};
use serde::Deserialize; use serde::Deserialize;
use serde_json::{Value, json}; use serde_json::{Value, json};
const API_BASE: &str = "https://api.anthropic.com/v1"; const API_BASE: &str = "https://api.anthropic.com/v1";
const CLAUDE_CODE_PREFIX: &str = "You are Claude Code, Anthropic's official CLI for Claude.";
#[derive(Debug, Clone, Deserialize)] #[derive(Debug, Clone, Deserialize)]
pub struct ClaudeConfig { pub struct ClaudeConfig {
pub name: Option<String>, pub name: Option<String>,
pub api_key: Option<String>, pub api_key: Option<String>,
pub api_base: Option<String>, pub api_base: Option<String>,
pub auth: Option<String>,
#[serde(default)] #[serde(default)]
pub models: Vec<ModelData>, pub models: Vec<ModelData>,
pub patch: Option<RequestPatch>, pub patch: Option<RequestPatch>,
@@ -24,25 +29,44 @@ impl ClaudeClient {
config_get_fn!(api_key, get_api_key); config_get_fn!(api_key, get_api_key);
config_get_fn!(api_base, get_api_base); config_get_fn!(api_base, get_api_base);
pub const PROMPTS: [PromptAction<'static>; 1] = [("api_key", "API Key", None, true)]; create_oauth_supported_client_config!();
} }
impl_client_trait!( #[async_trait::async_trait]
ClaudeClient, impl Client for ClaudeClient {
( client_common_fns!();
prepare_chat_completions,
claude_chat_completions,
claude_chat_completions_streaming
),
(noop_prepare_embeddings, noop_embeddings),
(noop_prepare_rerank, noop_rerank),
);
fn prepare_chat_completions( fn supports_oauth(&self) -> bool {
self.config.auth.as_deref() == Some("oauth")
}
async fn chat_completions_inner(
&self,
client: &ReqwestClient,
data: ChatCompletionsData,
) -> Result<ChatCompletionsOutput> {
let request_data = prepare_chat_completions(self, client, data).await?;
let builder = self.request_builder(client, request_data);
claude_chat_completions(builder, self.model()).await
}
async fn chat_completions_streaming_inner(
&self,
client: &ReqwestClient,
handler: &mut SseHandler,
data: ChatCompletionsData,
) -> Result<()> {
let request_data = prepare_chat_completions(self, client, data).await?;
let builder = self.request_builder(client, request_data);
claude_chat_completions_streaming(builder, handler, self.model()).await
}
}
async fn prepare_chat_completions(
self_: &ClaudeClient, self_: &ClaudeClient,
client: &ReqwestClient,
data: ChatCompletionsData, data: ChatCompletionsData,
) -> Result<RequestData> { ) -> Result<RequestData> {
let api_key = self_.get_api_key()?;
let api_base = self_ let api_base = self_
.get_api_base() .get_api_base()
.unwrap_or_else(|_| API_BASE.to_string()); .unwrap_or_else(|_| API_BASE.to_string());
@@ -53,11 +77,75 @@ fn prepare_chat_completions(
let mut request_data = RequestData::new(url, body); let mut request_data = RequestData::new(url, body);
request_data.header("anthropic-version", "2023-06-01"); request_data.header("anthropic-version", "2023-06-01");
request_data.header("x-api-key", api_key);
let uses_oauth = self_.config.auth.as_deref() == Some("oauth");
if uses_oauth {
let provider = ClaudeOAuthProvider;
let ready = oauth::prepare_oauth_access_token(client, &provider, self_.name()).await?;
if !ready {
bail!(
"OAuth configured but no tokens found for '{}'. Run: 'loki --authenticate {}' or '.authenticate' in the REPL",
self_.name(),
self_.name()
);
}
let token = get_access_token(self_.name())?;
request_data.bearer_auth(token);
for (key, value) in provider.extra_request_headers() {
request_data.header(key, value);
}
inject_oauth_system_prompt(&mut request_data.body);
} else if let Ok(api_key) = self_.get_api_key() {
request_data.header("x-api-key", api_key);
} else {
bail!(
"No authentication configured for '{}'. Set `api_key` or use `auth: oauth` with `loki --authenticate {}`.",
self_.name(),
self_.name()
);
}
Ok(request_data) Ok(request_data)
} }
/// Anthropic requires OAuth-authenticated requests to include a Claude Code
/// system prompt prefix in order to consider a request body as "valid".
///
/// This behavior was discovered 2026-03-17.
///
/// So this function injects the Claude Code system prompt into the request
/// body to make it a valid request.
fn inject_oauth_system_prompt(body: &mut Value) {
let prefix_block = json!({
"type": "text",
"text": CLAUDE_CODE_PREFIX,
});
match body.get("system") {
Some(Value::String(existing)) => {
let existing_block = json!({
"type": "text",
"text": existing,
});
body["system"] = json!([prefix_block, existing_block]);
}
Some(Value::Array(_)) => {
if let Some(arr) = body["system"].as_array_mut() {
let already_injected = arr
.iter()
.any(|block| block["text"].as_str() == Some(CLAUDE_CODE_PREFIX));
if !already_injected {
arr.insert(0, prefix_block);
}
}
}
_ => {
body["system"] = json!([prefix_block]);
}
}
}
pub async fn claude_chat_completions( pub async fn claude_chat_completions(
builder: RequestBuilder, builder: RequestBuilder,
_model: &Model, _model: &Model,
+43
View File
@@ -0,0 +1,43 @@
use super::oauth::OAuthProvider;
pub const BETA_HEADER: &str = "oauth-2025-04-20";
pub struct ClaudeOAuthProvider;
impl OAuthProvider for ClaudeOAuthProvider {
fn provider_name(&self) -> &str {
"claude"
}
fn client_id(&self) -> &str {
"9d1c250a-e61b-44d9-88ed-5944d1962f5e"
}
fn authorize_url(&self) -> &str {
"https://claude.ai/oauth/authorize"
}
fn token_url(&self) -> &str {
"https://console.anthropic.com/v1/oauth/token"
}
fn redirect_uri(&self) -> &str {
"https://console.anthropic.com/oauth/code/callback"
}
fn scopes(&self) -> &str {
"org:create_api_key user:profile user:inference"
}
fn extra_authorize_params(&self) -> Vec<(&str, &str)> {
vec![("code", "true")]
}
fn extra_token_headers(&self) -> Vec<(&str, &str)> {
vec![("anthropic-beta", BETA_HEADER)]
}
fn extra_request_headers(&self) -> Vec<(&str, &str)> {
vec![("anthropic-beta", BETA_HEADER)]
}
}
+1 -1
View File
@@ -24,7 +24,7 @@ impl CohereClient {
config_get_fn!(api_key, get_api_key); config_get_fn!(api_key, get_api_key);
config_get_fn!(api_base, get_api_base); config_get_fn!(api_base, get_api_base);
pub const PROMPTS: [PromptAction<'static>; 1] = [("api_key", "API Key", None, true)]; create_client_config!([("api_key", "API Key", None, true)]);
} }
impl_client_trait!( impl_client_trait!(
+8 -11
View File
@@ -1,7 +1,8 @@
use super::*; use super::*;
use crate::config::paths;
use crate::{ use crate::{
config::{Config, GlobalConfig, Input}, config::{GlobalConfig, Input},
function::{FunctionDeclaration, ToolCall, ToolResult, eval_tool_calls}, function::{FunctionDeclaration, ToolCall, ToolResult, eval_tool_calls},
render::render_stream, render::render_stream,
utils::*, utils::*,
@@ -24,7 +25,7 @@ use tokio::sync::mpsc::unbounded_channel;
pub const MODELS_YAML: &str = include_str!("../../models.yaml"); pub const MODELS_YAML: &str = include_str!("../../models.yaml");
pub static ALL_PROVIDER_MODELS: LazyLock<Vec<ProviderModels>> = LazyLock::new(|| { pub static ALL_PROVIDER_MODELS: LazyLock<Vec<ProviderModels>> = LazyLock::new(|| {
Config::local_models_override() paths::local_models_override()
.ok() .ok()
.unwrap_or_else(|| serde_yaml::from_str(MODELS_YAML).unwrap()) .unwrap_or_else(|| serde_yaml::from_str(MODELS_YAML).unwrap())
}); });
@@ -47,6 +48,10 @@ pub trait Client: Sync + Send {
fn model(&self) -> &Model; fn model(&self) -> &Model;
fn supports_oauth(&self) -> bool {
false
}
fn build_client(&self) -> Result<ReqwestClient> { fn build_client(&self) -> Result<ReqwestClient> {
let mut builder = ReqwestClient::builder(); let mut builder = ReqwestClient::builder();
let extra = self.extra_config(); let extra = self.extra_config();
@@ -489,14 +494,6 @@ pub async fn call_chat_completions_streaming(
} }
} }
pub fn noop_prepare_embeddings<T>(_client: &T, _data: &EmbeddingsData) -> Result<RequestData> {
bail!("The client doesn't support embeddings api")
}
pub async fn noop_embeddings(_builder: RequestBuilder, _model: &Model) -> Result<EmbeddingsOutput> {
bail!("The client doesn't support embeddings api")
}
pub fn noop_prepare_rerank<T>(_client: &T, _data: &RerankData) -> Result<RequestData> { pub fn noop_prepare_rerank<T>(_client: &T, _data: &RerankData) -> Result<RequestData> {
bail!("The client doesn't support rerank api") bail!("The client doesn't support rerank api")
} }
@@ -554,7 +551,7 @@ pub fn json_str_from_map<'a>(
map.get(field_name).and_then(|v| v.as_str()) map.get(field_name).and_then(|v| v.as_str())
} }
async fn set_client_models_config(client_config: &mut Value, client: &str) -> Result<String> { pub async fn set_client_models_config(client_config: &mut Value, client: &str) -> Result<String> {
if let Some(provider) = ALL_PROVIDER_MODELS.iter().find(|v| v.provider == client) { if let Some(provider) = ALL_PROVIDER_MODELS.iter().find(|v| v.provider == client) {
let models: Vec<String> = provider let models: Vec<String> = provider
.models .models
+120 -35
View File
@@ -1,10 +1,13 @@
use super::access_token::get_access_token;
use super::gemini_oauth::GeminiOAuthProvider;
use super::oauth;
use super::vertexai::*; use super::vertexai::*;
use super::*; use super::*;
use anyhow::{Context, Result}; use anyhow::{Context, Result, bail};
use reqwest::RequestBuilder; use reqwest::{Client as ReqwestClient, RequestBuilder};
use serde::Deserialize; use serde::Deserialize;
use serde_json::{json, Value}; use serde_json::{Value, json};
const API_BASE: &str = "https://generativelanguage.googleapis.com/v1beta"; const API_BASE: &str = "https://generativelanguage.googleapis.com/v1beta";
@@ -13,6 +16,7 @@ pub struct GeminiConfig {
pub name: Option<String>, pub name: Option<String>,
pub api_key: Option<String>, pub api_key: Option<String>,
pub api_base: Option<String>, pub api_base: Option<String>,
pub auth: Option<String>,
#[serde(default)] #[serde(default)]
pub models: Vec<ModelData>, pub models: Vec<ModelData>,
pub patch: Option<RequestPatch>, pub patch: Option<RequestPatch>,
@@ -23,25 +27,64 @@ impl GeminiClient {
config_get_fn!(api_key, get_api_key); config_get_fn!(api_key, get_api_key);
config_get_fn!(api_base, get_api_base); config_get_fn!(api_base, get_api_base);
pub const PROMPTS: [PromptAction<'static>; 1] = [("api_key", "API Key", None, true)]; create_oauth_supported_client_config!();
} }
impl_client_trait!( #[async_trait::async_trait]
GeminiClient, impl Client for GeminiClient {
( client_common_fns!();
prepare_chat_completions,
gemini_chat_completions,
gemini_chat_completions_streaming
),
(prepare_embeddings, embeddings),
(noop_prepare_rerank, noop_rerank),
);
fn prepare_chat_completions( fn supports_oauth(&self) -> bool {
self.config.auth.as_deref() == Some("oauth")
}
async fn chat_completions_inner(
&self,
client: &ReqwestClient,
data: ChatCompletionsData,
) -> Result<ChatCompletionsOutput> {
let request_data = prepare_chat_completions(self, client, data).await?;
let builder = self.request_builder(client, request_data);
gemini_chat_completions(builder, self.model()).await
}
async fn chat_completions_streaming_inner(
&self,
client: &ReqwestClient,
handler: &mut SseHandler,
data: ChatCompletionsData,
) -> Result<()> {
let request_data = prepare_chat_completions(self, client, data).await?;
let builder = self.request_builder(client, request_data);
gemini_chat_completions_streaming(builder, handler, self.model()).await
}
async fn embeddings_inner(
&self,
client: &ReqwestClient,
data: &EmbeddingsData,
) -> Result<EmbeddingsOutput> {
let request_data = prepare_embeddings(self, client, data).await?;
let builder = self.request_builder(client, request_data);
embeddings(builder, self.model()).await
}
async fn rerank_inner(
&self,
client: &ReqwestClient,
data: &RerankData,
) -> Result<RerankOutput> {
let request_data = noop_prepare_rerank(self, data)?;
let builder = self.request_builder(client, request_data);
noop_rerank(builder, self.model()).await
}
}
async fn prepare_chat_completions(
self_: &GeminiClient, self_: &GeminiClient,
client: &ReqwestClient,
data: ChatCompletionsData, data: ChatCompletionsData,
) -> Result<RequestData> { ) -> Result<RequestData> {
let api_key = self_.get_api_key()?;
let api_base = self_ let api_base = self_
.get_api_base() .get_api_base()
.unwrap_or_else(|_| API_BASE.to_string()); .unwrap_or_else(|_| API_BASE.to_string());
@@ -59,26 +102,61 @@ fn prepare_chat_completions(
); );
let body = gemini_build_chat_completions_body(data, &self_.model)?; let body = gemini_build_chat_completions_body(data, &self_.model)?;
let mut request_data = RequestData::new(url, body); let mut request_data = RequestData::new(url, body);
request_data.header("x-goog-api-key", api_key); let uses_oauth = self_.config.auth.as_deref() == Some("oauth");
if uses_oauth {
let provider = GeminiOAuthProvider;
let ready = oauth::prepare_oauth_access_token(client, &provider, self_.name()).await?;
if !ready {
bail!(
"OAuth configured but no tokens found for '{}'. Run: 'loki --authenticate {}' or '.authenticate' in the REPL",
self_.name(),
self_.name()
);
}
let token = get_access_token(self_.name())?;
request_data.bearer_auth(token);
} else if let Ok(api_key) = self_.get_api_key() {
request_data.header("x-goog-api-key", api_key);
} else {
bail!(
"No authentication configured for '{}'. Set `api_key` or use `auth: oauth` with `loki --authenticate {}`.",
self_.name(),
self_.name()
);
}
Ok(request_data) Ok(request_data)
} }
fn prepare_embeddings(self_: &GeminiClient, data: &EmbeddingsData) -> Result<RequestData> { async fn prepare_embeddings(
let api_key = self_.get_api_key()?; self_: &GeminiClient,
client: &ReqwestClient,
data: &EmbeddingsData,
) -> Result<RequestData> {
let api_base = self_ let api_base = self_
.get_api_base() .get_api_base()
.unwrap_or_else(|_| API_BASE.to_string()); .unwrap_or_else(|_| API_BASE.to_string());
let url = format!( let uses_oauth = self_.config.auth.as_deref() == Some("oauth");
"{}/models/{}:batchEmbedContents?key={}",
api_base.trim_end_matches('/'), let url = if uses_oauth {
self_.model.real_name(), format!(
api_key "{}/models/{}:batchEmbedContents",
); api_base.trim_end_matches('/'),
self_.model.real_name(),
)
} else {
let api_key = self_.get_api_key()?;
format!(
"{}/models/{}:batchEmbedContents?key={}",
api_base.trim_end_matches('/'),
self_.model.real_name(),
api_key
)
};
let model_id = format!("models/{}", self_.model.real_name()); let model_id = format!("models/{}", self_.model.real_name());
@@ -89,21 +167,28 @@ fn prepare_embeddings(self_: &GeminiClient, data: &EmbeddingsData) -> Result<Req
json!({ json!({
"model": model_id, "model": model_id,
"content": { "content": {
"parts": [ "parts": [{ "text": text }]
{
"text": text
}
]
}, },
}) })
}) })
.collect(); .collect();
let body = json!({ let body = json!({ "requests": requests });
"requests": requests, let mut request_data = RequestData::new(url, body);
});
let request_data = RequestData::new(url, body); if uses_oauth {
let provider = GeminiOAuthProvider;
let ready = oauth::prepare_oauth_access_token(client, &provider, self_.name()).await?;
if !ready {
bail!(
"OAuth configured but no tokens found for '{}'. Run: 'loki --authenticate {}' or '.authenticate' in the REPL",
self_.name(),
self_.name()
);
}
let token = get_access_token(self_.name())?;
request_data.bearer_auth(token);
}
Ok(request_data) Ok(request_data)
} }
+49
View File
@@ -0,0 +1,49 @@
use super::oauth::{OAuthProvider, TokenRequestFormat};
pub struct GeminiOAuthProvider;
const GEMINI_CLIENT_ID: &str =
"50826443741-upqcebrs4gctqht1f08ku46qlbirkdsj.apps.googleusercontent.com";
const GEMINI_CLIENT_SECRET: &str = "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";
impl OAuthProvider for GeminiOAuthProvider {
fn provider_name(&self) -> &str {
"gemini"
}
fn client_id(&self) -> &str {
GEMINI_CLIENT_ID
}
fn authorize_url(&self) -> &str {
"https://accounts.google.com/o/oauth2/v2/auth"
}
fn token_url(&self) -> &str {
"https://oauth2.googleapis.com/token"
}
fn redirect_uri(&self) -> &str {
""
}
fn scopes(&self) -> &str {
"https://www.googleapis.com/auth/generative-language.peruserquota https://www.googleapis.com/auth/generative-language.retriever https://www.googleapis.com/auth/userinfo.email"
}
fn client_secret(&self) -> Option<&str> {
Some(GEMINI_CLIENT_SECRET)
}
fn extra_authorize_params(&self) -> Vec<(&str, &str)> {
vec![("access_type", "offline"), ("prompt", "consent")]
}
fn token_request_format(&self) -> TokenRequestFormat {
TokenRequestFormat::FormUrlEncoded
}
fn uses_localhost_redirect(&self) -> bool {
true
}
}
+42 -4
View File
@@ -90,7 +90,7 @@ macro_rules! register_client {
pub async fn create_client_config(client: &str, vault: &$crate::vault::Vault) -> anyhow::Result<(String, serde_json::Value)> { pub async fn create_client_config(client: &str, vault: &$crate::vault::Vault) -> anyhow::Result<(String, serde_json::Value)> {
$( $(
if client == $client::NAME && client != $crate::client::OpenAICompatibleClient::NAME { if client == $client::NAME && client != $crate::client::OpenAICompatibleClient::NAME {
return create_config(&$client::PROMPTS, $client::NAME, vault).await return $client::create_client_config(vault).await
} }
)+ )+
if let Some(ret) = create_openai_compatible_client_config(client).await? { if let Some(ret) = create_openai_compatible_client_config(client).await? {
@@ -101,7 +101,7 @@ macro_rules! register_client {
static ALL_CLIENT_NAMES: std::sync::OnceLock<Vec<String>> = std::sync::OnceLock::new(); static ALL_CLIENT_NAMES: std::sync::OnceLock<Vec<String>> = std::sync::OnceLock::new();
pub fn list_client_names(config: &$crate::config::Config) -> Vec<&'static String> { pub fn list_client_names(config: &$crate::config::AppConfig) -> Vec<&'static String> {
let names = ALL_CLIENT_NAMES.get_or_init(|| { let names = ALL_CLIENT_NAMES.get_or_init(|| {
config config
.clients .clients
@@ -117,7 +117,7 @@ macro_rules! register_client {
static ALL_MODELS: std::sync::OnceLock<Vec<$crate::client::Model>> = std::sync::OnceLock::new(); static ALL_MODELS: std::sync::OnceLock<Vec<$crate::client::Model>> = std::sync::OnceLock::new();
pub fn list_all_models(config: &$crate::config::Config) -> Vec<&'static $crate::client::Model> { pub fn list_all_models(config: &$crate::config::AppConfig) -> Vec<&'static $crate::client::Model> {
let models = ALL_MODELS.get_or_init(|| { let models = ALL_MODELS.get_or_init(|| {
config config
.clients .clients
@@ -131,7 +131,7 @@ macro_rules! register_client {
models.iter().collect() models.iter().collect()
} }
pub fn list_models(config: &$crate::config::Config, model_type: $crate::client::ModelType) -> Vec<&'static $crate::client::Model> { pub fn list_models(config: &$crate::config::AppConfig, model_type: $crate::client::ModelType) -> Vec<&'static $crate::client::Model> {
list_all_models(config).into_iter().filter(|v| v.model_type() == model_type).collect() list_all_models(config).into_iter().filter(|v| v.model_type() == model_type).collect()
} }
}; };
@@ -218,6 +218,44 @@ macro_rules! impl_client_trait {
}; };
} }
#[macro_export]
macro_rules! create_client_config {
($prompts:expr) => {
pub async fn create_client_config(
vault: &$crate::vault::Vault,
) -> anyhow::Result<(String, serde_json::Value)> {
$crate::client::create_config(&$prompts, Self::NAME, vault).await
}
};
}
#[macro_export]
macro_rules! create_oauth_supported_client_config {
() => {
pub async fn create_client_config(vault: &$crate::vault::Vault) -> anyhow::Result<(String, serde_json::Value)> {
let mut config = serde_json::json!({ "type": Self::NAME });
let auth_method = inquire::Select::new(
"Authentication method:",
vec!["API Key", "OAuth"],
)
.prompt()?;
if auth_method == "API Key" {
let env_name = format!("{}_API_KEY", Self::NAME).to_ascii_uppercase();
vault.add_secret(&env_name)?;
config["api_key"] = format!("{{{{{env_name}}}}}").into();
} else {
config["auth"] = "oauth".into();
}
let model = $crate::client::set_client_models_config(&mut config, Self::NAME).await?;
let clients = json!(vec![config]);
Ok((model, clients))
}
}
}
#[macro_export] #[macro_export]
macro_rules! config_get_fn { macro_rules! config_get_fn {
($field_name:ident, $fn_name:ident) => { ($field_name:ident, $fn_name:ident) => {
+3
View File
@@ -1,6 +1,9 @@
mod access_token; mod access_token;
mod claude_oauth;
mod common; mod common;
mod gemini_oauth;
mod message; mod message;
pub mod oauth;
#[macro_use] #[macro_use]
mod macros; mod macros;
mod model; mod model;
+10 -2
View File
@@ -3,7 +3,7 @@ use super::{
message::{Message, MessageContent, MessageContentPart}, message::{Message, MessageContent, MessageContentPart},
}; };
use crate::config::Config; use crate::config::AppConfig;
use crate::utils::{estimate_token_length, strip_think_tag}; use crate::utils::{estimate_token_length, strip_think_tag};
use anyhow::{Result, bail}; use anyhow::{Result, bail};
@@ -44,7 +44,11 @@ impl Model {
.collect() .collect()
} }
pub fn retrieve_model(config: &Config, model_id: &str, model_type: ModelType) -> Result<Self> { pub fn retrieve_model(
config: &AppConfig,
model_id: &str,
model_type: ModelType,
) -> Result<Self> {
let models = list_all_models(config); let models = list_all_models(config);
let (client_name, model_name) = match model_id.split_once(':') { let (client_name, model_name) = match model_id.split_once(':') {
Some((client_name, model_name)) => { Some((client_name, model_name)) => {
@@ -177,6 +181,10 @@ impl Model {
self.data.max_output_tokens self.data.max_output_tokens
} }
pub fn supports_function_calling(&self) -> bool {
self.data.supports_function_calling
}
pub fn no_stream(&self) -> bool { pub fn no_stream(&self) -> bool {
self.data.no_stream self.data.no_stream
} }
+429
View File
@@ -0,0 +1,429 @@
use super::ClientConfig;
use super::access_token::{is_valid_access_token, set_access_token};
use crate::config::paths;
use anyhow::{Result, anyhow, bail};
use base64::Engine;
use base64::engine::general_purpose::URL_SAFE_NO_PAD;
use chrono::Utc;
use inquire::Text;
use reqwest::{Client as ReqwestClient, RequestBuilder};
use serde::{Deserialize, Serialize};
use serde_json::Value;
use sha2::{Digest, Sha256};
use std::collections::HashMap;
use std::fs;
use std::io::{BufRead, BufReader, Write};
use std::net::TcpListener;
use url::Url;
use uuid::Uuid;
/// Wire format a provider expects for requests to its token endpoint.
pub enum TokenRequestFormat {
    /// Parameters sent as a JSON object body.
    Json,
    /// Parameters sent as `application/x-www-form-urlencoded`.
    FormUrlEncoded,
}
/// Static description of an OAuth 2.0 provider (endpoints, client id,
/// scopes, and request quirks) consumed by the generic flow in this module.
pub trait OAuthProvider: Send + Sync {
    /// Human-readable provider name used in console messages.
    fn provider_name(&self) -> &str;
    /// OAuth client id sent with authorize and token requests.
    fn client_id(&self) -> &str;
    /// Base URL of the authorization endpoint (query string appended by the flow).
    fn authorize_url(&self) -> &str;
    /// URL of the token-exchange endpoint.
    fn token_url(&self) -> &str;
    /// Fixed redirect URI; ignored when `uses_localhost_redirect` is true.
    fn redirect_uri(&self) -> &str;
    /// Space- (or provider-) delimited scope string, URL-encoded by the flow.
    fn scopes(&self) -> &str;
    /// Optional confidential-client secret added to token requests.
    fn client_secret(&self) -> Option<&str> {
        None
    }
    /// Extra query parameters appended to the authorize URL.
    fn extra_authorize_params(&self) -> Vec<(&str, &str)> {
        vec![]
    }
    /// Body encoding for token requests; defaults to JSON.
    fn token_request_format(&self) -> TokenRequestFormat {
        TokenRequestFormat::Json
    }
    /// When true, the flow binds a local TCP listener and receives the
    /// authorization code via a `http://127.0.0.1:<port>/callback` redirect
    /// instead of asking the user to paste it.
    fn uses_localhost_redirect(&self) -> bool {
        false
    }
    /// Extra headers attached to token-endpoint requests.
    fn extra_token_headers(&self) -> Vec<(&str, &str)> {
        vec![]
    }
    /// Extra headers for regular API requests (not used in this module's flow).
    fn extra_request_headers(&self) -> Vec<(&str, &str)> {
        vec![]
    }
}
/// Persisted OAuth credentials for one client, serialized to the per-client
/// token file (see `save_oauth_tokens` / `load_oauth_tokens`).
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OAuthTokens {
    // Bearer token used on API requests.
    pub access_token: String,
    // Long-lived token used to mint new access tokens.
    pub refresh_token: String,
    // Unix timestamp (seconds, UTC) after which `access_token` is stale.
    pub expires_at: i64,
}
/// Runs the full interactive OAuth 2.0 authorization-code + PKCE flow for
/// `provider` and persists the resulting tokens under `client_name`.
///
/// Steps: generate a PKCE verifier/challenge and a random `state`, open the
/// provider's authorize URL in the browser, collect the authorization code
/// (localhost callback or pasted `<code>#<state>` string), exchange it at
/// the token endpoint, then save the tokens to disk.
///
/// # Errors
/// Fails on I/O errors, a `state` mismatch, a malformed pasted code, or a
/// token response missing `access_token`/`refresh_token`/`expires_in`.
pub async fn run_oauth_flow(provider: &dyn OAuthProvider, client_name: &str) -> Result<()> {
    // PKCE: 32 random bytes -> base64url verifier; challenge = S256(verifier).
    let random_bytes: [u8; 32] = rand::random::<[u8; 32]>();
    let code_verifier = URL_SAFE_NO_PAD.encode(random_bytes);
    let mut hasher = Sha256::new();
    hasher.update(code_verifier.as_bytes());
    let code_challenge = URL_SAFE_NO_PAD.encode(hasher.finalize());
    // Random state guards against CSRF; verified against the callback below.
    let state = Uuid::new_v4().to_string();
    let redirect_uri = if provider.uses_localhost_redirect() {
        // Bind port 0 so the OS picks a free port, then release it.
        // NOTE(review): the port is re-bound later in listen_for_oauth_callback,
        // so another process could grab it in between (TOCTOU) — consider
        // threading the live listener through instead.
        let listener = TcpListener::bind("127.0.0.1:0")?;
        let port = listener.local_addr()?.port();
        let uri = format!("http://127.0.0.1:{port}/callback");
        drop(listener);
        uri
    } else {
        provider.redirect_uri().to_string()
    };
    let encoded_scopes = urlencoding::encode(provider.scopes());
    let encoded_redirect = urlencoding::encode(&redirect_uri);
    let mut authorize_url = format!(
        "{}?client_id={}&response_type=code&scope={}&redirect_uri={}&code_challenge={}&code_challenge_method=S256&state={}",
        provider.authorize_url(),
        provider.client_id(),
        encoded_scopes,
        encoded_redirect,
        code_challenge,
        state
    );
    // Provider-specific additions (e.g. audience, prompt) go on the query string.
    for (key, value) in provider.extra_authorize_params() {
        authorize_url.push_str(&format!(
            "&{}={}",
            urlencoding::encode(key),
            urlencoding::encode(value)
        ));
    }
    println!(
        "\nOpen this URL to authenticate with {} (client '{}'):\n",
        provider.provider_name(),
        client_name
    );
    println!("  {authorize_url}\n");
    // Best effort: print the URL regardless, in case the browser can't open.
    let _ = open::that(&authorize_url);
    let (code, returned_state) = if provider.uses_localhost_redirect() {
        listen_for_oauth_callback(&redirect_uri)?
    } else {
        // Manual flow: user pastes "<code>#<state>" from the provider page.
        let input = Text::new("Paste the authorization code:").prompt()?;
        let parts: Vec<&str> = input.splitn(2, '#').collect();
        if parts.len() != 2 {
            bail!("Invalid authorization code format. Expected format: <code>#<state>");
        }
        (parts[0].to_string(), parts[1].to_string())
    };
    if returned_state != state {
        bail!(
            "OAuth state mismatch: expected '{state}', got '{returned_state}'. \
            This may indicate a CSRF attack or a stale authorization attempt."
        );
    }
    let client = ReqwestClient::new();
    // Exchange the authorization code for tokens (PKCE verifier proves origin).
    let request = build_token_request(
        &client,
        provider,
        &[
            ("grant_type", "authorization_code"),
            ("client_id", provider.client_id()),
            ("code", &code),
            ("code_verifier", &code_verifier),
            ("redirect_uri", &redirect_uri),
            ("state", &state),
        ],
    );
    let response: Value = request.send().await?.json().await?;
    let access_token = response["access_token"]
        .as_str()
        .ok_or_else(|| anyhow!("Missing access_token in response: {response}"))?
        .to_string();
    let refresh_token = response["refresh_token"]
        .as_str()
        .ok_or_else(|| anyhow!("Missing refresh_token in response: {response}"))?
        .to_string();
    let expires_in = response["expires_in"]
        .as_i64()
        .ok_or_else(|| anyhow!("Missing expires_in in response: {response}"))?;
    // Convert relative expiry to an absolute Unix timestamp.
    let expires_at = Utc::now().timestamp() + expires_in;
    let tokens = OAuthTokens {
        access_token,
        refresh_token,
        expires_at,
    };
    save_oauth_tokens(client_name, &tokens)?;
    println!(
        "Successfully authenticated client '{}' with {} via OAuth. Tokens saved.",
        client_name,
        provider.provider_name()
    );
    Ok(())
}
/// Reads the saved tokens for `client_name`, returning `None` when the token
/// file is missing, unreadable, or not valid JSON.
pub fn load_oauth_tokens(client_name: &str) -> Option<OAuthTokens> {
    let raw = fs::read_to_string(paths::token_file(client_name)).ok()?;
    serde_json::from_str(&raw).ok()
}
/// Writes `tokens` as pretty-printed JSON to the per-client token file,
/// creating parent directories as needed.
///
/// # Errors
/// Fails on serialization or filesystem errors.
fn save_oauth_tokens(client_name: &str, tokens: &OAuthTokens) -> Result<()> {
    let path = paths::token_file(client_name);
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let json = serde_json::to_string_pretty(tokens)?;
    fs::write(&path, json)?;
    // The file holds live credentials: restrict it to owner read/write so
    // other local users cannot read the tokens.
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        fs::set_permissions(&path, fs::Permissions::from_mode(0o600))?;
    }
    Ok(())
}
/// Exchanges `tokens.refresh_token` for a fresh access token at the
/// provider's token endpoint, saves the new tokens to disk, and returns them.
///
/// Providers that do not rotate refresh tokens omit `refresh_token` from the
/// response; in that case the previous refresh token is kept.
///
/// # Errors
/// Fails on network errors, a non-success HTTP status, or a response missing
/// `access_token`/`expires_in`.
pub async fn refresh_oauth_token(
    client: &ReqwestClient,
    provider: &impl OAuthProvider,
    client_name: &str,
    tokens: &OAuthTokens,
) -> Result<OAuthTokens> {
    let request = build_token_request(
        client,
        provider,
        &[
            ("grant_type", "refresh_token"),
            ("client_id", provider.client_id()),
            ("refresh_token", &tokens.refresh_token),
        ],
    );
    // Surface HTTP-level failures (e.g. 400/401 on a revoked grant) directly
    // instead of falling through to a confusing "Missing access_token" error.
    let response: Value = request.send().await?.error_for_status()?.json().await?;
    let access_token = response["access_token"]
        .as_str()
        .ok_or_else(|| anyhow!("Missing access_token in refresh response: {response}"))?
        .to_string();
    // Fall back to the old refresh token when the provider doesn't rotate it.
    let refresh_token = response["refresh_token"]
        .as_str()
        .map(|s| s.to_string())
        .unwrap_or_else(|| tokens.refresh_token.clone());
    let expires_in = response["expires_in"]
        .as_i64()
        .ok_or_else(|| anyhow!("Missing expires_in in refresh response: {response}"))?;
    let expires_at = Utc::now().timestamp() + expires_in;
    let new_tokens = OAuthTokens {
        access_token,
        refresh_token,
        expires_at,
    };
    save_oauth_tokens(client_name, &new_tokens)?;
    Ok(new_tokens)
}
/// Ensures a usable access token is cached for `client_name`.
///
/// Returns `Ok(true)` when a valid token is already cached or could be
/// loaded/refreshed from disk, and `Ok(false)` when the client has never
/// completed an OAuth flow (no stored tokens).
pub async fn prepare_oauth_access_token(
    client: &ReqwestClient,
    provider: &impl OAuthProvider,
    client_name: &str,
) -> Result<bool> {
    // Fast path: an unexpired token is already in the in-memory cache.
    if is_valid_access_token(client_name) {
        return Ok(true);
    }
    let Some(stored) = load_oauth_tokens(client_name) else {
        return Ok(false);
    };
    // Refresh when the stored token has passed its expiry timestamp.
    let expired = Utc::now().timestamp() >= stored.expires_at;
    let tokens = if expired {
        refresh_oauth_token(client, provider, client_name, &stored).await?
    } else {
        stored
    };
    set_access_token(client_name, tokens.access_token.clone(), tokens.expires_at);
    Ok(true)
}
/// Builds a POST to the provider's token endpoint carrying `params` in the
/// provider's preferred encoding (JSON or form-urlencoded), adding the
/// client secret (when configured) and any provider-specific headers.
fn build_token_request(
    client: &ReqwestClient,
    provider: &(impl OAuthProvider + ?Sized),
    params: &[(&str, &str)],
) -> RequestBuilder {
    let mut request = match provider.token_request_format() {
        TokenRequestFormat::Json => {
            let mut body: serde_json::Map<String, Value> = params
                .iter()
                .map(|(k, v)| (k.to_string(), Value::String(v.to_string())))
                .collect();
            // The original branched on client_secret but issued the identical
            // request either way; just insert when present.
            if let Some(secret) = provider.client_secret() {
                body.insert(
                    "client_secret".to_string(),
                    Value::String(secret.to_string()),
                );
            }
            client.post(provider.token_url()).json(&body)
        }
        TokenRequestFormat::FormUrlEncoded => {
            let mut form: HashMap<String, String> = params
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect();
            if let Some(secret) = provider.client_secret() {
                form.insert("client_secret".to_string(), secret.to_string());
            }
            client.post(provider.token_url()).form(&form)
        }
    };
    for (key, value) in provider.extra_token_headers() {
        request = request.header(key, value);
    }
    request
}
/// Blocks until the provider redirects the browser to `redirect_uri`, then
/// extracts and returns the `(code, state)` query parameters.
///
/// Implements a minimal one-shot HTTP server: accepts a single connection,
/// reads only the request line, and replies with a static success page.
///
/// # Errors
/// Fails on bind/accept errors, a malformed request line, an unexpected
/// callback path, an `error` query parameter, or a missing `code`/`state`.
fn listen_for_oauth_callback(redirect_uri: &str) -> Result<(String, String)> {
    let url: Url = redirect_uri.parse()?;
    let host = url.host_str().unwrap_or("127.0.0.1");
    let port = url
        .port()
        .ok_or_else(|| anyhow!("No port in redirect URI"))?;
    let path = url.path();
    println!("Waiting for OAuth callback on {redirect_uri} ...\n");
    let listener = TcpListener::bind(format!("{host}:{port}"))?;
    // Single connection only: the browser redirect is the one client expected.
    let (mut stream, _) = listener.accept()?;
    let mut reader = BufReader::new(&stream);
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;
    // Request line is "GET /callback?code=...&state=... HTTP/1.1"; take the path.
    let request_path = request_line
        .split_whitespace()
        .nth(1)
        .ok_or_else(|| anyhow!("Malformed HTTP request from OAuth callback"))?;
    // Re-assemble a full URL so the query string can be parsed with `url`.
    let full_url = format!("http://{host}:{port}{request_path}");
    let parsed: Url = full_url.parse()?;
    if !parsed.path().starts_with(path) {
        bail!("Unexpected callback path: {}", parsed.path());
    }
    let code = parsed
        .query_pairs()
        .find(|(k, _)| k == "code")
        .map(|(_, v)| v.to_string())
        .ok_or_else(|| {
            // No code: report the provider's `error` parameter if present.
            let error = parsed
                .query_pairs()
                .find(|(k, _)| k == "error")
                .map(|(_, v)| v.to_string())
                .unwrap_or_else(|| "unknown".to_string());
            anyhow!("OAuth callback returned error: {error}")
        })?;
    let returned_state = parsed
        .query_pairs()
        .find(|(k, _)| k == "state")
        .map(|(_, v)| v.to_string())
        .ok_or_else(|| anyhow!("Missing state parameter in OAuth callback"))?;
    let response_body = "<html><body><h2>Authentication successful!</h2><p>You can close this tab and return to your terminal.</p></body></html>";
    let response = format!(
        "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        response_body.len(),
        response_body
    );
    stream.write_all(response.as_bytes())?;
    Ok((code, returned_state))
}
/// Maps a provider-type string to its boxed `OAuthProvider` implementation,
/// or `None` for types without OAuth support.
pub fn get_oauth_provider(provider_type: &str) -> Option<Box<dyn OAuthProvider>> {
    let provider: Box<dyn OAuthProvider> = match provider_type {
        "claude" => Box::new(super::claude_oauth::ClaudeOAuthProvider),
        "gemini" => Box::new(super::gemini_oauth::GeminiOAuthProvider),
        _ => return None,
    };
    Some(provider)
}
pub fn resolve_provider_type(client_name: &str, clients: &[ClientConfig]) -> Option<&'static str> {
for client_config in clients {
let (config_name, provider_type, auth) = client_config_info(client_config);
if config_name == client_name {
if auth == Some("oauth") && get_oauth_provider(provider_type).is_some() {
return Some(provider_type);
}
return None;
}
}
None
}
/// Returns the names of all configured clients that use OAuth with a
/// provider this module supports.
pub fn list_oauth_capable_clients(clients: &[ClientConfig]) -> Vec<String> {
    let mut names = Vec::new();
    for client_config in clients {
        let (name, provider_type, auth) = client_config_info(client_config);
        if auth == Some("oauth") && get_oauth_provider(provider_type).is_some() {
            names.push(name.to_string());
        }
    }
    names
}
/// Extracts `(configured name, provider-type string, auth mode)` from a
/// client config. The name falls back to the provider type when the config
/// has no explicit `name`; `auth` is `Some` only for variants that carry an
/// `auth` field (claude, gemini) — all others are API-key/credential based.
fn client_config_info(client_config: &ClientConfig) -> (&str, &'static str, Option<&str>) {
    match client_config {
        ClientConfig::ClaudeConfig(c) => (
            c.name.as_deref().unwrap_or("claude"),
            "claude",
            c.auth.as_deref(),
        ),
        ClientConfig::OpenAIConfig(c) => (c.name.as_deref().unwrap_or("openai"), "openai", None),
        ClientConfig::OpenAICompatibleConfig(c) => (
            c.name.as_deref().unwrap_or("openai-compatible"),
            "openai-compatible",
            None,
        ),
        ClientConfig::GeminiConfig(c) => (
            c.name.as_deref().unwrap_or("gemini"),
            "gemini",
            c.auth.as_deref(),
        ),
        ClientConfig::CohereConfig(c) => (c.name.as_deref().unwrap_or("cohere"), "cohere", None),
        ClientConfig::AzureOpenAIConfig(c) => (
            c.name.as_deref().unwrap_or("azure-openai"),
            "azure-openai",
            None,
        ),
        ClientConfig::VertexAIConfig(c) => {
            (c.name.as_deref().unwrap_or("vertexai"), "vertexai", None)
        }
        ClientConfig::BedrockConfig(c) => (c.name.as_deref().unwrap_or("bedrock"), "bedrock", None),
        ClientConfig::Unknown => ("unknown", "unknown", None),
    }
}
+6 -4
View File
@@ -2,10 +2,10 @@ use super::*;
use crate::utils::strip_think_tag; use crate::utils::strip_think_tag;
use anyhow::{bail, Context, Result}; use anyhow::{Context, Result, bail};
use reqwest::RequestBuilder; use reqwest::RequestBuilder;
use serde::Deserialize; use serde::Deserialize;
use serde_json::{json, Value}; use serde_json::{Value, json};
const API_BASE: &str = "https://api.openai.com/v1"; const API_BASE: &str = "https://api.openai.com/v1";
@@ -25,7 +25,7 @@ impl OpenAIClient {
config_get_fn!(api_key, get_api_key); config_get_fn!(api_key, get_api_key);
config_get_fn!(api_base, get_api_base); config_get_fn!(api_base, get_api_base);
pub const PROMPTS: [PromptAction<'static>; 1] = [("api_key", "API Key", None, true)]; create_client_config!([("api_key", "API Key", None, true)]);
} }
impl_client_trait!( impl_client_trait!(
@@ -114,7 +114,9 @@ pub async fn openai_chat_completions_streaming(
function_arguments = String::from("{}"); function_arguments = String::from("{}");
} }
let arguments: Value = function_arguments.parse().with_context(|| { let arguments: Value = function_arguments.parse().with_context(|| {
format!("Tool call '{function_name}' has non-JSON arguments '{function_arguments}'") format!(
"Tool call '{function_name}' has non-JSON arguments '{function_arguments}'"
)
})?; })?;
handler.tool_call(ToolCall::new( handler.tool_call(ToolCall::new(
function_name.clone(), function_name.clone(),
+1 -1
View File
@@ -21,7 +21,7 @@ impl OpenAICompatibleClient {
config_get_fn!(api_base, get_api_base); config_get_fn!(api_base, get_api_base);
config_get_fn!(api_key, get_api_key); config_get_fn!(api_key, get_api_key);
pub const PROMPTS: [PromptAction<'static>; 0] = []; create_client_config!([]);
} }
impl_client_trait!( impl_client_trait!(
+3 -4
View File
@@ -342,7 +342,7 @@ mod tests {
use bytes::Bytes; use bytes::Bytes;
use futures_util::stream; use futures_util::stream;
use rand::Rng; use rand::random_range;
use serde_json::json; use serde_json::json;
#[test] #[test]
@@ -392,10 +392,9 @@ mod tests {
} }
fn split_chunks(text: &str) -> Vec<Vec<u8>> { fn split_chunks(text: &str) -> Vec<Vec<u8>> {
let mut rng = rand::rng();
let len = text.len(); let len = text.len();
let cut1 = rng.random_range(1..len - 1); let cut1 = random_range(1..len - 1);
let cut2 = rng.random_range(cut1 + 1..len); let cut2 = random_range(cut1 + 1..len);
let chunk1 = text.as_bytes()[..cut1].to_vec(); let chunk1 = text.as_bytes()[..cut1].to_vec();
let chunk2 = text.as_bytes()[cut1..cut2].to_vec(); let chunk2 = text.as_bytes()[cut1..cut2].to_vec();
let chunk3 = text.as_bytes()[cut2..].to_vec(); let chunk3 = text.as_bytes()[cut2..].to_vec();
+24 -16
View File
@@ -3,11 +3,11 @@ use super::claude::*;
use super::openai::*; use super::openai::*;
use super::*; use super::*;
use anyhow::{anyhow, bail, Context, Result}; use anyhow::{Context, Result, anyhow, bail};
use chrono::{Duration, Utc}; use chrono::{Duration, Utc};
use reqwest::{Client as ReqwestClient, RequestBuilder}; use reqwest::{Client as ReqwestClient, RequestBuilder};
use serde::Deserialize; use serde::Deserialize;
use serde_json::{json, Value}; use serde_json::{Value, json};
use std::{path::PathBuf, str::FromStr}; use std::{path::PathBuf, str::FromStr};
#[derive(Debug, Clone, Deserialize, Default)] #[derive(Debug, Clone, Deserialize, Default)]
@@ -26,10 +26,10 @@ impl VertexAIClient {
config_get_fn!(project_id, get_project_id); config_get_fn!(project_id, get_project_id);
config_get_fn!(location, get_location); config_get_fn!(location, get_location);
pub const PROMPTS: [PromptAction<'static>; 2] = [ create_client_config!([
("project_id", "Project ID", None, false), ("project_id", "Project ID", None, false),
("location", "Location", None, false), ("location", "Location", None, false),
]; ]);
} }
#[async_trait::async_trait] #[async_trait::async_trait]
@@ -99,9 +99,13 @@ fn prepare_chat_completions(
let access_token = get_access_token(self_.name())?; let access_token = get_access_token(self_.name())?;
let base_url = if location == "global" { let base_url = if location == "global" {
format!("https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/global/publishers") format!(
"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/global/publishers"
)
} else { } else {
format!("https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers") format!(
"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers"
)
}; };
let model_name = self_.model.real_name(); let model_name = self_.model.real_name();
@@ -158,9 +162,13 @@ fn prepare_embeddings(self_: &VertexAIClient, data: &EmbeddingsData) -> Result<R
let access_token = get_access_token(self_.name())?; let access_token = get_access_token(self_.name())?;
let base_url = if location == "global" { let base_url = if location == "global" {
format!("https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/global/publishers") format!(
"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/global/publishers"
)
} else { } else {
format!("https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers") format!(
"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers"
)
}; };
let url = format!( let url = format!(
"{base_url}/google/models/{}:predict", "{base_url}/google/models/{}:predict",
@@ -220,12 +228,12 @@ pub async fn gemini_chat_completions_streaming(
part["functionCall"]["args"].as_object(), part["functionCall"]["args"].as_object(),
) { ) {
let thought_signature = part["thoughtSignature"] let thought_signature = part["thoughtSignature"]
.as_str() .as_str()
.or_else(|| part["thought_signature"].as_str()) .or_else(|| part["thought_signature"].as_str())
.map(|s| s.to_string()); .map(|s| s.to_string());
handler.tool_call( handler.tool_call(
ToolCall::new(name.to_string(), json!(args), None) ToolCall::new(name.to_string(), json!(args), None)
.with_thought_signature(thought_signature), .with_thought_signature(thought_signature),
)?; )?;
} }
} }
@@ -288,12 +296,12 @@ fn gemini_extract_chat_completions_text(data: &Value) -> Result<ChatCompletionsO
part["functionCall"]["args"].as_object(), part["functionCall"]["args"].as_object(),
) { ) {
let thought_signature = part["thoughtSignature"] let thought_signature = part["thoughtSignature"]
.as_str() .as_str()
.or_else(|| part["thought_signature"].as_str()) .or_else(|| part["thought_signature"].as_str())
.map(|s| s.to_string()); .map(|s| s.to_string());
tool_calls.push( tool_calls.push(
ToolCall::new(name.to_string(), json!(args), None) ToolCall::new(name.to_string(), json!(args), None)
.with_thought_signature(thought_signature), .with_thought_signature(thought_signature),
); );
} }
} }
+18 -10
View File
@@ -6,6 +6,7 @@ use crate::{
function::{Functions, run_llm_function}, function::{Functions, run_llm_function},
}; };
use crate::config::paths;
use crate::config::prompts::{ use crate::config::prompts::{
DEFAULT_SPAWN_INSTRUCTIONS, DEFAULT_TEAMMATE_INSTRUCTIONS, DEFAULT_TODO_INSTRUCTIONS, DEFAULT_SPAWN_INSTRUCTIONS, DEFAULT_TEAMMATE_INSTRUCTIONS, DEFAULT_TODO_INSTRUCTIONS,
DEFAULT_USER_INTERACTION_INSTRUCTIONS, DEFAULT_USER_INTERACTION_INSTRUCTIONS,
@@ -47,7 +48,7 @@ impl Agent {
pub fn install_builtin_agents() -> Result<()> { pub fn install_builtin_agents() -> Result<()> {
info!( info!(
"Installing built-in agents in {}", "Installing built-in agents in {}",
Config::agents_data_dir().display() paths::agents_data_dir().display()
); );
for file in AgentAssets::iter() { for file in AgentAssets::iter() {
@@ -56,7 +57,7 @@ impl Agent {
let embedded_file = AgentAssets::get(&file) let embedded_file = AgentAssets::get(&file)
.ok_or_else(|| anyhow!("Failed to load embedded agent file: {}", file.as_ref()))?; .ok_or_else(|| anyhow!("Failed to load embedded agent file: {}", file.as_ref()))?;
let content = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) }; let content = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) };
let file_path = Config::agents_data_dir().join(file.as_ref()); let file_path = paths::agents_data_dir().join(file.as_ref());
let file_extension = file_path let file_extension = file_path
.extension() .extension()
.and_then(OsStr::to_str) .and_then(OsStr::to_str)
@@ -92,10 +93,10 @@ impl Agent {
name: &str, name: &str,
abort_signal: AbortSignal, abort_signal: AbortSignal,
) -> Result<Self> { ) -> Result<Self> {
let agent_data_dir = Config::agent_data_dir(name); let agent_data_dir = paths::agent_data_dir(name);
let loaders = config.read().document_loaders.clone(); let loaders = config.read().document_loaders.clone();
let rag_path = Config::agent_rag_file(name, DEFAULT_AGENT_NAME); let rag_path = paths::agent_rag_file(name, DEFAULT_AGENT_NAME);
let config_path = Config::agent_config_file(name); let config_path = paths::agent_config_file(name);
let mut agent_config = if config_path.exists() { let mut agent_config = if config_path.exists() {
AgentConfig::load(&config_path)? AgentConfig::load(&config_path)?
} else { } else {
@@ -138,7 +139,9 @@ impl Agent {
let model = { let model = {
let config = config.read(); let config = config.read();
match agent_config.model_id.as_ref() { match agent_config.model_id.as_ref() {
Some(model_id) => Model::retrieve_model(&config, model_id, ModelType::Chat)?, Some(model_id) => {
Model::retrieve_model(&config.to_app_config(), model_id, ModelType::Chat)?
}
None => { None => {
if agent_config.temperature.is_none() { if agent_config.temperature.is_none() {
agent_config.temperature = config.temperature; agent_config.temperature = config.temperature;
@@ -295,11 +298,11 @@ impl Agent {
let mut config = self.config.clone(); let mut config = self.config.clone();
config.instructions = self.interpolated_instructions(); config.instructions = self.interpolated_instructions();
value["definition"] = json!(config); value["definition"] = json!(config);
value["data_dir"] = Config::agent_data_dir(&self.name) value["data_dir"] = paths::agent_data_dir(&self.name)
.display() .display()
.to_string() .to_string()
.into(); .into();
value["config_file"] = Config::agent_config_file(&self.name) value["config_file"] = paths::agent_config_file(&self.name)
.display() .display()
.to_string() .to_string()
.into(); .into();
@@ -476,6 +479,11 @@ impl Agent {
self.todo_list.mark_done(id) self.todo_list.mark_done(id)
} }
pub fn clear_todo_list(&mut self) {
self.todo_list.clear();
self.reset_continuation();
}
pub fn continuation_prompt(&self) -> String { pub fn continuation_prompt(&self) -> String {
self.config.continuation_prompt.clone().unwrap_or_else(|| { self.config.continuation_prompt.clone().unwrap_or_else(|| {
formatdoc! {" formatdoc! {"
@@ -788,7 +796,7 @@ pub struct AgentVariable {
} }
pub fn list_agents() -> Vec<String> { pub fn list_agents() -> Vec<String> {
let agents_data_dir = Config::agents_data_dir(); let agents_data_dir = paths::agents_data_dir();
if !agents_data_dir.exists() { if !agents_data_dir.exists() {
return vec![]; return vec![];
} }
@@ -808,7 +816,7 @@ pub fn list_agents() -> Vec<String> {
} }
pub fn complete_agent_variables(agent_name: &str) -> Vec<(String, Option<String>)> { pub fn complete_agent_variables(agent_name: &str) -> Vec<(String, Option<String>)> {
let config_path = Config::agent_config_file(agent_name); let config_path = paths::agent_config_file(agent_name);
if !config_path.exists() { if !config_path.exists() {
return vec![]; return vec![];
} }
+93
View File
@@ -0,0 +1,93 @@
//! Per-request agent runtime: supervisor, inbox, escalation queue,
//! optional todo list, shared RAG, and sub-agent wiring.
//!
//! `AgentRuntime` is present on [`RequestContext`](super::RequestContext)
//! only when an agent is active. It holds the agent-specific state
//! that today lives as flat fields on `Config` (`supervisor`, `inbox`,
//! `root_escalation_queue`, `self_agent_id`, `current_depth`,
//! `parent_supervisor`) plus the shared agent RAG (served from
//! [`RagCache`](super::rag_cache::RagCache) via `RagKey::Agent`).
//!
//! # Phase 1 Step 6.5 scope
//!
//! This file introduces the type scaffolding. Agent activation
//! (`Config::use_agent`) still populates the flat fields on
//! `Config` directly; the new `AgentRuntime` field on
//! `RequestContext` stays empty during the bridge window and gets
//! wired up in Step 8 when entry points migrate.
//!
//! The `todo_list: Option<TodoList>` field is `Option` (not just
//! `TodoList::default()`) as an opportunistic tightening during the
//! Step 6.5 scaffolding: today's code always allocates a default
//! `TodoList` regardless of whether `auto_continue` is enabled. When
//! callers migrate to build `AgentRuntime` instances in Step 8, they
//! will set `todo_list = Some(...)` only when `spec.auto_continue`
//! is true. See `docs/PHASE-1-IMPLEMENTATION-PLAN.md` Step 6.5 for
//! the rationale.
#![allow(dead_code)]
use super::todo::TodoList;
use crate::rag::Rag;
use crate::supervisor::Supervisor;
use crate::supervisor::escalation::EscalationQueue;
use crate::supervisor::mailbox::Inbox;
use parking_lot::RwLock;
use std::sync::Arc;
/// Per-request agent state bundle (see module docs for the migration plan).
pub struct AgentRuntime {
    // Shared agent RAG, when one is configured for this agent.
    pub rag: Option<Arc<Rag>>,
    // Supervisor coordinating this agent's sub-agents.
    pub supervisor: Arc<RwLock<Supervisor>>,
    // Message inbox for this agent.
    pub inbox: Arc<Inbox>,
    // Queue for escalations bubbling up toward the root agent.
    pub escalation_queue: Arc<EscalationQueue>,
    // Present only when auto-continue is enabled (see module docs).
    pub todo_list: Option<TodoList>,
    // Identifier of this agent instance.
    pub self_agent_id: String,
    // Supervisor of the parent agent; `None` at the root.
    pub parent_supervisor: Option<Arc<RwLock<Supervisor>>>,
    // Nesting depth of this agent in the spawn tree (root = 0).
    pub current_depth: usize,
    // Number of auto-continue iterations taken so far.
    pub auto_continue_count: usize,
}
impl AgentRuntime {
    /// Creates a runtime with the mandatory wiring; the optional pieces
    /// (RAG, todo list, parent supervisor, depth) start empty/zero and are
    /// attached through the `with_*` builder methods below.
    pub fn new(
        self_agent_id: String,
        supervisor: Arc<RwLock<Supervisor>>,
        inbox: Arc<Inbox>,
        escalation_queue: Arc<EscalationQueue>,
    ) -> Self {
        Self {
            self_agent_id,
            supervisor,
            inbox,
            escalation_queue,
            rag: None,
            todo_list: None,
            parent_supervisor: None,
            current_depth: 0,
            auto_continue_count: 0,
        }
    }

    /// Sets (or clears) the shared agent RAG.
    pub fn with_rag(mut self, rag: Option<Arc<Rag>>) -> Self {
        self.rag = rag;
        self
    }

    /// Sets (or clears) the todo list; `Some` only when auto-continue is on.
    pub fn with_todo_list(mut self, todo_list: Option<TodoList>) -> Self {
        self.todo_list = todo_list;
        self
    }

    /// Sets (or clears) the parent agent's supervisor handle.
    pub fn with_parent_supervisor(
        mut self,
        parent_supervisor: Option<Arc<RwLock<Supervisor>>>,
    ) -> Self {
        self.parent_supervisor = parent_supervisor;
        self
    }

    /// Sets the nesting depth of this agent in the spawn tree.
    pub fn with_depth(mut self, depth: usize) -> Self {
        self.current_depth = depth;
        self
    }
}
+452
View File
@@ -0,0 +1,452 @@
//! Immutable, server-wide application configuration.
//!
//! `AppConfig` contains the settings loaded from `config.yaml` that are
//! global to the Loki process: LLM provider configs, UI preferences, tool
//! and MCP settings, RAG defaults, etc.
//!
//! This is Phase 1, Step 0 of the REST API refactor: the struct is
//! introduced alongside the existing [`Config`](super::Config) and is not
//! yet wired into the runtime. See `docs/PHASE-1-IMPLEMENTATION-PLAN.md`
//! for the full migration plan.
//!
//! # Relationship to `Config`
//!
//! `AppConfig` mirrors the **serialized** fields of [`Config`] — that is,
//! every field that is NOT marked `#[serde(skip)]`. The deserialization
//! shape is identical so an existing `config.yaml` can be loaded into
//! either type without modification.
//!
//! Runtime-only state (current role, session, agent, supervisor, etc.)
//! lives on [`RequestContext`](super::request_context::RequestContext).
use crate::client::ClientConfig;
use crate::render::{MarkdownRender, RenderOptions};
use crate::utils::{IS_STDOUT_TERMINAL, NO_COLOR, decode_bin, get_env_name};
use anyhow::{Context, Result, anyhow};
use indexmap::IndexMap;
use serde::Deserialize;
use std::collections::HashMap;
use std::env;
use std::path::PathBuf;
use syntect::highlighting::ThemeSet;
use terminal_colorsaurus::{ColorScheme, QueryOptions, color_scheme};
#[allow(dead_code)]
#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct AppConfig {
    // NOTE(review): the struct only derives Deserialize, so the
    // `rename(serialize = ...)` half is currently inert — confirm whether
    // Serialize is planned or the attribute can be simplified.
    #[serde(rename(serialize = "model", deserialize = "model"))]
    #[serde(default)]
    pub model_id: String,
    // LLM sampling parameters (None = provider default).
    pub temperature: Option<f64>,
    pub top_p: Option<f64>,
    pub dry_run: bool,
    pub stream: bool,
    pub save: bool,
    // REPL keybinding scheme; defaults to "emacs" (see Default impl).
    pub keybindings: String,
    pub editor: Option<String>,
    // Output wrapping: None, "auto", or a column width as a string.
    pub wrap: Option<String>,
    pub wrap_code: bool,
    pub(crate) vault_password_file: Option<PathBuf>,
    // Tool / function-calling configuration.
    pub function_calling_support: bool,
    pub mapping_tools: IndexMap<String, String>,
    pub enabled_tools: Option<String>,
    pub visible_tools: Option<Vec<String>>,
    // MCP server configuration.
    pub mcp_server_support: bool,
    pub mapping_mcp_servers: IndexMap<String, String>,
    pub enabled_mcp_servers: Option<String>,
    pub repl_prelude: Option<String>,
    pub cmd_prelude: Option<String>,
    pub agent_session: Option<String>,
    pub save_session: Option<bool>,
    // Session-compression settings.
    pub compression_threshold: usize,
    pub summarization_prompt: Option<String>,
    pub summary_context_prompt: Option<String>,
    // RAG defaults.
    pub rag_embedding_model: Option<String>,
    pub rag_reranker_model: Option<String>,
    pub rag_top_k: usize,
    pub rag_chunk_size: Option<usize>,
    pub rag_chunk_overlap: Option<usize>,
    pub rag_template: Option<String>,
    // Maps file extension -> shell command used to load that document type.
    #[serde(default)]
    pub document_loaders: HashMap<String, String>,
    // Rendering / UI preferences.
    pub highlight: bool,
    pub theme: Option<String>,
    pub left_prompt: Option<String>,
    pub right_prompt: Option<String>,
    // "auto" is expanded to "<crate>/<version>" by setup_user_agent.
    pub user_agent: Option<String>,
    pub save_shell_history: bool,
    pub sync_models_url: Option<String>,
    // LLM provider client configurations.
    pub clients: Vec<ClientConfig>,
}
// Defaults applied by `#[serde(default)]` for any field absent from config.yaml.
impl Default for AppConfig {
    fn default() -> Self {
        Self {
            model_id: Default::default(),
            temperature: None,
            top_p: None,
            dry_run: false,
            // Streaming on, transcript saving off by default.
            stream: true,
            save: false,
            keybindings: "emacs".into(),
            editor: None,
            wrap: None,
            wrap_code: false,
            vault_password_file: None,
            // Tools and MCP enabled by default.
            function_calling_support: true,
            mapping_tools: Default::default(),
            enabled_tools: None,
            visible_tools: None,
            mcp_server_support: true,
            mapping_mcp_servers: Default::default(),
            enabled_mcp_servers: None,
            repl_prelude: None,
            cmd_prelude: None,
            agent_session: None,
            save_session: None,
            // Token threshold before session compression kicks in.
            compression_threshold: 4000,
            summarization_prompt: None,
            summary_context_prompt: None,
            rag_embedding_model: None,
            rag_reranker_model: None,
            rag_top_k: 5,
            rag_chunk_size: None,
            rag_chunk_overlap: None,
            rag_template: None,
            document_loaders: Default::default(),
            highlight: true,
            theme: None,
            left_prompt: None,
            right_prompt: None,
            user_agent: None,
            save_shell_history: true,
            sync_models_url: None,
            clients: vec![],
        }
    }
}
#[allow(dead_code)]
impl AppConfig {
    /// Resolves the vault password file: the configured path when it exists,
    /// otherwise gman's local-provider default.
    pub fn vault_password_file(&self) -> PathBuf {
        match &self.vault_password_file {
            Some(path) => match path.exists() {
                true => path.clone(),
                false => gman::config::Config::local_provider_password_file(),
            },
            None => gman::config::Config::local_provider_password_file(),
        }
    }
    /// Resolves the editor command: config value, then $VISUAL, then $EDITOR,
    /// then a platform fallback (notepad/nano) — validated with `which`.
    ///
    /// NOTE(review): the result is cached in the process-wide `EDITOR`
    /// OnceLock, so later config changes won't take effect — confirm intended.
    ///
    /// # Errors
    /// Fails when the resolved editor is not found on PATH.
    pub fn editor(&self) -> Result<String> {
        super::EDITOR.get_or_init(move || {
            let editor = self.editor.clone()
                .or_else(|| env::var("VISUAL").ok().or_else(|| env::var("EDITOR").ok()))
                .unwrap_or_else(|| {
                    if cfg!(windows) {
                        "notepad".to_string()
                    } else {
                        "nano".to_string()
                    }
                });
            // Cache Some(editor) only when the binary actually exists on PATH.
            which::which(&editor).ok().map(|_| editor)
        })
        .clone()
        .ok_or_else(|| anyhow!("Editor not found. Please add the `editor` configuration or set the $EDITOR or $VISUAL environment variable."))
    }
    /// Configured model-sync URL, or the built-in default.
    pub fn sync_models_url(&self) -> String {
        self.sync_models_url
            .clone()
            .unwrap_or_else(|| super::SYNC_MODELS_URL.into())
    }
    /// True when the theme is explicitly set to "light".
    pub fn light_theme(&self) -> bool {
        matches!(self.theme.as_deref(), Some("light"))
    }
    /// Builds markdown render options from the current settings: syntax
    /// theme (user tmTheme file if present, else built-in light/dark),
    /// wrapping (terminal only), and truecolor detection via $COLORTERM.
    ///
    /// # Errors
    /// Fails when a user-provided theme file cannot be parsed or a built-in
    /// theme fails to decode.
    pub fn render_options(&self) -> Result<RenderOptions> {
        let theme = if self.highlight {
            let theme_mode = if self.light_theme() { "light" } else { "dark" };
            let theme_filename = format!("{theme_mode}.tmTheme");
            let theme_path = super::paths::local_path(&theme_filename);
            if theme_path.exists() {
                // User-supplied theme takes precedence over the built-ins.
                let theme = ThemeSet::get_theme(&theme_path)
                    .with_context(|| format!("Invalid theme at '{}'", theme_path.display()))?;
                Some(theme)
            } else {
                let theme = if self.light_theme() {
                    decode_bin(super::LIGHT_THEME).context("Invalid builtin light theme")?
                } else {
                    decode_bin(super::DARK_THEME).context("Invalid builtin dark theme")?
                };
                Some(theme)
            }
        } else {
            None
        };
        // Only wrap when stdout is a terminal; piped output stays unwrapped.
        let wrap = if *IS_STDOUT_TERMINAL {
            self.wrap.clone()
        } else {
            None
        };
        let truecolor = matches!(
            env::var("COLORTERM").as_ref().map(|v| v.as_str()),
            Ok("truecolor")
        );
        Ok(RenderOptions::new(theme, wrap, self.wrap_code, truecolor))
    }
    /// Prints `text` as rendered markdown on a terminal, or raw otherwise.
    ///
    /// # Errors
    /// Fails when render options or the markdown renderer cannot be built.
    pub fn print_markdown(&self, text: &str) -> Result<()> {
        if *IS_STDOUT_TERMINAL {
            let render_options = self.render_options()?;
            let mut markdown_render = MarkdownRender::init(render_options)?;
            println!("{}", markdown_render.render(text));
        } else {
            println!("{text}");
        }
        Ok(())
    }
    /// Expands the RAG prompt template with the retrieved context, sources,
    /// and user input; returns the input unchanged when there is no context.
    pub fn rag_template(&self, embeddings: &str, sources: &str, text: &str) -> String {
        if embeddings.is_empty() {
            return text.to_string();
        }
        self.rag_template
            .as_deref()
            .unwrap_or(super::RAG_TEMPLATE)
            .replace("__CONTEXT__", embeddings)
            .replace("__SOURCES__", sources)
            .replace("__INPUT__", text)
    }
}
#[allow(dead_code)]
impl AppConfig {
    /// Set the output wrap mode from a user string.
    ///
    /// Accepted values: `"no"` (disable wrapping), `"auto"`, or anything
    /// that parses as a `u16` column count. Other values are an error.
    pub fn set_wrap(&mut self, value: &str) -> Result<()> {
        if value == "no" {
            self.wrap = None;
        } else if value == "auto" {
            self.wrap = Some(value.into());
        } else {
            // Validate the column count before storing the raw string.
            value
                .parse::<u16>()
                .map_err(|_| anyhow!("Invalid wrap value"))?;
            self.wrap = Some(value.into())
        }
        Ok(())
    }
    /// Install default document loader commands for pdf/docx without
    /// overriding loaders the user already configured.
    pub fn setup_document_loaders(&mut self) {
        [("pdf", "pdftotext $1 -"), ("docx", "pandoc --to plain $1")]
            .into_iter()
            .for_each(|(k, v)| {
                let (k, v) = (k.to_string(), v.to_string());
                // entry().or_insert() keeps user-provided loaders intact.
                self.document_loaders.entry(k).or_insert(v);
            });
    }
    /// Expand the special `"auto"` user-agent value into `<crate>/<version>`.
    pub fn setup_user_agent(&mut self) {
        if let Some("auto") = self.user_agent.as_deref() {
            self.user_agent = Some(format!(
                "{}/{}",
                env!("CARGO_CRATE_NAME"),
                env!("CARGO_PKG_VERSION")
            ));
        }
    }
    /// Override configuration fields from environment variables.
    ///
    /// Each field reads its own `get_env_name(..)` variable. From the
    /// call patterns below, `read_env_value::<T>`/`read_env_bool` return
    /// `Option<Option<T>>` — the outer `Some` means the variable is set,
    /// the inner value is the parse result — which is why Option-typed
    /// fields destructure one level (`Some(v)`) and plain-typed fields
    /// destructure two (`Some(Some(v))`).
    pub fn load_envs(&mut self) {
        if let Ok(v) = env::var(get_env_name("model")) {
            self.model_id = v;
        }
        if let Some(v) = super::read_env_value::<f64>(&get_env_name("temperature")) {
            self.temperature = v;
        }
        if let Some(v) = super::read_env_value::<f64>(&get_env_name("top_p")) {
            self.top_p = v;
        }
        if let Some(Some(v)) = super::read_env_bool(&get_env_name("dry_run")) {
            self.dry_run = v;
        }
        if let Some(Some(v)) = super::read_env_bool(&get_env_name("stream")) {
            self.stream = v;
        }
        if let Some(Some(v)) = super::read_env_bool(&get_env_name("save")) {
            self.save = v;
        }
        // Only "vi" is honored; any other value keeps the default keybindings.
        if let Ok(v) = env::var(get_env_name("keybindings"))
            && v == "vi"
        {
            self.keybindings = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("editor")) {
            self.editor = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("wrap")) {
            self.wrap = v;
        }
        if let Some(Some(v)) = super::read_env_bool(&get_env_name("wrap_code")) {
            self.wrap_code = v;
        }
        if let Some(Some(v)) = super::read_env_bool(&get_env_name("function_calling_support")) {
            self.function_calling_support = v;
        }
        // Mapping-style settings are parsed from JSON; invalid JSON is
        // silently ignored.
        if let Ok(v) = env::var(get_env_name("mapping_tools"))
            && let Ok(v) = serde_json::from_str(&v)
        {
            self.mapping_tools = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("enabled_tools")) {
            self.enabled_tools = v;
        }
        if let Some(Some(v)) = super::read_env_bool(&get_env_name("mcp_server_support")) {
            self.mcp_server_support = v;
        }
        if let Ok(v) = env::var(get_env_name("mapping_mcp_servers"))
            && let Ok(v) = serde_json::from_str(&v)
        {
            self.mapping_mcp_servers = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("enabled_mcp_servers")) {
            self.enabled_mcp_servers = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("repl_prelude")) {
            self.repl_prelude = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("cmd_prelude")) {
            self.cmd_prelude = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("agent_session")) {
            self.agent_session = v;
        }
        if let Some(v) = super::read_env_bool(&get_env_name("save_session")) {
            self.save_session = v;
        }
        if let Some(Some(v)) =
            super::read_env_value::<usize>(&get_env_name("compression_threshold"))
        {
            self.compression_threshold = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("summarization_prompt")) {
            self.summarization_prompt = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("summary_context_prompt")) {
            self.summary_context_prompt = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("rag_embedding_model")) {
            self.rag_embedding_model = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("rag_reranker_model")) {
            self.rag_reranker_model = v;
        }
        if let Some(Some(v)) = super::read_env_value::<usize>(&get_env_name("rag_top_k")) {
            self.rag_top_k = v;
        }
        if let Some(v) = super::read_env_value::<usize>(&get_env_name("rag_chunk_size")) {
            self.rag_chunk_size = v;
        }
        if let Some(v) = super::read_env_value::<usize>(&get_env_name("rag_chunk_overlap")) {
            self.rag_chunk_overlap = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("rag_template")) {
            self.rag_template = v;
        }
        if let Ok(v) = env::var(get_env_name("document_loaders"))
            && let Ok(v) = serde_json::from_str(&v)
        {
            self.document_loaders = v;
        }
        if let Some(Some(v)) = super::read_env_bool(&get_env_name("highlight")) {
            self.highlight = v;
        }
        // Honor the NO_COLOR convention regardless of the other settings.
        if *NO_COLOR {
            self.highlight = false;
        }
        // Pick a theme only when highlighting is on and none is configured:
        // env var first, then the terminal's reported color scheme.
        if self.highlight && self.theme.is_none() {
            if let Some(v) = super::read_env_value::<String>(&get_env_name("theme")) {
                self.theme = v;
            } else if *IS_STDOUT_TERMINAL
                && let Ok(color_scheme) = color_scheme(QueryOptions::default())
            {
                let theme = match color_scheme {
                    ColorScheme::Dark => "dark",
                    ColorScheme::Light => "light",
                };
                self.theme = Some(theme.into());
            }
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("left_prompt")) {
            self.left_prompt = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("right_prompt")) {
            self.right_prompt = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("user_agent")) {
            self.user_agent = v;
        }
        if let Some(Some(v)) = super::read_env_bool(&get_env_name("save_shell_history")) {
            self.save_shell_history = v;
        }
        if let Some(v) = super::read_env_value::<String>(&get_env_name("sync_models_url")) {
            self.sync_models_url = v;
        }
    }
}
#[allow(dead_code)]
impl AppConfig {
    /// Apply a default temperature (e.g. from a role/agent definition).
    pub fn set_temperature_default(&mut self, value: Option<f64>) {
        self.temperature = value;
    }
    /// Apply a default top_p sampling value.
    pub fn set_top_p_default(&mut self, value: Option<f64>) {
        self.top_p = value;
    }
    /// Apply a default enabled-tools list (comma-separated string).
    pub fn set_enabled_tools_default(&mut self, value: Option<String>) {
        self.enabled_tools = value;
    }
    /// Apply a default enabled-MCP-servers list (comma-separated string).
    pub fn set_enabled_mcp_servers_default(&mut self, value: Option<String>) {
        self.enabled_mcp_servers = value;
    }
    /// Apply a default session-saving preference.
    pub fn set_save_session_default(&mut self, value: Option<bool>) {
        self.save_session = value;
    }
    /// Apply a default compression threshold.
    ///
    /// NOTE(review): `None` stores `0` via `unwrap_or_default()`, not the
    /// `4000` used by `Default for AppConfig` — confirm callers intend
    /// `None` to mean 0 here rather than "restore the default".
    pub fn set_compression_threshold_default(&mut self, value: Option<usize>) {
        self.compression_threshold = value.unwrap_or_default();
    }
}
+45
View File
@@ -0,0 +1,45 @@
//! Shared global services for a running Loki process.
//!
//! `AppState` holds the services that are genuinely process-wide and
//! immutable during request handling: the frozen [`AppConfig`], the
//! credential [`Vault`](GlobalVault), the [`McpFactory`](super::mcp_factory::McpFactory)
//! for MCP subprocess sharing, and the [`RagCache`](super::rag_cache::RagCache)
//! for shared RAG instances. It is intended to be wrapped in `Arc`
//! and shared across every [`RequestContext`] that a frontend (CLI,
//! REPL, API) creates.
//!
//! This struct deliberately does **not** hold a live `McpRegistry`.
//! MCP server processes are scoped to whichever `RoleLike`
//! (role/session/agent) is currently active, because each scope may
//! demand a different enabled server set. Live MCP processes are
//! owned by per-scope
//! [`ToolScope`](super::tool_scope::ToolScope)s on the
//! [`RequestContext`] and acquired through `McpFactory`.
//!
//! # Phase 1 scope
//!
//! This is Phase 1 of the REST API refactor:
//!
//! * **Step 0** introduced this struct alongside the existing
//! [`Config`](super::Config)
//! * **Step 6.5** added the `mcp_factory` and `rag_cache` fields
//!
//! Neither field is wired into the runtime yet — they exist as
//! additive scaffolding that Step 8+ will connect when the entry
//! points migrate. See `docs/PHASE-1-IMPLEMENTATION-PLAN.md`.
use super::mcp_factory::McpFactory;
use super::rag_cache::RagCache;
use crate::config::AppConfig;
use crate::vault::GlobalVault;
use std::sync::Arc;
#[allow(dead_code)]
#[derive(Clone)]
pub struct AppState {
    /// Frozen, process-wide configuration (the serialized half of the
    /// legacy `Config`). Cloning `AppState` only bumps the refcount.
    pub config: Arc<AppConfig>,
    /// Credential vault shared by every request context in the process.
    pub vault: GlobalVault,
    /// Factory handing out shared MCP subprocess handles to scopes.
    pub mcp_factory: Arc<McpFactory>,
    /// Cache of shared RAG instances.
    pub rag_cache: Arc<RagCache>,
}
+439
View File
@@ -0,0 +1,439 @@
//! Transitional conversions between the legacy [`Config`] struct and the
//! new [`AppConfig`] + [`RequestContext`] split.
//!
//! These methods are the bridge that lets Phase 1 migrate incrementally:
//! the old `Config` stays intact while new code starts threading
//! `AppState` + `RequestContext` through specific callsites. Each
//! migrated callsite can go between the two representations using
//! [`Config::to_app_config`], [`Config::to_request_context`], and
//! [`Config::from_parts`] without the rest of the codebase noticing.
//!
//! This entire module is scheduled for deletion in Phase 1 Step 10,
//! once every callsite has been migrated and the legacy `Config` is
//! removed. Keeping the bridge isolated in one file makes that deletion
//! a single `rm` + one `mod bridge;` removal.
//!
//! # Lossy fields
//!
//! The round-trip `Config → AppConfig + RequestContext → Config` is
//! lossy for fields that the architecture plan is deliberately removing
//! during the refactor. Specifically:
//!
//! * **`mcp_registry`** — today's process-wide registry is being replaced
//! by per-`ToolScope` `McpRuntime`s managed by a new `McpFactory` on
//! `AppState`. Neither `AppConfig` nor `RequestContext` holds an
//! `McpRegistry` field, so [`Config::from_parts`] reconstructs it as
//! `None`. Callers that need the registry during the migration window
//! must keep a reference to the original `Config`.
//!
//! All other runtime fields (including `model`, `functions`,
//! `tool_call_tracker`) round-trip correctly.
use super::{AppConfig, AppState, Config, RequestContext};
use std::sync::Arc;
#[allow(dead_code)]
impl Config {
    /// Extract the serialized half of the `Config` into a fresh
    /// [`AppConfig`]. The original `Config` is unchanged.
    pub fn to_app_config(&self) -> AppConfig {
        // Pure field-by-field clone. Any new serialized field added to
        // `Config` must be mirrored here and in `from_parts` below, or
        // the round-trip tests will catch the omission.
        AppConfig {
            model_id: self.model_id.clone(),
            temperature: self.temperature,
            top_p: self.top_p,
            dry_run: self.dry_run,
            stream: self.stream,
            save: self.save,
            keybindings: self.keybindings.clone(),
            editor: self.editor.clone(),
            wrap: self.wrap.clone(),
            wrap_code: self.wrap_code,
            vault_password_file: self.vault_password_file.clone(),
            function_calling_support: self.function_calling_support,
            mapping_tools: self.mapping_tools.clone(),
            enabled_tools: self.enabled_tools.clone(),
            visible_tools: self.visible_tools.clone(),
            mcp_server_support: self.mcp_server_support,
            mapping_mcp_servers: self.mapping_mcp_servers.clone(),
            enabled_mcp_servers: self.enabled_mcp_servers.clone(),
            repl_prelude: self.repl_prelude.clone(),
            cmd_prelude: self.cmd_prelude.clone(),
            agent_session: self.agent_session.clone(),
            save_session: self.save_session,
            compression_threshold: self.compression_threshold,
            summarization_prompt: self.summarization_prompt.clone(),
            summary_context_prompt: self.summary_context_prompt.clone(),
            rag_embedding_model: self.rag_embedding_model.clone(),
            rag_reranker_model: self.rag_reranker_model.clone(),
            rag_top_k: self.rag_top_k,
            rag_chunk_size: self.rag_chunk_size,
            rag_chunk_overlap: self.rag_chunk_overlap,
            rag_template: self.rag_template.clone(),
            document_loaders: self.document_loaders.clone(),
            highlight: self.highlight,
            theme: self.theme.clone(),
            left_prompt: self.left_prompt.clone(),
            right_prompt: self.right_prompt.clone(),
            user_agent: self.user_agent.clone(),
            save_shell_history: self.save_shell_history,
            sync_models_url: self.sync_models_url.clone(),
            clients: self.clients.clone(),
        }
    }
    /// Extract the runtime half of the `Config` into a fresh
    /// [`RequestContext`] that shares the provided [`AppState`].
    /// The original `Config` is unchanged. See the module docs for
    /// the single lossy field (`mcp_registry`).
    pub fn to_request_context(&self, app: Arc<AppState>) -> RequestContext {
        RequestContext {
            app,
            macro_flag: self.macro_flag,
            info_flag: self.info_flag,
            working_mode: self.working_mode,
            model: self.model.clone(),
            functions: self.functions.clone(),
            agent_variables: self.agent_variables.clone(),
            role: self.role.clone(),
            session: self.session.clone(),
            rag: self.rag.clone(),
            agent: self.agent.clone(),
            last_message: self.last_message.clone(),
            tool_call_tracker: self.tool_call_tracker.clone(),
            supervisor: self.supervisor.clone(),
            parent_supervisor: self.parent_supervisor.clone(),
            self_agent_id: self.self_agent_id.clone(),
            current_depth: self.current_depth,
            inbox: self.inbox.clone(),
            root_escalation_queue: self.root_escalation_queue.clone(),
            // Fresh per-request state: scopes/runtimes are never carried
            // over from the legacy `Config`.
            tool_scope: super::tool_scope::ToolScope::default(),
            agent_runtime: None,
        }
    }
    /// Reconstruct a `Config` by merging the serialized half from an
    /// [`AppState`] with the runtime half from a [`RequestContext`].
    ///
    /// The resulting `Config` is a new owned value; neither input is
    /// consumed. The `mcp_registry` field is reconstructed as `None`
    /// because no split type owns it — see module docs.
    pub fn from_parts(app: &AppState, ctx: &RequestContext) -> Config {
        let cfg = &*app.config;
        Config {
            model_id: cfg.model_id.clone(),
            temperature: cfg.temperature,
            top_p: cfg.top_p,
            dry_run: cfg.dry_run,
            stream: cfg.stream,
            save: cfg.save,
            keybindings: cfg.keybindings.clone(),
            editor: cfg.editor.clone(),
            wrap: cfg.wrap.clone(),
            wrap_code: cfg.wrap_code,
            vault_password_file: cfg.vault_password_file.clone(),
            function_calling_support: cfg.function_calling_support,
            mapping_tools: cfg.mapping_tools.clone(),
            enabled_tools: cfg.enabled_tools.clone(),
            visible_tools: cfg.visible_tools.clone(),
            mcp_server_support: cfg.mcp_server_support,
            mapping_mcp_servers: cfg.mapping_mcp_servers.clone(),
            enabled_mcp_servers: cfg.enabled_mcp_servers.clone(),
            repl_prelude: cfg.repl_prelude.clone(),
            cmd_prelude: cfg.cmd_prelude.clone(),
            agent_session: cfg.agent_session.clone(),
            save_session: cfg.save_session,
            compression_threshold: cfg.compression_threshold,
            summarization_prompt: cfg.summarization_prompt.clone(),
            summary_context_prompt: cfg.summary_context_prompt.clone(),
            rag_embedding_model: cfg.rag_embedding_model.clone(),
            rag_reranker_model: cfg.rag_reranker_model.clone(),
            rag_top_k: cfg.rag_top_k,
            rag_chunk_size: cfg.rag_chunk_size,
            rag_chunk_overlap: cfg.rag_chunk_overlap,
            rag_template: cfg.rag_template.clone(),
            document_loaders: cfg.document_loaders.clone(),
            highlight: cfg.highlight,
            theme: cfg.theme.clone(),
            left_prompt: cfg.left_prompt.clone(),
            right_prompt: cfg.right_prompt.clone(),
            user_agent: cfg.user_agent.clone(),
            save_shell_history: cfg.save_shell_history,
            sync_models_url: cfg.sync_models_url.clone(),
            clients: cfg.clients.clone(),
            vault: app.vault.clone(),
            macro_flag: ctx.macro_flag,
            info_flag: ctx.info_flag,
            agent_variables: ctx.agent_variables.clone(),
            model: ctx.model.clone(),
            functions: ctx.functions.clone(),
            // Lossy: no split type owns the registry (see module docs).
            mcp_registry: None,
            working_mode: ctx.working_mode,
            last_message: ctx.last_message.clone(),
            role: ctx.role.clone(),
            session: ctx.session.clone(),
            rag: ctx.rag.clone(),
            agent: ctx.agent.clone(),
            tool_call_tracker: ctx.tool_call_tracker.clone(),
            supervisor: ctx.supervisor.clone(),
            parent_supervisor: ctx.parent_supervisor.clone(),
            self_agent_id: ctx.self_agent_id.clone(),
            current_depth: ctx.current_depth,
            inbox: ctx.inbox.clone(),
            root_escalation_queue: ctx.root_escalation_queue.clone(),
        }
    }
}
#[cfg(test)]
mod tests {
    use super::super::mcp_factory::McpFactory;
    use super::super::rag_cache::RagCache;
    use super::*;
    use crate::config::WorkingMode;
    /// Build a `Config` whose bridged fields all hold non-default values,
    /// so that a swapped or dropped field in the bridge methods shows up
    /// as an assertion failure rather than a silently-matching default.
    fn build_populated_config() -> Config {
        let mut cfg = Config::default();
        cfg.model_id = "openai:gpt-4o".into();
        cfg.temperature = Some(0.42);
        cfg.top_p = Some(0.9);
        cfg.dry_run = true;
        cfg.stream = false;
        cfg.save = true;
        cfg.keybindings = "vi".into();
        cfg.editor = Some("nvim".into());
        cfg.wrap = Some("80".into());
        cfg.wrap_code = true;
        cfg.function_calling_support = false;
        cfg.enabled_tools = Some("fs,web".into());
        cfg.visible_tools = Some(vec!["fs".into(), "web".into()]);
        cfg.mcp_server_support = false;
        cfg.enabled_mcp_servers = Some("github,jira".into());
        cfg.repl_prelude = Some("role:explain".into());
        cfg.cmd_prelude = Some("session:temp".into());
        cfg.agent_session = Some("shared".into());
        cfg.save_session = Some(true);
        cfg.compression_threshold = 8000;
        cfg.summarization_prompt = Some("be terse".into());
        cfg.summary_context_prompt = Some("recap:".into());
        cfg.rag_embedding_model = Some("openai:text-embedding-3-small".into());
        cfg.rag_reranker_model = Some("voyage:rerank-2".into());
        cfg.rag_top_k = 12;
        cfg.rag_chunk_size = Some(1024);
        cfg.rag_chunk_overlap = Some(128);
        cfg.rag_template = Some("custom template".into());
        cfg.highlight = false;
        cfg.theme = Some("light".into());
        cfg.left_prompt = Some("> ".into());
        cfg.right_prompt = Some(" <".into());
        cfg.user_agent = Some("loki-test/1.0".into());
        cfg.save_shell_history = false;
        cfg.sync_models_url = Some("https://example.com/models.yaml".into());
        cfg.macro_flag = true;
        cfg.info_flag = true;
        cfg.working_mode = WorkingMode::Repl;
        cfg.current_depth = 3;
        cfg.self_agent_id = Some("agent_42".into());
        cfg
    }
    /// `to_app_config` must copy every serialized field verbatim.
    #[test]
    fn to_app_config_copies_every_serialized_field() {
        let cfg = build_populated_config();
        let app = cfg.to_app_config();
        assert_eq!(app.model_id, cfg.model_id);
        assert_eq!(app.temperature, cfg.temperature);
        assert_eq!(app.top_p, cfg.top_p);
        assert_eq!(app.dry_run, cfg.dry_run);
        assert_eq!(app.stream, cfg.stream);
        assert_eq!(app.save, cfg.save);
        assert_eq!(app.keybindings, cfg.keybindings);
        assert_eq!(app.editor, cfg.editor);
        assert_eq!(app.wrap, cfg.wrap);
        assert_eq!(app.wrap_code, cfg.wrap_code);
        assert_eq!(app.function_calling_support, cfg.function_calling_support);
        assert_eq!(app.enabled_tools, cfg.enabled_tools);
        assert_eq!(app.visible_tools, cfg.visible_tools);
        assert_eq!(app.mcp_server_support, cfg.mcp_server_support);
        assert_eq!(app.enabled_mcp_servers, cfg.enabled_mcp_servers);
        assert_eq!(app.repl_prelude, cfg.repl_prelude);
        assert_eq!(app.cmd_prelude, cfg.cmd_prelude);
        assert_eq!(app.agent_session, cfg.agent_session);
        assert_eq!(app.save_session, cfg.save_session);
        assert_eq!(app.compression_threshold, cfg.compression_threshold);
        assert_eq!(app.summarization_prompt, cfg.summarization_prompt);
        assert_eq!(app.summary_context_prompt, cfg.summary_context_prompt);
        assert_eq!(app.rag_embedding_model, cfg.rag_embedding_model);
        assert_eq!(app.rag_reranker_model, cfg.rag_reranker_model);
        assert_eq!(app.rag_top_k, cfg.rag_top_k);
        assert_eq!(app.rag_chunk_size, cfg.rag_chunk_size);
        assert_eq!(app.rag_chunk_overlap, cfg.rag_chunk_overlap);
        assert_eq!(app.rag_template, cfg.rag_template);
        assert_eq!(app.highlight, cfg.highlight);
        assert_eq!(app.theme, cfg.theme);
        assert_eq!(app.left_prompt, cfg.left_prompt);
        assert_eq!(app.right_prompt, cfg.right_prompt);
        assert_eq!(app.user_agent, cfg.user_agent);
        assert_eq!(app.save_shell_history, cfg.save_shell_history);
        assert_eq!(app.sync_models_url, cfg.sync_models_url);
    }
    /// `to_request_context` must copy the scalar runtime fields and leave
    /// the per-request optional state empty (fresh context).
    #[test]
    fn to_request_context_copies_every_runtime_field() {
        let cfg = build_populated_config();
        let app_config = Arc::new(cfg.to_app_config());
        let app_state = Arc::new(AppState {
            config: app_config,
            vault: cfg.vault.clone(),
            mcp_factory: Arc::new(McpFactory::new()),
            rag_cache: Arc::new(RagCache::new()),
        });
        let ctx = cfg.to_request_context(app_state);
        assert_eq!(ctx.macro_flag, cfg.macro_flag);
        assert_eq!(ctx.info_flag, cfg.info_flag);
        assert_eq!(ctx.working_mode, cfg.working_mode);
        assert_eq!(ctx.current_depth, cfg.current_depth);
        assert_eq!(ctx.self_agent_id, cfg.self_agent_id);
        assert!(ctx.role.is_none());
        assert!(ctx.session.is_none());
        assert!(ctx.rag.is_none());
        assert!(ctx.agent.is_none());
        assert!(ctx.last_message.is_none());
        assert!(ctx.supervisor.is_none());
        assert!(ctx.parent_supervisor.is_none());
        assert!(ctx.inbox.is_none());
        assert!(ctx.root_escalation_queue.is_none());
        assert!(ctx.agent_variables.is_none());
    }
    /// Full round-trip `Config → (AppConfig, RequestContext) → Config`
    /// must preserve every field except the documented lossy one.
    #[test]
    fn round_trip_preserves_all_non_lossy_fields() {
        let original = build_populated_config();
        let app_config = Arc::new(original.to_app_config());
        let app_state = Arc::new(AppState {
            config: app_config,
            vault: original.vault.clone(),
            mcp_factory: Arc::new(McpFactory::new()),
            rag_cache: Arc::new(RagCache::new()),
        });
        let ctx = original.to_request_context(Arc::clone(&app_state));
        let rebuilt = Config::from_parts(&app_state, &ctx);
        assert_eq!(rebuilt.model_id, original.model_id);
        assert_eq!(rebuilt.temperature, original.temperature);
        assert_eq!(rebuilt.top_p, original.top_p);
        assert_eq!(rebuilt.dry_run, original.dry_run);
        assert_eq!(rebuilt.stream, original.stream);
        assert_eq!(rebuilt.save, original.save);
        assert_eq!(rebuilt.keybindings, original.keybindings);
        assert_eq!(rebuilt.editor, original.editor);
        assert_eq!(rebuilt.wrap, original.wrap);
        assert_eq!(rebuilt.wrap_code, original.wrap_code);
        assert_eq!(
            rebuilt.function_calling_support,
            original.function_calling_support
        );
        assert_eq!(rebuilt.enabled_tools, original.enabled_tools);
        assert_eq!(rebuilt.visible_tools, original.visible_tools);
        assert_eq!(rebuilt.mcp_server_support, original.mcp_server_support);
        assert_eq!(rebuilt.enabled_mcp_servers, original.enabled_mcp_servers);
        assert_eq!(rebuilt.repl_prelude, original.repl_prelude);
        assert_eq!(rebuilt.cmd_prelude, original.cmd_prelude);
        assert_eq!(rebuilt.agent_session, original.agent_session);
        assert_eq!(rebuilt.save_session, original.save_session);
        assert_eq!(
            rebuilt.compression_threshold,
            original.compression_threshold
        );
        assert_eq!(rebuilt.summarization_prompt, original.summarization_prompt);
        assert_eq!(
            rebuilt.summary_context_prompt,
            original.summary_context_prompt
        );
        assert_eq!(rebuilt.rag_embedding_model, original.rag_embedding_model);
        assert_eq!(rebuilt.rag_reranker_model, original.rag_reranker_model);
        assert_eq!(rebuilt.rag_top_k, original.rag_top_k);
        assert_eq!(rebuilt.rag_chunk_size, original.rag_chunk_size);
        assert_eq!(rebuilt.rag_chunk_overlap, original.rag_chunk_overlap);
        assert_eq!(rebuilt.rag_template, original.rag_template);
        assert_eq!(rebuilt.highlight, original.highlight);
        assert_eq!(rebuilt.theme, original.theme);
        assert_eq!(rebuilt.left_prompt, original.left_prompt);
        assert_eq!(rebuilt.right_prompt, original.right_prompt);
        assert_eq!(rebuilt.user_agent, original.user_agent);
        assert_eq!(rebuilt.save_shell_history, original.save_shell_history);
        assert_eq!(rebuilt.sync_models_url, original.sync_models_url);
        assert_eq!(rebuilt.macro_flag, original.macro_flag);
        assert_eq!(rebuilt.info_flag, original.info_flag);
        assert_eq!(rebuilt.working_mode, original.working_mode);
        assert_eq!(rebuilt.current_depth, original.current_depth);
        assert_eq!(rebuilt.self_agent_id, original.self_agent_id);
        // Lossy field: mcp_registry is always reconstructed as None
        assert!(rebuilt.mcp_registry.is_none());
    }
    /// The round-trip must also hold for an all-default `Config`.
    #[test]
    fn round_trip_default_config() {
        let original = Config::default();
        let app_config = Arc::new(original.to_app_config());
        let app_state = Arc::new(AppState {
            config: app_config,
            vault: original.vault.clone(),
            mcp_factory: Arc::new(McpFactory::new()),
            rag_cache: Arc::new(RagCache::new()),
        });
        let ctx = original.to_request_context(Arc::clone(&app_state));
        let rebuilt = Config::from_parts(&app_state, &ctx);
        assert_eq!(rebuilt.model_id, original.model_id);
        assert_eq!(
            rebuilt.compression_threshold,
            original.compression_threshold
        );
        assert_eq!(rebuilt.rag_top_k, original.rag_top_k);
        assert_eq!(rebuilt.highlight, original.highlight);
        assert_eq!(rebuilt.stream, original.stream);
        assert_eq!(rebuilt.working_mode, original.working_mode);
    }
}
+10 -5
View File
@@ -239,12 +239,17 @@ impl Input {
patch_messages(&mut messages, model); patch_messages(&mut messages, model);
model.guard_max_input_tokens(&messages)?; model.guard_max_input_tokens(&messages)?;
let (temperature, top_p) = (self.role().temperature(), self.role().top_p()); let (temperature, top_p) = (self.role().temperature(), self.role().top_p());
let functions = self.config.read().select_functions(self.role()); let functions = if model.supports_function_calling() {
if let Some(vec) = &functions { let fns = self.config.read().select_functions(self.role());
for def in vec { if let Some(vec) = &fns {
debug!("Function definition: {:?}", def.name); for def in vec {
debug!("Function definition: {:?}", def.name);
}
} }
} fns
} else {
None
};
Ok(ChatCompletionsData { Ok(ChatCompletionsData {
messages, messages,
temperature, temperature,
+3 -2
View File
@@ -1,3 +1,4 @@
use crate::config::paths;
use crate::config::{Config, GlobalConfig, RoleLike, ensure_parent_exists}; use crate::config::{Config, GlobalConfig, RoleLike, ensure_parent_exists};
use crate::repl::{run_repl_command, split_args_text}; use crate::repl::{run_repl_command, split_args_text};
use crate::utils::{AbortSignal, multiline_text}; use crate::utils::{AbortSignal, multiline_text};
@@ -63,7 +64,7 @@ impl Macro {
pub fn install_macros() -> Result<()> { pub fn install_macros() -> Result<()> {
info!( info!(
"Installing built-in macros in {}", "Installing built-in macros in {}",
Config::macros_dir().display() paths::macros_dir().display()
); );
for file in MacroAssets::iter() { for file in MacroAssets::iter() {
@@ -71,7 +72,7 @@ impl Macro {
let embedded_file = MacroAssets::get(&file) let embedded_file = MacroAssets::get(&file)
.ok_or_else(|| anyhow!("Failed to load embedded macro file: {}", file.as_ref()))?; .ok_or_else(|| anyhow!("Failed to load embedded macro file: {}", file.as_ref()))?;
let content = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) }; let content = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) };
let file_path = Config::macros_dir().join(file.as_ref()); let file_path = paths::macros_dir().join(file.as_ref());
if file_path.exists() { if file_path.exists() {
debug!( debug!(
+93
View File
@@ -0,0 +1,93 @@
//! Per-process factory for MCP subprocess handles.
//!
//! `McpFactory` lives on [`AppState`](super::AppState) and is the
//! single entrypoint that scopes use to obtain `Arc<ConnectedServer>`
//! handles for MCP tool servers. Multiple scopes requesting the same
//! server can (eventually) share a single subprocess via `Arc`
//! reference counting.
//!
//! # Phase 1 Step 6.5 scope
//!
//! This file introduces the factory scaffolding with a trivial
//! implementation:
//!
//! * `active` — `Mutex<HashMap<McpServerKey, Weak<ConnectedServer>>>`
//! for future Arc-based sharing across scopes
//! * `acquire` — unimplemented stub for now; will be filled in when
//! Step 8 rewrites `use_role` / `use_session` / `use_agent` to
//! actually build `ToolScope`s
//!
//! The full design (idle pool, reaper task, per-server TTL, health
//! checks, graceful shutdown) lands in **Phase 5** per
//! `docs/PHASE-5-IMPLEMENTATION-PLAN.md`. Phase 1 Step 6.5 ships just
//! enough for the type to exist on `AppState` and participate in
//! construction / test round-trips.
//!
//! The key type `McpServerKey` hashes the server name plus its full
//! command/args/env so that two scopes requesting an identically-
//! configured server share an `Arc`, while two scopes requesting
//! differently-configured servers (e.g., different API tokens) get
//! independent subprocesses. This is the sharing-vs-isolation property
//! described in `docs/REST-API-ARCHITECTURE.md` section 5.
#![allow(dead_code)]
use crate::mcp::ConnectedServer;
use parking_lot::Mutex;
use std::collections::HashMap;
use std::sync::{Arc, Weak};
/// Identity of a configured MCP server, used as the subprocess-sharing key.
///
/// Two keys compare equal only when the name, command, argument list
/// (in invocation order), and environment agree, so identically-configured
/// scopes share one subprocess while differently-configured scopes stay
/// isolated (see the sharing-vs-isolation discussion in the module docs).
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct McpServerKey {
    /// Logical server name from the configuration.
    pub name: String,
    /// Executable used to launch the server.
    pub command: String,
    /// Command-line arguments, preserved in invocation order.
    pub args: Vec<String>,
    /// Environment variables, normalized to sorted order.
    pub env: Vec<(String, String)>,
}
impl McpServerKey {
    /// Build a key from the server's launch configuration.
    ///
    /// The environment is sorted because env-var ordering is irrelevant
    /// to the launched process, so permutations of the same environment
    /// must hash identically. Arguments are deliberately NOT sorted:
    /// CLI argument order is significant (positional arguments,
    /// `--flag value` pairs), and sorting them would make two genuinely
    /// different server invocations collide on one shared subprocess —
    /// breaking the isolation property this key exists to guarantee.
    pub fn new(
        name: impl Into<String>,
        command: impl Into<String>,
        args: impl IntoIterator<Item = String>,
        env: impl IntoIterator<Item = (String, String)>,
    ) -> Self {
        let args: Vec<String> = args.into_iter().collect();
        let mut env: Vec<(String, String)> = env.into_iter().collect();
        env.sort();
        Self {
            name: name.into(),
            command: command.into(),
            args,
            env,
        }
    }
}
/// Process-wide registry of live MCP server handles, keyed by full
/// launch configuration.
#[derive(Default)]
pub struct McpFactory {
    // Weak references let a server's lifetime be governed by the scopes
    // holding the Arcs; readers check liveness via `upgrade`/`strong_count`.
    active: Mutex<HashMap<McpServerKey, Weak<ConnectedServer>>>,
}
impl McpFactory {
    /// Create an empty factory with no tracked servers.
    pub fn new() -> Self {
        Self::default()
    }
    /// Number of servers that currently have at least one live handle.
    /// Entries whose last `Arc` was dropped are ignored.
    pub fn active_count(&self) -> usize {
        let map = self.active.lock();
        map.values().filter(|w| w.strong_count() > 0).count()
    }
    /// Return a shared handle for `key` if a live server is registered.
    pub fn try_get_active(&self, key: &McpServerKey) -> Option<Arc<ConnectedServer>> {
        let map = self.active.lock();
        map.get(key).and_then(|weak| weak.upgrade())
    }
    /// Register (or replace) the handle for `key`.
    ///
    /// Dead entries (servers whose last `Arc` was dropped) are pruned at
    /// the same time, so the map cannot grow without bound as scopes come
    /// and go. Pruning is unobservable to readers: `active_count` already
    /// filters dead weaks and `try_get_active` fails their `upgrade`.
    pub fn insert_active(&self, key: McpServerKey, handle: &Arc<ConnectedServer>) {
        let mut map = self.active.lock();
        map.retain(|_, weak| weak.strong_count() > 0);
        map.insert(key, Arc::downgrade(handle));
    }
}
+63 -267
View File
@@ -1,13 +1,28 @@
mod agent; mod agent;
mod agent_runtime;
mod app_config;
mod app_state;
mod bridge;
mod input; mod input;
mod macros; mod macros;
mod mcp_factory;
pub(crate) mod paths;
mod prompts; mod prompts;
mod rag_cache;
mod request_context;
mod role; mod role;
mod session; mod session;
pub(crate) mod todo; pub(crate) mod todo;
mod tool_scope;
pub use self::agent::{Agent, AgentVariables, complete_agent_variables, list_agents}; pub use self::agent::{Agent, AgentVariables, complete_agent_variables, list_agents};
#[allow(unused_imports)]
pub use self::app_config::AppConfig;
#[allow(unused_imports)]
pub use self::app_state::AppState;
pub use self::input::Input; pub use self::input::Input;
#[allow(unused_imports)]
pub use self::request_context::RequestContext;
pub use self::role::{ pub use self::role::{
CODE_ROLE, CREATE_TITLE_ROLE, EXPLAIN_SHELL_ROLE, Role, RoleLike, SHELL_ROLE, CODE_ROLE, CREATE_TITLE_ROLE, EXPLAIN_SHELL_ROLE, Role, RoleLike, SHELL_ROLE,
}; };
@@ -39,7 +54,6 @@ use fancy_regex::Regex;
use indexmap::IndexMap; use indexmap::IndexMap;
use indoc::formatdoc; use indoc::formatdoc;
use inquire::{Confirm, MultiSelect, Select, Text, list_option::ListOption, validator::Validation}; use inquire::{Confirm, MultiSelect, Select, Text, list_option::ListOption, validator::Validation};
use log::LevelFilter;
use parking_lot::RwLock; use parking_lot::RwLock;
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use serde_json::json; use serde_json::json;
@@ -330,7 +344,7 @@ impl Config {
log_path: Option<PathBuf>, log_path: Option<PathBuf>,
abort_signal: AbortSignal, abort_signal: AbortSignal,
) -> Result<Self> { ) -> Result<Self> {
let config_path = Self::config_file(); let config_path = paths::config_file();
let (mut config, content) = if !config_path.exists() { let (mut config, content) = if !config_path.exists() {
match env::var(get_env_name("provider")) match env::var(get_env_name("provider"))
.ok() .ok()
@@ -407,38 +421,6 @@ impl Config {
Ok(config) Ok(config)
} }
pub fn config_dir() -> PathBuf {
if let Ok(v) = env::var(get_env_name("config_dir")) {
PathBuf::from(v)
} else if let Ok(v) = env::var("XDG_CONFIG_HOME") {
PathBuf::from(v).join(env!("CARGO_CRATE_NAME"))
} else {
let dir = dirs::config_dir().expect("No user's config directory");
dir.join(env!("CARGO_CRATE_NAME"))
}
}
pub fn local_path(name: &str) -> PathBuf {
Self::config_dir().join(name)
}
pub fn cache_path() -> PathBuf {
let base_dir = dirs::cache_dir().unwrap_or_else(env::temp_dir);
base_dir.join(env!("CARGO_CRATE_NAME"))
}
pub fn log_path() -> PathBuf {
Config::cache_path().join(format!("{}.log", env!("CARGO_CRATE_NAME")))
}
pub fn config_file() -> PathBuf {
match env::var(get_env_name("config_file")) {
Ok(value) => PathBuf::from(value),
Err(_) => Self::local_path(CONFIG_FILE_NAME),
}
}
pub fn vault_password_file(&self) -> PathBuf { pub fn vault_password_file(&self) -> PathBuf {
match &self.vault_password_file { match &self.vault_password_file {
Some(path) => match path.exists() { Some(path) => match path.exists() {
@@ -449,42 +431,13 @@ impl Config {
} }
} }
pub fn roles_dir() -> PathBuf {
match env::var(get_env_name("roles_dir")) {
Ok(value) => PathBuf::from(value),
Err(_) => Self::local_path(ROLES_DIR_NAME),
}
}
pub fn role_file(name: &str) -> PathBuf {
Self::roles_dir().join(format!("{name}.md"))
}
pub fn macros_dir() -> PathBuf {
match env::var(get_env_name("macros_dir")) {
Ok(value) => PathBuf::from(value),
Err(_) => Self::local_path(MACROS_DIR_NAME),
}
}
pub fn macro_file(name: &str) -> PathBuf {
Self::macros_dir().join(format!("{name}.yaml"))
}
pub fn env_file() -> PathBuf {
match env::var(get_env_name("env_file")) {
Ok(value) => PathBuf::from(value),
Err(_) => Self::local_path(ENV_FILE_NAME),
}
}
pub fn messages_file(&self) -> PathBuf { pub fn messages_file(&self) -> PathBuf {
match &self.agent { match &self.agent {
None => match env::var(get_env_name("messages_file")) { None => match env::var(get_env_name("messages_file")) {
Ok(value) => PathBuf::from(value), Ok(value) => PathBuf::from(value),
Err(_) => Self::cache_path().join(MESSAGES_FILE_NAME), Err(_) => paths::cache_path().join(MESSAGES_FILE_NAME),
}, },
Some(agent) => Self::cache_path() Some(agent) => paths::cache_path()
.join(AGENTS_DIR_NAME) .join(AGENTS_DIR_NAME)
.join(agent.name()) .join(agent.name())
.join(MESSAGES_FILE_NAME), .join(MESSAGES_FILE_NAME),
@@ -495,46 +448,12 @@ impl Config {
match &self.agent { match &self.agent {
None => match env::var(get_env_name("sessions_dir")) { None => match env::var(get_env_name("sessions_dir")) {
Ok(value) => PathBuf::from(value), Ok(value) => PathBuf::from(value),
Err(_) => Self::local_path(SESSIONS_DIR_NAME), Err(_) => paths::local_path(SESSIONS_DIR_NAME),
}, },
Some(agent) => Self::agent_data_dir(agent.name()).join(SESSIONS_DIR_NAME), Some(agent) => paths::agent_data_dir(agent.name()).join(SESSIONS_DIR_NAME),
} }
} }
pub fn rags_dir() -> PathBuf {
match env::var(get_env_name("rags_dir")) {
Ok(value) => PathBuf::from(value),
Err(_) => Self::local_path(RAGS_DIR_NAME),
}
}
pub fn functions_dir() -> PathBuf {
match env::var(get_env_name("functions_dir")) {
Ok(value) => PathBuf::from(value),
Err(_) => Self::local_path(FUNCTIONS_DIR_NAME),
}
}
pub fn functions_bin_dir() -> PathBuf {
Self::functions_dir().join(FUNCTIONS_BIN_DIR_NAME)
}
pub fn mcp_config_file() -> PathBuf {
Self::functions_dir().join(MCP_FILE_NAME)
}
pub fn global_tools_dir() -> PathBuf {
Self::functions_dir().join(GLOBAL_TOOLS_DIR_NAME)
}
pub fn global_utils_dir() -> PathBuf {
Self::functions_dir().join(GLOBAL_TOOLS_UTILS_DIR_NAME)
}
pub fn bash_prompt_utils_file() -> PathBuf {
Self::global_utils_dir().join(BASH_PROMPT_UTILS_FILE_NAME)
}
pub fn session_file(&self, name: &str) -> PathBuf { pub fn session_file(&self, name: &str) -> PathBuf {
match name.split_once("/") { match name.split_once("/") {
Some((dir, name)) => self.sessions_dir().join(dir).join(format!("{name}.yaml")), Some((dir, name)) => self.sessions_dir().join(dir).join(format!("{name}.yaml")),
@@ -544,58 +463,11 @@ impl Config {
pub fn rag_file(&self, name: &str) -> PathBuf { pub fn rag_file(&self, name: &str) -> PathBuf {
match &self.agent { match &self.agent {
Some(agent) => Self::agent_rag_file(agent.name(), name), Some(agent) => paths::agent_rag_file(agent.name(), name),
None => Self::rags_dir().join(format!("{name}.yaml")), None => paths::rags_dir().join(format!("{name}.yaml")),
} }
} }
pub fn agents_data_dir() -> PathBuf {
Self::local_path(AGENTS_DIR_NAME)
}
pub fn agent_data_dir(name: &str) -> PathBuf {
match env::var(format!("{}_DATA_DIR", normalize_env_name(name))) {
Ok(value) => PathBuf::from(value),
Err(_) => Self::agents_data_dir().join(name),
}
}
pub fn agent_config_file(name: &str) -> PathBuf {
match env::var(format!("{}_CONFIG_FILE", normalize_env_name(name))) {
Ok(value) => PathBuf::from(value),
Err(_) => Self::agent_data_dir(name).join(CONFIG_FILE_NAME),
}
}
pub fn agent_bin_dir(name: &str) -> PathBuf {
Self::agent_data_dir(name).join(FUNCTIONS_BIN_DIR_NAME)
}
pub fn agent_rag_file(agent_name: &str, rag_name: &str) -> PathBuf {
Self::agent_data_dir(agent_name).join(format!("{rag_name}.yaml"))
}
pub fn agent_functions_file(name: &str) -> Result<PathBuf> {
let allowed = ["tools.sh", "tools.py", "tools.js"];
for entry in read_dir(Self::agent_data_dir(name))? {
let entry = entry?;
if let Some(file) = entry.file_name().to_str()
&& allowed.contains(&file)
{
return Ok(entry.path());
}
}
Err(anyhow!(
"No tools script found in agent functions directory"
))
}
pub fn models_override_file() -> PathBuf {
Self::local_path("models-override.yaml")
}
pub fn state(&self) -> StateFlags { pub fn state(&self) -> StateFlags {
let mut flags = StateFlags::empty(); let mut flags = StateFlags::empty();
if let Some(session) = &self.session { if let Some(session) = &self.session {
@@ -619,23 +491,8 @@ impl Config {
flags flags
} }
pub fn log_config() -> Result<(LevelFilter, Option<PathBuf>)> {
let log_level = env::var(get_env_name("log_level"))
.ok()
.and_then(|v| v.parse().ok())
.unwrap_or(match cfg!(debug_assertions) {
true => LevelFilter::Debug,
false => LevelFilter::Info,
});
let log_path = match env::var(get_env_name("log_path")) {
Ok(v) => Some(PathBuf::from(v)),
Err(_) => Some(Config::log_path()),
};
Ok((log_level, log_path))
}
pub fn edit_config(&self) -> Result<()> { pub fn edit_config(&self) -> Result<()> {
let config_path = Self::config_file(); let config_path = paths::config_file();
let editor = self.editor()?; let editor = self.editor()?;
edit_file(&editor, &config_path)?; edit_file(&editor, &config_path)?;
println!( println!(
@@ -765,21 +622,21 @@ impl Config {
("wrap_code", self.wrap_code.to_string()), ("wrap_code", self.wrap_code.to_string()),
("highlight", self.highlight.to_string()), ("highlight", self.highlight.to_string()),
("theme", format_option_value(&self.theme)), ("theme", format_option_value(&self.theme)),
("config_file", display_path(&Self::config_file())), ("config_file", display_path(&paths::config_file())),
("env_file", display_path(&Self::env_file())), ("env_file", display_path(&paths::env_file())),
("agents_dir", display_path(&Self::agents_data_dir())), ("agents_dir", display_path(&paths::agents_data_dir())),
("roles_dir", display_path(&Self::roles_dir())), ("roles_dir", display_path(&paths::roles_dir())),
("sessions_dir", display_path(&self.sessions_dir())), ("sessions_dir", display_path(&self.sessions_dir())),
("rags_dir", display_path(&Self::rags_dir())), ("rags_dir", display_path(&paths::rags_dir())),
("macros_dir", display_path(&Self::macros_dir())), ("macros_dir", display_path(&paths::macros_dir())),
("functions_dir", display_path(&Self::functions_dir())), ("functions_dir", display_path(&paths::functions_dir())),
("messages_file", display_path(&self.messages_file())), ("messages_file", display_path(&self.messages_file())),
( (
"vault_password_file", "vault_password_file",
display_path(&self.vault_password_file()), display_path(&self.vault_password_file()),
), ),
]; ];
if let Ok((_, Some(log_path))) = Self::log_config() { if let Ok((_, Some(log_path))) = paths::log_config() {
items.push(("log_path", display_path(&log_path))); items.push(("log_path", display_path(&log_path)));
} }
let output = items let output = items
@@ -933,11 +790,11 @@ impl Config {
pub fn delete(config: &GlobalConfig, kind: &str) -> Result<()> { pub fn delete(config: &GlobalConfig, kind: &str) -> Result<()> {
let (dir, file_ext) = match kind { let (dir, file_ext) = match kind {
"role" => (Self::roles_dir(), Some(".md")), "role" => (paths::roles_dir(), Some(".md")),
"session" => (config.read().sessions_dir(), Some(".yaml")), "session" => (config.read().sessions_dir(), Some(".yaml")),
"rag" => (Self::rags_dir(), Some(".yaml")), "rag" => (paths::rags_dir(), Some(".yaml")),
"macro" => (Self::macros_dir(), Some(".yaml")), "macro" => (paths::macros_dir(), Some(".yaml")),
"agent-data" => (Self::agents_data_dir(), None), "agent-data" => (paths::agents_data_dir(), None),
_ => bail!("Unknown kind '{kind}'"), _ => bail!("Unknown kind '{kind}'"),
}; };
let names = match read_dir(&dir) { let names = match read_dir(&dir) {
@@ -1046,7 +903,7 @@ impl Config {
pub fn set_rag_reranker_model(config: &GlobalConfig, value: Option<String>) -> Result<()> { pub fn set_rag_reranker_model(config: &GlobalConfig, value: Option<String>) -> Result<()> {
if let Some(id) = &value { if let Some(id) = &value {
Model::retrieve_model(&config.read(), id, ModelType::Reranker)?; Model::retrieve_model(&config.read().to_app_config(), id, ModelType::Reranker)?;
} }
let has_rag = config.read().rag.is_some(); let has_rag = config.read().rag.is_some();
match has_rag { match has_rag {
@@ -1099,7 +956,7 @@ impl Config {
} }
pub fn set_model(&mut self, model_id: &str) -> Result<()> { pub fn set_model(&mut self, model_id: &str) -> Result<()> {
let model = Model::retrieve_model(self, model_id, ModelType::Chat)?; let model = Model::retrieve_model(&self.to_app_config(), model_id, ModelType::Chat)?;
match self.role_like_mut() { match self.role_like_mut() {
Some(role_like) => role_like.set_model(model), Some(role_like) => role_like.set_model(model),
None => { None => {
@@ -1207,9 +1064,9 @@ impl Config {
} }
pub fn retrieve_role(&self, name: &str) -> Result<Role> { pub fn retrieve_role(&self, name: &str) -> Result<Role> {
let names = Self::list_roles(false); let names = paths::list_roles(false);
let mut role = if names.contains(&name.to_string()) { let mut role = if names.contains(&name.to_string()) {
let path = Self::role_file(name); let path = paths::role_file(name);
let content = read_to_string(&path)?; let content = read_to_string(&path)?;
Role::new(name, &content) Role::new(name, &content)
} else { } else {
@@ -1219,7 +1076,8 @@ impl Config {
match role.model_id() { match role.model_id() {
Some(model_id) => { Some(model_id) => {
if current_model.id() != model_id { if current_model.id() != model_id {
let model = Model::retrieve_model(self, model_id, ModelType::Chat)?; let model =
Model::retrieve_model(&self.to_app_config(), model_id, ModelType::Chat)?;
role.set_model(model); role.set_model(model);
} else { } else {
role.set_model(current_model); role.set_model(current_model);
@@ -1274,7 +1132,7 @@ impl Config {
} }
pub fn upsert_role(&mut self, name: &str) -> Result<()> { pub fn upsert_role(&mut self, name: &str) -> Result<()> {
let role_path = Self::role_file(name); let role_path = paths::role_file(name);
ensure_parent_exists(&role_path)?; ensure_parent_exists(&role_path)?;
let editor = self.editor()?; let editor = self.editor()?;
edit_file(&editor, &role_path)?; edit_file(&editor, &role_path)?;
@@ -1311,7 +1169,7 @@ impl Config {
}) })
.prompt()?; .prompt()?;
} }
let role_path = Self::role_file(&role_name); let role_path = paths::role_file(&role_name);
if let Some(role) = self.role.as_mut() { if let Some(role) = self.role.as_mut() {
role.save(&role_name, &role_path, self.working_mode.is_repl())?; role.save(&role_name, &role_path, self.working_mode.is_repl())?;
} }
@@ -1319,32 +1177,6 @@ impl Config {
Ok(()) Ok(())
} }
pub fn list_roles(with_builtin: bool) -> Vec<String> {
let mut names = HashSet::new();
if let Ok(rd) = read_dir(Self::roles_dir()) {
for entry in rd.flatten() {
if let Some(name) = entry
.file_name()
.to_str()
.and_then(|v| v.strip_suffix(".md"))
{
names.insert(name.to_string());
}
}
}
if with_builtin {
names.extend(Role::list_builtin_role_names());
}
let mut names: Vec<_> = names.into_iter().collect();
names.sort_unstable();
names
}
pub fn has_role(name: &str) -> bool {
let names = Self::list_roles(true);
names.contains(&name.to_string())
}
pub async fn use_session_safely( pub async fn use_session_safely(
config: &GlobalConfig, config: &GlobalConfig,
session_name: Option<&str>, session_name: Option<&str>,
@@ -1792,23 +1624,6 @@ impl Config {
Ok(text) Ok(text)
} }
pub fn list_rags() -> Vec<String> {
match read_dir(Self::rags_dir()) {
Ok(rd) => {
let mut names = vec![];
for entry in rd.flatten() {
let name = entry.file_name();
if let Some(name) = name.to_string_lossy().strip_suffix(".yaml") {
names.push(name.to_string());
}
}
names.sort_unstable();
names
}
Err(_) => vec![],
}
}
pub fn rag_template(&self, embeddings: &str, sources: &str, text: &str) -> String { pub fn rag_template(&self, embeddings: &str, sources: &str, text: &str) -> String {
if embeddings.is_empty() { if embeddings.is_empty() {
return text.to_string(); return text.to_string();
@@ -1834,6 +1649,12 @@ impl Config {
bail!("Already in an agent, please run '.exit agent' first to exit the current agent."); bail!("Already in an agent, please run '.exit agent' first to exit the current agent.");
} }
let agent = Agent::init(config, agent_name, abort_signal.clone()).await?; let agent = Agent::init(config, agent_name, abort_signal.clone()).await?;
if !agent.model().supports_function_calling() {
eprintln!(
"Warning: The model '{}' does not support function calling. Agent tools (including todo, spawning, and user interaction) will not be available.",
agent.model().id()
);
}
let session = session_name.map(|v| v.to_string()).or_else(|| { let session = session_name.map(|v| v.to_string()).or_else(|| {
if config.read().macro_flag { if config.read().macro_flag {
None None
@@ -1881,7 +1702,7 @@ impl Config {
Some(agent) => agent.name(), Some(agent) => agent.name(),
None => bail!("No agent"), None => bail!("No agent"),
}; };
let agent_config_path = Config::agent_config_file(agent_name); let agent_config_path = paths::agent_config_file(agent_name);
ensure_parent_exists(&agent_config_path)?; ensure_parent_exists(&agent_config_path)?;
if !agent_config_path.exists() { if !agent_config_path.exists() {
std::fs::write( std::fs::write(
@@ -1924,23 +1745,14 @@ impl Config {
Ok(()) Ok(())
} }
pub fn list_macros() -> Vec<String> {
list_file_names(Self::macros_dir(), ".yaml")
}
pub fn load_macro(name: &str) -> Result<Macro> { pub fn load_macro(name: &str) -> Result<Macro> {
let path = Self::macro_file(name); let path = paths::macro_file(name);
let err = || format!("Failed to load macro '{name}' at '{}'", path.display()); let err = || format!("Failed to load macro '{name}' at '{}'", path.display());
let content = read_to_string(&path).with_context(err)?; let content = read_to_string(&path).with_context(err)?;
let value: Macro = serde_yaml::from_str(&content).with_context(err)?; let value: Macro = serde_yaml::from_str(&content).with_context(err)?;
Ok(value) Ok(value)
} }
pub fn has_macro(name: &str) -> bool {
let names = Self::list_macros();
names.contains(&name.to_string())
}
pub fn new_macro(&mut self, name: &str) -> Result<()> { pub fn new_macro(&mut self, name: &str) -> Result<()> {
if self.macro_flag { if self.macro_flag {
bail!("No macro"); bail!("No macro");
@@ -1949,7 +1761,7 @@ impl Config {
.with_default(true) .with_default(true)
.prompt()?; .prompt()?;
if ans { if ans {
let macro_path = Self::macro_file(name); let macro_path = paths::macro_file(name);
ensure_parent_exists(&macro_path)?; ensure_parent_exists(&macro_path)?;
let editor = self.editor()?; let editor = self.editor()?;
edit_file(&editor, &macro_path)?; edit_file(&editor, &macro_path)?;
@@ -2247,8 +2059,8 @@ impl Config {
let filter = args.last().unwrap_or(&""); let filter = args.last().unwrap_or(&"");
if args.len() == 1 { if args.len() == 1 {
values = match cmd { values = match cmd {
".role" => map_completion_values(Self::list_roles(true)), ".role" => map_completion_values(paths::list_roles(true)),
".model" => list_models(self, ModelType::Chat) ".model" => list_models(&self.to_app_config(), ModelType::Chat)
.into_iter() .into_iter()
.map(|v| (v.id(), Some(v.description()))) .map(|v| (v.id(), Some(v.description())))
.collect(), .collect(),
@@ -2265,9 +2077,9 @@ impl Config {
map_completion_values(self.list_sessions()) map_completion_values(self.list_sessions())
} }
} }
".rag" => map_completion_values(Self::list_rags()), ".rag" => map_completion_values(paths::list_rags()),
".agent" => map_completion_values(list_agents()), ".agent" => map_completion_values(list_agents()),
".macro" => map_completion_values(Self::list_macros()), ".macro" => map_completion_values(paths::list_macros()),
".starter" => match &self.agent { ".starter" => match &self.agent {
Some(agent) => agent Some(agent) => agent
.conversation_starters() .conversation_starters()
@@ -2375,7 +2187,7 @@ impl Config {
}; };
complete_option_bool(save_session) complete_option_bool(save_session)
} }
"rag_reranker_model" => list_models(self, ModelType::Reranker) "rag_reranker_model" => list_models(&self.to_app_config(), ModelType::Reranker)
.iter() .iter()
.map(|v| v.id()) .map(|v| v.id())
.collect(), .collect(),
@@ -2393,7 +2205,7 @@ impl Config {
.collect(); .collect();
} else if cmd == ".agent" { } else if cmd == ".agent" {
if args.len() == 2 { if args.len() == 2 {
let dir = Self::agent_data_dir(args[0]).join(SESSIONS_DIR_NAME); let dir = paths::agent_data_dir(args[0]).join(SESSIONS_DIR_NAME);
values = list_file_names(dir, ".yaml") values = list_file_names(dir, ".yaml")
.into_iter() .into_iter()
.map(|v| (v, None)) .map(|v| (v, None))
@@ -2424,7 +2236,7 @@ impl Config {
let models_override_data = let models_override_data =
serde_yaml::to_string(&models_override).with_context(|| "Failed to serde {}")?; serde_yaml::to_string(&models_override).with_context(|| "Failed to serde {}")?;
let model_override_path = Self::models_override_file(); let model_override_path = paths::models_override_file();
ensure_parent_exists(&model_override_path)?; ensure_parent_exists(&model_override_path)?;
std::fs::write(&model_override_path, models_override_data) std::fs::write(&model_override_path, models_override_data)
.with_context(|| format!("Failed to write to '{}'", model_override_path.display()))?; .with_context(|| format!("Failed to write to '{}'", model_override_path.display()))?;
@@ -2432,22 +2244,6 @@ impl Config {
Ok(()) Ok(())
} }
pub fn local_models_override() -> Result<Vec<ProviderModels>> {
let model_override_path = Self::models_override_file();
let err = || {
format!(
"Failed to load models at '{}'",
model_override_path.display()
)
};
let content = read_to_string(&model_override_path).with_context(err)?;
let models_override: ModelsOverride = serde_yaml::from_str(&content).with_context(err)?;
if models_override.version != env!("CARGO_PKG_VERSION") {
bail!("Incompatible version")
}
Ok(models_override.list)
}
pub fn light_theme(&self) -> bool { pub fn light_theme(&self) -> bool {
matches!(self.theme.as_deref(), Some("light")) matches!(self.theme.as_deref(), Some("light"))
} }
@@ -2456,7 +2252,7 @@ impl Config {
let theme = if self.highlight { let theme = if self.highlight {
let theme_mode = if self.light_theme() { "light" } else { "dark" }; let theme_mode = if self.light_theme() { "light" } else { "dark" };
let theme_filename = format!("{theme_mode}.tmTheme"); let theme_filename = format!("{theme_mode}.tmTheme");
let theme_path = Self::local_path(&theme_filename); let theme_path = paths::local_path(&theme_filename);
if theme_path.exists() { if theme_path.exists() {
let theme = ThemeSet::get_theme(&theme_path) let theme = ThemeSet::get_theme(&theme_path)
.with_context(|| format!("Invalid theme at '{}'", theme_path.display()))?; .with_context(|| format!("Invalid theme at '{}'", theme_path.display()))?;
@@ -2979,7 +2775,7 @@ impl Config {
fn setup_model(&mut self) -> Result<()> { fn setup_model(&mut self) -> Result<()> {
let mut model_id = self.model_id.clone(); let mut model_id = self.model_id.clone();
if model_id.is_empty() { if model_id.is_empty() {
let models = list_models(self, ModelType::Chat); let models = list_models(&self.to_app_config(), ModelType::Chat);
if models.is_empty() { if models.is_empty() {
bail!("No available model"); bail!("No available model");
} }
@@ -3012,7 +2808,7 @@ impl Config {
} }
pub fn load_env_file() -> Result<()> { pub fn load_env_file() -> Result<()> {
let env_file_path = Config::env_file(); let env_file_path = paths::env_file();
let contents = match read_to_string(&env_file_path) { let contents = match read_to_string(&env_file_path) {
Ok(v) => v, Ok(v) => v,
Err(_) => return Ok(()), Err(_) => return Ok(()),
@@ -3225,7 +3021,7 @@ where
Ok(()) Ok(())
} }
fn format_option_value<T>(value: &Option<T>) -> String pub(super) fn format_option_value<T>(value: &Option<T>) -> String
where where
T: std::fmt::Display, T: std::fmt::Display,
{ {
+268
View File
@@ -0,0 +1,268 @@
//! Static path and filesystem-lookup helpers that used to live as
//! associated functions on [`Config`](super::Config).
//!
//! None of these functions depend on any `Config` instance data — they
//! compute paths from environment variables, XDG directories, or the
//! crate constant for the config root. Moving them here is Phase 1
//! Step 2 of the REST API refactor: the `Config` struct is shedding
//! anything that doesn't actually need per-instance state so the
//! eventual split into `AppConfig` + `RequestContext` has a clean
//! division line.
//!
//! # Compatibility shim during migration
//!
//! The existing associated functions on `Config` (e.g.,
//! `Config::config_dir()`) are kept as `#[deprecated]` forwarders that
//! call into this module. Callers are migrated module-by-module; when
//! the last caller is updated, the forwarders are deleted in a later
//! sub-step of Step 2.
#![allow(dead_code)]
use super::role::Role;
use super::{
AGENTS_DIR_NAME, BASH_PROMPT_UTILS_FILE_NAME, CONFIG_FILE_NAME, ENV_FILE_NAME,
FUNCTIONS_BIN_DIR_NAME, FUNCTIONS_DIR_NAME, GLOBAL_TOOLS_DIR_NAME, GLOBAL_TOOLS_UTILS_DIR_NAME,
MACROS_DIR_NAME, MCP_FILE_NAME, ModelsOverride, RAGS_DIR_NAME, ROLES_DIR_NAME,
};
use crate::client::ProviderModels;
use crate::utils::{get_env_name, list_file_names, normalize_env_name};
use anyhow::{Context, Result, anyhow, bail};
use log::LevelFilter;
use std::collections::HashSet;
use std::env;
use std::fs::{read_dir, read_to_string};
use std::path::PathBuf;
/// Root directory for user configuration files.
///
/// Resolution order: the crate-specific env override, then
/// `XDG_CONFIG_HOME` (suffixed with the crate name), then the platform
/// config directory reported by `dirs`.
///
/// # Panics
/// Panics when neither override is set and `dirs::config_dir()`
/// returns `None` (no user config directory on this platform).
pub fn config_dir() -> PathBuf {
    env::var(get_env_name("config_dir"))
        .map(PathBuf::from)
        .or_else(|_| {
            env::var("XDG_CONFIG_HOME").map(|v| PathBuf::from(v).join(env!("CARGO_CRATE_NAME")))
        })
        .unwrap_or_else(|_| {
            dirs::config_dir()
                .expect("No user's config directory")
                .join(env!("CARGO_CRATE_NAME"))
        })
}
/// Joins `name` onto the config root returned by [`config_dir`].
pub fn local_path(name: &str) -> PathBuf {
    let mut path = config_dir();
    path.push(name);
    path
}
/// Cache root: the user cache directory (or the system temp directory
/// as a fallback) suffixed with the crate name.
pub fn cache_path() -> PathBuf {
    let mut path = dirs::cache_dir().unwrap_or_else(env::temp_dir);
    path.push(env!("CARGO_CRATE_NAME"));
    path
}
/// Directory under the cache root where OAuth tokens are persisted.
pub fn oauth_tokens_path() -> PathBuf {
    let mut path = cache_path();
    path.push("oauth");
    path
}
/// Path of the persisted OAuth token file for `client_name`.
pub fn token_file(client_name: &str) -> PathBuf {
    let file_name = format!("{client_name}_oauth_tokens.json");
    oauth_tokens_path().join(file_name)
}
/// Default log file location: `<cache>/<crate>.log`.
pub fn log_path() -> PathBuf {
    let mut path = cache_path();
    path.push(format!("{}.log", env!("CARGO_CRATE_NAME")));
    path
}
/// Main config file path; the env override wins over
/// `<config_dir>/<CONFIG_FILE_NAME>`.
pub fn config_file() -> PathBuf {
    env::var(get_env_name("config_file"))
        .map(PathBuf::from)
        .unwrap_or_else(|_| local_path(CONFIG_FILE_NAME))
}
/// Directory holding user-defined role files; env-overridable.
pub fn roles_dir() -> PathBuf {
    env::var(get_env_name("roles_dir"))
        .map(PathBuf::from)
        .unwrap_or_else(|_| local_path(ROLES_DIR_NAME))
}
/// Path of the markdown file backing the role `name`.
pub fn role_file(name: &str) -> PathBuf {
    let mut path = roles_dir();
    path.push(format!("{name}.md"));
    path
}
/// Directory holding saved macro files; env-overridable.
pub fn macros_dir() -> PathBuf {
    env::var(get_env_name("macros_dir"))
        .map(PathBuf::from)
        .unwrap_or_else(|_| local_path(MACROS_DIR_NAME))
}
/// Path of the YAML file backing the macro `name`.
pub fn macro_file(name: &str) -> PathBuf {
    let mut path = macros_dir();
    path.push(format!("{name}.yaml"));
    path
}
/// Path of the dotenv-style file loaded at startup; env-overridable.
pub fn env_file() -> PathBuf {
    env::var(get_env_name("env_file"))
        .map(PathBuf::from)
        .unwrap_or_else(|_| local_path(ENV_FILE_NAME))
}
/// Directory holding standalone RAG indexes; env-overridable.
pub fn rags_dir() -> PathBuf {
    env::var(get_env_name("rags_dir"))
        .map(PathBuf::from)
        .unwrap_or_else(|_| local_path(RAGS_DIR_NAME))
}
/// Root directory of the functions/tools installation; env-overridable.
pub fn functions_dir() -> PathBuf {
    env::var(get_env_name("functions_dir"))
        .map(PathBuf::from)
        .unwrap_or_else(|_| local_path(FUNCTIONS_DIR_NAME))
}
/// Bin directory under the functions root.
pub fn functions_bin_dir() -> PathBuf {
    let mut path = functions_dir();
    path.push(FUNCTIONS_BIN_DIR_NAME);
    path
}
/// MCP server configuration file under the functions root.
pub fn mcp_config_file() -> PathBuf {
    let mut path = functions_dir();
    path.push(MCP_FILE_NAME);
    path
}
/// Directory of globally-available tool scripts under the functions root.
pub fn global_tools_dir() -> PathBuf {
    let mut path = functions_dir();
    path.push(GLOBAL_TOOLS_DIR_NAME);
    path
}
/// Directory of shared tool utilities under the functions root.
pub fn global_utils_dir() -> PathBuf {
    let mut path = functions_dir();
    path.push(GLOBAL_TOOLS_UTILS_DIR_NAME);
    path
}
/// Shared bash prompt-utilities script inside the global utils directory.
pub fn bash_prompt_utils_file() -> PathBuf {
    let mut path = global_utils_dir();
    path.push(BASH_PROMPT_UTILS_FILE_NAME);
    path
}
pub fn agents_data_dir() -> PathBuf {
local_path(AGENTS_DIR_NAME)
}
/// Data directory for the agent `name`.
///
/// `<NAME>_DATA_DIR` (with `name` normalized for env-var use) overrides
/// the default `<agents_data_dir>/<name>` location.
pub fn agent_data_dir(name: &str) -> PathBuf {
    let env_key = format!("{}_DATA_DIR", normalize_env_name(name));
    env::var(env_key)
        .map(PathBuf::from)
        .unwrap_or_else(|_| agents_data_dir().join(name))
}
/// Config file for the agent `name`.
///
/// `<NAME>_CONFIG_FILE` overrides the default location inside the
/// agent's data directory.
pub fn agent_config_file(name: &str) -> PathBuf {
    let env_key = format!("{}_CONFIG_FILE", normalize_env_name(name));
    env::var(env_key)
        .map(PathBuf::from)
        .unwrap_or_else(|_| agent_data_dir(name).join(CONFIG_FILE_NAME))
}
/// Bin directory inside the agent's data directory.
pub fn agent_bin_dir(name: &str) -> PathBuf {
    let mut path = agent_data_dir(name);
    path.push(FUNCTIONS_BIN_DIR_NAME);
    path
}
/// YAML file backing the agent-owned RAG `rag_name` for `agent_name`.
pub fn agent_rag_file(agent_name: &str, rag_name: &str) -> PathBuf {
    let mut path = agent_data_dir(agent_name);
    path.push(format!("{rag_name}.yaml"));
    path
}
pub fn agent_functions_file(name: &str) -> Result<PathBuf> {
let allowed = ["tools.sh", "tools.py", "tools.ts", "tools.js"];
for entry in read_dir(agent_data_dir(name))? {
let entry = entry?;
if let Some(file) = entry.file_name().to_str()
&& allowed.contains(&file)
{
return Ok(entry.path());
}
}
Err(anyhow!(
"No tools script found in agent functions directory"
))
}
pub fn models_override_file() -> PathBuf {
local_path("models-override.yaml")
}
/// Resolves the logging level and log file location from the environment.
///
/// The level env var falls back to `Debug` in debug builds and `Info`
/// in release builds; the path env var falls back to [`log_path`].
pub fn log_config() -> Result<(LevelFilter, Option<PathBuf>)> {
    let default_level = if cfg!(debug_assertions) {
        LevelFilter::Debug
    } else {
        LevelFilter::Info
    };
    let level = env::var(get_env_name("log_level"))
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default_level);
    let path = env::var(get_env_name("log_path"))
        .map(PathBuf::from)
        .unwrap_or_else(|_| log_path());
    Ok((level, Some(path)))
}
/// Lists role names found on disk, optionally merged with the builtin
/// roles, deduplicated and sorted.
pub fn list_roles(with_builtin: bool) -> Vec<String> {
    let mut names: HashSet<String> = HashSet::new();
    if let Ok(entries) = read_dir(roles_dir()) {
        names.extend(entries.flatten().filter_map(|entry| {
            entry
                .file_name()
                .to_str()
                .and_then(|v| v.strip_suffix(".md"))
                .map(str::to_string)
        }));
    }
    if with_builtin {
        names.extend(Role::list_builtin_role_names());
    }
    let mut sorted: Vec<String> = names.into_iter().collect();
    sorted.sort_unstable();
    sorted
}
/// Returns whether `name` is a known role (custom or builtin).
pub fn has_role(name: &str) -> bool {
    // Compare against &str directly instead of allocating a String
    // for `Vec::contains` (clippy: cmp_owned).
    list_roles(true).iter().any(|role| role == name)
}
/// Lists the names of all standalone RAGs (`.yaml` files in the RAGs
/// directory), sorted; returns an empty list when the directory is
/// missing or unreadable.
pub fn list_rags() -> Vec<String> {
    let Ok(entries) = read_dir(rags_dir()) else {
        return vec![];
    };
    let mut names: Vec<String> = entries
        .flatten()
        .filter_map(|entry| {
            let file_name = entry.file_name();
            // to_string_lossy keeps non-UTF-8 names listable.
            let lossy = file_name.to_string_lossy();
            lossy.strip_suffix(".yaml").map(str::to_string)
        })
        .collect();
    names.sort_unstable();
    names
}
/// Names of all saved macros (`.yaml` files in the macros directory).
pub fn list_macros() -> Vec<String> {
    let dir = macros_dir();
    list_file_names(dir, ".yaml")
}
/// Returns whether a saved macro named `name` exists.
pub fn has_macro(name: &str) -> bool {
    // Compare against &str directly instead of allocating a String
    // for `Vec::contains` (clippy: cmp_owned).
    list_macros().iter().any(|m| m == name)
}
/// Loads the user's local model-override list from disk.
///
/// # Errors
/// Fails when the override file is missing or unreadable, fails to
/// parse as YAML, or was written by a different crate version.
pub fn local_models_override() -> Result<Vec<ProviderModels>> {
    let path = models_override_file();
    let err = || format!("Failed to load models at '{}'", path.display());
    let content = read_to_string(&path).with_context(err)?;
    let overrides: ModelsOverride = serde_yaml::from_str(&content).with_context(err)?;
    // Overrides are tied to the exact version that wrote them.
    if overrides.version != env!("CARGO_PKG_VERSION") {
        bail!("Incompatible version")
    }
    Ok(overrides.list)
}
+2
View File
@@ -7,10 +7,12 @@ pub(in crate::config) const DEFAULT_TODO_INSTRUCTIONS: &str = indoc! {"
- `todo__add`: Add individual tasks. Add all planned steps before starting work. - `todo__add`: Add individual tasks. Add all planned steps before starting work.
- `todo__done`: Mark a task done by id. Call this immediately after completing each step. - `todo__done`: Mark a task done by id. Call this immediately after completing each step.
- `todo__list`: Show the current todo list. - `todo__list`: Show the current todo list.
- `todo__clear`: Clear the entire todo list and reset the goal. Use when the user cancels or changes direction.
RULES: RULES:
- Always create a todo list before starting work. - Always create a todo list before starting work.
- Mark each task done as soon as you finish it; do not batch. - Mark each task done as soon as you finish it; do not batch.
- If the user cancels the current task or changes direction, call `todo__clear` immediately.
- If you stop with incomplete tasks, the system will automatically prompt you to continue." - If you stop with incomplete tasks, the system will automatically prompt you to continue."
}; };
+85
View File
@@ -0,0 +1,85 @@
//! Per-process RAG instance cache with weak-reference sharing.
//!
//! `RagCache` lives on [`AppState`](super::AppState) and serves both
//! standalone RAGs (attached via `.rag <name>`) and agent-owned RAGs
//! (loaded from an agent's `documents:` field). The cache keys with
//! [`RagKey`] so that agent RAGs and standalone RAGs occupy distinct
//! namespaces even if they share a name.
//!
//! Entries are held as `Weak<Rag>` so the cache never keeps a RAG
//! alive on its own — once all active scopes drop their `Arc<Rag>`,
//! the cache entry becomes unupgradable and the next `load()` falls
//! through to a fresh disk read.
//!
//! # Phase 1 Step 6.5 scope
//!
//! This file introduces the type scaffolding. Actual cache population
//! (i.e., routing `use_rag`, `use_agent`, and sub-agent spawning
//! through the cache) is deferred to Step 8 when the entry points get
//! rewritten. During the bridge window, `Config.rag` keeps serving
//! today's callers via direct `Rag::load` / `Rag::init` calls and
//! `RagCache` sits on `AppState` as an unused-but-ready service.
//!
//! See `docs/REST-API-ARCHITECTURE.md` section 5 ("RAG Cache") for
//! the full design including concurrent first-load serialization and
//! invalidation semantics.
#![allow(dead_code)]
use crate::rag::Rag;
use anyhow::Result;
use parking_lot::RwLock;
use std::collections::HashMap;
use std::sync::{Arc, Weak};
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub enum RagKey {
    /// A standalone RAG attached by name (e.g. via `.rag <name>`).
    Named(String),
    /// A RAG owned by the agent with this name, so agent and standalone
    /// RAGs never collide even when they share a name.
    Agent(String),
}
/// Per-process cache of loaded RAG instances, keyed by [`RagKey`].
#[derive(Default)]
pub struct RagCache {
    // Weak references only: the cache never keeps a RAG alive by
    // itself; dead entries simply fail to upgrade on lookup.
    entries: RwLock<HashMap<RagKey, Weak<Rag>>>,
}
impl RagCache {
    /// Creates an empty cache.
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the cached RAG for `key` if one is still alive.
    pub fn try_get(&self, key: &RagKey) -> Option<Arc<Rag>> {
        self.entries.read().get(key).and_then(|weak| weak.upgrade())
    }

    /// Records `rag` under `key` as a weak reference.
    pub fn insert(&self, key: RagKey, rag: &Arc<Rag>) {
        self.entries.write().insert(key, Arc::downgrade(rag));
    }

    /// Drops the entry for `key`, forcing the next load to hit disk.
    pub fn invalidate(&self, key: &RagKey) {
        self.entries.write().remove(key);
    }

    /// Counts entries whose RAG is still alive somewhere.
    pub fn entry_count(&self) -> usize {
        self.entries
            .read()
            .values()
            .filter(|weak| weak.strong_count() > 0)
            .count()
    }

    /// Returns the cached RAG for `key`, or runs `loader` to build,
    /// cache, and return a fresh one.
    pub async fn load_with<F, Fut>(&self, key: RagKey, loader: F) -> Result<Arc<Rag>>
    where
        F: FnOnce() -> Fut,
        Fut: std::future::Future<Output = Result<Rag>>,
    {
        match self.try_get(&key) {
            Some(existing) => Ok(existing),
            None => {
                let loaded = Arc::new(loader().await?);
                self.insert(key, &loaded);
                Ok(loaded)
            }
        }
    }
}
File diff suppressed because it is too large Load Diff
+2 -1
View File
@@ -85,7 +85,8 @@ impl Session {
let mut session: Self = let mut session: Self =
serde_yaml::from_str(&content).with_context(|| format!("Invalid session {name}"))?; serde_yaml::from_str(&content).with_context(|| format!("Invalid session {name}"))?;
session.model = Model::retrieve_model(config, &session.model_id, ModelType::Chat)?; session.model =
Model::retrieve_model(&config.to_app_config(), &session.model_id, ModelType::Chat)?;
if let Some(autoname) = name.strip_prefix("_/") { if let Some(autoname) = name.strip_prefix("_/") {
session.name = TEMP_SESSION_NAME.to_string(); session.name = TEMP_SESSION_NAME.to_string();
+20
View File
@@ -67,6 +67,11 @@ impl TodoList {
self.todos.is_empty() self.todos.is_empty()
} }
pub fn clear(&mut self) {
self.goal.clear();
self.todos.clear();
}
pub fn render_for_model(&self) -> String { pub fn render_for_model(&self) -> String {
let mut lines = Vec::new(); let mut lines = Vec::new();
if !self.goal.is_empty() { if !self.goal.is_empty() {
@@ -149,6 +154,21 @@ mod tests {
assert!(rendered.contains("○ 2. Map")); assert!(rendered.contains("○ 2. Map"));
} }
#[test]
fn test_clear() {
    // Build a non-empty list with one completed item.
    let mut todo_list = TodoList::new("Some goal");
    todo_list.add("Task 1");
    todo_list.add("Task 2");
    todo_list.mark_done(1);
    assert!(!todo_list.is_empty());

    todo_list.clear();

    // Everything — items, goal, and completion state — must be reset.
    assert!(todo_list.is_empty());
    assert!(todo_list.goal.is_empty());
    assert_eq!(todo_list.todos.len(), 0);
    assert!(!todo_list.has_incomplete());
}
#[test] #[test]
fn test_serialization_roundtrip() { fn test_serialization_roundtrip() {
let mut list = TodoList::new("Roundtrip"); let mut list = TodoList::new("Roundtrip");
+78
View File
@@ -0,0 +1,78 @@
//! Per-scope tool runtime: resolved functions + live MCP handles +
//! call tracker.
//!
//! `ToolScope` is the unit of tool availability for a single request.
//! Every active `RoleLike` (role, session, agent) conceptually owns one.
//! The contents are:
//!
//! * `functions` — the `Functions` declarations visible to the LLM for
//! this scope (global tools + role/session/agent filters applied)
//! * `mcp_runtime` — live MCP subprocess handles for the servers this
//! scope has enabled, keyed by server name
//! * `tool_tracker` — per-scope tool call history for auto-continuation
//! and looping detection
//!
//! # Phase 1 Step 6.5 scope
//!
//! This file introduces the type scaffolding. Scope transitions
//! (`use_role`, `use_session`, `use_agent`, `exit_*`) that actually
//! build and swap `ToolScope` instances are deferred to Step 8 when
//! the entry points (`main.rs`, `repl/mod.rs`) get rewritten to thread
//! `RequestContext` through the pipeline. During the bridge window,
//! `Config.functions` / `Config.mcp_registry` keep serving today's
//! callers and `ToolScope` sits alongside them on `RequestContext` as
//! an unused (but compiling and tested) parallel structure.
//!
//! The fields mirror the plan in `docs/REST-API-ARCHITECTURE.md`
//! section 5 and `docs/PHASE-1-IMPLEMENTATION-PLAN.md` Step 6.5.
#![allow(dead_code)]
use crate::function::{Functions, ToolCallTracker};
use crate::mcp::ConnectedServer;
use std::collections::HashMap;
use std::sync::Arc;
pub struct ToolScope {
pub functions: Functions,
pub mcp_runtime: McpRuntime,
pub tool_tracker: ToolCallTracker,
}
impl Default for ToolScope {
fn default() -> Self {
Self {
functions: Functions::default(),
mcp_runtime: McpRuntime::default(),
tool_tracker: ToolCallTracker::default(),
}
}
}
/// Live MCP subprocess handles for one scope, keyed by server name.
#[derive(Default)]
pub struct McpRuntime {
    pub servers: HashMap<String, Arc<ConnectedServer>>,
}
impl McpRuntime {
    /// Creates a runtime with no connected servers.
    pub fn new() -> Self {
        Self::default()
    }
    /// Returns `true` when no servers are registered.
    pub fn is_empty(&self) -> bool {
        self.servers.is_empty()
    }
    /// Registers (or replaces) the handle stored under `name`.
    pub fn insert(&mut self, name: String, handle: Arc<ConnectedServer>) {
        self.servers.insert(name, handle);
    }
    /// Looks up the live handle registered under `name`, if any.
    pub fn get(&self, name: &str) -> Option<&Arc<ConnectedServer>> {
        self.servers.get(name)
    }
    /// Lists the names of every registered server.
    pub fn server_names(&self) -> Vec<String> {
        self.servers.keys().map(|name| name.to_owned()).collect()
    }
}
+223 -84
View File
@@ -3,16 +3,17 @@ pub(crate) mod todo;
pub(crate) mod user_interaction; pub(crate) mod user_interaction;
use crate::{ use crate::{
config::{Agent, Config, GlobalConfig}, config::{Agent, GlobalConfig},
utils::*, utils::*,
}; };
use crate::config::ensure_parent_exists; use crate::config::ensure_parent_exists;
use crate::config::paths;
use crate::mcp::{ use crate::mcp::{
MCP_DESCRIBE_META_FUNCTION_NAME_PREFIX, MCP_INVOKE_META_FUNCTION_NAME_PREFIX, MCP_DESCRIBE_META_FUNCTION_NAME_PREFIX, MCP_INVOKE_META_FUNCTION_NAME_PREFIX,
MCP_SEARCH_META_FUNCTION_NAME_PREFIX, MCP_SEARCH_META_FUNCTION_NAME_PREFIX,
}; };
use crate::parsers::{bash, python}; use crate::parsers::{bash, python, typescript};
use anyhow::{Context, Result, anyhow, bail}; use anyhow::{Context, Result, anyhow, bail};
use indexmap::IndexMap; use indexmap::IndexMap;
use indoc::formatdoc; use indoc::formatdoc;
@@ -22,7 +23,7 @@ use serde_json::{Value, json};
use std::collections::VecDeque; use std::collections::VecDeque;
use std::ffi::OsStr; use std::ffi::OsStr;
use std::fs::File; use std::fs::File;
use std::io::Write; use std::io::{Read, Write};
use std::{ use std::{
collections::{HashMap, HashSet}, collections::{HashMap, HashSet},
env, fs, io, env, fs, io,
@@ -53,6 +54,7 @@ enum BinaryType<'a> {
enum Language { enum Language {
Bash, Bash,
Python, Python,
TypeScript,
Unsupported, Unsupported,
} }
@@ -61,6 +63,7 @@ impl From<&String> for Language {
match s.to_lowercase().as_str() { match s.to_lowercase().as_str() {
"sh" => Language::Bash, "sh" => Language::Bash,
"py" => Language::Python, "py" => Language::Python,
"ts" => Language::TypeScript,
_ => Language::Unsupported, _ => Language::Unsupported,
} }
} }
@@ -72,6 +75,7 @@ impl Language {
match self { match self {
Language::Bash => "bash", Language::Bash => "bash",
Language::Python => "python", Language::Python => "python",
Language::TypeScript => "npx tsx",
Language::Unsupported => "sh", Language::Unsupported => "sh",
} }
} }
@@ -80,11 +84,32 @@ impl Language {
match self { match self {
Language::Bash => "sh", Language::Bash => "sh",
Language::Python => "py", Language::Python => "py",
Language::TypeScript => "ts",
_ => "sh", _ => "sh",
} }
} }
} }
/// Reads the first line of `path` and, if it is a shebang (`#!...`),
/// returns the runtime command it names.
///
/// A `#!/usr/bin/env <runtime>` shebang yields just `<runtime>`; any other
/// shebang yields the full (trimmed) command after `#!`. Returns `None`
/// when the file cannot be opened, is empty, has no shebang, or the
/// shebang names nothing.
fn extract_shebang_runtime(path: &Path) -> Option<String> {
    let handle = File::open(path).ok()?;
    let mut lines = io::BufRead::lines(io::BufReader::new(handle));
    let header = lines.next()?.ok()?;
    let interpreter = header.strip_prefix("#!")?.trim();
    if interpreter.is_empty() {
        return None;
    }
    match interpreter.strip_prefix("/usr/bin/env ") {
        Some(rest) => {
            let runtime = rest.trim();
            (!runtime.is_empty()).then(|| runtime.to_string())
        }
        None => Some(interpreter.to_string()),
    }
}
pub async fn eval_tool_calls( pub async fn eval_tool_calls(
config: &GlobalConfig, config: &GlobalConfig,
mut calls: Vec<ToolCall>, mut calls: Vec<ToolCall>,
@@ -175,7 +200,7 @@ impl Functions {
fn install_global_tools() -> Result<()> { fn install_global_tools() -> Result<()> {
info!( info!(
"Installing global built-in functions in {}", "Installing global built-in functions in {}",
Config::functions_dir().display() paths::functions_dir().display()
); );
for file in FunctionAssets::iter() { for file in FunctionAssets::iter() {
@@ -189,7 +214,7 @@ impl Functions {
anyhow!("Failed to load embedded function file: {}", file.as_ref()) anyhow!("Failed to load embedded function file: {}", file.as_ref())
})?; })?;
let content = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) }; let content = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) };
let file_path = Config::functions_dir().join(file.as_ref()); let file_path = paths::functions_dir().join(file.as_ref());
let file_extension = file_path let file_extension = file_path
.extension() .extension()
.and_then(OsStr::to_str) .and_then(OsStr::to_str)
@@ -230,7 +255,7 @@ impl Functions {
info!( info!(
"Building global function binaries in {}", "Building global function binaries in {}",
Config::functions_bin_dir().display() paths::functions_bin_dir().display()
); );
Self::build_global_function_binaries(visible_tools, None)?; Self::build_global_function_binaries(visible_tools, None)?;
@@ -247,7 +272,7 @@ impl Functions {
info!( info!(
"Building global function binaries required by agent: {name} in {}", "Building global function binaries required by agent: {name} in {}",
Config::functions_bin_dir().display() paths::functions_bin_dir().display()
); );
Self::build_global_function_binaries(global_tools, Some(name))?; Self::build_global_function_binaries(global_tools, Some(name))?;
tools_declarations tools_declarations
@@ -255,7 +280,7 @@ impl Functions {
debug!("No global tools found for agent: {}", name); debug!("No global tools found for agent: {}", name);
Vec::new() Vec::new()
}; };
let agent_script_declarations = match Config::agent_functions_file(name) { let agent_script_declarations = match paths::agent_functions_file(name) {
Ok(path) if path.exists() => { Ok(path) if path.exists() => {
info!( info!(
"Loading functions script for agent: {name} from {}", "Loading functions script for agent: {name} from {}",
@@ -266,7 +291,7 @@ impl Functions {
info!( info!(
"Building function binary for agent: {name} in {}", "Building function binary for agent: {name} in {}",
Config::agent_bin_dir(name).display() paths::agent_bin_dir(name).display()
); );
Self::build_agent_tool_binaries(name)?; Self::build_agent_tool_binaries(name)?;
script_declarations script_declarations
@@ -429,7 +454,7 @@ impl Functions {
fn build_global_tool_declarations( fn build_global_tool_declarations(
enabled_tools: &[String], enabled_tools: &[String],
) -> Result<Vec<FunctionDeclaration>> { ) -> Result<Vec<FunctionDeclaration>> {
let global_tools_directory = Config::global_tools_dir(); let global_tools_directory = paths::global_tools_dir();
let mut function_declarations = Vec::new(); let mut function_declarations = Vec::new();
for tool in enabled_tools { for tool in enabled_tools {
@@ -473,6 +498,11 @@ impl Functions {
file_name, file_name,
tools_file_path.parent(), tools_file_path.parent(),
), ),
Language::TypeScript => typescript::generate_typescript_declarations(
tool_file,
file_name,
tools_file_path.parent(),
),
Language::Unsupported => { Language::Unsupported => {
bail!("Unsupported tool file extension: {}", language.as_ref()) bail!("Unsupported tool file extension: {}", language.as_ref())
} }
@@ -513,14 +543,21 @@ impl Functions {
bail!("Unsupported tool file extension: {}", language.as_ref()); bail!("Unsupported tool file extension: {}", language.as_ref());
} }
Self::build_binaries(binary_name, language, BinaryType::Tool(agent_name))?; let tool_path = paths::global_tools_dir().join(tool);
let custom_runtime = extract_shebang_runtime(&tool_path);
Self::build_binaries(
binary_name,
language,
BinaryType::Tool(agent_name),
custom_runtime.as_deref(),
)?;
} }
Ok(()) Ok(())
} }
fn clear_agent_bin_dir(name: &str) -> Result<()> { fn clear_agent_bin_dir(name: &str) -> Result<()> {
let agent_bin_directory = Config::agent_bin_dir(name); let agent_bin_directory = paths::agent_bin_dir(name);
if !agent_bin_directory.exists() { if !agent_bin_directory.exists() {
debug!( debug!(
"Creating agent bin directory: {}", "Creating agent bin directory: {}",
@@ -539,7 +576,7 @@ impl Functions {
} }
fn clear_global_functions_bin_dir() -> Result<()> { fn clear_global_functions_bin_dir() -> Result<()> {
let bin_dir = Config::functions_bin_dir(); let bin_dir = paths::functions_bin_dir();
if !bin_dir.exists() { if !bin_dir.exists() {
fs::create_dir_all(&bin_dir)?; fs::create_dir_all(&bin_dir)?;
} }
@@ -554,8 +591,9 @@ impl Functions {
} }
fn build_agent_tool_binaries(name: &str) -> Result<()> { fn build_agent_tool_binaries(name: &str) -> Result<()> {
let tools_file = paths::agent_functions_file(name)?;
let language = Language::from( let language = Language::from(
&Config::agent_functions_file(name)? &tools_file
.extension() .extension()
.and_then(OsStr::to_str) .and_then(OsStr::to_str)
.map(|s| s.to_lowercase()) .map(|s| s.to_lowercase())
@@ -568,7 +606,8 @@ impl Functions {
bail!("Unsupported tool file extension: {}", language.as_ref()); bail!("Unsupported tool file extension: {}", language.as_ref());
} }
Self::build_binaries(name, language, BinaryType::Agent) let custom_runtime = extract_shebang_runtime(&tools_file);
Self::build_binaries(name, language, BinaryType::Agent, custom_runtime.as_deref())
} }
#[cfg(windows)] #[cfg(windows)]
@@ -576,22 +615,23 @@ impl Functions {
binary_name: &str, binary_name: &str,
language: Language, language: Language,
binary_type: BinaryType, binary_type: BinaryType,
custom_runtime: Option<&str>,
) -> Result<()> { ) -> Result<()> {
use native::runtime; use native::runtime;
let (binary_file, binary_script_file) = match binary_type { let (binary_file, binary_script_file) = match binary_type {
BinaryType::Tool(None) => ( BinaryType::Tool(None) => (
Config::functions_bin_dir().join(format!("{binary_name}.cmd")), paths::functions_bin_dir().join(format!("{binary_name}.cmd")),
Config::functions_bin_dir() paths::functions_bin_dir()
.join(format!("run-{binary_name}.{}", language.to_extension())), .join(format!("run-{binary_name}.{}", language.to_extension())),
), ),
BinaryType::Tool(Some(agent_name)) => ( BinaryType::Tool(Some(agent_name)) => (
Config::agent_bin_dir(agent_name).join(format!("{binary_name}.cmd")), paths::agent_bin_dir(agent_name).join(format!("{binary_name}.cmd")),
Config::agent_bin_dir(agent_name) paths::agent_bin_dir(agent_name)
.join(format!("run-{binary_name}.{}", language.to_extension())), .join(format!("run-{binary_name}.{}", language.to_extension())),
), ),
BinaryType::Agent => ( BinaryType::Agent => (
Config::agent_bin_dir(binary_name).join(format!("{binary_name}.cmd")), paths::agent_bin_dir(binary_name).join(format!("{binary_name}.cmd")),
Config::agent_bin_dir(binary_name) paths::agent_bin_dir(binary_name)
.join(format!("run-{binary_name}.{}", language.to_extension())), .join(format!("run-{binary_name}.{}", language.to_extension())),
), ),
}; };
@@ -613,36 +653,40 @@ impl Functions {
) )
})?; })?;
let content_template = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) }; let content_template = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) };
let to_script_path = |p: &str| -> String { p.replace('\\', "/") };
let content = match binary_type { let content = match binary_type {
BinaryType::Tool(None) => { BinaryType::Tool(None) => {
let root_dir = Config::functions_dir(); let root_dir = paths::functions_dir();
let tool_path = format!( let tool_path = format!(
"{}/{binary_name}", "{}/{binary_name}",
&Config::global_tools_dir().to_string_lossy() &Config::global_tools_dir().to_string_lossy()
); );
content_template content_template
.replace("{function_name}", binary_name) .replace("{function_name}", binary_name)
.replace("{root_dir}", &root_dir.to_string_lossy()) .replace("{root_dir}", &to_script_path(&root_dir.to_string_lossy()))
.replace("{tool_path}", &tool_path) .replace("{tool_path}", &to_script_path(&tool_path))
} }
BinaryType::Tool(Some(agent_name)) => { BinaryType::Tool(Some(agent_name)) => {
let root_dir = Config::agent_data_dir(agent_name); let root_dir = paths::agent_data_dir(agent_name);
let tool_path = format!( let tool_path = format!(
"{}/{binary_name}", "{}/{binary_name}",
&Config::global_tools_dir().to_string_lossy() &Config::global_tools_dir().to_string_lossy()
); );
content_template content_template
.replace("{function_name}", binary_name) .replace("{function_name}", binary_name)
.replace("{root_dir}", &root_dir.to_string_lossy()) .replace("{root_dir}", &to_script_path(&root_dir.to_string_lossy()))
.replace("{tool_path}", &tool_path) .replace("{tool_path}", &to_script_path(&tool_path))
} }
BinaryType::Agent => content_template BinaryType::Agent => content_template
.replace("{agent_name}", binary_name) .replace("{agent_name}", binary_name)
.replace("{config_dir}", &Config::config_dir().to_string_lossy()), .replace(
"{config_dir}",
&to_script_path(&paths::config_dir().to_string_lossy()),
),
} }
.replace( .replace(
"{prompt_utils_file}", "{prompt_utils_file}",
&Config::bash_prompt_utils_file().to_string_lossy(), &to_script_path(&paths::bash_prompt_utils_file().to_string_lossy()),
); );
if binary_script_file.exists() { if binary_script_file.exists() {
fs::remove_file(&binary_script_file)?; fs::remove_file(&binary_script_file)?;
@@ -656,40 +700,48 @@ impl Functions {
binary_file.display() binary_file.display()
); );
let run = match language { let run = if let Some(rt) = custom_runtime {
Language::Bash => { rt.to_string()
let shell = runtime::bash_path().ok_or_else(|| anyhow!("Shell not found"))?; } else {
format!("{shell} --noprofile --norc") match language {
Language::Bash => {
let shell = runtime::bash_path().ok_or_else(|| anyhow!("Shell not found"))?;
format!("{shell} --noprofile --norc")
}
Language::Python if Path::new(".venv").exists() => {
let executable_path = env::current_dir()?
.join(".venv")
.join("Scripts")
.join("activate.bat");
let canonicalized_path = dunce::canonicalize(&executable_path)?;
format!(
"call \"{}\" && {}",
canonicalized_path.to_string_lossy(),
language.to_cmd()
)
}
Language::Python => {
let executable_path = which::which("python")
.or_else(|_| which::which("python3"))
.map_err(|_| anyhow!("Python executable not found in PATH"))?;
let canonicalized_path = dunce::canonicalize(&executable_path)?;
canonicalized_path.to_string_lossy().into_owned()
}
Language::TypeScript => {
let npx_path = which::which("npx").map_err(|_| {
anyhow!("npx executable not found in PATH (required for TypeScript tools)")
})?;
let canonicalized_path = dunce::canonicalize(&npx_path)?;
format!("{} tsx", canonicalized_path.to_string_lossy())
}
_ => bail!("Unsupported language: {}", language.as_ref()),
} }
Language::Python if Path::new(".venv").exists() => {
let executable_path = env::current_dir()?
.join(".venv")
.join("Scripts")
.join("activate.bat");
let canonicalized_path = fs::canonicalize(&executable_path)?;
format!(
"call \"{}\" && {}",
canonicalized_path.to_string_lossy(),
language.to_cmd()
)
}
Language::Python => {
let executable_path = which::which("python")
.or_else(|_| which::which("python3"))
.map_err(|_| anyhow!("Python executable not found in PATH"))?;
let canonicalized_path = fs::canonicalize(&executable_path)?;
canonicalized_path.to_string_lossy().into_owned()
}
_ => bail!("Unsupported language: {}", language.as_ref()),
}; };
let bin_dir = binary_file let bin_dir = binary_file
.parent() .parent()
.expect("Failed to get parent directory of binary file") .expect("Failed to get parent directory of binary file");
.canonicalize()? let canonical_bin_dir = dunce::canonicalize(bin_dir)?.to_string_lossy().into_owned();
.to_string_lossy() let wrapper_binary = dunce::canonicalize(&binary_script_file)?
.into_owned();
let wrapper_binary = binary_script_file
.canonicalize()?
.to_string_lossy() .to_string_lossy()
.into_owned(); .into_owned();
let content = formatdoc!( let content = formatdoc!(
@@ -697,7 +749,7 @@ impl Functions {
@echo off @echo off
setlocal setlocal
set "bin_dir={bin_dir}" set "bin_dir={canonical_bin_dir}"
{run} "{wrapper_binary}" %*"#, {run} "{wrapper_binary}" %*"#,
); );
@@ -713,15 +765,16 @@ impl Functions {
binary_name: &str, binary_name: &str,
language: Language, language: Language,
binary_type: BinaryType, binary_type: BinaryType,
custom_runtime: Option<&str>,
) -> Result<()> { ) -> Result<()> {
use std::os::unix::prelude::PermissionsExt; use std::os::unix::prelude::PermissionsExt;
let binary_file = match binary_type { let binary_file = match binary_type {
BinaryType::Tool(None) => Config::functions_bin_dir().join(binary_name), BinaryType::Tool(None) => paths::functions_bin_dir().join(binary_name),
BinaryType::Tool(Some(agent_name)) => { BinaryType::Tool(Some(agent_name)) => {
Config::agent_bin_dir(agent_name).join(binary_name) paths::agent_bin_dir(agent_name).join(binary_name)
} }
BinaryType::Agent => Config::agent_bin_dir(binary_name).join(binary_name), BinaryType::Agent => paths::agent_bin_dir(binary_name).join(binary_name),
}; };
info!( info!(
"Building binary for function: {} ({})", "Building binary for function: {} ({})",
@@ -741,12 +794,12 @@ impl Functions {
) )
})?; })?;
let content_template = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) }; let content_template = unsafe { std::str::from_utf8_unchecked(&embedded_file.data) };
let content = match binary_type { let mut content = match binary_type {
BinaryType::Tool(None) => { BinaryType::Tool(None) => {
let root_dir = Config::functions_dir(); let root_dir = paths::functions_dir();
let tool_path = format!( let tool_path = format!(
"{}/{binary_name}", "{}/{binary_name}",
&Config::global_tools_dir().to_string_lossy() &paths::global_tools_dir().to_string_lossy()
); );
content_template content_template
.replace("{function_name}", binary_name) .replace("{function_name}", binary_name)
@@ -754,10 +807,10 @@ impl Functions {
.replace("{tool_path}", &tool_path) .replace("{tool_path}", &tool_path)
} }
BinaryType::Tool(Some(agent_name)) => { BinaryType::Tool(Some(agent_name)) => {
let root_dir = Config::agent_data_dir(agent_name); let root_dir = paths::agent_data_dir(agent_name);
let tool_path = format!( let tool_path = format!(
"{}/{binary_name}", "{}/{binary_name}",
&Config::global_tools_dir().to_string_lossy() &paths::global_tools_dir().to_string_lossy()
); );
content_template content_template
.replace("{function_name}", binary_name) .replace("{function_name}", binary_name)
@@ -766,19 +819,50 @@ impl Functions {
} }
BinaryType::Agent => content_template BinaryType::Agent => content_template
.replace("{agent_name}", binary_name) .replace("{agent_name}", binary_name)
.replace("{config_dir}", &Config::config_dir().to_string_lossy()), .replace("{config_dir}", &paths::config_dir().to_string_lossy()),
} }
.replace( .replace(
"{prompt_utils_file}", "{prompt_utils_file}",
&Config::bash_prompt_utils_file().to_string_lossy(), &paths::bash_prompt_utils_file().to_string_lossy(),
); );
if binary_file.exists() {
fs::remove_file(&binary_file)?;
}
let mut file = File::create(&binary_file)?;
file.write_all(content.as_bytes())?;
fs::set_permissions(&binary_file, fs::Permissions::from_mode(0o755))?; if let Some(rt) = custom_runtime
&& let Some(newline_pos) = content.find('\n')
{
content = format!("#!/usr/bin/env {rt}{}", &content[newline_pos..]);
}
if language == Language::TypeScript {
let bin_dir = binary_file
.parent()
.expect("Failed to get parent directory of binary file");
let script_file = bin_dir.join(format!("run-{binary_name}.ts"));
if script_file.exists() {
fs::remove_file(&script_file)?;
}
let mut sf = File::create(&script_file)?;
sf.write_all(content.as_bytes())?;
fs::set_permissions(&script_file, fs::Permissions::from_mode(0o755))?;
let ts_runtime = custom_runtime.unwrap_or("tsx");
let wrapper = format!(
"#!/bin/sh\nexec {ts_runtime} \"{}\" \"$@\"\n",
script_file.display()
);
if binary_file.exists() {
fs::remove_file(&binary_file)?;
}
let mut wf = File::create(&binary_file)?;
wf.write_all(wrapper.as_bytes())?;
fs::set_permissions(&binary_file, fs::Permissions::from_mode(0o755))?;
} else {
if binary_file.exists() {
fs::remove_file(&binary_file)?;
}
let mut file = File::create(&binary_file)?;
file.write_all(content.as_bytes())?;
fs::set_permissions(&binary_file, fs::Permissions::from_mode(0o755))?;
}
Ok(()) Ok(())
} }
@@ -1064,7 +1148,7 @@ impl ToolCall {
function_name.clone(), function_name.clone(),
function_name, function_name,
vec![], vec![],
Default::default(), agent.variable_envs(),
)) ))
} }
} }
@@ -1096,12 +1180,12 @@ pub fn run_llm_function(
let mut command_name = cmd_name.clone(); let mut command_name = cmd_name.clone();
if let Some(agent_name) = agent_name { if let Some(agent_name) = agent_name {
command_name = cmd_args[0].clone(); command_name = cmd_args[0].clone();
let dir = Config::agent_bin_dir(&agent_name); let dir = paths::agent_bin_dir(&agent_name);
if dir.exists() { if dir.exists() {
bin_dirs.push(dir); bin_dirs.push(dir);
} }
} else { } else {
bin_dirs.push(Config::functions_bin_dir()); bin_dirs.push(paths::functions_bin_dir());
} }
let current_path = env::var("PATH").context("No PATH environment variable")?; let current_path = env::var("PATH").context("No PATH environment variable")?;
let prepend_path = bin_dirs let prepend_path = bin_dirs
@@ -1117,18 +1201,73 @@ pub fn run_llm_function(
#[cfg(windows)] #[cfg(windows)]
let cmd_name = polyfill_cmd_name(&cmd_name, &bin_dirs); let cmd_name = polyfill_cmd_name(&cmd_name, &bin_dirs);
let output = Command::new(&cmd_name) #[cfg(windows)]
let cmd_args = {
let mut args = cmd_args;
if let Some(json_data) = args.pop() {
let tool_data_file = temp_file("-tool-data-", ".json");
fs::write(&tool_data_file, &json_data)?;
envs.insert(
"LLM_TOOL_DATA_FILE".into(),
tool_data_file.display().to_string(),
);
}
args
};
envs.insert("CLICOLOR_FORCE".into(), "1".into());
envs.insert("FORCE_COLOR".into(), "1".into());
let mut child = Command::new(&cmd_name)
.args(&cmd_args) .args(&cmd_args)
.envs(envs) .envs(envs)
.stdout(Stdio::inherit()) .stdout(Stdio::piped())
.stderr(Stdio::piped()) .stderr(Stdio::piped())
.spawn() .spawn()
.and_then(|child| child.wait_with_output())
.map_err(|err| anyhow!("Unable to run {command_name}, {err}"))?; .map_err(|err| anyhow!("Unable to run {command_name}, {err}"))?;
let exit_code = output.status.code().unwrap_or_default(); let stdout = child.stdout.take().expect("Failed to capture stdout");
let mut stderr = child.stderr.take().expect("Failed to capture stderr");
let stdout_thread = std::thread::spawn(move || {
let mut buffer = [0; 1024];
let mut reader = stdout;
let mut out = io::stdout();
while let Ok(n) = reader.read(&mut buffer) {
if n == 0 {
break;
}
let chunk = &buffer[0..n];
let mut last_pos = 0;
for (i, &byte) in chunk.iter().enumerate() {
if byte == b'\n' {
let _ = out.write_all(&chunk[last_pos..i]);
let _ = out.write_all(b"\r\n");
last_pos = i + 1;
}
}
if last_pos < n {
let _ = out.write_all(&chunk[last_pos..n]);
}
let _ = out.flush();
}
});
let stderr_thread = std::thread::spawn(move || {
let mut buf = Vec::new();
let _ = stderr.read_to_end(&mut buf);
buf
});
let status = child
.wait()
.map_err(|err| anyhow!("Unable to run {command_name}, {err}"))?;
let _ = stdout_thread.join();
let stderr_bytes = stderr_thread.join().unwrap_or_default();
let exit_code = status.code().unwrap_or_default();
if exit_code != 0 { if exit_code != 0 {
let stderr = String::from_utf8_lossy(&output.stderr).trim().to_string(); let stderr = String::from_utf8_lossy(&stderr_bytes).trim().to_string();
if !stderr.is_empty() { if !stderr.is_empty() {
eprintln!("{stderr}"); eprintln!("{stderr}");
} }
+3 -1
View File
@@ -990,7 +990,9 @@ async fn summarize_output(config: &GlobalConfig, agent_name: &str, output: &str)
let model = { let model = {
let cfg = config.read(); let cfg = config.read();
match summarization_model_id { match summarization_model_id {
Some(ref model_id) => Model::retrieve_model(&cfg, model_id, ModelType::Chat)?, Some(ref model_id) => {
Model::retrieve_model(&cfg.to_app_config(), model_id, ModelType::Chat)?
}
None => cfg.current_model().clone(), None => cfg.current_model().clone(),
} }
}; };
+21
View File
@@ -76,6 +76,16 @@ pub fn todo_function_declarations() -> Vec<FunctionDeclaration> {
}, },
agent: false, agent: false,
}, },
FunctionDeclaration {
name: format!("{TODO_FUNCTION_PREFIX}clear"),
description: "Clear the entire todo list and reset the goal. Use when the current task has been canceled or invalidated.".to_string(),
parameters: JsonSchema {
type_value: Some("object".to_string()),
properties: Some(IndexMap::new()),
..Default::default()
},
agent: false,
},
] ]
} }
@@ -156,6 +166,17 @@ pub fn handle_todo_tool(config: &GlobalConfig, cmd_name: &str, args: &Value) ->
None => bail!("No active agent"), None => bail!("No active agent"),
} }
} }
"clear" => {
let mut cfg = config.write();
let agent = cfg.agent.as_mut();
match agent {
Some(agent) => {
agent.clear_todo_list();
Ok(json!({"status": "ok", "message": "Todo list cleared"}))
}
None => bail!("No active agent"),
}
}
_ => bail!("Unknown todo action: {action}"), _ => bail!("Unknown todo action: {action}"),
} }
} }
+9 -3
View File
@@ -12,6 +12,7 @@ use tokio::sync::oneshot;
pub const USER_FUNCTION_PREFIX: &str = "user__"; pub const USER_FUNCTION_PREFIX: &str = "user__";
const DEFAULT_ESCALATION_TIMEOUT_SECS: u64 = 300; const DEFAULT_ESCALATION_TIMEOUT_SECS: u64 = 300;
const CUSTOM_MULTI_CHOICE_ANSWER_OPTION: &str = "Other (custom)";
pub fn user_interaction_function_declarations() -> Vec<FunctionDeclaration> { pub fn user_interaction_function_declarations() -> Vec<FunctionDeclaration> {
vec![ vec![
@@ -151,9 +152,14 @@ fn handle_direct_ask(args: &Value) -> Result<Value> {
.get("question") .get("question")
.and_then(Value::as_str) .and_then(Value::as_str)
.ok_or_else(|| anyhow!("'question' is required"))?; .ok_or_else(|| anyhow!("'question' is required"))?;
let options = parse_options(args)?; let mut options = parse_options(args)?;
options.push(CUSTOM_MULTI_CHOICE_ANSWER_OPTION.to_string());
let answer = Select::new(question, options).prompt()?; let mut answer = Select::new(question, options).prompt()?;
if answer == CUSTOM_MULTI_CHOICE_ANSWER_OPTION {
answer = Text::new("Custom response:").prompt()?
}
Ok(json!({ "answer": answer })) Ok(json!({ "answer": answer }))
} }
@@ -175,7 +181,7 @@ fn handle_direct_input(args: &Value) -> Result<Value> {
.and_then(Value::as_str) .and_then(Value::as_str)
.ok_or_else(|| anyhow!("'question' is required"))?; .ok_or_else(|| anyhow!("'question' is required"))?;
let answer = Text::new(question).prompt()?; let answer = Text::new(&format!("{question}\nYour answer: ")).prompt()?;
Ok(json!({ "answer": answer })) Ok(json!({ "answer": answer }))
} }

Some files were not shown because too many files have changed in this diff Show More