loki/docs/implementation/PHASE-1-STEP-8c-NOTES.md
2026-04-15 12:56:00 -06:00


Phase 1 Step 8c — Implementation Notes

Status

Done.

Plan reference

  • Plan: docs/PHASE-1-IMPLEMENTATION-PLAN.md
  • Section: "Step 8c: Extract McpFactory::acquire() from McpRegistry::init_server"

Summary

Extracted the MCP subprocess spawn + rmcp handshake logic from McpRegistry::start_server into a standalone pub(crate) async fn spawn_mcp_server() function. Rewrote start_server to call it. Implemented McpFactory::acquire() using the extracted function plus the existing try_get_active / insert_active scaffolding from Step 6.5. Three types in mcp/mod.rs were bumped to pub(crate) visibility for cross-module access.

What was changed

Files modified (2 files)

  • src/mcp/mod.rs — 4 changes:

    1. Extracted spawn_mcp_server (~40 lines) — standalone pub(crate) async fn that takes an &McpServer spec and optional log path, builds a tokio::process::Command, creates a TokioChildProcess transport (with optional stderr log redirect), calls ().serve(transport).await for the rmcp handshake, and returns Arc<ConnectedServer>.

    2. Rewrote McpRegistry::start_server — now looks up the McpServer spec from self.config, calls spawn_mcp_server, then does its own catalog building (tool listing, BM25 index construction). The spawn + handshake code that was previously inline is replaced by the one-liner spawn_mcp_server(spec, self.log_path.as_deref()).await?.

    3. Bumped 3 types to pub(crate): McpServer, JsonField, McpServersConfig. These were previously private to mcp/mod.rs. McpFactory::acquire() and McpServerKey::from_spec() need McpServer and JsonField to build the server key from a spec. McpServersConfig is bumped for completeness (Step 8d may need to access it when loading server specs during scope transitions).

  • src/config/mcp_factory.rs — 3 changes:

    1. Added McpServerKey::from_spec(name, &McpServer) — builds a key by extracting command, args (defaulting to empty vec), and env vars (converting JsonField variants to strings) from the spec. Args and env are sorted by the existing new() constructor to ensure identical specs produce identical keys.

    2. Added McpFactory::acquire(name, &McpServer, log_path) — the core method. Builds an McpServerKey from the spec, checks try_get_active for an existing Arc (sharing path), otherwise calls spawn_mcp_server to start a fresh subprocess, inserts the result into active via insert_active, and returns the Arc<ConnectedServer>.

    3. Updated imports — added McpServer, spawn_mcp_server, Result, Path.
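The key-normalization described above can be sketched with std-only stand-ins. All struct and field definitions here are simplified assumptions, not the real crate types; only the behavior they illustrate — JsonField variants flattened to strings, args and env sorted by the `new()` constructor so identical specs produce identical keys — comes from the notes above:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real spec types (assumption: actual fields differ).
enum JsonField {
    Str(String),
    Bool(bool),
    Int(i64),
}

struct McpServer {
    command: String,
    args: Option<Vec<String>>,
    env: Vec<(String, JsonField)>,
}

#[derive(Debug, PartialEq, Eq, Hash)]
struct McpServerKey {
    name: String,
    command: String,
    args: Vec<String>,
    env: Vec<(String, String)>,
}

impl McpServerKey {
    // Mirrors the existing new(): sort args and env so identical specs
    // always hash to identical keys regardless of declaration order.
    fn new(name: String, command: String, mut args: Vec<String>, mut env: Vec<(String, String)>) -> Self {
        args.sort();
        env.sort();
        McpServerKey { name, command, args, env }
    }

    fn from_spec(name: &str, spec: &McpServer) -> Self {
        let env = spec
            .env
            .iter()
            .map(|(k, v)| {
                // JsonField variants are flattened to their string form for hashing.
                let s = match v {
                    JsonField::Str(s) => s.clone(),
                    JsonField::Bool(b) => b.to_string(),
                    JsonField::Int(i) => i.to_string(),
                };
                (k.clone(), s)
            })
            .collect();
        Self::new(
            name.to_string(),
            spec.command.clone(),
            spec.args.clone().unwrap_or_default(), // args default to empty vec
            env,
        )
    }
}
```

Two specs that differ only in env/args ordering therefore compare equal as keys, which is what makes the factory's sharing path work.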

Files NOT changed

  • src/config/tool_scope.rs — unchanged; Step 8d will use McpFactory::acquire() to populate McpRuntime instances.
  • All callers — McpRegistry::start_select_mcp_servers and McpRegistry::reinit continue to call self.start_server(), which internally uses the extracted function. No caller migration was needed.

Key decisions

1. Spawn function does NOT list tools or build catalogs

The plan said to extract "the MCP subprocess spawn + rmcp handshake logic (~60 lines)." I interpreted this as: Command construction → transport creation → serve() handshake → Arc wrapping. The tool listing (service.list_tools) and catalog building (BM25 index) are McpRegistry-specific bookkeeping and stay in start_server.

McpFactory::acquire() returns a connected server handle ready to use. Callers (Step 8d's scope transitions) can list tools themselves if they need to build function declarations.

2. No abort_signal parameter on spawn_mcp_server

The plan suggested abort_signal: &AbortSignal as a parameter. The existing start_server doesn't use an abort signal — cancellation is handled at a higher level by abortable_run_with_spinner wrapping the entire batch of start_select_mcp_servers. Adding an abort signal to the individual spawn would require threading tokio::select! into the transport creation, which is a behavior change beyond Step 8c's scope. Step 8d can add cancellation when building ToolScope if needed.

3. McpServerKey::from_spec converts JsonField to strings

The McpServer.env field uses a JsonField enum (Str/Bool/Int) for JSON flexibility. The key needs string comparisons for hashing, so from_spec converts each variant to its string representation. This matches the conversion already done in the env-building code inside spawn_mcp_server.

4. McpFactory::acquire mutex contention is safe

The plan warned: "hold the lock only during HashMap mutation, never across subprocess spawn." The implementation achieves this by using the existing try_get_active and insert_active methods, which each acquire and release the mutex within their own scope. The spawn_mcp_server await happens between the two lock acquisitions with no lock held.

TOCTOU race: two concurrent callers could both miss in try_get_active, both spawn, and both insert. The second insert overwrites the first's Weak. This means one extra subprocess gets spawned and the first Arc has no Weak in the map (but stays alive via its holder's Arc). This is acceptable for Phase 1 — the worst case is a redundant spawn, not a crash or leak. Phase 5's pooling design (per-key tokio::sync::Mutex) will eliminate this race.
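The lock discipline and the TOCTOU window can be illustrated with a synchronous std-only model. This is a sketch under stated assumptions — the real `acquire` is async, fallible, and keyed by McpServerKey; here the key is a plain String, `Server` stands in for ConnectedServer, and the subprocess spawn is a stub:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

struct Server; // stand-in for ConnectedServer

struct Factory {
    active: Mutex<HashMap<String, Weak<Server>>>,
}

impl Factory {
    fn try_get_active(&self, key: &str) -> Option<Arc<Server>> {
        // Lock held only for the lookup + Weak upgrade, then released.
        self.active.lock().unwrap().get(key).and_then(Weak::upgrade)
    }

    fn insert_active(&self, key: String, server: &Arc<Server>) {
        // Lock held only for the map mutation, then released.
        self.active.lock().unwrap().insert(key, Arc::downgrade(server));
    }

    fn acquire(&self, key: String) -> Arc<Server> {
        if let Some(existing) = self.try_get_active(&key) {
            return existing; // sharing path: reuse the live server
        }
        // NO lock is held across this point — in the real code this is the
        // spawn_mcp_server(...).await subprocess spawn + rmcp handshake.
        let fresh = Arc::new(Server);
        // TOCTOU window: a concurrent caller may also have missed and spawned;
        // the later insert overwrites the earlier Weak (accepted for Phase 1).
        self.insert_active(key, &fresh);
        fresh
    }
}
```

Because the map stores Weak handles, a server stays registered only while some caller still holds its Arc; once all holders drop it, the next acquire misses and spawns fresh.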

5. No integration tests for acquire()

The plan suggested writing integration tests for the factory's sharing behavior. Spawning a real MCP server requires a configured binary on the system PATH, and a mock server would need a test binary that speaks the rmcp stdio protocol — substantial test infrastructure that doesn't exist yet. Rather than building it in Step 8c, I'm documenting that integration testing of McpFactory::acquire() should happen in Phase 5, when the pooling infrastructure provides natural test hooks (idle pool, reaper, health checks). The extraction itself is verified by the fact that existing MCP functionality (which goes through McpRegistry::start_server → spawn_mcp_server) still compiles and all 63 tests pass.

Deviations from plan

  • No abort_signal parameter — not used by existing code; adding it would be a behavior change.
  • No integration tests — requires MCP test infrastructure that doesn't exist yet.
  • Removed get_server_spec / log_path accessors from McpRegistry — not needed; acquire() takes the spec and log path directly.

Verification

Compilation

  • cargo check — clean, zero warnings, zero errors
  • cargo clippy — clean

Tests

  • cargo test — 63 passed, 0 failed (unchanged from Steps 1–8b)

Handoff to next step

What Step 8d can rely on

  • spawn_mcp_server(&McpServer, Option<&Path>) -> Result<Arc<ConnectedServer>> — available from crate::mcp::spawn_mcp_server
  • McpFactory::acquire(name, &McpServer, log_path) -> Result<Arc<ConnectedServer>> — checks active map for sharing, spawns fresh if needed, inserts into active map
  • McpServerKey::from_spec(name, &McpServer) -> McpServerKey — builds a hashable key from a server spec
  • McpServer, McpServersConfig, JsonField — all pub(crate) and accessible from src/config/

What Step 8d should do

Build real ToolScope instances during scope transitions:

  1. Resolve the effective enabled-server list from the role/session/agent
  2. Look up each server's McpServer spec (from the MCP config)
  3. Call app.mcp_factory.acquire(name, spec, log_path) for each
  4. Populate an McpRuntime with the returned Arc<ConnectedServer> handles
  5. Construct a ToolScope with the runtime + resolved Functions
  6. Assign to ctx.tool_scope
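The six steps above can be sketched as glue code. Everything here except the shape of acquire is a hypothetical assumption — the real ToolScope/McpRuntime fields, the config lookup, and error handling will differ, and the real acquire is async and fallible:

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct Server;        // stand-in for ConnectedServer
struct McpServerSpec; // stand-in for McpServer

struct McpRuntime {
    servers: HashMap<String, Arc<Server>>,
}

struct ToolScope {
    runtime: McpRuntime,
}

// Stand-in for app.mcp_factory.acquire(name, spec, log_path):
// the real method shares a live server or spawns a fresh one.
fn acquire(_name: &str, _spec: &McpServerSpec) -> Arc<Server> {
    Arc::new(Server)
}

fn build_tool_scope(enabled: &[String], config: &HashMap<String, McpServerSpec>) -> ToolScope {
    let mut servers = HashMap::new();
    for name in enabled {
        // Step 2: look up the server spec by name in the parsed MCP config.
        if let Some(spec) = config.get(name) {
            // Step 3: acquire a (possibly shared) connected server handle.
            servers.insert(name.clone(), acquire(name, spec));
        }
    }
    // Steps 4-5: populate the runtime and wrap it in a ToolScope.
    ToolScope { runtime: McpRuntime { servers } }
}
```

Step 1 (resolving the enabled-server list from role/session/agent) and step 6 (assigning to ctx.tool_scope) happen outside this helper in the real transition code.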

What Step 8d should watch for

  • Log path. McpRegistry stores log_path during init(). Step 8d needs to decide where the log path comes from for factory-acquired servers. Options: store it on AppState, compute it from paths::cache_path(), or pass it through from the caller. The simplest is to store it on McpFactory at construction time.

  • MCP config loading. McpRegistry::init() loads and parses mcp.json. Step 8d's scope transitions need access to the parsed McpServersConfig to look up server specs by name. Options: store the parsed config on AppState, or load it fresh each time. Storing on AppState is more efficient.

  • Catalog building. McpRegistry::start_server builds a ServerCatalog (BM25 index) for each server after spawning. Step 8d's ToolScope doesn't use catalogs — they're for the mcp_search meta-function. The catalog functionality may need to be lifted out of McpRegistry eventually, but that's not blocking Step 8d.

Files to re-read at the start of Step 8d

  • docs/PHASE-1-IMPLEMENTATION-PLAN.md — Step 8d section
  • This notes file
  • src/config/mcp_factory.rs — full file
  • src/config/tool_scope.rs — full file
  • src/mcp/mod.rs — McpRegistry::init, start_select_mcp_servers, resolve_server_ids — for the config loading / server selection patterns that Step 8d will replicate

References

  • Phase 1 plan: docs/PHASE-1-IMPLEMENTATION-PLAN.md
  • Step 8b notes: docs/implementation/PHASE-1-STEP-8b-NOTES.md
  • Step 6.5 notes: docs/implementation/PHASE-1-STEP-6.5-NOTES.md
  • Modified files:
    • src/mcp/mod.rs (extracted spawn_mcp_server, rewrote start_server, bumped 3 types to pub(crate))
    • src/config/mcp_factory.rs (added from_spec, acquire, updated imports)