# Phase 6 Implementation Plan: Production Hardening

## Overview

Phase 6 closes out the refactor by picking up every "deferred to production hardening" item from Phases 1–5 and delivering a Loki build that's safe to run as a multi-tenant service. The preceding phases made Loki *functionally* a server — Phase 6 makes it *operationally* a server. That means real rate limiting instead of a stub, per-subject session ownership instead of flat visibility, Prometheus metrics instead of in-memory counters, structured JSON logging, deployment manifests, security headers, config validation, and operational runbooks.

This is the final phase. After it lands, Loki v1 is production-ready: you can run `loki --serve` in a container behind a reverse proxy, scrape its metrics from Prometheus, route requests through a rate limiter, and have multiple tenants share the same instance without seeing each other's data.

**Estimated effort:** ~1 week

**Risk:** Low. Most of the work is applying well-known patterns (sliding-window rate limiting, row-level authz, Prometheus, structured logging) on top of the architecture the previous phases already built. No new core types, no new pipelines.

**Depends on:** Phases 1–5 complete. The API server runs, the MCP pool works, and sessions are UUID-keyed.

---

## Why Phase 6 Exists

Phases 4 and 5 got the API server running with correct semantics, but several explicit gaps were called out as "stubs" or "follow-ups." A Phase 4 deployment is usable for a trusted single-tenant context (an internal tool, a personal server) but unsafe for anything else:

- **Anyone with a valid API key can see every session.** Phase 4 flagged this as "single-tenant-per-key." In a multi-tenant deployment where Alice and Bob both have keys, Alice can list Bob's sessions and read their messages. This is a security issue, not a feature gap.
- **No real rate limiting.** Phase 4's `max_concurrent_requests` semaphore caps parallelism but doesn't throttle per-subject request rates. A single runaway client can exhaust the whole concurrency budget.
- **No metrics for external observability.** Phase 5 added in-memory counters, but they're only reachable via the `.info mcp` dot-command or a one-shot JSON endpoint. Production needs Prometheus scraping so alerting and dashboards work.
- **Logs aren't structured.** The `tracing` spans from Phase 4 middleware emit human-readable text. Aggregators like Loki (the other one), Datadog, or CloudWatch want JSON with correlation IDs.
- **No deployment story.** There's no Dockerfile, no systemd unit, no documented way to actually run the thing in production. Every deploying team has to reinvent this.
- **Security headers missing.** Phase 4's CORS handles cross-origin; it doesn't set `X-Content-Type-Options`, `X-Frame-Options`, or similar defaults that a browser-facing endpoint should have.
- **No config validation at startup.** Mistyped config values produce runtime errors hours after deployment instead of failing fast at startup.
- **Operational procedures are undocumented.** How do you rotate auth keys? How do you reload MCP credentials? What's the runbook when the MCP hit rate drops? None of this is written down.

Phase 6 delivers answers to all of the above. It's the "you can actually deploy this" phase.

---

## What Phase 6 Delivers

Grouped by theme rather than by dependency order. Each item is independently valuable and can be worked in parallel.

### Security and isolation

1. **Per-subject session ownership** — every session records the authenticated subject that created it; reads/writes are authz-checked against the caller's subject.
2. **Scope-based authorization** — `AuthContext.scopes` are enforced per endpoint (e.g., `read:sessions`, `write:sessions`, `admin:mcp`). Phase 4's middleware already populates scopes; Phase 6 adds the enforcement.
3. **JWT support** — extends `AuthConfig` with a `Jwt { issuer, audience, jwks_url }` variant that validates tokens against a JWKS endpoint and extracts subject + scopes from claims.
4. **Security headers middleware** — `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`, optional HSTS when behind HTTPS.
5. **Audit logging** — structured audit events for every authenticated request (subject, action, target, result), written to a dedicated sink so they survive log rotation.

### Throughput and fairness

6. **Per-subject rate limiting** — sliding-window limiter keyed by subject. Enforces `rate_limit_per_minute` and related config. Returns `429 Too Many Requests` with a `Retry-After` header.
7. **Per-subject concurrency limit** — subject-scoped semaphore so one noisy neighbor can't exhaust the global concurrency budget.
8. **Backpressure signal** — expose a `/healthz/ready` endpoint that returns 503 when the server is saturated, so upstream load balancers can drain traffic.

### Observability

9. **Structured JSON logging** — every log line is JSON with `timestamp`, `level`, `target`, `request_id`, `subject`, `session_id`, and `fields`. Routes through `tracing_subscriber` with `fmt::layer().json()`.
10. **Prometheus metrics endpoint** — `/metrics` exposing the existing Phase 5 counters plus new HTTP metrics (`http_requests_total`, `http_request_duration_seconds`, `http_requests_in_flight`), MCP metrics (`mcp_pool_size`, `mcp_acquire_latency_seconds` histogram), and session metrics (`sessions_active_total`, `sessions_created_total`).
11. **Liveness and readiness probes** — `/healthz/live` for process liveness (always 200 unless shutting down), `/healthz/ready` for dependency readiness (config loaded, MCP pool initialized, storage writable).

### Operability

12. **Config validation at startup** — a dedicated `ApiConfig::validate()` that checks every field against a schema and fails fast with a readable error message listing *all* problems, not just the first one.
13. **SIGHUP config reload** — reloads auth keys, log level, and rate limit settings without restarting the server. Does NOT reload MCP pool config (requires restart because the pool holds live subprocesses).
14. **Dockerfile + multi-stage build** — minimal runtime image based on `debian:bookworm-slim` with the compiled binary, config directory, and non-root user.
15. **systemd service unit** — with `Type=notify`, sandboxing directives, and resource limits.
16. **docker-compose example** — for local development with nginx-as-TLS-terminator in front.
17. **Kubernetes manifests** — Deployment, Service, ConfigMap, Secret, HorizontalPodAutoscaler.

### Documentation

18. **Operational runbook** (`docs/RUNBOOK.md`) — documented procedures for common scenarios.
19. **Deployment guide** (`docs/DEPLOYMENT.md`) — end-to-end instructions for each deployment target.
20. **Security guide** (`docs/SECURITY.md`) — threat model, hardening checklist, key rotation procedures.

---

## Core Type Additions

Most of Phase 6 hangs off existing types. A few new concepts need introducing.

### `AuthContext` enrichment

Phase 4 defined `AuthContext { subject: String, scopes: Vec<String> }`. Phase 6 extends it:

```rust
pub struct AuthContext {
    pub subject: String,
    pub scopes: Scopes,
    pub key_id: Option<String>,    // for audit log correlation
    pub claims: Option<JwtClaims>, // present when auth mode is Jwt
}

pub struct Scopes(HashSet<String>);

impl Scopes {
    pub fn has(&self, scope: &str) -> bool;
    pub fn has_any(&self, required: &[&str]) -> bool;
    pub fn has_all(&self, required: &[&str]) -> bool;
}

pub enum Scope {
    ReadSessions,  // "read:sessions"
    WriteSessions, // "write:sessions"
    ReadAgents,    // "read:agents"
    RunAgents,     // "run:agents"
    ReadModels,    // "read:models"
    AdminMcp,      // "admin:mcp"
    AdminSessions, // "admin:sessions" — can see all users' sessions
}
```

The `Scope` enum provides typed constants for the well-known scope strings used in the handlers. Custom scopes (for callers to define their own access tiers) continue to work as raw strings.

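As a sketch of how the `Scopes` newtype could behave (the method names come from the definition above; the `new` constructor and implementation details are illustrative):

```rust
use std::collections::HashSet;

pub struct Scopes(HashSet<String>);

impl Scopes {
    // Illustrative constructor, not part of the planned API surface.
    pub fn new<I: IntoIterator<Item = S>, S: Into<String>>(iter: I) -> Self {
        Scopes(iter.into_iter().map(Into::into).collect())
    }

    // True if the caller holds this exact scope string.
    pub fn has(&self, scope: &str) -> bool {
        self.0.contains(scope)
    }

    // True if the caller holds at least one of the required scopes.
    pub fn has_any(&self, required: &[&str]) -> bool {
        required.iter().any(|s| self.has(s))
    }

    // True if the caller holds every required scope.
    pub fn has_all(&self, required: &[&str]) -> bool {
        required.iter().all(|s| self.has(s))
    }
}
```

A handler would then guard itself with something like `ctx.scopes.has("read:sessions")` before touching the store.
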
### `SessionOwnership` in the session store

The session metadata needs to record who owns each session so reads/writes can be authorized:

```rust
pub struct SessionMeta {
    pub id: SessionId,
    pub alias: Option<SessionAlias>,
    pub owner: Option<String>, // subject that created it; None = legacy
    pub last_modified: SystemTime,
    pub is_autoname: bool,
}
```

On disk, the ownership field goes into the session's YAML file under a reserved `_meta` block:

```yaml
_meta:
  owner: "alice"
  created_at: "2026-04-10T15:32:11Z"
  created_by_key_id: "key_3f2a..."
# ... rest of session fields unchanged
```

The `SessionStore` trait gains one new method (`set_owner`), and `open` and `list` gain a caller parameter:

```rust
#[async_trait]
pub trait SessionStore: Send + Sync {
    // existing methods unchanged except:
    async fn open(
        &self,
        agent: Option<&str>,
        id: SessionId,
        caller: Option<&AuthContext>, // NEW: for authz check
    ) -> Result<SessionHandle, StoreError>;

    async fn list(
        &self,
        agent: Option<&str>,
        caller: Option<&AuthContext>, // NEW: for filtering
    ) -> Result<Vec<SessionMeta>, StoreError>;

    // NEW: transfer ownership (e.g., admin reassignment)
    async fn set_owner(
        &self,
        id: SessionId,
        new_owner: Option<String>,
    ) -> Result<(), StoreError>;
}
```

`caller: None` means internal or legacy access (CLI/REPL) — skip authz entirely. `caller: Some(...)` means an API call — enforce ownership.

**Authz rules:**

- Own session: full access.
- Other subject's session: denied unless caller has `admin:sessions` scope.
- Legacy sessions with `owner: None`: accessible to anyone (grandfathered); every mutation attempts to set the owner to the current caller so they get claimed forward.
- `list`: returns only sessions owned by the caller (or all if they have `admin:sessions`).

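The rules above boil down to a small decision function. A sketch (the types and names here are illustrative, not the actual store code):

```rust
#[derive(Debug, PartialEq)]
enum Access {
    Allowed,
    Denied,
}

// caller: None = internal CLI/REPL access; Some((subject, has admin:sessions)).
// owner: the session's recorded owner; None = legacy, unowned session.
fn check_session_access(caller: Option<(&str, bool)>, owner: Option<&str>) -> Access {
    match (caller, owner) {
        // Internal access skips authz entirely.
        (None, _) => Access::Allowed,
        // Legacy sessions are grandfathered: anyone may access them.
        (Some(_), None) => Access::Allowed,
        // Owned sessions: the owner or an admin:sessions holder only.
        (Some((subject, is_admin)), Some(owner)) => {
            if subject == owner || is_admin {
                Access::Allowed
            } else {
                Access::Denied
            }
        }
    }
}
```

The "claim on first mutation" behavior sits one layer up: after a mutation is allowed on a legacy session, the store writes the caller's subject into `_meta.owner`.
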
### `RateLimiter` and `ConcurrencyLimiter`

```rust
pub struct RateLimiter {
    windows: DashMap<String, SlidingWindow>,
    config: RateLimitConfig,
}

struct SlidingWindow {
    bucket_a: AtomicU64,
    bucket_b: AtomicU64,
    last_reset: AtomicU64,
}

pub struct RateLimitConfig {
    pub per_minute: u32,
    pub burst: u32,
}

impl RateLimiter {
    pub fn check(&self, subject: &str) -> Result<(), RateLimitError>;
}

pub struct RateLimitError {
    pub retry_after: Duration,
    pub limit: u32,
    pub remaining: u32,
}

pub struct SubjectConcurrencyLimiter {
    semaphores: DashMap<String, Arc<Semaphore>>,
    per_subject: usize,
}

impl SubjectConcurrencyLimiter {
    pub async fn acquire(&self, subject: &str) -> OwnedSemaphorePermit;
}
```

Both live in `ApiState` and are applied via middleware. Rate limiting runs first (cheap atomic operations), then concurrency acquisition (may block briefly).

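The sliding-window estimate behind `check` (spelled out in Step 4 below) can be sketched without the atomics: the effective count is the current bucket plus a weighted tail of the previous bucket. The function names are illustrative:

```rust
// Weighted sliding-window count. `elapsed_in_window` is how far we are
// into the current one-minute bucket, in seconds (0..60).
fn effective_count(prev_bucket: u64, curr_bucket: u64, elapsed_in_window: f64) -> f64 {
    let window_secs = 60.0;
    // Fraction of the previous window that still overlaps the sliding window.
    let prev_weight = (window_secs - elapsed_in_window) / window_secs;
    curr_bucket as f64 + prev_bucket as f64 * prev_weight
}

fn over_limit(prev: u64, curr: u64, elapsed: f64, per_minute: u32) -> bool {
    effective_count(prev, curr, elapsed) >= per_minute as f64
}
```

Halfway into the current window (`elapsed = 30.0`), only half of the previous bucket still counts, which is what smooths out the boundary spikes a naive fixed-window counter allows.
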
### `MetricsRegistry`

```rust
pub struct MetricsRegistry {
    pub http_requests_total: IntCounterVec,
    pub http_request_duration: HistogramVec,
    pub http_requests_in_flight: IntGaugeVec,
    pub sessions_active: IntGauge,
    pub sessions_created_total: IntCounter,
    pub mcp_pool_size: IntGaugeVec,
    pub mcp_acquire_latency: HistogramVec,
    pub mcp_spawns_total: IntCounter,
    pub mcp_idle_evictions_total: IntCounter,
    pub auth_failures_total: IntCounterVec,
    pub rate_limit_rejections_total: IntCounterVec,
}
```

Built on top of the `prometheus` crate. Exposed via `GET /metrics` with the Prometheus text exposition format. The registry bridges Phase 5's atomic counters into the Prometheus types without requiring Phase 5's code to change — Phase 5 keeps its simple counters, and Phase 6 reads them on each scrape to populate the Prometheus gauges.

### `AuditLogger`

```rust
pub struct AuditLogger {
    sink: AuditSink,
}

pub enum AuditSink {
    Stderr, // default
    File { path: PathBuf, rotation: Rotation },
    Syslog { facility: String },
}

pub struct AuditEvent<'a> {
    pub timestamp: OffsetDateTime,
    pub request_id: Uuid,
    pub subject: Option<&'a str>,
    pub action: AuditAction,
    pub target: Option<&'a str>,
    pub result: AuditResult,
    pub details: Option<serde_json::Value>,
}

pub enum AuditAction {
    SessionCreate,
    SessionRead,
    SessionUpdate,
    SessionDelete,
    AgentActivate,
    ToolExecute,
    McpReload,
    ConfigReload,
    AuthFailure,
    RateLimitRejection,
}

pub enum AuditResult {
    Success,
    Denied { reason: String },
    Error { message: String },
}

impl AuditLogger {
    pub fn log(&self, event: AuditEvent<'_>);
}
```

Audit events are emitted from handler middleware after request completion. The audit stream is deliberately separate from the regular tracing logs because audit logs have stricter retention/integrity requirements in regulated environments — you want to be able to pipe them to WORM storage or a SIEM without mixing in debug logs.

---

## Migration Strategy

### Step 1: Per-subject session ownership

The highest-impact security fix. No new deps, no new config — just enriching existing types.

1. Add `owner: Option<String>` and `created_by_key_id: Option<String>` to the session YAML `_meta` block. Serde skip if absent (backward compat for legacy files).
2. Update `SessionStore::create` to record the caller's subject.
3. Update `SessionStore::open` to take `caller: Option<&AuthContext>` and enforce ownership.
4. Update `SessionStore::list` to filter by caller subject (unless the caller has the `admin:sessions` scope).
5. Add `SessionStore::set_owner` for admin reassignment.
6. Implement the "claim on first mutation" behavior for legacy sessions.
7. Update all API handlers to pass the `AuthContext` through to store calls.
8. Add integration tests: Alice creates a session, Bob tries to read it (403), admin Claire can read it (200), Alice's `list` returns only her own, Claire's `list` with the `admin:sessions` scope returns everything.

**Verification:** all new authz tests pass. CLI/REPL tests still pass because they pass `caller: None`.

### Step 2: Scope-based authorization for endpoints

Phase 4's middleware attaches `AuthContext` with a `scopes: Vec<String>` field, but handlers don't check it. Phase 6 adds the enforcement.

1. Change `AuthContext.scopes` from `Vec<String>` to a `Scopes(HashSet<String>)` newtype with `has`/`has_any`/`has_all` methods.
2. Define the `Scope` enum with well-known constants.
3. Add a `require_scope` helper and a `#[require_scope("read:sessions")]` proc macro (or a handler-side check if proc macros add too much complexity).
4. Annotate every handler with the required scope(s):
   - `GET /v1/sessions` → `read:sessions`
   - `POST /v1/sessions` → `write:sessions`
   - `GET /v1/sessions/:id` → `read:sessions`
   - `DELETE /v1/sessions/:id` → `write:sessions`
   - `POST /v1/sessions/:id/completions` → `write:sessions` + `run:agents` (if the session has an agent)
   - `POST /v1/rags/:name/rebuild` → `admin:mcp`
   - `GET /v1/agents`, `/v1/roles`, `/v1/rags`, `/v1/models` → `read:agents`, `read:roles`, etc.
   - `/metrics` → `admin:metrics` (or unauthenticated if the endpoint is bound to a private network)
5. Document the scope model in `docs/SECURITY.md`.

**Verification:** per-endpoint authz tests. A key with only `read:sessions` can list and read but not write.

### Step 3: JWT support in `AuthConfig`

Extend the auth mode enum:

```rust
pub enum AuthConfig {
    Disabled,
    StaticKeys { keys: Vec<AuthKeyEntry> },
    Jwt(JwtConfig),
}

pub struct JwtConfig {
    pub issuer: String,
    pub audience: String,
    pub jwks_url: String,
    pub jwks_refresh_interval: Duration,
    pub subject_claim: String, // e.g., "sub"
    pub scopes_claim: String,  // e.g., "scope" or "permissions"
    pub leeway_seconds: u64,
}
```

1. Add `jsonwebtoken` and `reqwest` (already present) to dependencies.
2. Implement a `JwksCache` that fetches `jwks_url` on startup and refreshes every `jwks_refresh_interval`. Uses `reqwest` with a short timeout. Refreshes in the background via `tokio::spawn`.
3. The auth middleware branches on `AuthConfig`: `StaticKeys` continues to work, `Jwt` calls `jsonwebtoken::decode` with the cached JWKS.
4. Extract the subject from the configured claim name. Extract scopes from either a space-separated string (`scope` claim) or an array claim (`permissions`).
5. Handle key rotation gracefully: if decoding fails with "unknown key ID," trigger an immediate JWKS refresh (debounced to once per minute) and retry once.
6. Integration tests with a fake JWKS endpoint (use `mockito` or `wiremock`).

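Step 4's claim normalization can be sketched with a tiny helper. `ClaimValue` is an illustrative stand-in for the decoded JSON value (the real middleware would match on `serde_json::Value`):

```rust
// Stand-in for a decoded JSON claim value (assumption for this sketch).
enum ClaimValue {
    Str(String),      // OAuth-style: "scope": "read:sessions write:sessions"
    Arr(Vec<String>), // Auth0-style: "permissions": ["read:sessions", ...]
}

// Normalize either claim shape into a flat list of scope strings.
fn scopes_from_claim(claim: &ClaimValue) -> Vec<String> {
    match claim {
        ClaimValue::Str(s) => s.split_whitespace().map(str::to_owned).collect(),
        ClaimValue::Arr(v) => v.clone(),
    }
}
```

Whichever shape the issuer uses, the output feeds straight into the `Scopes` newtype from the types section.
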
**Verification:** valid JWT authenticates; expired JWT rejected; invalid signature rejected; JWKS refresh handles key rotation.

### Step 4: Real rate limiting

Replace the Phase 4 stub with a working sliding-window implementation.

1. Add the `dashmap` dependency for the per-subject map (lock-free reads/writes).
2. Implement `SlidingWindow` with two adjacent one-minute buckets; the effective rate is the weighted sum of the current bucket plus the tail of the previous bucket, based on how far into the current window we are.
3. Add `RateLimiter::check(subject) -> Result<(), RateLimitError>`.
4. Write middleware that calls `check` before dispatching to handlers. On `Err`, return 429 with a `Retry-After` header.
5. Add `rate_limit_per_minute` and `rate_limit_burst` config fields. Reasonable defaults: 60/min, burst 10.
6. Expose the per-subject current rate as a gauge in the Prometheus registry.
7. Integration test: fire N+1 requests as the same subject within a minute, assert the N+1th gets 429.

**Verification:** rate limiting works correctly across subjects; non-limited subjects aren't affected; burst allowance works.

### Step 5: Per-subject concurrency limiter

Complements rate limiting — rate limiting caps the *count* of requests over time; the concurrency limiter caps the *simultaneous* count.

1. Implement `SubjectConcurrencyLimiter` with a `DashMap<String, Arc<Semaphore>>`.
2. Lazy-init semaphores per subject with `per_subject_concurrency` slots (default 8).
3. Middleware acquires a permit per request. If the subject's semaphore is full, queue briefly (`acquire_owned` wrapped in a short `tokio::time::timeout`), then 503 if still full.
4. Garbage-collect unused semaphores periodically (an entry with no waiters and all permits available hasn't been used recently).
5. Integration test: fire 10 concurrent requests as one subject with `per_subject_concurrency: 5`, assert no more than 5 run simultaneously.

**Verification:** no subject can exceed its concurrency budget; other subjects unaffected.

### Step 6: Prometheus metrics endpoint

1. Add the `prometheus` crate.
2. Implement `MetricsRegistry` with the metrics listed in the types section.
3. Wire metric updates into existing code:
   - HTTP middleware: `http_requests_total.inc()` on response, `http_request_duration.observe(elapsed)`, `http_requests_in_flight.inc()/dec()`
   - Session creation: `sessions_created_total.inc()`, `sessions_active.set(store.count())`
   - MCP factory: read the Phase 5 atomic counters on scrape and populate the Prometheus types
4. Add a `GET /metrics` handler that writes the Prometheus text exposition format.
5. Auth policy for `/metrics`: configurable — either requires the `admin:metrics` scope, or is opened to a private network via `metrics_listen_addr: "127.0.0.1:9090"` on a separate port (recommended).
6. Integration test: scrape `/metrics`, parse the response, assert expected metrics are present with sensible values.

**Verification:** Prometheus scraping works; metrics increment correctly.

### Step 7: Structured JSON logging

Replace the default `tracing_subscriber` format with JSON output.

1. Add a `log_format: Text | Json` config field; default `Text` for CLI/REPL, `Json` for `--serve` mode.
2. Configure `tracing_subscriber::fmt::layer().json()` conditionally.
3. Ensure every span has a `request_id` field (already present from Phase 4 middleware).
4. Add `subject` and `session_id` as span fields when present, so they get included in every child log line automatically.
5. Add a `log_level` config field that SIGHUP reloads at runtime (see Step 12).
6. Integration test: capture stdout during a request, parse as JSON, assert the fields are present and correctly scoped.

**Verification:** `loki --serve` produces one-line-per-event JSON output suitable for log aggregators.

### Step 8: Audit logging

Dedicated sink for security-relevant events.

1. Implement `AuditLogger` with `Stderr`, `File`, and `Syslog` sinks. Start with just `Stderr` and `File` — `Syslog` via the `syslog` crate can follow.
2. Emit audit events from:
   - Auth middleware: `AuditAction::AuthFailure` on any auth rejection
   - Rate limiter: `AuditAction::RateLimitRejection` on 429
   - Session handlers: `AuditAction::SessionCreate/Read/Update/Delete`
   - Agent handlers: `AuditAction::AgentActivate`
   - MCP reload endpoint: `AuditAction::McpReload`
3. Audit events are JSON lines with a schema documented in `docs/SECURITY.md`.
4. Audit events don't interfere with the main tracing stream — they go to the configured audit sink independently.
5. File rotation via `tracing-appender`, or manual rotation with a size + date cap.

**Verification:** every security-relevant action produces an audit event; failures include a `reason`.

### Step 9: Security headers and misc middleware

1. Add a `security_headers` middleware layer that attaches:
   - `X-Content-Type-Options: nosniff`
   - `X-Frame-Options: DENY`
   - `Referrer-Policy: strict-origin-when-cross-origin`
   - `Strict-Transport-Security: max-age=31536000; includeSubDomains` (only when `api.force_https: true`)
   - Do NOT set CSP — this is an API, not a browser app; CSP would confuse clients.
2. Remove `Server: ...` and other fingerprinting headers.
3. Handle `OPTIONS` preflight correctly (Phase 4's CORS layer does this; verify).

**Verification:** `curl -I` inspects headers; an automated test asserts each required header is present.

### Step 10: Config validation at startup

A single `ApiConfig::validate()` method that checks every field and aggregates ALL errors before failing.

1. Implement validation for:
   - `listen_addr` is parseable and bindable
   - `auth.mode` has a valid configuration (e.g., `StaticKeys` with a non-empty key list, `Jwt` with a reachable JWKS URL)
   - `auth.keys[].key_hash` starts with `$argon2id$` (catches plaintext keys)
   - `rate_limit_per_minute > 0` and `burst > 0`
   - `max_body_bytes > 0` and `< 100 MiB` (sanity)
   - `request_timeout_seconds > 0` and `< 3600`
   - `shutdown_grace_seconds >= 0`
   - `cors.allowed_origins` entries are valid URLs or `"*"`
2. Return a `ConfigValidationError` that lists every problem, not just the first.
3. Call `validate()` in `serve()` before binding the listener.
4. Test: a deliberately-broken config produces an error listing all problems.

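The aggregate-all-errors pattern looks roughly like this. The field names mirror the checklist above, but the struct itself is an illustrative subset, not the real `ApiConfig`:

```rust
// Illustrative subset of ApiConfig, for the validation pattern only.
struct ApiConfig {
    listen_addr: String,
    rate_limit_per_minute: u32,
    request_timeout_seconds: u64,
}

impl ApiConfig {
    // Collect every problem instead of bailing at the first one,
    // so one startup failure reports the whole broken config.
    fn validate(&self) -> Result<(), Vec<String>> {
        let mut errors = Vec::new();
        if self.listen_addr.parse::<std::net::SocketAddr>().is_err() {
            errors.push(format!(
                "listen_addr {:?} is not a valid socket address",
                self.listen_addr
            ));
        }
        if self.rate_limit_per_minute == 0 {
            errors.push("rate_limit_per_minute must be > 0".into());
        }
        if self.request_timeout_seconds == 0 || self.request_timeout_seconds >= 3600 {
            errors.push("request_timeout_seconds must be in 1..3600".into());
        }
        if errors.is_empty() { Ok(()) } else { Err(errors) }
    }
}
```

A broken config then produces one actionable message listing every failed check, instead of a fix-one-redeploy-find-the-next loop.
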
**Verification:** startup validation catches common mistakes; the error message is actionable.

### Step 11: Health check endpoints

1. `GET /healthz/live` — always returns 200 OK unless the process is in graceful shutdown. Body: `{"status":"ok"}`. No auth required.
2. `GET /healthz/ready` — returns 200 OK when fully initialized and not saturated, otherwise 503 Service Unavailable. Readiness criteria:
   - `AppState` fully initialized
   - Session store writable (attempt a probe write to a reserved path)
   - MCP pool initialized (at least the factory is alive)
   - Concurrency semaphore has at least 10% availability (not saturated)
3. Both endpoints are unauthenticated and unmetered — load balancers hit them constantly.
4. Document in `docs/DEPLOYMENT.md` how Kubernetes, systemd, and other supervisors should use these.

**Verification:** endpoints return correct status under various load conditions.

### Step 12: SIGHUP config reload

Reload a subset of config without restarting.

1. Reloadable fields:
   - Auth keys (StaticKeys mode)
   - JWT config (including JWKS URL)
   - Log level
   - Rate limit config
   - Per-subject concurrency limits
   - Audit logger sink
2. NOT reloadable (requires full restart):
   - Listen address
   - MCP pool config (pool holds live subprocesses)
   - Session storage paths
   - TLS certs (use a reverse proxy)
3. Implementation: a SIGHUP handler that re-reads `config.yaml`, validates it, and atomically swaps the affected fields in `ApiState`. Uses the `arc-swap` crate for lock-free swaps.
4. Audit every reload: `AuditAction::ConfigReload` with a before/after diff summary.
5. Document: rotation procedures for auth keys, logging level adjustments, etc.

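The validate-then-swap discipline from step 3 can be sketched like this. The sketch uses a std `RwLock<Arc<_>>` in place of `arc-swap` so it stays dependency-free, and `ReloadableConfig` is an illustrative stand-in for the reloadable slice of the real config:

```rust
use std::sync::{Arc, RwLock};

// Illustrative stand-in for the reloadable fields of ApiState.
#[derive(Clone, Debug)]
struct ReloadableConfig {
    log_level: String,
    rate_limit_per_minute: u32,
}

struct ApiState {
    reloadable: RwLock<Arc<ReloadableConfig>>,
}

impl ApiState {
    // Handlers clone the Arc once per request; in-flight requests keep
    // seeing the config snapshot they started with.
    fn current(&self) -> Arc<ReloadableConfig> {
        self.reloadable.read().unwrap().clone()
    }

    // Called from the SIGHUP handler only AFTER validation succeeded:
    // the swap is all-or-nothing, so a bad config never half-applies.
    fn swap(&self, next: ReloadableConfig) {
        *self.reloadable.write().unwrap() = Arc::new(next);
    }
}
```

Because readers hold an `Arc` snapshot rather than a lock, a reload never blocks or drops in-flight requests, which is exactly the property the verification step below checks.
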
**Verification:** start the server, modify `config.yaml`, send SIGHUP, assert the new config is in effect without dropped requests.

### Step 13: Deployment manifests

#### 13a. Dockerfile

Multi-stage build for a minimal runtime image:

```dockerfile
# Build stage
FROM rust:1.82-slim AS builder
WORKDIR /build
COPY Cargo.toml Cargo.lock ./
COPY src ./src
COPY assets ./assets
RUN cargo build --release --bin loki

# Runtime stage
FROM debian:bookworm-slim
# curl is included so container healthchecks (see docker-compose below) work
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        tini \
    && rm -rf /var/lib/apt/lists/*
RUN useradd --system --create-home --home-dir /loki --shell /bin/false loki
COPY --from=builder /build/target/release/loki /usr/local/bin/loki
COPY --from=builder /build/assets /opt/loki/assets
USER loki
WORKDIR /loki
ENV LOKI_CONFIG_DIR=/loki/config
EXPOSE 3400
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/usr/local/bin/loki", "--serve"]
```

Build args for targeting specific architectures. The result is a ~100 MB image.

#### 13b. systemd unit

```ini
[Unit]
Description=Loki AI Server
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/loki --serve
Restart=on-failure
RestartSec=5
User=loki
Group=loki

# Sandboxing
NoNewPrivileges=true
PrivateTmp=true
PrivateDevices=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/loki
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictRealtime=true
LockPersonality=true

# Resource limits
LimitNOFILE=65536
LimitNPROC=512
MemoryMax=4G

# Reload
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
```

`Type=notify` requires Loki to call `sd_notify(READY=1)` after successful startup — add this with the `sd-notify` crate.

#### 13c. docker-compose example

For local development with TLS via nginx:

```yaml
version: "3.9"
services:
  loki:
    build: .
    environment:
      LOKI_CONFIG_DIR: /loki/config
    volumes:
      - ./config:/loki/config:ro
      - loki_data:/loki/data
    ports:
      - "127.0.0.1:3400:3400"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3400/healthz/live"]
      interval: 30s
      timeout: 5s
      retries: 3

  nginx:
    image: nginx:alpine
    volumes:
      - ./deploy/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./deploy/certs:/etc/nginx/certs:ro
    ports:
      - "443:443"
    depends_on:
      - loki

volumes:
  loki_data:
```

Include a sample `nginx.conf` that terminates TLS and forwards to `loki:3400`.

#### 13d. Kubernetes manifests

Provide `deploy/k8s/` with:

- `namespace.yaml`
- `deployment.yaml` (3 replicas, resource requests/limits, liveness/readiness probes)
- `service.yaml` (ClusterIP)
- `configmap.yaml` (non-secret config)
- `secret.yaml` (API keys, JWT config)
- `hpa.yaml` (HorizontalPodAutoscaler based on CPU + a custom metric for requests/sec)
- `ingress.yaml` (optional example using nginx-ingress)

Document the storage strategy: sessions use a PVC mounted at `/loki/data`; RAG embeddings use a read-only ConfigMap or a separate PVC.

**Verification:** each deployment target produces a running Loki that passes health checks.

|
||||
|
||||
### Step 14: Operational runbook
|
||||
|
||||
Write `docs/RUNBOOK.md` with sections for:
|
||||
|
||||
- **Starting and stopping** the server
|
||||
- **Rotating auth keys** (StaticKeys mode) — edit config, SIGHUP, verify in audit log
|
||||
- **Rotating auth keys** (Jwt mode) — update JWKS at issuer, Loki auto-refreshes
|
||||
- **Rotating MCP credentials** — update env vars, `POST /v1/mcp/reload` (new endpoint in this phase) or restart
|
||||
- **Diagnosing high latency** — check MCP hit rate, check LLM provider latency, check concurrency saturation
|
||||
- **Diagnosing auth failures** — audit log `AuthFailure` events, check key hash, check JWKS reachability
|
||||
- **Diagnosing rate limit rejections** — check per-subject counter, adjust limit or identify runaway client
|
||||
- **Diagnosing orphaned MCP subprocesses** — `ps aux | grep loki`, check logs for `McpFactory shutdown complete`
|
||||
- **Diagnosing session corruption** — check `.yaml.tmp` files (should not exist when server is idle), inspect session YAML for validity
|
||||
- **Backup and restore** — tar the `sessions/` and `agents/` directories
|
||||
- **Scaling horizontally** — each replica has its own MCP pool and session store; share sessions via shared filesystem (NFS/EFS) or deferred to a database-backed SessionStore (not in this phase)
|
||||
- **Incident response** — what logs to collect, what metrics to snapshot, how to reach a minimal reproducing state
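The backup step above can be sketched as a short script (paths are placeholders, and the `mkdir` scaffolding exists only so the snippet runs standalone; in production point `LOKI_HOME` at the real data directory and skip it):

```shell
# Backup sketch for the "Backup and restore" runbook entry.
# LOKI_HOME is a placeholder for the server's data directory.
LOKI_HOME="${LOKI_HOME:-/tmp/loki-demo}"
mkdir -p "$LOKI_HOME/sessions" "$LOKI_HOME/agents"   # demo scaffolding only

STAMP="$(date +%Y%m%d-%H%M%S)"
tar -czf "loki-backup-$STAMP.tar.gz" -C "$LOKI_HOME" sessions agents

# Restore: stop the server first so no .yaml.tmp files are in flight, then:
#   tar -xzf "loki-backup-$STAMP.tar.gz" -C "$LOKI_HOME"
```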

**Verification:** walk through each procedure on a test deployment; fix any unclear steps.

### Step 15: Deployment and security guides

`docs/DEPLOYMENT.md` — step-by-step instructions for Docker, systemd, docker-compose, and Kubernetes, plus a pre-flight checklist, first-time setup, and the upgrade procedure.

`docs/SECURITY.md` — threat model, hardening checklist, scope model, audit event schema, key rotation, reverse proxy configuration, network security recommendations, and a CVE reporting contact.

Cross-reference both docs from `README.md` and add a "Production Deployment" section to the README that points to them.

**Verification:** a developer unfamiliar with Loki can deploy it successfully using only the docs.

---

## Risks and Watch Items

| Risk | Severity | Mitigation |
|---|---|---|
| **Session ownership migration breaks legacy users** | Medium | Legacy sessions with `owner: None` stay readable by anyone and are claimed on first mutation. Document this in `RUNBOOK.md`. Add a one-shot migration CLI command (`loki migrate sessions --claim-to <subject>`) that assigns ownership of all unowned sessions to a specific subject. |
| **JWT JWKS fetch failures block startup** | Medium | The JWKS URL must be reachable at startup; if it's not, log an error and fall back to "reject all" mode until a fetch succeeds. A retry loop with exponential backoff runs in the background. Do NOT crash on JWKS failure. |
| **Rate limiter DashMap growth** | Low | Per-subject windows accumulate forever without cleanup. Add a background reaper that removes entries with no recent activity every few minutes. Cap total entries at 100k as a safety valve. |
| **Prometheus metric cardinality explosion** | Low | `http_requests_total` with per-path labels could explode if routes have dynamic segments (`/v1/sessions/:id`). Use route templates as labels, not concrete paths. Validate label sets at registration. |
| **Audit log retention compliance** | Low | Audit logs might need to be retained for regulatory reasons. Phase 6 provides the emission; retention is the operator's responsibility. Document this in `SECURITY.md`. |
| **SIGHUP reload partial failure** | Medium | If the new config is invalid, don't swap it in — keep the old config running. Log the validation error. The operator can fix the file and SIGHUP again. Never leave the server in an inconsistent state. |
| **Docker image size** | Low | `debian:bookworm-slim` is ~80 MB; the final image is ~100 MB. If smaller is needed, use `distroless/cc-debian12` for a ~35 MB image at the cost of losing `tini` and debugging tools. Document both options. |
| **systemd `Type=notify` left unimplemented** | Medium | Adding `sd_notify` support requires the `sd-notify` crate AND a call after the listener binds. Missing this call makes systemd think the service failed to start. Add an integration test that fakes systemd and asserts the notification is sent. |
| **Kubernetes pod disruption** | Low | HPA scales down during low traffic, but in-flight requests on the terminating pod must complete gracefully. Set `terminationGracePeriodSeconds` to at least `shutdown_grace_seconds + 10`. Document in `DEPLOYMENT.md`. |
| **Running under a reverse proxy** | Low | CORS, `Host` header handling, `X-Forwarded-For` for rate limiter subject identification. Document the expected proxy config (trust `X-Forwarded-*` headers only from trusted proxies). |
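To make the reaper and `Retry-After` rows in the table concrete, here is a dependency-free sketch of a sliding-window limiter (a plain `HashMap` stands in for `DashMap`, the names are illustrative, and this is not Loki's actual implementation):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Sliding-window rate limiter sketch. Assumes `limit >= 1`.
struct RateLimiter {
    window: Duration,
    limit: usize,
    hits: HashMap<String, Vec<Instant>>,
}

impl RateLimiter {
    fn new(limit: usize, window: Duration) -> Self {
        Self { window, limit, hits: HashMap::new() }
    }

    /// Admits the request with Ok(()), or rejects with Err(retry_after)
    /// suitable for a 429 `Retry-After` header.
    fn check(&mut self, subject: &str, now: Instant) -> Result<(), Duration> {
        let entries = self.hits.entry(subject.to_string()).or_default();
        // Drop timestamps that fell out of the window.
        entries.retain(|t| now.duration_since(*t) < self.window);
        if entries.len() < self.limit {
            entries.push(now);
            Ok(())
        } else {
            // Retry when the oldest hit leaves the window.
            let oldest = entries[0];
            Err(self.window - now.duration_since(oldest))
        }
    }

    /// Background reaper: drop subjects with no activity inside the window,
    /// so the map doesn't grow forever.
    fn reap(&mut self, now: Instant) {
        self.hits
            .retain(|_, ts| ts.iter().any(|t| now.duration_since(*t) < self.window));
    }
}
```

Passing `now` explicitly keeps the window arithmetic testable; the production version would read the clock internally and run `reap` from a background task.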

---

## What Phase 6 Does NOT Do

- **No multi-region replication.** Loki is a single-instance service; scale out by running multiple instances behind a load balancer, each with its own pool. Cross-instance state sharing is not in scope.
- **No database-backed session store.** `FileSessionStore` is still the only implementation. A `PostgresSessionStore` is a clean extension point (the `SessionStore` trait is already there) but belongs in a follow-up.
- **No cluster coordination.** Each Loki instance is independent. Running Loki in a "cluster" mode where instances share work is a separate project.
- **No advanced ML observability.** LLM call costs, token usage trends, provider error rates — these are tracked as counters but not aggregated into dashboards. Follow-up work.
- **No built-in TLS termination.** Use a reverse proxy (nginx, Caddy, Traefik, a cloud load balancer). Supporting TLS in-process adds complexity and key management concerns that reverse proxies solve better.
- **No SAML or LDAP.** Only StaticKeys and JWT. SAML/LDAP integration can extend `AuthConfig` later.
- **No plugin system.** Extensions to auth, storage, or middleware require forking and rebuilding. A dynamic plugin loader is explicitly out of scope.
- **No multi-tenancy beyond session ownership.** Tenants share the same process, same MCP pool, same RAG cache, same resources. Strict tenant isolation (separate processes per tenant) requires orchestration outside Loki.
- **No cost accounting per tenant.** LLM API calls are tracked per-subject in audit logs but not aggregated into billing-grade cost reports.

---

## Entry Criteria (from Phase 5)

- [ ] `McpFactory` pooling works and has metrics
- [ ] Graceful shutdown drains the MCP pool
- [ ] Phase 5 load test passes (hit rate >0.8, no orphaned subprocesses)
- [ ] Phase 4 API integration test suite passes
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean

## Exit Criteria (Phase 6 complete — v1 ready)

- [ ] Per-subject session ownership enforced; integration tests prove Alice can't read Bob's sessions
- [ ] Scope-based authorization enforced on every endpoint
- [ ] JWT authentication works with a real JWKS endpoint
- [ ] Real rate limiting replaces the Phase 4 stub; 429 responses include `Retry-After`
- [ ] Per-subject concurrency limiter prevents noisy-neighbor saturation
- [ ] Prometheus `/metrics` endpoint scrapes cleanly
- [ ] Structured JSON logs emitted in `--serve` mode
- [ ] Audit events written for all security-relevant actions
- [ ] Security headers set on all responses
- [ ] Config validation fails fast at startup with readable errors
- [ ] `/healthz/live` and `/healthz/ready` endpoints work
- [ ] SIGHUP reloads auth keys, log level, and rate limits without restart
- [ ] Dockerfile produces a minimal runtime image
- [ ] systemd unit with `Type=notify` works correctly
- [ ] docker-compose example runs end-to-end with TLS via nginx
- [ ] Kubernetes manifests deploy successfully
- [ ] `docs/RUNBOOK.md` covers all common operational scenarios
- [ ] `docs/DEPLOYMENT.md` guides a first-time deployer to success
- [ ] `docs/SECURITY.md` documents threat model, scopes, and hardening
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] End-to-end production smoke test: deploy to Kubernetes, send real traffic, scrape metrics, rotate a key, induce a failure, observe recovery
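The systemd items in the checklist above might look like this as a unit file (binary path, user, and hardening options are illustrative):

```ini
# deploy/loki.service -- sketch; paths and user are placeholders.
[Unit]
Description=Loki AI service
After=network-online.target
Wants=network-online.target

[Service]
# Type=notify requires Loki to send READY=1 after the listener binds.
Type=notify
ExecStart=/usr/local/bin/loki --serve
# SIGHUP reloads auth keys, log level, and rate limits.
ExecReload=/bin/kill -HUP $MAINPID
User=loki
Restart=on-failure
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/var/lib/loki

[Install]
WantedBy=multi-user.target
```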

---

## v1 Release Summary

After Phase 6 lands, Loki v1 has completed its transformation from a single-user CLI tool into a production-ready multi-tenant AI service. Here's what the v1 release notes should say:

**New in Loki v1:**

- **REST API** — full HTTP surface for completions, sessions, agents, roles, RAGs, and metadata. Streaming via Server-Sent Events, synchronous via JSON.
- **Multi-tenant sessions** — UUID-primary identity with optional human-readable aliases. Per-subject ownership with scope-based access control.
- **Concurrent safety** — per-session mutex serialization, per-MCP-server Arc sharing, per-agent runtime isolation. Run dozens of concurrent requests without corruption.
- **MCP pooling** — recently-used MCP subprocesses stay warm across requests for near-zero warm-path latency. Configurable idle timeout and LRU cap.
- **Authentication** — static API keys or JWT with JWKS. Argon2-hashed credentials. Scope-based authorization per endpoint.
- **Observability** — Prometheus metrics, structured JSON logging with correlation IDs, and a dedicated audit log stream.
- **Rate limiting** — sliding-window per subject with configurable limits and burst allowance.
- **Graceful shutdown** — in-flight requests complete within a grace period, MCP subprocesses terminate cleanly, and session state is persisted.
- **Deployment manifests** — Dockerfile, systemd unit, docker-compose example, Kubernetes manifests.
- **Full documentation** — runbook, deployment guide, security guide, API reference.

**Backward compatibility:**

CLI and REPL continue to work identically to pre-v1 builds. Existing `config.yaml`, `roles/`, `sessions/`, `agents/`, `rags/`, and `functions/` directories are read-compatible. The legacy session layout is migrated lazily on first access without destroying the old files.

**What's next (v2+):**

- Database-backed session store for cross-instance sharing
- Native TLS termination option
- SAML / LDAP authentication extensions
- Per-tenant cost accounting and quotas
- Dynamic plugin system for custom auth, storage, and middleware
- Multi-region replication
- WebSocket transport alongside SSE