Phase 6 Implementation Plan: Production Hardening
Overview
Phase 6 closes out the refactor by picking up every "deferred to production hardening" item from Phases 1–5 and delivering a Loki build that's safe to run as a multi-tenant service. The preceding phases made Loki functionally a server — Phase 6 makes it operationally a server. That means real rate limiting instead of a stub, per-subject session ownership instead of flat visibility, Prometheus metrics instead of in-memory counters, structured JSON logging, deployment manifests, security headers, config validation, and operational runbooks.
This is the final phase. After it lands, Loki v1 is production-ready: you can run loki --serve in a container behind a reverse proxy, scrape its metrics from Prometheus, route requests through a rate limiter, and have multiple tenants share the same instance without seeing each other's data.
Estimated effort: ~1 week.
Risk: Low. Most of the work is applying well-known patterns (sliding-window rate limiting, row-level authz, Prometheus, structured logging) on top of the architecture the previous phases already built. No new core types, no new pipelines.
Depends on: Phases 1–5 complete. The API server runs, the MCP pool works, sessions are UUID-keyed.
Why Phase 6 Exists
Phases 4 and 5 got the API server running with correct semantics, but several explicit gaps were called out as "stubs" or "follow-ups." A Phase 4 deployment is usable for a trusted single-tenant context (an internal tool, a personal server) but unsafe for anything else:
- Anyone with a valid API key can see every session. Phase 4 flagged this as "single-tenant-per-key." In a multi-tenant deployment where Alice and Bob both have keys, Alice can list Bob's sessions and read their messages. This is a security issue, not a feature gap.
- No real rate limiting. Phase 4's `max_concurrent_requests` semaphore caps parallelism but doesn't throttle per-subject request rates. A single runaway client can exhaust the whole concurrency budget.
- No metrics for external observability. Phase 5 added in-memory counters, but they're only reachable via the `.info mcp` dot-command or a one-shot JSON endpoint. Production needs Prometheus scraping so alerting and dashboards work.
- Logs aren't structured. The `tracing` spans from Phase 4 middleware emit human-readable text. Aggregators like Loki (the other one), Datadog, or CloudWatch want JSON with correlation IDs.
- No deployment story. There's no Dockerfile, no systemd unit, no documented way to actually run the thing in production. Every deploying team has to reinvent this.
- Security headers missing. Phase 4's CORS handles cross-origin; it doesn't set `X-Content-Type-Options`, `X-Frame-Options`, or similar defaults that a browser-facing endpoint should have.
- No config validation at startup. Mistyped config values produce runtime errors hours after deployment instead of failing fast at startup.
- Operational procedures are undocumented. How do you rotate auth keys? How do you reload MCP credentials? What's the runbook when the MCP hit rate drops? None of this is written down.
Phase 6 delivers answers to all of the above. It's the "you can actually deploy this" phase.
What Phase 6 Delivers
Grouped by theme rather than by dependency order. Each item is independently valuable and can be worked in parallel.
Security and isolation
- Per-subject session ownership — every session records the authenticated subject that created it; reads/writes are authz-checked against the caller's subject.
- Scope-based authorization — `AuthContext.scopes` are enforced per endpoint (e.g., `read:sessions`, `write:sessions`, `admin:mcp`). Phase 4's middleware already populates scopes; Phase 6 adds the enforcement.
- JWT support — extends `AuthConfig` with a `Jwt { issuer, audience, jwks_url }` variant that validates tokens against a JWKS endpoint and extracts subject + scopes from claims.
- Security headers middleware — `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin`, optional HSTS when behind HTTPS.
- Audit logging — structured audit events for every authenticated request (subject, action, target, result), written to a dedicated sink so they survive log rotation.
Throughput and fairness
- Per-subject rate limiting — sliding-window limiter keyed by subject. Enforces `rate_limit_per_minute` and related config. Returns `429 Too Many Requests` with a `Retry-After` header.
- Per-subject concurrency limit — subject-scoped semaphore so one noisy neighbor can't exhaust the global concurrency budget.
- Backpressure signal — expose a `/healthz/ready` endpoint that returns 503 when the server is saturated, so upstream load balancers can drain traffic.
Observability
- Structured JSON logging — every log line is JSON with `timestamp`, `level`, `target`, `request_id`, `subject`, `session_id`, and `fields`. Routes through `tracing_subscriber` with `fmt::layer().json()`.
- Prometheus metrics endpoint — `/metrics` exposing the existing Phase 5 counters plus new HTTP metrics (`http_requests_total`, `http_request_duration_seconds`, `http_requests_in_flight`), MCP metrics (`mcp_pool_size`, `mcp_acquire_latency_seconds` histogram), and session metrics (`sessions_active_total`, `sessions_created_total`).
- Liveness and readiness probes — `/healthz/live` for process liveness (always 200 unless shutting down), `/healthz/ready` for dependency readiness (config loaded, MCP pool initialized, storage writable).
Operability
- Config validation at startup — a dedicated `ApiConfig::validate()` that checks every field against a schema and fails fast with a readable error message listing all problems, not just the first one.
- SIGHUP config reload — reloads auth keys, log level, and rate limit settings without restarting the server. Does NOT reload MCP pool config (requires restart because the pool holds live subprocesses).
- Dockerfile + multi-stage build — minimal runtime image based on `debian:bookworm-slim` with the compiled binary, config directory, and non-root user.
- systemd service unit — with `Type=notify`, sandboxing directives, and resource limits.
- docker-compose example — for local development with nginx-as-TLS-terminator in front.
- Kubernetes manifests — Deployment, Service, ConfigMap, Secret, HorizontalPodAutoscaler.
Documentation
- Operational runbook (`docs/RUNBOOK.md`) — documented procedures for common scenarios.
- Deployment guide (`docs/DEPLOYMENT.md`) — end-to-end instructions for each deployment target.
- Security guide (`docs/SECURITY.md`) — threat model, hardening checklist, key rotation procedures.
Core Type Additions
Most of Phase 6 hangs off existing types. A few new concepts need introducing.
AuthContext enrichment
Phase 4 defined AuthContext { subject: String, scopes: Vec<String> }. Phase 6 extends it:
pub struct AuthContext {
pub subject: String,
pub scopes: Scopes,
pub key_id: Option<String>, // for audit log correlation
pub claims: Option<JwtClaims>, // present when auth mode is Jwt
}
pub struct Scopes(HashSet<String>);
impl Scopes {
pub fn has(&self, scope: &str) -> bool;
pub fn has_any(&self, required: &[&str]) -> bool;
pub fn has_all(&self, required: &[&str]) -> bool;
}
pub enum Scope {
ReadSessions, // "read:sessions"
WriteSessions, // "write:sessions"
ReadAgents, // "read:agents"
RunAgents, // "run:agents"
ReadModels, // "read:models"
AdminMcp, // "admin:mcp"
AdminSessions, // "admin:sessions" — can see all users' sessions
}
The Scope enum provides typed constants for the well-known scope strings used in the handlers. Custom scopes (for callers to define their own access tiers) continue to work as raw strings.
SessionOwnership in the session store
The session metadata needs to record who owns each session so reads/writes can be authorized:
pub struct SessionMeta {
pub id: SessionId,
pub alias: Option<SessionAlias>,
pub owner: Option<String>, // subject that created it; None = legacy
pub last_modified: SystemTime,
pub is_autoname: bool,
}
On disk, the ownership field goes into the session's YAML file under a reserved _meta block:
_meta:
owner: "alice"
created_at: "2026-04-10T15:32:11Z"
created_by_key_id: "key_3f2a..."
# ... rest of session fields unchanged
The SessionStore trait gets two new methods and an enriched open signature:
#[async_trait]
pub trait SessionStore: Send + Sync {
// existing methods unchanged except:
async fn open(
&self,
agent: Option<&str>,
id: SessionId,
caller: Option<&AuthContext>, // NEW: for authz check
) -> Result<SessionHandle, StoreError>;
async fn list(
&self,
agent: Option<&str>,
caller: Option<&AuthContext>, // NEW: for filtering
) -> Result<Vec<SessionMeta>, StoreError>;
// NEW: transfer ownership (e.g., admin reassignment)
async fn set_owner(
&self,
id: SessionId,
new_owner: Option<String>,
) -> Result<(), StoreError>;
}
caller: None means internal or legacy access (CLI/REPL) — skip authz entirely. caller: Some(...) means an API call — enforce ownership.
Authz rules:
- Own session: full access.
- Other subject's session: denied unless the caller has the `admin:sessions` scope.
- Legacy sessions with `owner: None`: accessible to anyone (grandfathered); every mutation attempts to set the owner to the current caller so they get claimed forward.
- `list`: returns only sessions owned by the caller (or all if they have `admin:sessions`).
RateLimiter and ConcurrencyLimiter
pub struct RateLimiter {
windows: DashMap<String, SlidingWindow>,
config: RateLimitConfig,
}
struct SlidingWindow {
bucket_a: AtomicU64,
bucket_b: AtomicU64,
last_reset: AtomicU64,
}
pub struct RateLimitConfig {
pub per_minute: u32,
pub burst: u32,
}
impl RateLimiter {
pub fn check(&self, subject: &str) -> Result<(), RateLimitError>;
}
pub struct RateLimitError {
pub retry_after: Duration,
pub limit: u32,
pub remaining: u32,
}
pub struct SubjectConcurrencyLimiter {
semaphores: DashMap<String, Arc<Semaphore>>,
per_subject: usize,
}
impl SubjectConcurrencyLimiter {
pub async fn acquire(&self, subject: &str) -> OwnedSemaphorePermit;
}
Both live in ApiState and are applied via middleware. Rate limiting runs first (cheap atomic operations), then concurrency acquisition (may block briefly).
MetricsRegistry
pub struct MetricsRegistry {
pub http_requests_total: IntCounterVec,
pub http_request_duration: HistogramVec,
pub http_requests_in_flight: IntGaugeVec,
pub sessions_active: IntGauge,
pub sessions_created_total: IntCounter,
pub mcp_pool_size: IntGaugeVec,
pub mcp_acquire_latency: HistogramVec,
pub mcp_spawns_total: IntCounter,
pub mcp_idle_evictions_total: IntCounter,
pub auth_failures_total: IntCounterVec,
pub rate_limit_rejections_total: IntCounterVec,
}
Built on top of the prometheus crate. Exposed via GET /metrics with the Prometheus text exposition format. The registry bridges Phase 5's atomic counters into the Prometheus types without requiring Phase 5's code to change — Phase 5 keeps its simple counters, and Phase 6 reads them on each scrape to populate the Prometheus gauges.
AuditLogger
pub struct AuditLogger {
sink: AuditSink,
}
pub enum AuditSink {
Stderr, // default
File { path: PathBuf, rotation: Rotation },
Syslog { facility: String },
}
pub struct AuditEvent<'a> {
pub timestamp: OffsetDateTime,
pub request_id: Uuid,
pub subject: Option<&'a str>,
pub action: AuditAction,
pub target: Option<&'a str>,
pub result: AuditResult,
pub details: Option<serde_json::Value>,
}
pub enum AuditAction {
SessionCreate,
SessionRead,
SessionUpdate,
SessionDelete,
AgentActivate,
ToolExecute,
McpReload,
ConfigReload,
AuthFailure,
RateLimitRejection,
}
pub enum AuditResult {
Success,
Denied { reason: String },
Error { message: String },
}
impl AuditLogger {
pub fn log(&self, event: AuditEvent<'_>);
}
Audit events are emitted from handler middleware after request completion. The audit stream is deliberately separate from the regular tracing logs because audit logs have stricter retention/integrity requirements in regulated environments — you want to be able to pipe them to a WORM storage or SIEM without mixing in debug logs.
Migration Strategy
Step 1: Per-subject session ownership
The highest-impact security fix. No new deps, no new config — just enriching existing types.
- Add `owner: Option<String>` and `created_by_key_id: Option<String>` to the session YAML `_meta` block. Serde skip if absent (backward compat for legacy files).
- Update `SessionStore::create` to record the caller's subject.
- Update `SessionStore::open` to take `caller: Option<&AuthContext>` and enforce ownership.
- Update `SessionStore::list` to filter by caller subject (unless the caller has the `admin:sessions` scope).
- Add `SessionStore::set_owner` for admin reassignment.
- Implement the "claim on first mutation" behavior for legacy sessions.
- Update all API handlers to pass the `AuthContext` through to store calls.
- Add integration tests: Alice creates a session, Bob tries to read it (403), admin Claire can read it (200), Alice's `list` returns only her own, Claire's `list` with the `admin:sessions` scope returns everything.
Verification: all new authz tests pass. CLI/REPL tests still pass because they pass `caller: None`.
Step 2: Scope-based authorization for endpoints
Phase 4's middleware attaches `AuthContext` with a `scopes: Vec<String>` field, but handlers don't check it. Phase 6 adds the enforcement.
- Change `AuthContext.scopes` from `Vec<String>` to a `Scopes(HashSet<String>)` newtype with `has`/`has_any`/`has_all` methods.
- Define the `Scope` enum with well-known constants.
- Add a `require_scope` helper and a `#[require_scope("read:sessions")]` proc macro (or a handler-side check if proc macros add too much complexity).
- Annotate every handler with the required scope(s):
  - `GET /v1/sessions` → `read:sessions`
  - `POST /v1/sessions` → `write:sessions`
  - `GET /v1/sessions/:id` → `read:sessions`
  - `DELETE /v1/sessions/:id` → `write:sessions`
  - `POST /v1/sessions/:id/completions` → `write:sessions` + `run:agents` (if the session has an agent)
  - `POST /v1/rags/:name/rebuild` → `admin:mcp`
  - `GET /v1/agents`, `/v1/roles`, `/v1/rags`, `/v1/models` → `read:agents`, `read:roles`, etc.
  - `/metrics` → `admin:metrics` (or unauthenticated if the endpoint is bound to a private network)
- Document the scope model in `docs/SECURITY.md`.
Verification: per-endpoint authz tests. A key with only `read:sessions` can list and read but not write.
Step 3: JWT support in AuthConfig
Extend the auth mode enum:
pub enum AuthConfig {
Disabled,
StaticKeys { keys: Vec<AuthKeyEntry> },
Jwt(JwtConfig),
}
pub struct JwtConfig {
pub issuer: String,
pub audience: String,
pub jwks_url: String,
pub jwks_refresh_interval: Duration,
pub subject_claim: String, // e.g., "sub"
pub scopes_claim: String, // e.g., "scope" or "permissions"
pub leeway_seconds: u64,
}
- Add `jsonwebtoken` and `reqwest` (already present) to dependencies.
- Implement a `JwksCache` that fetches `jwks_url` on startup and refreshes every `jwks_refresh_interval`. Uses `reqwest` with a short timeout. Refreshes in the background via `tokio::spawn`.
- The auth middleware branches on `AuthConfig`: `StaticKeys` continues to work, `Jwt` calls `jsonwebtoken::decode` with the cached JWKS.
- Extract the subject from the configured claim name. Extract scopes from either a space-separated string (`scope` claim) or an array claim (`permissions`).
- Handle key rotation gracefully: if decoding fails with "unknown key ID," trigger an immediate JWKS refresh (debounced to once per minute) and retry once.
- Integration tests with a fake JWKS endpoint (use `mockito` or `wiremock`).
Verification: valid JWT authenticates; expired JWT rejected; invalid signature rejected; JWKS refresh handles key rotation.
Step 4: Real rate limiting
Replace the Phase 4 stub with a working sliding-window implementation.
- Add the `dashmap` dependency for the per-subject map (lock-free reads/writes).
- Implement `SlidingWindow` with two adjacent one-minute buckets; the effective rate is the weighted sum of the current bucket plus the tail of the previous bucket, based on how far into the current window we are.
- Add `RateLimiter::check(subject) -> Result<(), RateLimitError>`.
- Write middleware that calls `check` before dispatching to handlers. On `Err`, return 429 with a `Retry-After` header.
- Add `rate_limit_per_minute` and `rate_limit_burst` config fields. Reasonable defaults: 60/min, burst 10.
- Expose the per-subject current rate as a gauge in the Prometheus registry.
- Integration test: fire N+1 requests as the same subject within a minute, assert the N+1th gets 429.
Verification: rate limiting works correctly across subjects; non-limited subjects aren't affected; burst allowance works.
Step 5: Per-subject concurrency limiter
Complements rate limiting — rate limits the count of requests over time, concurrency limits the simultaneous count.
- Implement `SubjectConcurrencyLimiter` with a `DashMap<String, Arc<Semaphore>>`.
- Lazy-init semaphores per subject with `per_subject_concurrency` slots (default 8).
- Middleware acquires a permit per request. If the subject's semaphore is full, queue briefly (`acquire_owned` under a short `tokio::time::timeout`), then 503 if still full.
- Garbage-collect unused semaphores periodically (entries with no waiters and a full permit count haven't been used recently).
- Integration test: fire 10 concurrent requests as one subject with `per_subject_concurrency: 5`, assert that no more than 5 run simultaneously.
Verification: no subject can exceed its concurrency budget; other subjects unaffected.
Step 6: Prometheus metrics endpoint
- Add the `prometheus` crate.
- Implement `MetricsRegistry` with the metrics listed in the types section.
- Wire metric updates into existing code:
  - HTTP middleware: `http_requests_total.inc()` on response, `http_request_duration.observe(elapsed)`, `http_requests_in_flight.inc()`/`dec()`
  - Session creation: `sessions_created_total.inc()`, `sessions_active.set(store.count())`
  - MCP factory: read the Phase 5 atomic counters on scrape and populate the Prometheus types
- Add a `GET /metrics` handler that writes the Prometheus text exposition format.
- Auth policy for `/metrics`: configurable — either requires the `admin:metrics` scope, or is opened to a private network via `metrics_listen_addr: "127.0.0.1:9090"` on a separate port (recommended).
- Integration test: scrape `/metrics`, parse the response, assert expected metrics are present with sensible values.
Verification: Prometheus scraping works; metrics increment correctly.
Step 7: Structured JSON logging
Replace the default tracing_subscriber format with JSON output.
- Add a `log_format: Text | Json` config field, default `Text` for CLI/REPL, `Json` for `--serve` mode.
- Configure `tracing_subscriber::fmt::layer().json()` conditionally.
- Ensure every span has a `request_id` field (already present from Phase 4 middleware).
- Add `subject` and `session_id` as span fields when present, so they get included in every child log line automatically.
- Add a `log_level` config field that SIGHUP reloads at runtime (see Step 12).
- Integration test: capture stdout during a request, parse it as JSON, assert the fields are present and correctly scoped.
Verification: `loki --serve` produces one-line-per-event JSON output suitable for log aggregators.
Step 8: Audit logging
Dedicated sink for security-relevant events.
- Implement `AuditLogger` with `Stderr`, `File`, and `Syslog` sinks. Start with just `Stderr` and `File` — `Syslog` via the `syslog` crate can follow.
- Emit audit events from:
  - Auth middleware: `AuditAction::AuthFailure` on any auth rejection
  - Rate limiter: `AuditAction::RateLimitRejection` on 429
  - Session handlers: `AuditAction::SessionCreate`/`Read`/`Update`/`Delete`
  - Agent handlers: `AuditAction::AgentActivate`
  - MCP reload endpoint: `AuditAction::McpReload`
- Audit events are JSON lines with a schema documented in `docs/SECURITY.md`.
- Audit events don't interfere with the main tracing stream — they go to the configured audit sink independently.
- File rotation via `tracing-appender` or manual rotation with size + date caps.
Verification: every security-relevant action produces an audit event; failures include a reason.
Step 9: Security headers and misc middleware
- Add a `security_headers` middleware layer that attaches:
  - `X-Content-Type-Options: nosniff`
  - `X-Frame-Options: DENY`
  - `Referrer-Policy: strict-origin-when-cross-origin`
  - `Strict-Transport-Security: max-age=31536000; includeSubDomains` (only when `api.force_https: true`)
  - Do NOT set CSP — this is an API, not a browser app; CSP would confuse clients.
- Remove `Server: ...` and other fingerprinting headers.
- Handle `OPTIONS` preflight correctly (Phase 4's CORS layer does this; verify).
Verification: `curl -I` inspects headers; an automated test asserts each required header is present.
Step 10: Config validation at startup
A single `ApiConfig::validate()` method that checks every field and aggregates ALL errors before failing.
- Implement validation for:
  - `listen_addr` is parseable and bindable
  - `auth.mode` has a valid configuration (e.g., `StaticKeys` with a non-empty key list, `Jwt` with a reachable JWKS URL)
  - `auth.keys[].key_hash` starts with `$argon2id$` (catches plaintext keys)
  - `rate_limit_per_minute > 0` and `burst > 0`
  - `max_body_bytes > 0` and `< 100 MiB` (sanity)
  - `request_timeout_seconds > 0` and `< 3600`
  - `shutdown_grace_seconds >= 0`
  - `cors.allowed_origins` entries are valid URLs or `"*"`
- Return a `ConfigValidationError` that lists every problem, not just the first.
- Call `validate()` in `serve()` before binding the listener.
- Test: a deliberately broken config produces an error listing all problems.
Verification: startup validation catches common mistakes; error message is actionable.
Step 11: Health check endpoints
- `GET /healthz/live` — always returns 200 OK unless the process is in graceful shutdown. Body: `{"status":"ok"}`. No auth required.
- `GET /healthz/ready` — returns 200 OK when fully initialized and not saturated, otherwise 503 Service Unavailable. Readiness criteria:
  - `AppState` fully initialized
  - Session store writable (attempt a probe write to a reserved path)
  - MCP pool initialized (at least the factory is alive)
  - Concurrency semaphore has at least 10% available (not saturated)
- Both endpoints are unauthenticated and unmetered — load balancers hit them constantly.
- Document in `docs/DEPLOYMENT.md` how Kubernetes, systemd, and other supervisors should use these.
Verification: endpoints return correct status under various load conditions.
Step 12: SIGHUP config reload
Reload a subset of config without restarting.
- Reloadable fields:
- Auth keys (StaticKeys mode)
- JWT config (including JWKS URL)
- Log level
- Rate limit config
- Per-subject concurrency limits
- Audit logger sink
- NOT reloadable (requires full restart):
- Listen address
- MCP pool config (pool holds live subprocesses)
- Session storage paths
- TLS certs (use a reverse proxy)
- Implementation: a SIGHUP handler that re-reads `config.yaml`, validates it, and atomically swaps the affected fields in `ApiState`. Uses the `arc-swap` crate for lock-free swaps.
- Audit every reload: `AuditAction::ConfigReload` with a before/after diff summary.
- Document: rotation procedures for auth keys, logging level adjustments, etc.
Verification: start the server, modify `config.yaml`, send SIGHUP, assert the new config is in effect without dropped requests.
Step 13: Deployment manifests
13a. Dockerfile
Multi-stage build for a minimal runtime image:
# Build stage
FROM rust:1.82-slim AS builder
WORKDIR /build
COPY Cargo.toml Cargo.lock ./
COPY src ./src
COPY assets ./assets
RUN cargo build --release --bin loki
# Runtime stage
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
tini \
&& rm -rf /var/lib/apt/lists/*
RUN useradd --system --home /loki --shell /bin/false loki
COPY --from=builder /build/target/release/loki /usr/local/bin/loki
COPY --from=builder /build/assets /opt/loki/assets
USER loki
WORKDIR /loki
ENV LOKI_CONFIG_DIR=/loki/config
EXPOSE 3400
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/usr/local/bin/loki", "--serve"]
Build args for targeting specific architectures. Result is a ~100 MB image.
13b. systemd unit
[Unit]
Description=Loki AI Server
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/local/bin/loki --serve
Restart=on-failure
RestartSec=5
User=loki
Group=loki
# Sandboxing
NoNewPrivileges=true
PrivateTmp=true
PrivateDevices=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/loki
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictRealtime=true
LockPersonality=true
# Resource limits
LimitNOFILE=65536
LimitNPROC=512
MemoryMax=4G
# Reload
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
`Type=notify` requires Loki to call `sd_notify(READY=1)` after successful startup — add this with the `sd-notify` crate.
13c. docker-compose example
For local development with TLS via nginx:
version: "3.9"
services:
loki:
build: .
environment:
LOKI_CONFIG_DIR: /loki/config
volumes:
- ./config:/loki/config:ro
- loki_data:/loki/data
ports:
- "127.0.0.1:3400:3400"
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3400/healthz/live"]
interval: 30s
timeout: 5s
retries: 3
nginx:
image: nginx:alpine
volumes:
- ./deploy/nginx.conf:/etc/nginx/nginx.conf:ro
- ./deploy/certs:/etc/nginx/certs:ro
ports:
- "443:443"
depends_on:
- loki
volumes:
loki_data:
Include a sample nginx.conf that terminates TLS and forwards to loki:3400.
13d. Kubernetes manifests
Provide deploy/k8s/ with:
- `namespace.yaml`
- `deployment.yaml` (3 replicas, resource requests/limits, liveness/readiness probes)
- `service.yaml` (ClusterIP)
- `configmap.yaml` (non-secret config)
- `secret.yaml` (API keys, JWT config)
- `hpa.yaml` (HorizontalPodAutoscaler based on CPU + a custom metric for requests/sec)
- `ingress.yaml` (optional example using nginx-ingress)
Document storage strategy: sessions use a PVC mounted at /loki/data; RAG embeddings use a read-only ConfigMap or a separate PVC.
Verification: each deployment target produces a running Loki that passes health checks.
Step 14: Operational runbook
Write docs/RUNBOOK.md with sections for:
- Starting and stopping the server
- Rotating auth keys (StaticKeys mode) — edit config, SIGHUP, verify in audit log
- Rotating auth keys (Jwt mode) — update JWKS at issuer, Loki auto-refreshes
- Rotating MCP credentials — update env vars, then `POST /v1/mcp/reload` (new endpoint in this phase) or restart
- Diagnosing high latency — check MCP hit rate, check LLM provider latency, check concurrency saturation
- Diagnosing auth failures — audit log `AuthFailure` events, check key hash, check JWKS reachability
- Diagnosing rate limit rejections — check per-subject counter, adjust limit or identify the runaway client
- Diagnosing orphaned MCP subprocesses — `ps aux | grep loki`, check logs for `McpFactory shutdown complete`
- Diagnosing session corruption — check for `.yaml.tmp` files (should not exist when the server is idle), inspect session YAML for validity
- Backup and restore — tar the `sessions/` and `agents/` directories
- Scaling horizontally — each replica has its own MCP pool and session store; share sessions via a shared filesystem (NFS/EFS), or wait for a database-backed `SessionStore` (not in this phase)
- Incident response — what logs to collect, what metrics to snapshot, how to reach a minimal reproducing state
Verification: walk through each procedure on a test deployment; fix any unclear steps.
Step 15: Deployment and security guides
docs/DEPLOYMENT.md — step-by-step for Docker, systemd, docker-compose, Kubernetes. Pre-flight checklist, first-time setup, upgrade procedure.
docs/SECURITY.md — threat model, hardening checklist, scope model, audit event schema, key rotation, reverse proxy configuration, network security recommendations, CVE reporting contact.
Cross-reference from README.md and add a "Production Deployment" section to the README that points to both docs.
Verification: a developer unfamiliar with Loki can deploy it successfully using only the docs.
Risks and Watch Items
| Risk | Severity | Mitigation |
|---|---|---|
| Session ownership migration breaks legacy users | Medium | Legacy sessions with owner: None stay readable by anyone; they get claimed forward on first mutation. Document this in RUNBOOK.md. Add a one-shot migration CLI command (loki migrate sessions --claim-to <subject>) that assigns ownership of all unowned sessions to a specific subject. |
| JWT JWKS fetch failures block startup | Medium | JWKS URL must be reachable at startup; if it's not, log an error and fall back to "reject all" mode until the fetch succeeds. A retry loop with exponential backoff runs in the background. Do NOT crash on JWKS failure. |
| Rate limiter DashMap growth | Low | Per-subject windows accumulate forever without cleanup. Add a background reaper that removes entries with zero recent activity every few minutes. Cap total entries at 100k as a safety valve. |
| Prometheus metric cardinality explosion | Low | http_requests_total with per-path labels could explode if routes have dynamic segments (/v1/sessions/:id). Use route templates as labels, not concrete paths. Validate label sets at registration. |
| Audit log retention compliance | Low | Audit logs might need to be retained for regulatory reasons. Phase 6 provides the emission; retention is the operator's responsibility. Document this in SECURITY.md. |
| SIGHUP reload partial failure | Medium | If the new config is invalid, don't swap it in — keep the old config running. Log the validation error. The operator can fix the file and SIGHUP again. Never leave the server in an inconsistent state. |
| Docker image size | Low | debian:bookworm-slim is ~80 MB; final image ~100 MB. If smaller is needed, use distroless/cc-debian12 for a ~35 MB image at the cost of not having tini or debugging tools. Document both options. |
| systemd Type=notify missing implementation | Medium | Adding sd_notify requires the sd-notify crate AND calling it after listener bind. Missing this call makes systemd think the service failed. Add an integration test that fakes systemd and asserts the notification is sent. |
| Kubernetes pod disruption | Low | HPA scales down during low traffic, but in-flight requests on the terminating pod must complete gracefully. Set terminationGracePeriodSeconds to at least shutdown_grace_seconds + 10. Document in DEPLOYMENT.md. |
| Running under a reverse proxy | Low | CORS, Host header handling, X-Forwarded-For for rate limiter subject identification. Document the expected proxy config (trust X-Forwarded-* headers only from trusted proxies). |
What Phase 6 Does NOT Do
- No multi-region replication. Loki is a single-instance service; scale out by running multiple instances behind a load balancer, each with its own pool. Cross-instance state sharing is not in scope.
- No database-backed session store. `FileSessionStore` is still the only implementation. A `PostgresSessionStore` is a clean extension point (the `SessionStore` trait is already there) but belongs to a follow-up.
- No cluster coordination. Each Loki instance is independent. Running Loki in a "cluster" mode where instances share work is a separate project.
- No advanced ML observability. LLM call costs, token usage trends, provider error rates — these are tracked as counters but not aggregated into dashboards. Follow-up work.
- No built-in TLS termination. Use a reverse proxy (nginx, Caddy, Traefik, a cloud load balancer). Supporting TLS in-process adds complexity and key management concerns that reverse proxies solve better.
- No SAML or LDAP. Only StaticKeys and JWT. SAML/LDAP integration can extend `AuthConfig` later.
- No plugin system. Extensions to auth, storage, or middleware require forking and rebuilding. A dynamic plugin loader is explicitly out of scope.
- No multi-tenancy beyond session ownership. Tenants share the same process, same MCP pool, same RAG cache, same resources. Strict tenant isolation (separate processes per tenant) requires orchestration outside Loki.
- No cost accounting per tenant. LLM API calls are tracked per-subject in audit logs but not aggregated into billing-grade cost reports.
Entry Criteria (from Phase 5)
- `McpFactory` pooling works and has metrics
- Graceful shutdown drains the MCP pool
- Phase 5 load test passes (hit rate >0.8, no orphaned subprocesses)
- Phase 4 API integration test suite passes
- `cargo check`, `cargo test`, `cargo clippy` all clean
Exit Criteria (Phase 6 complete — v1 ready)
- Per-subject session ownership enforced; integration tests prove Alice can't read Bob's sessions
- Scope-based authorization enforced on every endpoint
- JWT authentication works with a real JWKS endpoint
- Real rate limiting replaces the Phase 4 stub; 429 responses include `Retry-After`
- Per-subject concurrency limiter prevents noisy-neighbor saturation
- Prometheus `/metrics` endpoint scrapes cleanly
- Structured JSON logs emitted in `--serve` mode
- Audit events written for all security-relevant actions
- Security headers set on all responses
- Config validation fails fast at startup with readable errors
- `/healthz/live` and `/healthz/ready` endpoints work
- SIGHUP reloads auth keys, log level, and rate limits without restart
- Dockerfile produces a minimal runtime image
- systemd unit with `Type=notify` works correctly
- docker-compose example runs end-to-end with TLS via nginx
- Kubernetes manifests deploy successfully
- `docs/RUNBOOK.md` covers all common operational scenarios
- `docs/DEPLOYMENT.md` guides a first-time deployer to success
- `docs/SECURITY.md` documents threat model, scopes, and hardening
- `cargo check`, `cargo test`, `cargo clippy` all clean
- End-to-end production smoke test: deploy to Kubernetes, send real traffic, scrape metrics, rotate a key, induce a failure, observe recovery
v1 Release Summary
After Phase 6 lands, Loki v1 has transformed from a single-user CLI tool into a production-ready multi-tenant AI service. Here's what the v1 release notes should say:
New in Loki v1:
- REST API — full HTTP surface for completions, sessions, agents, roles, RAGs, and metadata. Streaming via Server-Sent Events, synchronous via JSON.
- Multi-tenant sessions — UUID-primary identity with optional human-readable aliases. Per-subject ownership with scope-based access control.
- Concurrent safety — per-session mutex serialization, per-MCP-server Arc sharing, per-agent runtime isolation. Run dozens of concurrent requests without corruption.
- MCP pooling — recently-used MCP subprocesses stay warm across requests. Near-zero warm-path latency. Configurable idle timeout and LRU cap.
- Authentication — static API keys or JWT with JWKS. Argon2-hashed credentials. Scope-based authorization per endpoint.
- Observability — Prometheus metrics, structured JSON logging with correlation IDs, dedicated audit log stream.
- Rate limiting — sliding-window per subject with configurable limits and burst allowance.
- Graceful shutdown — in-flight requests complete within a grace period; MCP subprocesses terminate cleanly; session state is persisted.
- Deployment manifests — Dockerfile, systemd unit, docker-compose example, Kubernetes manifests.
- Full documentation — runbook, deployment guide, security guide, API reference.
Backward compatibility:
CLI and REPL continue to work identically to pre-v1 builds. Existing config.yaml, roles/, sessions/, agents/, rags/, and functions/ directories are read-compatible. The legacy session layout is migrated lazily on first access without destroying the old files.
What's next (v2+):
- Database-backed session store for cross-instance sharing
- Native TLS termination option
- SAML / LDAP authentication extensions
- Per-tenant cost accounting and quotas
- Dynamic plugin system for custom auth, storage, and middleware
- Multi-region replication
- WebSocket transport alongside SSE