# Phase 6 Implementation Plan: Production Hardening

## Overview

Phase 6 closes out the refactor by picking up every "deferred to production hardening" item from Phases 1–5 and delivering a Loki build that's safe to run as a multi-tenant service. The preceding phases made Loki *functionally* a server — Phase 6 makes it *operationally* a server. That means real rate limiting instead of a stub, per-subject session ownership instead of flat visibility, Prometheus metrics instead of in-memory counters, structured JSON logging, deployment manifests, security headers, config validation, and operational runbooks.

This is the final phase. After it lands, Loki v1 is production-ready: you can run `loki --serve` in a container behind a reverse proxy, scrape its metrics from Prometheus, route requests through a rate limiter, and have multiple tenants share the same instance without seeing each other's data.

**Estimated effort:** ~1 week

**Risk:** Low. Most of the work is applying well-known patterns (sliding-window rate limiting, row-level authz, Prometheus, structured logging) on top of the architecture the previous phases already built. No new core types, no new pipelines.

**Depends on:** Phases 1–5 complete. The API server runs, the MCP pool works, and sessions are UUID-keyed.

---

## Why Phase 6 Exists

Phases 4 and 5 got the API server running with correct semantics, but several explicit gaps were called out as "stubs" or "follow-ups." A Phase 4 deployment is usable for a trusted single-tenant context (an internal tool, a personal server) but unsafe for anything else:

- **Anyone with a valid API key can see every session.** Phase 4 flagged this as "single-tenant-per-key." In a multi-tenant deployment where Alice and Bob both have keys, Alice can list Bob's sessions and read their messages. This is a security issue, not a feature gap.
- **No real rate limiting.** Phase 4's `max_concurrent_requests` semaphore caps parallelism but doesn't throttle per-subject request rates. A single runaway client can exhaust the whole concurrency budget.
- **No metrics for external observability.** Phase 5 added in-memory counters, but they're only reachable via the `.info mcp` dot-command or a one-shot JSON endpoint. Production needs Prometheus scraping so alerting and dashboards work.
- **Logs aren't structured.** The `tracing` spans from Phase 4 middleware emit human-readable text. Aggregators like Loki (the other one), Datadog, or CloudWatch want JSON with correlation IDs.
- **No deployment story.** There's no Dockerfile, no systemd unit, no documented way to actually run the thing in production. Every deploying team has to reinvent this.
- **Security headers missing.** Phase 4's CORS handles cross-origin; it doesn't set `X-Content-Type-Options`, `X-Frame-Options`, or similar defaults that a browser-facing endpoint should have.
- **No config validation at startup.** Mistyped config values produce runtime errors hours after deployment instead of failing fast at startup.
- **Operational procedures are undocumented.** How do you rotate auth keys? How do you reload MCP credentials? What's the runbook when the MCP hit rate drops? None of this is written down.

Phase 6 delivers answers to all of the above. It's the "you can actually deploy this" phase.

---

## What Phase 6 Delivers

Grouped by theme rather than by dependency order. Each item is independently valuable and can be worked in parallel.

### Security and isolation

1. **Per-subject session ownership** — every session records the authenticated subject that created it; reads/writes are authz-checked against the caller's subject.
2. **Scope-based authorization** — `AuthContext.scopes` are enforced per endpoint (e.g., `read:sessions`, `write:sessions`, `admin:mcp`). Phase 4's middleware already populates scopes; Phase 6 adds the enforcement.
3. **JWT support** — extends `AuthConfig` with a `Jwt { issuer, audience, jwks_url }` variant that validates tokens against a JWKS endpoint and extracts subject + scopes from claims.
4. **Security headers middleware** — `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`, optional HSTS when behind HTTPS.
5. **Audit logging** — structured audit events for every authenticated request (subject, action, target, result), written to a dedicated sink so they survive log rotation.

### Throughput and fairness

6. **Per-subject rate limiting** — sliding-window limiter keyed by subject. Enforces `rate_limit_per_minute` and related config. Returns `429 Too Many Requests` with a `Retry-After` header.
7. **Per-subject concurrency limit** — subject-scoped semaphore so one noisy neighbor can't exhaust the global concurrency budget.
8. **Backpressure signal** — expose a `/healthz/ready` endpoint that returns 503 when the server is saturated, so upstream load balancers can drain traffic.

### Observability

9. **Structured JSON logging** — every log line is JSON with `timestamp`, `level`, `target`, `request_id`, `subject`, `session_id`, and `fields`. Routes through `tracing_subscriber` with `fmt::layer().json()`.
10. **Prometheus metrics endpoint** — `/metrics` exposing the existing Phase 5 counters plus new HTTP metrics (`http_requests_total`, `http_request_duration_seconds`, `http_requests_in_flight`), MCP metrics (`mcp_pool_size`, `mcp_acquire_latency_seconds` histogram), and session metrics (`sessions_active_total`, `sessions_created_total`).
11. **Liveness and readiness probes** — `/healthz/live` for process liveness (always 200 unless shutting down), `/healthz/ready` for dependency readiness (config loaded, MCP pool initialized, storage writable).

### Operability

12. **Config validation at startup** — a dedicated `ApiConfig::validate()` that checks every field against a schema and fails fast with a readable error message listing *all* problems, not just the first one.
13. **SIGHUP config reload** — reloads auth keys, log level, and rate limit settings without restarting the server. Does NOT reload MCP pool config (requires restart because the pool holds live subprocesses).
14. **Dockerfile + multi-stage build** — minimal runtime image based on `debian:bookworm-slim` with the compiled binary, config directory, and non-root user.
15. **systemd service unit** — with `Type=notify`, sandboxing directives, and resource limits.
16. **docker-compose example** — for local development with nginx-as-TLS-terminator in front.
17. **Kubernetes manifests** — Deployment, Service, ConfigMap, Secret, HorizontalPodAutoscaler.

### Documentation

18. **Operational runbook** (`docs/RUNBOOK.md`) — documented procedures for common scenarios.
19. **Deployment guide** (`docs/DEPLOYMENT.md`) — end-to-end instructions for each deployment target.
20. **Security guide** (`docs/SECURITY.md`) — threat model, hardening checklist, key rotation procedures.

---

## Core Type Additions

Most of Phase 6 hangs off existing types. A few new concepts need introducing.

### `AuthContext` enrichment

Phase 4 defined `AuthContext { subject: String, scopes: Vec<String> }`. Phase 6 extends it:

```rust
pub struct AuthContext {
    pub subject: String,
    pub scopes: Scopes,
    pub key_id: Option<String>,    // for audit log correlation
    pub claims: Option<JwtClaims>, // present when auth mode is Jwt
}

pub struct Scopes(HashSet<String>);

impl Scopes {
    pub fn has(&self, scope: &str) -> bool;
    pub fn has_any(&self, required: &[&str]) -> bool;
    pub fn has_all(&self, required: &[&str]) -> bool;
}

pub enum Scope {
    ReadSessions,  // "read:sessions"
    WriteSessions, // "write:sessions"
    ReadAgents,    // "read:agents"
    RunAgents,     // "run:agents"
    ReadModels,    // "read:models"
    AdminMcp,      // "admin:mcp"
    AdminSessions, // "admin:sessions" — can see all users' sessions
}
```

The `Scope` enum provides typed constants for the well-known scope strings used in the handlers. Custom scopes (for callers to define their own access tiers) continue to work as raw strings.

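As a sketch of how the `Scopes` newtype could behave (the method names come from the definition above; the `new` constructor and implementation details are illustrative):

```rust
use std::collections::HashSet;

pub struct Scopes(HashSet<String>);

impl Scopes {
    // Illustrative constructor, not part of the planned API surface.
    pub fn new<I: IntoIterator<Item = S>, S: Into<String>>(iter: I) -> Self {
        Scopes(iter.into_iter().map(Into::into).collect())
    }

    // True if the caller holds this exact scope string.
    pub fn has(&self, scope: &str) -> bool {
        self.0.contains(scope)
    }

    // True if the caller holds at least one of the required scopes.
    pub fn has_any(&self, required: &[&str]) -> bool {
        required.iter().any(|s| self.has(s))
    }

    // True if the caller holds every required scope.
    pub fn has_all(&self, required: &[&str]) -> bool {
        required.iter().all(|s| self.has(s))
    }
}
```

A handler would then guard itself with something like `ctx.scopes.has("read:sessions")` before touching the store.
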
### `SessionOwnership` in the session store

The session metadata needs to record who owns each session so reads/writes can be authorized:

```rust
pub struct SessionMeta {
    pub id: SessionId,
    pub alias: Option<SessionAlias>,
    pub owner: Option<String>, // subject that created it; None = legacy
    pub last_modified: SystemTime,
    pub is_autoname: bool,
}
```

On disk, the ownership field goes into the session's YAML file under a reserved `_meta` block:

```yaml
_meta:
  owner: "alice"
  created_at: "2026-04-10T15:32:11Z"
  created_by_key_id: "key_3f2a..."
# ... rest of session fields unchanged
```

The `SessionStore` trait gains one new method (`set_owner`), and `open` and `list` gain a caller parameter:

```rust
#[async_trait]
pub trait SessionStore: Send + Sync {
    // existing methods unchanged except:
    async fn open(
        &self,
        agent: Option<&str>,
        id: SessionId,
        caller: Option<&AuthContext>, // NEW: for authz check
    ) -> Result<SessionHandle, StoreError>;

    async fn list(
        &self,
        agent: Option<&str>,
        caller: Option<&AuthContext>, // NEW: for filtering
    ) -> Result<Vec<SessionMeta>, StoreError>;

    // NEW: transfer ownership (e.g., admin reassignment)
    async fn set_owner(
        &self,
        id: SessionId,
        new_owner: Option<String>,
    ) -> Result<(), StoreError>;
}
```

`caller: None` means internal or legacy access (CLI/REPL) — skip authz entirely. `caller: Some(...)` means an API call — enforce ownership.

**Authz rules:**

- Own session: full access.
- Other subject's session: denied unless caller has `admin:sessions` scope.
- Legacy sessions with `owner: None`: accessible to anyone (grandfathered); every mutation attempts to set the owner to the current caller so they get claimed forward.
- `list`: returns only sessions owned by the caller (or all if they have `admin:sessions`).

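The rules above boil down to a small decision function. A sketch (the types and names here are illustrative, not the actual store code):

```rust
#[derive(Debug, PartialEq)]
enum Access {
    Allowed,
    Denied,
}

// caller: None = internal CLI/REPL access; Some((subject, has admin:sessions)).
// owner: the session's recorded owner; None = legacy, unowned session.
fn check_session_access(caller: Option<(&str, bool)>, owner: Option<&str>) -> Access {
    match (caller, owner) {
        // Internal access skips authz entirely.
        (None, _) => Access::Allowed,
        // Legacy sessions are grandfathered: anyone may access them.
        (Some(_), None) => Access::Allowed,
        // Owned sessions: the owner or an admin:sessions holder only.
        (Some((subject, is_admin)), Some(owner)) => {
            if subject == owner || is_admin {
                Access::Allowed
            } else {
                Access::Denied
            }
        }
    }
}
```

The "claim on first mutation" behavior sits one layer up: after a mutation is allowed on a legacy session, the store writes the caller's subject into `_meta.owner`.
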
### `RateLimiter` and `ConcurrencyLimiter`

```rust
pub struct RateLimiter {
    windows: DashMap<String, SlidingWindow>,
    config: RateLimitConfig,
}

struct SlidingWindow {
    bucket_a: AtomicU64,
    bucket_b: AtomicU64,
    last_reset: AtomicU64,
}

pub struct RateLimitConfig {
    pub per_minute: u32,
    pub burst: u32,
}

impl RateLimiter {
    pub fn check(&self, subject: &str) -> Result<(), RateLimitError>;
}

pub struct RateLimitError {
    pub retry_after: Duration,
    pub limit: u32,
    pub remaining: u32,
}

pub struct SubjectConcurrencyLimiter {
    semaphores: DashMap<String, Arc<Semaphore>>,
    per_subject: usize,
}

impl SubjectConcurrencyLimiter {
    pub async fn acquire(&self, subject: &str) -> OwnedSemaphorePermit;
}
```

Both live in `ApiState` and are applied via middleware. Rate limiting runs first (cheap atomic operations), then concurrency acquisition (may block briefly).

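The sliding-window estimate behind `check` (spelled out in Step 4 below) can be sketched without the atomics: the effective count is the current bucket plus a weighted tail of the previous bucket. The function names are illustrative:

```rust
// Weighted sliding-window count. `elapsed_in_window` is how far we are
// into the current one-minute bucket, in seconds (0..60).
fn effective_count(prev_bucket: u64, curr_bucket: u64, elapsed_in_window: f64) -> f64 {
    let window_secs = 60.0;
    // Fraction of the previous window that still overlaps the sliding window.
    let prev_weight = (window_secs - elapsed_in_window) / window_secs;
    curr_bucket as f64 + prev_bucket as f64 * prev_weight
}

fn over_limit(prev: u64, curr: u64, elapsed: f64, per_minute: u32) -> bool {
    effective_count(prev, curr, elapsed) >= per_minute as f64
}
```

Halfway into the current window (`elapsed = 30.0`), only half of the previous bucket still counts, which is what smooths out the boundary spikes a naive fixed-window counter allows.
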
### `MetricsRegistry`

```rust
pub struct MetricsRegistry {
    pub http_requests_total: IntCounterVec,
    pub http_request_duration: HistogramVec,
    pub http_requests_in_flight: IntGaugeVec,
    pub sessions_active: IntGauge,
    pub sessions_created_total: IntCounter,
    pub mcp_pool_size: IntGaugeVec,
    pub mcp_acquire_latency: HistogramVec,
    pub mcp_spawns_total: IntCounter,
    pub mcp_idle_evictions_total: IntCounter,
    pub auth_failures_total: IntCounterVec,
    pub rate_limit_rejections_total: IntCounterVec,
}
```

Built on top of the `prometheus` crate. Exposed via `GET /metrics` with the Prometheus text exposition format. The registry bridges Phase 5's atomic counters into the Prometheus types without requiring Phase 5's code to change — Phase 5 keeps its simple counters, and Phase 6 reads them on each scrape to populate the Prometheus gauges.

### `AuditLogger`

```rust
pub struct AuditLogger {
    sink: AuditSink,
}

pub enum AuditSink {
    Stderr, // default
    File { path: PathBuf, rotation: Rotation },
    Syslog { facility: String },
}

pub struct AuditEvent<'a> {
    pub timestamp: OffsetDateTime,
    pub request_id: Uuid,
    pub subject: Option<&'a str>,
    pub action: AuditAction,
    pub target: Option<&'a str>,
    pub result: AuditResult,
    pub details: Option<serde_json::Value>,
}

pub enum AuditAction {
    SessionCreate,
    SessionRead,
    SessionUpdate,
    SessionDelete,
    AgentActivate,
    ToolExecute,
    McpReload,
    ConfigReload,
    AuthFailure,
    RateLimitRejection,
}

pub enum AuditResult {
    Success,
    Denied { reason: String },
    Error { message: String },
}

impl AuditLogger {
    pub fn log(&self, event: AuditEvent<'_>);
}
```

Audit events are emitted from handler middleware after request completion. The audit stream is deliberately separate from the regular tracing logs because audit logs have stricter retention/integrity requirements in regulated environments — you want to be able to pipe them to WORM storage or a SIEM without mixing in debug logs.

---

## Migration Strategy

### Step 1: Per-subject session ownership

The highest-impact security fix. No new deps, no new config — just enriching existing types.

1. Add `owner: Option<String>` and `created_by_key_id: Option<String>` to the session YAML `_meta` block. Serde skip if absent (backward compat for legacy files).
2. Update `SessionStore::create` to record the caller's subject.
3. Update `SessionStore::open` to take `caller: Option<&AuthContext>` and enforce ownership.
4. Update `SessionStore::list` to filter by caller subject (unless the caller has the `admin:sessions` scope).
5. Add `SessionStore::set_owner` for admin reassignment.
6. Implement the "claim on first mutation" behavior for legacy sessions.
7. Update all API handlers to pass the `AuthContext` through to store calls.
8. Add integration tests: Alice creates a session, Bob tries to read it (403), admin Claire can read it (200), Alice's `list` returns only her own, Claire's `list` with the `admin:sessions` scope returns everything.

**Verification:** all new authz tests pass. CLI/REPL tests still pass because they pass `caller: None`.

### Step 2: Scope-based authorization for endpoints

Phase 4's middleware attaches `AuthContext` with a `scopes: Vec<String>` field, but handlers don't check it. Phase 6 adds the enforcement.

1. Change `AuthContext.scopes` from `Vec<String>` to a `Scopes(HashSet<String>)` newtype with `has`/`has_any`/`has_all` methods.
2. Define the `Scope` enum with well-known constants.
3. Add a `require_scope` helper and a `#[require_scope("read:sessions")]` proc macro (or a handler-side check if proc macros add too much complexity).
4. Annotate every handler with the required scope(s):
   - `GET /v1/sessions` → `read:sessions`
   - `POST /v1/sessions` → `write:sessions`
   - `GET /v1/sessions/:id` → `read:sessions`
   - `DELETE /v1/sessions/:id` → `write:sessions`
   - `POST /v1/sessions/:id/completions` → `write:sessions` + `run:agents` (if the session has an agent)
   - `POST /v1/rags/:name/rebuild` → `admin:mcp`
   - `GET /v1/agents`, `/v1/roles`, `/v1/rags`, `/v1/models` → `read:agents`, `read:roles`, etc.
   - `/metrics` → `admin:metrics` (or unauthenticated if the endpoint is bound to a private network)
5. Document the scope model in `docs/SECURITY.md`.

**Verification:** per-endpoint authz tests. A key with only `read:sessions` can list and read but not write.

### Step 3: JWT support in `AuthConfig`

Extend the auth mode enum:

```rust
pub enum AuthConfig {
    Disabled,
    StaticKeys { keys: Vec<AuthKeyEntry> },
    Jwt(JwtConfig),
}

pub struct JwtConfig {
    pub issuer: String,
    pub audience: String,
    pub jwks_url: String,
    pub jwks_refresh_interval: Duration,
    pub subject_claim: String, // e.g., "sub"
    pub scopes_claim: String,  // e.g., "scope" or "permissions"
    pub leeway_seconds: u64,
}
```

1. Add `jsonwebtoken` and `reqwest` (already present) to dependencies.
2. Implement a `JwksCache` that fetches `jwks_url` on startup and refreshes every `jwks_refresh_interval`. Uses `reqwest` with a short timeout. Refreshes in the background via `tokio::spawn`.
3. The auth middleware branches on `AuthConfig`: `StaticKeys` continues to work, `Jwt` calls `jsonwebtoken::decode` with the cached JWKS.
4. Extract the subject from the configured claim name. Extract scopes from either a space-separated string (`scope` claim) or an array claim (`permissions`).
5. Handle key rotation gracefully: if decoding fails with "unknown key ID," trigger an immediate JWKS refresh (debounced to once per minute) and retry once.
6. Integration tests with a fake JWKS endpoint (use `mockito` or `wiremock`).

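Step 4's claim normalization can be sketched with a tiny helper. `ClaimValue` is an illustrative stand-in for the decoded JSON value (the real middleware would match on `serde_json::Value`):

```rust
// Stand-in for a decoded JSON claim value (assumption for this sketch).
enum ClaimValue {
    Str(String),      // OAuth-style: "scope": "read:sessions write:sessions"
    Arr(Vec<String>), // Auth0-style: "permissions": ["read:sessions", ...]
}

// Normalize either claim shape into a flat list of scope strings.
fn scopes_from_claim(claim: &ClaimValue) -> Vec<String> {
    match claim {
        ClaimValue::Str(s) => s.split_whitespace().map(str::to_owned).collect(),
        ClaimValue::Arr(v) => v.clone(),
    }
}
```

Whichever shape the issuer uses, the output feeds straight into the `Scopes` newtype from the types section.
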
**Verification:** valid JWT authenticates; expired JWT rejected; invalid signature rejected; JWKS refresh handles key rotation.

### Step 4: Real rate limiting

Replace the Phase 4 stub with a working sliding-window implementation.

1. Add the `dashmap` dependency for the per-subject map (lock-free reads/writes).
2. Implement `SlidingWindow` with two adjacent one-minute buckets; the effective rate is the weighted sum of the current bucket plus the tail of the previous bucket, based on how far into the current window we are.
3. Add `RateLimiter::check(subject) -> Result<(), RateLimitError>`.
4. Write middleware that calls `check` before dispatching to handlers. On `Err`, return 429 with a `Retry-After` header.
5. Add `rate_limit_per_minute` and `rate_limit_burst` config fields. Reasonable defaults: 60/min, burst 10.
6. Expose the per-subject current rate as a gauge in the Prometheus registry.
7. Integration test: fire N+1 requests as the same subject within a minute, assert the N+1th gets 429.

**Verification:** rate limiting works correctly across subjects; non-limited subjects aren't affected; burst allowance works.

### Step 5: Per-subject concurrency limiter

Complements rate limiting — rate limiting caps the *count* of requests over time; the concurrency limiter caps the *simultaneous* count.

1. Implement `SubjectConcurrencyLimiter` with a `DashMap<String, Arc<Semaphore>>`.
2. Lazy-init semaphores per subject with `per_subject_concurrency` slots (default 8).
3. Middleware acquires a permit per request. If the subject's semaphore is full, queue briefly (`acquire_owned` wrapped in a short `tokio::time::timeout`), then 503 if still full.
4. Garbage-collect unused semaphores periodically (an entry with no waiters and all permits available hasn't been used recently).
5. Integration test: fire 10 concurrent requests as one subject with `per_subject_concurrency: 5`, assert no more than 5 run simultaneously.

**Verification:** no subject can exceed its concurrency budget; other subjects unaffected.

### Step 6: Prometheus metrics endpoint

1. Add the `prometheus` crate.
2. Implement `MetricsRegistry` with the metrics listed in the types section.
3. Wire metric updates into existing code:
   - HTTP middleware: `http_requests_total.inc()` on response, `http_request_duration.observe(elapsed)`, `http_requests_in_flight.inc()/dec()`
   - Session creation: `sessions_created_total.inc()`, `sessions_active.set(store.count())`
   - MCP factory: read the Phase 5 atomic counters on scrape and populate the Prometheus types
4. Add a `GET /metrics` handler that writes the Prometheus text exposition format.
5. Auth policy for `/metrics`: configurable — either requires the `admin:metrics` scope, or is opened to a private network via `metrics_listen_addr: "127.0.0.1:9090"` on a separate port (recommended).
6. Integration test: scrape `/metrics`, parse the response, assert expected metrics are present with sensible values.

**Verification:** Prometheus scraping works; metrics increment correctly.

### Step 7: Structured JSON logging

Replace the default `tracing_subscriber` format with JSON output.

1. Add a `log_format: Text | Json` config field; default `Text` for CLI/REPL, `Json` for `--serve` mode.
2. Configure `tracing_subscriber::fmt::layer().json()` conditionally.
3. Ensure every span has a `request_id` field (already present from Phase 4 middleware).
4. Add `subject` and `session_id` as span fields when present, so they get included in every child log line automatically.
5. Add a `log_level` config field that SIGHUP reloads at runtime (see Step 12).
6. Integration test: capture stdout during a request, parse as JSON, assert the fields are present and correctly scoped.

**Verification:** `loki --serve` produces one-line-per-event JSON output suitable for log aggregators.

### Step 8: Audit logging

Dedicated sink for security-relevant events.

1. Implement `AuditLogger` with `Stderr`, `File`, and `Syslog` sinks. Start with just `Stderr` and `File` — `Syslog` via the `syslog` crate can follow.
2. Emit audit events from:
   - Auth middleware: `AuditAction::AuthFailure` on any auth rejection
   - Rate limiter: `AuditAction::RateLimitRejection` on 429
   - Session handlers: `AuditAction::SessionCreate/Read/Update/Delete`
   - Agent handlers: `AuditAction::AgentActivate`
   - MCP reload endpoint: `AuditAction::McpReload`
3. Audit events are JSON lines with a schema documented in `docs/SECURITY.md`.
4. Audit events don't interfere with the main tracing stream — they go to the configured audit sink independently.
5. File rotation via `tracing-appender`, or manual rotation with a size + date cap.

**Verification:** every security-relevant action produces an audit event; failures include a `reason`.

### Step 9: Security headers and misc middleware

1. Add a `security_headers` middleware layer that attaches:
   - `X-Content-Type-Options: nosniff`
   - `X-Frame-Options: DENY`
   - `Referrer-Policy: strict-origin-when-cross-origin`
   - `Strict-Transport-Security: max-age=31536000; includeSubDomains` (only when `api.force_https: true`)
   - Do NOT set CSP — this is an API, not a browser app; CSP would confuse clients.
2. Remove `Server: ...` and other fingerprinting headers.
3. Handle `OPTIONS` preflight correctly (Phase 4's CORS layer does this; verify).

**Verification:** `curl -I` inspects headers; an automated test asserts each required header is present.

### Step 10: Config validation at startup

A single `ApiConfig::validate()` method that checks every field and aggregates ALL errors before failing.

1. Implement validation for:
   - `listen_addr` is parseable and bindable
   - `auth.mode` has a valid configuration (e.g., `StaticKeys` with a non-empty key list, `Jwt` with a reachable JWKS URL)
   - `auth.keys[].key_hash` starts with `$argon2id$` (catches plaintext keys)
   - `rate_limit_per_minute > 0` and `burst > 0`
   - `max_body_bytes > 0` and `< 100 MiB` (sanity)
   - `request_timeout_seconds > 0` and `< 3600`
   - `shutdown_grace_seconds >= 0`
   - `cors.allowed_origins` entries are valid URLs or `"*"`
2. Return a `ConfigValidationError` that lists every problem, not just the first.
3. Call `validate()` in `serve()` before binding the listener.
4. Test: a deliberately-broken config produces an error listing all problems.

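The aggregate-all-errors pattern looks roughly like this. The field names mirror the checklist above, but the struct itself is an illustrative subset, not the real `ApiConfig`:

```rust
// Illustrative subset of ApiConfig, for the validation pattern only.
struct ApiConfig {
    listen_addr: String,
    rate_limit_per_minute: u32,
    request_timeout_seconds: u64,
}

impl ApiConfig {
    // Collect every problem instead of bailing at the first one,
    // so one startup failure reports the whole broken config.
    fn validate(&self) -> Result<(), Vec<String>> {
        let mut errors = Vec::new();
        if self.listen_addr.parse::<std::net::SocketAddr>().is_err() {
            errors.push(format!(
                "listen_addr {:?} is not a valid socket address",
                self.listen_addr
            ));
        }
        if self.rate_limit_per_minute == 0 {
            errors.push("rate_limit_per_minute must be > 0".into());
        }
        if self.request_timeout_seconds == 0 || self.request_timeout_seconds >= 3600 {
            errors.push("request_timeout_seconds must be in 1..3600".into());
        }
        if errors.is_empty() { Ok(()) } else { Err(errors) }
    }
}
```

A broken config then produces one actionable message listing every failed check, instead of a fix-one-redeploy-find-the-next loop.
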
**Verification:** startup validation catches common mistakes; the error message is actionable.

### Step 11: Health check endpoints

1. `GET /healthz/live` — always returns 200 OK unless the process is in graceful shutdown. Body: `{"status":"ok"}`. No auth required.
2. `GET /healthz/ready` — returns 200 OK when fully initialized and not saturated, otherwise 503 Service Unavailable. Readiness criteria:
   - `AppState` fully initialized
   - Session store writable (attempt a probe write to a reserved path)
   - MCP pool initialized (at least the factory is alive)
   - Concurrency semaphore has at least 10% availability (not saturated)
3. Both endpoints are unauthenticated and unmetered — load balancers hit them constantly.
4. Document in `docs/DEPLOYMENT.md` how Kubernetes, systemd, and other supervisors should use these.

**Verification:** endpoints return correct status under various load conditions.

### Step 12: SIGHUP config reload

Reload a subset of config without restarting.

1. Reloadable fields:
   - Auth keys (StaticKeys mode)
   - JWT config (including JWKS URL)
   - Log level
   - Rate limit config
   - Per-subject concurrency limits
   - Audit logger sink
2. NOT reloadable (requires full restart):
   - Listen address
   - MCP pool config (pool holds live subprocesses)
   - Session storage paths
   - TLS certs (use a reverse proxy)
3. Implementation: a SIGHUP handler that re-reads `config.yaml`, validates it, and atomically swaps the affected fields in `ApiState`. Uses the `arc-swap` crate for lock-free swaps.
4. Audit every reload: `AuditAction::ConfigReload` with a before/after diff summary.
5. Document: rotation procedures for auth keys, logging level adjustments, etc.

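The validate-then-swap discipline from step 3 can be sketched like this. The sketch uses a std `RwLock<Arc<_>>` in place of `arc-swap` so it stays dependency-free, and `ReloadableConfig` is an illustrative stand-in for the reloadable slice of the real config:

```rust
use std::sync::{Arc, RwLock};

// Illustrative stand-in for the reloadable fields of ApiState.
#[derive(Clone, Debug)]
struct ReloadableConfig {
    log_level: String,
    rate_limit_per_minute: u32,
}

struct ApiState {
    reloadable: RwLock<Arc<ReloadableConfig>>,
}

impl ApiState {
    // Handlers clone the Arc once per request; in-flight requests keep
    // seeing the config snapshot they started with.
    fn current(&self) -> Arc<ReloadableConfig> {
        self.reloadable.read().unwrap().clone()
    }

    // Called from the SIGHUP handler only AFTER validation succeeded:
    // the swap is all-or-nothing, so a bad config never half-applies.
    fn swap(&self, next: ReloadableConfig) {
        *self.reloadable.write().unwrap() = Arc::new(next);
    }
}
```

Because readers hold an `Arc` snapshot rather than a lock, a reload never blocks or drops in-flight requests, which is exactly the property the verification step below checks.
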
**Verification:** start the server, modify `config.yaml`, send SIGHUP, assert the new config is in effect without dropped requests.

### Step 13: Deployment manifests

#### 13a. Dockerfile

Multi-stage build for a minimal runtime image:

```dockerfile
# Build stage
FROM rust:1.82-slim AS builder
WORKDIR /build
COPY Cargo.toml Cargo.lock ./
COPY src ./src
COPY assets ./assets
RUN cargo build --release --bin loki

# Runtime stage
FROM debian:bookworm-slim
# curl is included so container healthchecks (see docker-compose below) work
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        tini \
    && rm -rf /var/lib/apt/lists/*
RUN useradd --system --create-home --home-dir /loki --shell /bin/false loki
COPY --from=builder /build/target/release/loki /usr/local/bin/loki
COPY --from=builder /build/assets /opt/loki/assets
USER loki
WORKDIR /loki
ENV LOKI_CONFIG_DIR=/loki/config
EXPOSE 3400
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/usr/local/bin/loki", "--serve"]
```

Build args for targeting specific architectures. The result is a ~100 MB image.

#### 13b. systemd unit

```ini
[Unit]
Description=Loki AI Server
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/loki --serve
Restart=on-failure
RestartSec=5
User=loki
Group=loki

# Sandboxing
NoNewPrivileges=true
PrivateTmp=true
PrivateDevices=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/loki
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictRealtime=true
LockPersonality=true

# Resource limits
LimitNOFILE=65536
LimitNPROC=512
MemoryMax=4G

# Reload
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
```

`Type=notify` requires Loki to call `sd_notify(READY=1)` after successful startup — add this with the `sd-notify` crate.

#### 13c. docker-compose example

For local development with TLS via nginx:

```yaml
version: "3.9"
services:
  loki:
    build: .
    environment:
      LOKI_CONFIG_DIR: /loki/config
    volumes:
      - ./config:/loki/config:ro
      - loki_data:/loki/data
    ports:
      - "127.0.0.1:3400:3400"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3400/healthz/live"]
      interval: 30s
      timeout: 5s
      retries: 3

  nginx:
    image: nginx:alpine
    volumes:
      - ./deploy/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./deploy/certs:/etc/nginx/certs:ro
    ports:
      - "443:443"
    depends_on:
      - loki

volumes:
  loki_data:
```

Include a sample `nginx.conf` that terminates TLS and forwards to `loki:3400`.

#### 13d. Kubernetes manifests

Provide `deploy/k8s/` with:

- `namespace.yaml`
- `deployment.yaml` (3 replicas, resource requests/limits, liveness/readiness probes)
- `service.yaml` (ClusterIP)
- `configmap.yaml` (non-secret config)
- `secret.yaml` (API keys, JWT config)
- `hpa.yaml` (HorizontalPodAutoscaler based on CPU + a custom metric for requests/sec)
- `ingress.yaml` (optional example using nginx-ingress)

Document the storage strategy: sessions use a PVC mounted at `/loki/data`; RAG embeddings use a read-only ConfigMap or a separate PVC.

**Verification:** each deployment target produces a running Loki that passes health checks.

|
||||
|
||||
### Step 14: Operational runbook
|
||||
|
||||
Write `docs/RUNBOOK.md` with sections for:
|
||||
|
||||
- **Starting and stopping** the server
|
||||
- **Rotating auth keys** (StaticKeys mode) — edit config, SIGHUP, verify in audit log
|
||||
- **Rotating auth keys** (Jwt mode) — update JWKS at issuer, Loki auto-refreshes
|
||||
- **Rotating MCP credentials** — update env vars, `POST /v1/mcp/reload` (new endpoint in this phase) or restart
|
||||
- **Diagnosing high latency** — check MCP hit rate, check LLM provider latency, check concurrency saturation
|
||||
- **Diagnosing auth failures** — audit log `AuthFailure` events, check key hash, check JWKS reachability
|
||||
- **Diagnosing rate limit rejections** — check per-subject counter, adjust limit or identify runaway client
|
||||
- **Diagnosing orphaned MCP subprocesses** — `ps aux | grep loki`, check logs for `McpFactory shutdown complete`
|
||||
- **Diagnosing session corruption** — check `.yaml.tmp` files (should not exist when server is idle), inspect session YAML for validity
|
||||
- **Backup and restore** — tar the `sessions/` and `agents/` directories
|
||||
- **Scaling horizontally** — each replica has its own MCP pool and session store; share sessions via shared filesystem (NFS/EFS) or deferred to a database-backed SessionStore (not in this phase)
|
||||
- **Incident response** — what logs to collect, what metrics to snapshot, how to reach a minimal reproducing state
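The backup step above can be sketched as a short script (paths are placeholders, and the `mkdir` scaffolding exists only so the snippet runs standalone; in production point `LOKI_HOME` at the real data directory and skip it):

```shell
# Backup sketch for the "Backup and restore" runbook entry.
# LOKI_HOME is a placeholder for the server's data directory.
LOKI_HOME="${LOKI_HOME:-/tmp/loki-demo}"
mkdir -p "$LOKI_HOME/sessions" "$LOKI_HOME/agents"   # demo scaffolding only

STAMP="$(date +%Y%m%d-%H%M%S)"
tar -czf "loki-backup-$STAMP.tar.gz" -C "$LOKI_HOME" sessions agents

# Restore: stop the server first so no .yaml.tmp files are in flight, then:
#   tar -xzf "loki-backup-$STAMP.tar.gz" -C "$LOKI_HOME"
```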

**Verification:** walk through each procedure on a test deployment; fix any unclear steps.

### Step 15: Deployment and security guides

`docs/DEPLOYMENT.md` — step-by-step instructions for Docker, systemd, docker-compose, and Kubernetes, plus a pre-flight checklist, first-time setup, and the upgrade procedure.

`docs/SECURITY.md` — threat model, hardening checklist, scope model, audit event schema, key rotation, reverse proxy configuration, network security recommendations, and a CVE reporting contact.

Cross-reference both docs from `README.md` and add a "Production Deployment" section to the README that points to them.

**Verification:** a developer unfamiliar with Loki can deploy it successfully using only the docs.

---

## Risks and Watch Items

| Risk | Severity | Mitigation |
|---|---|---|
| **Session ownership migration breaks legacy users** | Medium | Legacy sessions with `owner: None` stay readable by anyone and are claimed on first mutation. Document this in `RUNBOOK.md`. Add a one-shot migration CLI command (`loki migrate sessions --claim-to <subject>`) that assigns ownership of all unowned sessions to a specific subject. |
| **JWT JWKS fetch failures block startup** | Medium | The JWKS URL must be reachable at startup; if it's not, log an error and fall back to "reject all" mode until a fetch succeeds. A retry loop with exponential backoff runs in the background. Do NOT crash on JWKS failure. |
| **Rate limiter DashMap growth** | Low | Per-subject windows accumulate forever without cleanup. Add a background reaper that removes entries with no recent activity every few minutes. Cap total entries at 100k as a safety valve. |
| **Prometheus metric cardinality explosion** | Low | `http_requests_total` with per-path labels could explode if routes have dynamic segments (`/v1/sessions/:id`). Use route templates as labels, not concrete paths. Validate label sets at registration. |
| **Audit log retention compliance** | Low | Audit logs might need to be retained for regulatory reasons. Phase 6 provides the emission; retention is the operator's responsibility. Document this in `SECURITY.md`. |
| **SIGHUP reload partial failure** | Medium | If the new config is invalid, don't swap it in — keep the old config running. Log the validation error. The operator can fix the file and SIGHUP again. Never leave the server in an inconsistent state. |
| **Docker image size** | Low | `debian:bookworm-slim` is ~80 MB; the final image is ~100 MB. If smaller is needed, use `distroless/cc-debian12` for a ~35 MB image at the cost of losing `tini` and debugging tools. Document both options. |
| **systemd `Type=notify` left unimplemented** | Medium | Adding `sd_notify` support requires the `sd-notify` crate AND a call after the listener binds. Missing this call makes systemd think the service failed to start. Add an integration test that fakes systemd and asserts the notification is sent. |
| **Kubernetes pod disruption** | Low | HPA scales down during low traffic, but in-flight requests on the terminating pod must complete gracefully. Set `terminationGracePeriodSeconds` to at least `shutdown_grace_seconds + 10`. Document in `DEPLOYMENT.md`. |
| **Running under a reverse proxy** | Low | CORS, `Host` header handling, `X-Forwarded-For` for rate limiter subject identification. Document the expected proxy config (trust `X-Forwarded-*` headers only from trusted proxies). |
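To make the reaper and `Retry-After` rows in the table concrete, here is a dependency-free sketch of a sliding-window limiter (a plain `HashMap` stands in for `DashMap`, the names are illustrative, and this is not Loki's actual implementation):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Sliding-window rate limiter sketch. Assumes `limit >= 1`.
struct RateLimiter {
    window: Duration,
    limit: usize,
    hits: HashMap<String, Vec<Instant>>,
}

impl RateLimiter {
    fn new(limit: usize, window: Duration) -> Self {
        Self { window, limit, hits: HashMap::new() }
    }

    /// Admits the request with Ok(()), or rejects with Err(retry_after)
    /// suitable for a 429 `Retry-After` header.
    fn check(&mut self, subject: &str, now: Instant) -> Result<(), Duration> {
        let entries = self.hits.entry(subject.to_string()).or_default();
        // Drop timestamps that fell out of the window.
        entries.retain(|t| now.duration_since(*t) < self.window);
        if entries.len() < self.limit {
            entries.push(now);
            Ok(())
        } else {
            // Retry when the oldest hit leaves the window.
            let oldest = entries[0];
            Err(self.window - now.duration_since(oldest))
        }
    }

    /// Background reaper: drop subjects with no activity inside the window,
    /// so the map doesn't grow forever.
    fn reap(&mut self, now: Instant) {
        self.hits
            .retain(|_, ts| ts.iter().any(|t| now.duration_since(*t) < self.window));
    }
}
```

Passing `now` explicitly keeps the window arithmetic testable; the production version would read the clock internally and run `reap` from a background task.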

---

## What Phase 6 Does NOT Do

- **No multi-region replication.** Loki is a single-instance service; scale out by running multiple instances behind a load balancer, each with its own pool. Cross-instance state sharing is not in scope.
- **No database-backed session store.** `FileSessionStore` is still the only implementation. A `PostgresSessionStore` is a clean extension point (the `SessionStore` trait is already there) but belongs in a follow-up.
- **No cluster coordination.** Each Loki instance is independent. Running Loki in a "cluster" mode where instances share work is a separate project.
- **No advanced ML observability.** LLM call costs, token usage trends, provider error rates — these are tracked as counters but not aggregated into dashboards. Follow-up work.
- **No built-in TLS termination.** Use a reverse proxy (nginx, Caddy, Traefik, a cloud load balancer). Supporting TLS in-process adds complexity and key management concerns that reverse proxies solve better.
- **No SAML or LDAP.** Only StaticKeys and JWT. SAML/LDAP integration can extend `AuthConfig` later.
- **No plugin system.** Extensions to auth, storage, or middleware require forking and rebuilding. A dynamic plugin loader is explicitly out of scope.
- **No multi-tenancy beyond session ownership.** Tenants share the same process, same MCP pool, same RAG cache, same resources. Strict tenant isolation (separate processes per tenant) requires orchestration outside Loki.
- **No cost accounting per tenant.** LLM API calls are tracked per-subject in audit logs but not aggregated into billing-grade cost reports.

---

## Entry Criteria (from Phase 5)

- [ ] `McpFactory` pooling works and has metrics
- [ ] Graceful shutdown drains the MCP pool
- [ ] Phase 5 load test passes (hit rate >0.8, no orphaned subprocesses)
- [ ] Phase 4 API integration test suite passes
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean

## Exit Criteria (Phase 6 complete — v1 ready)

- [ ] Per-subject session ownership enforced; integration tests prove Alice can't read Bob's sessions
- [ ] Scope-based authorization enforced on every endpoint
- [ ] JWT authentication works with a real JWKS endpoint
- [ ] Real rate limiting replaces the Phase 4 stub; 429 responses include `Retry-After`
- [ ] Per-subject concurrency limiter prevents noisy-neighbor saturation
- [ ] Prometheus `/metrics` endpoint scrapes cleanly
- [ ] Structured JSON logs emitted in `--serve` mode
- [ ] Audit events written for all security-relevant actions
- [ ] Security headers set on all responses
- [ ] Config validation fails fast at startup with readable errors
- [ ] `/healthz/live` and `/healthz/ready` endpoints work
- [ ] SIGHUP reloads auth keys, log level, and rate limits without restart
- [ ] Dockerfile produces a minimal runtime image
- [ ] systemd unit with `Type=notify` works correctly
- [ ] docker-compose example runs end-to-end with TLS via nginx
- [ ] Kubernetes manifests deploy successfully
- [ ] `docs/RUNBOOK.md` covers all common operational scenarios
- [ ] `docs/DEPLOYMENT.md` guides a first-time deployer to success
- [ ] `docs/SECURITY.md` documents threat model, scopes, and hardening
- [ ] `cargo check`, `cargo test`, `cargo clippy` all clean
- [ ] End-to-end production smoke test: deploy to Kubernetes, send real traffic, scrape metrics, rotate a key, induce a failure, observe recovery
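The systemd items in the checklist above might look like this as a unit file (binary path, user, and hardening options are illustrative):

```ini
# deploy/loki.service -- sketch; paths and user are placeholders.
[Unit]
Description=Loki AI service
After=network-online.target
Wants=network-online.target

[Service]
# Type=notify requires Loki to send READY=1 after the listener binds.
Type=notify
ExecStart=/usr/local/bin/loki --serve
# SIGHUP reloads auth keys, log level, and rate limits.
ExecReload=/bin/kill -HUP $MAINPID
User=loki
Restart=on-failure
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/var/lib/loki

[Install]
WantedBy=multi-user.target
```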

---

## v1 Release Summary

After Phase 6 lands, Loki v1 has completed its transformation from a single-user CLI tool into a production-ready multi-tenant AI service. Here's what the v1 release notes should say:

**New in Loki v1:**

- **REST API** — full HTTP surface for completions, sessions, agents, roles, RAGs, and metadata. Streaming via Server-Sent Events, synchronous via JSON.
- **Multi-tenant sessions** — UUID-primary identity with optional human-readable aliases. Per-subject ownership with scope-based access control.
- **Concurrent safety** — per-session mutex serialization, per-MCP-server Arc sharing, per-agent runtime isolation. Run dozens of concurrent requests without corruption.
- **MCP pooling** — recently-used MCP subprocesses stay warm across requests for near-zero warm-path latency. Configurable idle timeout and LRU cap.
- **Authentication** — static API keys or JWT with JWKS. Argon2-hashed credentials. Scope-based authorization per endpoint.
- **Observability** — Prometheus metrics, structured JSON logging with correlation IDs, and a dedicated audit log stream.
- **Rate limiting** — sliding-window per subject with configurable limits and burst allowance.
- **Graceful shutdown** — in-flight requests complete within a grace period, MCP subprocesses terminate cleanly, and session state is persisted.
- **Deployment manifests** — Dockerfile, systemd unit, docker-compose example, Kubernetes manifests.
- **Full documentation** — runbook, deployment guide, security guide, API reference.

**Backward compatibility:**

CLI and REPL continue to work identically to pre-v1 builds. Existing `config.yaml`, `roles/`, `sessions/`, `agents/`, `rags/`, and `functions/` directories are read-compatible. The legacy session layout is migrated lazily on first access without destroying the old files.

**What's next (v2+):**

- Database-backed session store for cross-instance sharing
- Native TLS termination option
- SAML / LDAP authentication extensions
- Per-tenant cost accounting and quotas
- Dynamic plugin system for custom auth, storage, and middleware
- Multi-region replication
- WebSocket transport alongside SSE