docs: created documentation for how to patch requests via configuration settings

2025-11-07 13:39:04 -07:00
parent 33baeaa62d
commit d353767b2c
1 changed files with 368 additions and 0 deletions
@@ -0,0 +1,368 @@
+# Request Patching in Loki
+Loki provides two mechanisms for modifying API requests sent to LLM providers: **Model-Specific Patches** and 
+**Client Configuration Patches**. These allow you to customize request parameters, headers, and URLs to work around 
+provider quirks or add custom behavior.
+
+## Quick Links
+- [Model-Specific Patches](#model-specific-patches)
+- [Client Configuration Patches](#client-configuration-patches)
+- [Comparison](#comparison)
+- [Common Use Cases](#common-use-cases)
+- [Environment Variable Patches](#environment-variable-patches)
+- [Tips](#tips)
+- [Debugging Patches](#debugging-patches)
+
+---
+
+## Model-Specific Patches
+
+### Overview
+Model-specific patches are applied **unconditionally** to a single model. They are useful for handling model-specific 
+quirks or requirements.
+
+### When to Use
+- A specific model requires certain parameters to be set or removed
+- A model needs different default values than other models from the same provider
+- You need to add special configuration for one model only
+
+### Structure
+
+```yaml
+models:
+  - name: model-name
+    type: chat
+    # ... other model properties ...
+    patch:
+      url: "https://custom-endpoint.com"   # Optional: override the API endpoint
+      body:                                # Optional: modify request body
+        <parameter>: <value>               # Add or modify parameters
+        <parameter>: null                  # Remove parameters (set to null)
+      headers:                             # Optional: modify request headers
+        <header-name>: <value>             # Add or modify headers
+        <header-name>: null                # Remove headers (set to null)
+```
+
+### Examples
+
+#### Example 1: Removing Parameters
+OpenAI's o1 models don't support `temperature`, `top_p`, or `max_tokens` parameters. The `patch` removes them:
+
+```yaml
+- name: o4-mini
+  type: chat
+  max_input_tokens: 200000
+  max_output_tokens: 100000
+  supports_function_calling: true
+  patch:
+    body:
+      max_tokens: null      # Remove max_tokens from request
+      temperature: null     # Remove temperature from request
+      top_p: null           # Remove top_p from request
+```
+
+#### Example 2: Setting Required Parameters
+Some models require specific parameters to be set:
+
+```yaml
+- name: o4-mini-high
+  type: chat
+  patch:
+    body:
+      reasoning_effort: high  # Always set reasoning_effort to "high"
+      max_tokens: null
+      temperature: null
+```
+
+#### Example 3: Custom Endpoint
+If a model needs a different API endpoint:
+
+```yaml
+- name: custom-model
+  type: chat
+  patch:
+    url: "https://special-endpoint.example.com/v1/chat"
+```
+
+#### Example 4: Adding Headers
+Add authentication or custom headers:
+
+```yaml
+- name: special-model
+  type: chat
+  patch:
+    headers:
+      X-Custom-Header: "special-value"
+      X-API-Version: "2024-01"
+```
+
+### How It Works
+1. When you use a model, Loki loads its configuration
+2. If the model has a `patch` field, it's **always applied** to every request
+3. The patch modifies the request URL, body, or headers before sending to the API
+4. Parameters set to `null` are **removed** from the request
+
+---
+
+## Client Configuration Patches
+
+### Overview
+Client configuration patches allow you to apply customizations to **multiple models** based on 
+**regex pattern matching**. They're defined in your `config.yaml` file and can target specific API types (`chat`, 
+`embeddings`, or `rerank`).
+
+### When to Use
+- You want to apply the same settings to multiple models from a provider
+- You need different configurations for different groups of models
+- You want to override the default client model settings
+- You need environment-specific customizations
+
+### Structure
+
+```yaml
+clients:
+  - type: <client>                      # e.g., gemini, openai, claude
+    # ... client configuration ...
+    patch:
+      chat_completions:                 # For chat models
+        '<regex-pattern>':              # Regex to match model names
+          url: "..."                    # Optional: override endpoint
+          body:                         # Optional: modify request body
+            <parameter>: <value>
+          headers:                      # Optional: modify headers
+            <header>: <value>
+      embeddings:                       # For embedding models
+        '<regex-pattern>':
+          # ... same structure ...
+      rerank:                           # For reranker models
+        '<regex-pattern>':
+          # ... same structure ...
+```
+
+### Pattern Matching
+- Patterns are **regular expressions** that match against the model name
+- Use `.*` to match all models
+- Use specific patterns like `gpt-4.*` to match model families
+- Use `model1|model2` to match multiple specific models
+
+### Examples
+
+#### Example 1: Disable Safety Filters for Gemini Models
+Apply to all Gemini chat models:
+
+```yaml
+clients:
+  - type: gemini
+    api_key: "{{GEMINI_API_KEY}}"
+    patch:
+      chat_completions:
+        '.*':  # Matches all Gemini models
+          body:
+            safetySettings:
+              - category: HARM_CATEGORY_HARASSMENT
+                threshold: BLOCK_NONE
+              - category: HARM_CATEGORY_HATE_SPEECH
+                threshold: BLOCK_NONE
+              - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
+                threshold: BLOCK_NONE
+              - category: HARM_CATEGORY_DANGEROUS_CONTENT
+                threshold: BLOCK_NONE
+```
+
+#### Example 2: Apply Settings to Specific Model Family
+Only apply to GPT-4 models (not GPT-3.5):
+
+```yaml
+clients:
+  - type: openai
+    api_key: "{{OPENAI_API_KEY}}"
+    patch:
+      chat_completions:
+        'gpt-4.*':  # Matches gpt-4, gpt-4-turbo, gpt-4o, etc.
+          body:
+            frequency_penalty: 0.2
+            presence_penalty: 0.1
+```
+
+#### Example 3: Different Settings for Different Models
+Apply different patches based on model name:
+
+```yaml
+clients:
+  - type: openai
+    api_key: "{{OPENAI_API_KEY}}"
+    patch:
+      chat_completions:
+        'gpt-4o':  # Specific model
+          body:
+            temperature: 0.7
+        'gpt-3.5.*':  # Model family
+          body:
+            temperature: 0.9
+            max_tokens: 2000
+```
+
+#### Example 4: Modify Embedding Requests
+Apply to embedding models:
+
+```yaml
+clients:
+  - type: openai
+    api_key: "{{OPENAI_API_KEY}}"
+    patch:
+      embeddings:
+        'text-embedding-.*':  # All text-embedding models
+          body:
+            dimensions: 1536
+            encoding_format: "float"
+```
+
+#### Example 5: Custom Headers for Specific Models
+Add headers only for certain models:
+
+```yaml
+clients:
+  - type: openai-compatible
+    api_base: "https://api.example.com/v1"
+    patch:
+      chat_completions:
+        'custom-model-.*':
+          headers:
+            X-Custom-Auth: "bearer-token"
+            X-Model-Version: "latest"
+```
+
+#### Example 6: Override Endpoint for Specific Models
+Use different endpoints for different model groups:
+
+```yaml
+clients:
+  - type: openai-compatible
+    api_base: "https://default-endpoint.com/v1"
+    patch:
+      chat_completions:
+        'premium-.*':  # Premium models use different endpoint
+          url: "https://premium-endpoint.com/v1/chat/completions"
+```
+
+### How It Works
+1. When making a request, Loki checks if the client has a `patch` configuration
+2. It looks at the appropriate API type (`chat_completions`, `embeddings`, or `rerank`)
+3. For each pattern in that section, it checks if the regex matches the model name
+4. If a match is found, that patch is applied to the request
+5. Only the **first matching pattern** is applied (patterns are processed in order)
+
+---
+
+## Comparison
+
+| Feature               | Model-Specific Patch  | Client Configuration Patch          |
+|-----------------------|-----------------------|-------------------------------------|
+| **Scope**             | Single model only     | Multiple models via regex           |
+| **Matching**          | Exact model name      | Regular expression pattern          |
+| **Application**       | Always applied        | Only if pattern matches             |
+| **API Type**          | All APIs              | Separate for chat/embeddings/rerank |
+| **Override**          | Cannot be overridden  | Can override model patch            |
+| **Use Case**          | Model-specific quirks | User preferences & customization    |
+| **Application Order** | Applied first         | Applied second (can override)       |
+
+### Patch Application Order
+When both patches are present, they're applied in this order:
+
+1. **Model-Specific Patch**
+2. **Client Configuration Patch**
+
+This means client configuration patches can override model-specific patches if they modify the same parameters.
+
+## Common Use Cases
+
+### Removing Unsupported Parameters
+Some models don't support standard parameters like `temperature` or `max_tokens`:
+
+**Model Patch**:
+```yaml
+patch:
+  body:
+    temperature: null
+    max_tokens: null
+```
+
+### Adding Provider-Specific Parameters
+Providers often have unique parameters:
+
+**Client Patch**:
+```yaml
+patch:
+  chat_completions:
+    '.*':
+      body:
+        safetySettings: [...]        # Gemini
+        thinking_budget: 10000       # DeepSeek
+        response_format:             # OpenAI
+          type: json_object
+```
+
+### Changing Endpoints
+Use custom or regional endpoints:
+
+**Client Patch**:
+```yaml
+patch:
+  chat_completions:
+    '.*':
+      url: "https://eu-endpoint.example.com/v1/chat"
+```
+
+### Setting Default Values
+Provide defaults for specific models or model families:
+
+**Client Patch**:
+```yaml
+patch:
+  chat_completions:
+    'claude-3-.*':
+      body:
+        max_tokens: 4096
+        temperature: 0.7
+```
+
+### Custom Authentication
+Add special authentication headers:
+
+**Client Patch**:
+```yaml
+patch:
+  chat_completions:
+    '.*':
+      headers:
+        Authorization: "Bearer {{custom_token}}"
+        X-Organization-ID: "org-123"
+```
+
+## Environment Variable Patches
+You can also apply patches via environment variables for temporary overrides:
+
+```bash
+export LLM_PATCH_OPENAI_CHAT_COMPLETIONS='{"gpt-4.*":{"body":{"temperature":0.5}}}'
+```
+
+This takes precedence over client configuration patches but not model-specific patches.
+
+## Tips
+1. **Use model patches** for permanent, model-specific requirements
+2. **Use client patches** for personal preferences or environment-specific settings
+3. **Test regex patterns** carefully
+4. **Set to `null`** to remove parameters, don't just omit them
+5. **Check each model provider's docs** for available parameters and their formats
+6. **Be specific** with patterns to avoid unintended matches
+7. **Remember order matters** - first matching pattern wins for client patches
+8. **Patches merge** - both types can be applied, with client patches overriding model patches
+
+## Debugging Patches
+To see what request is actually being sent, enable debug logging:
+
+```bash
+export RUST_LOG=loki=debug
+loki "your prompt here"
+```
+
+This will show the final request body after all patches are applied.