From d353767b2c31a1f5bef7a776c602e5728f608395 Mon Sep 17 00:00:00 2001
From: Alex Clarke
Date: Fri, 7 Nov 2025 13:39:04 -0700
Subject: [PATCH] docs: created documentation for how to patch requests via configuration settings

---
 docs/clients/PATCHES.md | 368 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 368 insertions(+)
 create mode 100644 docs/clients/PATCHES.md

diff --git a/docs/clients/PATCHES.md b/docs/clients/PATCHES.md
new file mode 100644
index 0000000..23246b9
--- /dev/null
+++ b/docs/clients/PATCHES.md
@@ -0,0 +1,368 @@
+# Request Patching in Loki
+Loki provides two mechanisms for modifying API requests sent to LLM providers: **Model-Specific Patches** and
+**Client Configuration Patches**. These allow you to customize request parameters, headers, and URLs to work around
+provider quirks or add custom behavior.
+
+## Quick Links
+- [Model-Specific Patches](#model-specific-patches)
+- [Client Configuration Patches](#client-configuration-patches)
+- [Comparison](#comparison)
+- [Common Use Cases](#common-use-cases)
+- [Environment Variable Patches](#environment-variable-patches)
+- [Tips](#tips)
+- [Debugging Patches](#debugging-patches)
+
+---
+
+## Model-Specific Patches
+
+### Overview
+Model-specific patches are applied **unconditionally** to a single model. They are useful for handling model-specific
+quirks or requirements.
+
+### When to Use
+- A specific model requires certain parameters to be set or removed
+- A model needs different default values than other models from the same provider
+- You need to add special configuration for one model only
+
+### Structure
+
+```yaml
+models:
+  - name: model-name
+    type: chat
+    # ... other model properties ...
+    patch:
+      url: "https://custom-endpoint.com"  # Optional: override the API endpoint
+      body:                               # Optional: modify request body
+        <parameter-name>: <value>         # Add or modify parameters
+        <parameter-name>: null            # Remove parameters (set to null)
+      headers:                            # Optional: modify request headers
+        <header-name>: <value>            # Add or modify headers
+        <header-name>: null               # Remove headers (set to null)
+```
+
+### Examples
+
+#### Example 1: Removing Parameters
+OpenAI's o-series reasoning models, such as `o4-mini`, don't support the `temperature`, `top_p`, or `max_tokens` parameters. The `patch` removes them:
+
+```yaml
+- name: o4-mini
+  type: chat
+  max_input_tokens: 200000
+  max_output_tokens: 100000
+  supports_function_calling: true
+  patch:
+    body:
+      max_tokens: null   # Remove max_tokens from request
+      temperature: null  # Remove temperature from request
+      top_p: null        # Remove top_p from request
+```
+
+#### Example 2: Setting Required Parameters
+Some models require specific parameters to be set:
+
+```yaml
+- name: o4-mini-high
+  type: chat
+  patch:
+    body:
+      reasoning_effort: high  # Always set reasoning_effort to "high"
+      max_tokens: null
+      temperature: null
+```
+
+#### Example 3: Custom Endpoint
+If a model needs a different API endpoint:
+
+```yaml
+- name: custom-model
+  type: chat
+  patch:
+    url: "https://special-endpoint.example.com/v1/chat"
+```
+
+#### Example 4: Adding Headers
+Add authentication or custom headers:
+
+```yaml
+- name: special-model
+  type: chat
+  patch:
+    headers:
+      X-Custom-Header: "special-value"
+      X-API-Version: "2024-01"
+```
+
+### How It Works
+1. When you use a model, Loki loads its configuration
+2. If the model has a `patch` field, it's **always applied** to every request
+3. The patch modifies the request URL, body, or headers before sending to the API
+4. Parameters set to `null` are **removed** from the request
+
+---
+
+## Client Configuration Patches
+
+### Overview
+Client configuration patches allow you to apply customizations to **multiple models** based on
+**regex pattern matching**. They're defined in your `config.yaml` file and can target specific API types (`chat_completions`,
+`embeddings`, or `rerank`).
+
+### When to Use
+- You want to apply the same settings to multiple models from a provider
+- You need different configurations for different groups of models
+- You want to override the default client model settings
+- You need environment-specific customizations
+
+### Structure
+
+```yaml
+clients:
+  - type: <client-type>            # e.g., gemini, openai, claude
+    # ... client configuration ...
+    patch:
+      chat_completions:            # For chat models
+        '<regex-pattern>':         # Regex to match model names
+          url: "..."               # Optional: override endpoint
+          body:                    # Optional: modify request body
+            <parameter-name>: <value>
+          headers:                 # Optional: modify headers
+            <header-name>: <value>
+      embeddings:                  # For embedding models
+        '<regex-pattern>':
+          # ... same structure ...
+      rerank:                      # For reranker models
+        '<regex-pattern>':
+          # ... same structure ...
+```
+
+### Pattern Matching
+- Patterns are **regular expressions** that match against the model name
+- Use `.*` to match all models
+- Use specific patterns like `gpt-4.*` to match model families
+- Use `model1|model2` to match multiple specific models
+
+### Examples
+
+#### Example 1: Disable Safety Filters for Gemini Models
+Apply to all Gemini chat models:
+
+```yaml
+clients:
+  - type: gemini
+    api_key: "{{GEMINI_API_KEY}}"
+    patch:
+      chat_completions:
+        '.*':  # Matches all Gemini models
+          body:
+            safetySettings:
+              - category: HARM_CATEGORY_HARASSMENT
+                threshold: BLOCK_NONE
+              - category: HARM_CATEGORY_HATE_SPEECH
+                threshold: BLOCK_NONE
+              - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
+                threshold: BLOCK_NONE
+              - category: HARM_CATEGORY_DANGEROUS_CONTENT
+                threshold: BLOCK_NONE
+```
+
+#### Example 2: Apply Settings to Specific Model Family
+Only apply to GPT-4 models (not GPT-3.5):
+
+```yaml
+clients:
+  - type: openai
+    api_key: "{{OPENAI_API_KEY}}"
+    patch:
+      chat_completions:
+        'gpt-4.*':  # Matches gpt-4, gpt-4-turbo, gpt-4o, etc.
+          body:
+            frequency_penalty: 0.2
+            presence_penalty: 0.1
+```
+
+#### Example 3: Different Settings for Different Models
+Apply different patches based on model name:
+
+```yaml
+clients:
+  - type: openai
+    api_key: "{{OPENAI_API_KEY}}"
+    patch:
+      chat_completions:
+        'gpt-4o':  # Specific model
+          body:
+            temperature: 0.7
+        'gpt-3.5.*':  # Model family
+          body:
+            temperature: 0.9
+            max_tokens: 2000
+```
+
+#### Example 4: Modify Embedding Requests
+Apply to embedding models:
+
+```yaml
+clients:
+  - type: openai
+    api_key: "{{OPENAI_API_KEY}}"
+    patch:
+      embeddings:
+        'text-embedding-.*':  # All text-embedding models
+          body:
+            dimensions: 1536
+            encoding_format: "float"
+```
+
+#### Example 5: Custom Headers for Specific Models
+Add headers only for certain models:
+
+```yaml
+clients:
+  - type: openai-compatible
+    api_base: "https://api.example.com/v1"
+    patch:
+      chat_completions:
+        'custom-model-.*':
+          headers:
+            X-Custom-Auth: "bearer-token"
+            X-Model-Version: "latest"
+```
+
+#### Example 6: Override Endpoint for Specific Models
+Use different endpoints for different model groups:
+
+```yaml
+clients:
+  - type: openai-compatible
+    api_base: "https://default-endpoint.com/v1"
+    patch:
+      chat_completions:
+        'premium-.*':  # Premium models use different endpoint
+          url: "https://premium-endpoint.com/v1/chat/completions"
+```
+
+### How It Works
+1. When making a request, Loki checks if the client has a `patch` configuration
+2. It looks at the appropriate API type (`chat_completions`, `embeddings`, or `rerank`)
+3. For each pattern in that section, it checks if the regex matches the model name
+4. If a match is found, that patch is applied to the request
+5. Only the **first matching pattern** is applied (patterns are processed in order)
+
+---
+
+## Comparison
+
+| Feature               | Model-Specific Patch        | Client Configuration Patch          |
+|-----------------------|-----------------------------|-------------------------------------|
+| **Scope**             | Single model only           | Multiple models via regex           |
+| **Matching**          | Exact model name            | Regular expression pattern          |
+| **Application**       | Always applied              | Only if pattern matches             |
+| **API Type**          | All APIs                    | Separate for chat/embeddings/rerank |
+| **Override**          | Overridable by client patch | Can override model patch            |
+| **Use Case**          | Model-specific quirks       | User preferences & customization    |
+| **Application Order** | Applied first               | Applied second (can override)       |
+
+### Patch Application Order
+When both patches are present, they're applied in this order:
+
+1. **Model-Specific Patch**
+2. **Client Configuration Patch**
+
+This means client configuration patches can override model-specific patches if they modify the same parameters.
+
+## Common Use Cases
+
+### Removing Unsupported Parameters
+Some models don't support standard parameters like `temperature` or `max_tokens`:
+
+**Model Patch**:
+```yaml
+patch:
+  body:
+    temperature: null
+    max_tokens: null
+```
+
+### Adding Provider-Specific Parameters
+Providers often have unique parameters:
+
+**Client Patch**:
+```yaml
+patch:
+  chat_completions:
+    '.*':
+      body:
+        safetySettings: [...]   # Gemini
+        thinking_budget: 10000  # DeepSeek
+        response_format:        # OpenAI
+          type: json_object
+```
+
+### Changing Endpoints
+Use custom or regional endpoints:
+
+**Client Patch**:
+```yaml
+patch:
+  chat_completions:
+    '.*':
+      url: "https://eu-endpoint.example.com/v1/chat"
+```
+
+### Setting Default Values
+Provide defaults for specific models or model families:
+
+**Client Patch**:
+```yaml
+patch:
+  chat_completions:
+    'claude-3-.*':
+      body:
+        max_tokens: 4096
+        temperature: 0.7
+```
+
+### Custom Authentication
+Add special authentication headers:
+
+**Client Patch**:
+```yaml
+patch:
+  chat_completions:
+    '.*':
+      headers:
+        Authorization: "Bearer {{custom_token}}"
+        X-Organization-ID: "org-123"
+```
+
+## Environment Variable Patches
+You can also apply patches via environment variables for temporary overrides:
+
+```bash
+export LLM_PATCH_OPENAI_CHAT_COMPLETIONS='{"gpt-4.*":{"body":{"temperature":0.5}}}'
+```
+
+An environment variable patch takes precedence over client configuration patches, but not over model-specific patches.
+
+## Tips
+1. **Use model patches** for permanent, model-specific requirements
+2. **Use client patches** for personal preferences or environment-specific settings
+3. **Test regex patterns** carefully
+4. **Set parameters to `null`** to remove them; don't just omit them
+5. **Check each model provider's docs** for available parameters and their formats
+6. **Be specific** with patterns to avoid unintended matches
+7. **Remember that order matters** - the first matching pattern wins for client patches
+8. **Patches merge** - both types can be applied, with client patches overriding model patches
+
+## Debugging Patches
+To see what request is actually being sent, enable debug logging:
+
+```bash
+export RUST_LOG=loki=debug
+loki "your prompt here"
+```
+
+This will show the final request body after all patches are applied.
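
As a rough illustration of the semantics described in PATCHES.md, the Python sketch below models how a request body might be patched: the model-specific patch is applied first, then the first client pattern that matches the model name, with `null` (Python `None`) removing a parameter. It is not Loki's implementation. The function names (`merge_body`, `first_matching_patch`, `apply_patches`), the full-string regex match, and the recursive merge of nested objects are all assumptions made for this example.

```python
import re
from typing import Optional


def merge_body(body: dict, patch: dict) -> dict:
    """Return a copy of `body` with `patch` merged in; None values delete keys.

    Assumption: nested objects merge recursively (JSON Merge Patch style).
    """
    result = dict(body)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null in the patch removes the parameter
        elif isinstance(value, dict):
            base = result.get(key)
            result[key] = merge_body(base if isinstance(base, dict) else {}, value)
        else:
            result[key] = value  # add or overwrite the parameter
    return result


def first_matching_patch(patterns: dict, model_name: str) -> Optional[dict]:
    """Return the patch of the first pattern matching the model name.

    Assumption: a pattern must match the whole model name.
    """
    for pattern, patch in patterns.items():  # dicts keep insertion order
        if re.fullmatch(pattern, model_name):
            return patch
    return None


def apply_patches(body: dict, model_name: str,
                  model_patch: Optional[dict], client_patterns: dict) -> dict:
    """Apply the model-specific patch first, then the matching client patch."""
    if model_patch:
        body = merge_body(body, model_patch.get("body", {}))
    client_patch = first_matching_patch(client_patterns, model_name)
    if client_patch:
        body = merge_body(body, client_patch.get("body", {}))
    return body


request = {"model": "o4-mini", "temperature": 1.0, "max_tokens": 512}
model_patch = {"body": {"temperature": None, "max_tokens": None}}
client_patterns = {
    "o4-.*": {"body": {"reasoning_effort": "high"}},  # first match wins
    ".*": {"body": {"reasoning_effort": "low"}},      # not reached for o4-mini
}

patched = apply_patches(request, "o4-mini", model_patch, client_patterns)
print(patched)  # {'model': 'o4-mini', 'reasoning_effort': 'high'}
```

Because the catch-all `.*` pattern is listed after `o4-.*`, it only applies to models that the more specific pattern does not match, which is why pattern order matters for client patches.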