# Request Patching in Loki
Loki provides two mechanisms for modifying API requests sent to LLM providers: **Model-Specific Patches** and
**Client Configuration Patches**. These allow you to customize request parameters, headers, and URLs to work around
provider quirks or add custom behavior.
## Quick Links
- [Model-Specific Patches](#model-specific-patches)
- [Client Configuration Patches](#client-configuration-patches)
- [Comparison](#comparison)
- [Common Use Cases](#common-use-cases)
- [Environment Variable Patches](#environment-variable-patches)
- [Tips](#tips)
- [Debugging Patches](#debugging-patches)
---
## Model-Specific Patches
### Overview
Model-specific patches are applied **unconditionally** to a single model. They are useful for handling model-specific
quirks or requirements.
### When to Use
- A specific model requires certain parameters to be set or removed
- A model needs different default values than other models from the same provider
- You need to add special configuration for one model only
### Structure
```yaml
models:
  - name: model-name
    type: chat
    # ... other model properties ...
    patch:
      url: "https://custom-endpoint.com"   # Optional: override the API endpoint
      body:                                # Optional: modify request body
        <parameter>: <value>               # Add or modify parameters
        <parameter>: null                  # Remove parameters (set to null)
      headers:                             # Optional: modify request headers
        <header-name>: <value>             # Add or modify headers
        <header-name>: null                # Remove headers (set to null)
```
### Examples
#### Example 1: Removing Parameters
OpenAI's o-series reasoning models, such as `o4-mini`, don't support the `temperature`, `top_p`, or `max_tokens` parameters. This `patch` removes them:
```yaml
- name: o4-mini
  type: chat
  max_input_tokens: 200000
  max_output_tokens: 100000
  supports_function_calling: true
  patch:
    body:
      max_tokens: null     # Remove max_tokens from request
      temperature: null    # Remove temperature from request
      top_p: null          # Remove top_p from request
```
#### Example 2: Setting Required Parameters
Some models require specific parameters to be set:
```yaml
- name: o4-mini-high
  type: chat
  patch:
    body:
      reasoning_effort: high   # Always set reasoning_effort to "high"
      max_tokens: null
      temperature: null
```
#### Example 3: Custom Endpoint
If a model needs a different API endpoint:
```yaml
- name: custom-model
  type: chat
  patch:
    url: "https://special-endpoint.example.com/v1/chat"
```
#### Example 4: Adding Headers
Add authentication or custom headers:
```yaml
- name: special-model
  type: chat
  patch:
    headers:
      X-Custom-Header: "special-value"
      X-API-Version: "2024-01"
```
### How It Works
1. When you use a model, Loki loads its configuration
2. If the model has a `patch` field, it's **always applied** to every request
3. The patch modifies the request URL, body, or headers before sending to the API
4. Parameters set to `null` are **removed** from the request
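
The steps above can be sketched in a few lines of Python. This is an illustration only, not Loki's actual implementation: the function names `merge_patch` and `apply_patch` are hypothetical, and the sketch assumes that nested objects are merged recursively in the style of JSON Merge Patch (RFC 7396), which the text above does not confirm.

```python
from copy import deepcopy


def merge_patch(target, patch):
    """Recursively merge `patch` into a copy of `target`; a None value deletes the key."""
    if not isinstance(patch, dict):
        return deepcopy(patch)
    result = deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # YAML `null` removes the parameter
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def apply_patch(request, patch):
    """Apply the optional `url`, `body`, and `headers` parts of a patch (hypothetical helper)."""
    patched = deepcopy(request)
    if "url" in patch:
        patched["url"] = patch["url"]
    for part in ("body", "headers"):
        if part in patch:
            patched[part] = merge_patch(patched.get(part, {}), patch[part])
    return patched


# A made-up outgoing request and a patch equivalent to the YAML in Example 2.
request = {
    "url": "https://api.example.com/v1/chat/completions",
    "headers": {"content-type": "application/json"},
    "body": {"model": "o4-mini", "temperature": 0.7, "max_tokens": 1024, "messages": []},
}
patch = {"body": {"temperature": None, "max_tokens": None, "reasoning_effort": "high"}}

patched = apply_patch(request, patch)
print(patched["body"])
# {'model': 'o4-mini', 'messages': [], 'reasoning_effort': 'high'}
```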
---
## Client Configuration Patches
### Overview
Client configuration patches apply customizations to **multiple models** based on
**regex pattern matching**. They're defined in your `config.yaml` file and are grouped by API type (`chat_completions`,
`embeddings`, or `rerank`).
### When to Use
- You want to apply the same settings to multiple models from a provider
- You need different configurations for different groups of models
- You want to override the default client model settings
- You need environment-specific customizations
### Structure
```yaml
clients:
  - type: <client>                 # e.g., gemini, openai, claude
    # ... client configuration ...
    patch:
      chat_completions:            # For chat models
        '<regex-pattern>':         # Regex to match model names
          url: "..."               # Optional: override endpoint
          body:                    # Optional: modify request body
            <parameter>: <value>
          headers:                 # Optional: modify headers
            <header>: <value>
      embeddings:                  # For embedding models
        '<regex-pattern>':
          # ... same structure ...
      rerank:                      # For reranker models
        '<regex-pattern>':
          # ... same structure ...
```
### Pattern Matching
- Patterns are **regular expressions** that match against the model name
- Use `.*` to match all models
- Use specific patterns like `gpt-4.*` to match model families
- Use `model1|model2` to match multiple specific models
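
To check which model names a pattern would catch before putting it in `config.yaml`, a quick experiment like the one below can help. It uses Python's `re` module purely for illustration; Loki's regex dialect may differ in details, and whether Loki requires the pattern to match the whole model name or only part of it is not stated here, so the hypothetical `matching` helper shows both behaviors.

```python
import re

models = ["gpt-4", "gpt-4-turbo", "gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"]


def matching(pattern, names, anchored=False):
    """List the names a pattern matches, either anywhere in the name or as a whole."""
    match = re.fullmatch if anchored else re.search
    return [name for name in names if match(pattern, name)]


print(matching("gpt-4.*", models))
# ['gpt-4', 'gpt-4-turbo', 'gpt-4o', 'gpt-4o-mini']

print(matching("gpt-4o", models))
# ['gpt-4o', 'gpt-4o-mini']  (unanchored: also catches gpt-4o-mini)

print(matching("gpt-4o", models, anchored=True))
# ['gpt-4o']

print(matching("gpt-4|gpt-3.5-turbo", models, anchored=True))
# ['gpt-4', 'gpt-3.5-turbo']
```

If the matching turns out to be unanchored, wrapping a pattern in `^...$` is the usual way to restrict it to an exact name.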
### Examples
#### Example 1: Disable Safety Filters for Gemini Models
Apply to all Gemini chat models:
```yaml
clients:
  - type: gemini
    api_key: "{{GEMINI_API_KEY}}"
    patch:
      chat_completions:
        '.*':   # Matches all Gemini models
          body:
            safetySettings:
              - category: HARM_CATEGORY_HARASSMENT
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_HATE_SPEECH
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_DANGEROUS_CONTENT
                threshold: BLOCK_NONE
```
#### Example 2: Apply Settings to Specific Model Family
Only apply to GPT-4 models (not GPT-3.5):
```yaml
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      chat_completions:
        'gpt-4.*':   # Matches gpt-4, gpt-4-turbo, gpt-4o, etc.
          body:
            frequency_penalty: 0.2
            presence_penalty: 0.1
```
#### Example 3: Different Settings for Different Models
Apply different patches based on model name:
```yaml
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      chat_completions:
        'gpt-4o':      # Specific model
          body:
            temperature: 0.7
        'gpt-3.5.*':   # Model family
          body:
            temperature: 0.9
            max_tokens: 2000
```
#### Example 4: Modify Embedding Requests
Apply to embedding models:
```yaml
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      embeddings:
        'text-embedding-.*':   # All text-embedding models
          body:
            dimensions: 1536
            encoding_format: "float"
```
#### Example 5: Custom Headers for Specific Models
Add headers only for certain models:
```yaml
clients:
  - type: openai-compatible
    api_base: "https://api.example.com/v1"
    patch:
      chat_completions:
        'custom-model-.*':
          headers:
            X-Custom-Auth: "bearer-token"
            X-Model-Version: "latest"
```
#### Example 6: Override Endpoint for Specific Models
Use different endpoints for different model groups:
```yaml
clients:
  - type: openai-compatible
    api_base: "https://default-endpoint.com/v1"
    patch:
      chat_completions:
        'premium-.*':   # Premium models use different endpoint
          url: "https://premium-endpoint.com/v1/chat/completions"
```
### How It Works
1. When making a request, Loki checks if the client has a `patch` configuration
2. It looks at the appropriate API type (`chat_completions`, `embeddings`, or `rerank`)
3. For each pattern in that section, it checks if the regex matches the model name
4. If a match is found, that patch is applied to the request
5. Only the **first matching pattern** is applied (patterns are processed in order)
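
The lookup described above can be sketched as follows. This is a simplified model rather than Loki's code: `select_patch` is a hypothetical name, the patterns are tried in the order they appear in the configuration, and unanchored matching (`re.search`) is an assumption.

```python
import re


def select_patch(client_patch, api_type, model_name):
    """Return the first patch under `api_type` whose pattern matches, or None."""
    for pattern, patch in client_patch.get(api_type, {}).items():
        if re.search(pattern, model_name):
            return patch  # first match wins; later patterns are ignored
    return None


# Mirrors a `patch:` section from config.yaml (dicts keep insertion order).
client_patch = {
    "chat_completions": {
        "gpt-4o": {"body": {"temperature": 0.7}},
        "gpt-3.5.*": {"body": {"temperature": 0.9, "max_tokens": 2000}},
        ".*": {"body": {"temperature": 0.5}},  # catch-all, so it goes last
    }
}

print(select_patch(client_patch, "chat_completions", "gpt-4o"))
# {'body': {'temperature': 0.7}}
print(select_patch(client_patch, "chat_completions", "gpt-3.5-turbo"))
# {'body': {'temperature': 0.9, 'max_tokens': 2000}}
print(select_patch(client_patch, "chat_completions", "o4-mini"))
# {'body': {'temperature': 0.5}}
print(select_patch(client_patch, "embeddings", "text-embedding-3-small"))
# None
```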
---
## Comparison
| Feature | Model-Specific Patch | Client Configuration Patch |
|-----------------------|-----------------------|-------------------------------------|
| **Scope** | Single model only | Multiple models via regex |
| **Matching** | Exact model name | Regular expression pattern |
| **Application** | Always applied | Only if pattern matches |
| **API Type** | All APIs | Separate for chat/embeddings/rerank |
| **Override**          | By client patches     | Can override model patch            |
| **Use Case** | Model-specific quirks | User preferences & customization |
| **Application Order** | Applied first | Applied second (can override) |
### Patch Application Order
When both patches are present, they're applied in this order:
1. **Model-Specific Patch**
2. **Client Configuration Patch**
This means client configuration patches can override model-specific patches if they modify the same parameters.
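
As a rough illustration of this layering, the sketch below applies a model patch and then a client patch to the same request body. The `merge_patch` helper is hypothetical and assumes JSON Merge Patch style semantics (a `None` value removes the key); the request body and the client patch value are made up for the example.

```python
from copy import deepcopy


def merge_patch(target, patch):
    """Merge `patch` into a copy of `target`; a None value deletes the key."""
    if not isinstance(patch, dict):
        return deepcopy(patch)
    result = deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


body = {"model": "o4-mini-high", "temperature": 0.7, "max_tokens": 1024}

# Layer 1: the model's own patch. Layer 2: the matching client patch.
model_patch = {"reasoning_effort": "high", "max_tokens": None, "temperature": None}
client_patch = {"reasoning_effort": "low"}

for layer in (model_patch, client_patch):
    body = merge_patch(body, layer)

print(body)
# {'model': 'o4-mini-high', 'reasoning_effort': 'low'}
```

Because the client patch is applied last, its `reasoning_effort` replaces the value set by the model patch, while the removals made by the model patch remain in effect.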
## Common Use Cases
### Removing Unsupported Parameters
Some models don't support standard parameters like `temperature` or `max_tokens`:
**Model Patch**:
```yaml
patch:
  body:
    temperature: null
    max_tokens: null
```
### Adding Provider-Specific Parameters
Providers often have unique parameters:
**Client Patch**:
```yaml
patch:
  chat_completions:
    '.*':
      body:
        safetySettings: [...]    # Gemini
        thinking_budget: 10000   # DeepSeek
        response_format:         # OpenAI
          type: json_object
```
### Changing Endpoints
Use custom or regional endpoints:
**Client Patch**:
```yaml
patch:
  chat_completions:
    '.*':
      url: "https://eu-endpoint.example.com/v1/chat"
```
### Setting Default Values
Provide defaults for specific models or model families:
**Client Patch**:
```yaml
patch:
  chat_completions:
    'claude-3-.*':
      body:
        max_tokens: 4096
        temperature: 0.7
```
### Custom Authentication
Add special authentication headers:
**Client Patch**:
```yaml
patch:
  chat_completions:
    '.*':
      headers:
        Authorization: "Bearer {{custom_token}}"
        X-Organization-ID: "org-123"
```
## Environment Variable Patches
You can also apply patches via environment variables for temporary overrides:
```bash
export LLM_PATCH_OPENAI_CHAT_COMPLETIONS='{"gpt-4.*":{"body":{"temperature":0.5}}}'
```
This takes precedence over client configuration patches but not model-specific patches.
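
Hand-writing nested JSON inside shell quotes is easy to get wrong. One option is to build the value from a dictionary and let Python handle the serialization and quoting. This is only a convenience sketch: it reuses the variable name from the example above and assumes the value has the same shape as a `chat_completions` section (pattern, then `body`/`headers`/`url`).

```python
import json
import shlex

# Same patch as the export line above: pattern -> patch parts.
patch = {"gpt-4.*": {"body": {"temperature": 0.5}}}

# Compact separators keep the value on one line with no extra spaces.
value = json.dumps(patch, separators=(",", ":"))

# shlex.quote adds shell-safe single quotes around the JSON.
export_line = f"export LLM_PATCH_OPENAI_CHAT_COMPLETIONS={shlex.quote(value)}"
print(export_line)
# export LLM_PATCH_OPENAI_CHAT_COMPLETIONS='{"gpt-4.*":{"body":{"temperature":0.5}}}'
```

The printed line can be pasted into a shell or evaluated from a script.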
## Tips
1. **Use model patches** for permanent, model-specific requirements
2. **Use client patches** for personal preferences or environment-specific settings
3. **Test regex patterns** carefully
4. **Set to `null`** to remove parameters; omitting a parameter from the patch leaves it unchanged
5. **Check each model provider's docs** for available parameters and their formats
6. **Be specific** with patterns to avoid unintended matches
7. **Remember order matters** - first matching pattern wins for client patches
8. **Patches merge** - both types can be applied, with client patches overriding model patches
## Debugging Patches
To see what request is actually being sent, enable debug logging:
```bash
export RUST_LOG=loki=debug
loki "your prompt here"
```
This will show the final request body after all patches are applied.