Request Patching in Loki
Loki provides two mechanisms for modifying API requests sent to LLM providers: Model-Specific Patches and Client Configuration Patches. These allow you to customize request parameters, headers, and URLs to work around provider quirks or add custom behavior.
Quick Links
- Model-Specific Patches
- Client Configuration Patches
- Comparison
- Common Use Cases
- Environment Variable Patches
- Tips
- Debugging Patches
Model-Specific Patches
Overview
Model-specific patches are applied unconditionally to a single model. They are useful for handling model-specific quirks or requirements.
When to Use
- A specific model requires certain parameters to be set or removed
- A model needs different default values than other models from the same provider
- You need to add special configuration for one model only
Structure
models:
  - name: model-name
    type: chat
    # ... other model properties ...
    patch:
      url: "https://custom-endpoint.com"  # Optional: override the API endpoint
      body:                                # Optional: modify request body
        <parameter>: <value>               # Add or modify parameters
        <parameter>: null                  # Remove parameters (set to null)
      headers:                             # Optional: modify request headers
        <header-name>: <value>             # Add or modify headers
        <header-name>: null                # Remove headers (set to null)
Examples
Example 1: Removing Parameters
OpenAI's o-series reasoning models (for example, o4-mini) don't support the temperature, top_p, or max_tokens parameters. This patch removes them from the request:
- name: o4-mini
  type: chat
  max_input_tokens: 200000
  max_output_tokens: 100000
  supports_function_calling: true
  patch:
    body:
      max_tokens: null     # Remove max_tokens from request
      temperature: null    # Remove temperature from request
      top_p: null          # Remove top_p from request
Example 2: Setting Required Parameters
Some models require specific parameters to be set:
- name: o4-mini-high
  type: chat
  patch:
    body:
      reasoning_effort: high   # Always set reasoning_effort to "high"
      max_tokens: null
      temperature: null
Example 3: Custom Endpoint
If a model needs a different API endpoint:
- name: custom-model
  type: chat
  patch:
    url: "https://special-endpoint.example.com/v1/chat"
Example 4: Adding Headers
Add authentication or custom headers:
- name: special-model
  type: chat
  patch:
    headers:
      X-Custom-Header: "special-value"
      X-API-Version: "2024-01"
How It Works
- When you use a model, Loki loads its configuration
- If the model has a patch field, it's always applied to every request
- The patch modifies the request URL, body, or headers before sending to the API
- Parameters set to null are removed from the request
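The steps above can be sketched in a few lines of Python. This is an illustrative model of the behavior described here, not Loki's actual implementation: the function names `merge_body` and `apply_patch`, the request layout, and the recursive merging of nested objects are all assumptions.

```python
import copy

def merge_body(body, patch):
    """Return a copy of `body` with `patch` merged in; None (YAML null) deletes a key."""
    result = copy.deepcopy(body)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the parameter
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_body(result[key], value)  # assumed: nested objects merge
        else:
            result[key] = copy.deepcopy(value)  # add or overwrite
    return result

def apply_patch(request, patch):
    """Apply a patch with optional `url`, `body`, and `headers` keys to a request dict."""
    patched = copy.deepcopy(request)
    if "url" in patch:
        patched["url"] = patch["url"]
    patched["body"] = merge_body(patched.get("body", {}), patch.get("body", {}))
    patched["headers"] = merge_body(patched.get("headers", {}), patch.get("headers", {}))
    return patched

# Hypothetical request as it might look before patching.
request = {
    "url": "https://api.example.com/v1/chat/completions",  # placeholder URL
    "headers": {"Content-Type": "application/json"},
    "body": {"model": "o4-mini", "temperature": 0.7, "max_tokens": 1024, "messages": []},
}
model_patch = {"body": {"temperature": None, "max_tokens": None, "reasoning_effort": "high"}}

patched = apply_patch(request, model_patch)
print(patched["body"])
```

In this sketch the patched body keeps `model` and `messages`, drops `temperature` and `max_tokens`, and gains `reasoning_effort`. The original request is left unmodified.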
Client Configuration Patches
Overview
Client configuration patches allow you to apply customizations to multiple models based on regex pattern matching. They're defined in your config.yaml file and can target specific API types (chat, embeddings, or rerank).
When to Use
- You want to apply the same settings to multiple models from a provider
- You need different configurations for different groups of models
- You want to override the default client model settings
- You need environment-specific customizations
Structure
clients:
  - type: <client>                # e.g., gemini, openai, claude
    # ... client configuration ...
    patch:
      chat_completions:           # For chat models
        '<regex-pattern>':        # Regex to match model names
          url: "..."              # Optional: override endpoint
          body:                   # Optional: modify request body
            <parameter>: <value>
          headers:                # Optional: modify headers
            <header>: <value>
      embeddings:                 # For embedding models
        '<regex-pattern>':
          # ... same structure ...
      rerank:                     # For reranker models
        '<regex-pattern>':
          # ... same structure ...
Pattern Matching
- Patterns are regular expressions that match against the model name
- Use .* to match all models
- Use specific patterns like gpt-4.* to match model families
- Use model1|model2 to match multiple specific models
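Before putting a pattern into config.yaml, it can help to try it against a list of model names. The sketch below uses Python's `re` module and assumes the pattern must match the whole model name. Loki's regex engine and anchoring rules are not stated here and may differ. The helper `matching_models` and the name `gpt-305-test` are hypothetical.

```python
import re

def matching_models(pattern, model_names):
    """Return the names that `pattern` matches in full (full-match anchoring is an assumption)."""
    regex = re.compile(pattern)
    return [name for name in model_names if regex.fullmatch(name)]

# "gpt-305-test" is a made-up name used to expose an unescaped dot.
models = ["gpt-4", "gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo", "gpt-305-test"]

print(matching_models(r"gpt-4.*", models))       # model family
print(matching_models(r"gpt-4o", models))        # one specific model
print(matching_models(r"gpt-3.5.*", models))     # unescaped "." matches any character
print(matching_models(r"gpt-3\.5.*", models))    # escaped "." matches a literal dot only
print(matching_models(r"gpt-4|gpt-4o", models))  # alternation of two names
```

Note that the unescaped dot in `gpt-3.5.*` also matches `gpt-305-test`. Escaping it as `\.` avoids that kind of unintended match.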
Examples
Example 1: Disable Safety Filters for Gemini Models
Apply to all Gemini chat models:
clients:
  - type: gemini
    api_key: "{{GEMINI_API_KEY}}"
    patch:
      chat_completions:
        '.*':   # Matches all Gemini models
          body:
            safetySettings:
              - category: HARM_CATEGORY_HARASSMENT
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_HATE_SPEECH
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_DANGEROUS_CONTENT
                threshold: BLOCK_NONE
Example 2: Apply Settings to Specific Model Family
Only apply to GPT-4 models (not GPT-3.5):
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      chat_completions:
        'gpt-4.*':   # Matches gpt-4, gpt-4-turbo, gpt-4o, etc.
          body:
            frequency_penalty: 0.2
            presence_penalty: 0.1
Example 3: Different Settings for Different Models
Apply different patches based on model name:
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      chat_completions:
        'gpt-4o':      # Specific model
          body:
            temperature: 0.7
        'gpt-3.5.*':   # Model family
          body:
            temperature: 0.9
            max_tokens: 2000
Example 4: Modify Embedding Requests
Apply to embedding models:
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      embeddings:
        'text-embedding-.*':   # All text-embedding models
          body:
            dimensions: 1536
            encoding_format: "float"
Example 5: Custom Headers for Specific Models
Add headers only for certain models:
clients:
  - type: openai-compatible
    api_base: "https://api.example.com/v1"
    patch:
      chat_completions:
        'custom-model-.*':
          headers:
            X-Custom-Auth: "bearer-token"
            X-Model-Version: "latest"
Example 6: Override Endpoint for Specific Models
Use different endpoints for different model groups:
clients:
  - type: openai-compatible
    api_base: "https://default-endpoint.com/v1"
    patch:
      chat_completions:
        'premium-.*':   # Premium models use different endpoint
          url: "https://premium-endpoint.com/v1/chat/completions"
How It Works
- When making a request, Loki checks if the client has a patch configuration
- It looks at the appropriate API type (chat_completions, embeddings, or rerank)
- For each pattern in that section, it checks if the regex matches the model name
- If a match is found, that patch is applied to the request
- Only the first matching pattern is applied (patterns are processed in order)
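The first-match rule described above can be illustrated with a short Python sketch. It is a model of the described behavior, not Loki's code: `select_patch` is a hypothetical helper, and matching against the whole model name is an assumption.

```python
import re

def select_patch(client_patch, api_type, model_name):
    """Return the first patch whose pattern matches `model_name`, or None."""
    patterns = client_patch.get(api_type, {})
    for pattern, patch in patterns.items():    # dicts preserve insertion order
        if re.fullmatch(pattern, model_name):  # full-match anchoring is an assumption
            return patch                       # stop at the first match
    return None

# Mirrors the shape of the `patch` section under a client in config.yaml.
client_patch = {
    "chat_completions": {
        "gpt-4o": {"body": {"temperature": 0.7}},
        "gpt-4.*": {"body": {"temperature": 0.5}},
        ".*": {"body": {"temperature": 0.9}},
    },
    "embeddings": {
        "text-embedding-.*": {"body": {"encoding_format": "float"}},
    },
}

print(select_patch(client_patch, "chat_completions", "gpt-4o"))
print(select_patch(client_patch, "chat_completions", "gpt-4-turbo"))
print(select_patch(client_patch, "embeddings", "gpt-4o"))
```

Because only the first match is used, specific patterns such as `gpt-4o` should be listed before broad ones such as `.*`. Otherwise the broad pattern would shadow them.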
Comparison
| Feature | Model-Specific Patch | Client Configuration Patch |
|---|---|---|
| Scope | Single model only | Multiple models via regex |
| Matching | Exact model name | Regular expression pattern |
| Application | Always applied | Only if pattern matches |
| API Type | All APIs | Separate for chat/embeddings/rerank |
| Override | Can be overridden by a client patch | Can override model patch |
| Use Case | Model-specific quirks | User preferences & customization |
| Application Order | Applied first | Applied second (can override) |
Patch Application Order
When both patches are present, they're applied in this order:
- Model-Specific Patch
- Client Configuration Patch
This means client configuration patches can override model-specific patches if they modify the same parameters.
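A small Python sketch of this ordering, under the assumption that each patch is merged key by key and that a null value removes the key. The `merge` helper and the parameter values are illustrative only.

```python
def merge(base, patch):
    """Merge `patch` into a copy of `base`; a None value removes the key."""
    result = dict(base)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result

body = {"model": "example-model", "temperature": 1.0, "max_tokens": 1024}
model_patch_body = {"temperature": None, "reasoning_effort": "high"}   # applied first
client_patch_body = {"reasoning_effort": "low", "max_tokens": 2048}    # applied second

after_model = merge(body, model_patch_body)
final_body = merge(after_model, client_patch_body)
print(final_body)
```

Here the client patch wins for `reasoning_effort` because it is applied last, while the removal of `temperature` by the model patch still holds because the client patch does not touch that key.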
Common Use Cases
Removing Unsupported Parameters
Some models don't support standard parameters like temperature or max_tokens:
Model Patch:
patch:
  body:
    temperature: null
    max_tokens: null
Adding Provider-Specific Parameters
Providers often have unique parameters:
Client Patch:
patch:
  chat_completions:
    '.*':
      body:
        # Illustrative only: include just the parameters your provider supports
        safetySettings: [...]     # Gemini
        thinking_budget: 10000    # DeepSeek
        response_format:          # OpenAI
          type: json_object
Changing Endpoints
Use custom or regional endpoints:
Client Patch:
patch:
  chat_completions:
    '.*':
      url: "https://eu-endpoint.example.com/v1/chat"
Setting Default Values
Provide defaults for specific models or model families:
Client Patch:
patch:
  chat_completions:
    'claude-3-.*':
      body:
        max_tokens: 4096
        temperature: 0.7
Custom Authentication
Add special authentication headers:
Client Patch:
patch:
  chat_completions:
    '.*':
      headers:
        Authorization: "Bearer {{custom_token}}"
        X-Organization-ID: "org-123"
Environment Variable Patches
You can also apply patches via environment variables for temporary overrides:
export LLM_PATCH_OPENAI_CHAT_COMPLETIONS='{"gpt-4.*":{"body":{"temperature":0.5}}}'
This takes precedence over client configuration patches but not model-specific patches.
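Writing the JSON value by hand is error-prone, so one option is to generate the export line. The Python sketch below reproduces the example above. The naming scheme `LLM_PATCH_<CLIENT>_<API_TYPE>` is inferred from that single example and should be treated as an assumption, and `export_line` is a hypothetical helper.

```python
import json
import shlex

def export_line(client, api_type, patterns):
    """Build a shell `export` line for a patch environment variable."""
    # Assumed naming scheme, inferred from LLM_PATCH_OPENAI_CHAT_COMPLETIONS.
    name = f"LLM_PATCH_{client}_{api_type}".upper()
    value = json.dumps(patterns, separators=(",", ":"))  # compact JSON
    return f"export {name}={shlex.quote(value)}"         # quote safely for the shell

line = export_line(
    "openai",
    "chat_completions",
    {"gpt-4.*": {"body": {"temperature": 0.5}}},
)
print(line)
```

The value uses the same pattern-to-patch mapping as a client patch section in config.yaml, expressed as JSON instead of YAML.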
Tips
- Use model patches for permanent, model-specific requirements
- Use client patches for personal preferences or environment-specific settings
- Test regex patterns carefully
- Set a parameter to null to remove it; omitting it from the patch leaves it unchanged
- Check each model provider's docs for available parameters and their formats
- Be specific with patterns to avoid unintended matches
- Remember order matters - first matching pattern wins for client patches
- Patches merge - both types can be applied, with client patches overriding model patches
Debugging Patches
To see what request is actually being sent, enable debug logging:
export RUST_LOG=loki=debug
loki "your prompt here"
This will show the final request body after all patches are applied.