Request Patching in Loki

Loki provides two mechanisms for modifying API requests sent to LLM providers: Model-Specific Patches and Client Configuration Patches. These allow you to customize request parameters, headers, and URLs to work around provider quirks or add custom behavior.


Model-Specific Patches

Overview

Model-specific patches are applied unconditionally to a single model. They are useful for handling model-specific quirks or requirements.

When to Use

  • A specific model requires certain parameters to be set or removed
  • A model needs different default values than other models from the same provider
  • You need to add special configuration for one model only

Structure

models:
  - name: model-name
    type: chat
    # ... other model properties ...
    patch:
      url: "https://custom-endpoint.com"   # Optional: override the API endpoint
      body:                                # Optional: modify request body
        <parameter>: <value>               # Add or modify parameters
        <parameter>: null                  # Remove parameters (set to null)
      headers:                             # Optional: modify request headers
        <header-name>: <value>             # Add or modify headers
        <header-name>: null                # Remove headers (set to null)

Examples

Example 1: Removing Parameters

OpenAI's reasoning models, such as o4-mini, don't support the temperature, top_p, or max_tokens parameters. This patch removes them from every request:

- name: o4-mini
  type: chat
  max_input_tokens: 200000
  max_output_tokens: 100000
  supports_function_calling: true
  patch:
    body:
      max_tokens: null      # Remove max_tokens from request
      temperature: null     # Remove temperature from request
      top_p: null           # Remove top_p from request

Example 2: Setting Required Parameters

Some models require specific parameters to be set:

- name: o4-mini-high
  type: chat
  patch:
    body:
      reasoning_effort: high  # Always set reasoning_effort to "high"
      max_tokens: null
      temperature: null

Example 3: Custom Endpoint

If a model needs a different API endpoint:

- name: custom-model
  type: chat
  patch:
    url: "https://special-endpoint.example.com/v1/chat"

Example 4: Adding Headers

Add authentication or custom headers:

- name: special-model
  type: chat
  patch:
    headers:
      X-Custom-Header: "special-value"
      X-API-Version: "2024-01"

How It Works

  1. When you use a model, Loki loads its configuration
  2. If the model has a patch field, the patch is applied to every request for that model
  3. The patch modifies the request URL, body, or headers before sending to the API
  4. Parameters set to null are removed from the request
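
The steps above can be modelled in a few lines of Python. This is a minimal sketch of the described behaviour, not Loki's actual implementation: the `apply_patch` helper and the request layout are hypothetical, and the merge shown is shallow (whether Loki merges nested objects recursively is not stated here).

```python
import copy

def apply_patch(request, patch):
    """Return a patched copy of request (a dict with url, headers, body).

    Hypothetical helper that mirrors the documented rules only.
    """
    patched = copy.deepcopy(request)
    if "url" in patch:
        patched["url"] = patch["url"]              # url overrides the endpoint
    for section in ("body", "headers"):
        for key, value in patch.get(section, {}).items():
            if value is None:
                patched[section].pop(key, None)    # null removes the entry
            else:
                patched[section][key] = value      # anything else adds or overwrites
    return patched

request = {
    "url": "https://api.example.com/v1/chat/completions",
    "headers": {"Content-Type": "application/json"},
    "body": {"model": "o4-mini", "temperature": 0.7, "max_tokens": 1024},
}
patch = {"body": {"max_tokens": None, "temperature": None, "reasoning_effort": "high"}}

patched = apply_patch(request, patch)
print(patched["body"])  # {'model': 'o4-mini', 'reasoning_effort': 'high'}
```

Keys that the patch does not mention (here `model`, the URL, and the headers) pass through unchanged.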

Client Configuration Patches

Overview

Client configuration patches let you apply customizations to multiple models by matching model names against regex patterns. They're defined in your config.yaml file, in a separate section for each API type: chat_completions, embeddings, or rerank.

When to Use

  • You want to apply the same settings to multiple models from a provider
  • You need different configurations for different groups of models
  • You want to override the default client model settings
  • You need environment-specific customizations

Structure

clients:
  - type: <client>                      # e.g., gemini, openai, claude
    # ... client configuration ...
    patch:
      chat_completions:                 # For chat models
        '<regex-pattern>':              # Regex to match model names
          url: "..."                    # Optional: override endpoint
          body:                         # Optional: modify request body
            <parameter>: <value>
          headers:                      # Optional: modify headers
            <header>: <value>
      embeddings:                       # For embedding models
        '<regex-pattern>':
          # ... same structure ...
      rerank:                           # For reranker models
        '<regex-pattern>':
          # ... same structure ...

Pattern Matching

  • Patterns are regular expressions that match against the model name
  • Use .* to match all models
  • Use specific patterns like gpt-4.* to match model families
  • Use model1|model2 to match multiple specific models
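
Before putting a pattern in config.yaml, it can help to try it against a list of model names. The sketch below uses Python's `re` module and assumes a pattern must match the whole model name (`re.fullmatch`). This document does not say whether Loki anchors patterns, and Loki's regex engine may differ from Python's in less common syntax. The `matching` helper and the model list are illustrative only.

```python
import re

model_names = ["gpt-4", "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo", "o4-mini"]

def matching(pattern, names):
    # Assumption: the pattern has to cover the entire model name.
    return [name for name in names if re.fullmatch(pattern, name)]

print(matching(r".*", model_names))              # every model
print(matching(r"gpt-4.*", model_names))         # ['gpt-4', 'gpt-4o', 'gpt-4-turbo']
print(matching(r"gpt-4o|o4-mini", model_names))  # ['gpt-4o', 'o4-mini']

# An unescaped dot matches any character, so "3.5" also matches "345".
print(matching(r"gpt-3.5.*", ["gpt-3.5-turbo", "gpt-345"]))   # both names
print(matching(r"gpt-3\.5.*", ["gpt-3.5-turbo", "gpt-345"]))  # ['gpt-3.5-turbo']
```

Escaping literal dots (`gpt-3\.5.*`) avoids matching names the pattern was not meant for.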

Examples

Example 1: Disable Safety Filters for Gemini Models

Apply to all Gemini chat models:

clients:
  - type: gemini
    api_key: "{{GEMINI_API_KEY}}"
    patch:
      chat_completions:
        '.*':  # Matches all Gemini models
          body:
            safetySettings:
              - category: HARM_CATEGORY_HARASSMENT
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_HATE_SPEECH
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_DANGEROUS_CONTENT
                threshold: BLOCK_NONE

Example 2: Apply Settings to Specific Model Family

Only apply to GPT-4 models (not GPT-3.5):

clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      chat_completions:
        'gpt-4.*':  # Matches gpt-4, gpt-4-turbo, gpt-4o, etc.
          body:
            frequency_penalty: 0.2
            presence_penalty: 0.1

Example 3: Different Settings for Different Models

Apply different patches based on model name:

clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      chat_completions:
        'gpt-4o':  # Specific model
          body:
            temperature: 0.7
        'gpt-3.5.*':  # Model family
          body:
            temperature: 0.9
            max_tokens: 2000

Example 4: Modify Embedding Requests

Apply to embedding models:

clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      embeddings:
        'text-embedding-.*':  # All text-embedding models
          body:
            dimensions: 1536
            encoding_format: "float"

Example 5: Custom Headers for Specific Models

Add headers only for certain models:

clients:
  - type: openai-compatible
    api_base: "https://api.example.com/v1"
    patch:
      chat_completions:
        'custom-model-.*':
          headers:
            X-Custom-Auth: "bearer-token"
            X-Model-Version: "latest"

Example 6: Override Endpoint for Specific Models

Use different endpoints for different model groups:

clients:
  - type: openai-compatible
    api_base: "https://default-endpoint.com/v1"
    patch:
      chat_completions:
        'premium-.*':  # Premium models use different endpoint
          url: "https://premium-endpoint.com/v1/chat/completions"

How It Works

  1. When making a request, Loki checks if the client has a patch configuration
  2. It looks at the appropriate API type (chat_completions, embeddings, or rerank)
  3. For each pattern in that section, it checks if the regex matches the model name
  4. If a match is found, that patch is applied to the request
  5. Only the first matching pattern is applied (patterns are processed in order)
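
The first-match rule is easiest to see with two overlapping patterns. The following is a rough model of the lookup described above, not Loki's code: `select_patch` is a hypothetical helper, patterns are assumed to be checked in the order they are written, and whole-name matching is an assumption.

```python
import re

def select_patch(patch_map, model_name):
    """Return the patch of the first pattern that matches, or None."""
    for pattern, patch in patch_map.items():    # dicts keep insertion order
        if re.fullmatch(pattern, model_name):   # whole-name match is assumed
            return patch
    return None

specific_first = {
    "gpt-4o": {"body": {"temperature": 0.7}},
    "gpt-.*": {"body": {"temperature": 0.9}},
}
broad_first = {
    "gpt-.*": {"body": {"temperature": 0.9}},
    "gpt-4o": {"body": {"temperature": 0.7}},
}

print(select_patch(specific_first, "gpt-4o"))         # {'body': {'temperature': 0.7}}
print(select_patch(specific_first, "gpt-3.5-turbo"))  # {'body': {'temperature': 0.9}}
print(select_patch(broad_first, "gpt-4o"))            # {'body': {'temperature': 0.9}}
print(select_patch(specific_first, "claude-3-opus"))  # None
```

With the broad pattern listed first, the `gpt-4o` entry is never reached, so specific patterns belong above general ones.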

Comparison

| Feature | Model-Specific Patch | Client Configuration Patch |
|---|---|---|
| Scope | Single model only | Multiple models via regex |
| Matching | Exact model name | Regular expression pattern |
| Application | Always applied | Only if pattern matches |
| API Type | All APIs | Separate for chat/embeddings/rerank |
| Override | Can be overridden by a client patch | Can override the model patch |
| Use Case | Model-specific quirks | User preferences & customization |
| Application Order | Applied first | Applied second (can override) |

Patch Application Order

When both patches are present, they're applied in this order:

  1. Model-Specific Patch
  2. Client Configuration Patch

This means client configuration patches can override model-specific patches if they modify the same parameters.
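
As a worked illustration of that ordering, the sketch below applies a model patch and then a client patch to the same request body. The `merge_body` helper, the model name, and the values are hypothetical; only the order and the null-removes rule come from this document.

```python
def merge_body(body, patch_body):
    """Shallow merge: None removes a key, any other value sets it."""
    result = dict(body)
    for key, value in patch_body.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result

body = {"model": "example-model", "temperature": 1.0, "max_tokens": 512}
model_patch = {"temperature": 0.2, "max_tokens": None}   # applied first
client_patch = {"temperature": 0.7}                      # applied second

after_model = merge_body(body, model_patch)
final = merge_body(after_model, client_patch)

print(after_model)  # {'model': 'example-model', 'temperature': 0.2}
print(final)        # {'model': 'example-model', 'temperature': 0.7}
```

The client patch wins on `temperature` because it runs last, while the removal of `max_tokens` from the model patch still stands because the client patch does not touch that key.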

Common Use Cases

Removing Unsupported Parameters

Some models don't support standard parameters like temperature or max_tokens:

Model Patch:

patch:
  body:
    temperature: null
    max_tokens: null

Adding Provider-Specific Parameters

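Providers often have unique parameters. The fragment below lists examples from several providers side by side; a real patch would include only the parameters its own provider accepts: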

Client Patch:

patch:
  chat_completions:
    '.*':
      body:
        safetySettings: [...]        # Gemini
        thinking_budget: 10000       # DeepSeek
        response_format:             # OpenAI
          type: json_object

Changing Endpoints

Use custom or regional endpoints:

Client Patch:

patch:
  chat_completions:
    '.*':
      url: "https://eu-endpoint.example.com/v1/chat"

Setting Default Values

Provide defaults for specific models or model families:

Client Patch:

patch:
  chat_completions:
    'claude-3-.*':
      body:
        max_tokens: 4096
        temperature: 0.7

Custom Authentication

Add special authentication headers:

Client Patch:

patch:
  chat_completions:
    '.*':
      headers:
        Authorization: "Bearer {{custom_token}}"
        X-Organization-ID: "org-123"

Environment Variable Patches

You can also apply patches via environment variables for temporary overrides:

export LLM_PATCH_OPENAI_CHAT_COMPLETIONS='{"gpt-4.*":{"body":{"temperature":0.5}}}'

A patch set this way takes precedence over client configuration patches, but not over model-specific patches.
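
Because the value is JSON embedded in a shell string, quoting mistakes are easy to make. One way to sanity-check a value before exporting it is sketched below. It assumes the JSON uses the same url, body, and headers keys as the YAML form, which the example above suggests but this document does not spell out. `check_patch_value` is a hypothetical helper and not part of Loki.

```python
import json

ALLOWED_KEYS = {"url", "body", "headers"}  # assumed to mirror the YAML structure

def check_patch_value(value):
    """Parse an env-var patch string and return {pattern: unexpected keys}."""
    patch_map = json.loads(value)  # raises json.JSONDecodeError on malformed JSON
    return {
        pattern: sorted(set(patch) - ALLOWED_KEYS)
        for pattern, patch in patch_map.items()
    }

# The same string that goes inside the single quotes of the export line.
value = '{"gpt-4.*":{"body":{"temperature":0.5}}}'
print(check_patch_value(value))  # {'gpt-4.*': []}
```

An empty list for every pattern means that no unexpected keys were found.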

Tips

  1. Use model patches for permanent, model-specific requirements
  2. Use client patches for personal preferences or environment-specific settings
  3. Test regex patterns carefully
  4. Set a parameter to null to remove it; simply omitting it from the patch leaves it in the request
  5. Check each model provider's docs for available parameters and their formats
  6. Be specific with patterns to avoid unintended matches
  7. Remember that order matters: for client patches, the first matching pattern wins
  8. Patches merge: both types can be applied to the same request, with client patches overriding model patches

Debugging Patches

To see what request is actually being sent, enable debug logging:

export RUST_LOG=loki=debug
loki "your prompt here"

This will show the final request body after all patches are applied.