loki/docs/clients/PATCHES.md

# Request Patching in Loki
Loki provides two mechanisms for modifying API requests sent to LLM providers: **Model-Specific Patches** and
**Client Configuration Patches**. These allow you to customize request parameters, headers, and URLs to work around
provider quirks or add custom behavior.

## Quick Links
- [Model-Specific Patches](#model-specific-patches)
- [Client Configuration Patches](#client-configuration-patches)
- [Comparison](#comparison)
- [Common Use Cases](#common-use-cases)
- [Environment Variable Patches](#environment-variable-patches)
- [Tips](#tips)
- [Debugging Patches](#debugging-patches)

---

## Model-Specific Patches

### Overview
Model-specific patches are applied **unconditionally** to a single model. They are useful for handling model-specific
quirks or requirements.

### When to Use
- A specific model requires certain parameters to be set or removed
- A model needs different default values than other models from the same provider
- You need to add special configuration for one model only

### Structure

```yaml
models:
  - name: model-name
    type: chat
    # ... other model properties ...
    patch:
      url: "https://custom-endpoint.com"   # Optional: override the API endpoint
      body:                                # Optional: modify request body
        <parameter>: <value>               # Add or modify parameters
        <parameter>: null                  # Remove parameters (set to null)
      headers:                             # Optional: modify request headers
        <header-name>: <value>             # Add or modify headers
        <header-name>: null                # Remove headers (set to null)
```

### Examples

#### Example 1: Removing Parameters
OpenAI's o1 models don't support `temperature`, `top_p`, or `max_tokens` parameters. The `patch` removes them:

```yaml
- name: o4-mini
  type: chat
  max_input_tokens: 200000
  max_output_tokens: 100000
  supports_function_calling: true
  patch:
    body:
      max_tokens: null      # Remove max_tokens from request
      temperature: null     # Remove temperature from request
      top_p: null           # Remove top_p from request
```

#### Example 2: Setting Required Parameters
Some models require specific parameters to be set:

```yaml
- name: o4-mini-high
  type: chat
  patch:
    body:
      reasoning_effort: high  # Always set reasoning_effort to "high"
      max_tokens: null
      temperature: null
```

#### Example 3: Custom Endpoint
If a model needs a different API endpoint:

```yaml
- name: custom-model
  type: chat
  patch:
    url: "https://special-endpoint.example.com/v1/chat"
```

#### Example 4: Adding Headers
Add authentication or custom headers:

```yaml
- name: special-model
  type: chat
  patch:
    headers:
      X-Custom-Header: "special-value"
      X-API-Version: "2024-01"
```

### How It Works
1. When you use a model, Loki loads its configuration
2. If the model has a `patch` field, it's **always applied** to every request
3. The patch modifies the request URL, body, or headers before sending to the API
4. Parameters set to `null` are **removed** from the request

---

## Client Configuration Patches

### Overview
Client configuration patches allow you to apply customizations to **multiple models** based on
**regex pattern matching**. They're defined in your `config.yaml` file and can target specific API types (`chat`,
`embeddings`, or `rerank`).

### When to Use
- You want to apply the same settings to multiple models from a provider
- You need different configurations for different groups of models
- You want to override the default client model settings
- You need environment-specific customizations

### Structure

```yaml
clients:
  - type: <client>                      # e.g., gemini, openai, claude
    # ... client configuration ...
    patch:
      chat_completions:                 # For chat models
        '<regex-pattern>':              # Regex to match model names
          url: "..."                    # Optional: override endpoint
          body:                         # Optional: modify request body
            <parameter>: <value>
          headers:                      # Optional: modify headers
            <header>: <value>
      embeddings:                       # For embedding models
        '<regex-pattern>':
          # ... same structure ...
      rerank:                           # For reranker models
        '<regex-pattern>':
          # ... same structure ...
```

### Pattern Matching
- Patterns are **regular expressions** that match against the model name
- Use `.*` to match all models
- Use specific patterns like `gpt-4.*` to match model families
- Use `model1|model2` to match multiple specific models

### Examples

#### Example 1: Disable Safety Filters for Gemini Models
Apply to all Gemini chat models:

```yaml
clients:
  - type: gemini
    api_key: "{{GEMINI_API_KEY}}"
    patch:
      chat_completions:
        '.*':  # Matches all Gemini models
          body:
            safetySettings:
              - category: HARM_CATEGORY_HARASSMENT
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_HATE_SPEECH
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
                threshold: BLOCK_NONE
              - category: HARM_CATEGORY_DANGEROUS_CONTENT
                threshold: BLOCK_NONE
```

#### Example 2: Apply Settings to Specific Model Family
Only apply to GPT-4 models (not GPT-3.5):

```yaml
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      chat_completions:
        'gpt-4.*':  # Matches gpt-4, gpt-4-turbo, gpt-4o, etc.
          body:
            frequency_penalty: 0.2
            presence_penalty: 0.1
```

#### Example 3: Different Settings for Different Models
Apply different patches based on model name:

```yaml
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      chat_completions:
        'gpt-4o':  # Specific model
          body:
            temperature: 0.7
        'gpt-3.5.*':  # Model family
          body:
            temperature: 0.9
            max_tokens: 2000
```

#### Example 4: Modify Embedding Requests
Apply to embedding models:

```yaml
clients:
  - type: openai
    api_key: "{{OPENAI_API_KEY}}"
    patch:
      embeddings:
        'text-embedding-.*':  # All text-embedding models
          body:
            dimensions: 1536
            encoding_format: "float"
```

#### Example 5: Custom Headers for Specific Models
Add headers only for certain models:

```yaml
clients:
  - type: openai-compatible
    api_base: "https://api.example.com/v1"
    patch:
      chat_completions:
        'custom-model-.*':
          headers:
            X-Custom-Auth: "bearer-token"
            X-Model-Version: "latest"
```

#### Example 6: Override Endpoint for Specific Models
Use different endpoints for different model groups:

```yaml
clients:
  - type: openai-compatible
    api_base: "https://default-endpoint.com/v1"
    patch:
      chat_completions:
        'premium-.*':  # Premium models use different endpoint
          url: "https://premium-endpoint.com/v1/chat/completions"
```

### How It Works
1. When making a request, Loki checks if the client has a `patch` configuration
2. It looks at the appropriate API type (`chat_completions`, `embeddings`, or `rerank`)
3. For each pattern in that section, it checks if the regex matches the model name
4. If a match is found, that patch is applied to the request
5. Only the **first matching pattern** is applied (patterns are processed in order)

---

## Comparison

| Feature               | Model-Specific Patch  | Client Configuration Patch          |
|-----------------------|-----------------------|-------------------------------------|
| **Scope**             | Single model only     | Multiple models via regex           |
| **Matching**          | Exact model name      | Regular expression pattern          |
| **Application**       | Always applied        | Only if pattern matches             |
| **API Type**          | All APIs              | Separate for chat/embeddings/rerank |
| **Override**          | Cannot be overridden  | Can override model patch            |
| **Use Case**          | Model-specific quirks | User preferences & customization    |
| **Application Order** | Applied first         | Applied second (can override)       |

### Patch Application Order
When both patches are present, they're applied in this order:

1. **Model-Specific Patch**
2. **Client Configuration Patch**

This means client configuration patches can override model-specific patches if they modify the same parameters.

## Common Use Cases

### Removing Unsupported Parameters
Some models don't support standard parameters like `temperature` or `max_tokens`:

**Model Patch**:
```yaml
patch:
  body:
    temperature: null
    max_tokens: null
```

### Adding Provider-Specific Parameters
Providers often have unique parameters:

**Client Patch**:
```yaml
patch:
  chat_completions:
    '.*':
      body:
        safetySettings: [...]        # Gemini
        thinking_budget: 10000       # DeepSeek
        response_format:             # OpenAI
          type: json_object
```

### Changing Endpoints
Use custom or regional endpoints:

**Client Patch**:
```yaml
patch:
  chat_completions:
    '.*':
      url: "https://eu-endpoint.example.com/v1/chat"
```

### Setting Default Values
Provide defaults for specific models or model families:

**Client Patch**:
```yaml
patch:
  chat_completions:
    'claude-3-.*':
      body:
        max_tokens: 4096
        temperature: 0.7
```

### Custom Authentication
Add special authentication headers:

**Client Patch**:
```yaml
patch:
  chat_completions:
    '.*':
      headers:
        Authorization: "Bearer {{custom_token}}"
        X-Organization-ID: "org-123"
```

## Environment Variable Patches
You can also apply patches via environment variables for temporary overrides:

```bash
export LLM_PATCH_OPENAI_CHAT_COMPLETIONS='{"gpt-4.*":{"body":{"temperature":0.5}}}'
```

This takes precedence over client configuration patches but not model-specific patches.

## Tips
1. **Use model patches** for permanent, model-specific requirements
2. **Use client patches** for personal preferences or environment-specific settings
3. **Test regex patterns** carefully
4. **Set to `null`** to remove parameters, don't just omit them
5. **Check each model provider's docs** for available parameters and their formats
6. **Be specific** with patterns to avoid unintended matches
7. **Remember order matters** - first matching pattern wins for client patches
8. **Patches merge** - both types can be applied, with client patches overriding model patches

## Debugging Patches
To see what request is actually being sent, enable debug logging:

```bash
export RUST_LOG=loki=debug
loki "your prompt here"
```

This will show the final request body after all patches are applied.