Load Balancing
Available on all Portkey plans.
Distribute traffic across multiple LLMs to prevent any single provider from becoming a bottleneck.

Examples

{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    { "provider": "@openai-prod", "weight": 0.7 },
    { "provider": "@azure-prod", "weight": 0.3 }
  ]
}
Common patterns:

  • Between Providers: Route to different providers; the model comes from the request.
  • Multiple API Keys: Distribute load across rate limits from different accounts.
  • Cost Optimization: Send most traffic to cheaper models; reserve premium models for a portion.
  • Gradual Migration: Test new models with a small percentage of traffic before full rollout.
The @provider-slug/model-name format automatically routes to the correct provider. Set up providers in Model Catalog.
Create and use configs in your requests.
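
For example, the config above can be attached inline when creating a client. This is a minimal sketch assuming the portkey_ai Python SDK; the API key and provider slugs are placeholders, and if your SDK version does not accept an inline config dict, pass a saved config ID instead.

from portkey_ai import Portkey

# Placeholder API key; the config dict mirrors the loadbalance example above
client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {"provider": "@openai-prod", "weight": 0.7},
            {"provider": "@azure-prod", "weight": 0.3},
        ],
    },
)

# The model comes from the request; roughly 70% of calls route to @openai-prod
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)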

How It Works

  1. Define targets & weights — Assign a weight to each target. Weights represent each target's relative share of traffic.
  2. Weight normalization — Portkey normalizes weights to sum to 100%. Example: weights 5, 3, 1 become roughly 55.6%, 33.3%, and 11.1%.
  3. Request distribution — Each request routes to a target based on the normalized probabilities (see the sketch after this list).
  • Default weight: 1 (applied when a target's weight is unset)
  • Minimum weight: 0 (stops traffic to a target without removing it from the config)
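
The arithmetic is straightforward. The sketch below is illustrative only, not Portkey's internal code, and uses placeholder provider slugs:

import random

# Raw weights as defined in the config; names are placeholders
targets = {"@openai-prod": 5, "@azure-prod": 3, "@bedrock-prod": 1}

# Normalization: each weight divided by the total (here, 9)
total = sum(targets.values())
normalized = {name: w / total for name, w in targets.items()}
print(normalized)  # ~{'@openai-prod': 0.556, '@azure-prod': 0.333, '@bedrock-prod': 0.111}

# Each request samples a target according to those probabilities
choice = random.choices(list(targets), weights=list(targets.values()), k=1)[0]
print(choice)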

Sticky Load Balancing

Sticky load balancing ensures that requests with the same identifier are consistently routed to the same target. This is useful for:
  • Maintaining conversation context across multiple requests
  • Ensuring consistent model behavior for A/B testing
  • Session-based routing for user-specific experiences

Configuration

Add sticky_session to your load balancing strategy:
{
  "strategy": {
    "mode": "loadbalance",
    "sticky_session": {
      "hash_fields": ["metadata.user_id"],
      "ttl": 3600
    }
  },
  "targets": [
    {
      "provider": "@openai-virtual-key",
      "weight": 0.5
    },
    {
      "provider": "@anthropic-virtual-key", 
      "weight": 0.5
    }
  ]
}
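
With this config, any request carrying the same metadata.user_id keeps landing on the same target until the TTL expires. Here is a minimal sketch over raw HTTP, assuming the hosted gateway endpoint and the x-portkey-* headers; the API key and the saved config ID ("lb-sticky-config") are placeholders:

import json
import requests

resp = requests.post(
    "https://api.portkey.ai/v1/chat/completions",
    headers={
        "x-portkey-api-key": "PORTKEY_API_KEY",   # placeholder
        "x-portkey-config": "lb-sticky-config",   # saved config with sticky_session
        # metadata.user_id feeds hash_fields, so this user's requests
        # stay pinned to one target until the TTL expires
        "x-portkey-metadata": json.dumps({"user_id": "user-123"}),
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Continue our conversation"}],
    },
)
print(resp.json())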

Parameters

  • hash_fields (array): Fields used to generate the sticky session identifier. Supports dot notation for nested fields (e.g., metadata.user_id, metadata.session_id).
  • ttl (number): Time-to-live in seconds for the sticky session. After this period, a new target may be selected. Default: 3600 (1 hour).

How It Works

  1. Identifier Generation: When a request arrives, Portkey generates a hash from the specified hash_fields values
  2. Target Lookup: The hash is used to look up the previously assigned target from cache
  3. Consistent Routing: If a cached assignment exists and hasn’t expired, the request goes to the same target
  4. New Assignment: If no cached assignment exists, a new target is selected based on weights and cached for future requests (see the sketch below)
Sticky sessions use a two-tier cache system (in-memory + Redis) for fast lookups and persistence across gateway instances in distributed deployments.
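
The routing logic can be pictured roughly as follows. This is illustrative pseudologic, not Portkey's implementation; a plain dict stands in for the two-tier in-memory/Redis cache:

import hashlib
import random
import time

cache = {}  # session hash -> (target, expires_at)

def pick_sticky_target(metadata, targets, hash_fields, ttl):
    # 1. Identifier generation: hash the configured field values
    raw = "|".join(str(metadata.get(field.split(".")[-1], "")) for field in hash_fields)
    key = hashlib.sha256(raw.encode()).hexdigest()

    # 2-3. Target lookup: reuse the cached assignment if it hasn't expired
    cached = cache.get(key)
    if cached and cached[1] > time.time():
        return cached[0]

    # 4. New assignment: weighted selection, cached for future requests
    names = [t["provider"] for t in targets]
    weights = [t.get("weight", 1) for t in targets]
    target = random.choices(names, weights=weights, k=1)[0]
    cache[key] = (target, time.time() + ttl)
    return target

targets = [
    {"provider": "@openai-virtual-key", "weight": 0.5},
    {"provider": "@anthropic-virtual-key", "weight": 0.5},
]
print(pick_sticky_target({"user_id": "user-123"}, targets, ["metadata.user_id"], 3600))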

Caveats and Considerations

While the Load Balancing feature offers numerous benefits, there are a few things to consider:
  1. Ensure the LLMs in your list are compatible with your use case. Not all LLMs offer the same capabilities or respond in the same format.
  2. Monitor your usage of each LLM. Depending on your weight distribution, spend per provider can vary significantly.
  3. Each LLM has its own latency and pricing, so diversifying traffic affects both cost and response time.
  4. Sticky sessions require Redis for persistence across gateway instances. Without Redis, sticky sessions will only work within a single gateway instance’s memory.