Available on all Portkey plans.
Examples
| Pattern | Use Case |
|---|---|
| Between Providers | Route to different providers; model comes from request |
| Multiple API Keys | Distribute load across rate limits from different accounts |
| Cost Optimization | Send most traffic to cheaper models, reserve premium for a portion |
| Gradual Migration | Test new models with small percentage before full rollout |
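As an illustration of the Gradual Migration pattern, the sketch below builds a config as a Python dictionary and prints it as JSON. The field names `strategy`, `mode`, `targets`, and `override_params`, the value `"loadbalance"`, and the provider slug and model names are assumptions made for this example; check them against the Portkey config reference before use.

```python
import json

# Hypothetical slug and model names; substitute entries from your own Model Catalog.
gradual_migration_config = {
    # Assumed field names and mode value; verify against the config reference.
    "strategy": {"mode": "loadbalance"},
    "targets": [
        # Current model keeps most of the traffic (9 of 10 parts).
        {"override_params": {"model": "@provider-slug/current-model"}, "weight": 9},
        # Candidate model receives a small share (1 of 10 parts).
        {"override_params": {"model": "@provider-slug/candidate-model"}, "weight": 1},
    ],
}

print(json.dumps(gradual_migration_config, indent=2))
```

Raising the candidate's weight in steps shifts more traffic to it without changing the rest of the config.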
The `@provider-slug/model-name` format automatically routes to the correct provider. Set up providers in Model Catalog.
How It Works
- Define targets and weights: Assign a `weight` to each target. Weights represent each target's relative share of traffic.
- Weight normalization: Portkey normalizes weights to sum to 100%. Example: weights 5, 3, and 1 become roughly 55.6%, 33.3%, and 11.1%.
- Request distribution: Each request routes to a target based on the normalized probabilities.
- Default `weight`: `1`
- Minimum `weight`: `0` (stops traffic to the target without removing it from the config)
- Unset weights default to `1`
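The normalization and distribution steps can be sketched in a few lines of Python. This is an illustrative model of weighted random selection, not Portkey's implementation; the function names and target names are hypothetical.

```python
import random
from collections import Counter


def normalize_weights(weights):
    """Scale non-negative weights so that they sum to 1.0."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one target needs a positive weight")
    return {name: w / total for name, w in weights.items()}


def pick_target(weights, rng):
    """Pick one target with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical targets; target-d has weight 0 and therefore receives no traffic.
weights = {"target-a": 5, "target-b": 3, "target-c": 1, "target-d": 0}

shares = normalize_weights(weights)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Simulate 10,000 requests with a seeded generator for repeatability.
rng = random.Random(42)
counts = Counter(pick_target(weights, rng) for _ in range(10_000))
print(counts)
```

The printed shares are 55.6%, 33.3%, 11.1%, and 0.0%, and the simulated counts land close to those proportions. Because selection is probabilistic, small samples can deviate noticeably from the configured split.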
Considerations
- Ensure LLMs in your list are compatible with your use case
- Monitor usage per LLM—weight distribution affects spend
- Each LLM has different latency and pricing
Sticky Load Balancing
Sticky load balancing ensures that requests with the same identifier are consistently routed to the same target. This is useful for:
- Maintaining conversation context across multiple requests
- Ensuring consistent model behavior for A/B testing
- Session-based routing for user-specific experiences
Configuration
Add `sticky_session` to your load balancing strategy:
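A minimal sketch of such a config, written as a Python dictionary and printed as JSON. The `sticky_session`, `hash_fields`, `ttl`, and `weight` names come from this page; the surrounding `strategy`/`mode`/`targets`/`provider` layout and the provider slugs are assumptions for illustration and should be checked against the Portkey config reference.

```python
import json

sticky_config = {
    "strategy": {
        "mode": "loadbalance",  # assumed mode value
        "sticky_session": {
            # Requests sharing the same metadata.user_id map to the same target.
            "hash_fields": ["metadata.user_id"],
            # Assignment lifetime in seconds (3600 is the documented default).
            "ttl": 3600,
        },
    },
    "targets": [
        # Hypothetical provider slugs with equal weights.
        {"provider": "@provider-a", "weight": 1},
        {"provider": "@provider-b", "weight": 1},
    ],
}

print(json.dumps(sticky_config, indent=2))
```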
Parameters
| Parameter | Type | Description |
|---|---|---|
| `hash_fields` | array | Fields to use for generating the sticky session identifier. Supports dot notation for nested fields (e.g., `metadata.user_id`, `metadata.session_id`) |
| `ttl` | number | Time-to-live in seconds for the sticky session. After this period, a new target may be selected. Default: `3600` (1 hour) |
How It Works
- Identifier Generation: When a request arrives, Portkey generates a hash from the specified `hash_fields` values
- Target Lookup: The hash is used to look up the previously assigned target from cache
- Consistent Routing: If a cached assignment exists and hasn’t expired, the request goes to the same target
- New Assignment: If no cached assignment exists, or the cached one has expired, a new target is selected based on weights and cached for future requests
Sticky sessions use a two-tier cache system (in-memory + Redis) for fast lookups and persistence across gateway instances in distributed deployments.
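The lookup flow described in this section can be modeled with a small in-process simulation. This is a sketch under stated assumptions, not Portkey's implementation: the `StickyRouter` class and helper functions are hypothetical, a plain dictionary stands in for the in-memory and Redis tiers, SHA-256 is an arbitrary choice of hash, and the TTL is assumed not to be refreshed on cache hits.

```python
import hashlib
import json
import random
import time


def get_field(request, dotted_path):
    """Resolve a dot-notation path such as 'metadata.user_id' in a nested dict."""
    value = request
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def session_key(request, hash_fields):
    """Hash the selected field values into a stable session identifier."""
    values = [get_field(request, field) for field in hash_fields]
    payload = json.dumps(values, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StickyRouter:
    """Hypothetical router: weighted selection plus a TTL cache of assignments."""

    def __init__(self, weights, hash_fields, ttl=3600, clock=time.monotonic, rng=None):
        self.weights = weights
        self.hash_fields = hash_fields
        self.ttl = ttl
        self.clock = clock
        self.rng = rng or random.Random()
        self._cache = {}  # session key -> (target, expiry time)

    def route(self, request):
        key = session_key(request, self.hash_fields)
        now = self.clock()
        cached = self._cache.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]  # consistent routing: reuse the live assignment
        # New assignment: weighted pick, cached until now + ttl.
        names = list(self.weights)
        target = self.rng.choices(names, weights=[self.weights[n] for n in names], k=1)[0]
        self._cache[key] = (target, now + self.ttl)
        return target


router = StickyRouter({"target-a": 1, "target-b": 1}, ["metadata.user_id"])
request = {"metadata": {"user_id": "user-123"}}
first = router.route(request)
print(all(router.route(request) == first for _ in range(100)))
```

Requests that carry a different `metadata.user_id` hash to a different key and receive their own weighted assignment.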
Caveats and Considerations
While the Load Balancing feature offers numerous benefits, there are a few things to consider:
- Ensure the LLMs in your list are compatible with your use case. Not all LLMs offer the same capabilities or respond in the same format.
- Be aware of your usage with each LLM. Depending on your weight distribution, your usage with each LLM could vary significantly.
- Keep in mind that each LLM has its own latency and pricing. Diversifying your traffic could have implications on the cost and response time.
- Sticky sessions require Redis for persistence across gateway instances. Without Redis, sticky sessions will only work within a single gateway instance’s memory.

