Load Balancer Routing
Distribute requests across multiple models to ensure high availability, balance load, and optimize performance. Use real-time metrics to select the best available model.
Use Case
- High availability requirements
- Load distribution across models
- Performance optimization
- Failover scenarios
Configuration
{
  "model": "router/dynamic",
  "router": {
    "type": "conditional",
    "routes": [
      {
        "name": "Balanced",
        "targets": {
          "$any": [
            "openai/gpt-4.1-nano",
            "gemini/gemini-2.0-flash",
            "bedrock/llama3-2-3b-instruct-v1.0"
          ],
          "sort_by": "requests",
          "sort_order": "min"
        }
      }
    ]
  }
}
How It Works
- Model Pool: Defines three candidate models for load distribution (GPT-4.1-nano, Gemini 2.0 Flash, Llama 3.2 3B)
- Load Balancing: The router sorts the pool by the requests metric; because sort_order is min, each request goes to the model currently handling the fewest requests
- Automatic Distribution: As load shifts, incoming requests are routed to whichever model is least busy, spreading traffic evenly across the pool
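The selection logic above can be sketched in a few lines of Python. This is an illustrative model of min-by-requests routing, not the router's actual implementation; the pool list, the in_flight counter, and the dispatch/complete helpers are assumptions made for the sketch:

```python
from collections import defaultdict

# Model pool mirroring the configuration above
POOL = [
    "openai/gpt-4.1-nano",
    "gemini/gemini-2.0-flash",
    "bedrock/llama3-2-3b-instruct-v1.0",
]

# In-flight request count per model -- a stand-in for the "requests" metric
in_flight = defaultdict(int)

def pick_model(pool):
    # sort_by "requests" with sort_order "min":
    # choose the model with the fewest in-flight requests
    return min(pool, key=lambda m: in_flight[m])

def dispatch(pool):
    # Route a new request to the least-loaded model
    model = pick_model(pool)
    in_flight[model] += 1
    return model

def complete(model):
    # Release capacity when a request finishes
    in_flight[model] -= 1
```

With an idle pool, three consecutive dispatches land on three different models; once a model completes its request, it becomes eligible again ahead of the still-busy ones.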
Variables Used
requests: Number of requests a model is currently handling; used as the sort key for load balancing
Customization
- Adjust health thresholds
- Add more models to the pool
- Use different sorting strategies (ttft, price, etc.)
- Implement weighted load balancing
- Add geographic considerations
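For instance, switching the sort strategy is a one-line change to sort_by. A hedged sketch of a latency-oriented variant, assuming ttft (time to first token) is exposed as a sortable metric where lower values are better:

```json
{
  "name": "LowLatency",
  "targets": {
    "$any": [
      "openai/gpt-4.1-nano",
      "gemini/gemini-2.0-flash"
    ],
    "sort_by": "ttft",
    "sort_order": "min"
  }
}
```

The same pattern applies to a price-based strategy: sort the pool by cost and keep sort_order as min to prefer the cheapest available model.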