Connecting to OSS Models

LangDB AI Gateway supports connecting to open-source models through providers like Ollama and vLLM. This allows you to use locally hosted models while maintaining the same OpenAI-compatible API interface.

Configuration

To use Ollama or vLLM, you need to provide a list of models with their endpoints. By default, ai-gateway loads models from ~/.langdb/models.yaml. You can define your models there in the following format:

- model: gpt-oss
  model_provider: ollama
  inference_provider:
    provider: ollama
    model_name: gpt-oss
    endpoint: https://my-ollama-server.localhost
  price:
    per_input_token: 0.0
    per_output_token: 0.0
  input_formats:
    - text
  output_formats:
    - text
  limits:
    max_context_size: 128000
  capabilities: ['tools']
  type: completions
  description: OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.

Configuration Fields

Field                Description                                                 Required
-----                -----------                                                 --------
model                The model identifier used in API requests                   Yes
model_provider       The provider type (e.g., ollama, vllm)                      Yes
inference_provider   Provider-specific configuration                             Yes
price                Token pricing (set to 0.0 for local models)                 Yes
input_formats        Supported input formats                                     Yes
output_formats       Supported output formats                                    Yes
limits               Model limits (context size, etc.)                           Yes
capabilities         Model capabilities array (e.g., ['tools'] for tool calling) Yes
type                 Model type (e.g., completions)                              Yes
description          Human-readable model description                            Yes

Example Usage

Once configured, you can use your OSS models through the standard OpenAI-compatible API:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
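The same request can be issued from Python using only the standard library. This is a minimal sketch, assuming the gateway listens on localhost:8080 as above; the commented-out lines perform the actual network call, which requires a running ai-gateway instance:

```python
import json
import urllib.request

# Assumed local gateway endpoint; adjust host and port to your deployment.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI-style chat completion payload, using the model
# identifier defined in models.yaml.
payload = {
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}

# Build the POST request with a JSON body.
request = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request against a running gateway:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Because the gateway exposes an OpenAI-compatible API, any OpenAI client SDK pointed at this base URL should work the same way.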

Supported Providers

Ollama

  • Provider: ollama
  • Endpoint: URL to your Ollama server
  • Model Name: The model name as configured in Ollama

vLLM

  • Provider: vllm
  • Endpoint: URL to your vLLM server
  • Model Name: The model name as configured in vLLM
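A vLLM entry follows the same schema as the Ollama example above. The following is a hedged sketch only: the model name, endpoint, and context size are placeholder assumptions and should be replaced with the values your vLLM server actually serves:

```yaml
- model: llama-3-8b            # identifier used in API requests (assumed name)
  model_provider: vllm
  inference_provider:
    provider: vllm
    model_name: meta-llama/Meta-Llama-3-8B-Instruct   # as served by vLLM (assumption)
    endpoint: http://localhost:8000                    # assumed local vLLM server
  price:
    per_input_token: 0.0       # local model, no per-token cost
    per_output_token: 0.0
  input_formats:
    - text
  output_formats:
    - text
  limits:
    max_context_size: 8192     # match your served model's context window
  capabilities: []
  type: completions
  description: Llama 3 8B Instruct served locally via vLLM.
```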

Best Practices

  1. Local Development: Use localhost or 127.0.0.1 for local Ollama/vLLM instances
  2. Production: Use proper domain names or IP addresses for remote instances
  3. Security: Ensure your OSS model endpoints are properly secured
  4. Performance: Consider the network latency between ai-gateway and your model servers
  5. Monitoring: Use the observability features to monitor OSS model performance