Connecting to OSS Models

LangDB AI Gateway supports connecting to open-source models through providers like Ollama and vLLM. This allows you to use locally hosted models while maintaining the same OpenAI-compatible API interface.

Configuration

To use Ollama or vLLM, you need to provide a list of models with their endpoints. By default, ai-gateway loads models from ~/.langdb/models.yaml. You can define your models there in the following format:

- model: gpt-oss
  model_provider: ollama
  inference_provider:
    provider: ollama
    model_name: gpt-oss
    endpoint: https://my-ollama-server.localhost
  price:
    per_input_token: 0.0
    per_output_token: 0.0
  input_formats:
    - text
  output_formats:
    - text
  limits:
    max_context_size: 128000
  capabilities: ['tools']
  type: completions
  description: OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.

Configuration Fields

Field                Description                                                 Required
-----                -----------                                                 --------
model                The model identifier used in API requests                   Yes
model_provider       The provider type (e.g., ollama, vllm)                      Yes
inference_provider   Provider-specific configuration                             Yes
price                Token pricing (set to 0.0 for local models)                 Yes
input_formats        Supported input formats                                     Yes
output_formats       Supported output formats                                    Yes
limits               Model limits (context size, etc.)                           Yes
capabilities         Model capabilities array (e.g., ['tools'] for tool calling) Yes
type                 Model type (e.g., completions)                              Yes
description          Human-readable model description                            Yes

Example Usage

Once configured, you can use your OSS models through the standard OpenAI-compatible API:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
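The same request can be issued from Python using only the standard library. This is a minimal sketch, assuming the gateway listens on localhost:8080 as above; the commented-out lines perform the actual network call, which requires a running ai-gateway instance:

```python
import json
import urllib.request

# Assumed local gateway endpoint; adjust host and port to your deployment.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI-style chat completion payload, using the model
# identifier defined in models.yaml.
payload = {
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}

# Build the POST request with a JSON body.
request = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request against a running gateway:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Because the gateway exposes an OpenAI-compatible API, any OpenAI client SDK pointed at this base URL should work the same way.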

Supported Providers

Ollama

  • Provider: ollama
  • Endpoint: URL to your Ollama server
  • Model Name: The model name as configured in Ollama

vLLM

  • Provider: vllm
  • Endpoint: URL to your vLLM server
  • Model Name: The model name as configured in vLLM
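A vLLM entry follows the same schema as the Ollama example above. The following is a hedged sketch only: the model name, endpoint, and context size are placeholder assumptions and should be replaced with the values your vLLM server actually serves:

```yaml
- model: llama-3-8b            # identifier used in API requests (assumed name)
  model_provider: vllm
  inference_provider:
    provider: vllm
    model_name: meta-llama/Meta-Llama-3-8B-Instruct   # as served by vLLM (assumption)
    endpoint: http://localhost:8000                    # assumed local vLLM server
  price:
    per_input_token: 0.0       # local model, no per-token cost
    per_output_token: 0.0
  input_formats:
    - text
  output_formats:
    - text
  limits:
    max_context_size: 8192     # match your served model's context window
  capabilities: []
  type: completions
  description: Llama 3 8B Instruct served locally via vLLM.
```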

Best Practices

  1. Local Development: Use localhost or 127.0.0.1 for local Ollama/vLLM instances
  2. Production: Use proper domain names or IP addresses for remote instances
  3. Security: Ensure your OSS model endpoints are properly secured
  4. Performance: Consider the network latency between ai-gateway and your model servers
  5. Monitoring: Use the observability features to monitor OSS model performance