Python

The official openai Python SDK works with any OpenAI-compatible endpoint. Point its base_url at your OwnLLM deployment and use it as usual.

Install

pip install openai

Client setup

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],   # https://acme-prod.ownllm.app/v1
    api_key=os.environ["OPENAI_API_KEY"],     # sk-ownllm-...
)

List models

models = client.models.list()
for m in models.data:
    # capabilities is the OwnLLM-extended field; getattr avoids an AttributeError if it is absent
    print(m.id, getattr(m, "capabilities", None))

Non-streaming completion

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
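for chunk in stream:
    if not chunk.choices:  # some servers send chunks with no choices (e.g. a usage-only chunk)
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)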
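
When the full reply is needed as a single string rather than printed as it arrives, the same loop can accumulate the deltas. The helper below is a minimal sketch: collect_stream and fake_chunk are hypothetical names, not part of the SDK or OwnLLM, and the fake chunks only imitate the shape of the objects the loop above reads, so the example runs without a server.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Join the text deltas of a chat-completion stream into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # chunks without text (None or "") are ignored
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for one streamed chunk, for offline testing only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Rayleigh "), fake_chunk(None), fake_chunk("scattering.")]
print(collect_stream(demo))
```

With a real client, pass the object returned by client.chat.completions.create(..., stream=True) in place of demo.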

Tool calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen2.5:32b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)
tool_calls = response.choices[0].message.tool_calls
for call in tool_calls or []:
    print(call.function.name, call.function.arguments)
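
Printing the calls is only the first half of the loop: the arguments arrive as a JSON string, and your code has to run the matching function. The sketch below shows one way to do that. run_tool_calls, REGISTRY, and the get_weather body are hypothetical and exist only for illustration, and fake_call merely imitates the shape of one entry in tool_calls so the example runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    """Hypothetical local implementation; a real one would query a weather service."""
    return f"Sunny in {location}"


# Maps the tool names declared in `tools` to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls, registry=REGISTRY):
    """Run each requested tool and return role="tool" messages for a follow-up request."""
    results = []
    for call in tool_calls or []:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        output = registry[call.function.name](**args)
        results.append({"role": "tool", "tool_call_id": call.id, "content": str(output)})
    return results


# Offline stand-in shaped like one entry of response.choices[0].message.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
print(run_tool_calls([fake_call]))
```

In the usual OpenAI-style flow, you would append the assistant message and these tool messages to messages and call create again so the model can write its final answer.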

If the chosen model doesn't support tools, the request fails with a model_does_not_support_tools error. Catch it and retry with a tool-capable model.
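
A fallback of this kind could be sketched as follows. create_with_fallback is a hypothetical helper, not an SDK or OwnLLM function. It also assumes the error code is available either as the exception's code attribute (which the openai SDK populates from the error body) or somewhere in the error message; check how your deployment actually reports it.

```python
TOOLS_ERROR = "model_does_not_support_tools"


def create_with_fallback(client, models, **kwargs):
    """Try each model in order, moving on only when a model rejects tools."""
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(model=model, **kwargs)
        except Exception as exc:
            # Assumption: the code is exposed as exc.code or appears in the message.
            code = getattr(exc, "code", None)
            if code != TOOLS_ERROR and TOOLS_ERROR not in str(exc):
                raise  # unrelated failure: do not mask it
            last_error = exc
    if last_error is None:
        raise ValueError("models must not be empty")
    raise last_error  # every candidate rejected tools
```

A call might look like create_with_fallback(client, ["llama-3.3-70b", "qwen2.5:32b"], messages=messages, tools=tools). Any other error is re-raised immediately, so the fallback does not hide authentication or network problems.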

Async

The SDK ships an async client too:

import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)

async def main():
    stream = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

asyncio.run(main())
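
The async client is most useful when several requests run at once. The sketch below fans a list of prompts out with asyncio.gather and caps the number in flight with a semaphore. ask, ask_many, and the limit default are hypothetical choices for illustration, not something OwnLLM prescribes.

```python
import asyncio


async def ask(client, model: str, prompt: str) -> str:
    """Send one non-streaming chat request and return the reply text."""
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


async def ask_many(client, model: str, prompts, limit: int = 4):
    """Run the prompts concurrently, at most `limit` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(prompt):
        async with semaphore:
            return await ask(client, model, prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))
```

Inside an async function, answers = await ask_many(client, "llama-3.3-70b", ["Hello", "Why is the sky blue?"]) would return the replies in the same order as the prompts.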

Audit attribution

Pass user= so the request is attributed to a specific person in your audit logs:

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[...],
    user="alice@acme.com",
)
