Python
Use the OpenAI Python SDK with OwnLLM.
The official openai Python SDK works with OpenAI-compatible endpoints
out of the box. Point its base_url at your OwnLLM endpoint and use the SDK as usual.
Install
pip install openai

Client setup
import os
from openai import OpenAI
client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],  # https://acme-prod.ownllm.app/v1
    api_key=os.environ["OPENAI_API_KEY"],  # sk-ownllm-...
)

List models
models = client.models.list()
for m in models.data:
    print(m.id, m.capabilities)  # capabilities is the OwnLLM-extended field

Non-streaming completion
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)

Streaming
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Tool calling
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]
response = client.chat.completions.create(
    model="qwen2.5:32b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)
tool_calls = response.choices[0].message.tool_calls
for call in tool_calls or []:
    # arguments is a JSON-encoded string, not a dict
    print(call.function.name, call.function.arguments)

If the chosen model doesn't support tools, the request fails with a
model_does_not_support_tools error. Catch it and fall back to a
tool-capable model.
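The fallback described above could be sketched roughly as follows. This is a minimal illustration, not OwnLLM's documented behaviour: the helper names `is_tools_unsupported` and `create_with_tool_fallback` are hypothetical, and it assumes the `model_does_not_support_tools` identifier shows up either in the exception's `code` attribute or in its message. The helper takes the `create` callable as an argument so it does not depend on how the client was built.

```python
from typing import Any, Callable, Optional, Sequence

# Error identifier named in the text above.
TOOLS_UNSUPPORTED = "model_does_not_support_tools"


def is_tools_unsupported(exc: Exception) -> bool:
    """Heuristic (assumption): match the identifier in `code` or in the message."""
    return (
        getattr(exc, "code", None) == TOOLS_UNSUPPORTED
        or TOOLS_UNSUPPORTED in str(exc)
    )


def create_with_tool_fallback(
    create: Callable[..., Any],
    models: Sequence[str],
    **kwargs: Any,
) -> Any:
    """Try each model in order; move on only when tools are unsupported."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return create(model=model, **kwargs)
        except Exception as exc:  # with the SDK, consider narrowing to openai.APIStatusError
            if not is_tools_unsupported(exc):
                raise  # unrelated failures are not swallowed
            last_error = exc
    if last_error is None:
        raise ValueError("models must not be empty")
    raise last_error  # every candidate rejected tools
```

With the client from the setup section, this might be called as `create_with_tool_fallback(client.chat.completions.create, ["llama-3.3-70b", "qwen2.5:32b"], messages=messages, tools=tools)`. The order of the list is the order of preference, and any other error is re-raised immediately.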
Async
The SDK ships an async client too:
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)

async def main():
    stream = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())

Audit attribution
Pass user= so the request is attributed to a specific person in your
audit logs:
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[...],
    user="alice@acme.com",
)
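When many calls are made on behalf of the same person, it is easy to forget `user=` on one of them. One possible approach is to bind the identifier once and reuse the resulting callable. The helper name `attributed` below is hypothetical and not part of the SDK or OwnLLM; it relies only on `functools.partial` from the standard library.

```python
from functools import partial
from typing import Any, Callable


def attributed(create: Callable[..., Any], user: str) -> Callable[..., Any]:
    """Return `create` with `user` pre-filled; reject blank identifiers."""
    if not user.strip():
        raise ValueError("user must be a non-empty identifier")
    # A keyword bound by partial can still be overridden by the caller.
    return partial(create, user=user)
```

For example, `create_as_alice = attributed(client.chat.completions.create, "alice@acme.com")` gives a callable that accepts the usual `model=` and `messages=` arguments and sends `user="alice@acme.com"` unless a call passes a different `user=` explicitly.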