POST /v1/chat/completions
Create a chat completion. Streaming SSE supported. OpenAI-compatible.
POST /v1/chat/completions
Authorization: Bearer sk-ownllm-...
Content-Type: application/json
OpenAI-compatible chat completions. Both streaming (stream: true, SSE) and
non-streaming responses are supported.
Request body
{
  "model": "llama-3.3-70b",
  "messages": [
    { "role": "system", "content": "You are concise." },
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 512,
  "top_p": 0.95,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "user": "usr_xxx"
}
Required
model: a model id from /v1/models.
messages: array of { role, content } (or { role, tool_calls } for assistant turns; { role: "tool", tool_call_id, content } for tool replies).
Optional
stream: boolean, default false. When true, the response is text/event-stream with one delta per event.
temperature, top_p, max_tokens, presence_penalty, frequency_penalty, seed: standard OpenAI parameters, forwarded as Ollama options.
tools, tool_choice: accepted only if the model has the tools capability. See Tool calling.
response_format: { type: "json_object" } and { type: "json_schema", json_schema: { ... } } are supported on compatible models.
user: string identifying the end user. Recorded in audit logs. It is not enforced, but supplying it makes audits much more useful.
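As a rough illustration of how the required and optional fields fit together, the sketch below builds (but does not send) a request using only the Python standard library. The base URL `https://ownllm.example.com` and the helper name `build_chat_request` are assumptions made for this example, not part of the documented API; substitute your own deployment address and key.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your own OwnLLM deployment.
BASE_URL = "https://ownllm.example.com"


def build_chat_request(api_key, model, messages, **optional):
    """Build (but do not send) a POST /v1/chat/completions request."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    # model and messages are required; everything else is optional.
    payload = {"model": model, "messages": messages}
    # Drop optional fields left as None so server-side defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "sk-ownllm-...",  # placeholder key, as in the header example
    "llama-3.3-70b",
    [{"role": "user", "content": "Why is the sky blue?"}],
    temperature=0.7,
    max_tokens=512,
    user="usr_xxx",
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Because stream is omitted here, the server default (false) applies and the reply would be a single JSON document rather than an event stream.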
Streaming response
data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}], ...}
data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}], ...}
data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{"content":" sky"},"finish_reason":null}], ...}
...
data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}], ...}
data: [DONE]
The terminating data: [DONE] line matches OpenAI's format exactly, so the
OpenAI SDK and other compatible clients work without changes.
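For clients that do not use the OpenAI SDK, the event stream above can be reduced to plain text by concatenating each delta.content until data: [DONE] arrives. The following is a minimal sketch under the assumption that the lines have already been decoded to strings and that the request asked for a single choice; the function name `collect_stream_text` is hypothetical.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from an iterable of SSE lines."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # The first delta carries only the role; the last one is empty.
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" sky"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text, reason = collect_stream_text(sample)
```

With the sample events, `text` is "The sky" and `reason` is "stop". A real client would feed the function the lines of the HTTP response body instead of a list.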
Non-streaming response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1714500000,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The sky is blue because ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 110,
    "total_tokens": 130
  }
}
Tool calling
If the model has capabilities.tools = true, you can pass tools
and tool_choice. When the model decides to call a tool, the response includes tool_calls:
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Paris\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
If the model does not have capabilities.tools = true and the request
includes tools or tool_choice, OwnLLM returns the error
model_does_not_support_tools.
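Once a response contains tool_calls, the client is expected to run the named function and send the result back as a { role: "tool", tool_call_id, content } message. The sketch below shows one way to do that in Python. The local `get_weather` implementation, the `TOOLS` registry, and the helper name `run_tool_calls` are hypothetical and exist only for this example.

```python
import json


def get_weather(location):
    # Hypothetical stand-in; a real implementation would call a weather service.
    return {"location": location, "forecast": "unknown"}


# Maps tool names from the request's tools array to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each tool call and return the messages to append to the conversation."""
    # Echo the assistant turn back so the tool replies have something to refer to.
    follow_up = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        # arguments arrives as a JSON-encoded string, not an object.
        args = json.loads(fn["arguments"])
        result = TOOLS[fn["name"]](**args)
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return follow_up


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"},
    }],
}
new_messages = run_tool_calls(assistant_message)
```

Appending `new_messages` to the original messages array and posting again lets the model produce its final answer from the tool output.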
Streaming + tool calling
Tool-call deltas stream the same way text deltas do — the OpenAI SDK handles them transparently. If you write your own client, the relevant deltas look like:
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"loc"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"ation\":\"Paris\"}"}}]}}]}
Errors
See Errors for the full code list.
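Returning to the deltas shown under Streaming + tool calling: a hand-written client has to merge them by index, because the arguments string is split across events and only parses as JSON once complete. The following is a minimal sketch of that merge, assuming decoded string lines; the function name `assemble_tool_calls` and the output shape are choices made for this example, not part of the API.

```python
import json


def assemble_tool_calls(lines):
    """Merge streamed tool_calls deltas into complete tool calls, keyed by index."""
    calls = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        for choice in json.loads(data).get("choices", []):
            for part in choice.get("delta", {}).get("tool_calls") or []:
                call = calls.setdefault(
                    part["index"], {"id": None, "name": "", "arguments": ""}
                )
                if part.get("id"):
                    call["id"] = part["id"]
                fn = part.get("function", {})
                call["name"] += fn.get("name") or ""
                # Argument fragments are only valid JSON once concatenated.
                call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


sample = [
    r'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather"}}]}}]}',
    r'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"loc"}}]}}]}',
    r'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"ation\":\"Paris\"}"}}]}}]}',
    "data: [DONE]",
]
calls = assemble_tool_calls(sample)
arguments = json.loads(calls[0]["arguments"])
```

For the three sample events, `calls` holds a single entry for get_weather with id call_abc, and `arguments` parses to a dictionary with location set to "Paris".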