POST /v1/chat/completions

POST /v1/chat/completions
Authorization: Bearer sk-ownllm-...
Content-Type: application/json

OpenAI-compatible chat completions endpoint. It supports both streaming responses (stream: true, delivered as server-sent events) and non-streaming responses.

Request body

{
  "model": "llama-3.3-70b",
  "messages": [
    { "role": "system", "content": "You are concise." },
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 512,
  "top_p": 0.95,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "user": "usr_xxx"
}

Required

model — a model id from /v1/models.
messages — array of { role, content } (or { role, tool_calls } for assistant turns; { role: "tool", tool_call_id, content } for tool replies).

Optional

stream — boolean, default false. When true, the response is text/event-stream with one delta per event.
temperature, top_p, max_tokens, presence_penalty, frequency_penalty, seed — standard OpenAI params, forwarded as Ollama options.
tools, tool_choice — only if the model has the tools capability. See Tool calling.
response_format — { type: "json_object" } and { type: "json_schema", json_schema: { ... } } are supported on compatible models.
user — string identifying the end-user. Recorded in audit logs. We don't enforce it; supplying it makes audits much more useful.
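
The request above can be assembled with Python's standard library alone. This is a minimal sketch, not an official client: the helper name build_chat_request and the base URL https://ownllm.example.com are placeholders chosen for illustration, and the helper only builds the request so that it can be inspected before sending.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages, **options):
    """Build (but do not send) a POST /v1/chat/completions request.

    The helper name and base_url are illustrative assumptions,
    not part of the documented API.
    """
    body = {"model": model, "messages": messages}
    # Optional parameters (stream, temperature, max_tokens, user, ...)
    # are included only when the caller passes a non-None value.
    body.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and read the body as JSON (non-streaming) or line by line (streaming).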

Streaming response

data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}], ...}

data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}], ...}

data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{"content":" sky"},"finish_reason":null}], ...}

...

data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}], ...}

data: [DONE]

The terminating data: [DONE] line matches OpenAI exactly so the OpenAI SDK and friends work without changes.
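
If you are not using an SDK, the stream can be consumed by hand. The sketch below assumes the response has already been split into text lines (for example by iterating over the HTTP response); the function names iter_chunks and collect_text are hypothetical.

```python
import json


def iter_chunks(lines):
    """Yield parsed chunk objects from SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel, not JSON
        yield json.loads(payload)


def collect_text(lines):
    """Concatenate the content deltas of a streamed completion."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

The first event carries only the role and the last carries only finish_reason, so both are skipped when collecting text.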

Non-streaming response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1714500000,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The sky is blue because ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 110,
    "total_tokens": 130
  }
}
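
Reading the fields most clients need from this object is straightforward. A small sketch, with summarize_completion as a hypothetical helper name:

```python
def summarize_completion(response):
    """Pull the reply text, finish reason, and token total from a response."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        # content is null (None) when the model replies with tool_calls.
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }
```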

Tool calling

If the model has capabilities.tools = true, you can pass tools and tool_choice. When the model decides to call a tool, the response includes tool_calls:

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Paris\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

If the model does not have capabilities.tools = true and the request includes tools or tool_choice, OwnLLM returns the model_does_not_support_tools error.
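
After receiving tool_calls, the client runs each function and sends the results back as tool messages. The sketch below shows one way to build those messages; tool_replies and the handlers mapping are hypothetical names, and encoding the result as JSON is an assumption about what is convenient, not a requirement stated here.

```python
import json


def tool_replies(assistant_message, handlers):
    """Run each requested tool and build the matching tool messages.

    handlers maps a function name to a Python callable (illustrative).
    """
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        # arguments arrives as a JSON-encoded string, not an object.
        args = json.loads(fn["arguments"])
        result = handlers[fn["name"]](**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies
```

Append the assistant message and then the returned replies to messages, and call the endpoint again to get the model's final answer.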

Streaming + tool calling

Tool-call deltas stream the same way text deltas do, and the OpenAI SDK handles them transparently. If you write your own client, concatenate the function.arguments fragments for each tool-call index. The relevant deltas look like:

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather"}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"loc"}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"ation\":\"Paris\"}"}}]}}]}
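
A hand-written client has to stitch these fragments back together. The following sketch merges deltas by their index field, assuming the chunks have already been parsed from the data: lines into dictionaries; accumulate_tool_calls is a hypothetical helper name.

```python
def accumulate_tool_calls(chunks):
    """Merge streamed tool-call deltas into complete tool_calls entries."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for part in choice.get("delta", {}).get("tool_calls") or []:
                # The first delta for an index carries id and name;
                # later deltas carry only argument fragments.
                call = calls.setdefault(part["index"], {
                    "id": None,
                    "type": "function",
                    "function": {"name": "", "arguments": ""},
                })
                if part.get("id"):
                    call["id"] = part["id"]
                fn = part.get("function", {})
                if fn.get("name"):
                    call["function"]["name"] = fn["name"]
                call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

The arguments string is only valid JSON once the stream has finished, so parse it after the final chunk rather than per delta.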

Errors

See Errors for the full code list.
