Overview

The OwnLLM public API — an OpenAI-compatible HTTP endpoint backed by your private GPU machine.

OwnLLM exposes an OpenAI-compatible HTTP API at https://<your-slug>.ownllm.app/v1. Anything that speaks OpenAI — Cursor, Claude Code, OpenCode, the OpenAI SDK — works against it.

The API is a proxy: requests come in to ownllm.app, get authenticated, scoped, audited, and then forwarded to your paired GPU machine through an outbound Cloudflare Tunnel. Inference happens on your hardware. The site never sees your model weights or completions.

Base URL

https://<your-slug>.ownllm.app/v1

<your-slug> is the slug your tenant picked at signup. The URL is visible on the Atlas dashboard and in the admin web at Admin → Tenant → Public URL.
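
As a rough illustration, the base URL can be derived from the slug in code. The helper name `base_url_for` and the slug validation rule are assumptions made for this sketch, not documented OwnLLM behaviour.

```python
import re

# Assumed slug shape for this sketch: lowercase letters, digits and hyphens,
# starting and ending with a letter or digit. OwnLLM's real rules may differ.
_SLUG_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")


def base_url_for(slug: str) -> str:
    """Return the OpenAI-compatible base URL for a tenant slug."""
    if not _SLUG_RE.match(slug):
        raise ValueError(f"unexpected slug format: {slug!r}")
    return f"https://{slug}.ownllm.app/v1"


print(base_url_for("acme-prod"))  # https://acme-prod.ownllm.app/v1
```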

Quick start

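# Point any OpenAI-compatible client at your tenant (example slug: acme-prod).
export OPENAI_BASE_URL=https://acme-prod.ownllm.app/v1
export OPENAI_API_KEY=sk-ownllm-user-xxxxxxxxxxxxxxxx

# List the models this key is allowed to use.
curl "$OPENAI_BASE_URL/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"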

from openai import OpenAI

client = OpenAI(
    base_url="https://acme-prod.ownllm.app/v1",
    api_key="sk-ownllm-user-xxxxxxxxxxxxxxxx",
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
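for chunk in response:
    # Each chunk carries an incremental delta; skip chunks with no choices,
    # and treat a missing content field as an empty string.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)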

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/v1/models` | List models the calling key is allowed to use. |
| `POST` | `/v1/chat/completions` | Chat completion, with streaming over server-sent events (SSE). |

The legacy text completions endpoint, embeddings, and images are not exposed in v1. Most modern clients use only chat completions; if you need one of the other endpoints, open an issue.
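
A minimal sketch of calling the models endpoint with only the Python standard library follows. It assumes the response uses the OpenAI list shape (`{"data": [{"id": ...}]}`), which this page does not spell out; the function names and the sample payload are illustrative, not real server output.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the models list."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list:
    """Extract model ids, assuming the OpenAI list shape {"data": [{"id": ...}]}."""
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url: str, api_key: str) -> list:
    """Send the request and return the ids of the models this key may use."""
    with urllib.request.urlopen(build_models_request(base_url, api_key)) as resp:
        return model_ids(json.load(resp))


# Offline check against a hand-written sample payload (not real server output).
sample = {"object": "list", "data": [{"id": "llama-3.3-70b"}, {"id": "qwen3:32b"}]}
print(model_ids(sample))  # ['llama-3.3-70b', 'qwen3:32b']
```

Calling `list_models(os.environ["OPENAI_BASE_URL"], os.environ["OPENAI_API_KEY"])` (after `import os`) would perform the live request.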

Tool calling

OwnLLM forwards tools and tool_choice to the underlying model only if the model reports the tools capability (as returned by Ollama's /api/show). If a request includes tools for a model without that capability, the API returns:

{
  "error": {
    "type": "invalid_request_error",
    "code": "model_does_not_support_tools",
    "message": "The model 'llama-3.3-70b' does not support tool calling. Try 'qwen2.5:32b' or 'qwen3:32b'."
  }
}

This is intentional: passing the raw, brittle Ollama error through would break tool-calling clients, so the API returns a structured error instead. The capability check runs once per model, and the result is cached.
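
On the client side, the documented error code can be detected and handled. The sketch below keys on the `code` field only, since the HTTP status is not stated here; `is_tools_unsupported`, `without_tools`, and the sample request (including the `get_time` tool) are hypothetical, and retrying without tools is just one possible fallback.

```python
def is_tools_unsupported(body: dict) -> bool:
    """True if an error body carries the model_does_not_support_tools code."""
    error = body.get("error") or {}
    return error.get("code") == "model_does_not_support_tools"


def without_tools(request: dict) -> dict:
    """Return a copy of a chat request with tools and tool_choice removed."""
    return {k: v for k, v in request.items() if k not in ("tools", "tool_choice")}


# The error body documented above, and a hypothetical request that triggered it.
error_body = {
    "error": {
        "type": "invalid_request_error",
        "code": "model_does_not_support_tools",
        "message": "The model 'llama-3.3-70b' does not support tool calling. "
                   "Try 'qwen2.5:32b' or 'qwen3:32b'.",
    }
}
request = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "What time is it in Oslo?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_time",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
    "tool_choice": "auto",
}

if is_tools_unsupported(error_body):
    # One possible fallback: retry as a plain chat request without tools.
    request = without_tools(request)
print(sorted(request))  # ['messages', 'model']
```

Switching to one of the models suggested in the error message is the alternative when the tools are essential to the request.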

What you can't do

Train or fine-tune — OwnLLM ships managed inference, not custom training. Fine-tuning is on the v2 roadmap.
Embeddings — out of scope for v1; track the issue if you need it.
DALL-E / image generation — same as above.
Streaming function calls in OpenAI Functions v0.x format — we follow the modern tool_calls shape. Update your client.
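
For clients still sending the older format, the request side of the migration can be sketched as below. This assumes the last item refers to OpenAI's deprecated `functions` / `function_call` request parameters; `to_tool_calls_shape` and the `get_weather` function are hypothetical. Responses also differ (assistant messages carry `tool_calls` rather than `function_call`), which this sketch does not cover.

```python
def to_tool_calls_shape(request: dict) -> dict:
    """Convert legacy `functions` / `function_call` fields to `tools` / `tool_choice`."""
    converted = {
        k: v for k, v in request.items() if k not in ("functions", "function_call")
    }
    if "functions" in request:
        # Each legacy function definition is wrapped in a typed tool entry.
        converted["tools"] = [
            {"type": "function", "function": fn} for fn in request["functions"]
        ]
    choice = request.get("function_call")
    if isinstance(choice, dict):
        # {"name": "f"} becomes {"type": "function", "function": {"name": "f"}}.
        converted["tool_choice"] = {
            "type": "function",
            "function": {"name": choice["name"]},
        }
    elif choice is not None:
        # The string values "auto" and "none" carry over unchanged.
        converted["tool_choice"] = choice
    return converted


legacy = {
    "model": "qwen3:32b",
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "functions": [{
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    }],
    "function_call": {"name": "get_weather"},
}
modern = to_tool_calls_shape(legacy)
print(modern["tool_choice"])  # {'type': 'function', 'function': {'name': 'get_weather'}}
```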

Next

Authentication: keys, scopes, budgets, rotation.
Models endpoint: discover the models a key can use.
Chat completions: the full request and response schema.
Integrations: Cursor, Claude Code, OpenCode, curl, Python, Node.js.
