Overview
The OwnLLM public API — an OpenAI-compatible HTTP endpoint backed by your private GPU machine.
OwnLLM exposes an OpenAI-compatible HTTP API at
https://<your-slug>.ownllm.app/v1. Anything that speaks OpenAI —
Cursor, Claude Code, OpenCode, the OpenAI SDK — works against it.
The API is a proxy: requests come in to ownllm.app, get
authenticated, scoped, audited, and then forwarded to your paired GPU
machine through an outbound Cloudflare Tunnel. Inference happens on
your hardware. The site never sees your model weights or completions.
Base URL
```
https://<your-slug>.ownllm.app/v1
```
`<your-slug>` is the slug your tenant picked at signup. The URL is
visible on the Atlas dashboard and in the admin web at
Admin → Tenant → Public URL.
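As a small illustration of the URL pattern above, the helper below builds the base URL from a tenant slug. The function name `ownllm_base_url` and the validation rule are assumptions made for this sketch, not part of OwnLLM; the rule simply follows DNS label syntax, since the slug is used as a subdomain.

```python
import re

# Hypothetical helper: not part of any OwnLLM SDK.
# Assumption: a slug is a DNS label (lowercase letters, digits, hyphens,
# 1 to 63 characters, no leading or trailing hyphen).
_SLUG_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def ownllm_base_url(slug: str) -> str:
    """Return the OpenAI-compatible base URL for a tenant slug."""
    if not _SLUG_RE.fullmatch(slug):
        raise ValueError(f"not a valid subdomain label: {slug!r}")
    return f"https://{slug}.ownllm.app/v1"


print(ownllm_base_url("acme-prod"))  # https://acme-prod.ownllm.app/v1
```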
Quick start
```shell
export OPENAI_BASE_URL=https://acme-prod.ownllm.app/v1
export OPENAI_API_KEY=sk-ownllm-user-xxxxxxxxxxxxxxxx

# List the models this key is allowed to use
curl "$OPENAI_BASE_URL/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://acme-prod.ownllm.app/v1",
    api_key="sk-ownllm-user-xxxxxxxxxxxxxxxx",
)

# Stream a chat completion and print text as it arrives
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    # delta.content is None on chunks that carry no text
    print(chunk.choices[0].delta.content or "", end="")
```
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /v1/models | List models the calling key is allowed to use. |
| POST | /v1/chat/completions | Chat completion with streaming SSE. |
The legacy text `completions` endpoint, `embeddings`, and `images`
are not exposed in v1. Most modern clients only use chat
completions; if you need one of the other endpoints, open an issue.
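Clients that do not use the OpenAI SDK have to decode the event stream themselves. The sketch below assumes the stream follows the OpenAI convention of `data: <json>` lines ending with `data: [DONE]`; this page does not spell that out, so treat it as an assumption. `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Rayleigh "}}]}',
    'data: {"choices": [{"delta": {"content": "scattering."}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # Rayleigh scattering.
```

In a real client, `lines` would be the decoded lines of the HTTP response body, read incrementally so text can be shown as it arrives.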
Tool calling
OwnLLM forwards `tools` and `tool_choice` to the underlying model
only if the model reports the `tools` capability (checked via Ollama's
`/api/show`). If a request includes `tools` for a model without that
capability, the API returns:
```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "model_does_not_support_tools",
    "message": "The model 'llama-3.3-70b' does not support tool calling. Try 'qwen2.5:32b' or 'qwen3:32b'."
  }
}
```
This is intentional: passing the brittle raw Ollama error through would break tool-calling clients. The capability check happens once per model and the result is cached.
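One way a client could react to this error is to detect the `code` field and retry without tools. The sketch below works on plain dictionaries, so it does not depend on any HTTP library. `is_tools_unsupported` and `strip_tools` are hypothetical helper names, the `get_time` tool is made up for the example, and whether a retry without tools is appropriate depends on your application.

```python
from typing import Any


def is_tools_unsupported(body: dict[str, Any]) -> bool:
    """True if an error body carries the model_does_not_support_tools code."""
    error = body.get("error")
    return isinstance(error, dict) and error.get("code") == "model_does_not_support_tools"


def strip_tools(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a chat request without tools and tool_choice."""
    return {k: v for k, v in request.items() if k not in ("tools", "tool_choice")}


error_body = {
    "error": {
        "type": "invalid_request_error",
        "code": "model_does_not_support_tools",
        "message": "The model 'llama-3.3-70b' does not support tool calling. "
                   "Try 'qwen2.5:32b' or 'qwen3:32b'.",
    }
}
request = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "What time is it?"}],
    # Illustrative tool definition, not a real OwnLLM tool
    "tools": [{
        "type": "function",
        "function": {"name": "get_time", "parameters": {"type": "object", "properties": {}}},
    }],
    "tool_choice": "auto",
}

if is_tools_unsupported(error_body):
    request = strip_tools(request)
print(sorted(request))  # ['messages', 'model']
```

Switching to one of the models suggested in the error message is the other obvious option when the tool call itself is essential.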
What you can't do
- Train or fine-tune — OwnLLM ships managed inference, not custom training. Fine-tuning is on the v2 roadmap.
- Embeddings — out of scope for v1; track the issue if you need it.
- DALL-E / image generation — same as above.
- Streaming function calls in OpenAI Functions v0.x format: we follow the modern `tool_calls` shape. Update your client.
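For clients that still send the legacy request fields, the request side of the migration is mostly mechanical. The sketch below converts a request using the deprecated OpenAI `functions` and `function_call` fields into the `tools` and `tool_choice` form. The helper name `to_tools_request` and the `get_weather` function are hypothetical, and the mapping reflects the public OpenAI Chat Completions schema rather than anything OwnLLM-specific.

```python
from typing import Any


def to_tools_request(request: dict[str, Any]) -> dict[str, Any]:
    """Rewrite legacy functions/function_call fields as tools/tool_choice."""
    out = {k: v for k, v in request.items() if k not in ("functions", "function_call")}
    if "functions" in request:
        # Each legacy function definition is wrapped in a typed tool entry.
        out["tools"] = [{"type": "function", "function": fn} for fn in request["functions"]]
    if "function_call" in request:
        choice = request["function_call"]
        if isinstance(choice, dict):
            # {"name": "x"} forced a specific function in the legacy format.
            choice = {"type": "function", "function": {"name": choice["name"]}}
        out["tool_choice"] = choice  # "auto" and "none" carry over unchanged
    return out


legacy = {
    "model": "qwen2.5:32b",
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "functions": [{
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    }],
    "function_call": {"name": "get_weather"},
}
modern = to_tools_request(legacy)
print(modern["tool_choice"])  # {'type': 'function', 'function': {'name': 'get_weather'}}
```

The response side changes as well: in the OpenAI schema, streamed chunks carry `delta.tool_calls` instead of `delta.function_call`, so the client's parsing code needs the matching update.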
Next
- Authentication — keys, scopes, budgets, rotation.
- Models endpoint — discover models a key can use.
- Chat completions — the full request / response schema.
- Integrations — Cursor, Claude Code, OpenCode, curl, Python, Node.js.