Cursor
Connect Cursor to your private OwnLLM instance via the OpenAI-compatible API.
Cursor supports custom OpenAI-compatible endpoints. Pointing one at OwnLLM gives you private chat and autocomplete from a model running on your own machine.
Settings
In Cursor, open Settings → Models:
- Toggle Override OpenAI Base URL on.
- Set the base URL to https://<your-slug>.ownllm.app/v1.
- Paste your OwnLLM API key (sk-ownllm-...).
- Click Verify. Cursor calls GET /v1/models to confirm the connection.
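To reproduce the Verify check outside Cursor (for example, when it fails), the Python sketch below builds the same kind of GET /v1/models request. It assumes OwnLLM accepts the API key as an OpenAI-style Bearer token and returns the standard OpenAI list shape ({"data": [{"id": ...}]}). The function names are illustrative and are not part of any OwnLLM SDK.

```python
import json
import urllib.request


def build_verify_request(slug: str, api_key: str) -> urllib.request.Request:
    """Build a GET /v1/models request like the one Verify sends."""
    return urllib.request.Request(
        f"https://{slug}.ownllm.app/v1/models",
        # Assumption: the key is sent as an OpenAI-style bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(body: dict) -> list:
    """Extract model IDs from an OpenAI-style list response."""
    return [m["id"] for m in body.get("data", [])]


def verify(slug: str, api_key: str, timeout: float = 10.0) -> list:
    """Send the request; raises urllib.error.URLError or HTTPError on failure."""
    req = build_verify_request(slug, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

Calling verify("your-slug", "sk-ownllm-...") from a Python shell should return a list of model IDs when the slug, key, and agent are all working.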
Recommended models
Cursor's chat works best with code-tuned, tool-capable models:
| Use case | Model | Why |
|---|---|---|
| Inline edits | qwen2.5-coder:32b | Fast, code-tuned, supports tools. |
| Tab completion | qwen2.5-coder:14b | Latency-optimised. |
| Long-context refactors | qwen3:32b | 128k context, thinking-capable. |
For Apple Silicon hosts, apple-mlx-coder
(qwen3.5:35b-a3b-coding-nvfp4) is the recommended pick: MLX noticeably
speeds up inference for code workloads on Apple Silicon.
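After Verify passes, you can check whether the models recommended in the table above are actually installed on your instance. This is a minimal sketch that assumes the IDs returned by GET /v1/models match the table's tags exactly. The use-case keys are hypothetical labels chosen for this example.

```python
# Recommended models from the table, keyed by hypothetical use-case labels.
RECOMMENDED = {
    "inline_edits": "qwen2.5-coder:32b",
    "tab_completion": "qwen2.5-coder:14b",
    "long_context_refactors": "qwen3:32b",
}


def missing_models(available_ids, recommended=RECOMMENDED):
    """Return {use_case: model} for each recommended model the instance does not list."""
    have = set(available_ids)
    return {use: model for use, model in recommended.items() if model not in have}
```

Pass it the IDs from GET /v1/models. An empty dict means every recommended model is available.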
Toggle off OpenAI
When the override is on, Cursor sends all OpenAI-shape traffic to
OwnLLM, including features such as Chat and Apply that previously went to
api.openai.com. If a feature doesn't work as expected, check that the
selected model supports the capability that feature relies on (tool
calling or thinking).
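For reference, here is roughly what an OpenAI-shape chat request looks like once it is aimed at OwnLLM instead of api.openai.com. This sketch assumes OwnLLM serves the standard /v1/chat/completions path under the same base URL. The read_file tool is a hypothetical example of the kind of tool definition a client may attach, and only a tool-capable model can accept it.

```python
import json
import urllib.request

# Hypothetical tool definition in the OpenAI function-calling format.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def chat_request(slug, api_key, model, messages, tools=None):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {"model": model, "messages": messages}
    if tools:
        # Only models with tool-calling support can accept this field.
        payload["tools"] = tools
    return urllib.request.Request(
        f"https://{slug}.ownllm.app/v1/chat/completions",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result to urllib.request.urlopen. Leaving tools unset produces a plain chat request that any model can serve.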
Privacy
Cursor still ships some non-LLM features that talk to Cursor's own backend (cloud rules, account, telemetry). The override only redirects LLM traffic. If you need a fully air-gapped editor, see Claude Code or OpenCode, both of which can run without any non-LLM cloud calls.
Troubleshooting
Verify fails. Wrong slug, expired key, or the agent is offline.
Check the Atlas dashboard or run ownllm status.
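If you script the Verify check yourself, each of the three causes above is likely to surface as a different Python exception. The mapping below is an assumption, because OwnLLM's actual status codes are not documented here. It treats a 401 or 403 as a bad key, a DNS failure as a wrong slug, and a 5xx gateway error as an offline agent. If *.ownllm.app uses wildcard DNS, a wrong slug may surface as an HTTP error instead.

```python
import socket
import urllib.error


def diagnose(exc: Exception) -> str:
    """Map a failed /v1/models request to a likely cause (assumed status codes)."""
    # HTTPError subclasses URLError, so test for it first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return "key rejected: check for an expired or mistyped key"
        if exc.code in (502, 503, 504):
            return "endpoint reached but agent not responding: run ownllm status"
        return f"unexpected HTTP status {exc.code}"
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, socket.gaierror):
        return "hostname did not resolve: check the slug in the base URL"
    return f"unrecognised error: {exc!r}"
```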
model_does_not_support_tools. Cursor passes tool definitions for
features that need them, and the model you picked doesn't support tool
calling. Switch to one that does (qwen2.5-coder:32b,
llama-3.3-70b, qwen3:32b).
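A client that hits this error can detect it and suggest a tool-capable model that the instance actually serves. This is a minimal sketch that assumes the error arrives as an OpenAI-style error body with model_does_not_support_tools in its code field. The candidate list is the one given in this entry.

```python
import json

# Tool-capable models named in this troubleshooting entry.
TOOL_CAPABLE = ["qwen2.5-coder:32b", "llama-3.3-70b", "qwen3:32b"]


def is_tools_unsupported(error_body: str) -> bool:
    """True if a response body carries the model_does_not_support_tools error."""
    try:
        # Assumption: OpenAI-style {"error": {"code": ..., "message": ...}} body.
        err = json.loads(error_body).get("error")
    except (json.JSONDecodeError, AttributeError):
        return False
    return isinstance(err, dict) and err.get("code") == "model_does_not_support_tools"


def suggest_tool_model(available_ids):
    """First model from TOOL_CAPABLE that the instance serves, or None."""
    have = set(available_ids)
    return next((m for m in TOOL_CAPABLE if m in have), None)
```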
Slow autocomplete. The model is too big for the host or
num_parallel is too low. Pick a smaller model or raise
num_parallel with ownllm models config.
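To check whether num_parallel is the bottleneck, compare the latency of a single request with the latency of several requests sent at once. This generic sketch takes fn, which is any callable you supply that sends one short completion to your instance. If median latency grows roughly in proportion to concurrency, requests are probably being queued rather than served in parallel, so raising num_parallel may help. If even a single request is slow, the model is more likely too big for the host.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn) -> float:
    """Wall-clock seconds for a single call of fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def latency_under_load(fn, concurrency: int) -> float:
    """Start `concurrency` calls of fn together; return the median per-call latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed(fn), range(concurrency)))
    return statistics.median(latencies)
```

For example, compare latency_under_load(send_one, 1) with latency_under_load(send_one, 4), where send_one is a hypothetical function of your own that posts a short completion.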