OwnLLM Docs
APIIntegrations

Cursor

Connect Cursor to your private OwnLLM instance via the OpenAI-compatible API.

Cursor supports custom OpenAI-compatible endpoints. Setting one to OwnLLM gives you private chat and autocomplete from a model running on your own machine.

Settings

In Cursor, open Settings → Models:

  1. Toggle Override OpenAI Base URL on.
  2. Set the base URL to https://<your-slug>.ownllm.app/v1.
  3. Paste your OwnLLM API key (sk-ownllm-...).
  4. Click Verify — Cursor calls GET /v1/models to confirm.

Cursor's chat works best with code-tuned, tool-capable models:

Use caseModelWhy
Inline editsqwen2.5-coder:32bFast, code-tuned, supports tools.
Tab completionqwen2.5-coder:14bLatency-optimised.
Long-context refactorsqwen3:32b128k context, thinking-capable.

For Apple Silicon hosts, apple-mlx-coder (qwen3.5:35b-a3b-coding-nvfp4) is the right pick — MLX accelerates inference noticeably for code workloads on Apple Silicon.

Toggle off OpenAI

When the override is on, Cursor sends all OpenAI-shape traffic to OwnLLM — including features like Chat or Apply that previously hit api.openai.com. If something doesn't work as expected, double-check the model has the right capability (tools, thinking).

Privacy

Cursor still ships some non-LLM features that talk to Cursor's own backend (cloud rules, account, telemetry). The override only redirects LLM traffic. If you need a fully air-gapped editor, see Claude Code or OpenCode, both of which can run without any non-LLM cloud calls.

Troubleshooting

Verify fails. Wrong slug, expired key, or the agent is offline. Check the Atlas dashboard or run ownllm status.

model_does_not_support_tools. Cursor will pass tools for features that need them — the model you picked doesn't support tool calling. Switch to one that does (qwen2.5-coder:32b, llama-3.3-70b, qwen3:32b).

Slow autocomplete. The model is too big for the host or num_parallel is too low. Pick a smaller model or raise num_parallel with ownllm models config.

On this page