Cursor

Cursor supports custom OpenAI-compatible endpoints. Setting one to OwnLLM gives you private chat and autocomplete from a model running on your own machine.

Settings

In Cursor, open Settings → Models:

Toggle Override OpenAI Base URL on.
Set the base URL to https://<your-slug>.ownllm.app/v1.
Paste your OwnLLM API key (sk-ownllm-...).
Click Verify — Cursor calls GET /v1/models to confirm.

Recommended models

Cursor's chat works best with code-tuned, tool-capable models:

Use case	Model	Why
Inline edits	`qwen2.5-coder:32b`	Fast, code-tuned, supports `tools`.
Tab completion	`qwen2.5-coder:14b`	Latency-optimised.
Long-context refactors	`qwen3:32b`	128k context, thinking-capable.

For Apple Silicon hosts, apple-mlx-coder (qwen3.5:35b-a3b-coding-nvfp4) is the right pick — MLX accelerates inference noticeably for code workloads on Apple Silicon.

When the override is on, Cursor sends all OpenAI-shape traffic to OwnLLM — including features like Chat or Apply that previously hit api.openai.com. If something doesn't work as expected, double-check the model has the right capability (tools, thinking).

Privacy

Cursor still ships some non-LLM features that talk to Cursor's own backend (cloud rules, account, telemetry). The override only redirects LLM traffic. If you need a fully air-gapped editor, see Claude Code or OpenCode, both of which can run without any non-LLM cloud calls.

Troubleshooting

Verify fails. Wrong slug, expired key, or the agent is offline. Check the Atlas dashboard or run ownllm status.

model_does_not_support_tools. Cursor will pass tools for features that need them — the model you picked doesn't support tool calling. Switch to one that does (qwen2.5-coder:32b, llama-3.3-70b, qwen3:32b).

Slow autocomplete. The model is too big for the host or num_parallel is too low. Pick a smaller model or raise num_parallel with ownllm models config.

Settings

Recommended models

Toggle off OpenAI

Privacy

Troubleshooting

On this page