1. Architecture
Inference runs entirely on the GPU machine the customer has paired. Chat and API payloads are routed through the OwnLLM control plane and the outbound tunnel to that machine, while hosted conversation history is governed by plan-based retention and encryption.
No inbound port is opened on the customer side. Communication between the site and the agent uses an outbound Cloudflare Tunnel, initiated by the app and terminated with TLS on Cloudflare infrastructure.
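As a rough sketch of this outbound-only pattern, a `cloudflared` configuration for such a tunnel might look like the following. The tunnel UUID, credentials path, hostname, and local port are illustrative assumptions, not OwnLLM's actual configuration:

```yaml
# Illustrative cloudflared config.yml for an outbound-only tunnel.
# All identifiers below are hypothetical examples.
tunnel: 6ff42ae2-765d-4adf-8112-31c55c1551ef
credentials-file: /etc/cloudflared/6ff42ae2-765d-4adf-8112-31c55c1551ef.json
ingress:
  # Traffic arriving over the tunnel is forwarded to the local
  # inference server; cloudflared dials out to Cloudflare's edge,
  # so no inbound port is opened on the customer machine.
  - hostname: agent.example.com
    service: http://localhost:8000
  # Catch-all rule required by cloudflared: reject anything that
  # does not match the hostname above.
  - service: http_status:404
```

Because the agent initiates the connection, the customer's firewall only needs to allow outbound TLS traffic; no inbound rules or port forwarding are required.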