Local inference, OpenAI API surface
Ollama exposes an OpenAI-compatible REST endpoint on localhost, which means sgcWebSockets connects to it with the same code it uses for any cloud provider. Switch between Llama, Phi, Gemma, or Mistral models by changing a single string. Run your development loop entirely offline, test against multiple model families without API costs, and deploy to air-gapped customer environments without rearchitecting a single component. Local AI has never been this transparent to your existing stack.
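To make that concrete, here is a minimal sketch of the endpoint call using only Delphi's stock THTTPClient against Ollama's default port. The AskOllama helper is an illustrative name, not an sgcWebSockets class; in a real project you would route the same request shape through the library's own API components.

```pascal
uses
  System.SysUtils, System.Classes, System.Net.HttpClient, System.Net.URLClient,
  System.JSON;

// Illustrative helper: sends one chat turn to the local Ollama server
// and returns the assistant's reply.
function AskOllama(const Model, Prompt: string): string;
var
  Client: THTTPClient;
  Req, Msg: TJSONObject;
  Msgs: TJSONArray;
  Body: TStringStream;
  Response: IHTTPResponse;
  Json: TJSONObject;
begin
  Client := THTTPClient.Create;
  Req := TJSONObject.Create;
  try
    // Same request shape a cloud OpenAI endpoint expects
    Req.AddPair('model', Model);
    Msgs := TJSONArray.Create;
    Msg := TJSONObject.Create;
    Msg.AddPair('role', 'user');
    Msg.AddPair('content', Prompt);
    Msgs.AddElement(Msg);
    Req.AddPair('messages', Msgs);

    Body := TStringStream.Create(Req.ToJSON, TEncoding.UTF8);
    try
      // Only the base URL differs from a cloud provider
      Response := Client.Post('http://localhost:11434/v1/chat/completions',
        Body, nil, [TNetHeader.Create('Content-Type', 'application/json')]);
      Json := TJSONObject.ParseJSONValue(Response.ContentAsString) as TJSONObject;
      if Json = nil then
        raise Exception.Create('Unexpected response from Ollama');
      try
        // choices[0].message.content holds the assistant reply
        Result := Json.GetValue<string>('choices[0].message.content');
      finally
        Json.Free;
      end;
    finally
      Body.Free;
    end;
  finally
    Req.Free;
    Client.Free;
  end;
end;
```

Swapping model families is then literally a one-string change, for example AskOllama('llama3', ...) versus AskOllama('mistral', ...) against the same running server.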
Cut AI costs to near zero
Every token you send to a cloud AI provider costs money, and every token leaves your infrastructure. Ollama eliminates both problems. Running entirely on your own hardware, it processes requests locally with no per-call fees, no usage caps, and no third-party visibility into your business data. For companies in healthcare, legal, finance, or government, where data residency is non-negotiable, Ollama integrated into your Delphi system is not just cost-effective: it is often the only compliant path forward.
An AI that works when the internet doesn't
With Ollama running locally inside your Delphi application, the AI assistant is always available, whether you are on a plane, in a basement, or behind a corporate firewall. And unlike cloud-based AI, nothing you type or upload ever leaves your device. Your documents, your queries, your ideas stay yours. It responds in seconds and asks nothing in return except a bit of your machine's processing power.
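As a sketch of how an application can honor that offline promise, the helper below probes the server's OpenAI-style /v1/models route and fails fast when nothing is listening, so AI features can be enabled or hidden without ever touching the network. ListLocalModels is an invented name for illustration, not part of sgcWebSockets.

```pascal
uses
  System.SysUtils, System.Classes, System.Net.HttpClient, System.JSON;

// Illustrative helper: fills AModels with the locally installed model ids
// and returns True only when an Ollama instance answers on the default port.
function ListLocalModels(AModels: TStrings): Boolean;
var
  Client: THTTPClient;
  Response: IHTTPResponse;
  Json: TJSONObject;
  Item: TJSONValue;
begin
  Result := False;
  Client := THTTPClient.Create;
  try
    Client.ConnectionTimeout := 1000; // fail fast when Ollama is not running
    try
      Response := Client.Get('http://localhost:11434/v1/models');
      if Response.StatusCode <> 200 then
        Exit;
      Json := TJSONObject.ParseJSONValue(Response.ContentAsString) as TJSONObject;
      if Json = nil then
        Exit;
      try
        // OpenAI-style responses wrap the model list in a "data" array
        for Item in Json.GetValue<TJSONArray>('data') do
          AModels.Add(Item.GetValue<string>('id'));
        Result := True;
      finally
        Json.Free;
      end;
    except
      // Connection refused: no local server, so keep AI features disabled
      on E: Exception do
        Result := False;
    end;
  finally
    Client.Free;
  end;
end;
```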