A native Delphi client for the OpenAI REST and Realtime APIs. Chat completions with token streaming, function calling and structured outputs; embeddings; Whisper speech-to-text; TTS; DALL-E image generation; Assistants v2 with file search; and the GPT-4o Realtime audio API over WebSocket — all from a single Pascal component.
Calling the ChatGPT API from Delphi used to mean hand-rolling HTTP requests, JSON serialisers, multipart uploaders and SSE stream parsers. Drop in one component instead.
A Delphi OpenAI client is the bridge between your VCL / FMX application and the OpenAI platform: ChatGPT for natural-language reasoning, embeddings for semantic search and RAG, Whisper for transcription, TTS for spoken output, DALL-E for image generation and the Realtime API for low-latency voice agents. sgcWebSockets ships TsgcHTTP_API_OpenAI, a single non-visual component that wraps every public endpoint with Pascal-idiomatic properties, methods and events.
Because the component is built on the same HTTP/2 stack as the rest of the library, you get token streaming via Server-Sent Events, full function calling (now “tools”) with automatic JSON-schema marshalling, parallel tool calls, structured outputs (JSON-schema-constrained responses), the Assistants v2 thread/run/message lifecycle, vector stores for file search, and the GPT-4o Realtime API over WebSocket with bi-directional audio streaming. The same component talks to Azure OpenAI, the OpenAI-compatible endpoints exposed by Anthropic, Groq, Together, Mistral and DeepSeek, and local OpenAI-compatible servers (Ollama, LM Studio, vLLM).
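To make concrete what "token streaming via Server-Sent Events" involves on the wire, here is a minimal sketch of the per-line parsing that a streaming client has to do. It assumes the Chat Completions streaming format (`data: {json}` lines, terminated by `data: [DONE]`). `ExtractDelta` is a hypothetical helper written for this illustration with the Delphi RTL only; it is not part of sgcWebSockets and does not show how the component is implemented.

```pascal
program SseDeltaSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: returns choices[0].delta.content from one SSE line,
// or '' when the line carries no text (blank lines, comments, the [DONE]
// sentinel, role-only or tool-call deltas, malformed JSON).
function ExtractDelta(const aLine: string): string;
const
  cPrefix = 'data:';
var
  vPayload: string;
  oRoot: TJSONValue;
begin
  Result := '';
  if not aLine.StartsWith(cPrefix) then
    Exit;
  vPayload := aLine.Substring(Length(cPrefix)).Trim;
  if (vPayload = '') or (vPayload = '[DONE]') then
    Exit;
  oRoot := TJSONObject.ParseJSONValue(vPayload);
  if oRoot = nil then
    Exit; // payload was not valid JSON
  try
    if not oRoot.TryGetValue<string>('choices[0].delta.content', Result) then
      Result := '';
  finally
    oRoot.Free;
  end;
end;

begin
  // Sample lines in the shape the streaming endpoint emits.
  Write(ExtractDelta('data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}'));
  Write(ExtractDelta('data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}'));
  Writeln(ExtractDelta('data: [DONE]'));
end.
```

A real client also has to reassemble lines that are split across network reads before calling a helper like this, which is part of the work the component's streaming event is meant to hide.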
One method per endpoint, strongly-typed parameters, async events for streaming responses.
Chat completions
ChatCompletions() with full message-array support, system / user / assistant / tool roles, vision (image_url content), reasoning models (o1, o3) and JSON-schema structured outputs.
Streaming
StreamChatCompletions() raises OnChatCompletionStreamChunk for each delta, giving typewriter-style output in your VCL grid or FMX text view with zero SSE parsing code.
Function calling / tools
Register a list of tools with JSON-schema parameters; when the model calls one, you receive OnChatCompletionToolCall with the parsed arguments. Push the result back and resume.
Embeddings
Embeddings() with text-embedding-3-small / -large — the foundation for semantic search, clustering and RAG over your Pascal data.
Whisper (STT)
AudioTranscription() and AudioTranslation(): upload a WAV/MP3/M4A and get back a transcript or an English translation, optionally with segment- or word-level timestamps.
TTS
AudioSpeech() returns synthesised speech in MP3, Opus, AAC or FLAC — pick a voice, stream-play the result.
DALL-E
ImageGeneration(), ImageEdit(), ImageVariation() with DALL-E 2 / 3 and gpt-image-1. Returns URLs or base64-encoded PNG.
Assistants v2
Threads, runs, messages, files, vector stores and code interpreter, with run-streaming events so there is no need to poll.
Realtime API
Bi-directional WebSocket with input/output audio buffers, voice activity detection, function calling and low latency: build voice agents that handle interruptions naturally.
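The Embeddings entry above calls embedding vectors the foundation for semantic search. In practice that usually means ranking stored vectors by cosine similarity against the query vector. The following is a minimal, library-independent sketch; `CosineSimilarity` is a hypothetical helper, and the three-element vectors are toy values standing in for real embeddings, which have hundreds or thousands of dimensions.

```pascal
program CosineSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Cosine similarity of two equal-length vectors.
// Returns 0 when either vector has zero length (norm).
function CosineSimilarity(const aLeft, aRight: TArray<Double>): Double;
var
  i: Integer;
  vDot, vNormL, vNormR: Double;
begin
  if Length(aLeft) <> Length(aRight) then
    raise EArgumentException.Create('Vectors must have the same length');
  vDot := 0;
  vNormL := 0;
  vNormR := 0;
  for i := 0 to High(aLeft) do
  begin
    vDot := vDot + aLeft[i] * aRight[i];
    vNormL := vNormL + Sqr(aLeft[i]);
    vNormR := vNormR + Sqr(aRight[i]);
  end;
  if (vNormL = 0) or (vNormR = 0) then
    Exit(0);
  Result := vDot / (Sqrt(vNormL) * Sqrt(vNormR));
end;

begin
  // Identical direction: prints 1.000
  Writeln(CosineSimilarity(TArray<Double>.Create(1, 0, 0),
    TArray<Double>.Create(1, 0, 0)):0:3);
  // Orthogonal vectors: prints 0.000
  Writeln(CosineSimilarity(TArray<Double>.Create(1, 0, 0),
    TArray<Double>.Create(0, 1, 0)):0:3);
end.
```

For a small corpus, a linear scan with this function over all stored vectors is often enough; larger corpora typically move to a vector index or a hosted vector store.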
QuickStart
Streaming chat with a function tool
A ChatGPT call that streams tokens and can invoke a Delphi function mid-conversation.
uses
  sgcHTTP_API_OpenAI, sgcHTTP_API_OpenAI_Types;

var
  OpenAI: TsgcHTTP_API_OpenAI;
  oChat: TsgcHTTPOpenAIChatCompletionRequest;
  oTool: TsgcHTTPOpenAITool;
begin
  OpenAI := TsgcHTTP_API_OpenAI.Create(nil);
  OpenAI.ApiKey := 'sk-...';
  OpenAI.OnChatCompletionStreamChunk := DoChunk;
  OpenAI.OnChatCompletionToolCall := DoToolCall;
  oChat := TsgcHTTPOpenAIChatCompletionRequest.Create;
  try
    oChat.Model := 'gpt-4o';
    oChat.Messages.AddSystem('You are a Delphi assistant.');
    oChat.Messages.AddUser('What is the weather in Madrid?');
    oTool := oChat.Tools.AddFunction('get_weather',
      'Return current weather for a city.');
    oTool.Parameters
      .AddString('city', 'City name', True);
    OpenAI.StreamChatCompletions(oChat);
  finally
    oChat.Free;
  end;
end;

procedure TForm1.DoChunk(Sender: TObject;
  const aChunk: TsgcHTTPOpenAIChatCompletionStreamChunk);
begin
  Memo1.Text := Memo1.Text + aChunk.Content;
end;

procedure TForm1.DoToolCall(Sender: TObject;
  const aToolCall: TsgcHTTPOpenAIChatCompletionToolCall);
var
  vCity, vJSON: string;
begin
  if aToolCall.FunctionName = 'get_weather' then
  begin
    vCity := aToolCall.Arguments.S['city'];
    vJSON := MyWeatherLookup(vCity); // your own Delphi code
    OpenAI.SubmitToolOutput(aToolCall.Id, vJSON);
  end;
end;
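For orientation, the tool registered in the QuickStart corresponds to a JSON object in the `tools` array of the Chat Completions request. The sketch below builds that object by hand with System.JSON, assuming the documented OpenAI function-tool shape (`type`, `function.name`, `function.description`, `function.parameters` as a JSON Schema). `BuildWeatherTool` is a hypothetical helper for illustration; how the component serialises its tools internally is not shown in this document.

```pascal
program ToolSchemaSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: builds the wire-format entry for the get_weather tool.
// The caller owns the returned object; nested values are owned by their parent.
function BuildWeatherTool: TJSONObject;
var
  oCity, oProps, oParams, oFunc: TJSONObject;
  oRequired: TJSONArray;
begin
  // JSON Schema for the single "city" argument.
  oCity := TJSONObject.Create;
  oCity.AddPair('type', 'string');
  oCity.AddPair('description', 'City name');

  oProps := TJSONObject.Create;
  oProps.AddPair('city', oCity);

  // Marking the argument as required mirrors the True flag in the QuickStart.
  oRequired := TJSONArray.Create;
  oRequired.Add('city');

  oParams := TJSONObject.Create;
  oParams.AddPair('type', 'object');
  oParams.AddPair('properties', oProps);
  oParams.AddPair('required', oRequired);

  oFunc := TJSONObject.Create;
  oFunc.AddPair('name', 'get_weather');
  oFunc.AddPair('description', 'Return current weather for a city.');
  oFunc.AddPair('parameters', oParams);

  Result := TJSONObject.Create;
  Result.AddPair('type', 'function');
  Result.AddPair('function', oFunc);
end;

var
  oTool: TJSONObject;
begin
  oTool := BuildWeatherTool;
  try
    Writeln(oTool.ToJSON);
  finally
    oTool.Free;
  end;
end.
```

When the model decides to call the tool, it returns the arguments as a JSON string matching this schema, and the result is sent back as a message with role `tool` that references the tool call id.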
Realtime API
GPT-4o voice agents over WebSocket
The OpenAI Realtime API is a WebSocket endpoint that accepts a stream of input audio frames and emits a stream of output audio frames — with sub-second latency, native barge-in (the model stops talking when the user starts), server-side voice activity detection and the same tool-calling surface as the REST API. sgcWebSockets wraps it with TsgcHTTP_API_OpenAI_Realtime, which gives you typed events for every server message (OnSessionUpdated, OnConversationItemCreated, OnInputAudioBufferSpeechStarted, OnResponseAudioDelta, etc.) so you can plug it straight into TBass, TMediaPlayer or any audio queue.
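On the wire, the "stream of input audio frames" is a sequence of JSON text messages, each carrying a base64-encoded audio chunk. The sketch below builds one such client event, assuming the Realtime API's `input_audio_buffer.append` event and its default 16-bit PCM, 24 kHz, mono input format. `BuildAudioAppendEvent` is a hypothetical helper using only the Delphi RTL; it is not a method of TsgcHTTP_API_OpenAI_Realtime, which presumably handles this framing for you.

```pascal
program RealtimeAppendSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON, System.NetEncoding;

// Hypothetical helper: wraps a chunk of raw PCM16 audio in an
// input_audio_buffer.append client event and returns the JSON text.
function BuildAudioAppendEvent(const aPcm: TBytes): string;
var
  oEvent: TJSONObject;
  oBase64: TBase64Encoding;
begin
  // CharsPerLine = 0 disables the line breaks that the default
  // TNetEncoding.Base64 encoder inserts every 76 characters.
  oBase64 := TBase64Encoding.Create(0);
  try
    oEvent := TJSONObject.Create;
    try
      oEvent.AddPair('type', 'input_audio_buffer.append');
      oEvent.AddPair('audio', oBase64.EncodeBytesToString(aPcm));
      Result := oEvent.ToJSON;
    finally
      oEvent.Free;
    end;
  finally
    oBase64.Free;
  end;
end;

var
  vSilence: TBytes;
  vJson: string;
begin
  // 100 ms of silence: 24000 samples/s * 2 bytes * 0.1 s = 4800 bytes.
  SetLength(vSilence, 4800);
  vJson := BuildAudioAppendEvent(vSilence);
  Writeln(Copy(vJson, 1, 60), '...');
end.
```

Sending small chunks (tens to a few hundred milliseconds each) keeps latency low and gives server-side voice activity detection frequent input to work with.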
Compatibility
Same component, multiple backends
Azure OpenAI
Set Endpoint to your Azure resource URL and ApiKey to the resource key — deployment IDs replace model names.
Ollama / LM Studio
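Point Endpoint at the server's OpenAI-compatible base URL (http://localhost:11434/v1 is Ollama's default; LM Studio defaults to http://localhost:1234/v1). The same component then drives local Llama, Mistral, Qwen and Phi models.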
DeepSeek, Groq, Together, Mistral
All expose OpenAI-compatible endpoints; only the base URL and API key change.
vLLM, SGLang, llama.cpp server
Self-hosted inference servers with OpenAI-compatible REST endpoints; streaming uses the same SSE wire format, so the same streaming code path applies.
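Since only the base URL and API key change between backends, switching can be reduced to a lookup table. The sketch below is a hypothetical, library-independent illustration: `TBackend` and `cBaseUrl` are names invented here, and the URLs are the vendors' commonly documented defaults at the time of writing, so verify them against each provider's documentation (local servers in particular may run on a different port).

```pascal
program BackendUrlSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical enumeration of OpenAI-compatible backends.
  TBackend = (bkOpenAI, bkOllama, bkLMStudio, bkVLLM,
    bkGroq, bkTogether, bkMistral, bkDeepSeek);

const
  // Assumed default base URLs; check each vendor's documentation.
  cBaseUrl: array[TBackend] of string = (
    'https://api.openai.com/v1',      // OpenAI
    'http://localhost:11434/v1',      // Ollama (local default)
    'http://localhost:1234/v1',       // LM Studio (local default)
    'http://localhost:8000/v1',       // vLLM (local default)
    'https://api.groq.com/openai/v1', // Groq
    'https://api.together.xyz/v1',    // Together
    'https://api.mistral.ai/v1',      // Mistral
    'https://api.deepseek.com/v1'     // DeepSeek
  );

var
  vBackend: TBackend;
begin
  // Print every configured base URL.
  for vBackend := Low(TBackend) to High(TBackend) do
    Writeln(cBaseUrl[vBackend]);
end.
```

With the component, the selected value would be assigned to the Endpoint property described above, together with the matching ApiKey. Local servers generally ignore the key, but a non-empty placeholder may still be needed.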