Streaming LLM Responses in Delphi: Token-by-Token with Server-Sent Events

7 June 2026 · Components

Quick answer: To stream an LLM response in Delphi with sgcWebSockets, assign the component's OnHTTPAPISSE event, then call the streaming method — _CreateChatCompletion with Stream := True for OpenAI, _CreateMessageStream for Anthropic Claude and Ollama, _CreateContentStream for Gemini. Each delta arrives in the handler as a Server-Sent Event; append aData to a Memo and the answer types itself out as the model generates it.

A non-streaming LLM call blocks until the whole answer is ready. For a one-paragraph reply that is fine, but for a long completion the user stares at a frozen window for several seconds with no feedback. Streaming fixes that: the model returns its output as a sequence of small chunks over a single HTTP connection, and you render each chunk the moment it lands. The result is the familiar "typing" effect you see in ChatGPT, and a UI that feels responsive even when the full answer takes a while.

How streaming works: OnHTTPAPISSE

Under the hood, the provider keeps the HTTP response open and pushes data as tokens are produced; OpenAI, Anthropic and Gemini do this with Server-Sent Events (SSE). The sgcWebSockets component parses that stream and raises one event per chunk through a single handler with the same signature on every component:

procedure TForm1.HandleSSE(Sender: TObject;
  const aEvent, aData: string; var Cancel: Boolean);
begin
  // aEvent  -> the SSE event name (provider-specific)
  // aData   -> the payload for this chunk (the token delta)
  // Cancel  -> set True to abort the stream early
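  // Append the delta at the end; Lines.Add would put every token on its own line
  Memo1.SelStart := Length(Memo1.Text);
  Memo1.SelText := aData;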
end;

That is the whole contract. aData carries the incremental payload, aEvent names the kind of event (useful with providers that emit several event types), and setting Cancel := True stops the stream, for example when the user clicks a Stop button. The same handler shape works for OpenAI, Anthropic, Gemini and Ollama, so you write it once and reuse it across providers.
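To make aEvent and aData concrete, it helps to see how one SSE event block on the wire maps onto the two values. The component already does this parsing for you, so the unit below is only an illustration of the protocol. ParseSSEBlock and the unit name are hypothetical; the sketch follows the field rules of the HTML Server-Sent Events specification and deliberately ignores the id and retry fields.

```pascal
unit SSEBlockDemo; // hypothetical unit name, illustration only

interface

// Splits one SSE event block (the lines between two blank lines) into the
// event name and the data payload. A block on the wire looks like:
//   event: content_block_delta
//   data: {"type":"content_block_delta", ...}
procedure ParseSSEBlock(const Block: string; out EventName, Data: string);

implementation

uses
  System.SysUtils, System.Classes;

procedure ParseSSEBlock(const Block: string; out EventName, Data: string);
var
  Lines: TStringList;
  Line, Field, Value: string;
  P: Integer;
  HasData: Boolean;
begin
  EventName := 'message'; // default event type when no "event:" line is present
  Data := '';
  HasData := False;
  Lines := TStringList.Create;
  try
    Lines.Text := Block; // split the block into lines
    for Line in Lines do
    begin
      if (Line = '') or Line.StartsWith(':') then
        Continue; // blank line, or a comment (often used as a keep-alive)
      P := Line.IndexOf(':');
      if P < 0 then
      begin
        Field := Line; // a line without a colon is a field with an empty value
        Value := '';
      end
      else
      begin
        Field := Line.Substring(0, P);
        Value := Line.Substring(P + 1);
        if Value.StartsWith(' ') then
          Value := Value.Substring(1); // drop one optional leading space
      end;
      if Field = 'event' then
        EventName := Value
      else if Field = 'data' then
      begin
        if HasData then
          Data := Data + #10; // several data lines are joined with a line feed
        Data := Data + Value;
        HasData := True;
      end;
      // id and retry fields are ignored in this sketch
    end;
  finally
    Lines.Free;
  end;
end;

end.
```

In the component's terms, EventName is what reaches the handler as aEvent and Data is what reaches it as aData.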

OpenAI: Stream := True

With TsgcHTTP_API_OpenAI you opt into streaming on the request and hook the event. Set OpenAIOptions.ApiKey, assign OnHTTPAPISSE, and the chat completion is delivered chunk by chunk instead of in one block:

uses
  sgcHTTP_API_OpenAI;

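procedure TForm1.StartOpenAIStream;
var
  OpenAI: TsgcHTTP_API_OpenAI;
begin
  // Owned by the form, so it is freed with the form instead of leaking
  OpenAI := TsgcHTTP_API_OpenAI.Create(Self);
  OpenAI.OpenAIOptions.ApiKey := 'sk-...';

  // Each token delta arrives in HandleSSE
  OpenAI.OnHTTPAPISSE := HandleSSE;

  // The short overload is shown for brevity. Streaming is switched on through
  // the Stream property of the typed chat-completion request: build that
  // request, set Stream := True, and send it.
  OpenAI._CreateChatCompletion('gpt-4o-mini', 'Explain WebSockets in detail.');
end;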

The typed request exposes a Stream property, so when you build the request object yourself you set Stream := True before sending. Tokens then surface through OnHTTPAPISSE as they are generated, and you append each one to your Memo.
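If your version of the component hands you the raw chunk JSON in aData rather than plain text, you have to pull the token out yourself. A minimal sketch, assuming aData holds one OpenAI chat.completion.chunk object and using Delphi's System.JSON; ExtractOpenAIDelta and its unit are hypothetical helpers, not part of sgcWebSockets.

```pascal
unit OpenAIDeltaHelper; // hypothetical unit name

interface

// Assumes aData is the raw JSON of one OpenAI "chat.completion.chunk".
// Returns True and the token text when the chunk carries content.
function ExtractOpenAIDelta(const aData: string; out Text: string): Boolean;

implementation

uses
  System.SysUtils, System.JSON;

function ExtractOpenAIDelta(const aData: string; out Text: string): Boolean;
var
  Json: TJSONValue;
begin
  Text := '';
  // OpenAI ends the stream with the literal sentinel [DONE], which is not JSON
  if Trim(aData) = '[DONE]' then
    Exit(False);
  Json := TJSONObject.ParseJSONValue(aData);
  if Json = nil then
    Exit(False); // not JSON: nothing to extract
  try
    // The token lives in choices[0].delta.content; chunks without a content
    // field (such as the final one) make TryGetValue return False
    Result := Json.TryGetValue<string>('choices[0].delta.content', Text);
  finally
    Json.Free;
  end;
end;

end.
```

Inside HandleSSE you would append Text to the Memo only when the function returns True.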

Anthropic Claude: _CreateMessageStream

The Anthropic component has a dedicated streaming helper, so there is no separate flag to flip: calling _CreateMessageStream on TsgcHTTP_API_Anthropic turns on SSE for that request. Set the API key and API version, assign the handler, and call it:

uses
  sgcHTTP_API_Anthropic;

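procedure TForm1.StartClaudeStream;
var
  Anthropic: TsgcHTTP_API_Anthropic;
begin
  // Owned by the form, so it is freed with the form instead of leaking
  Anthropic := TsgcHTTP_API_Anthropic.Create(Self);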
  Anthropic.AnthropicOptions.ApiKey := 'sk-ant-...';
  Anthropic.AnthropicOptions.AnthropicVersion := '2023-06-01';

  Anthropic.OnHTTPAPISSE := HandleSSE;
  Anthropic._CreateMessageStream(
    'claude-3-5-sonnet-latest',
    'Summarise RFC 6455',
    1024);
end;

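Claude emits several SSE event types as it streams: message_start, then content_block_start, a run of content_block_delta events and content_block_stop for each content block, and finally message_delta and message_stop, with occasional ping events in between. The aEvent argument lets you tell them apart; for a simple "show the text as it arrives" UI, you only need to act on the delta events.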
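Anthropic's streaming format sends the generated text in content_block_delta events whose delta has type text_delta. A minimal sketch of a filter, assuming aEvent carries the SSE event name (as the handler comments state) and that aData holds the raw JSON data line; ExtractClaudeText and its unit are hypothetical, not part of sgcWebSockets.

```pascal
unit ClaudeDeltaHelper; // hypothetical unit name

interface

// Returns True and the text only for content_block_delta events that carry
// a text_delta. Assumes aData is the raw JSON of that event.
function ExtractClaudeText(const aEvent, aData: string; out Text: string): Boolean;

implementation

uses
  System.JSON;

function ExtractClaudeText(const aEvent, aData: string; out Text: string): Boolean;
var
  Json: TJSONValue;
  DeltaType: string;
begin
  Text := '';
  // message_start, content_block_start, ping, content_block_stop,
  // message_delta and message_stop carry no generated text
  if aEvent <> 'content_block_delta' then
    Exit(False);
  Json := TJSONObject.ParseJSONValue(aData);
  if Json = nil then
    Exit(False);
  try
    // Other delta types (for example input_json_delta for tool use) are skipped
    Result := Json.TryGetValue<string>('delta.type', DeltaType)
      and (DeltaType = 'text_delta')
      and Json.TryGetValue<string>('delta.text', Text);
  finally
    Json.Free;
  end;
end;

end.
```

Filtering on aEvent first keeps the JSON parsing off the bookkeeping events entirely.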

Gemini and Ollama: the same shape

Google Gemini follows the identical pattern with its own streaming method, _CreateContentStream on TsgcHTTP_API_Gemini:

Gemini.OnHTTPAPISSE := HandleSSE;
Gemini._CreateContentStream(
  'gemini-2.0-flash',
  'Explain quantum entanglement',
  1024);

Local models work exactly the same way. TsgcHTTP_API_Ollama needs no API key: point OllamaOptions.BaseUrl at http://localhost:11434/api (Ollama's default local address) and call _CreateMessageStream, and the model running on your own hardware streams back through the same OnHTTPAPISSE handler:

Ollama.OllamaOptions.BaseUrl := 'http://localhost:11434/api';
Ollama.OnHTTPAPISSE := HandleSSE;
Ollama._CreateMessageStream('llama3', 'Summarise RFC 6455');
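The payloads differ per provider even though the handler is shared. A minimal sketch for these two, assuming aData holds the raw JSON of one chunk: a Gemini streamGenerateContent response, and one line of Ollama's native chat stream (which Ollama sends as one JSON object per line, with "done": true on the last). It also assumes _CreateMessageStream targets Ollama's /api/chat endpoint. Both functions and the unit are hypothetical, not part of sgcWebSockets.

```pascal
unit StreamChunkHelpers; // hypothetical unit name

interface

// Gemini: text of the first part of the first candidate in one chunk
function ExtractGeminiText(const aData: string; out Text: string): Boolean;
// Ollama /api/chat: message.content of one line, plus the "done" flag
function ExtractOllamaText(const aData: string; out Text: string;
  out Done: Boolean): Boolean;

implementation

uses
  System.JSON;

function ExtractGeminiText(const aData: string; out Text: string): Boolean;
var
  Json: TJSONValue;
begin
  Text := '';
  Json := TJSONObject.ParseJSONValue(aData);
  if Json = nil then
    Exit(False);
  try
    // Simplification: responses with several parts are read from parts[0] only
    Result := Json.TryGetValue<string>('candidates[0].content.parts[0].text', Text);
  finally
    Json.Free;
  end;
end;

function ExtractOllamaText(const aData: string; out Text: string;
  out Done: Boolean): Boolean;
var
  Json: TJSONValue;
begin
  Text := '';
  Done := False;
  Json := TJSONObject.ParseJSONValue(aData);
  if Json = nil then
    Exit(False);
  try
    Json.TryGetValue<Boolean>('done', Done); // True on the final line
    Result := Json.TryGetValue<string>('message.content', Text);
  finally
    Json.Free;
  end;
end;

end.
```

Keeping one small extractor per provider is what makes switching backends a localized edit: the handler calls the extractor for the active provider and the UI code stays unchanged.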

Four providers, one event, one method call each. Switching the streaming backend is a localized edit, not a rewrite.

Updating the UI safely

A couple of practical notes for the handler. First, keep the work inside OnHTTPAPISSE small — append the delta and return. Heavy per-token processing will make the stream feel choppy, so accumulate text and do expensive formatting once the stream finishes. Second, mind the thread context. If you start the request from a background thread, the SSE event fires on that thread, and touching VCL or FMX controls off the main thread is not safe. In that case marshal the update back with TThread.Synchronize (or TThread.Queue for a non-blocking append):

procedure TForm1.HandleSSE(Sender: TObject;
  const aEvent, aData: string; var Cancel: Boolean);
var
  Delta: string;
begin
  Delta := aData; // copy the parameter into a local the anonymous method can capture
  TThread.Queue(nil,
    procedure
    begin
      Memo1.SelStart := Length(Memo1.Text);
      Memo1.SelText := Delta; // insert at the end without reassigning the whole Text
    end);
end;

Appending with SelText rather than Lines.Add keeps the tokens flowing as one running text instead of one token per line, and it avoids reassigning the whole Text on every token, which keeps a long stream smooth. If you call the API from the main thread, you can drop the TThread.Queue wrapper and update the control directly.
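The advice to keep the handler small and format once at the end can be taken one step further by not touching the UI on every token at all. A minimal sketch, assuming a hypothetical TDeltaBuffer class (not part of sgcWebSockets): the handler only appends to a lock-protected buffer, a TTimer on the form drains the pending text a few times per second, and FullText returns the whole answer for one final formatting pass.

```pascal
unit DeltaBufferUnit; // hypothetical unit name

interface

uses
  System.SysUtils, System.SyncObjs;

type
  // Collects deltas from the SSE handler, whichever thread it runs on
  TDeltaBuffer = class
  private
    FLock: TCriticalSection;
    FPending: TStringBuilder; // deltas not yet shown in the UI
    FFull: TStringBuilder;    // every delta received so far
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(const Delta: string); // call from OnHTTPAPISSE
    function TakePending: string;       // call from the UI thread, e.g. OnTimer
    function FullText: string;          // the whole answer so far
  end;

implementation

constructor TDeltaBuffer.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  FPending := TStringBuilder.Create;
  FFull := TStringBuilder.Create;
end;

destructor TDeltaBuffer.Destroy;
begin
  FFull.Free;
  FPending.Free;
  FLock.Free;
  inherited;
end;

procedure TDeltaBuffer.Add(const Delta: string);
begin
  FLock.Enter;
  try
    FPending.Append(Delta);
    FFull.Append(Delta);
  finally
    FLock.Leave;
  end;
end;

function TDeltaBuffer.TakePending: string;
begin
  FLock.Enter;
  try
    Result := FPending.ToString;
    FPending.Clear; // the UI now owns this batch
  finally
    FLock.Leave;
  end;
end;

function TDeltaBuffer.FullText: string;
begin
  FLock.Enter;
  try
    Result := FFull.ToString;
  finally
    FLock.Leave;
  end;
end;

end.
```

In HandleSSE you would call FBuffer.Add(aData); in the timer's OnTimer, append FBuffer.TakePending to the Memo when it is not empty. This trades up to one timer interval of latency for far fewer Memo updates on fast streams.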

Getting started

All of these components ship in sgcWebSockets. Grab the free trial, drop in the component for the provider you want, assign OnHTTPAPISSE and call the streaming method — you will have a token-by-token UI in a few lines. See the OpenAI component page and the Anthropic component page for the full method reference, or browse every model on the AI & LLM components hub.

Questions or feedback? Get in touch — you will get a reply from the people who wrote the code.