Anthropic Claude in Delphi — Complete Tutorial (2026)


Why a Native Delphi Component for Claude?

Anthropic Claude is one of the most capable AI families on the market, but every public example uses Python or Node. For Delphi and C++Builder developers, calling the REST endpoints by hand means hand-rolling JSON, juggling Server-Sent Events, managing TLS, handling rate limits, and rewriting boilerplate every time Anthropic ships a new feature — which they do roughly every quarter. The TsgcHTTP_API_Anthropic component shipped with sgcWebSockets removes that friction. It is a strongly-typed wrapper around the full Anthropic surface — messages, streaming, vision, tool use, extended thinking, prompt caching, files, batches, and the Model Context Protocol connector — that you can drop on a form and use from any VCL, FMX, or console application.

This tutorial walks through every major capability with working Delphi code. By the end you will be able to build a chat client, a vision-enabled document analyser, an agentic tool runner, and a cost-optimised production pipeline. All snippets target the dated model IDs claude-sonnet-4-20250514 and claude-opus-4-20250514 and, unless a comment says otherwise, run unchanged on Delphi 7 through Delphi 13.

A quick note on philosophy before we dive in. The component intentionally exposes two surfaces. The "quick" surface (methods like _CreateMessage, _CreateMessageStream, _CreateMessageWithImage) accepts a few strings and returns a string, which is perfect for prototypes, demos, and the 80% of calls where you do not care about temperature, top-p, metadata, or stop sequences. The "typed" surface (classes like TsgcAnthropicClass_Request_Messages and TsgcAnthropicClass_Response_Messages) gives you full control over every parameter the Anthropic API supports, with strong typing and IDE auto-completion. Use the quick API to learn; move to the typed API for production. Same component, two layers, no duplication.

1. Setup and Your First Message

Add sgcHTTP_API_Anthropic to your uses clause, create the component, set your API key (get one from console.anthropic.com), and call _CreateMessage. This is the absolute minimum to talk to Claude.

uses
  sgcHTTP_API_Anthropic;

var
  oClaude: TsgcHTTP_API_Anthropic;
  vReply : string;
begin
  oClaude := TsgcHTTP_API_Anthropic.Create(nil);
  try
    oClaude.AnthropicOptions.ApiKey := 'sk-ant-api03-...';
    vReply := oClaude._CreateMessage(
      'claude-sonnet-4-20250514',
      'Write a haiku about Pascal compilers.');
    ShowMessage(vReply);
  finally
    oClaude.Free;
  end;
end;

The component does the heavy lifting: it builds the JSON request body, sets the x-api-key and anthropic-version headers, posts to /v1/messages, and parses the response into a plain Delphi string. If you need full control over request parameters, use the typed TsgcAnthropicClass_Request_Messages class.

One operational tip: never bake the API key into the binary or the source. Read it from an environment variable, a registry key, or a secrets manager. Anthropic participates in GitHub secret scanning, so keys pushed to public repositories are detected and deactivated automatically, and you do not want to ship an update at 6pm on a Friday because someone committed your .pas file with the key in it.
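As a minimal sketch of the environment-variable option, the console program below reads the key at start-up and fails loudly if it is missing. The variable name ANTHROPIC_API_KEY and the helper LoadApiKey are assumptions made for this example; neither is part of the component.

```pascal
program ApiKeyFromEnv;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the value of the given environment variable
// and raises an exception when it is missing or blank.
function LoadApiKey(const aVarName: string): string;
begin
  Result := Trim(GetEnvironmentVariable(aVarName));
  if Result = '' then
    raise Exception.CreateFmt('Environment variable %s is not set', [aVarName]);
end;

begin
  try
    // In a real application you would assign the result to
    // oClaude.AnthropicOptions.ApiKey instead of printing its length.
    Writeln('Key loaded, length = ', Length(LoadApiKey('ANTHROPIC_API_KEY')));
  except
    on E: Exception do
      Writeln('Configuration error: ', E.Message);
  end;
end.
```

Failing at start-up rather than on the first request makes a missing key show up in deployment instead of in front of a user.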

2. Streaming Responses with SSE

Synchronous calls are fine for short prompts, but for a chat UI you want tokens to appear as Claude generates them. Anthropic streams responses as Server-Sent Events, and the component exposes them through the OnHTTPAPISSE event.

procedure TForm1.FormCreate(Sender: TObject);
begin
  oClaude := TsgcHTTP_API_Anthropic.Create(Self);
  oClaude.AnthropicOptions.ApiKey := 'sk-ant-api03-...';
  oClaude.OnHTTPAPISSE := ClaudeSSE;
end;

procedure TForm1.ClaudeSSE(Sender: TObject;
  const aEvent, aData: string; var Cancel: Boolean);
var
  vDelta: string;
begin
  // Event order: message_start, content_block_start, content_block_delta
  // (repeated), content_block_stop, message_delta, message_stop.
  if aEvent = 'content_block_delta' then
  begin
    vDelta := oClaude.SSEExtractText(aData);
    // The event fires on a background thread, so hand the UI update to the
    // main thread. Anonymous methods need Delphi 2009 or later.
    TThread.Queue(nil,
      procedure
      begin
        Memo1.Text := Memo1.Text + vDelta;
      end);
  end;
end;

procedure TForm1.btnAskClick(Sender: TObject);
begin
  Memo1.Clear;
  oClaude._CreateMessageStream(
    'claude-sonnet-4-20250514',
    edtPrompt.Text);
end;

Two details are worth noting. First, streaming runs on a background thread, so the handler must not touch VCL or FMX controls directly; the snippet above hands each update to the main thread with TThread.Queue (TThread.Synchronize also works if you need to block until the UI has been updated). Second, the SSE stream sends several event types in sequence (message_start, content_block_start, repeated content_block_delta, content_block_stop, message_delta, message_stop) and you should ignore the ones you do not need. The helper SSEExtractText handles the common case of pulling the text delta out of content_block_delta; for usage stats and stop reasons you parse message_delta directly.
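For the message_delta case, a small parser along the following lines is usually enough. This is a sketch, not component code: it assumes a recent Delphi whose System.JSON supports dotted paths in TryGetValue, the function name ParseMessageDelta is made up for the example, and the sample payload follows the shape Anthropic documents for message_delta events.

```pascal
program MessageDeltaDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: pulls stop_reason and the output token count out of a
// message_delta payload. Returns False when aData is not a JSON object.
function ParseMessageDelta(const aData: string; out aStopReason: string;
  out aOutputTokens: Integer): Boolean;
var
  vJSON: TJSONValue;
begin
  aStopReason   := '';
  aOutputTokens := 0;
  vJSON := TJSONObject.ParseJSONValue(aData);   // nil on malformed input
  try
    Result := vJSON is TJSONObject;
    if Result then
    begin
      // Missing fields simply leave the defaults in place.
      vJSON.TryGetValue<string>('delta.stop_reason', aStopReason);
      vJSON.TryGetValue<Integer>('usage.output_tokens', aOutputTokens);
    end;
  finally
    vJSON.Free;
  end;
end;

var
  vStop  : string;
  vTokens: Integer;
begin
  if ParseMessageDelta(
    '{"type":"message_delta",' +
    '"delta":{"stop_reason":"end_turn","stop_sequence":null},' +
    '"usage":{"output_tokens":15}}', vStop, vTokens) then
    Writeln('stop_reason=', vStop, ' output_tokens=', vTokens);
end.
```

In the SSE handler you would call it when aEvent = 'message_delta' and pass aData straight through.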

Streaming is essential for any user-facing chat UI — users perceive a response that starts in 400 ms as fast, even if the full answer takes ten seconds. Without streaming, they stare at a spinner for ten seconds and assume the app is broken. The cost is identical: streaming and non-streaming requests are billed the same.

3. Vision — Sending Images

Claude can analyse JPEG, PNG, GIF, and WebP images. You pass them either as a public URL or as base64-encoded bytes. The component exposes _CreateMessageWithImage for the URL case and the typed API for everything else.

var
  oRequest : TsgcAnthropicClass_Request_Messages;
  oMessage : TsgcAnthropicClass_Request_Message;
  oImage   : TsgcAnthropicClass_Request_Content_Image;
  oResponse: TsgcAnthropicClass_Response_Messages;
begin
  oRequest := TsgcAnthropicClass_Request_Messages.Create;
  try
    oRequest.Model     := 'claude-sonnet-4-20250514';
    oRequest.MaxTokens := 1024;

    oMessage := oRequest.NewMessage('user');
    oMessage.AddText('Describe what you see and read any text.');

    oImage := oMessage.AddImage;
    oImage.Source.LoadFromFile('C:\invoices\inv-2026-05-12.png');
    oImage.MediaType := 'image/png';

    oResponse := oClaude.CreateMessage(oRequest);
    try
      Memo1.Lines.Add(oResponse.Content[0].Text);
    finally
      oResponse.Free;
    end;
  finally
    oRequest.Free;
  end;
end;

Vision is ideal for OCR on scanned invoices, screenshot triage in support tickets, chart interpretation, and any task where a deterministic OCR engine would struggle with layout. Watch the token cost: Anthropic's rule of thumb is width × height / 750, so a 1024x1024 image consumes roughly 1,400 input tokens. Anthropic resizes anything larger than 1568px on the long edge before processing, so there is no point uploading 4K screenshots; downscale on your side and save the bandwidth.
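To budget image costs before sending anything, a rough estimator can combine the 1568px long-edge limit with Anthropic's documented rule of thumb of about width × height / 750 tokens. The sketch below is an approximation only: the function names are invented for this example, and it ignores any further server-side downscaling Anthropic may apply to large images.

```pascal
program ImageTokenEstimate;
{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

const
  MAX_LONG_EDGE    = 1568;  // long-edge limit before Anthropic resizes
  PIXELS_PER_TOKEN = 750;   // Anthropic's published rule of thumb

// Scales the dimensions down so the long edge fits MAX_LONG_EDGE, keeping
// the aspect ratio. Images that already fit are left unchanged.
procedure FitLongEdge(var aWidth, aHeight: Integer);
var
  vLong : Integer;
  vScale: Double;
begin
  vLong := Max(aWidth, aHeight);
  if vLong <= MAX_LONG_EDGE then
    Exit;
  vScale  := MAX_LONG_EDGE / vLong;
  aWidth  := Max(1, Round(aWidth * vScale));
  aHeight := Max(1, Round(aHeight * vScale));
end;

// Approximate input tokens for an image of the given pixel size.
function EstimateImageTokens(aWidth, aHeight: Integer): Integer;
begin
  FitLongEdge(aWidth, aHeight);
  Result := Round(aWidth * aHeight / PIXELS_PER_TOKEN);
end;

begin
  Writeln('1024x1024 -> ', EstimateImageTokens(1024, 1024), ' tokens');
  Writeln('3840x2160 -> ', EstimateImageTokens(3840, 2160), ' tokens (after downscale)');
end.
```

FitLongEdge doubles as the target size for your own client-side downscale, which is where the bandwidth saving comes from.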

Practical use cases we have seen Delphi shops ship in the last year: extracting line items from supplier PDFs that were too inconsistent for traditional OCR pipelines, classifying medical imagery into broad categories before routing to specialist software, reading meter values from field-service photos, and triaging UI-bug screenshots in helpdesk tickets ("does the screenshot show a layout problem or a data problem?"). In every case the win was not raw accuracy — it was eliminating the need to write and maintain a brittle, per-document parser.

4. Tool Use (Function Calling)

Tool use lets Claude decide when to call your Pascal functions. You declare each tool with a name, description, and JSON Schema for its parameters. When Claude responds with a tool_use block instead of plain text, you execute the call and feed the result back into the conversation.

var
  oRequest : TsgcAnthropicClass_Request_Messages;
  oResponse: TsgcAnthropicClass_Response_Messages;
  oTool    : TsgcAnthropicClass_Request_Tool;
  vSymbol  : string;
  vPrice   : TQuote;           // placeholder: your own record with Bid/Ask
  vFS      : TFormatSettings;
begin
  oRequest := TsgcAnthropicClass_Request_Messages.Create;
  try
    oRequest.Model     := 'claude-sonnet-4-20250514';
    oRequest.MaxTokens := 1024;

    oTool := oRequest.NewTool;
    oTool.Name        := 'get_stock_price';
    oTool.Description := 'Return the current bid/ask for a US ticker symbol.';
    oTool.InputSchema :=
      '{"type":"object",' +
       '"properties":{"symbol":{"type":"string","description":"Ticker, e.g. AAPL"}},' +
       '"required":["symbol"]}';

    oRequest.NewMessage('user').AddText('What is Apple trading at?');

    oResponse := oClaude.CreateMessage(oRequest);
    try
      if oResponse.StopReason = 'tool_use' then
      begin
        vSymbol := oResponse.ToolUse[0].InputAsJSON.S['symbol'];
        vPrice  := MyQuoteFeed.Quote(vSymbol);        // your code
        // Force "." as the decimal separator so the JSON stays valid on
        // locales that use a comma (the only field %f needs).
        vFS.DecimalSeparator := '.';
        oClaude.SendToolResult(oResponse.ToolUse[0].Id,
          Format('{"bid":%.2f,"ask":%.2f}', [vPrice.Bid, vPrice.Ask], vFS));
      end;
    finally
      oResponse.Free;
    end;
  finally
    oRequest.Free;
  end;
end;

Build agentic workflows by chaining tools: a research agent might combine web_search, read_pdf, and send_email tools. Always keep an iMaxIterations guard so a misbehaving model cannot loop forever. In production we cap at five tool calls per user turn for cost reasons; if Claude needs more, it is usually a sign the prompt or the tool design is wrong.
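The iteration guard can be as small as a counter around the request loop. The sketch below simulates the model with stub functions so it runs on its own; TModelStep, RunAgentTurn and both stubs are hypothetical names for this example, and in a real agent the step would be a CreateMessage call that reports whether the stop reason was tool_use.

```pascal
program ToolLoopGuard;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  iMaxIterations = 5;   // cap on tool calls per user turn

type
  // Hypothetical stand-in for one model round trip: returns True while the
  // model asks for another tool call, False once it gives a final answer.
  TModelStep = function(aTurn: Integer): Boolean;

// Simulates a misbehaving model that never stops calling tools.
function StubModelWantsTool(aTurn: Integer): Boolean;
begin
  Result := True;
end;

// Simulates a well-behaved model that needs two tool calls.
function StubModelStopsAfterTwo(aTurn: Integer): Boolean;
begin
  Result := aTurn <= 2;
end;

// Runs one user turn and returns the number of tool calls executed.
// Raises once the model asks for more than iMaxIterations calls.
function RunAgentTurn(aStep: TModelStep): Integer;
begin
  Result := 0;
  while aStep(Result + 1) do
  begin
    if Result >= iMaxIterations then
      raise Exception.CreateFmt(
        'Aborted: model requested more than %d tool calls', [iMaxIterations]);
    // Execute the requested tool and send its result back here.
    Inc(Result);
  end;
end;

begin
  Writeln('Well-behaved model used ', RunAgentTurn(StubModelStopsAfterTwo), ' tool calls');
  try
    RunAgentTurn(StubModelWantsTool);
  except
    on E: Exception do
      Writeln(E.Message);
  end;
end.
```

Raising an exception rather than silently returning makes a runaway loop show up in your logs, where the offending prompt or tool definition can be inspected.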

The single biggest determinant of tool-calling quality is the description text. In our experience, models pick the right tool with the right arguments roughly 99% of the time when descriptions are precise ("Return the current bid/ask for a US ticker symbol. Use this only for equities, not for crypto or FX"); they drop to maybe 70% with a vague description ("Get a price"). Spend the time. Add examples in the description. State what the tool does NOT do. Future-you, debugging a $0.40 hallucinated function call at 11pm, will thank present-you.

5. Extended Thinking

Claude 4 models support extended thinking (first introduced with Claude 3.7 Sonnet), a mode where the model reasons through a problem step by step before answering. You allocate a thinking budget in tokens, and Claude returns the reasoning trace separately from the final answer. It pays off most on math, code review, and multi-step analysis.

oRequest.Thinking.Enabled       := True;
oRequest.Thinking.BudgetTokens  := 8000;   // soft cap on internal reasoning
oRequest.MaxTokens              := 16000;

oRequest.NewMessage('user').AddText(
  'A train leaves Madrid at 07:00 doing 220 km/h. Another leaves ' +
  'Barcelona at 07:15 doing 250 km/h. The route is 621 km. ' +
  'Where do they meet?');

oResponse := oClaude.CreateMessage(oRequest);
MemoThinking.Lines.Text := oResponse.Thinking;   // reasoning trace
MemoAnswer.Lines.Text   := oResponse.Content[0].Text;

Use extended thinking sparingly — reasoning tokens are billed as output, so a 16k-token thinking budget on Opus 4 can easily cost more than a normal call. Reserve it for problems where correctness matters more than latency. Good fits: legal document analysis, financial reconciliation, complex SQL generation, debugging stack traces, multi-constraint scheduling. Bad fits: chat replies, content classification, simple lookups — the thinking time and cost are not justified.

A useful trick is to expose the reasoning trace in your UI as a collapsible "show thinking" section, like the public Claude app does. Power users love seeing how the model arrived at the answer; casual users ignore it. Either way you have an audit trail for free.
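Where a closed-form answer exists, it is worth checking the model against it while you tune the thinking budget. For the train prompt above, and assuming the two trains run toward each other at constant speed on the same 621 km route, the meeting point works out as follows (the function name is an assumption for this sketch):

```pascal
program TrainMeetingPoint;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Distance from the first train's origin (km) at which the trains meet.
// aHeadStartHours is how long train A travels before train B departs.
function MeetingKmFromOrigin(aRouteKm, aSpeedA, aSpeedB,
  aHeadStartHours: Double): Double;
var
  vHeadStartKm, vHoursAfterB: Double;
begin
  vHeadStartKm := aSpeedA * aHeadStartHours;   // A is alone on the track
  // After B departs, the gap closes at the sum of both speeds.
  vHoursAfterB := (aRouteKm - vHeadStartKm) / (aSpeedA + aSpeedB);
  Result := vHeadStartKm + aSpeedA * vHoursAfterB;
end;

begin
  // Madrid 07:00 at 220 km/h, Barcelona 07:15 at 250 km/h, 621 km apart.
  Writeln(Format('Trains meet about %.1f km from Madrid',
    [MeetingKmFromOrigin(621, 220, 250, 0.25)]));
end.
```

The result is roughly 320 km from Madrid, about 72 minutes after the second train departs, which gives you a reference value to compare with oResponse.Content[0].Text.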

6. Prompt Caching

If you keep sending the same long system prompt, knowledge base, or tool definitions, prompt caching can cut costs by up to 90% and reduce time-to-first-token by up to 85% on long prompts. You mark a content block as cacheable; Anthropic stores it on their side for 5 minutes, refreshed on every hit (or 1 hour with the extended cache), and bills only the cheaper cache-read price on subsequent calls.

var
  oSystem: TsgcAnthropicClass_Request_System;
begin
  oSystem := oRequest.NewSystemBlock;
  oSystem.Text         := LoadFile('C:\kb\product-manual.txt'); // 50k tokens
  oSystem.CacheControl := 'ephemeral';        // mark as cacheable

  oRequest.NewMessage('user').AddText('How do I configure SSL on the server?');
  oResponse := oClaude.CreateMessage(oRequest);

  // Inspect cache stats
  ShowMessage(Format('Cache: created=%d, read=%d, input=%d, output=%d',
    [oResponse.Usage.CacheCreationInputTokens,
     oResponse.Usage.CacheReadInputTokens,
     oResponse.Usage.InputTokens,
     oResponse.Usage.OutputTokens]));
end;

Rule of thumb: anything over 1,024 tokens that you reuse within five minutes is worth caching. Big documentation corpora, few-shot examples, and large tool schemas are the obvious candidates. The accounting: cache writes cost 25% more than a normal input token, cache reads cost 10% of a normal input token. The write premium is therefore repaid by the very first cache hit: two calls cost 1.35 times the uncached prefix price instead of 2 times, and every further hit saves another 90%. For a customer support bot answering 50 questions per minute against a 40k-token knowledge base, prompt caching typically cuts the monthly Anthropic bill by 80–85%.
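The accounting is easy to check with a few lines of arithmetic. The sketch below expresses cost in units of "one uncached send of the prefix" and assumes every call after the first arrives within the cache lifetime; the function names are illustrative only.

```pascal
program CacheBreakEven;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  WRITE_MULTIPLIER = 1.25;  // cache write: 25% above the normal input price
  READ_MULTIPLIER  = 0.10;  // cache read: 10% of the normal input price

// Cost of sending the same prefix aCalls times without caching.
function CostWithoutCache(aCalls: Integer): Double;
begin
  Result := aCalls;
end;

// Cost with caching: one write, then a cheap read on every later call.
function CostWithCache(aCalls: Integer): Double;
begin
  if aCalls <= 0 then
    Result := 0
  else
    Result := WRITE_MULTIPLIER + READ_MULTIPLIER * (aCalls - 1);
end;

var
  i: Integer;
begin
  for i := 1 to 5 do
    Writeln(Format('%d call(s): uncached %.2f, cached %.2f',
      [i, CostWithoutCache(i), CostWithCache(i)]));
end.
```

A single call costs more with caching (1.25 versus 1.00), but from the second call onward the cached path is cheaper, and the gap widens with every hit.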

You can mark up to four content blocks as cacheable per request. A common pattern is: tools (cacheable, rarely changes), system prompt (cacheable, rarely changes), large document (cacheable, changes per session), recent messages (NOT cacheable, changes every turn). The component handles this layering naturally — just set CacheControl on whichever blocks you want cached.

7. MCP Connector

The Model Context Protocol lets Claude talk to remote tool servers without you having to wrap each tool by hand. Point the component at an MCP server URL and Claude can discover the tools, call them, and chain the results.

oRequest.MCPServers.Add(
  'weather-mcp',
  'https://mcp.example.com/weather',
  'Bearer ' + GetMcpToken);

oRequest.NewMessage('user').AddText(
  'What is the weather like in Madrid and should I take an umbrella?');

oResponse := oClaude.CreateMessage(oRequest);
ShowMessage(oResponse.Content[0].Text);

Combine the MCP connector with your own TsgcAI_MCP_Server (covered in a separate tutorial) and you have a fully Delphi-native agent that exposes your domain APIs to any MCP-aware AI client: Claude Desktop, Cursor, Continue, your own apps, anything that speaks the protocol. Authentication is your responsibility: pass a bearer token or signed header, and validate on the server side. Note that the token travels inside your API request and Anthropic's servers present it to the MCP server on your behalf, so treat it as a credential you are sharing with Anthropic and keep it short-lived and narrowly scoped.

For multi-tenant SaaS deployments, the typical pattern is a single MCP server per tenant, with the tenant ID embedded in the URL or the bearer token. Claude calls each tenant's tools without ever cross-contaminating data. We have seen production deployments fan out to 200+ MCP servers from a single conversation.

Production Checklist

Before you flip an Anthropic-powered Delphi app into production, walk through this short list:

Concern | How to handle it
API key storage | Environment variable or OS secret store, never hard-coded
Retries on 429 / 529 | Exponential backoff with jitter, max 3 attempts
Cost ceilings | Track Usage per request, refuse new calls past daily budget
PII redaction | Strip emails/SSN/CC numbers before sending to the API
Model version pinning | Use full dated model names; do not rely on "latest" aliases
Prompt versioning | Store system prompts in source control alongside code
Telemetry | Log model, input tokens, output tokens, latency per call
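The retry row is the one most often implemented wrongly, so here is a minimal sketch of exponential backoff with full jitter and a three-attempt cap. It does not use the component: TSendRequest stands in for whatever call you make and is assumed to return the HTTP status code, and the base and maximum delays are example values, not recommendations from Anthropic.

```pascal
program BackoffDelays;
{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

const
  MAX_ATTEMPTS  = 3;
  BASE_DELAY_MS = 500;    // assumed starting delay
  MAX_DELAY_MS  = 8000;   // assumed upper bound

type
  // Hypothetical request callback: performs the call, returns the HTTP status.
  TSendRequest = function: Integer;

var
  gCalls: Integer = 0;

// 429 (rate limited) and 529 (overloaded) are worth retrying.
function IsRetryable(aStatus: Integer): Boolean;
begin
  Result := (aStatus = 429) or (aStatus = 529);
end;

// Full jitter: a random delay between 0 and min(cap, base * 2^(attempt-1)).
function BackoffDelayMs(aAttempt: Integer): Integer;
var
  vCeiling: Integer;
begin
  vCeiling := Min(MAX_DELAY_MS, BASE_DELAY_MS * (1 shl (aAttempt - 1)));
  Result := Random(vCeiling + 1);
end;

// Calls aSend until it returns a non-retryable status or attempts run out.
function SendWithRetry(aSend: TSendRequest): Integer;
var
  vAttempt: Integer;
begin
  vAttempt := 1;
  repeat
    Result := aSend();
    if (not IsRetryable(Result)) or (vAttempt = MAX_ATTEMPTS) then
      Exit;
    Sleep(BackoffDelayMs(vAttempt));
    Inc(vAttempt);
  until False;
end;

// Stub that simulates a permanently overloaded API.
function StubAlwaysOverloaded: Integer;
begin
  Inc(gCalls);
  Result := 529;
end;

begin
  Randomize;
  Writeln('Final status: ', SendWithRetry(StubAlwaysOverloaded));
  Writeln('Attempts made: ', gCalls);
end.
```

The jitter matters when many clients hit a limit at the same moment: randomising the delay spreads their retries out instead of sending them back in lockstep.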

Where to Go Next

This tutorial covered the seven features you need 95% of the time. The component also supports message batches (cheap async processing of thousands of prompts, 50% cheaper than synchronous calls and ideal for nightly enrichment jobs), the Files API (upload once and reference by ID, perfect for big PDFs you query repeatedly), token counting (estimate costs before paying), and structured JSON outputs (forced schema conformance, no more parse errors). Browse the Anthropic component page for the full feature matrix and head to the Getting Started hub if you have not installed sgcWebSockets yet.

And if you build something interesting with Claude in Delphi — an agent, a copilot, a doc analyser — tell us. We love seeing what Pascal developers do once the AI friction disappears.