Google Vertex AI over gRPC in Delphi

9 June 2026 · Components

Vertex AI is Google's generative-AI platform on Google Cloud. It exposes the Gemini models through a gRPC PredictionService, where a GenerateContent call sends your prompt and returns the model's answer. sgcWebSockets Enterprise ships a typed Vertex AI gRPC client that sits on top of TsgcGRPCClient, so you can call Gemini from Delphi and C++Builder directly over gRPC, without any external runtime and without hand-assembling Protocol Buffers.

How it works

gRPC carries Protocol Buffers messages framed over HTTP/2, so a Vertex AI call is an HTTP/2 POST to the regional <region>-aiplatform.googleapis.com endpoint, for example us-central1-aiplatform.googleapis.com. The transport is a TsgcHTTP2Client pointed at that host on port 443 with TLS enabled, and TsgcGRPCClient handles the message framing, headers, timeouts and trailers on top of it.
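To make "framed" concrete, here is a minimal sketch of the wire format described by the gRPC-over-HTTP/2 specification: each serialized message is preceded by a 1-byte compressed flag and a 4-byte big-endian length, and the method is addressed by the HTTP/2 path /package.Service/Method. TsgcGRPCClient does this for you; FrameGrpcMessage and GrpcPath are hypothetical helpers for illustration, not part of the sgcWebSockets API.

```pascal
program GrpcFrameSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Wraps one serialized protobuf message in the 5-byte gRPC prefix:
// 1 byte compressed flag (0 = uncompressed) + 4-byte big-endian length.
function FrameGrpcMessage(const aPayload: TBytes): TBytes;
var
  Len: Cardinal;
begin
  Len := Length(aPayload);
  SetLength(Result, 5 + Length(aPayload));
  Result[0] := 0;                 // compressed flag: not compressed
  Result[1] := Byte(Len shr 24);  // length, most significant byte first
  Result[2] := Byte(Len shr 16);
  Result[3] := Byte(Len shr 8);
  Result[4] := Byte(Len);
  if Length(aPayload) > 0 then
    Move(aPayload[0], Result[5], Length(aPayload));
end;

// gRPC maps a method call to an HTTP/2 POST on /<package.Service>/<Method>.
function GrpcPath(const aService, aMethod: string): string;
begin
  Result := '/' + aService + '/' + aMethod;
end;

var
  Frame: TBytes;
begin
  Frame := FrameGrpcMessage(TBytes.Create($0A, $03, $61, $62, $63));
  Writeln('Frame size: ', Length(Frame));  // 5 payload bytes + 5 prefix bytes = 10
  Writeln(GrpcPath('google.cloud.aiplatform.v1.PredictionService',
    'GenerateContent'));
end.
```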

The Vertex AI helper is a set of typed message classes in sgcGRPC_Google_VertexAI. You build a TsgcGRPCVertexAIGenerateContentRequest in code, serialize it with ToBytes, hand the bytes to the gRPC client, then load the reply bytes back into a TsgcGRPCVertexAIGenerateContentResponse. The request mirrors the Vertex AI schema: a request carries a model resource name and a list of contents, each content has a role and one or more parts, and a part holds either text or inline binary data.
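For a feel of what ToBytes produces, the sketch below hand-encodes a minimal GenerateContentRequest with the standard protobuf rules (varint tags, length-delimited fields). The field numbers (contents = 2 and model = 5 on the request, role = 1 and parts = 2 on Content, text = 1 on Part) are taken from the public google.cloud.aiplatform.v1 protos and are worth checking against the current proto files. The helper names are hypothetical, and the exact byte order may differ from the component's own serializer, since protobuf parsers accept fields in any order.

```pascal
program VertexRequestProtoSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Appends an unsigned base-128 varint: 7 bits per byte, low bits first,
// high bit set on every byte except the last.
procedure AppendVarint(var aBuf: TBytes; aValue: UInt64);
var
  B: Byte;
begin
  repeat
    B := Byte(aValue and $7F);
    aValue := aValue shr 7;
    if aValue <> 0 then
      B := B or $80;
    SetLength(aBuf, Length(aBuf) + 1);
    aBuf[High(aBuf)] := B;
  until aValue = 0;
end;

// Appends a length-delimited field (wire type 2): tag, byte length, payload.
procedure AppendBytesField(var aBuf: TBytes; aField: Cardinal; const aData: TBytes);
var
  Start: Integer;
begin
  AppendVarint(aBuf, (UInt64(aField) shl 3) or 2);
  AppendVarint(aBuf, Length(aData));
  Start := Length(aBuf);
  SetLength(aBuf, Start + Length(aData));
  if Length(aData) > 0 then
    Move(aData[0], aBuf[Start], Length(aData));
end;

// Protobuf strings are length-delimited fields holding UTF-8 bytes.
procedure AppendStringField(var aBuf: TBytes; aField: Cardinal; const aText: string);
begin
  AppendBytesField(aBuf, aField, TEncoding.UTF8.GetBytes(aText));
end;

var
  Part, Content, Request: TBytes;
begin
  // Part { text = 1 }
  AppendStringField(Part, 1, 'Explain gRPC in one sentence.');
  // Content { role = 1, parts = 2 (repeated Part) }
  AppendStringField(Content, 1, 'user');
  AppendBytesField(Content, 2, Part);
  // GenerateContentRequest { contents = 2 (repeated Content), model = 5 }
  AppendBytesField(Request, 2, Content);
  AppendStringField(Request, 5, 'projects/my-project/locations/us-central1/' +
    'publishers/google/models/gemini-2.0-flash');
  Writeln('Serialized request: ', Length(Request), ' bytes');
end.
```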

Vertex AI uses Google Cloud service-account authentication. A Google Cloud client component signs a service-account JWT, and the gRPC client sends it as an authorization: Bearer header in the call metadata, so every call is authenticated against the regional Vertex AI endpoint.

Authenticating with a service account

The Google Cloud client builds a self-signed JWT from your service-account JSON key and signs it with the account's private key. The JWT is audience-bound, so its audience must be the regional Vertex AI endpoint; otherwise aiplatform.googleapis.com rejects the call with UNAUTHENTICATED. Once the token arrives, add it to the gRPC client's DefaultMetadata so it travels with every call.

uses
  sgcHTTP_Google_Cloud, sgcGRPC_Client;

// CloudClient is a TsgcHTTPGoogleCloud_PubSub_Client used here only to sign the JWT
CloudClient.GoogleCloudOptions.Authentication := gcaJWT;
CloudClient.GoogleCloudOptions.JWT.ClientEmail   := ClientEmail;
CloudClient.GoogleCloudOptions.JWT.PrivateKeyId  := PrivateKeyId;
CloudClient.GoogleCloudOptions.JWT.PrivateKey.Text := PrivateKey;
CloudClient.GoogleCloudOptions.JWT.ProjectId     := ProjectId;
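// the JWT audience: must be the regional endpoint, e.g. https://us-central1-aiplatform.googleapis.com/
CloudClient.GoogleCloudOptions.JWT.API_Endpoint  :=
  'https://' + Region + '-aiplatform.googleapis.com/';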

// after the token arrives in OnAuthToken
GRPC.DefaultMetadata.Clear;
GRPC.DefaultMetadata.Add('authorization', 'Bearer ' + Token);
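The audience binding is easiest to see in the token itself. The sketch below builds the unsigned header.claims part of a Google self-signed JWT with the fields Google documents for that flow (kid = private key ID, iss and sub = the service-account email, aud = the target endpoint, exp at most one hour after iat), here pointed at the same regional URL the snippet above assigns to API_Endpoint. It is illustration only: the RS256 signature, which the Google Cloud client computes from the private key, is left out; BuildUnsignedJwt and Base64UrlEncode are hypothetical helpers; and the JSON is built with Format without escaping, which is only safe for plain values such as emails and key IDs.

```pascal
program JwtClaimsSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.NetEncoding;

// Base64url without padding, as used for JWT segments (RFC 7515).
function Base64UrlEncode(const aText: string): string;
var
  Enc: TBase64Encoding;
begin
  Enc := TBase64Encoding.Create(0);  // 0 = no line breaks
  try
    Result := Enc.Encode(aText);     // encodes the UTF-8 bytes of aText
  finally
    Enc.Free;
  end;
  Result := StringReplace(Result, '+', '-', [rfReplaceAll]);
  Result := StringReplace(Result, '/', '_', [rfReplaceAll]);
  Result := StringReplace(Result, '=', '', [rfReplaceAll]);
end;

// Builds "base64url(header).base64url(claims)"; the signature is NOT added here.
// Values are inserted without JSON escaping (fine for emails and key IDs).
function BuildUnsignedJwt(const aClientEmail, aPrivateKeyId, aAudience: string;
  aIssuedAt: Int64): string;
var
  Header, Claims: string;
begin
  Header := Format('{"alg":"RS256","typ":"JWT","kid":"%s"}', [aPrivateKeyId]);
  Claims := Format('{"iss":"%s","sub":"%s","aud":"%s","iat":%d,"exp":%d}',
    [aClientEmail, aClientEmail, aAudience, aIssuedAt, aIssuedAt + 3600]);
  Result := Base64UrlEncode(Header) + '.' + Base64UrlEncode(Claims);
end;

begin
  Writeln(BuildUnsignedJwt('svc@my-project.iam.gserviceaccount.com', 'my-key-id',
    'https://us-central1-aiplatform.googleapis.com/', 1700000000));
end.
```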

Setting up the transport

Create a TsgcHTTP2Client for the regional host and assign it to the gRPC client. Vertex AI speaks application/grpc+proto, so leave the channel content type at grpcProto.

uses
  sgcHTTP2_Client, sgcGRPC_Client, sgcGRPC_Types;

HTTP2 := TsgcHTTP2Client.Create(nil);
HTTP2.Host := Region + '-aiplatform.googleapis.com';   // e.g. us-central1-...
HTTP2.Port := 443;
HTTP2.TLS  := True;

GRPC := TsgcGRPCClient.Create(nil);
GRPC.Client := HTTP2;
GRPC.ChannelOptions.ContentType := grpcProto;
GRPC.ChannelOptions.Compression := grpcNoCompression;

HTTP2.Active := True;
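For orientation, this sketch lists the HTTP/2 headers a unary gRPC call to this host carries under the gRPC-over-HTTP/2 protocol description: a POST to /package.Service/Method, the application/grpc+proto content type that grpcProto selects, te: trailers, an optional grpc-timeout deadline, and custom metadata such as the Bearer token. TsgcGRPCClient and TsgcHTTP2Client build these for you; BuildGrpcHeaders is a hypothetical helper, and the 30-second timeout and placeholder token are example values.

```pascal
program GrpcHeadersSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Fills aHeaders with the name=value pairs a unary gRPC request carries on HTTP/2.
procedure BuildGrpcHeaders(aHeaders: TStrings; const aRegion, aService,
  aMethod, aToken: string);
begin
  aHeaders.Clear;
  aHeaders.Add(':method=POST');                          // gRPC calls are POSTs
  aHeaders.Add(':scheme=https');                         // TLS on port 443
  aHeaders.Add(':path=/' + aService + '/' + aMethod);    // /package.Service/Method
  aHeaders.Add(':authority=' + aRegion + '-aiplatform.googleapis.com');
  aHeaders.Add('content-type=application/grpc+proto');   // the grpcProto content type
  aHeaders.Add('te=trailers');                           // required by gRPC
  aHeaders.Add('grpc-timeout=30S');                      // optional deadline, example: 30 seconds
  aHeaders.Add('authorization=Bearer ' + aToken);        // custom metadata
end;

var
  Headers: TStringList;
  i: Integer;
begin
  Headers := TStringList.Create;
  try
    BuildGrpcHeaders(Headers, 'us-central1',
      'google.cloud.aiplatform.v1.PredictionService', 'GenerateContent', '<token>');
    for i := 0 to Headers.Count - 1 do
      Writeln(Headers.Names[i], ': ', Headers.ValueFromIndex[i]);
  finally
    Headers.Free;
  end;
end.
```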

Generating content from a prompt

Build the request from the typed message classes. Model takes the full resource name of the publisher model, the single content has the user role, and the prompt goes into a text part. GRPC.Call blocks until the reply arrives and returns a TsgcGRPCResponse whose Data property holds the raw reply bytes, which you then load into a typed response.

uses
  sgcGRPC_Client, sgcGRPC_Types, sgcGRPC_Google_VertexAI;

var
  oRequest: TsgcGRPCVertexAIGenerateContentRequest;
  oContent: TsgcGRPCVertexAIContent;
  oPart: TsgcGRPCVertexAIPart;
  oResponse: TsgcGRPCResponse;
begin
  oRequest := TsgcGRPCVertexAIGenerateContentRequest.Create;
  try
    oRequest.Model := 'projects/' + ProjectId + '/locations/' + Region +
      '/publishers/google/models/' + Model;   // e.g. gemini-2.0-flash

    oContent := oRequest.AddContent;
    oContent.Role := 'user';
    oPart := oContent.AddPart;
    oPart.Text := 'Explain gRPC in one sentence.';

    oResponse := GRPC.Call(
      'google.cloud.aiplatform.v1.PredictionService', 'GenerateContent',
      oRequest.ToBytes);

    if oResponse.StatusCode = grpcOK then
      ParseResponse(oResponse.Data)
    else
      ShowMessage('gRPC error: ' + oResponse.StatusMessage);
  finally
    oRequest.Free;
  end;
end;
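When the call does not return OK, the numeric gRPC status tells you whether retrying makes sense: UNAUTHENTICATED (16) usually means an expired token or one with the wrong audience, while RESOURCE_EXHAUSTED (8), which is how Vertex AI reports exceeded quota, and UNAVAILABLE (14) are transient. The sketch below names the 17 standard gRPC codes and applies a common retry-with-exponential-backoff policy. The policy is a convention, not something sgcWebSockets or Vertex AI prescribes, the function names are hypothetical, and how you read the number out of TsgcGRPCResponse.StatusCode depends on its type, so that step is left out.

```pascal
program GrpcStatusSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Names for the standard gRPC status codes 0..16.
function GrpcStatusName(aCode: Integer): string;
const
  Names: array[0..16] of string = ('OK', 'CANCELLED', 'UNKNOWN',
    'INVALID_ARGUMENT', 'DEADLINE_EXCEEDED', 'NOT_FOUND', 'ALREADY_EXISTS',
    'PERMISSION_DENIED', 'RESOURCE_EXHAUSTED', 'FAILED_PRECONDITION', 'ABORTED',
    'OUT_OF_RANGE', 'UNIMPLEMENTED', 'INTERNAL', 'UNAVAILABLE', 'DATA_LOSS',
    'UNAUTHENTICATED');
begin
  if (aCode >= Low(Names)) and (aCode <= High(Names)) then
    Result := Names[aCode]
  else
    Result := 'UNKNOWN(' + IntToStr(aCode) + ')';
end;

// Common policy: retry only transient failures.
function IsRetryable(aCode: Integer): Boolean;
begin
  case aCode of
    4, 8, 14: Result := True;  // DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, UNAVAILABLE
  else
    Result := False;
  end;
end;

// Exponential backoff for retry number aRetry (0-based): 1 s, 2 s, 4 s ... capped at 32 s.
function BackoffMs(aRetry: Integer): Integer;
begin
  if aRetry > 5 then
    aRetry := 5;
  Result := 1000 shl aRetry;
end;

const
  Samples: array[0..3] of Integer = (0, 8, 14, 16);
var
  i: Integer;
begin
  for i := Low(Samples) to High(Samples) do
    Writeln(GrpcStatusName(Samples[i]), ' retryable=',
      BoolToStr(IsRetryable(Samples[i]), True));
  Writeln('Wait before retry #3: ', BackoffMs(3), ' ms');
end.
```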

Reading the response

Load the reply bytes into a TsgcGRPCVertexAIGenerateContentResponse. It exposes the candidates, each with its content parts and a finish reason, plus a UsageMetadata block with the prompt, candidate and total token counts.

// TForm1 stands for the form that owns Memo1; use your own form class name
procedure TForm1.ParseResponse(const aData: TBytes);
var
  oResponse: TsgcGRPCVertexAIGenerateContentResponse;
  oCandidate: TsgcGRPCVertexAICandidate;
  i, j: Integer;
begin
  oResponse := TsgcGRPCVertexAIGenerateContentResponse.Create;
  try
    oResponse.LoadFromBytes(aData);
    for i := 0 to oResponse.CandidateCount - 1 do
    begin
      oCandidate := oResponse.Candidate(i);
      for j := 0 to oCandidate.Content.PartCount - 1 do
        Memo1.Lines.Add(oCandidate.Content.Part(j).Text);
    end;
    Memo1.Lines.Add('Total tokens: ' +
      IntToStr(oResponse.UsageMetadata.TotalTokenCount));
  finally
    oResponse.Free;
  end;
end;
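The finish reason tells you why generation stopped, for example whether MaxOutputTokens cut the answer short. The sketch maps the first values of the Candidate.FinishReason enum in the public Vertex AI v1 proto (STOP = 1, MAX_TOKENS = 2, SAFETY = 3, RECITATION = 4) to a short hint. How the sgc candidate exposes that value (property name and type) is not shown in this post, so the hypothetical DescribeFinishReason and IsCompleteAnswer take the raw enum number.

```pascal
program FinishReasonSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Maps a raw google.cloud.aiplatform.v1.Candidate.FinishReason value to a hint.
function DescribeFinishReason(aReason: Integer): string;
begin
  case aReason of
    0: Result := 'FINISH_REASON_UNSPECIFIED';
    1: Result := 'STOP: the model finished the answer naturally';
    2: Result := 'MAX_TOKENS: output was cut off, consider raising MaxOutputTokens';
    3: Result := 'SAFETY: the candidate was blocked by a safety filter';
    4: Result := 'RECITATION: the candidate was flagged for recitation';
  else
    Result := 'other finish reason (' + IntToStr(aReason) + ')';
  end;
end;

// True only for a natural stop (STOP = 1).
function IsCompleteAnswer(aReason: Integer): Boolean;
begin
  Result := aReason = 1;
end;

begin
  Writeln(DescribeFinishReason(2));
  Writeln(BoolToStr(IsCompleteAnswer(1), True));
end.
```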

Generation config and safety settings

The request also carries an optional GenerationConfig and a list of safety settings. Use the config to control sampling and length: Temperature, TopP, TopK, CandidateCount, MaxOutputTokens and StopSequences. Each safety setting pairs a harm Category with a blocking Threshold.

oRequest.GenerationConfig.Temperature     := 0.7;
oRequest.GenerationConfig.MaxOutputTokens := 1024;
oRequest.GenerationConfig.StopSequences.Add('END');

with oRequest.AddSafetySetting do
begin
  Category  := 2;   // HARM_CATEGORY_DANGEROUS_CONTENT
  Threshold := 2;   // BLOCK_MEDIUM_AND_ABOVE
end;
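Category and Threshold take numeric enum values, so a small constants unit keeps magic numbers out of the request code. The values below are copied from the HarmCategory and SafetySetting.HarmBlockThreshold enums in the public google.cloud.aiplatform.v1 protos; the unit name is hypothetical and the list is not exhaustive, so check the current proto if you need other values.

```pascal
unit VertexAISafetyConsts;

// Named values for the numeric Category and Threshold fields of a safety
// setting, taken from the google.cloud.aiplatform.v1 proto enums.

interface

const
  // enum HarmCategory
  HARM_CATEGORY_UNSPECIFIED       = 0;
  HARM_CATEGORY_HATE_SPEECH       = 1;
  HARM_CATEGORY_DANGEROUS_CONTENT = 2;
  HARM_CATEGORY_HARASSMENT        = 3;
  HARM_CATEGORY_SEXUALLY_EXPLICIT = 4;

  // enum SafetySetting.HarmBlockThreshold
  HARM_BLOCK_THRESHOLD_UNSPECIFIED = 0;
  BLOCK_LOW_AND_ABOVE              = 1;
  BLOCK_MEDIUM_AND_ABOVE           = 2;
  BLOCK_ONLY_HIGH                  = 3;
  BLOCK_NONE                       = 4;

implementation

end.
```

With the unit in the uses clause, the safety block reads Category := HARM_CATEGORY_DANGEROUS_CONTENT; and Threshold := BLOCK_MEDIUM_AND_ABOVE;.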

Streaming responses

Vertex AI also exposes StreamGenerateContent, the server-streaming counterpart that returns the answer as a sequence of partial chunks instead of one block. Because the Vertex AI helper is built on TsgcGRPCClient, the same typed request feeds the client's server-streaming API: start the call, decode each chunk into a TsgcGRPCVertexAIGenerateContentResponse as it arrives, and append the text from its candidates to update the UI as the model writes.
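On the wire, a server-streaming reply is a series of length-prefixed gRPC messages spread over HTTP/2 DATA frames, and one chunk of bytes can hold several messages or only part of one. The component handles this, but the sketch below shows the bookkeeping: it pulls every complete message out of a buffer and keeps the incomplete tail for the next chunk. Each extracted payload is what you would load into a TsgcGRPCVertexAIGenerateContentResponse. ExtractGrpcMessages is a hypothetical helper, and it ignores the compressed flag, which matches the grpcNoCompression setting used above.

```pascal
program GrpcDeframeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

// Moves every complete length-prefixed message from aBuffer into aMessages and
// leaves any incomplete trailing frame in aBuffer for the next DATA chunk.
procedure ExtractGrpcMessages(var aBuffer: TBytes; aMessages: TList<TBytes>);
var
  Offset: Integer;
  Len: Int64;
begin
  Offset := 0;
  while Length(aBuffer) - Offset >= 5 do
  begin
    // byte 0 = compressed flag (ignored here), bytes 1..4 = big-endian length
    Len := (Int64(aBuffer[Offset + 1]) shl 24) or (Int64(aBuffer[Offset + 2]) shl 16) or
      (Int64(aBuffer[Offset + 3]) shl 8) or Int64(aBuffer[Offset + 4]);
    if Length(aBuffer) - Offset - 5 < Len then
      Break;  // payload not fully received yet
    aMessages.Add(Copy(aBuffer, Offset + 5, Integer(Len)));
    Inc(Offset, 5 + Integer(Len));
  end;
  aBuffer := Copy(aBuffer, Offset, Length(aBuffer) - Offset);
end;

var
  Buffer: TBytes;
  Messages: TList<TBytes>;
begin
  // two complete frames ("hi" and "abc") followed by the first 3 bytes of a third
  Buffer := TBytes.Create(0, 0, 0, 0, 2, $68, $69,
                          0, 0, 0, 0, 3, $61, $62, $63,
                          0, 0, 0);
  Messages := TList<TBytes>.Create;
  try
    ExtractGrpcMessages(Buffer, Messages);
    Writeln('Complete messages: ', Messages.Count);
    Writeln('Bytes kept for the next chunk: ', Length(Buffer));
  finally
    Messages.Free;
  end;
end.
```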

Availability

The Vertex AI gRPC client is part of the sgcWebSockets Enterprise edition. A ready-to-run sample is in Demos\21.GRPC\17.Vertex_AI: paste or load your service-account JSON key, set the project, region and model, connect, then send a prompt with Generate Content. The full reference is on the gRPC Client product page.

Questions or feedback? Get in touch. You will get a reply from the people who wrote the code.