Using Google Vertex AI over gRPC from Delphi

June 9, 2026 · Components

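Vertex AI is Google's generative AI platform on Google Cloud. It serves Gemini models through a gRPC PredictionService, where a single GenerateContent call sends your prompt and returns the model's answer. sgcWebSockets Enterprise ships a strongly typed Vertex AI gRPC client built on top of TsgcGRPCClient, so you can call Gemini over gRPC directly from Delphi and C++Builder, with no external runtime and no hand-assembled Protocol Buffers.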

How it works

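gRPC carries Protocol Buffers messages framed over HTTP/2, so a Vertex AI call is an HTTP/2 request to the regional aiplatform.googleapis.com endpoint. The transport is a TLS-enabled TsgcHTTP2Client pointed at port 443 of the regional host, and TsgcGRPCClient handles framing, request headers, timeouts, and trailers on top of it.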
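To make the framing concrete, here is a minimal sketch of the gRPC length-prefixed message format: one compressed-flag byte, a 4-byte big-endian payload length, then the serialized message. TsgcGRPCClient does this for you; FrameGrpcMessage is a hypothetical helper written only for illustration and is not part of sgcWebSockets.

```pascal
program GrpcFrameSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Wraps a serialized protobuf message in the 5-byte gRPC prefix:
// byte 0 = compressed flag, bytes 1..4 = payload length (big-endian).
function FrameGrpcMessage(const aPayload: TBytes;
  aCompressed: Boolean = False): TBytes;
var
  vLen: Integer;
begin
  vLen := Length(aPayload);
  SetLength(Result, 5 + vLen);
  if aCompressed then
    Result[0] := 1
  else
    Result[0] := 0;
  Result[1] := (vLen shr 24) and $FF;
  Result[2] := (vLen shr 16) and $FF;
  Result[3] := (vLen shr 8) and $FF;
  Result[4] := vLen and $FF;
  if vLen > 0 then
    Move(aPayload[0], Result[5], vLen);
end;

var
  vFrame: TBytes;
begin
  // Five arbitrary payload bytes stand in for a serialized request.
  vFrame := FrameGrpcMessage(TBytes.Create($0A, $03, $66, $6F, $6F));
  Writeln(Length(vFrame)); // prints 10 (5 prefix bytes + 5 payload bytes)
end.
```

The framed bytes travel in the HTTP/2 DATA frames of the request, while the gRPC status comes back in the trailers.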

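The Vertex AI helpers are a set of strongly typed message classes in sgcGRPC_Google_VertexAI. You build a TsgcGRPCVertexAIGenerateContentRequest in code, serialize it with ToBytes, hand the bytes to the gRPC client, and then load the reply bytes back into a TsgcGRPCVertexAIGenerateContentResponse. The request mirrors the Vertex AI schema: a request carries a model resource name and a list of contents, each content has a role and one or more parts, and a part holds either text or inline binary data.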

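Vertex AI uses Google Cloud service account authentication. The client signs a service account JWT and sends it as an authorization: Bearer header in the gRPC metadata, so every call is authenticated against the regional Vertex AI endpoint.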

Authenticating with a service account

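The Google Cloud client signs a self-signed service account JWT from your service account JSON key. The JWT is audience-bound, so it must target the regional Vertex AI endpoint; otherwise aiplatform.googleapis.com rejects it with UNAUTHENTICATED. Once you have the token, add it to the gRPC client's DefaultMetadata so it travels with every call.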

uses
  sgcHTTP_Google_Cloud, sgcGRPC_Client;

// CloudClient is a TsgcHTTPGoogleCloud_PubSub_Client used here only to sign the JWT
CloudClient.GoogleCloudOptions.Authentication := gcaJWT;
CloudClient.GoogleCloudOptions.JWT.ClientEmail   := ClientEmail;
CloudClient.GoogleCloudOptions.JWT.PrivateKeyId  := PrivateKeyId;
CloudClient.GoogleCloudOptions.JWT.PrivateKey.Text := PrivateKey;
CloudClient.GoogleCloudOptions.JWT.ProjectId     := ProjectId;
CloudClient.GoogleCloudOptions.JWT.API_Endpoint  :=
  'https://' + Region + '-aiplatform.googleapis.com/';

// after the token arrives in OnAuthToken
GRPC.DefaultMetadata.Clear;
GRPC.DefaultMetadata.Add('authorization', 'Bearer ' + Token);
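For readers curious what "audience-bound" means, the sketch below builds the claim set that a self-signed service account JWT typically carries, with aud set to the regional endpoint used above. It is illustrative only: BuildJwtClaims is a hypothetical helper, the RS256 signing step is omitted, and the exact claims the sgcWebSockets client emits are an assumption here. The one-hour lifetime reflects the maximum Google accepts for these tokens.

```pascal
program JwtClaimsSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Builds the claim set of a self-signed service account JWT.
// aIssuedAt is Unix time in seconds. The caller owns the returned object.
function BuildJwtClaims(const aClientEmail, aAudience: string;
  aIssuedAt: Int64): TJSONObject;
begin
  Result := TJSONObject.Create;
  Result.AddPair('iss', aClientEmail);   // issuer: the service account
  Result.AddPair('sub', aClientEmail);   // subject: same account
  Result.AddPair('aud', aAudience);      // audience: the API endpoint
  Result.AddPair('iat', TJSONNumber.Create(aIssuedAt));
  Result.AddPair('exp', TJSONNumber.Create(aIssuedAt + 3600)); // 1 hour
end;

var
  vClaims: TJSONObject;
begin
  // Placeholder e-mail, region and timestamp, for illustration only.
  vClaims := BuildJwtClaims(
    'my-sa@my-project.iam.gserviceaccount.com',
    'https://us-central1-aiplatform.googleapis.com/',
    1700000000);
  try
    Writeln(vClaims.ToJSON);
  finally
    vClaims.Free;
  end;
end.
```

If aud names a different service or region than the host you connect to, the server has no way to match the token to the call, which is why a mismatch surfaces as UNAUTHENTICATED.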

Setting up the transport

Create a TsgcHTTP2Client for the regional host and assign it to the gRPC client. Vertex AI uses application/grpc+proto, so keep the channel content type at grpcProto.

uses
  sgcHTTP2_Client, sgcGRPC_Client, sgcGRPC_Types;

HTTP2 := TsgcHTTP2Client.Create(nil);
HTTP2.Host := Region + '-aiplatform.googleapis.com';   // e.g. us-central1-...
HTTP2.Port := 443;
HTTP2.TLS  := True;

GRPC := TsgcGRPCClient.Create(nil);
GRPC.Client := HTTP2;
GRPC.ChannelOptions.ContentType := grpcProto;
GRPC.ChannelOptions.Compression := grpcNoCompression;

HTTP2.Active := True;

Generating content from a prompt

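Build the request from the strongly typed message classes. The model is the full resource name, the single content has the user role, and the prompt goes into one text part. Call blocks until the reply arrives and returns a TsgcGRPCResponse whose raw Data bytes you load into a strongly typed response.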

uses
  sgcGRPC_Client, sgcGRPC_Types, sgcGRPC_Google_VertexAI;

var
  oRequest: TsgcGRPCVertexAIGenerateContentRequest;
  oContent: TsgcGRPCVertexAIContent;
  oPart: TsgcGRPCVertexAIPart;
  oResponse: TsgcGRPCResponse;
begin
  oRequest := TsgcGRPCVertexAIGenerateContentRequest.Create;
  try
    oRequest.Model := 'projects/' + ProjectId + '/locations/' + Region +
      '/publishers/google/models/' + Model;   // e.g. gemini-2.0-flash

    oContent := oRequest.AddContent;
    oContent.Role := 'user';
    oPart := oContent.AddPart;
    oPart.Text := 'Explain gRPC in one sentence.';

    oResponse := GRPC.Call(
      'google.cloud.aiplatform.v1.PredictionService', 'GenerateContent',
      oRequest.ToBytes);

    if oResponse.StatusCode = grpcOK then
      ParseResponse(oResponse.Data)
    else
      ShowMessage('gRPC error: ' + oResponse.StatusMessage);
  finally
    oRequest.Free;
  end;
end;
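The model resource name and the regional host both depend on the same region string, so it can help to derive them in one place. This is a small sketch using only the RTL; VertexHost and VertexModelName are hypothetical helper names, and the project and model values are placeholders.

```pascal
program VertexNamesSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Regional host, in the same form the transport setup uses.
function VertexHost(const aRegion: string): string;
begin
  if aRegion = '' then
    raise EArgumentException.Create('Region is required');
  Result := aRegion + '-aiplatform.googleapis.com';
end;

// Full publisher model resource name expected in the request's Model field.
function VertexModelName(const aProjectId, aRegion, aModel: string): string;
begin
  if (aProjectId = '') or (aRegion = '') or (aModel = '') then
    raise EArgumentException.Create('Project, region and model are required');
  Result := Format('projects/%s/locations/%s/publishers/google/models/%s',
    [aProjectId, aRegion, aModel]);
end;

begin
  Writeln(VertexHost('us-central1'));
  Writeln(VertexModelName('my-project', 'us-central1', 'gemini-2.0-flash'));
end.
```

Deriving both strings from one region value avoids sending a request for one location to another location's host.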

Reading the response

Load the reply bytes into a TsgcGRPCVertexAIGenerateContentResponse. It exposes the candidates, each with its content parts and a finish reason, plus a UsageMetadata block with the prompt, candidate, and total token counts.

procedure ParseResponse(const aData: TBytes);
var
  oResponse: TsgcGRPCVertexAIGenerateContentResponse;
  oCandidate: TsgcGRPCVertexAICandidate;
  i, j: Integer;
begin
  oResponse := TsgcGRPCVertexAIGenerateContentResponse.Create;
  try
    oResponse.LoadFromBytes(aData);
    for i := 0 to oResponse.CandidateCount - 1 do
    begin
      oCandidate := oResponse.Candidate(i);
      for j := 0 to oCandidate.Content.PartCount - 1 do
        Memo1.Lines.Add(oCandidate.Content.Part(j).Text);
    end;
    Memo1.Lines.Add('Total tokens: ' +
      IntToStr(oResponse.UsageMetadata.TotalTokenCount));
  finally
    oResponse.Free;
  end;
end;

Generation config and safety settings

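The request also carries an optional GenerationConfig and a list of safety settings. Use the config to control sampling and length: Temperature, TopP, TopK, CandidateCount, MaxOutputTokens, and StopSequences. Each safety setting pairs a harm Category with a blocking Threshold.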

oRequest.GenerationConfig.Temperature     := 0.7;
oRequest.GenerationConfig.MaxOutputTokens := 1024;
oRequest.GenerationConfig.StopSequences.Add('END');

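with oRequest.AddSafetySetting do
begin
  // Both fields take the raw HarmCategory / HarmBlockThreshold enum numbers.
  // Check 7 against the HarmCategory enum of the API version you target:
  // the Vertex AI v1 proto appears to list DANGEROUS_CONTENT as 2.
  Category  := 7;   // HARM_CATEGORY_DANGEROUS_CONTENT
  Threshold := 2;   // BLOCK_MEDIUM_AND_ABOVE
end;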
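Magic numbers are easy to get wrong, so a named enumeration can make the threshold readable. The sketch below is an assumption-laden illustration: THarmBlockThreshold is a hypothetical type, not part of sgcWebSockets, and only BLOCK_MEDIUM_AND_ABOVE = 2 is confirmed by the sample above. The other values follow my reading of the HarmBlockThreshold enum in the Vertex AI v1 proto and should be verified before use.

```pascal
program SafetyThresholdSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical mirror of the HarmBlockThreshold proto enum.
  THarmBlockThreshold = (
    hbtUnspecified         = 0,
    hbtBlockLowAndAbove    = 1,
    hbtBlockMediumAndAbove = 2,  // the value used in the sample above
    hbtBlockOnlyHigh       = 3,
    hbtBlockNone           = 4);

begin
  // Ord() yields the integer to assign to the safety setting's Threshold.
  Writeln(Ord(hbtBlockMediumAndAbove)); // prints 2
end.
```

Assuming Threshold is an integer field, as the sample suggests, the assignment would read Threshold := Ord(hbtBlockMediumAndAbove).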

Streaming responses

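Vertex AI also offers StreamGenerateContent, the server-streaming counterpart that returns the answer as a series of partial chunks instead of a single block. Because the Vertex AI helpers are built on TsgcGRPCClient, the same strongly typed request can be fed to the client's server-streaming API: start the call, decode each chunk into a TsgcGRPCVertexAIGenerateContentResponse as it arrives, and append the text from its candidates to update the UI while the model is writing.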
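On the wire, a server stream is a sequence of length-prefixed gRPC messages that may be split arbitrarily across network reads. The sketch below shows the general deframing technique: buffer incoming bytes and hand out only complete messages. TsgcGRPCClient handles this internally, and its streaming API is not shown here; ExtractGrpcMessages is a hypothetical helper, and the code assumes Delphi XE7 or later for dynamic array concatenation.

```pascal
program GrpcDeframeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

// Moves every complete message from aBuffer into aMessages and keeps any
// partial trailing frame in aBuffer for the next read.
// The compressed flag (byte 0 of each prefix) is ignored in this sketch.
procedure ExtractGrpcMessages(var aBuffer: TBytes; aMessages: TList<TBytes>);
var
  vOffset: Integer;
  vLen: Int64;
begin
  vOffset := 0;
  while Length(aBuffer) - vOffset >= 5 do
  begin
    // Bytes 1..4 of the prefix hold the big-endian payload length.
    vLen := (Int64(aBuffer[vOffset + 1]) shl 24) or
            (Int64(aBuffer[vOffset + 2]) shl 16) or
            (Int64(aBuffer[vOffset + 3]) shl 8) or
             Int64(aBuffer[vOffset + 4]);
    if Length(aBuffer) - vOffset - 5 < vLen then
      Break; // payload not fully received yet
    aMessages.Add(Copy(aBuffer, vOffset + 5, Integer(vLen)));
    Inc(vOffset, 5 + Integer(vLen));
  end;
  aBuffer := Copy(aBuffer, vOffset, Length(aBuffer) - vOffset);
end;

var
  vBuffer: TBytes;
  vMessages: TList<TBytes>;
begin
  vMessages := TList<TBytes>.Create;
  try
    // First read: one complete 2-byte message plus part of a 3-byte one.
    vBuffer := TBytes.Create(0, 0, 0, 0, 2, $61, $62, 0, 0, 0, 0, 3, $63);
    ExtractGrpcMessages(vBuffer, vMessages);
    Writeln(vMessages.Count, ' ', Length(vBuffer)); // prints 1 6

    // Second read delivers the rest of the second message.
    vBuffer := vBuffer + TBytes.Create($64, $65);
    ExtractGrpcMessages(vBuffer, vMessages);
    Writeln(vMessages.Count, ' ', Length(vBuffer)); // prints 2 0
  finally
    vMessages.Free;
  end;
end.
```

Each extracted message is what you would pass to LoadFromBytes on a TsgcGRPCVertexAIGenerateContentResponse to read that chunk's candidates.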

Availability

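The Vertex AI gRPC client is part of the sgcWebSockets Enterprise edition. A ready-to-run demo lives in Demos\21.GRPC\17.Vertex_AI: paste or load your service account JSON key, set the project, region, and model, connect, and then send a prompt with Generate Content. See the gRPC Client product page for the full reference.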

Questions or feedback? Contact us. You will get a reply from the people who wrote this code.