Using Google Vertex AI over gRPC from Delphi

June 9, 2026 · Components

Vertex AI is Google Cloud's generative AI platform. It serves Gemini models through the gRPC PredictionService, where a GenerateContent call sends a prompt and returns the model's answer. sgcWebSockets Enterprise ships a typed Vertex AI gRPC client that sits on top of TsgcGRPCClient, so you can call Gemini directly over gRPC from Delphi and C++Builder without an external runtime and without assembling Protocol Buffers by hand.

How It Works

gRPC carries Protocol Buffers messages framed over HTTP/2, so a Vertex AI call is an HTTP/2 request to the regional aiplatform.googleapis.com endpoint. The transport is a TsgcHTTP2Client pointed at the regional host on port 443 with TLS enabled. On top of it, TsgcGRPCClient handles framing, headers, timeouts, and trailers.
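To make the framing step concrete, here is a minimal sketch of the standard gRPC length prefix: a one-byte compressed flag, a four-byte big-endian payload length, then the serialized message. TsgcGRPCClient does this for you; the function name FrameGrpcMessage is hypothetical and is not part of sgcWebSockets.

```pascal
program GrpcFrameSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: wraps a serialized protobuf message in the gRPC
// length prefix (1 flag byte + 4-byte big-endian length + payload).
function FrameGrpcMessage(const aPayload: TBytes;
  aCompressed: Boolean = False): TBytes;
var
  vLen: Cardinal;
begin
  vLen := Length(aPayload);
  SetLength(Result, 5 + vLen);
  if aCompressed then
    Result[0] := 1
  else
    Result[0] := 0;
  // Payload length, most significant byte first.
  Result[1] := (vLen shr 24) and $FF;
  Result[2] := (vLen shr 16) and $FF;
  Result[3] := (vLen shr 8) and $FF;
  Result[4] := vLen and $FF;
  if vLen > 0 then
    Move(aPayload[0], Result[5], vLen);
end;

var
  vFrame: TBytes;
begin
  // Four payload bytes become a nine-byte frame.
  vFrame := FrameGrpcMessage(TBytes.Create($0A, $02, $68, $69));
  Writeln(Length(vFrame)); // prints 9
end.
```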

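The Vertex AI helper is a set of typed message classes in sgcGRPC_Google_VertexAI. Your code creates a TsgcGRPCVertexAIGenerateContentRequest, serializes it with ToBytes, hands those bytes to the gRPC client, and loads the response bytes back into a TsgcGRPCVertexAIGenerateContentResponse. The request mirrors the Vertex AI schema: it carries a model resource name and a list of contents, each content has a role and one or more parts, and each part holds either text or inline binary data.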
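For a sense of what the typed classes save you from, the sketch below hand-encodes a single Protocol Buffers string field: a varint tag (field number shifted left by three, combined with wire type 2), a varint length, then the UTF-8 bytes. The helper names are hypothetical, and field number 1 is purely illustrative; it is not claimed to match any field of the Vertex AI schema.

```pascal
program ProtoStringFieldSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Base-128 varint: 7 bits per byte, high bit set on all but the last byte.
function EncodeVarint(aValue: UInt64): TBytes;
var
  vCount: Integer;
begin
  SetLength(Result, 10); // a 64-bit varint never needs more than 10 bytes
  vCount := 0;
  while aValue >= $80 do
  begin
    Result[vCount] := (aValue and $7F) or $80;
    aValue := aValue shr 7;
    Inc(vCount);
  end;
  Result[vCount] := aValue and $7F;
  SetLength(Result, vCount + 1);
end;

// Hypothetical helper: encodes one length-delimited string field.
function EncodeStringField(aFieldNumber: Cardinal; const aValue: string): TBytes;
var
  vTag, vLen, vText: TBytes;
  vPos: Integer;
begin
  vText := TEncoding.UTF8.GetBytes(aValue);
  vTag := EncodeVarint((UInt64(aFieldNumber) shl 3) or 2); // wire type 2
  vLen := EncodeVarint(Length(vText));
  SetLength(Result, Length(vTag) + Length(vLen) + Length(vText));
  Move(vTag[0], Result[0], Length(vTag));
  vPos := Length(vTag);
  Move(vLen[0], Result[vPos], Length(vLen));
  Inc(vPos, Length(vLen));
  if Length(vText) > 0 then
    Move(vText[0], Result[vPos], Length(vText));
end;

var
  vBytes: TBytes;
  i: Integer;
begin
  vBytes := EncodeStringField(1, 'hi'); // illustrative field number
  for i := 0 to High(vBytes) do
    Write(IntToHex(vBytes[i], 2));
  Writeln; // prints 0A026869
end.
```

A real request nests several such fields (contents inside the request, parts inside each content), which is the bookkeeping that ToBytes hides.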

Vertex AI uses Google Cloud service account authentication. The client signs a service account JWT and sends it in the gRPC metadata as an authorization: Bearer header, so every call is authenticated against the regional Vertex AI endpoint.

Authenticating with a Service Account

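The Google Cloud client uses the service account JSON key to sign a self-signed service account JWT. That JWT is bound to an audience, so it must target the regional Vertex AI endpoint; otherwise aiplatform.googleapis.com rejects it with UNAUTHENTICATED. Once the token is obtained, add it to the gRPC client's DefaultMetadata so that it accompanies every call.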
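The audience binding is easier to see in the claim set itself. The sketch below builds the claims of a self-signed service account JWT as Google documents them: iss and sub are the service account email, aud is the API endpoint, and exp is at most one hour after iat. BuildClaims is a hypothetical helper, the email is a placeholder, and the RS256 signing step that the sgcWebSockets component performs is omitted.

```pascal
program JwtClaimsSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: returns the JSON claim set for a self-signed
// service account JWT. aIssuedAt is Unix time in seconds (UTC).
function BuildClaims(const aClientEmail, aAudience: string;
  aIssuedAt: Int64): string;
var
  oClaims: TJSONObject;
begin
  oClaims := TJSONObject.Create;
  try
    oClaims.AddPair('iss', aClientEmail);
    oClaims.AddPair('sub', aClientEmail);
    oClaims.AddPair('aud', aAudience); // must name the regional endpoint
    oClaims.AddPair('iat', TJSONNumber.Create(aIssuedAt));
    oClaims.AddPair('exp', TJSONNumber.Create(aIssuedAt + 3600)); // 1 hour
    Result := oClaims.ToJSON;
  finally
    oClaims.Free;
  end;
end;

begin
  Writeln(BuildClaims(
    'svc@my-project.iam.gserviceaccount.com',      // placeholder email
    'https://us-central1-aiplatform.googleapis.com/',
    1780000000));                                  // placeholder timestamp
end.
```

If aud names a different host than the one the call reaches, the token fails validation, which is the UNAUTHENTICATED case described above.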

uses
  sgcHTTP_Google_Cloud, sgcGRPC_Client;

// CloudClient is a TsgcHTTPGoogleCloud_PubSub_Client used here only to sign the JWT
CloudClient.GoogleCloudOptions.Authentication := gcaJWT;
CloudClient.GoogleCloudOptions.JWT.ClientEmail   := ClientEmail;
CloudClient.GoogleCloudOptions.JWT.PrivateKeyId  := PrivateKeyId;
CloudClient.GoogleCloudOptions.JWT.PrivateKey.Text := PrivateKey;
CloudClient.GoogleCloudOptions.JWT.ProjectId     := ProjectId;
CloudClient.GoogleCloudOptions.JWT.API_Endpoint  :=
  'https://' + Region + '-aiplatform.googleapis.com/';

// after the token arrives in OnAuthToken
GRPC.DefaultMetadata.Clear;
GRPC.DefaultMetadata.Add('authorization', 'Bearer ' + Token);

Setting Up the Transport

Create a TsgcHTTP2Client for the regional host and assign it to the gRPC client. Vertex AI uses application/grpc+proto, so leave the channel content type at grpcProto.

uses
  sgcHTTP2_Client, sgcGRPC_Client, sgcGRPC_Types;

HTTP2 := TsgcHTTP2Client.Create(nil);
HTTP2.Host := Region + '-aiplatform.googleapis.com';   // e.g. us-central1-...
HTTP2.Port := 443;
HTTP2.TLS  := True;

GRPC := TsgcGRPCClient.Create(nil);
GRPC.Client := HTTP2;
GRPC.ChannelOptions.ContentType := grpcProto;
GRPC.ChannelOptions.Compression := grpcNoCompression;

HTTP2.Active := True;

Generating Content from a Prompt

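Build the request with the typed message classes. The model is the full resource name, the single content has the user role, and the prompt goes into a text part. Call blocks until the response arrives and returns a TsgcGRPCResponse with the raw Data bytes, which you then load into the typed response.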

uses
  sgcGRPC_Client, sgcGRPC_Types, sgcGRPC_Google_VertexAI;

var
  oRequest: TsgcGRPCVertexAIGenerateContentRequest;
  oContent: TsgcGRPCVertexAIContent;
  oPart: TsgcGRPCVertexAIPart;
  oResponse: TsgcGRPCResponse;
begin
  oRequest := TsgcGRPCVertexAIGenerateContentRequest.Create;
  try
    oRequest.Model := 'projects/' + ProjectId + '/locations/' + Region +
      '/publishers/google/models/' + Model;   // e.g. gemini-2.0-flash

    oContent := oRequest.AddContent;
    oContent.Role := 'user';
    oPart := oContent.AddPart;
    oPart.Text := 'Explain gRPC in one sentence.';

    oResponse := GRPC.Call(
      'google.cloud.aiplatform.v1.PredictionService', 'GenerateContent',
      oRequest.ToBytes);

    if oResponse.StatusCode = grpcOK then
      ParseResponse(oResponse.Data)
    else
      ShowMessage('gRPC error: ' + oResponse.StatusMessage);
  finally
    oRequest.Free;
  end;
end;
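Two strings in the request above are easy to get wrong: the model resource name and the method path. The sketch below isolates both, assuming the standard gRPC convention that the HTTP/2 :path is a slash, the fully qualified service name, a slash, and the method name. The function names and the my-project value are hypothetical placeholders.

```pascal
program VertexNamesSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: full resource name of a Google-published model.
function ModelResourceName(const aProjectId, aRegion, aModel: string): string;
begin
  Result := Format('projects/%s/locations/%s/publishers/google/models/%s',
    [aProjectId, aRegion, aModel]);
end;

// Hypothetical helper: HTTP/2 :path for a gRPC method.
function GrpcMethodPath(const aService, aMethod: string): string;
begin
  Result := '/' + aService + '/' + aMethod;
end;

begin
  Writeln(ModelResourceName('my-project', 'us-central1', 'gemini-2.0-flash'));
  Writeln(GrpcMethodPath('google.cloud.aiplatform.v1.PredictionService',
    'GenerateContent'));
end.
```

Keeping the region in one variable helps here, because it appears in the host name, the JWT audience, and the resource name, and all three have to agree.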

Reading the Response

Load the response bytes into a TsgcGRPCVertexAIGenerateContentResponse. It exposes the candidates, each with its content parts and a finish reason, along with a UsageMetadata block that holds the prompt, candidate, and total token counts.

procedure ParseResponse(const aData: TBytes);
var
  oResponse: TsgcGRPCVertexAIGenerateContentResponse;
  oCandidate: TsgcGRPCVertexAICandidate;
  i, j: Integer;
begin
  oResponse := TsgcGRPCVertexAIGenerateContentResponse.Create;
  try
    oResponse.LoadFromBytes(aData);
    for i := 0 to oResponse.CandidateCount - 1 do
    begin
      oCandidate := oResponse.Candidate(i);
      for j := 0 to oCandidate.Content.PartCount - 1 do
        Memo1.Lines.Add(oCandidate.Content.Part(j).Text);
    end;
    Memo1.Lines.Add('Total tokens: ' +
      IntToStr(oResponse.UsageMetadata.TotalTokenCount));
  finally
    oResponse.Free;
  end;
end;

Generation Config and Safety Settings

The request also carries an optional GenerationConfig and a list of safety settings. Use the config to control sampling and length: Temperature, TopP, TopK, CandidateCount, MaxOutputTokens, and StopSequences. Each safety setting pairs a harm Category with a blocking Threshold.

oRequest.GenerationConfig.Temperature     := 0.7;
oRequest.GenerationConfig.MaxOutputTokens := 1024;
oRequest.GenerationConfig.StopSequences.Add('END');

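with oRequest.AddSafetySetting do
begin
  // Numeric values follow the aiplatform v1 HarmCategory and
  // HarmBlockThreshold enums; verify them against the proto you target.
  Category  := 2;   // HARM_CATEGORY_DANGEROUS_CONTENT
  Threshold := 2;   // BLOCK_MEDIUM_AND_ABOVE
end;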

Streaming Responses

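Vertex AI also offers StreamGenerateContent, the server-streaming counterpart that returns the answer as a series of partial chunks instead of a single block. Because the Vertex AI helper is built on TsgcGRPCClient, the same typed request feeds the client's server-streaming API: start the call, decode each chunk into a TsgcGRPCVertexAIGenerateContentResponse as it arrives, and append that candidate's text so the UI updates as the model writes.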
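On the wire, each streamed chunk is one length-prefixed gRPC message, and several of them (or only part of one) can arrive in a single read. The sketch below shows how a receive buffer can be split into complete messages, assuming uncompressed frames as configured with grpcNoCompression. TsgcGRPCClient handles this internally; ExtractMessages is a hypothetical name used only for illustration.

```pascal
program GrpcDeframeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

// Hypothetical helper: appends every complete message in aBuffer to
// aMessages and returns the number of bytes consumed. A trailing partial
// frame is left for the caller to retry once more data has arrived.
function ExtractMessages(const aBuffer: TBytes;
  aMessages: TList<TBytes>): Integer;
var
  vPos: Integer;
  vLen: Int64;
begin
  vPos := 0;
  while Length(aBuffer) - vPos >= 5 do
  begin
    // Byte 0 is the compressed flag; bytes 1..4 are the big-endian length.
    vLen := (Int64(aBuffer[vPos + 1]) shl 24) or
            (Int64(aBuffer[vPos + 2]) shl 16) or
            (Int64(aBuffer[vPos + 3]) shl 8) or
             Int64(aBuffer[vPos + 4]);
    if Length(aBuffer) - vPos - 5 < vLen then
      Break; // payload not fully received yet
    aMessages.Add(Copy(aBuffer, vPos + 5, Integer(vLen)));
    Inc(vPos, 5 + Integer(vLen));
  end;
  Result := vPos;
end;

var
  oMessages: TList<TBytes>;
  vConsumed: Integer;
begin
  oMessages := TList<TBytes>.Create;
  try
    // Two complete frames (2-byte and 1-byte payloads) plus 3 stray bytes.
    vConsumed := ExtractMessages(TBytes.Create(
      $00, $00, $00, $00, $02, $AA, $BB,
      $00, $00, $00, $00, $01, $CC,
      $00, $00, $00), oMessages);
    Writeln(Format('%d messages, %d bytes consumed',
      [oMessages.Count, vConsumed])); // prints 2 messages, 13 bytes consumed
  finally
    oMessages.Free;
  end;
end.
```

Each extracted payload is what would be passed to LoadFromBytes on a fresh TsgcGRPCVertexAIGenerateContentResponse.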

Availability

The Vertex AI gRPC client is part of the sgcWebSockets Enterprise edition. A ready-to-run sample is in Demos\21.GRPC\17.Vertex_AI: paste or load the service account JSON key, set the project, region, and model, connect, and send a prompt with Generate Content. The full reference is on the gRPC Client product page.

Questions or feedback? Contact us. You will get answers from the people who wrote the code.