Google Cloud Vision over gRPC in Delphi

Google Cloud Vision analyzes images and tells you what is in them: it returns descriptive labels, reads printed and handwritten text (OCR), detects faces, recognizes famous landmarks and logos, and more. sgcWebSockets Enterprise ships a typed Vision gRPC client that sits on top of TsgcGRPCClient, so you can send an image and read back its annotations from Delphi and C++Builder without any external runtime or hand-written protobufs.

How it works

Cloud Vision exposes an ImageAnnotator gRPC service. A request is a batch of one or more images, each paired with the list of features you want detected, and the response is a matching batch of annotations. The whole exchange is Protocol Buffers messages framed over HTTP/2.
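
To make "framed over HTTP/2" concrete, here is a minimal sketch of the gRPC length-prefixed message format: one compressed-flag byte, a 4-byte big-endian length, then the protobuf payload. TsgcGRPCClient does this for you, so the function name FrameGrpcMessage and the sample payload are hypothetical and only illustrate the wire layout.

```pascal
program GrpcFrameSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Wraps one serialized protobuf message in a gRPC length-prefixed frame:
// byte 0 = compressed flag, bytes 1..4 = payload length (big-endian),
// followed by the payload itself.
function FrameGrpcMessage(const aPayload: TBytes;
  aCompressed: Boolean = False): TBytes;
var
  vLen: Cardinal;
begin
  vLen := Length(aPayload);
  SetLength(Result, 5 + vLen);
  if aCompressed then
    Result[0] := 1
  else
    Result[0] := 0;
  Result[1] := Byte((vLen shr 24) and $FF);
  Result[2] := Byte((vLen shr 16) and $FF);
  Result[3] := Byte((vLen shr 8) and $FF);
  Result[4] := Byte(vLen and $FF);
  if vLen > 0 then
    Move(aPayload[0], Result[5], vLen);
end;

var
  vFrame: TBytes;
  vByte: Byte;
begin
  // Hypothetical 3-byte payload standing in for a serialized request
  vFrame := FrameGrpcMessage(TBytes.Create($0A, $01, $41));
  for vByte in vFrame do
    Write(IntToHex(vByte, 2), ' ');
  Writeln;   // prints: 00 00 00 00 03 0A 01 41
end.
```

The five-byte prefix is why a gRPC body is never just the raw protobuf bytes, even when compression is off.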

sgcWebSockets already ships a complete HTTP/2 stack, so the transport is a TsgcHTTP2Client pointed at vision.googleapis.com:443 over TLS. TsgcGRPCClient sits on top of it and does the gRPC framing, headers, timeouts and trailer parsing. The Vision message classes in sgcGRPC_Google_Vision are typed protobuf helpers: you fill in the request object, call ToBytes to serialize it, send it with the generic client, and load the reply bytes back into a typed response object with LoadFromBytes.
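
To see what the typed helpers spare you, the sketch below hand-encodes a single Feature message (type 4, max_results 10) in the protobuf wire format. The field numbers (type = 1, max_results = 2) come from the public Vision v1 proto definition. The procedure names are hypothetical, and it is an assumption that ToBytes produces standard protobuf encoding like this rather than anything library-specific.

```pascal
program ProtoFeatureSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Appends aValue as a protobuf base-128 varint:
// 7 bits per byte, least significant group first.
procedure AppendVarint(var aBuf: TBytes; aValue: UInt64);
var
  vByte: Byte;
begin
  repeat
    vByte := Byte(aValue and $7F);
    aValue := aValue shr 7;
    if aValue <> 0 then
      vByte := vByte or $80;   // continuation bit: more bytes follow
    SetLength(aBuf, Length(aBuf) + 1);
    aBuf[High(aBuf)] := vByte;
  until aValue = 0;
end;

// Appends one varint field: tag = (field number shl 3) or wire type 0.
procedure AppendVarintField(var aBuf: TBytes; aFieldNo: Integer;
  aValue: UInt64);
begin
  AppendVarint(aBuf, UInt64(aFieldNo) shl 3);
  AppendVarint(aBuf, aValue);
end;

var
  vFeature: TBytes;
  vByte: Byte;
begin
  // Feature { type = LABEL_DETECTION (4), max_results = 10 }
  AppendVarintField(vFeature, 1, 4);
  AppendVarintField(vFeature, 2, 10);
  for vByte in vFeature do
    Write(IntToHex(vByte, 2), ' ');
  Writeln;   // prints: 08 04 10 0A
end.
```

Nested messages, strings and repeated fields add length-delimited encoding on top of this, which is the part the typed classes handle for you.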

Authentication is the standard Google service-account flow. A self-signed JWT is exchanged for a bearer token, which is sent as gRPC metadata on every call. The JWT is audience-bound, so its audience must be the Vision endpoint (https://vision.googleapis.com/) for vision.googleapis.com to accept the token.
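
For orientation, this is roughly what the claim set of a self-signed service-account JWT looks like. It is a sketch of Google's documented claim names (iss, sub, aud, iat, exp), not the library's implementation: sgcWebSockets builds and signs the token for you, and the function name BuildClaims and the sample e-mail address are hypothetical.

```pascal
program JwtClaimsSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.DateUtils, System.JSON;

// Builds the JSON claim set of a self-signed service-account JWT.
// iss and sub are both the service-account e-mail; aud is the API endpoint.
// Google limits the lifetime (exp - iat) to one hour.
function BuildClaims(const aClientEmail, aAudience: string;
  aIssuedAt: Int64; aLifetimeSecs: Integer = 3600): string;
var
  vClaims: TJSONObject;
begin
  vClaims := TJSONObject.Create;
  try
    vClaims.AddPair('iss', aClientEmail);
    vClaims.AddPair('sub', aClientEmail);
    vClaims.AddPair('aud', aAudience);
    vClaims.AddPair('iat', TJSONNumber.Create(aIssuedAt));
    vClaims.AddPair('exp', TJSONNumber.Create(aIssuedAt + aLifetimeSecs));
    Result := vClaims.ToJSON;
  finally
    vClaims.Free;
  end;
end;

var
  vNow: Int64;
begin
  // Unix time in seconds, computed from the current UTC time
  vNow := DateTimeToUnix(TTimeZone.Local.ToUniversalTime(Now));
  // Hypothetical service-account e-mail; use client_email from your JSON key
  Writeln(BuildClaims('my-sa@my-project.iam.gserviceaccount.com',
    'https://vision.googleapis.com/', vNow));
end.
```

The claim set is then signed with RS256 using the private key, with the private key id carried in the JWT header as kid. That signing step is deliberately left out here.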

Setting up the clients

Create the HTTP/2 transport, attach the gRPC client to it, and tell the channel to use the binary protobuf content type. Vision speaks application/grpc+proto.

uses
  sgcHTTP2_Client, sgcGRPC_Types, sgcGRPC_Client,
  sgcGRPC_Google_Vision;

var
  HTTP2: TsgcHTTP2Client;
  GRPC: TsgcGRPCClient;
begin
  HTTP2 := TsgcHTTP2Client.Create(nil);
  HTTP2.Host := 'vision.googleapis.com';
  HTTP2.Port := 443;
  HTTP2.TLS  := True;

  GRPC := TsgcGRPCClient.Create(nil);
  GRPC.Client := HTTP2;
  GRPC.ChannelOptions.ContentType := grpcProto;
  GRPC.ChannelOptions.Compression := grpcNoCompression;
end;

Authentication

The bearer token comes from your service-account credentials. Configure the JWT options with the values from the downloaded JSON key, point the audience at the Vision endpoint, and add the resulting token to DefaultMetadata so it travels on every gRPC call.

// CloudClient (created elsewhere) exposes GoogleCloudOptions;
// configure the service-account JWT with the values from the JSON key file
CloudClient.GoogleCloudOptions.Authentication := gcaJWT;
CloudClient.GoogleCloudOptions.JWT.ClientEmail  := ClientEmail;
CloudClient.GoogleCloudOptions.JWT.PrivateKeyId := PrivateKeyId;
CloudClient.GoogleCloudOptions.JWT.PrivateKey.Text := PrivateKey;
CloudClient.GoogleCloudOptions.JWT.ProjectId := ProjectId;
// the self-signed JWT is audience-bound to the Vision endpoint
CloudClient.GoogleCloudOptions.JWT.API_Endpoint :=
  'https://vision.googleapis.com/';

// once the token is acquired, send it as gRPC metadata
GRPC.DefaultMetadata.Clear;
GRPC.DefaultMetadata.Add('authorization', 'Bearer ' + Token);

Annotating an image

To analyze an image you build a TsgcGRPCVisionBatchAnnotateImagesRequest, add one image request, point it at the image (a Google Cloud Storage URI here, but you can also send raw bytes), and add one or more features. Each feature has a FeatureType and an optional MaxResults. Serialize with ToBytes and call the BatchAnnotateImages method of the ImageAnnotator service.

var
  oRequest: TsgcGRPCVisionBatchAnnotateImagesRequest;
  oImgReq: TsgcGRPCVisionAnnotateImageRequest;
  oFeature: TsgcGRPCVisionFeature;
  oResponse: TsgcGRPCResponse;
begin
  oRequest := TsgcGRPCVisionBatchAnnotateImagesRequest.Create;
  try
    oImgReq := oRequest.AddRequest;
    oImgReq.Image.Source.GcsImageUri :=
      'gs://cloud-samples-data/vision/demo-image.jpg';

    oFeature := oImgReq.AddFeature;
    oFeature.FeatureType := 4;   // LABEL_DETECTION
    oFeature.MaxResults := 10;

    // blocking call: oResponse carries the StatusCode, the raw Data bytes
    // and the trailers
    oResponse := GRPC.Call('google.cloud.vision.v1.ImageAnnotator',
      'BatchAnnotateImages', oRequest.ToBytes);
  finally
    oRequest.Free;
  end;
end;

The FeatureType values follow the Vision API enum: 1 FACE_DETECTION, 2 LANDMARK_DETECTION, 3 LOGO_DETECTION, 4 LABEL_DETECTION, 5 TEXT_DETECTION, 6 SAFE_SEARCH_DETECTION, 11 DOCUMENT_TEXT_DETECTION, and the rest. Note that the numbering is not contiguous. To run several detections on the same image in one round trip, add more than one feature.
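
Magic numbers are easy to get wrong, so it can help to name them. The constants below mirror the Feature.Type enum of the public Vision v1 proto, whose numbering is not contiguous (DOCUMENT_TEXT_DETECTION is 11, and 6 is SAFE_SEARCH_DETECTION). The constant names and the VisionFeatureName helper are hypothetical and not part of sgcGRPC_Google_Vision.

```pascal
program VisionFeatureTypes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Values of google.cloud.vision.v1.Feature.Type
  VISION_TYPE_UNSPECIFIED        = 0;
  VISION_FACE_DETECTION          = 1;
  VISION_LANDMARK_DETECTION      = 2;
  VISION_LOGO_DETECTION          = 3;
  VISION_LABEL_DETECTION         = 4;
  VISION_TEXT_DETECTION          = 5;
  VISION_SAFE_SEARCH_DETECTION   = 6;
  VISION_IMAGE_PROPERTIES        = 7;
  VISION_CROP_HINTS              = 9;
  VISION_WEB_DETECTION           = 10;
  VISION_DOCUMENT_TEXT_DETECTION = 11;
  VISION_PRODUCT_SEARCH          = 12;
  VISION_OBJECT_LOCALIZATION     = 19;

// Returns the enum name for logging; unknown values are reported as such.
function VisionFeatureName(aType: Integer): string;
begin
  case aType of
    VISION_TYPE_UNSPECIFIED:        Result := 'TYPE_UNSPECIFIED';
    VISION_FACE_DETECTION:          Result := 'FACE_DETECTION';
    VISION_LANDMARK_DETECTION:      Result := 'LANDMARK_DETECTION';
    VISION_LOGO_DETECTION:          Result := 'LOGO_DETECTION';
    VISION_LABEL_DETECTION:         Result := 'LABEL_DETECTION';
    VISION_TEXT_DETECTION:          Result := 'TEXT_DETECTION';
    VISION_SAFE_SEARCH_DETECTION:   Result := 'SAFE_SEARCH_DETECTION';
    VISION_IMAGE_PROPERTIES:        Result := 'IMAGE_PROPERTIES';
    VISION_CROP_HINTS:              Result := 'CROP_HINTS';
    VISION_WEB_DETECTION:           Result := 'WEB_DETECTION';
    VISION_DOCUMENT_TEXT_DETECTION: Result := 'DOCUMENT_TEXT_DETECTION';
    VISION_PRODUCT_SEARCH:          Result := 'PRODUCT_SEARCH';
    VISION_OBJECT_LOCALIZATION:     Result := 'OBJECT_LOCALIZATION';
  else
    Result := 'UNKNOWN(' + IntToStr(aType) + ')';
  end;
end;

begin
  Writeln(VisionFeatureName(VISION_LABEL_DETECTION));   // LABEL_DETECTION
  Writeln(VisionFeatureName(11));                       // DOCUMENT_TEXT_DETECTION
  Writeln(VisionFeatureName(8));                        // UNKNOWN(8)
end.
```

Assuming FeatureType takes the plain integer as in the example above, a request for labels plus dense-text OCR would call AddFeature twice and assign VISION_LABEL_DETECTION to the first feature and VISION_DOCUMENT_TEXT_DETECTION to the second.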

Reading the annotations

The reply bytes load straight into a typed batch response. Each image in the batch carries separate lists for label, landmark, logo and text annotations. Every entry is a TsgcGRPCVisionEntityAnnotation with a Description and, for labels, a confidence Score.

var
  oResponse: TsgcGRPCVisionBatchAnnotateImagesResponse;
  oImgResp: TsgcGRPCVisionAnnotateImageResponse;
  i, j: Integer;
begin
  oResponse := TsgcGRPCVisionBatchAnnotateImagesResponse.Create;
  try
    // aData: the raw reply bytes, e.g. the Data of the TsgcGRPCResponse
    // returned by Call
    oResponse.LoadFromBytes(aData);
    for i := 0 to oResponse.ResponseCount - 1 do
    begin
      oImgResp := oResponse.Response(i);

      for j := 0 to oImgResp.LabelAnnotationCount - 1 do
        Memo1.Lines.Add('Label: ' + oImgResp.LabelAnnotation(j).Description +
          ' (score: ' + FloatToStr(oImgResp.LabelAnnotation(j).Score) + ')');

      for j := 0 to oImgResp.TextAnnotationCount - 1 do
        Memo1.Lines.Add('Text: ' + oImgResp.TextAnnotation(j).Description);

      for j := 0 to oImgResp.LandmarkAnnotationCount - 1 do
        Memo1.Lines.Add('Landmark: ' + oImgResp.LandmarkAnnotation(j).Description);
    end;
  finally
    oResponse.Free;
  end;
end;

Images from storage or from bytes

The image source is flexible. Set Image.Source.GcsImageUri for an object in Google Cloud Storage, or Image.Source.ImageUri for a public HTTP(S) URL. To annotate a local file, read it into a TBytes and assign it to Image.Content instead, so the picture travels inline in the request. The same request and response classes handle every source.
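
Reading a local file into a TBytes can be as short as the sketch below, which uses TFile.ReadAllBytes from System.IOUtils. The helper name LoadImageBytes is hypothetical, and the commented assignment at the end assumes Image.Content accepts a TBytes, as described above.

```pascal
program LoadImageSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils;

// Reads an image file into memory, failing early on a missing or empty file
// so that no pointless request is sent to the service.
function LoadImageBytes(const aFileName: string): TBytes;
begin
  if not TFile.Exists(aFileName) then
    raise Exception.CreateFmt('Image file not found: %s', [aFileName]);
  Result := TFile.ReadAllBytes(aFileName);
  if Length(Result) = 0 then
    raise Exception.CreateFmt('Image file is empty: %s', [aFileName]);
end;

var
  vImage: TBytes;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: LoadImageSketch <image file>');
    Exit;
  end;
  try
    vImage := LoadImageBytes(ParamStr(1));
    Writeln(Format('Loaded %d bytes', [Length(vImage)]));
    // With the request classes from the article (assumed property type):
    //   oImgReq.Image.Content := vImage;
  except
    on E: Exception do
      Writeln('Error: ' + E.Message);
  end;
end.
```

Inline bytes make the request as large as the image, so for big files a Cloud Storage URI keeps the upload out of the gRPC call.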

Synchronous or asynchronous

The example above uses the blocking Call, which returns a TsgcGRPCResponse with the StatusCode, the raw Data bytes and the trailers. To keep the UI responsive, use CallAsync and handle the reply in the OnGRPCResponse event, where you parse aResponse.Data exactly the same way. A non-OK status surfaces through OnGRPCError, and a transport failure through OnGRPCException.
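
When a call comes back with a non-OK status, a readable name is more useful in a log than a bare number. The sketch below maps the standard gRPC status codes (0 to 16) to their names. How the library exposes the status (integer or enumerated type) is not shown in this article, so the Integer parameter and the function name GrpcStatusName are assumptions.

```pascal
program GrpcStatusSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Standard gRPC status codes, indexed by their numeric value
  GRPC_STATUS_NAMES: array[0..16] of string = (
    'OK', 'CANCELLED', 'UNKNOWN', 'INVALID_ARGUMENT', 'DEADLINE_EXCEEDED',
    'NOT_FOUND', 'ALREADY_EXISTS', 'PERMISSION_DENIED', 'RESOURCE_EXHAUSTED',
    'FAILED_PRECONDITION', 'ABORTED', 'OUT_OF_RANGE', 'UNIMPLEMENTED',
    'INTERNAL', 'UNAVAILABLE', 'DATA_LOSS', 'UNAUTHENTICATED');

// Returns the canonical name of a gRPC status code for logging.
function GrpcStatusName(aCode: Integer): string;
begin
  if (aCode >= Low(GRPC_STATUS_NAMES)) and
     (aCode <= High(GRPC_STATUS_NAMES)) then
    Result := GRPC_STATUS_NAMES[aCode]
  else
    Result := 'UNRECOGNIZED(' + IntToStr(aCode) + ')';
end;

begin
  Writeln(GrpcStatusName(0));    // OK
  Writeln(GrpcStatusName(7));    // PERMISSION_DENIED
  Writeln(GrpcStatusName(16));   // UNAUTHENTICATED
  Writeln(GrpcStatusName(42));   // UNRECOGNIZED(42)
end.
```

With a cloud API, UNAUTHENTICATED typically points at a missing or expired bearer token, while PERMISSION_DENIED usually means the service account lacks access or the API is not enabled for the project.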

Availability

The typed Vision gRPC client is part of the sgcWebSockets Enterprise edition and runs on Windows, macOS, Linux, iOS and Android. A ready-to-run sample is in Demos\21.GRPC\13.Vision, and the full reference is on the gRPC Client product page.

Questions or feedback? Get in touch. You will get a reply from the people who wrote the code.