Google Cloud Speech-to-Text

Typed interface to call the Google Cloud Speech-to-Text gRPC API and transcribe audio into text.

Introduction

Google Cloud Speech-to-Text converts audio into text using Google's speech-recognition models. The gRPC API is exposed by the google.cloud.speech.v1.Speech service, reached at speech.googleapis.com:443 over TLS. Its main method is Recognize, which performs synchronous recognition.

Requests are built with TsgcGRPCSpeechRecognizeRequest, where Config sets Encoding, SampleRateHertz, LanguageCode and EnableAutomaticPunctuation, and Audio.Uri points to the audio. The reply is returned in TsgcGRPCSpeechRecognizeResponse, whose Results contain Alternatives with Transcript and Confidence.
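
Speech-to-Text only accepts Cloud Storage locations in Audio.Uri, written as gs://bucket_name/object_name. As a minimal sketch, a check like the one below could catch a malformed value before the call is made. IsGcsUri is a hypothetical helper written for this page and is not part of the library; it only inspects the shape of the string and does not verify that the bucket or object exists.

```pascal
program CheckGcsUri;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (not a library function): returns True when AUri
// has the form gs://<bucket>/<object> with a non-empty bucket and object.
function IsGcsUri(const AUri: string): Boolean;
const
  CPrefix = 'gs://';
var
  vRest: string;
  vSlash: Integer;
begin
  Result := False;
  // the URI must start with the gs:// scheme
  if Copy(AUri, 1, Length(CPrefix)) <> CPrefix then
    Exit;
  // everything after the scheme: <bucket>/<object>
  vRest := Copy(AUri, Length(CPrefix) + 1, MaxInt);
  vSlash := Pos('/', vRest);
  // a slash after at least one bucket character, followed by an object name
  Result := (vSlash > 1) and (vSlash < Length(vRest));
end;

begin
  WriteLn(IsGcsUri('gs://my-bucket/audio.wav'));     // well-formed
  WriteLn(IsGcsUri('https://example.com/audio.wav')); // wrong scheme
  WriteLn(IsGcsUri('gs://my-bucket/'));               // missing object name
end.
```

In an application, the same check would run on the value assigned to oRequest.Audio.Uri before Recognize is called.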

The example below authenticates with a service-account JWT, connects a TsgcGRPCClient to the Speech host through a TsgcHTTP2Client, adds the token as authorization Bearer metadata, and calls Recognize on LINEAR16, 16000 Hz, en-US audio stored at a gs:// URI:


    oHTTP2 := TsgcHTTP2Client.Create(nil);
    oHTTP2.Host := 'speech.googleapis.com';
    oHTTP2.Port := 443;
    oHTTP2.TLS := True;

    oGRPC := TsgcGRPCClient.Create(nil);
    oGRPC.Client := oHTTP2;

    // service-account JWT authentication
    oGRPC.GoogleCloudOptions.JWT.KeyFile := 'service-account.json';
    oGRPC.GoogleCloudOptions.JWT.API_Endpoint := 'https://speech.googleapis.com/';
    oGRPC.DefaultMetadata.AddValue('authorization', 'Bearer ' + oGRPC.GoogleCloudOptions.JWT.Token);

    // build the typed request and call the method
    oRequest := TsgcGRPCSpeechRecognizeRequest.Create;
    try
      oRequest.Config.Encoding := 'LINEAR16';
      oRequest.Config.SampleRateHertz := 16000;
      oRequest.Config.LanguageCode := 'en-US';
      oRequest.Config.EnableAutomaticPunctuation := True;
      oRequest.Audio.Uri := 'gs://my-bucket/audio.wav';
      oResponse := oGRPC.Call('google.cloud.speech.v1.Speech', 'Recognize', oRequest.ToBytes);
      ShowMessage(oResponse.DataString);
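    finally
      oRequest.Free;
      // release the clients created above (gRPC client first, then its transport)
      oGRPC.Free;
      oHTTP2.Free;
    end;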
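
The example above displays the reply through oResponse.DataString. As a rough sketch of how the documented response shape (Results containing Alternatives with Transcript and Confidence) could be walked, the program below uses plain records as hypothetical stand-ins; they are not the library's TsgcGRPCSpeechRecognizeResponse classes, and the sample values are invented for illustration. The Speech-to-Text API orders alternatives from most to least likely, so the sketch takes the first alternative of each result.

```pascal
program JoinTranscriptsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical stand-ins mirroring the documented response shape.
  // They are NOT the library's response classes.
  TAlternative = record
    Transcript: string;
    Confidence: Single;
  end;

  TSpeechResult = record
    Alternatives: array of TAlternative;
  end;

  TSpeechResults = array of TSpeechResult;

// Returns the transcript of the first (most likely) alternative,
// or an empty string when the result carries no alternatives.
function TopTranscript(const AResult: TSpeechResult): string;
begin
  if Length(AResult.Alternatives) > 0 then
    Result := AResult.Alternatives[0].Transcript
  else
    Result := '';
end;

// Joins the top transcript of every result, separated by single spaces.
function JoinTranscripts(const AResults: TSpeechResults): string;
var
  i: Integer;
  vText: string;
begin
  Result := '';
  for i := 0 to High(AResults) do
  begin
    vText := Trim(TopTranscript(AResults[i]));
    if vText <> '' then
    begin
      if Result <> '' then
        Result := Result + ' ';
      Result := Result + vText;
    end;
  end;
end;

var
  vResults: TSpeechResults;
begin
  // Sample data invented for illustration only.
  SetLength(vResults, 2);
  SetLength(vResults[0].Alternatives, 2);
  vResults[0].Alternatives[0].Transcript := 'Hello world.';
  vResults[0].Alternatives[0].Confidence := 0.92;
  vResults[0].Alternatives[1].Transcript := 'Hollow world.';
  vResults[0].Alternatives[1].Confidence := 0.0;
  SetLength(vResults[1].Alternatives, 1);
  vResults[1].Alternatives[0].Transcript := 'How are you?';
  vResults[1].Alternatives[0].Confidence := 0.88;

  WriteLn(JoinTranscripts(vResults)); // prints: Hello world. How are you?
end.
```

A synchronous Recognize reply can contain several results, one per consecutive portion of the audio, which is why the sketch concatenates them instead of reading only the first.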

Methods

Name	Description
Recognize	Performs synchronous speech recognition: sends the audio config and audio data, and returns the transcribed text with confidence scores.

Demo

A working sample is available in the demo folder Demos/21.GRPC/11.Speech_to_Text, which shows how to authenticate and transcribe an audio file with the Recognize method.
