Typed interface to call the Google Cloud Speech-to-Text gRPC API and transcribe audio into text.
Google Cloud Speech-to-Text converts audio into text using Google's speech-recognition models. The gRPC API is exposed through the google.cloud.speech.v1.Speech service, reached at speech.googleapis.com:443 over TLS, and the main method is Recognize for synchronous recognition.
Requests are built with TsgcGRPCSpeechRecognizeRequest, where Config sets Encoding, SampleRateHertz, LanguageCode and EnableAutomaticPunctuation, and Audio.Uri points to the audio. The reply is returned in TsgcGRPCSpeechRecognizeResponse, whose Results contain Alternatives with Transcript and Confidence.
The example below authenticates with a service-account JWT, wires a TsgcGRPCClient over a TsgcHTTP2Client to the Speech host, sets the authorization Bearer metadata, and calls Recognize on LINEAR16, 16000 Hz, en-US audio stored at a gs:// URI:
```pascal
oHTTP2 := TsgcHTTP2Client.Create(nil);
oHTTP2.Host := 'speech.googleapis.com';
oHTTP2.Port := 443;
oHTTP2.TLS := True;
oGRPC := TsgcGRPCClient.Create(nil);
oGRPC.Client := oHTTP2;
// service-account JWT authentication
oGRPC.GoogleCloudOptions.JWT.KeyFile := 'service-account.json';
oGRPC.GoogleCloudOptions.JWT.API_Endpoint := 'https://speech.googleapis.com/';
oGRPC.DefaultMetadata.AddValue('authorization', 'Bearer ' + oGRPC.GoogleCloudOptions.JWT.Token);
// build the typed request and call the method
oRequest := TsgcGRPCSpeechRecognizeRequest.Create;
try
  oRequest.Config.Encoding := 'LINEAR16';
  oRequest.Config.SampleRateHertz := 16000;
  oRequest.Config.LanguageCode := 'en-US';
  oRequest.Config.EnableAutomaticPunctuation := True;
  oRequest.Audio.Uri := 'gs://my-bucket/audio.wav';
  oResponse := oGRPC.Call('google.cloud.speech.v1.Speech', 'Recognize', oRequest.ToBytes);
  ShowMessage(oResponse.DataString);
finally
  oRequest.Free;
end;
```
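The call above shows the raw reply through DataString. To illustrate the reply shape described earlier (Results containing Alternatives with Transcript and Confidence), here is a minimal, self-contained sketch of assembling a transcript. It assumes the usual Speech-to-Text convention that the first alternative of each result is the most probable one and that consecutive results cover consecutive portions of the audio. TAlternative, TSpeechResult, TSpeechResults and JoinTopTranscripts are hypothetical stand-ins written for this illustration, not sgcWebSockets types, and the sample values are made up.

```pascal
program TopTranscriptSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical stand-ins that mirror the reply shape described above.
  // These are NOT the sgcWebSockets classes.
  TAlternative = record
    Transcript: string;
    Confidence: Double;
  end;

  TSpeechResult = record
    Alternatives: array of TAlternative;
  end;

  TSpeechResults = array of TSpeechResult;

// Joins the first (assumed most probable) alternative of every result
// into one transcript, separating the pieces with a single space.
function JoinTopTranscripts(const AResults: TSpeechResults): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(AResults) do
  begin
    // Skip results that carry no alternatives at all.
    if Length(AResults[i].Alternatives) = 0 then
      Continue;
    if Result <> '' then
      Result := Result + ' ';
    Result := Result + AResults[i].Alternatives[0].Transcript;
  end;
end;

var
  Sample: TSpeechResults;
begin
  // Hand-made sample data, not real API output.
  SetLength(Sample, 2);

  SetLength(Sample[0].Alternatives, 2);
  Sample[0].Alternatives[0].Transcript := 'Hello world.';
  Sample[0].Alternatives[0].Confidence := 0.94;
  Sample[0].Alternatives[1].Transcript := 'Hollow world.';
  Sample[0].Alternatives[1].Confidence := 0.0;

  SetLength(Sample[1].Alternatives, 1);
  Sample[1].Alternatives[0].Transcript := 'How are you?';
  Sample[1].Alternatives[0].Confidence := 0.91;

  WriteLn(JoinTopTranscripts(Sample)); // prints: Hello world. How are you?
end.
```

With the real component, the same walk would run over whatever collection TsgcGRPCSpeechRecognizeResponse exposes for Results; the exact property and accessor names should be taken from the component reference or the demo project rather than from this sketch.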
| Name | Description |
|---|---|
| Recognize | Performs synchronous speech recognition: sends the audio config and audio data, and returns the transcribed text with confidence scores. |
A working sample is available in the demo folder Demos/21.GRPC/11.Speech_to_Text, which shows how to authenticate and transcribe an audio file with the Recognize method.