You can use the Realtime API for transcription-only use cases, with input either from a microphone or from a file: for example, to generate subtitles or transcripts in real time. In transcription-only mode the model does not generate responses.
To use the Realtime API for transcription, create a transcription session over a WebSocket connection. Use the TsgcWSAPI_OpenAI and TsgcWebSocketClient components to start a new transcription session.
Below is an example of real-time transcription using the OpenAI API.
// Create the WebSocket client, the microphone recorder and the OpenAI API component
WSClient := TsgcWebSocketClient.Create(nil);
oAudio := TsgcAudioRecorderWave.Create(nil);
OpenAI := TsgcWSAPI_OpenAI.Create(nil);
// Wire the client and the audio recorder to the API component
OpenAI.Client := WSClient;
OpenAI.AudioRecorder := oAudio;
// Select the OpenAI provider in transcription-only mode
OpenAI.OpenAIOptions.APIKey := 'your-api-key-here';
OpenAI.OpenAIOptions.method := sgcoaimTranscription;
OpenAI.OpenAIOptions.provider := sgcoaipOpenAI;
// Language and model used to transcribe the input audio
OpenAI.InputAudio.Language := 'en';
OpenAI.InputAudio.Model := 'whisper-1';
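Once the components are configured, the session is started by opening the underlying WebSocket connection. The sketch below assumes the standard TsgcWebSocketClient.Active property opens the connection; check the component documentation for the exact activation call in your version:

```pascal
// Sketch, not confirmed by this document: activating the client
// is assumed to connect and start the transcription session.
WSClient.Active := True;
```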
When a transcription completes, handle the completed item to read the transcribed text:

procedure OnOpenAIAudioTranscriptionCompleted(Sender: TObject;
  const aItem: TsgcWSOpenAIConversation_Item_Completed);
begin
  // The transcribed text is available in the Transcript property
  Log('#transcription_completed: ' + aItem.Transcript);
end;
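For the handler above to run, it must be assigned to the component's completed-transcription event. The published event name below is an assumption (verify it against your sgcWebSockets version), and the sketch assumes the handler is declared as a method of the owning form or class:

```pascal
// Sketch: wiring the handler. The event name OnAudioTranscriptionCompleted
// is an assumption, not confirmed by this document.
OpenAI.OnAudioTranscriptionCompleted := OnOpenAIAudioTranscriptionCompleted;
```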
The component also allows you to send audio manually: call the method AppendInputAudioBuffer and pass the audio as a TStream. The audio must be mono PCM sampled at 24 kHz (24000 Hz is the only supported rate).
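As a sketch of sending audio manually, the snippet below reads a raw PCM file and passes it to AppendInputAudioBuffer. The file name is illustrative, and the snippet assumes OpenAI is the configured TsgcWSAPI_OpenAI instance with the session already connected:

```pascal
var
  oStream: TFileStream;
begin
  // Illustrative file name: raw PCM audio, 24 kHz, mono
  oStream := TFileStream.Create('speech_24khz_mono.pcm',
    fmOpenRead or fmShareDenyWrite);
  try
    // Pass the audio to the transcription session as a TStream
    OpenAI.AppendInputAudioBuffer(oStream);
  finally
    oStream.Free;
  end;
end;
```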