Starting with sgcWebSockets 2023.5.0, building real-time translators is easier using the OpenAI APIs together with the Text-To-Speech APIs from Windows, Google or Amazon.
Translation applications built on the OpenAI APIs offer several advantages. They translate quickly across many languages, which helps people communicate across language barriers. They rely on machine learning models to produce the translations, and they can be integrated into a variety of platforms, which makes them accessible to a wide range of users.
Translator Component
To build a Translator with voice commands, the following steps are required:
- The microphone audio must be captured, so a speech-to-text system is needed to obtain the text that will be sent to OpenAI.
  - The microphone audio is captured using the TsgcAudioRecorderMCI component.
- Once the audio has been captured, it is sent to the OpenAI Whisper API, which converts the audio file to text.
- The resulting text is then sent to OpenAI using the ChatCompletion API.
- The response from OpenAI must then be converted to speech using one of the following components:
  - TsgcTextToSpeechSystem: (currently Windows only) uses the text-to-speech engine of the operating system.
  - TsgcTextToSpeechGoogle: sends the response from OpenAI to the Google Cloud servers; an mp3 file is returned and played by the TsgcAudioPlayerMCI.
  - TsgcTextToSpeechAmazon: sends the response from OpenAI to the Amazon AWS servers; an mp3 file is returned and played by the TsgcAudioPlayerMCI.
The component exposes the following properties and events:
- OpenAIOptions: configures the OpenAI properties.
  - ApiKey: an API key is required to interact with the OpenAI APIs.
  - LogOptions
    - Enabled: if set to true, the API requests are logged to a text file.
    - FileName: the file name of the log.
  - Organization: an optional OpenAI API field.
- TranslatorOptions: configures the Translator properties.
  - Translation: configures the OpenAI Translation API settings.
    - Model: by default whisper-1.
- AudioRecorder: assign a TsgcAudioRecorder component to capture the microphone audio.
- TextToSpeech: assign a TsgcTextToSpeech component to play back the response from OpenAI.
- OnAudioStart: called when audio recording starts.
- OnAudioStop: called after audio recording stops.
- OnTranslation: called when a response with the translation result is received from the OpenAI Translation API.
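The properties listed above might be set in code along the following lines. This is a minimal sketch, not the library's documented usage: the property paths (LogOptions under OpenAIOptions, Model under TranslatorOptions.Translation) are inferred from the property list, and the key, organization and log file name are placeholder values.

```delphi
// Sketch only: property paths are inferred from the property list above.
sgcTranslator := TsgcAIOpenAITranslator.Create(nil);

// OpenAI credentials (placeholder values)
sgcTranslator.OpenAIOptions.ApiKey := 'your_openai_api_key';
sgcTranslator.OpenAIOptions.Organization := 'your_organization_id'; // optional

// Log the API requests to a text file (assumed path: OpenAIOptions.LogOptions)
sgcTranslator.OpenAIOptions.LogOptions.Enabled := True;
sgcTranslator.OpenAIOptions.LogOptions.FileName := 'translator.log';

// Model setting (assumed path; whisper-1 is already the default)
sgcTranslator.TranslatorOptions.Translation.Model := 'whisper-1';
```

Enabling the log while developing can make it easier to see which requests are sent to OpenAI; since the default model is already whisper-1, the last assignment is only needed when a different model is wanted.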
Delphi Code Example
Create a new Translator that uses the default Text-To-Speech engine from Microsoft Windows. Call Start to begin recording the audio, and call Stop to end the recording, send the audio to the OpenAI API and translate it.

    // ... create the translator component
    sgcTranslator := TsgcAIOpenAITranslator.Create(nil);
    sgcTranslator.OpenAIOptions.ApiKey := 'your_openai_api_key';
    // ... create the audio recorder and text-to-speech components
    sgcAudioRecorder := TsgcAudioRecorderMCI.Create(nil);
    sgcTextToSpeech := TsgcTextToSpeechSystem.Create(nil);
    // ... assign the audio components to the translator
    sgcTranslator.AudioRecorder := sgcAudioRecorder;
    sgcTranslator.TextToSpeech := sgcTextToSpeech;
    // ... start the translator, speak into the microphone to capture the audio, and stop to translate it
    sgcTranslator.Start;
    // ... speak
    sgcTranslator.Stop;

Delphi Real-Time Translation AI Video
Delphi Translator Application Demo
Find below the source code of the Translator Application Demo, which shows the main features of the Real-Time Translator built with the sgcWebSockets library for Windows.