Voice Service

The Voice Service component provides voice interaction capabilities within the AAQ system. It supports both speech-to-text (STT) and text-to-speech (TTS) through two methods:

In-House Models: Utilize in-house, dockerized models for STT and TTS to process speech data locally.
External APIs: Integrate with Google Cloud's Speech-to-Text and Text-to-Speech APIs for enhanced flexibility and accuracy.

This documentation will guide you through setting up, configuring, and using the Voice Service in various scenarios.

Note: To enable the /voice-search endpoint in the question-answer service, set the TOGGLE_VOICE environment variable in .core_backend.env (see Configuring AAQ).

To use the speech service for manual setup and testing, you must install ffmpeg on your system.
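Before a manual setup, it can help to confirm that ffmpeg is actually reachable from your shell. The following is a minimal sketch and not part of AAQ; the function names find_ffmpeg and ffmpeg_version_line are made up for this example, and it relies only on the Python standard library and ffmpeg's -version flag.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path):
    """Return the first line printed by `ffmpeg -version` for the given executable."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it before running the speech service manually.")
    else:
        print(f"Found ffmpeg at {path}")
        print(ffmpeg_version_line(path))
```

If the script reports that ffmpeg is missing, install it with your platform's package manager and run the check again.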

Using a combination of internal and external models

You can use internal and external models at the same time by setting the corresponding environment variables. If one of the environment variables is not set, the system automatically defaults to the external model. For information on configuring and using each type of model, refer to our In-house Models and External APIs guides.
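The fallback behaviour described above can be pictured with a small sketch. This is not AAQ's implementation: the variable names EXAMPLE_INTERNAL_STT_ENDPOINT and EXAMPLE_INTERNAL_TTS_ENDPOINT, the example URL, and the helper functions are hypothetical and exist only to illustrate the idea. The sketch also assumes that an empty value counts as "not set". Check the In-house Models and External APIs guides for the variables AAQ actually reads.

```python
import os

# Hypothetical variable names, for illustration only. They are NOT the names
# AAQ reads; see the In-house Models and External APIs guides for those.
STT_ENV_VAR = "EXAMPLE_INTERNAL_STT_ENDPOINT"
TTS_ENV_VAR = "EXAMPLE_INTERNAL_TTS_ENDPOINT"


def choose_backend(env_var, environ=None):
    """Return "internal" if env_var holds a non-empty value, otherwise "external"."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_var, "").strip()
    return "internal" if value else "external"


def choose_backends(environ=None):
    """Decide the STT and TTS backends independently of each other."""
    return {
        "stt": choose_backend(STT_ENV_VAR, environ),
        "tts": choose_backend(TTS_ENV_VAR, environ),
    }


if __name__ == "__main__":
    # Only the (hypothetical) internal STT variable is set, so TTS falls back
    # to the external model.
    example_env = {STT_ENV_VAR: "http://localhost:8001/transcribe"}
    print(choose_backends(example_env))  # {'stt': 'internal', 'tts': 'external'}
```

Deciding each capability on its own is what allows a mixed setup, for example in-house STT combined with external TTS.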

Using In-House Models

Follow the steps to set up and use the in-house STT and TTS models.

More info

Using External APIs

Learn how to integrate Google Cloud's Speech-to-Text and Text-to-Speech services.

More info