Voice-Search

This service accepts an audio file URL, transcribes the speech to text, processes the query similar to the text-based search, and returns both a text response and an audio response URL.

See OpenAPI specification or SwaggerUI for more details on how to call the service.

Process flow for End to End Speech Generation

sequenceDiagram
    autonumber
    User->>AAQ: Audio file URL
    AAQ->>Cloud Storage: save audio to Cloud Storage
    AAQ->>Speech-to-Text: Transcribe audio
    Speech-to-Text->>AAQ: Transcribed text
    AAQ->>LLM: Identify language
    LLM->>AAQ: <Language>
    AAQ->>LLM: Paraphrase question
    LLM->>AAQ: <Paraphrased question>
    AAQ->>Vector DB: Request N most similar contents in DB
    Vector DB->>AAQ: <N contents with similarity score>
    AAQ->>Cross-encoder: Re-rank to get top N contents
    Cross-encoder->>AAQ: <N contents with similarity score>
    AAQ->>LLM: Generate response based on top N contents
    LLM->>AAQ: <Text response>
    AAQ->>Text-to-Speech: Convert text response to audio
    AAQ->>Cloud Storage: Save audio file to Cloud Storage
    Text-to-Speech->>AAQ: Audio file URL
    AAQ->>User: Return JSON with text response, audio URL, and N contents

Voice Service Integration

For detailed information on how to integrate voice capabilities into your application using AAQ, including setup instructions for both in-house and cloud-based speech services, please refer to our Voice Service documentation. This documentation covers:

Setting up the dockerized container for in-house ASR and TTS models
Configuring Google Cloud Speech-to-Text and Text-to-Speech integration
Best practices for voice input and output in your application

Explore Voice Service Documentation