Audio transcription in supervisor route #19

Merged
merged 11 commits into from
Dec 2, 2024

Conversation

Luisotee
Collaborator

Audio Transcription Support for Supervisor Route

Overview

Add audio transcription capabilities to the supervisor route using Groq's Whisper V3 Turbo API. This centralizes audio processing in the AI API service, eliminating the need for individual transcription handling in client integrations (simulator, WhatsApp, Telegram).

Technical Changes

  • Added support for multiple audio formats:
    • Direct support: mp3, mp4, mpeg, mpga, m4a, wav, webm
    • Conversion support: ogg -> mp3
  • Implemented audio file handling with temporary storage
  • Integrated Groq's Whisper V3 Turbo for transcription
  • Added content type detection and validation
  • Centralized error handling for audio processing
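
The format handling listed above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `DIRECT_FORMATS`, `CONVERTIBLE_FORMATS`, and `plan_audio_handling` are hypothetical, and checking the file extension is an assumption, since the description mentions content type detection without saying how it is done.

```python
from pathlib import Path

# Formats taken from the PR description; the constant names are hypothetical.
DIRECT_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
CONVERTIBLE_FORMATS = {"ogg": "mp3"}  # source extension -> target extension


def plan_audio_handling(filename: str) -> tuple[str, str]:
    """Return (action, target_format) for an uploaded file name.

    action is "direct" when the file can be sent for transcription as-is,
    or "convert" when it must be converted first.
    Raises ValueError for unsupported or missing extensions.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in DIRECT_FORMATS:
        return ("direct", ext)
    if ext in CONVERTIBLE_FORMATS:
        return ("convert", CONVERTIBLE_FORMATS[ext])
    raise ValueError(f"Unsupported audio format: {ext or 'unknown'}")
```

With this sketch, `plan_audio_handling("voice.ogg")` asks for a conversion to mp3, while a `.wav` upload is passed through unchanged.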

Dependencies

  • Added python-multipart for form data handling
  • Added python-ffmpeg for audio conversion
  • Added groq for Whisper API access

Configuration

Requires GROQ_API_KEY environment variable
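
One way to fail early when the key is missing is sketched below. The helper name `get_groq_api_key` and its optional `env` parameter are hypothetical and are not part of this PR; only the variable name `GROQ_API_KEY` comes from the description.

```python
import os


def get_groq_api_key(env=None) -> str:
    """Return GROQ_API_KEY, or raise a clear error if it is missing or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY environment variable is not set")
    return key
```

Calling this once at startup surfaces a missing key before the first audio request arrives.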

Benefits

  • Centralized audio processing
  • Consistent transcription quality
  • Reduced implementation complexity in clients
  • Unified error handling

@Luisotee Luisotee added the feature New feature label Nov 27, 2024
This was linked to issues Nov 27, 2024
@Luisotee Luisotee requested a review from luandro November 27, 2024 09:00
@luandro
Contributor

luandro commented Nov 27, 2024

@Luisotee, it's working great, amazing job! Just don't forget to add the packages you use. For example:

uv add ffmpeg python-multipart

This automatically adds them to the pyproject.toml file, so they are picked up on every run of the project. I added them in my commit.

A second comment is that the route /api/supervisor/supervisor isn't ideal. Might be a good opportunity to change to /api/classifier or something else.

@Luisotee
Collaborator Author

@luandro should be fine now

@luandro
Contributor

luandro commented Nov 28, 2024

@Luisotee when testing on the docs page, "send empty value" works when set for message, but for some reason when setting empty value for message an error is thrown:

[image: screenshot of the error]

@Luisotee
Collaborator Author

Luisotee commented Dec 1, 2024

@luandro Let me explain the "Send empty value" checkbox:

When checked, it will send an empty argument to the API (like -F 'audio=' or -F 'message='). If you don't want to include either audio or message in the API call, simply leave the checkbox unchecked - this way the parameter won't be included in the request at all.

The error you encountered happened because you were sending both an empty string and no audio file simultaneously, leaving nothing for the API to process.

I've also fixed the dependency issue, so this PR is now ready to be merged.

@luandro
Contributor

luandro commented Dec 1, 2024

@Luisotee, thanks for the explanation, but as you can see in the picture, I'm not sending an empty message; it has a string: "hello". There is something that the API should process.

@Luisotee
Collaborator Author

Luisotee commented Dec 1, 2024

@luandro The error occurred due to the audio being sent as an empty file rather than being omitted or providing a valid file.

The correct usage is:

  • If you don't want to send an audio file, just omit it completely
  • If you include the audio parameter, the code expects a valid audio file - otherwise it will (and should) throw this error

About the "Send empty value" checkbox - it's just a default Swagger UI feature. We don't actually use it in our use case, and I haven't found a way to remove it from the interface.
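
The rules described in this thread can be sketched as a small validation function. This is only an illustration under stated assumptions, not the route's actual code: the name `resolve_input` is hypothetical, `None` stands for a form field that was omitted, and an empty string or empty bytes stands for a field sent with "Send empty value" checked.

```python
from typing import Optional


def resolve_input(message: Optional[str], audio: Optional[bytes]) -> str:
    """Decide which input the API should process.

    None means the field was omitted from the request;
    "" or b"" means the field was sent but empty.
    Returns "audio" or "message", or raises ValueError.
    """
    if audio is not None:
        # An included audio field must carry real data, even if a message is set.
        if len(audio) == 0:
            raise ValueError("audio was included but is empty")
        return "audio"
    if message is not None and message.strip():
        return "message"
    raise ValueError("nothing to process: provide a message or an audio file")
```

Under these assumptions, a request with `message="hello"` and an empty `audio` field is rejected, while the same message with `audio` omitted is accepted.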

@luandro luandro merged commit c540a59 into main Dec 2, 2024
2 checks passed
Labels
feature New feature
Development

Successfully merging this pull request may close these issues.

Integrate text-to-speech Intent classification
2 participants