Audio transcription in supervisor route #19

Luisotee · 2024-11-27T08:59:52Z

Audio Transcription Support for Supervisor Route

Overview

Add audio transcription capabilities to the supervisor route using Groq's Whisper V3 Turbo API. This centralizes audio processing in the AI API service, eliminating the need for individual transcription handling in client integrations (simulator, WhatsApp, Telegram).
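As a rough illustration of the transcription step described above, here is a minimal sketch. It assumes a client object shaped like the Groq Python SDK (`client.audio.transcriptions.create`) and the model id `whisper-large-v3-turbo`; the function name `transcribe_file` is hypothetical and is not taken from this PR.

```python
import os


def transcribe_file(client, path, model="whisper-large-v3-turbo"):
    """Send one audio file to a Groq-style client and return the transcript text.

    `client` is assumed to expose `audio.transcriptions.create(file=..., model=...)`
    and to return an object with a `.text` attribute.
    """
    # Read the whole file and pass it as a (filename, bytes) tuple.
    with open(path, "rb") as handle:
        payload = (os.path.basename(path), handle.read())
    result = client.audio.transcriptions.create(file=payload, model=model)
    return result.text
```

Passing the client in as an argument keeps the helper easy to test with a stub; in real use it would presumably be a `groq.Groq` instance.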

Technical Changes

- Added support for multiple audio formats:
  - Direct support: mp3, mp4, mpeg, mpga, m4a, wav, webm
  - Conversion support: ogg -> mp3
- Implemented audio file handling with temporary storage
- Integrated Groq's Whisper V3 Turbo for transcription
- Added content type detection and validation
- Centralized error handling for audio processing
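The format handling listed above could be sketched roughly as follows. The format lists come from this description; the names `DIRECT_FORMATS`, `CONVERTIBLE_FORMATS`, and `plan_audio_handling` are hypothetical, and detecting the format from the file extension is an assumption (the PR mentions content type detection without giving details).

```python
import os

# Formats the description lists as directly supported.
DIRECT_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Formats that need conversion first, mapped to their target format.
CONVERTIBLE_FORMATS = {"ogg": "mp3"}


def plan_audio_handling(filename):
    """Return ("direct", fmt) or ("convert", target_fmt) for an uploaded file.

    Raises ValueError for anything outside the two lists above.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in DIRECT_FORMATS:
        return ("direct", ext)
    if ext in CONVERTIBLE_FORMATS:
        return ("convert", CONVERTIBLE_FORMATS[ext])
    raise ValueError(f"Unsupported audio format: {ext or 'unknown'}")
```

A caller would run the actual ffmpeg conversion only for the "convert" case, writing the result to a temporary file before transcription.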

Dependencies

Added python-multipart for form data handling
Added python-ffmpeg for audio conversion
Added groq for Whisper API access

Configuration

Requires GROQ_API_KEY environment variable
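One way to fail early when the key is missing is sketched below. The helper name `load_groq_api_key` and the error message are hypothetical; only the variable name `GROQ_API_KEY` comes from this PR.

```python
import os


def load_groq_api_key(environ=os.environ):
    """Return the Groq API key, or raise if it is missing or blank."""
    key = environ.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; audio transcription is unavailable"
        )
    return key
```

Accepting the environment mapping as a parameter is only a convenience for testing; calling `load_groq_api_key()` with no arguments reads the real process environment.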

Benefits

Centralized audio processing
Consistent transcription quality
Reduced implementation complexity in clients
Unified error handling

luandro · 2024-11-27T16:48:22Z

@Luisotee, it's working great, amazing job! Just don't forget to add the packages you use. For example:

uv add ffmpeg python-multipart

This automatically adds them to the pyproject.toml file, so they are picked up on every run of the project. I added them in my commit.

A second comment is that the route /api/supervisor/supervisor isn't ideal. Might be a good opportunity to change to /api/classifier or something else.

Luisotee · 2024-11-28T09:52:08Z

@luandro should be fine now

luandro · 2024-11-28T13:43:33Z

@Luisotee when testing on the docs page, "send empty value" works when set for message, but for some reason an error is thrown when it is set for audio:

Luisotee · 2024-12-01T13:48:32Z

@luandro Let me explain the "Send empty value" checkbox:

When checked, it will send an empty argument to the API (like -F 'audio=' or -F 'message='). If you don't want to include either audio or message in the API call, simply leave the checkbox unchecked - this way the parameter won't be included in the request at all.

The error you encountered happened because you were sending both an empty string and no audio file simultaneously, leaving nothing for the API to process.

I've also fixed the dependency issue, so this PR is now ready to be merged.

luandro · 2024-12-01T18:20:19Z

@Luisotee, thanks for the explanation, but as you can see in the picture, I'm not sending an empty message; it has a string: "hello". There is something for the API to process.

Luisotee · 2024-12-01T19:08:37Z

@luandro The error occurred because the audio was sent as an empty file, rather than being omitted or replaced with a valid file.

The correct usage is:

If you don't want to send an audio file, just omit it completely
If you include the audio parameter, the code expects a valid audio file - otherwise it will (and should) throw this error

About the "Send empty value" checkbox - it's just a default Swagger UI feature. We don't actually use it in our use case, and I haven't found a way to remove it from the interface.
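The rule described in this comment can be sketched as a small validation helper. The function name `resolve_input` is hypothetical, and preferring audio when both inputs are valid is an assumption for illustration rather than confirmed behavior of the route.

```python
from typing import Optional


def resolve_input(message: Optional[str], audio: Optional[bytes]) -> str:
    """Decide which input to process: returns "audio" or "message".

    None means the field was omitted from the request entirely.
    """
    if audio is not None:
        # The audio field was included, so it must contain real data.
        if len(audio) == 0:
            raise ValueError(
                "audio was included but is empty; omit the field instead"
            )
        return "audio"  # assumption: audio wins when both are present
    if message is not None and message.strip():
        return "message"
    raise ValueError("nothing to process: provide a message or an audio file")
```

Under this sketch, a request with message "hello" and an empty audio value fails, while the same request with the audio field left out succeeds.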

Luisotee added 3 commits November 27, 2024 04:33

audio route

868d2b7

feat: audio transcription

41fc958

lint, optimizations

ad2f818

Luisotee added the feature New feature label Nov 27, 2024

This was linked to issues Nov 27, 2024

Intent classification #1

Closed

Integrate text-to-speech #12

Closed

Luisotee requested a review from luandro November 27, 2024 09:00

luandro added 2 commits November 27, 2024 13:47

chore: add missing deps for uv project

48851e2

chore: make sure both API params are optional and have a None default value

d875b3a

Luisotee added 2 commits November 28, 2024 06:48

renamed supervisor to classifier

b90ce01

Merge branch 'luisotee/audio-transcription' of https://github.com/digidem/earth-defenders-assistant into luisotee/audio-transcription

7f1ef02

chore: deduplicate route from /classifier/classifier to /classifier/classify

e0853e0

fix: updated ffmpeg dependency to ffmpeg-python

474b39d

Luisotee added 2 commits December 1, 2024 16:00

Handle cases with string message and audio

61c8a12

run lint

6214aa3

luandro merged commit c540a59 into main Dec 2, 2024
2 checks passed
