Issue building DockerFile #315

Open
hobodrifterdavid opened this issue Jun 28, 2024 · 5 comments

@hobodrifterdavid

hobodrifterdavid commented Jun 28, 2024

Hello. This project looks very interesting. I hit some issues building the Dockerfile as described in the readme:

docker build -t wordcab-transcribe:latest .

docker run -d --name wordcab-transcribe \
    --gpus all \
    --shm-size 1g \
    --restart unless-stopped \
    -p 5001:5001 \
    -v ~/.cache:/root/.cache \
    wordcab-transcribe:latest

On the first machine (Ubuntu Server 22 LTS, 4x 3090), the build completed, but the container failed at startup with an 'illegal memory access' error, which I think comes from a CUDA library. This machine previously had a modified NVIDIA driver for P2P access, so the problem may not be on your side. (tinygrad/open-gpu-kernel-modules#4)

On the second machine (Ubuntu Server 22 LTS, 1x 3090), the build first failed because the specific openssl version was not available or compatible. I removed the version number pinned in the Dockerfile and the build continued, but it now fails with "ModuleNotFoundError: No module named 'IPython'".

Just a heads up, ideally I'd be able to help you debug.

image

@aleksandr-smechov
Contributor

aleksandr-smechov commented Jun 28, 2024

@hobodrifterdavid Thanks for bringing up the issue. The documentation is a bit outdated. Can you please try the latest main branch and this Docker command instead:

docker run --name wordcab-transcribe \
    --gpus all \
    --shm-size 1g \
    --restart unless-stopped \
    -p 5001:5001 \
    -e WORDCAB_TRANSCRIBE_API_KEY="x" \
    -e WHISPER_MODEL="medium" \
    -e WHISPER_ENGINE="faster-whisper-batch" \
    -e ALIGN_MODEL="tiny" \
    -e DIARIZATION_BACKED="longform-diarizer" \
    -e COMPUTE_TYPE="float16" \
    -e DEBUG="True" \
    -e USERNAME="admin" \
    -e PASSWORD="password" \
    -e OPENSSL_KEY="0123456789abcdefghijklmnopqrstuvwyz" \
    -e WINDOW_LENGTHS="2.0,1.5,1.0,0.75,0.5" \
    -e SHIFT_LENGTHS="1.0,0.75,0.625,0.5,0.25" \
    -e TENSORRT_LLM_VERSION="0.9.0.dev2024032600" \
    wordcab-transcribe

The environment variables are from the .env file, feel free to customize.
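
For anyone who would rather keep these values in the .env file than on the command line: as far as I know, `docker run --env-file` does not strip surrounding quotes, so a value written as "medium" would arrive with the quotes included. Below is a minimal sketch that converts such a file into `-e` arguments. It assumes a simple KEY="value" layout with no inline comments, and the helper names `parse_env_text` and `docker_run_args` are made up for illustration; they are not part of the project.

```python
import shlex


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comment lines."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def docker_run_args(env, image="wordcab-transcribe", port=5001):
    """Build an argv list for `docker run` with one -e flag per variable."""
    args = ["docker", "run", "--name", "wordcab-transcribe", "--gpus", "all",
            "--shm-size", "1g", "-p", f"{port}:{port}"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    return args + [image]


# Example: print a shell-quoted command built from two sample variables.
sample = 'WHISPER_MODEL="medium"\n# comment\nCOMPUTE_TYPE="float16"\n'
print(shlex.join(docker_run_args(parse_env_text(sample))))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids a second round of shell quoting.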

aleksandr-smechov self-assigned this Jun 28, 2024
@hobodrifterdavid
Author

hobodrifterdavid commented Jun 29, 2024

On the second machine, I'm able to build if I add ipython to requirements.txt. The 'docker run' command in the readme does start the container successfully, and I'm able to process a request, but it errors out if I try to use the VAD. It seems okay with the updated command you sent. On the first machine, I still get the illegal memory access error, but I will wipe the machine and try again.

image

I have a few questions. :)

Is there a preferred backend for processing a long file over multiple GPUs?

According to your docs, the TensorRT-LLM engine doesn't allow passing a prompt. The prompt is useful for nudging the model towards outputting zh-CN or zh-TW, as there is only a single supported Chinese language code for Whisper. That said, I guess machine translation as a post-processing step might be a reasonable way to handle this.
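
For engines that do accept a prompt, the nudge described here could look roughly like the sketch below. It assumes the faster-whisper Python package, whose `WhisperModel.transcribe` accepts `language` and `initial_prompt` arguments. The prompt strings, the `zh_transcribe_kwargs` and `transcribe_zh` helpers, and the model size are illustrative choices only, not something the project ships.

```python
# Illustrative prompts: the same sentence ("The following are sentences in
# Mandarin.") written in Simplified and in Traditional characters.
ZH_PROMPTS = {
    "zh-CN": "以下是普通话的句子。",
    "zh-TW": "以下是普通話的句子。",
}


def zh_transcribe_kwargs(locale):
    """Return keyword arguments that bias Whisper towards one script."""
    if locale not in ZH_PROMPTS:
        raise ValueError(f"unsupported locale: {locale}")
    # Whisper has a single Chinese language code, so only the prompt differs.
    return {"language": "zh", "initial_prompt": ZH_PROMPTS[locale]}


def transcribe_zh(audio_path, locale, model_size="medium"):
    """Run faster-whisper with the script-specific prompt (downloads a model)."""
    from faster_whisper import WhisperModel  # third-party, imported lazily

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path, **zh_transcribe_kwargs(locale))
    return [segment.text for segment in segments]
```

The prompt only biases decoding, so the output script is not guaranteed; a conversion step afterwards may still be needed.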

Faster-Whisper has a length_penalty parameter that, as I understand it, increases the probability of the 'end of segment' token the longer the segment gets. I think it's useful for pushing the output towards shorter segments/subs. Could it be exposed in the API? The current output often gives segments that are too long to show as subtitles. By the way, I noticed today that stable-ts has a set of functions for splitting and merging subs, although a proper sentence segmenter would also be helpful.
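
Splitting over-long segments after the fact is another option. Here is a minimal sketch, assuming word-level timestamps are available as (word, start, end) tuples. That input shape is made up for illustration and is not the API's actual response format, and the function uses a plain character limit rather than a real sentence segmenter.

```python
def split_segment(words, max_chars=42):
    """Greedily group (word, start, end) tuples into cues of at most max_chars.

    A single word longer than max_chars still becomes its own cue.
    """
    cues, current = [], []
    for word in words:
        candidate = " ".join(w[0] for w in current + [word])
        if current and len(candidate) > max_chars:
            # Adding this word would overflow the cue, so start a new one.
            cues.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        cues.append(current)
    # Each cue keeps the start of its first word and the end of its last word.
    return [
        {"text": " ".join(w[0] for w in cue), "start": cue[0][1], "end": cue[-1][2]}
        for cue in cues
    ]


words = [("the", 0.0, 0.2), ("quick", 0.2, 0.5), ("brown", 0.5, 0.9), ("fox", 0.9, 1.2)]
for cue in split_segment(words, max_chars=10):
    print(cue)
```

Joining words with spaces suits space-delimited languages; for Chinese or Japanese the join string and the length limit would need adjusting.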

@aleksandr-smechov
Contributor

@hobodrifterdavid I noticed the missing IPython as well. Check out the latest main branch that I just pushed; it should resolve a few issues.

I now tend to prefer the Whisper engine I just added, faster-whisper-batched, which includes a number of unmerged PRs from the faster-whisper library that make things go faster.

Use the edited docker run command above and head to the FastAPI docs, where the first audio file endpoint should have the length_penalty parameter. Given your GPU, I recommend setting batch_size to at least 4 or 8, and num_beams to 5.

FastAPI docs are a bit weird for list input, so if you want to add vocab you'll need to use curl or requests with the audio file endpoint. You can use the audio-url endpoint and add vocab and the other parameters in the JSON, but you'll need a presigned URL to test that.
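
As a rough sketch of the requests route: the snippet below builds multipart form fields with the `vocab` key repeated once per term, which is one common way to send a list in a form. The endpoint path `/api/v1/audio`, the `file` field name, and the repeated-key encoding are all assumptions to confirm against the FastAPI docs page; `build_form_fields` and `post_audio` are hypothetical helpers, and authentication is not covered.

```python
def build_form_fields(vocab, batch_size=4, num_beams=5):
    """Return multipart form fields, repeating the `vocab` key once per term."""
    fields = [("batch_size", str(batch_size)), ("num_beams", str(num_beams))]
    fields += [("vocab", term) for term in vocab]
    return fields


def post_audio(path, vocab, base_url="http://localhost:5001"):
    """Upload one audio file; the endpoint path and field names are assumptions."""
    import requests  # third-party, imported lazily

    with open(path, "rb") as audio:
        response = requests.post(
            f"{base_url}/api/v1/audio",  # assumed path, confirm in the FastAPI docs
            data=build_form_fields(vocab),
            files={"file": audio},
            timeout=600,
        )
    response.raise_for_status()
    return response.json()
```

Something like `post_audio("sample.wav", ["Wordcab", "diarization"])` would then send both vocab terms in a single request.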

@hobodrifterdavid
Author

I wiped the first machine, it runs fine now. I didn't see the length_penalty param in the docs yet.

Is the Silero VAD used? Do you know how it compares to other VADs (NeMo etc.) in different languages?

@hobodrifterdavid
Author

I think you might not have pushed the length_penalty. 👀🙂
