VTT generation produces files that show one word at a time (as viewed via VLC) #3

Open
cr08 opened this issue Nov 4, 2022 · 4 comments
Assignees
Labels
bug Something isn't working low priority Low priority bug fix/feature

Comments

@cr08
Owner

cr08 commented Nov 4, 2022

Initial testing of the VTT render code and playing back in VLC shows that it displays a single word at a time on screen. Accuracy at first blush seems good and timing is perfect otherwise.

More research needs to be done here. Plan is to either find out how to fix the VTT file generation or skip it and just have Vosk write out an SRT file instead.

This is low priority.

@cr08 cr08 added bug Something isn't working low priority Low priority bug fix/feature labels Nov 4, 2022
@cr08 cr08 transferred this issue from cr08/twitch_vod_creator Nov 5, 2022
@cr08
Owner Author

cr08 commented Nov 10, 2022

Looks like vosk-api has a good example of SRT output. This looks promising; I'll test it soon to see whether the same single-word output issue exists, and do further research from there.

https://github.com/alphacep/vosk-api/blob/master/python/example/test_srt.py
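
For context, here is a minimal sketch of how word-level recognition results could be grouped into multi-word SRT cues. It is not the code from the linked example. It assumes each word is a dict with `word`, `start`, and `end` keys (times in seconds), and the names `srt_timestamp`, `words_to_srt`, and `words_per_line` are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, words_per_line=7):
    """Group word-level results into numbered SRT cues.

    `words` is assumed to be a list of dicts with "word", "start" and
    "end" keys. `words_per_line` is a hypothetical parameter name.
    """
    cues = []
    for i in range(0, len(words), words_per_line):
        chunk = words[i:i + words_per_line]
        index = len(cues) + 1
        # A cue spans from the first word's start to the last word's end.
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

With `words_per_line=1` this would reproduce the one-word-at-a-time behavior; larger values put several words on screen per cue.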

cr08 added a commit that referenced this issue Nov 11, 2022
… in transcription

These scripts use Vosk as the offline speech-to-text system. After researching issue #3 and the current 'one word per line' result, it looks like this may not be possible to fix with the VTT output.

Moved to SRT output, which by default seems to put up to 7 words on a line and is configurable. For our uses, VTT doesn't add any useful features over SRT, so we're fine switching to this. It also GREATLY simplifies the code.

Also removed the old 0_main_vtt_generation.py file. It doesn't appear to do anything useful for this repo. The main videos.py file will automatically transcribe existing files if SRTs are missing.

TODO: Change VTT references/variables
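
The "transcribe existing files if SRTs are missing" check described in this commit could look roughly like the sketch below. This is not the actual videos.py code. The function name `videos_missing_srt` and the list of video extensions are assumptions for illustration.

```python
from pathlib import Path

# Assumption: which extensions count as video files in this project.
VIDEO_EXTENSIONS = {".mp4", ".mkv"}


def videos_missing_srt(folder):
    """Return video files in `folder` that have no sibling .srt file."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        # "video.mp4" is considered transcribed if "video.srt" exists.
        if not path.with_suffix(".srt").exists():
            missing.append(path)
    return missing
```

Each returned path would then be handed to whatever transcription routine writes the SRT next to the video.
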
@cr08
Owner Author

cr08 commented Nov 11, 2022

The basis of this has been sorted and ironically the code streamlines quite a bit. In brief testing this works. Will continue to test but the change has been committed to the repo.

Still need to change all 'VTT' references/variables to SRT now that we are outputting that format. This does not affect the functionality in any way, just a final cleanup step that needs to be done.

@cr08 cr08 self-assigned this Nov 11, 2022
@cr08
Owner Author

cr08 commented Nov 11, 2022

One side addition I want to make here is an option to use the 'full' English speech model (vosk-model-en-us-0.22) in addition to the current 'small' model included in this repo. More testing needs to be done to see how resource intensive this is on an 'average' test system.

We'll probably do what TDCLI does: remove the models entirely from the repo and rely on the user to download one and place it in their working directory. This will especially be needed for the full model, which weighs in at a whopping 1.8GB. Pretty sure GitHub won't be happy hosting that in a code repo. 😛
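
A rough sketch of how the script might locate a user-supplied model in the working directory, preferring the full model when it is present. Nothing here is committed code. The function name `pick_model_dir` and the `prefer_full` flag are hypothetical, and the small model's directory name is an assumption (only the full model's name is given above).

```python
from pathlib import Path

FULL_MODEL = "vosk-model-en-us-0.22"         # named in this issue
SMALL_MODEL = "vosk-model-small-en-us-0.15"  # assumption: small model dir name


def pick_model_dir(workdir=".", prefer_full=True):
    """Return the first model directory found in `workdir`.

    Raises FileNotFoundError if the user has not downloaded any model.
    """
    workdir = Path(workdir)
    if prefer_full:
        order = [FULL_MODEL, SMALL_MODEL]
    else:
        order = [SMALL_MODEL, FULL_MODEL]
    for name in order:
        candidate = workdir / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"No Vosk model found in {workdir}; expected one of: {', '.join(order)}"
    )
```

The returned path could then be passed to Vosk when loading the model, for example `Model(str(pick_model_dir()))`. Failing loudly when no model is present seems preferable to a silent fallback, since the user has to download the files themselves.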

@cr08
Owner Author

cr08 commented Nov 14, 2022

Work is largely completed on this and functional. The only thing remaining is changing calls from VTT/WebVTT to something more 'current' with the new code. All of it is merely cosmetic work. Keeping this open as a TODO until that is completed.
