VTT generation produces files that show one word at a time (as viewed via VLC) #3

cr08 · 2022-11-04T15:02:19Z

Initial testing of the VTT render code, played back in VLC, shows that it displays a single word at a time on screen. Accuracy at first blush seems good, and timing is otherwise perfect.

More research needs to be done here. Plan is to either find out how to fix the VTT file generation or skip it and just have Vosk write out an SRT file instead.

This is low priority.

cr08 · 2022-11-10T15:05:49Z

Looks like vosk-api has a good example of SRT output. This looks promising; I will test it soon to see whether the same single-word output issue exists, and research further.

https://github.com/alphacep/vosk-api/blob/master/python/example/test_srt.py
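For reference, the general idea of turning word-level results into multi-word subtitle cues can be sketched without Vosk at all. This is a rough illustration, not the code from the linked example: it assumes each word is a dict with `word`, `start`, and `end` keys (times in seconds), and the names `format_timestamp` and `words_to_srt` are made up for this sketch.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, words_per_line=7):
    """Group word dicts into numbered SRT cues of up to words_per_line words.

    Assumption: each item looks like {"word": str, "start": float, "end": float}.
    """
    cues = []
    for index in range(0, len(words), words_per_line):
        chunk = words[index:index + words_per_line]
        # A cue spans from the first word's start to the last word's end.
        start = format_timestamp(chunk[0]["start"])
        end = format_timestamp(chunk[-1]["end"])
        text = " ".join(item["word"] for item in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Setting `words_per_line=1` would reproduce the one-word-per-cue behavior described in this issue, which is why the grouping step matters.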

… in transcription

These scripts use Vosk as the offline speech-to-text system. After researching issue #3 and the current 'one word per line' result, it may not be possible to fix this with the VTT output. Moved to SRT output, which by default seems to put up to 7 words on a line and is configurable. For our uses, VTT doesn't add any useful features over SRT, so we're fine switching to it. This also GREATLY simplifies the code. Also removed the old 0_main_vtt_generation.py file, which doesn't appear to do anything useful for this repo. The main videos.py file will automatically transcribe existing files if SRTs are missing. TODO: Change VTT references/variables
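The "transcribe existing files if SRTs are missing" behavior could look something like the sketch below. This is not the actual videos.py logic; the extension list and the function name `videos_missing_srt` are assumptions for illustration only.

```python
from pathlib import Path

# Assumption: which extensions count as videos; adjust to the real setup.
VIDEO_EXTENSIONS = {".mp4", ".mkv"}


def videos_missing_srt(directory):
    """Return video files in directory that have no sibling .srt file."""
    directory = Path(directory)
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file()
        and path.suffix.lower() in VIDEO_EXTENSIONS
        # A video is considered transcribed if <name>.srt sits next to it.
        and not path.with_suffix(".srt").exists()
    )
```

Each returned path would then be handed to whatever function runs the Vosk transcription.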

cr08 · 2022-11-11T02:26:03Z

The basis of this has been sorted and, ironically, the code is streamlined quite a bit. In brief testing this works. Will continue to test, but the change has been committed to the repo.

Still need to change all 'VTT' references/variables to SRT now that we are outputting that format. This does not affect the functionality in any way, just a final cleanup step that needs to be done.

cr08 · 2022-11-11T03:06:00Z

One side addition I want to make here is an option to use the 'full' English speech model (vosk-model-en-us-0.22) in addition to the current 'small' model included in this repo. More testing needs to be done to see how resource-intensive this is on an 'average' test system.

We'll probably do as TDCLI does and remove the models entirely from the repo, relying on the user to download a model and place it in their working directory. This will especially be needed for the full model, which weighs in at a whopping 1.8 GB. Pretty sure GitHub won't be happy hosting that in a code repo. 😛
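If the models do move out of the repo, the script needs to find whichever one the user dropped into the working directory. A minimal sketch of that lookup, assuming the model folders keep their download names: the small model's folder name below is a guess (this thread only names the full model), and `find_model_dir` is a hypothetical helper.

```python
from pathlib import Path

FULL_MODEL = "vosk-model-en-us-0.22"
# Assumption: folder name of the small model; not stated in this issue.
SMALL_MODEL = "vosk-model-small-en-us-0.15"


def find_model_dir(working_dir, prefer_full=True):
    """Return the first model directory found in working_dir.

    Raises FileNotFoundError with a hint if no model has been downloaded.
    """
    working_dir = Path(working_dir)
    order = (FULL_MODEL, SMALL_MODEL) if prefer_full else (SMALL_MODEL, FULL_MODEL)
    for name in order:
        candidate = working_dir / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"No Vosk model found in {working_dir}. "
        f"Download one and unpack it there as {FULL_MODEL} or {SMALL_MODEL}."
    )
```

The returned path could then be passed to Vosk's `Model` constructor as a string. Falling back from the full model to the small one keeps low-spec systems working without a config change.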

cr08 · 2022-11-14T02:21:03Z

Work on this is largely complete and functional. The only thing remaining is renaming the VTT/WebVTT references to match the new code. All of it is merely cosmetic work. Keeping this open as a TODO until that is completed.

cr08 added the "bug" and "low priority" labels Nov 4, 2022

cr08 transferred this issue from cr08/twitch_vod_creator Nov 5, 2022

cr08 self-assigned this Nov 11, 2022
