
speechToSpeechLLM - a Dockerized three-prong backend of KoboldCPP, CoquiTTS, and WhisperCPP. An example R Shiny application is included, forming an open-source, locally run, end-to-end speech-to-speech framework.


snakewizardd/speechToSpeechLLM


speechToSpeechLLM

A free, open-source implementation of Speech-to-Speech technology


The composite backend consists of:

  • KoboldCPP (NeuralBeagle 7B) on port 5001
  • Coqui TTS on port 5002
  • WhisperCPP on port 8080

To start it, run:

chmod 555 run_entire_build.sh
./run_entire_build.sh
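Once the build is up, a quick way to confirm that each backend is listening is to probe the three ports listed above. This is a minimal sketch, assuming the containers publish those ports on localhost (override with `HOST`) and that `curl` is installed. It only checks that something answers HTTP on each port, not that a model has finished loading.

```bash
#!/usr/bin/env bash
# Sketch: report which backend ports answer HTTP.
# Assumption: ports are published on localhost unless HOST is set.

# Print "<name> (port <port>): up|down" for one service.
probe() {
  local name="$1" port="$2" host="${HOST:-localhost}"
  # curl exits 0 on any HTTP response (even a 404) and non-zero
  # when nothing is listening or the request times out.
  if curl -s -o /dev/null --max-time 2 "http://${host}:${port}/"; then
    echo "${name} (port ${port}): up"
  else
    echo "${name} (port ${port}): down"
  fi
}

probe "KoboldCPP" 5001
probe "Coqui TTS" 5002
probe "WhisperCPP" 8080
```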

To stop everything, run:

chmod 555 prune_entire_build.sh
./prune_entire_build.sh

The user-facing application is currently a proof of concept (POC): a simple R Shiny app that passes data between the backends. It is built for macOS because it relies on the 'rec' command (part of SoX) to record audio input.

A simple port to Windows could be made with a tool such as ffmpeg. Audio device recording on Linux is still TBD.
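The recording step might look like the following hedged sketch for the two platforms mentioned above. It assumes the speech-to-text step wants 16-bit, 16 kHz mono WAV (what whisper.cpp generally expects) and, on Windows, a bash shell such as Git Bash with ffmpeg on the PATH. `MIC_NAME`, `build_record_cmd`, and `record_clip` are hypothetical names, not part of this repo; Linux is left unsupported, matching the note above.

```bash
#!/usr/bin/env bash
# Sketch: record a short mono 16 kHz WAV clip for speech-to-text.
# macOS: SoX's `rec`. Windows: ffmpeg's DirectShow input.
# MIC_NAME is a hypothetical placeholder for the DirectShow device name
# (list devices with: ffmpeg -list_devices true -f dshow -i dummy).

# Fill the global array REC_CMD for a platform.
# Args: platform (macos|windows), output file, duration in seconds.
build_record_cmd() {
  local platform="$1" out="$2" secs="$3"
  case "$platform" in
    macos)
      # -c 1 -r 16000 -b 16: mono, 16 kHz, 16-bit; `trim 0 N` stops after N seconds
      REC_CMD=(rec -q -c 1 -r 16000 -b 16 "$out" trim 0 "$secs")
      ;;
    windows)
      # -ac 1 -ar 16000: mono, 16 kHz; -t N: record N seconds; -y: overwrite
      REC_CMD=(ffmpeg -y -f dshow -i "audio=${MIC_NAME:-Microphone}" -ac 1 -ar 16000 -t "$secs" "$out")
      ;;
    *)
      echo "unsupported platform: $platform" >&2
      return 1
      ;;
  esac
}

# Build and run the recording command.
record_clip() {
  build_record_cmd "$@" || return 1
  "${REC_CMD[@]}"
}
```

For example, `record_clip macos input.wav 5` would record five seconds into `input.wav`. Building the command in an array keeps device names containing spaces intact without resorting to `eval`.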

All APIs run independently of the R Shiny app, which is NOT packaged with the Docker Compose build. To set up the front-end environment, install R and the dependencies listed in the /rshiny_deps Dockerfile. This is more a philosophical interplay of technologies than a true working POC.
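Because the APIs are independent of the Shiny app, one conversational turn can also be driven from the command line. This is a minimal sketch, assuming the containers expose each project's upstream default HTTP API (whisper.cpp server's `POST /inference`, KoboldCpp's KoboldAI-style `POST /api/v1/generate`, and Coqui's `tts-server` `GET /api/tts`) and that `curl` and `jq` are installed. The prompt template and all function names are hypothetical, not taken from this repo's R code.

```bash
#!/usr/bin/env bash
# Sketch of one speech-to-speech turn: input WAV -> text -> reply -> reply WAV.
# Assumptions: upstream default APIs on localhost (override with HOST); curl + jq installed.

HOST="${HOST:-localhost}"

# Strip leading/trailing whitespace (whisper transcripts often start with a space).
trim() {
  local s="$1"
  s="${s#"${s%%[![:space:]]*}"}"
  s="${s%"${s##*[![:space:]]}"}"
  printf '%s' "$s"
}

# Hypothetical prompt template; NeuralBeagle may prefer its own chat format.
make_prompt() {
  printf 'User: %s\nAssistant:' "$1"
}

# 1) Speech -> text via the whisper.cpp server (port 8080).
transcribe() {
  local json
  json="$(curl -sf "http://${HOST}:8080/inference" \
    -F file="@$1" -F response_format=json)" || return 1
  jq -r '.text' <<<"$json"
}

# 2) Text -> reply via KoboldCPP (port 5001); stop before the model writes the next "User:" turn.
generate() {
  local payload json
  payload="$(jq -n --arg p "$(make_prompt "$1")" \
    '{prompt: $p, max_length: 120, stop_sequence: ["User:"]}')" || return 1
  json="$(curl -sf "http://${HOST}:5001/api/v1/generate" \
    -H 'Content-Type: application/json' -d "$payload")" || return 1
  jq -r '.results[0].text' <<<"$json"
}

# 3) Reply -> speech via Coqui TTS (port 5002); writes a WAV file.
synthesize() {
  curl -sf -G "http://${HOST}:5002/api/tts" \
    --data-urlencode "text=$1" -o "$2"
}

# Full turn. Returns non-zero if any step fails.
speech_to_speech() {
  local in_wav="$1" out_wav="$2" heard reply
  heard="$(transcribe "$in_wav")" || return 1
  heard="$(trim "$heard")"
  reply="$(generate "$heard")" || return 1
  reply="$(trim "$reply")"
  echo "heard: $heard" >&2
  echo "reply: $reply" >&2
  synthesize "$reply" "$out_wav"
}
```

After sourcing the functions, `speech_to_speech input.wav reply.wav && afplay reply.wav` would play the reply on macOS. Each HTTP response is captured in a variable before parsing so that a failed request is reported instead of being masked by `jq`.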

[Screenshot: initial greeting, speech input]

[Screenshot: follow-up message, speech input]


NOTE: The only part of this build that seems to need some troubleshooting is the Coqui image. If you run into latency issues during installation, you can run the build_coqui.sh script on its own to isolate that build. Hopefully this can be fixed in a future build. Once the image is built with the English model, it should run without problems.
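After building and starting the Coqui container on its own, a readiness check followed by one test request can confirm it works before bringing up the rest of the stack. This is a minimal sketch, assuming the container publishes port 5002 on localhost and serves the upstream Coqui `tts-server` endpoint `GET /api/tts?text=...`; the function names, the 120-second limit, and the output file `coqui_test.wav` are arbitrary, hypothetical choices.

```bash
#!/usr/bin/env bash
# Sketch: wait for the Coqui TTS server, then synthesize a test phrase.
# Assumption: upstream tts-server API on localhost:5002 (override host with HOST).

# Poll http://HOST:PORT/ once per second until it answers or MAX_SECS pass.
wait_until_up() {
  local port="$1" max_secs="$2" host="${HOST:-localhost}" i=0
  while [ "$i" -lt "$max_secs" ]; do
    # Any HTTP response counts as "up"; connection refused keeps polling.
    if curl -s -o /dev/null --max-time 2 "http://${host}:${port}/"; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Wait up to 120 s, then request a short phrase and save it as a WAV file.
coqui_smoke_test() {
  wait_until_up 5002 120 || { echo "Coqui TTS not reachable on port 5002" >&2; return 1; }
  curl -sf -G "http://${HOST:-localhost}:5002/api/tts" \
    --data-urlencode "text=Hello from Coqui." -o coqui_test.wav &&
    echo "wrote coqui_test.wav"
}
```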
