
Model chooser screen in settings #18

Open
soupslurpr opened this issue Apr 1, 2024 · 7 comments
Labels: enhancement New feature or request

@soupslurpr (Owner)

A screen in settings to download more models from Hugging Face from within the app itself, pick the model that will be used, and manage or delete them. The models would be downloaded from a repo on my Hugging Face account, and the hashes of the files would be checked against hashes bundled with Transcribro to ensure integrity even in the event of a Hugging Face server compromise.
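As a rough illustration of the integrity check described above, a minimal Kotlin sketch assuming the expected SHA-256 digests ship inside the app; the file name and digest below are placeholders, not real values:

```kotlin
import java.io.File
import java.security.MessageDigest

// Hypothetical table of expected SHA-256 digests bundled with the app;
// the file name and digest are placeholders.
val expectedSha256 = mapOf(
    "ggml-base-q8_0.bin" to "<expected sha-256 hex digest>"
)

fun sha256Of(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(8192)
        while (true) {
            val read = input.read(buffer)
            if (read == -1) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

// Returns true only if the downloaded file's digest matches the bundled one.
fun verifyDownloadedModel(file: File): Boolean =
    expectedSha256[file.name]?.equals(sha256Of(file), ignoreCase = true) == true
```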

There should be a text box at the top of the screen to test the selected model using the Voice Input Keyboard.

The most recommended models should be shown first, and there shouldn't be an overwhelming amount of choice for no benefit. Test different model quants to choose enough models for a sensible range of trade-offs between speed, accuracy, and multilingual support, and clearly communicate those properties in the interface. If needed, there can be a "more models" button that goes to a screen with the other models to keep the main list from being too long.

Additionally, there should be an option to import a model from a file. Imported models can show up below the ones downloaded from the app, but in a separate section so they aren't mistaken for the official ones.

@soupslurpr soupslurpr added the enhancement New feature or request label Apr 1, 2024
@soupslurpr soupslurpr self-assigned this Apr 1, 2024
@soupslurpr soupslurpr added this to the v0.3.0 milestone Apr 1, 2024
@soupslurpr (Owner, Author)

File sizes (in bytes) will also be verified while downloading to prevent wasted bandwidth from a compromised account or server.
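A hedged sketch of such a size check during download; the expected size constant, the function name, and the use of plain HttpURLConnection are illustrative assumptions, not Transcribro's actual download code:

```kotlin
import java.io.File
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical expected size shipped with the app, alongside the hash.
const val EXPECTED_BYTES = 82_000_000L // placeholder for the model's known size

fun downloadWithSizeLimit(url: String, destination: File, expectedBytes: Long = EXPECTED_BYTES) {
    val connection = URL(url).openConnection() as HttpURLConnection
    try {
        // Abort early if the server advertises a different size than expected.
        val advertised = connection.contentLengthLong
        require(advertised == -1L || advertised == expectedBytes) {
            "Server reports $advertised bytes, expected $expectedBytes"
        }
        var written = 0L
        connection.inputStream.use { input ->
            destination.outputStream().use { output ->
                val buffer = ByteArray(8192)
                while (true) {
                    val read = input.read(buffer)
                    if (read == -1) break
                    written += read
                    // Stop streaming as soon as the payload exceeds the expected size.
                    require(written <= expectedBytes) { "Download exceeded expected size" }
                    output.write(buffer, 0, read)
                }
            }
        }
        require(written == expectedBytes) { "Download truncated: $written of $expectedBytes bytes" }
    } finally {
        connection.disconnect()
    }
}
```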

@soupslurpr soupslurpr removed this from the v0.3.0 milestone Apr 26, 2024
@soupslurpr soupslurpr added the blocked until Rust Blocked until we move to using Rust to run the Whisper models instead of whisper.cpp. label Apr 26, 2024
@soupslurpr (Owner, Author)

This will have to be delayed to avoid complications when we switch to using Rust to run the Whisper models instead of whisper.cpp.

@machiav3lli commented Jul 20, 2024

This would also solve #27 as far as I can see. What are the blockers to using a generic interface that would be hot-swapped when the Rust engine is implemented?
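For illustration only, a minimal Kotlin sketch of the kind of generic interface being suggested; every name here is hypothetical, not existing Transcribro code:

```kotlin
import java.io.File

// Hypothetical abstraction over the transcription backend, so the model
// chooser only deals with model files and never with whisper.cpp directly.
interface TranscriptionEngine {
    fun loadModel(modelFile: File)
    fun transcribe(audioSamples: FloatArray, languageHint: String? = null): String
    fun unloadModel()
}

// Today this could wrap the whisper.cpp bindings; a Rust-backed
// implementation could later be swapped in without touching the settings UI.
class WhisperCppEngine : TranscriptionEngine {
    override fun loadModel(modelFile: File) { /* native load via whisper.cpp */ }
    override fun transcribe(audioSamples: FloatArray, languageHint: String?): String =
        "" // placeholder; a real implementation calls into the native library
    override fun unloadModel() { /* free native resources */ }
}
```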

@soupslurpr (Owner, Author)

It could create more work in the future, such as needing to make a model converter if the Rust engine uses a different format. Switching to Rust for running the models is a high priority, and I'm currently researching different Rust machine learning libraries for running Whisper to find one that's at least close to whisper.cpp's speed.

@machiav3lli

@soupslurpr thanks for the explanation. Although it would surprise me if a cpp-to-Rust adapter ignored the established formats just for the sake of being something else.

That said, it's the FOSS world we're talking about here and logic doesn't always prevail.

@soupslurpr (Owner, Author)

Whisper.cpp itself doesn't use GGUF; it uses its own custom .bin format. No idea why they didn't switch. Also, the Rust library I choose (possibly burn) may use a different format, and I don't know how difficult it would be to convert.

@soupslurpr (Owner, Author)

I'm having trouble finding a library in Rust that can provide sufficient speed. I think this'll have to be implemented before migrating to Rust.

Refactors and rewrites need to be done first, and then a model picker can be implemented. To keep things simple, it'll probably only have a few choices at first. One of them could be a multilingual model. There can be an option to choose whether to automatically detect the language or specify a language. Testing is needed to make sure only languages which actually work with the model are exposed.

The Base Q8_0 model could be the initial multilingual model. I'll have to check benchmarks and maybe ask for community feedback to determine which languages to allow choosing for it. Showing all languages, including ones that haven't been tested with the model at all, would be harmful because it would give the false impression that Transcribro supports them, and people would be upset when it outputs gibberish in their language.
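A minimal sketch of how the picker's catalogue and the per-model language gating could be represented, also covering the speed/accuracy/multilingual properties mentioned in the opening comment; the model name, URL, size, digest, tiers, and language list are all placeholders, not a decided catalogue:

```kotlin
// Hypothetical description of an official model entry shown in the picker.
data class ModelEntry(
    val id: String,
    val displayName: String,
    val downloadUrl: String,
    val sizeBytes: Long,
    val sha256: String,
    val speed: String,        // e.g. "fast" / "balanced" / "slow", shown in the UI
    val accuracy: String,     // e.g. "good" / "better" / "best", shown in the UI
    val multilingual: Boolean,
    // Only languages that have actually been tested with this model are listed,
    // so untested languages never appear in the language selector.
    val supportedLanguages: List<String>,
)

// Placeholder catalogue; the real list, hashes, and tested languages are TBD.
val officialModels = listOf(
    ModelEntry(
        id = "base-q8_0",
        displayName = "Base (Q8_0, multilingual)",
        downloadUrl = "https://huggingface.co/<repo>/resolve/main/<file>", // placeholder
        sizeBytes = 0L,                    // placeholder until the file is finalized
        sha256 = "<expected digest>",      // placeholder
        speed = "balanced",
        accuracy = "good",
        multilingual = true,
        supportedLanguages = listOf("en"), // placeholder until benchmarks/feedback decide
    ),
)
```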

@soupslurpr soupslurpr removed the blocked until Rust Blocked until we move to using Rust to run the Whisper models instead of whisper.cpp. label Jan 4, 2025