Multi model load #1

Draft: wants to merge 27 commits into base: master

sachaarbonel commented Nov 1, 2024

Multi-Instance Whisper Server with Thread Pool Enhancement

Overview

This PR reworks the server implementation to load multiple model instances and to dispatch requests through a thread pool. Together, these changes let the server process several requests concurrently and are intended to improve resource utilization and overall throughput.

Key Changes

1. Multi-Instance Support

  • Added capability to run multiple Whisper model instances concurrently
  • Implemented a thread-safe instance pool with dynamic allocation
  • Added a new parameter, -i/--instances, to control the number of model instances
  • Instances are managed through an RAII-compliant WhisperContext class
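
To illustrate the idea of a thread-safe instance pool with RAII-managed checkout, here is a minimal sketch. It is not the code from this PR: `ModelInstance`, `InstancePool`, `Lease`, and `idle_count` are hypothetical names, and `ModelInstance` is a stand-in for whatever the PR's WhisperContext wraps. The pool size is fixed at construction, so the dynamic allocation mentioned above is not modeled.

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a loaded model; a real server would hold
// a model context here instead of a plain id.
struct ModelInstance {
    int id;
};

class InstancePool {
public:
    explicit InstancePool(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            owned_.push_back(std::unique_ptr<ModelInstance>(
                new ModelInstance{static_cast<int>(i)}));
            idle_.push_back(owned_.back().get());
        }
    }

    // RAII lease: hands the instance back to the pool on destruction.
    class Lease {
    public:
        Lease(InstancePool& pool, ModelInstance* inst)
            : pool_(&pool), inst_(inst) {}
        Lease(Lease&& other) noexcept
            : pool_(other.pool_), inst_(other.inst_) { other.inst_ = nullptr; }
        Lease(const Lease&) = delete;
        Lease& operator=(const Lease&) = delete;
        Lease& operator=(Lease&&) = delete;
        ~Lease() { if (inst_) pool_->release(inst_); }

        ModelInstance* operator->() const { return inst_; }
        ModelInstance& operator*() const { return *inst_; }

    private:
        InstancePool* pool_;
        ModelInstance* inst_;
    };

    // Blocks the caller until some instance is idle.
    Lease acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !idle_.empty(); });
        ModelInstance* inst = idle_.back();
        idle_.pop_back();
        return Lease(*this, inst);
    }

    std::size_t idle_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return idle_.size();
    }

private:
    void release(ModelInstance* inst) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            idle_.push_back(inst);
        }
        available_.notify_one();
    }

    std::mutex mutex_;
    std::condition_variable available_;
    std::vector<std::unique_ptr<ModelInstance>> owned_;  // keeps instances alive
    std::vector<ModelInstance*> idle_;                   // currently unused ones
};
```

A request handler would call `acquire()`, run inference through the lease, and let the lease go out of scope. The instance then returns to the pool even if the handler exits early or throws.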

2. Thread Pool Implementation

  • Added a new thread pool implementation for managing worker threads
  • Improved request handling through asynchronous task processing
  • Better resource utilization and reduced thread creation overhead
  • Separated HTTP thread handling from model inference threads
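
As a rough picture of what a worker thread pool with asynchronous task submission looks like, here is a minimal sketch built only on the C++ standard library. The class name `ThreadPool` and the `submit` method are assumptions for illustration; the PR's actual interface may differ.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Lets workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a callable and returns a future for its result.
    template <typename Fn>
    auto submit(Fn fn) -> std::future<decltype(fn())> {
        using Result = decltype(fn());
        auto task =
            std::make_shared<std::packaged_task<Result()>>(std::move(fn));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        wake_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock,
                           [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached while stopping
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

Because the workers are created once and reused, each request pays only for a queue push and a wake-up rather than for a new thread. Keeping this pool separate from the HTTP threads means a long transcription does not occupy a thread that could be accepting connections.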

3. Server Improvements

  • Upgraded to latest cpp-httplib version for better performance and stability
  • Added configurable HTTP thread count with -ht/--http-threads
  • Implemented proper request timeout handling
  • Added graceful shutdown mechanism
  • Enhanced error handling and logging
  • Set a maximum upload size of 10 MB
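
The description does not say how request timeouts are enforced. One generic way to bound how long a caller waits for a result is `std::future::wait_for`, sketched below. This is an assumed technique, not necessarily the one used in this PR, and `run_with_timeout` and `TimedResult` are hypothetical names.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>

struct TimedResult {
    bool completed;    // false when the time limit expired first
    std::string text;  // empty on timeout
};

// Runs fn on its own thread and waits at most `limit` for the result.
template <typename Fn>
TimedResult run_with_timeout(Fn fn, std::chrono::milliseconds limit) {
    std::packaged_task<std::string()> task(std::move(fn));
    std::future<std::string> result = task.get_future();
    std::thread worker(std::move(task));

    if (result.wait_for(limit) == std::future_status::ready) {
        worker.join();
        return TimedResult{true, result.get()};
    }
    // Assumption: the work cannot be cancelled, so it is left to finish
    // in the background and its result is discarded.
    worker.detach();
    return TimedResult{false, std::string()};
}
```

On a timeout the handler can answer immediately with an error status. Note that detaching only stops the caller from waiting; the computation itself keeps running, so a production server would also want a way to cancel or cap abandoned work.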

Performance Impact

  • Enables parallel processing of multiple requests
  • Reduces thread creation and memory overhead through thread reuse
  • Better CPU utilization through dedicated thread pools
  • Improved request handling capacity
  • Enhanced HTTP server performance through httplib upgrade

Usage Example

./server -m models/ggml-base.en.bin -i 4 --http-threads 8
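
For readers wondering how the flags in the command above might be consumed, here is a small parsing sketch. It handles only `-m`, `-i/--instances`, and `-ht/--http-threads`. The `ServerParams` struct, the `parse_args` function, and the default values are assumptions, since the description does not state them.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

struct ServerParams {
    std::string model_path;
    int instances = 1;     // assumed default, not stated in the PR
    int http_threads = 1;  // assumed default, not stated in the PR
};

// Returns false on an unknown flag, a missing value, or a count below 1.
bool parse_args(const std::vector<std::string>& args, ServerParams& out) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& flag = args[i];
        if (i + 1 >= args.size()) return false;  // every flag takes a value
        const std::string& value = args[++i];

        if (flag == "-m") {
            out.model_path = value;
        } else if (flag == "-i" || flag == "--instances") {
            out.instances = std::atoi(value.c_str());
        } else if (flag == "-ht" || flag == "--http-threads") {
            out.http_threads = std::atoi(value.c_str());
        } else {
            return false;
        }
    }
    // std::atoi yields 0 for non-numeric input, which is rejected here.
    return out.instances > 0 && out.http_threads > 0;
}
```

With the arguments from the usage example, this would yield four model instances and eight HTTP threads.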
