Multi model load #1

sachaarbonel · 2024-11-01T13:36:00Z

Multi-Instance Whisper Server with Thread Pool Enhancement

Overview

This PR improves the server implementation by introducing multi-instance model loading and a thread pool for request handling. Together, these changes let the server process several requests concurrently and make better use of available resources.

Key Changes

1. Multi-Instance Support

Added the ability to run multiple Whisper model instances concurrently
Implemented a thread-safe instance pool with dynamic allocation
Added a new -i/--instances parameter to control the number of model instances
Instances are managed through an RAII-compliant WhisperContext class
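
To illustrate the idea of a thread-safe instance pool with RAII-style hand-back, here is a minimal C++11 sketch. It is not the code from this PR: ModelInstance, InstancePool, and Lease are hypothetical names, and ModelInstance is a stand-in for whatever the real WhisperContext wraps. A caller blocks in acquire() until an instance is idle, and the Lease destructor returns the instance to the pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded model. A real server would own a
// model handle here and free it in the destructor.
struct ModelInstance {
    explicit ModelInstance(int instance_id) : id(instance_id) {}
    int id;
};

class InstancePool {
public:
    explicit InstancePool(std::size_t count) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i) {
            idle_.push_back(std::unique_ptr<ModelInstance>(
                new ModelInstance(static_cast<int>(i))));
        }
    }

    // RAII lease: the instance goes back to the pool when the lease dies.
    class Lease {
    public:
        Lease(InstancePool& pool, std::unique_ptr<ModelInstance> inst)
            : pool_(&pool), inst_(std::move(inst)) {}
        Lease(Lease&& other)
            : pool_(other.pool_), inst_(std::move(other.inst_)) {}
        Lease(const Lease&) = delete;
        Lease& operator=(const Lease&) = delete;
        ~Lease() {
            if (inst_) pool_->release(std::move(inst_));
        }
        ModelInstance* operator->() const { return inst_.get(); }

    private:
        InstancePool* pool_;
        std::unique_ptr<ModelInstance> inst_;
    };

    // Blocks until at least one instance is idle, then leases it out.
    Lease acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !idle_.empty(); });
        std::unique_ptr<ModelInstance> inst = std::move(idle_.back());
        idle_.pop_back();
        return Lease(*this, std::move(inst));
    }

    std::size_t idle_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return idle_.size();
    }

private:
    void release(std::unique_ptr<ModelInstance> inst) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            idle_.push_back(std::move(inst));
        }
        cv_.notify_one();  // wake one waiter blocked in acquire()
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::unique_ptr<ModelInstance>> idle_;
};
```

A request handler would hold a Lease for the duration of one inference call, so the instance is returned even if the handler exits early or throws.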

2. Thread Pool Implementation

Added a new thread pool implementation for managing worker threads
Improved request handling through asynchronous task processing
Better resource utilization and reduced thread creation overhead
Separated HTTP thread handling from model inference threads
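
As a rough picture of how a fixed set of worker threads can process tasks asynchronously, the following C++11 sketch shows one common thread pool shape. It is an illustration only and may differ from the implementation in this PR; the ThreadPool class and its submit() method are hypothetical names. Workers are created once and reused, which is where the saving in thread creation overhead comes from.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) : stopping_(false) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Lets queued tasks finish, then joins every worker.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::size_t i = 0; i < threads_.size(); ++i) {
            threads_[i].join();
        }
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a callable and returns a future for its result.
    template <typename F>
    auto submit(F fn) -> std::future<decltype(fn())> {
        typedef decltype(fn()) Result;
        std::shared_ptr<std::packaged_task<Result()> > task =
            std::make_shared<std::packaged_task<Result()> >(std::move(fn));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    // Worker loop: sleep until there is work or the pool is stopping.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()> > tasks_;
    std::vector<std::thread> threads_;
    bool stopping_;
};
```

In a server like this one, the HTTP threads would only enqueue work and wait on the returned future, while inference runs on the pool's workers.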

3. Server Improvements

Upgraded to the latest cpp-httplib version for better performance and stability
Added a configurable HTTP thread count with -ht/--http-threads
Implemented proper request timeout handling
Added a graceful shutdown mechanism
Enhanced error handling and logging
Set a maximum upload size limit of 10 MB
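
One common way to build a graceful shutdown mechanism is a signal handler that only sets a flag, with the serving loop checking that flag and draining work before exit. The C++11 sketch below shows that pattern under stated assumptions: the function names are hypothetical, the choice of SIGINT and SIGTERM is an assumption, and the PR's actual mechanism may work differently.

```cpp
#include <cassert>
#include <csignal>

// Set from the signal handler. sig_atomic_t is the type the standard
// guarantees can be written safely from a handler.
static volatile std::sig_atomic_t g_shutdown_requested = 0;

static void on_shutdown_signal(int) {
    g_shutdown_requested = 1;  // do nothing else inside the handler
}

// Registers the handler for the two signals assumed in this sketch.
void install_shutdown_handlers() {
    void (*previous)(int) = std::signal(SIGINT, on_shutdown_signal);
    assert(previous != SIG_ERR);
    previous = std::signal(SIGTERM, on_shutdown_signal);
    assert(previous != SIG_ERR);
    (void)previous;  // silence unused warnings when asserts are disabled
}

bool shutdown_requested() {
    return g_shutdown_requested != 0;
}

// Calls step() until a shutdown signal arrives, then calls drain() once
// so in-flight work can finish. Returns how many steps were executed.
template <typename Step, typename Drain>
int serve_until_shutdown(Step step, Drain drain) {
    int iterations = 0;
    while (!shutdown_requested()) {
        step();
        ++iterations;
    }
    drain();
    return iterations;
}
```

Here step() stands for one unit of serving work and drain() for the cleanup phase, for example stopping the HTTP listener and waiting for queued inference tasks.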

Performance Impact

Enables parallel processing of multiple requests
Reduces per-request overhead through thread reuse
Better CPU utilization through dedicated thread pools
Improved request handling capacity
Enhanced HTTP server performance through httplib upgrade

Usage Example

./server -m models/ggml-base.en.bin -i 4 --http-threads 8
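
For readers who want to see how the flags in the command above could map onto settings, here is a small C++11 parsing sketch. It is hypothetical and not taken from the PR: ServerOptions, parse_options, the default value of 1 for both counts, and the upper bound of 1024 are all assumptions made for illustration. Only the flag spellings (-m, -i/--instances, -ht/--http-threads) come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

struct ServerOptions {
    std::string model_path;  // -m
    int instances;           // -i / --instances
    int http_threads;        // -ht / --http-threads
    ServerOptions() : instances(1), http_threads(1) {}  // assumed defaults
};

// Accepts only a full decimal number in [1, 1024] (bound is an assumption).
static bool parse_positive_int(const std::string& text, int& out) {
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0') return false;
    if (value <= 0 || value > 1024) return false;
    out = static_cast<int>(value);
    return true;
}

// Parses the arguments that follow the program name. Returns false on an
// unknown flag, a missing or invalid value, or a missing model path.
bool parse_options(const std::vector<std::string>& args, ServerOptions& out) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& flag = args[i];
        if (flag != "-m" && flag != "-i" && flag != "--instances" &&
            flag != "-ht" && flag != "--http-threads") {
            return false;  // unknown flag
        }
        if (i + 1 >= args.size()) return false;  // flag without a value
        const std::string& value = args[++i];
        if (flag == "-m") {
            out.model_path = value;
        } else if (flag == "-i" || flag == "--instances") {
            if (!parse_positive_int(value, out.instances)) return false;
        } else {
            if (!parse_positive_int(value, out.http_threads)) return false;
        }
    }
    assert(out.instances > 0 && out.http_threads > 0);
    return !out.model_path.empty();
}
```

With the command shown above, this sketch would yield 4 model instances and 8 HTTP threads.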

sachaarbonel added 4 commits November 1, 2024 10:49

wip

8179d4e

handle graceful shutdown + better logs

fce0342

cuda wip

db7ddb5

wip makefile

abf87b3

sachaarbonel force-pushed the multi-model-load branch from f5afea5 to abf87b3 November 1, 2024 14:01

sachaarbonel added 23 commits November 1, 2024 15:13

wip makefile

d544743

wip makefile

136104e

wip makefile

afab079

wip makefile

8225156

wip makefile

ae94341

wip makefile

b13e1b0

wip

fc7025e

threadpool

33b7a66

fix shutdown

22e6bfc

fix shutdown

c20aa7c

wip

ff2cb48

cleanup unused

198d086

round robin

bfad042

fix malloc

4dd2d66

fix k6

463c753

wip

081ad1f

wip

3475175

wip

b0b95ab

k6

9843e4e

simplify + cpp 11

3f71fe8

cpp 11

16289f7

cpp 11

42b225c

cpp 11

daf0045
