Triton ensemble model configuration for transducer models #371
I am working on it now and managed to follow the docs to get the Triton server up and running (all components, including the ensemble transducer, are running), but when I try to start the client script it fails.
One odd thing I noticed is that when I start the server, I see these warnings/errors.
Is this possibly related to a memory leak issue?
Also, I am trying the pretrained model from the section "Deploy onnx with arbitrary pruned_transducer_stateless_X(2,3,4,5) model for Chinese or English recipes". After downloading the model files, I get the following error:
I get this error when I try to run the default streaming example provided in the sherpa/triton folder (https://github.com/k2-fsa/sherpa/tree/master/triton/model_repo_streaming).
Did you encounter this issue as well? What is the current status of your transducer setup for Triton? Is it stable on your side? Also, as mentioned in a previous comment (#371 (comment)), I suspect it is related to the kaldifeat library memory leak issue; if this is a known issue, do you suggest trying a different library for feature extraction?
Could you help take a look at this issue?
Hi, thanks for trying this triton recipe.
Hi @yuekaizhang, I ensured that the config parameters in $model_repo_path/*/config.pbtxt match the properties in the onnx export log file. For reference:
I am still getting the same error:
"This error may be caused by outdated request. At the beginning of the service startup, due to insufficient warming up, if a request is cleared due to timeout, it will cause later arriving chunks to lose their start flag. You may first try to warmup service with small batch size and concurrency." I also tried the --num-tasks 1 argument, but it still fails.
https://github.com/k2-fsa/sherpa/blob/master/triton/client/decode_manifest_triton.py#L381-L383 Here, please check your first_chunk_ms and decoding_window_length.
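(For illustration only, a minimal sketch of how millisecond parameters like these map to sample counts, assuming a 16 kHz sample rate and hypothetical values; this is not the actual client code:)

```python
# Hedged sketch: converting chunk lengths in milliseconds to sample counts.
# A mismatch between what the client sends per chunk and what the server
# expects is one way to end up with an unexpected sample count.
SAMPLE_RATE = 16000  # assumed; use the rate your model was exported with

def ms_to_samples(ms: float, sample_rate: int = SAMPLE_RATE) -> int:
    return int(sample_rate * ms / 1000)

# Hypothetical values for illustration only:
print(ms_to_samples(200))  # a 200 ms first chunk at 16 kHz -> 3200 samples
print(ms_to_samples(790))  # a 790 ms window at 16 kHz      -> 12640 samples
```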
I have checked, these values are correct.
If the values are correct, could you trace back to figure out where this 12640 number comes from?
Hi @yuekaizhang, I noticed that in wav_segs (https://github.com/k2-fsa/sherpa/blob/master/triton/client/decode_manifest_triton.py#L269) the last segment has a different number of samples (length) from all the other segments, which causes the issue. After adding a change at https://github.com/k2-fsa/sherpa/blob/master/triton/client/decode_manifest_triton.py#L282 the problem is fixed. Do you think this is a bug?
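(For illustration only, a minimal sketch of one way to make every segment the same length by zero-padding the last one; the function name and the NumPy-based approach are assumptions, not the actual change applied to decode_manifest_triton.py:)

```python
import numpy as np

def split_into_equal_chunks(samples: np.ndarray, chunk_size: int) -> list:
    """Split a 1-D waveform into chunks of chunk_size samples,
    zero-padding the final chunk so all segments have equal length."""
    segs = [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
    if segs and len(segs[-1]) < chunk_size:
        pad = np.zeros(chunk_size - len(segs[-1]), dtype=samples.dtype)
        segs[-1] = np.concatenate([segs[-1], pad])
    return segs

# Usage sketch: a 15999-sample waveform split into 3200-sample chunks
# yields 5 segments, the last one zero-padded to 3200 samples.
# wav = np.zeros(15999, dtype=np.float32)
# segs = split_into_equal_chunks(wav, 3200)
# assert all(len(s) == 3200 for s in segs)
```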
I am not sure. If it is a bug, it would be in feature_extractor/1/model.py rather than in this client. Could you make sure that the assertion at https://github.com/k2-fsa/sherpa/blob/master/triton/model_repo_streaming/feature_extractor/1/model.py#L52, assert len(self.wav) > 0, always holds? Otherwise, there is a problem somewhere. If you keep that last segment, I don't understand why len(self.wav) would become 0 at https://github.com/k2-fsa/sherpa/blob/master/triton/model_repo_streaming/feature_extractor/1/model.py#L45 except for the first chunk. How did you fix the previous issue?
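(To illustrate the invariant under discussion, a hedged sketch of a per-stream buffer; the class and method names are assumptions and not the actual model.py, but it shows why a chunk that lost its start flag would leave self.wav empty:)

```python
class StreamState:
    """Hedged sketch of per-stream state kept by a streaming feature extractor."""

    def __init__(self):
        self.wav = []  # samples accumulated for this stream

    def start(self, first_chunk):
        # Called only when the request carries the start flag.
        self.wav = list(first_chunk)

    def append(self, chunk):
        # If an earlier request timed out and was cleared, a later chunk
        # arrives without ever passing through start(), so self.wav is empty
        # and this assertion fires -- the symptom discussed above.
        assert len(self.wav) > 0, "chunk received before a start chunk"
        self.wav.extend(chunk)
```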
Good observation.
Okay, closing the issue since it is fixed.
Hello, could you please give a reference configuration for the ensemble transducer model in the example repository for pure Triton-based inference? Specifically, how do we feed the variable y from the scorer back to the decoder input? Also, the template is missing the joiner_encoder_proj and joiner_decoder_proj parts.
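To make the question concrete, here is a hedged Python sketch of the transducer greedy-search loop that an ensemble would need to reproduce: the token y emitted by the joiner (scorer) is appended to the hypothesis and fed back as the next decoder input. The module names follow the usual pruned_transducer_stateless layout, but the exact shapes and signatures are assumptions, not the repository's API.

```python
import torch

def greedy_search(encoder_out, decoder, joiner,
                  joiner_encoder_proj, joiner_decoder_proj,
                  blank_id: int = 0, context_size: int = 2):
    """Hedged sketch of transducer greedy search.

    encoder_out: (1, T, encoder_dim); the other arguments are callables
    standing in for the exported decoder/joiner/projection models.
    """
    hyp = [blank_id] * context_size                    # decoder history
    dec_in = torch.tensor([hyp], dtype=torch.int64)
    dec_out = joiner_decoder_proj(decoder(dec_in))     # (1, 1, joiner_dim)
    enc_proj = joiner_encoder_proj(encoder_out)        # (1, T, joiner_dim)

    for t in range(enc_proj.size(1)):
        logits = joiner(enc_proj[:, t:t + 1, :], dec_out)
        y = logits.argmax(dim=-1).item()               # the variable y
        if y != blank_id:
            hyp.append(y)
            # Feed y back: rebuild the decoder input from the newest tokens.
            dec_in = torch.tensor([hyp[-context_size:]], dtype=torch.int64)
            dec_out = joiner_decoder_proj(decoder(dec_in))
    return hyp[context_size:]
```

This feedback loop is data-dependent, so it presumably cannot be expressed purely in an ensemble config.pbtxt graph, which is why the example keeps the search inside the scorer model; the sketch above is only meant to show which tensors flow between the exported parts.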