Can I convert a SigLIP only and not a SigLIP-based LLM? #16
Based on the supported models, conversion of SigLIP to TRT is already done, but can I use it standalone, for a SigLIP model only rather than a SigLIP-based LLM?
Yeah, you could instantiate the CLIP class directly with the name or path of a SigLIP model. I think you will need to write your own script for calling it, though.
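(For illustration, a minimal sketch of what such a script might look like; the `CLIPImageEmbedding` class name and `from_pretrained` signature are assumptions about `nano_llm/vision/clip.py`, not a confirmed API.)

```python
# Sketch only -- the class and constructor names are assumptions about
# nano_llm/vision/clip.py and may not match the actual API.
from nano_llm.vision.clip import CLIPImageEmbedding
from PIL import Image

# Point the CLIP wrapper at a SigLIP checkpoint; the TRT engine is
# built and cached on first load.
model = CLIPImageEmbedding.from_pretrained('google/siglip-base-patch16-224')

embedding = model(Image.open('test.jpg'))  # image embedding
```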
Do I have to convert SigLIP to TRT myself, or can NanoLLM handle the conversion if I supply a PyTorch model from HF?
It should handle it internally for you, and will save the TRT engine.
Thanks!
Sorry, another question: where is the code that does the inference on the TRT engine? I am trying to write my own inference script but ran into some issues. I figured you are experienced with TRT inference, so I want to try my luck and see how you implemented it. This is the script I have, for context:
@aliencaocao I am using torch2trt from @jaybdub, which gives you a model object with the same interface as PyTorch, so you can transparently replace your PyTorch model with the TRT version. That is what I do here: NanoLLM/nano_llm/vision/clip.py, line 176 in b0be327.
If I wanted to make something that didn't depend on PyTorch, then yeah, I would use the TRT API directly, or onnxruntime if I wanted to be able to fall back to cuDNN.
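(For reference, the basic torch2trt pattern looks roughly like this; the checkpoint name is just an example, and the wrapper returning a plain tensor is an assumption, since HF modules return output dataclasses rather than tensors.)

```python
import torch
from torch2trt import torch2trt
from transformers import SiglipVisionModel

class VisionWrapper(torch.nn.Module):
    """Return a plain tensor: torch2trt traces tensor outputs,
    not HF output dataclasses."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, pixel_values):
        return self.model(pixel_values).pooler_output

model = VisionWrapper(
    SiglipVisionModel.from_pretrained('google/siglip-base-patch16-224')
).cuda().eval().half()

# Example input with the shape the engine will be built for
x = torch.randn(1, 3, 224, 224, dtype=torch.float16, device='cuda')

# torch2trt returns a TRTModule with the same __call__ interface as
# the original module, so it can be swapped in transparently
model_trt = torch2trt(model, [x], fp16_mode=True)

print(torch.max(torch.abs(model(x) - model_trt(x))))  # sanity check
torch.save(model_trt.state_dict(), 'siglip_vision_trt.pth')
```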
Thanks for pointing that out. Do you know how I can load a .engine file directly using torch2trt? I have exported it separately, to use my own shape configs.
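(One way this is commonly done: deserialize the engine with the TensorRT runtime and hand it to torch2trt's `TRTModule`, whose constructor accepts an engine plus input/output names. The file and tensor names below are placeholders and must match whatever the engine was built with.)

```python
import tensorrt as trt
import torch
from torch2trt import TRTModule

# Deserialize a standalone .engine file that was built elsewhere
logger = trt.Logger(trt.Logger.INFO)
with open('siglip_vision.engine', 'rb') as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

# Wrap it in a TRTModule; the input/output names must match the
# binding names used when the engine was built (placeholders here)
model_trt = TRTModule(engine=engine,
                      input_names=['pixel_values'],
                      output_names=['pooler_output'])

x = torch.randn(1, 3, 224, 224, dtype=torch.float16, device='cuda')
out = model_trt(x)
```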
Actually, I managed to port it over, but I'm getting some errors.
@aliencaocao Not specifically as related to this (i.e., to CLIP/SigLIP).
Thank you. I will try to convert it using torch2trt myself and see how it goes. I tried to use NanoLLM, but it is missing the SigLIP text model, which I also need; otherwise I would have used it straight away. Thank you for your help!
OK thanks, let me know if you find that TRT or torch2trt can build/run the SigLIP text encoder; that would be good for me to add too.
Got it to work after NVIDIA-AI-IOT/torch2trt#932.
Full conversion script (requires the torch2trt fix from #932 above); a sketch follows below. Remove all the `torch.float16` usages if you want to run in fp32. You can then just load the saved state dict back into a `TRTModule`. To get the logits, apply the scoring steps described in the notes further down.
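(The script itself was not preserved here; below is a rough reconstruction of the text-encoder half, assuming the #932 fix is in place, the hard-coded batch size of 1 and fixed 64-token input described in the notes below, and a hypothetical wrapper so the module returns a plain tensor.)

```python
import torch
from torch2trt import torch2trt, TRTModule
from transformers import SiglipTextModel

class TextWrapper(torch.nn.Module):
    """Return a plain tensor; no attention mask is passed, since
    SigLIP inputs are always padded to the 64-token training length."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids):
        return self.model(input_ids).pooler_output

model = TextWrapper(
    SiglipTextModel.from_pretrained('google/siglip-base-patch16-224')
).cuda().eval().half()

# Batch size hard-coded to 1, sequence length fixed at 64; pad token
# id for SigLIP is 1, so all-ones is a valid dummy input
input_ids = torch.ones(1, 64, dtype=torch.int64, device='cuda')

text_trt = torch2trt(model, [input_ids], fp16_mode=True)
torch.save(text_trt.state_dict(), 'siglip_text_trt.pth')

# Loading back later:
# text_trt = TRTModule()
# text_trt.load_state_dict(torch.load('siglip_text_trt.pth'))
```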
Thanks @aliencaocao, that's great! I'm going to unify the various CLIP/SigLIP implementations I have floating around between NanoLLM/NanoDB, with support for the text encoder in TRT alongside the vision encoder 👍
@aliencaocao did you get the text encoder working in TRT with real token IDs? The output delta is small when the input_ids are all 1, but when I actually tokenize a real string it doesn't work. Which version of TensorRT are you using? Edit: I also tried using the attention_mask from the tokenizer.
TRT 10.1. Yes, with real token IDs; I am already using it on over 20k samples.
One very important thing: do not use the HF Processor, but the HF tokenizer instead. The Processor does not pad the input to 64 ('max_length') tokens, which is what SigLIP has been trained on; also double-check that the padding token id is 1. Note that I hard-coded the text batch size to 1, and you may have to change it to something more dynamic. The attention mask is not needed. Also make sure you are using the right logit_scale_exp and logit_bias from the original HF model; they change for fine-tuned models. And don't forget to … And to get the same output as the HF pipeline (Python/JS), you need to add a torch.sigmoid(scores) before the last line. For the purpose of image reranking that's not needed, since sigmoid won't change the order.
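(Putting those notes together, a self-contained sketch of the scoring step; in practice the two embeddings would come from the TRT engines, and the elided step above is presumably L2-normalization, which HF's SiglipModel applies to both embeddings before the matmul.)

```python
import torch
from transformers import AutoTokenizer, SiglipModel

name = 'google/siglip-base-patch16-224'   # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
hf_model = SiglipModel.from_pretrained(name).eval()

# Tokenizer, NOT Processor: SigLIP was trained on inputs padded to
# 64 ('max_length') tokens, and the pad token id should be 1
inputs = tokenizer(['a photo of a cat'], padding='max_length',
                   max_length=64, return_tensors='pt')
assert tokenizer.pad_token_id == 1

with torch.no_grad():
    # In practice these two come from the TRT engines; the HF model
    # is used here only to keep the sketch self-contained
    text_embeds = hf_model.get_text_features(input_ids=inputs['input_ids'])
    image_embeds = hf_model.get_image_features(
        pixel_values=torch.randn(1, 3, 224, 224))

# L2-normalize both embeddings, as SiglipModel.forward does
image_embeds = torch.nn.functional.normalize(image_embeds, dim=-1)
text_embeds = torch.nn.functional.normalize(text_embeds, dim=-1)

# logit_scale / logit_bias are learned parameters; take them from the
# original HF checkpoint, since they change for fine-tuned models
scores = (image_embeds @ text_embeds.t()) * hf_model.logit_scale.exp() \
         + hf_model.logit_bias

# Matches the HF pipeline output; optional for image reranking,
# since sigmoid is monotonic and does not change the order
probs = torch.sigmoid(scores)
```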