Error in exporting soundstream to onnx #254
Update: the error goes away when
@kalradivyanshu ahh darn... can't wait until Tri's flash attention CUDA kernel is standard. it has windowed local attention built-in
some researchers have told me local attention does little, but i wouldn't bet against it
@kalradivyanshu maybe i can offer a full causal attention layer in there? with flash attention, 8k-length sequences are nothing these days
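To make the trade-off concrete: full causal attention scores every (query, key) pair where the key is not in the future, while windowed local attention only scores keys within a fixed distance of the query. The sketch below builds both boolean masks in plain Python. It is a generic sliding-window illustration, assuming a window of the most recent `window` positions; it is not the bucketed windowing the local-attention library actually uses, and `causal_mask` / `attended_pairs` are hypothetical helpers.

```python
def causal_mask(seq_len, window=None):
    """Return mask where mask[i][j] is True if query i may attend to key j.

    window=None gives full causal attention. An integer window (a generic
    sliding window, assumed here for illustration) restricts each query to
    itself and the previous `window - 1` keys.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            allowed = j <= i  # causal: never attend to future positions
            if window is not None:
                allowed = allowed and (i - j) < window  # stay inside the window
            row.append(allowed)
        mask.append(row)
    return mask


def attended_pairs(mask):
    # number of (query, key) pairs that get scored, a rough proxy for cost
    return sum(sum(row) for row in mask)


full = causal_mask(8)
local = causal_mask(8, window=3)
print(attended_pairs(full), attended_pairs(local))  # 36 21
```

The full causal count grows roughly as `seq_len ** 2 / 2`, while the windowed count grows as roughly `seq_len * window`, which is why local attention is attractive for long audio sequences and why a fast full-attention kernel changes the calculus.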
@lucidrains i was able to make the export work by changing a line in local attention; will attach the details when i get home. I also wanted to ask: if i wanted to make the soundstream model about 3x smaller (the checkpoint file is 100 MB right now, 30 MB after quantizing), which layers should i reduce with the least impact on accuracy/loss? I basically want it to work on client devices, and downloading 30+ MB seems like a lot. I am only using soundstream as an encoder/decoder.
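Before deciding what to shrink, it can help to see which submodules hold most of the weights. In PyTorch, per-child counts can be gathered with `{name: sum(p.numel() for p in child.parameters()) for name, child in model.named_children()}`; the plain-Python sketch below then ranks them and converts counts to approximate sizes, assuming 4 bytes per float32 weight or 1 byte per int8 weight. The numbers in `counts` and the helper names are hypothetical placeholders, not measurements of SoundStream.

```python
def rank_by_share(param_counts):
    """Sort submodules by their share of the total parameter count.

    param_counts maps a submodule name to its number of parameters.
    Returns (name, count, fraction) tuples, largest first.
    """
    total = sum(param_counts.values())
    ranked = sorted(param_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, count, count / total) for name, count in ranked]


def size_mb(num_params, bytes_per_param=4):
    # 4 bytes per float32 weight, 1 byte per int8 weight; ignores file overhead
    return num_params * bytes_per_param / 1e6


# Hypothetical counts, purely for illustration (not measured from SoundStream).
counts = {"encoder": 6_000_000, "decoder": 9_000_000, "transformer": 3_000_000, "quantizer": 2_000_000}
for name, count, frac in rank_by_share(counts):
    print(f"{name:12s} {count:>10,d} {frac:6.1%} {size_mb(count):6.1f} MB fp32")
```

Shrinking the submodules at the top of this ranking is where a size reduction would come from; whether it hurts reconstruction quality still has to be checked by retraining and measuring.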
@kalradivyanshu oh good to hear! would welcome a PR upstream, in the spirit of open source 🙏 as for shrinking the model, i don't really know, as i haven't been following the research there
Oh okay. I changed this check in local attention from:
    if m.is_integer():
        return False, tensor
to:
    if (torch.is_tensor(m) and m.item().is_integer()) or (not torch.is_tensor(m) and m.is_integer()):
        return False, tensor
because during ONNX export, m was being passed into some function as a 1-D tensor wrapping the value, so I hacked this together. I have no idea whether it has any other impact. If you think it doesn't, let me know and I'll open a PR.
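A slightly tidier form of the same check is sketched below, assuming the value reaching it is either a Python number or a one-element tensor that exposes `.item()`. `is_whole` and `pad_amount` are hypothetical names, not part of local-attention; the second helper shows an alternative that sidesteps the float test entirely by computing the padding with integer modulo. Either way, a Python `if` on a traced value is likely evaluated once during export, so the padding decision may be fixed to the example input's length.

```python
def is_whole(m):
    """Return True if m holds an integral value.

    Accepts a Python int/float or a one-element tensor-like object with
    .item() (as torch.Tensor has), which is what reached the check during
    ONNX export according to the comment above.
    """
    if hasattr(m, "item"):
        m = m.item()  # unwrap a one-element tensor to a Python number
    return float(m).is_integer()


def pad_amount(seqlen, multiple):
    # padding needed to reach the next multiple of `multiple`, computed with
    # integers only, so no float division or .is_integer() call is involved
    return (-seqlen) % multiple


print(is_whole(4.0), is_whole(2.5))  # True False
print(pad_amount(300, 128))          # 84
```

`pad_amount` returns 0 when `seqlen` is already a multiple, which matches the "no padding needed" branch of the original check.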
@kalradivyanshu hmm, strange,
@kalradivyanshu is your converted ONNX model working ok with this change?
This is my export code:
from audiolm_pytorch import SoundStream
import torch

soundstream = SoundStream(
    codebook_size = 4096,
    rq_num_quantizers = 8,
    rq_groups = 2,  # this paper proposes using multi-headed residual vector quantization - https://arxiv.org/abs/2305.02765
    use_lookup_free_quantizer = False,  # whether to use residual lookup free quantization
    use_finite_scalar_quantizer = True,  # whether to use residual finite scalar quantization
    attn_window_size = 128,  # local attention receptive field at bottleneck
    attn_depth = 2  # 2 local attention transformer blocks - the soundstream folks were not experts with attention, so i took the liberty to add some. encodec went with lstms, but attention should be better
).cpu()
soundstream.eval()  # switch to inference mode before exporting

audio = torch.randn(10080).cpu()  # random dummy waveform, only used to trace the graph
torch.onnx.export(soundstream, audio, "soundstream.onnx", input_names = ["input"], output_names = ["output"])
@kalradivyanshu oh i see, ok yea, let me know if the ONNX model is ok. wow, you are using an unpublished feature (residual FSQ)! does it work?
I am exporting a random, untrained SoundStream just to test whether it can be exported to ONNX; I will try with a trained one and get back to you.
ahh, you haven't actually trained it yet, gotcha
no clue on the residual FSQ yet; I will have to see and get back to you.
Has anyone exported the soundstream model to ONNX? I tried:
torch.onnx.export(soundstream, audio, "soundstream.onnx")
but it fails with
Any help would be really appreciated, thanks!