Hello, I would like to ask a question about parameter settings. I want to prune the Llama-2 model without changing the hidden_size, which stays fixed at 4096. However, I do want to reduce the num_heads of attention, i.e., prune each of the q/k/v/o projections from 4096 × 4096 to 4096 × 2048. Can I use the code to do this without changing anything? Also, I noticed that zs_block may contain 'qk_head_dim_z'. What does it do?
qk_head_dim_z is not supported in the current code yet; it was intended to prune head dimensions rather than full heads. The current code supports pruning only the heads, without pruning the hidden dimension, so your use case works: you just need to remove hidden from the prune_params. Let me know if you encounter any issues!
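For reference, here is a minimal sketch (not the repo's actual pruning code; the head count and mask below are illustrative) of what pruning full heads while keeping hidden_size fixed means for the projection shapes: the output rows of q/k/v and the matching input columns of o that belong to pruned heads are dropped, so each 4096 × 4096 projection becomes 4096 × 2048 when half of Llama-2 7B's 32 heads are removed. In practice the mask would come from the learned z masks in zs_block (the key name head_z here is an assumption).

```python
import torch

hidden_size = 4096  # kept fixed, as in the question
num_heads = 32      # Llama-2 7B attention heads
head_dim = hidden_size // num_heads  # 128

# Hypothetical head mask: keep the first 16 of 32 heads.
head_mask = torch.zeros(num_heads, dtype=torch.bool)
head_mask[:16] = True

# Expand the per-head mask to a per-row mask over the 4096 projection rows.
row_mask = head_mask.repeat_interleave(head_dim)  # shape (4096,)

# Dense Llama-2-sized projections, nn.Linear convention: weight is (out, in).
w_q = torch.randn(hidden_size, hidden_size)
w_o = torch.randn(hidden_size, hidden_size)

# Pruning heads drops output rows of q/k/v ...
w_q_pruned = w_q[row_mask, :]   # (2048, 4096)
# ... and the matching input columns of o, so hidden_size stays 4096.
w_o_pruned = w_o[:, row_mask]   # (4096, 2048)

print(w_q_pruned.shape, w_o_pruned.shape)
# torch.Size([2048, 4096]) torch.Size([4096, 2048])
```

Note the orientation: with torch.nn.Linear the weight is stored as (out_features, in_features), so a "4096 × 2048" q projection in math notation corresponds to a (2048, 4096) weight tensor.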