Update tensor_parallel.py #2798

Closed
wants to merge 1 commit into from
Conversation

Lacacy commented Dec 3, 2024

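Resolves the abnormal conversation (chat) output observed with the Baichuan large model.

Fixes a bug in the norm_head adaptation for Baichuan: the upstream modeling code normalizes the head weight with nn.functional.normalize (see the link below).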

Fixes #2780

https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/modeling_baichuan.py#:~:text=self.weight.data%20%3D%20nn.functional.normalize(self.weight)

@OlivierDehaene OR @Narsil

Narsil (Collaborator) commented Dec 11, 2024

We cannot really accept this.
This is a bug in Baichuan weights, not in our code.

The issue with your proposed fix is that we support tensor parallelism (TP), so the normalized weight values would depend on the TP degree you're using, potentially leading to even larger discrepancies.
The "true" fix in that sense would be to load the entire weight, normalize it, and then split it across GPUs, but that leads to other issues, the first being excess VRAM usage, which can cause unwanted OOMs.
Baichuan should fix their weights (unless there's a valid reason to keep the unnormalized weights, but I don't think there is one).
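
To illustrate the concern about TP-dependent values, here is a minimal pure-Python sketch. It is not code from this PR or from the repository. It assumes a toy 2x4 weight, a TP degree of 2, and that the weight is split along the same axis the L2 norm is computed over (the hidden dimension, which is what `nn.functional.normalize` uses by default for a 2-D weight). The helper names `l2_normalize_rows` and `split_columns` are hypothetical.

```python
import math


def l2_normalize_rows(matrix, eps=1e-12):
    """Scale every row to unit L2 norm (row-wise normalization of a 2-D weight)."""
    out = []
    for row in matrix:
        norm = max(math.sqrt(sum(x * x for x in row)), eps)
        out.append([x / norm for x in row])
    return out


def split_columns(matrix, world_size):
    """Split a matrix into `world_size` equal column blocks, one block per rank."""
    width = len(matrix[0]) // world_size
    return [
        [row[rank * width:(rank + 1) * width] for row in matrix]
        for rank in range(world_size)
    ]


# Toy head weight: 2 vocabulary rows, hidden size 4 (illustrative values only).
weight = [
    [3.0, 4.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]

# Option A: normalize the full weight first, then shard it.
normalize_then_split = split_columns(l2_normalize_rows(weight), world_size=2)

# Option B: shard first, then normalize each shard on its own rank.
shards = split_columns(weight, world_size=2)
split_then_normalize = [l2_normalize_rows(shard) for shard in shards]

rank0_a = normalize_then_split[0]
rank0_b = split_then_normalize[0]

# Second row on rank 0: 0.5 under option A, roughly 0.707 under option B.
print("normalize, then split:", rank0_a[1])
print("split, then normalize:", rank0_b[1])
```

Under this assumption each rank only sees part of a row, so the per-shard norm differs from the full-row norm and the result changes with the TP degree. If the split were instead along the other axis, every row would sit whole on one rank and both options would agree; the thread does not state which layout applies here. Option A is the "load the entire weight, normalize it, then split" approach described above, and it requires the full weight to be materialized before sharding.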

@Narsil Narsil closed this Jan 17, 2025
Successfully merging this pull request may close these issues.

I encountered the same issue while using baichuan2-13B-chat..