Regarding the note from @jnwei on May 3rd, has this been addressed?
"A quick note on the pytorch 2 / CUDA 12 upgrade:
We've run into some technical issues with the pytorch 2 upgrade. Briefly, we observe large instabilities in our training losses in the pytorch2 version relative to our pytorch 1 version.
For inference, we're also observing a slight difference between model outputs in pytorch 1 and pytorch 2. The difference in final output coordinates is about RMSD~0.05A for the proteins I've looked at While these differences might seem small, it may point to a larger issue that is also occurring in training; we're currently looking into it.
Until we find the root cause of the discrepancy, or a way around the training instability, we're not ready to update the main branch to PyTorch 2.
Meanwhile, we will upgrade the main branch to use PyTorch Lightning 2, which has a few features that the team has found useful. I'll also push some changes to pl_upgrades that integrate some of the changes from the main branch and clean up the conda environment / Docker setup for CUDA 12 / PyTorch 2.
We are actively working on debugging the instability, and we'll keep you posted as soon as we are ready to upgrade. Thank you all for your interest and your patience.
Originally posted by @jnwei in #403 (comment)"
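For anyone who wants to check whether the PyTorch 1 / PyTorch 2 inference discrepancy still shows up in their own runs, here is a minimal sketch of the kind of comparison described in the quoted note. It is not OpenFold code, and the `.npy` file names are placeholders: it assumes you have exported the final CA coordinates (shape `(N, 3)`) from the same input sequence under both builds, then superimposes them and reports the RMSD.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition
    (Kabsch algorithm). Units follow the inputs (Angstroms for PDB-style coords)."""
    # Center both coordinate sets on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Optimal rotation via SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and compute the root-mean-square deviation.
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    # Placeholder paths: CA coordinates from the PyTorch 1 and PyTorch 2 runs.
    coords_pt1 = np.load("ca_coords_pytorch1.npy")
    coords_pt2 = np.load("ca_coords_pytorch2.npy")
    print(f"CA RMSD between builds: {kabsch_rmsd(coords_pt1, coords_pt2):.3f} A")
```

A value on the order of the ~0.05 Å mentioned in the quoted note would suggest the discrepancy is still present.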