Hi,
I was wondering if anyone has ever retrained OpenFold on a small dataset, e.g. CATH 4.2, to see how well or badly it does at small scale. I'm interested in this for benchmarking: to compare a new folding method, I don't necessarily want to train it at the same scale as AlphaFold/OpenFold right away; I'd rather compare the two at a smaller scale so I can estimate/extrapolate whether scaling up my new method makes sense.
Does anyone have recommendations on model size and training parameters (learning rate, etc.) for OpenFold training on something like CATH 4.2, which has roughly 18k sequence-structure pairs?
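For concreteness, the kind of scaled-down settings I have in mind look roughly like this. All of these numbers are my own guesses, loosely scaled down from the AlphaFold 2 defaults (48 Evoformer blocks, c_m=256, c_z=128, lr 1e-3 with warmup, 256-residue crops); none of them come from the OpenFold docs or have been tested:

```python
# Hypothetical reduced OpenFold-style hyperparameters for a CATH 4.2-scale run.
# Values are guesses scaled down from the AlphaFold 2 defaults, not tested
# or taken from the OpenFold documentation.
small_config = {
    "evoformer_blocks": 8,     # vs. 48 in the full model
    "c_m": 128,                # MSA embedding dim (256 in the full model)
    "c_z": 64,                 # pair embedding dim (128 in the full model)
    "max_msa_clusters": 64,    # fewer MSA sequences per example
    "crop_size": 256,          # residue crop length, as in AF2 training
    "learning_rate": 1e-3,     # AF2 used 1e-3 with warmup and decay
    "warmup_steps": 1000,
    "batch_size": 16,          # effective batch, via gradient accumulation
}
```

If anyone has actually run something at this scale, I'd love to hear which of these knobs mattered most.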
Thanks
Olivia
This is a great question, Olivia, and while I don't know the answer, I'd love to. I guess everybody who plans some OpenFold training would like to start small and test the tools first. Having some small-dataset recommendations (and reference results) in the OpenFold docs would be great. Papers often report results on smaller datasets (to show how much data is needed, to demonstrate generalization from one data type to another, etc.), so the design work on data splits has already been done multiple times. Having a few such datasets documented here would be very welcome. (I briefly discussed this with the OpenFold team at the last OMSF meeting and they seemed to like the idea, too.)
It would be wonderful if the "big models" had small-scale versions to quantify their scaling behaviour.
OK, I just set up the OpenFold repo locally to retrain as described in these docs, and prepared the CATH 4.2 training data as described in these docs.
The step I'm struggling with is the template mmCIF creation. How do I find/create the right template files for my CATH 4.2 training set?
Do you have any advice on this step? @vaclavhanzl
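In the meantime, here is roughly how I imagine assembling the template mmCIF directory. This is only a sketch, assuming the template directory is just a flat folder of `<pdb_id>.cif` files; `fetch_template_cifs` and its helper are my own names, not from the OpenFold scripts. The one fact it relies on is that CATH domain IDs (e.g. `1oaiA00`) begin with the 4-character PDB code:

```python
# Sketch (not from the OpenFold docs): derive PDB codes from CATH domain IDs
# and fetch the corresponding mmCIF files from RCSB into a flat directory.
import os
import urllib.request

def pdb_id_from_cath_domain(domain_id: str) -> str:
    """CATH domain IDs (e.g. '1oaiA00') start with the 4-char PDB code."""
    return domain_id[:4].lower()

def fetch_template_cifs(domain_ids, out_dir="template_mmcif"):
    os.makedirs(out_dir, exist_ok=True)
    # Deduplicate: many CATH domains come from the same PDB entry.
    for pdb_id in sorted({pdb_id_from_cath_domain(d) for d in domain_ids}):
        path = os.path.join(out_dir, f"{pdb_id}.cif")
        if not os.path.exists(path):
            url = f"https://files.rcsb.org/download/{pdb_id}.cif"
            urllib.request.urlretrieve(url, path)

# Usage: fetch_template_cifs(["1oaiA00", "2j5yB01"])
```

What I'm unsure about is whether this is even the right set of structures to use as templates, or whether templates should be filtered (e.g. by release date) to avoid leakage into the CATH training set.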
Thanks
Olivia