Retraining OpenFold on Cath4.2 #498

OliviaViessmann · 2024-10-31T19:58:49Z

Hi,
I was wondering if anyone has ever retrained OpenFold on a small dataset, e.g. CATH (4.2), to see how well or badly it does at small scale. I am curious for benchmarking purposes: to compare a new folding method, I don't necessarily want to train it at the same scale as AF/OpenFold right away. I would rather see how the two compare at a smaller scale, so I can estimate whether it makes sense to scale up my novel method.
Does anyone have recommendations on model size and training parameters (learning rate etc.) for OpenFold training if I were to train on, e.g., CATH 4.2, which has roughly 18k sequence-structure pairs?

Thanks
Olivia

vaclavhanzl · 2024-10-31T20:32:19Z

This is a great question, Olivia, and while I do not know the answer, I'd love to. I guess everybody who plans some OF training would like to start small and test the tools first. Having some small-dataset recommendations (and reference results) in the OF docs would be great. Articles often report results with smaller datasets (to show how much data is needed, to show generalization from one data type to another, etc.), so the design work on data splits has already been done multiple times. Having a few such datasets documented here would be very welcome. (I briefly discussed this with the OF team at the last OMSF meeting and they seemed to like the idea, too.)

OliviaViessmann · 2024-11-05T14:57:16Z

It would be wonderful if the "big models" had small-scale versions to quantify their scaling behaviour.
OK, I just set up the openfold repo locally to retrain as described in these docs. I prepared the CATH 4.2 training data as described in these docs.
The step I struggle with is the template cif creation. How do I find or create the right template files for my CATH 4.2 training set?
Do you have any advice on this step? @vaclavhanzl
Thanks
Olivia
