
Retraining OpenFold on Cath4.2 #498

Open
OliviaViessmann opened this issue Oct 31, 2024 · 2 comments

Comments

@OliviaViessmann

Hi,
I was wondering if anyone has ever retrained OpenFold on a small dataset, e.g. CATH 4.2, to see how well (or badly) it does at small scale. I'm asking for benchmarking purposes: to compare a new folding method against it, I don't want to train my method at the scale of AlphaFold/OpenFold right away. Instead, I'd like to see how they compare at a smaller scale so I can roughly extrapolate whether it makes sense to scale up my method.
Does anyone have recommendations on model size and training parameters (learning rate, etc.) for OpenFold training if I were to train, e.g., on CATH 4.2, which has roughly 18k sequence-structure pairs?

Thanks
Olivia
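On the learning-rate part of the question, here is a minimal sketch, assuming an AlphaFold2-style schedule: linear warmup to a peak of 1e-3 over 1,000 steps, then a decay by a factor of 0.95 every 50,000 steps. These defaults are roughly what was reported for AlphaFold2 initial training; they are assumptions here, not values read from OpenFold's config, and the function names are hypothetical. The two helpers convert between optimizer steps and passes over the data, assuming a batch size of 128.

```python
import math


def warmup_step_decay_lr(
    step: int,
    max_lr: float = 1e-3,
    warmup_steps: int = 1_000,
    decay_start: int = 50_000,
    decay_every: int = 50_000,
    decay_factor: float = 0.95,
) -> float:
    """Learning rate at `step`: linear warmup, constant plateau, then step decay.

    Defaults are assumed AlphaFold2-style values, not taken from OpenFold's config.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to max_lr.
        return max_lr * step / warmup_steps
    if step < decay_start:
        return max_lr
    # Multiply by decay_factor once at decay_start, then again every decay_every steps.
    n_decays = (step - decay_start) // decay_every + 1
    return max_lr * decay_factor ** n_decays


def steps_per_epoch(n_examples: int, batch_size: int) -> int:
    """Optimizer steps for one pass over the data (the last batch may be partial)."""
    return math.ceil(n_examples / batch_size)


def epochs_for_steps(n_steps: int, n_examples: int, batch_size: int) -> float:
    """Number of passes over the data that a given number of steps corresponds to."""
    return n_steps * batch_size / n_examples


if __name__ == "__main__":
    n_train, batch = 18_000, 128  # ~CATH 4.2 training size; batch size is an assumption
    print("steps/epoch:", steps_per_epoch(n_train, batch))
    print("epochs before first decay:", round(epochs_for_steps(50_000, n_train, batch), 1))
    for s in (0, 500, 1_000, 49_999, 50_000, 100_000):
        print(s, warmup_step_decay_lr(s))
```

With ~18k examples and a batch size of 128, the first 50,000-step decay corresponds to roughly 356 passes over the data, which suggests step-based milestones tuned for the full PDB would need rescaling for a dataset this small.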

@vaclavhanzl
Contributor

vaclavhanzl commented Oct 31, 2024

This is a great question, Olivia, and while I do not know the answer, I'd love to. I guess everybody who plans some OpenFold training would like to start small and test the tools first. Having some small-dataset recommendations (and reference results) in the OpenFold docs would be great. Articles often report results on smaller datasets (to show how much data is needed, to show generalization from one data type to another, etc.), so the design work on data splits has already been done multiple times. Having a few such datasets documented here would be very welcome. (I briefly discussed this with the OpenFold team at the last OMSF meeting and they seemed to like the idea, too.)

@OliviaViessmann
Author

It would be wonderful if the "big models" had small-scale versions so their scaling behaviour could be quantified.
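One simple way to quantify scaling behaviour from a few small runs is to fit a power law, error ≈ a * N**(-b), to (training-set size N, validation error) pairs by linear regression in log-log space. This is a minimal sketch under that assumption: it omits the irreducible-error term many scaling-law fits include, the function names are hypothetical, and the error metric (e.g. 1 - lDDT) is whatever you choose to track. The example numbers are synthetic, generated from a known law.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit error ~= a * size**(-b) by least squares on log(error) vs log(size).

    Returns (a, b). Assumes a pure power law with no irreducible-error offset.
    """
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two (size, error) pairs of equal length")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct training-set sizes")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_error(a: float, b: float, size: float) -> float:
    """Extrapolated error at a new training-set size under the fitted power law."""
    return a * size ** (-b)


if __name__ == "__main__":
    # Synthetic numbers generated from a known law, for illustration only.
    sizes = [1_000, 4_000, 16_000]
    errors = [2.0 * n ** -0.5 for n in sizes]
    a, b = fit_power_law(sizes, errors)
    print(f"a={a:.3f}, b={b:.3f}, predicted error at 64k: {predict_error(a, b, 64_000):.4f}")
```

Fitting one curve per method and comparing the exponents b (or the predictions at a larger N) gives a rough sense of which method improves faster with data; extrapolating far beyond the measured sizes is unreliable.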
OK, I have set up the OpenFold repo locally for retraining as described in these docs, and I prepared the CATH 4.2 training data as described in these docs.
The step I am struggling with is creating the template mmCIF (.cif) files. How do I find or create the right template files for my CATH 4.2 training set?
Do you have any advice on this step, @vaclavhanzl?
Thanks
Olivia
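As far as I understand OpenFold's training data pipeline, the template directory is normally a local copy of PDB mmCIF files, and each template hit is looked up there by its PDB ID; the usual route is to mirror the whole PDB mmCIF set. For a small experiment, the sketch below only collects the PDB IDs you need and builds RCSB download URLs. It assumes chain names of the form "1abc.A" or "1abc_A" (check this against your CATH 4.2 files and template search results), assumes lowercase "<id>.cif" local filenames, and uses hypothetical helper names. Templates can come from any PDB entry the template search hits, so a subset built only from the training chains may miss templates.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern: 4-character PDB ID, then "." or "_", then a chain ID.
# This naming convention is an assumption, not a documented OpenFold/CATH format.
_CHAIN_RE = re.compile(r"^([0-9][A-Za-z0-9]{3})[._]([A-Za-z0-9]+)$")

# RCSB file-download endpoint for mmCIF files.
RCSB_CIF_URL = "https://files.rcsb.org/download/{pdb_id}.cif"


def pdb_ids_from_chains(chain_names: Iterable[str]) -> List[str]:
    """Return sorted, de-duplicated lowercase PDB IDs; raise on unrecognised names."""
    pdb_ids = set()
    for name in chain_names:
        match = _CHAIN_RE.match(name.strip())
        if match is None:
            raise ValueError(f"unrecognised chain name: {name!r}")
        pdb_ids.add(match.group(1).lower())
    return sorted(pdb_ids)


def mmcif_downloads(pdb_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each PDB ID's RCSB URL with a lowercase local filename such as '1abc.cif'."""
    return [
        (RCSB_CIF_URL.format(pdb_id=pdb_id.upper()), f"{pdb_id.lower()}.cif")
        for pdb_id in pdb_ids
    ]


if __name__ == "__main__":
    # Example names only; replace with your training chains and template hits.
    ids = pdb_ids_from_chains(["1ABC.A", "1abc_B", "2xyz.C"])
    for url, filename in mmcif_downloads(ids):
        print(url, "->", filename)
```

The resulting (URL, filename) pairs can be fed to any downloader; keeping the ID parsing separate makes it easy to merge IDs from the training chains with IDs from the template hits before fetching.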
