
Rework of train_model function #65

Closed
wants to merge 8 commits into from

Conversation

Old-Shatterhand
Member

I reworked the metric-handling in the train_model function by changing the following:

  • Adding metrics. The metric calculations are still fully based on scikit-learn. It now tracks the following:
    • ACC, MCC, and AUROC (new) for single- and multi-class classification
    • ACC (new), MCC (new), LRAP, and NDCG for multilabel classification
    • MSE, MAE (new), and R2 (new) for regression
  • Weighting of the metrics by individual batch size when summarizing (the last batch may be smaller than the others). This has only a minor effect, but it is more precise.
  • The best performances are now taken from the best model (not the best ever-seen value which might come from a different model).
  • The best model is determined based on the lowest loss value (applies to all classification and regression settings).
  • The function has a new boolean parameter return_metrics, which defaults to False to preserve the old behaviour. If set to True, the function returns a tuple of the best model and a dict with the keys "train" and "val" holding all training and validation metrics, i.e., a Tuple[torch.nn.Module, dict].
  • Added a comment to the optimizer argument noting that it has to be a SAM optimizer in classification settings, since standard optimizers like Adam lack the first_step and second_step methods.
  • Telling names. Previously, best_acc could hold an actual accuracy, an MSE, or an LRAP. It has been replaced by a variable called best_lead_metric that is inferred from the validation ACC, MSE, or LRAP.
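The batch-size weighting mentioned above can be sketched as follows. This is a minimal illustration, not the actual train_model code; the function name weighted_epoch_metric is hypothetical:

```python
def weighted_epoch_metric(batch_values, batch_sizes):
    # Weight each per-batch metric by its batch size, so a smaller final
    # batch contributes proportionally less to the epoch-level summary.
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)

# Two full batches of 32 and a smaller final batch of 16:
accs = [0.5, 0.75, 1.0]
sizes = [32, 32, 16]
weighted = weighted_epoch_metric(accs, sizes)  # (16 + 24 + 16) / 80 = 0.7
unweighted = sum(accs) / len(accs)             # plain mean = 0.75
```

The difference between 0.7 and 0.75 here is exaggerated for illustration; with many batches of equal size and one trailing partial batch, the correction is small but systematic.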

The new code has been tested on a LectinOracle training run (works) and should not alter the old behaviour, i.e., everything that was possible before is still possible. Only the selection of the best model has changed: it is now loss-based rather than lead-metric-based (formerly best_acc).
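The SAM requirement noted in the list exists because SAM performs a two-step update: first_step perturbs the weights toward the locally worst-case direction, and second_step applies the actual update using gradients recomputed at the perturbed point. A scalar toy sketch of that pattern (illustration only, with numerical gradients; not the torch SAM implementation):

```python
def grad(loss_fn, w, eps=1e-6):
    # Central-difference gradient of a scalar loss (illustration only).
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)

def sam_step(loss_fn, w, lr=0.1, rho=0.05):
    """One SAM-style update on a scalar parameter.

    first_step:  move to the worst-case point w + rho * g / |g|
    second_step: descend using the gradient taken at that point
    """
    g = grad(loss_fn, w)
    w_adv = w + rho * g / (abs(g) + 1e-12)   # first_step (perturb weights)
    g_adv = grad(loss_fn, w_adv)             # gradient at perturbed point
    return w - lr * g_adv                    # second_step (actual update)

loss = lambda w: (w - 3.0) ** 2
w = 0.0
for _ in range(100):
    w = sam_step(loss, w)
# w ends up close to the minimum at 3.0
```

Because each update needs two separate gradient evaluations, a plain optimizer.step() interface (Adam, SGD) cannot express it, which is why the optimizer argument must expose first_step and second_step in the classification settings.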

@Old-Shatterhand added the enhancement (New feature or request) label on Nov 14, 2024
@Bribak
Contributor

Bribak commented Nov 14, 2024

Thank you & great job!
But could you merge into dev instead? Since the documentation is generated from master, any development work could/would derail, or at least alter, the documentation of the currently stable version. We'll then roll it into master with v1.5

@Bribak deleted the dev_dl branch on November 14, 2024, 15:22