Fine-tune it for multiclass or multilabel text classification #5

lpschaub · 2023-11-03T16:12:26Z

I have medical reports, and I am trying to predict the disease associated with each report:

Both the medical reports and the disease labels are written by humans, so there are mistakes and inconsistencies in the label names (the same disease is written in different ways, and vice versa).
Should I use RobertaForSequenceClassification or AutoModelForSequenceClassification?
What is the best way to handle imperfect labels? Embed them in the same RoBERTa embedding space and predict the mean of the vector, or predict the whole vector (which then becomes a multilabel task)?
best
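
To make the label problem concrete, here is a minimal sketch of one possible pre-processing step: collapsing spelling variants of a label into a canonical form before fine-tuning. It only uses the Python standard library. The helper names (`normalize`, `canonicalize`), the example labels, and the `cutoff` value of 0.85 are all assumptions for illustration, not something taken from my data or from any library.

```python
import difflib
import re


def normalize(label: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", label.lower())
    return " ".join(cleaned.split())


def canonicalize(labels, cutoff=0.85):
    """Map each raw label to a canonical form.

    A label is fuzzy-matched against the canonical forms seen so far;
    if nothing is similar enough, it becomes a new canonical form.
    The cutoff is a hypothetical value and needs manual review.
    """
    canon = []
    mapping = {}
    for raw in labels:
        norm = normalize(raw)
        match = difflib.get_close_matches(norm, canon, n=1, cutoff=cutoff)
        if match:
            mapping[raw] = match[0]
        else:
            canon.append(norm)
            mapping[raw] = norm
    return mapping


# Made-up examples of inconsistent human-written labels.
raw_labels = [
    "Type 2 Diabetes",
    "type-2 diabetes",
    "Type 2 diabetis",
    "Asthma",
    "asthma ",
]

mapping = canonicalize(raw_labels)
label2id = {name: i for i, name in enumerate(sorted(set(mapping.values())))}

for raw, name in mapping.items():
    print(repr(raw), "->", name, "(id", label2id[name], ")")
```

The resulting `label2id` could then be used to build the class ids for a single-label classifier, or a multi-hot vector per report for the multilabel case. One caveat with string similarity on medical terms: near-identical names for different diseases (for example hypothyroidism and hyperthyroidism) may get merged, so the mapping would have to be checked by hand.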
