Fine-tune it for multiclass or multilabel text classification #5

Open
lpschaub opened this issue Nov 3, 2023 · 0 comments

lpschaub commented Nov 3, 2023

I have medical reports, and I am trying to predict the disease associated with each report:

  1. Both the medical reports and the disease labels are written by humans, so they contain mistakes and inconsistent label names (the same disease written in different ways and, conversely, the same name used for different diseases).
  2. Should I use RobertaForSequenceClassification or AutoModelForSequenceClassification?
  3. What is the best way to handle imperfect labels? Should I embed them in the same RoBERTa tokenized space and predict the mean of the vector, or predict the whole vector (which then becomes a multilabel task)?

Best
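
To make points 1 and 3 concrete, here is a minimal sketch of one possible preprocessing step. It is an illustration only, not something taken from this repository. It maps label spelling variants to a canonical name and then builds either a single class index (multiclass) or a multi-hot vector (multilabel). The `ALIASES` table, the disease names, and all function names are hypothetical; a real alias table would have to be built from the actual data.

```python
import unicodedata

# Hypothetical alias table (assumption): spelling variants -> canonical label.
ALIASES = {
    "type 2 diabetes": "type 2 diabetes",
    "diabetes type 2": "type 2 diabetes",
    "t2d": "type 2 diabetes",
    "hypertension": "hypertension",
    "high blood pressure": "hypertension",
}


def normalize(raw):
    """Strip accents, lowercase, collapse whitespace, then apply the alias table."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    key = " ".join(text.lower().split())
    # Unknown variants fall through unchanged so they can be reviewed by hand.
    return ALIASES.get(key, key)


# Fixed class inventory derived from the canonical names.
CLASSES = sorted(set(ALIASES.values()))
LABEL2ID = {name: i for i, name in enumerate(CLASSES)}


def multiclass_target(raw):
    """One disease per report: a single class index.

    Raises KeyError if the normalized label is not a known class.
    """
    return LABEL2ID[normalize(raw)]


def multilabel_target(raws):
    """Several diseases per report: a multi-hot float vector."""
    target = [0.0] * len(CLASSES)
    for raw in raws:
        target[LABEL2ID[normalize(raw)]] = 1.0
    return target
```

On point 2, as far as I understand the Hugging Face `transformers` API, `AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=len(CLASSES))` resolves to `RobertaForSequenceClassification`, so the two are equivalent for a RoBERTa checkpoint. Passing `problem_type="multi_label_classification"` should switch the loss to a per-label binary cross-entropy, which is why the multilabel targets above are floats.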