Local sparsity control for Naive Bayes with extreme misclassification costs #20

flrngel opened this issue Sep 12, 2018

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.5667&rep=rep1&type=pdf

1. Introduction

  • In the text domain there is an excessive number of features
  • To control sparsity, a global "cut-off threshold" on features was traditionally used (a minimal sketch follows this list)
  • This paper suggests that local approaches (document-specific feature selection) have potential benefits
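
A minimal sketch (my own, not the paper's code) of the traditional global cut-off: rank every feature by a corpus-level statistic, here plain document frequency, and drop everything below a threshold. The `min_df` name and the toy data are illustrative assumptions.

```python
from collections import Counter

def global_feature_cutoff(docs, min_df=2):
    """Keep features whose document frequency is at least min_df
    (illustrative threshold, not a value from the paper)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term for count, term in ((c, t) for t, c in df.items()) if count >= min_df}

docs = [["free", "offer", "spam"], ["free", "meeting"], ["free", "offer"]]
print(global_feature_cutoff(docs))  # {'free', 'offer'}
```

The cut is global: the same vocabulary is applied to every document, regardless of how many of the surviving features that document actually contains.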

4. Sparsity control via feature selection

  • A global sparsity cut-off (via feature ranking) is better than a plain feature-count cut-off
  • The local approach cannot be said to always beat the global one, but it seems better in many cases (see the per-document sketch after this list)
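
A hedged sketch of what "local" means here, assuming a precomputed per-feature utility score (e.g., information gain; the `utility` dict below is made up): rather than one global vocabulary cut, each document keeps only its own top-k highest-utility terms at classification time.

```python
def local_feature_selection(doc_terms, utility, k=2):
    """Per-document (local) selection: keep this document's k terms
    with the highest global utility score."""
    ranked = sorted(set(doc_terms), key=lambda t: utility.get(t, 0.0),
                    reverse=True)
    return ranked[:k]

utility = {"free": 0.9, "offer": 0.7, "meeting": 0.2, "the": 0.01}
print(local_feature_selection(["the", "free", "offer", "meeting"], utility))
# ['free', 'offer']
```

Sparsity is then controlled directly: k bounds the number of active features per document no matter how large the vocabulary is.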

6. Datasets

6.2. Model comparison

  • NBLOC (Naive Bayes with local feature selection) performs best

7. Results


8. Conclusions

  • The standard Naive Bayes classifier has a propensity to make errors with high confidence
    • especially in the text domain, where overconfidence can come from the large dimensionality of the feature space
  • The paper advocates a local, document-specific approach
  • Local feature selection may be preferable, depending on which dataset and feature-ranking function are considered
  • Naive Bayes could perform better with document-specific feature selection under extreme cost settings
  • The paper shows that Naive Bayes benefits from document-length normalization and TF-IDF term weighting (sketch below)
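
A minimal sketch of that last point, assuming a multinomial-style log-score with pre-trained parameters; the names (`nb_score`, `target_len`) and the exact normalization scheme are my assumptions, not the paper's formulation.

```python
import math
from collections import Counter

def nb_score(doc, log_prior, log_likelihood, idf, target_len=10.0):
    """Log-score one class: TF-IDF-weighted term counts, rescaled so every
    document contributes the same total weight (length normalization)."""
    tf = Counter(doc)
    weights = {t: tf[t] * idf.get(t, 0.0) for t in tf}  # TF-IDF weighting
    total = sum(weights.values()) or 1.0
    scale = target_len / total                          # length normalization
    return log_prior + sum(w * scale * log_likelihood.get(t, math.log(1e-6))
                           for t, w in weights.items())

# toy usage with made-up parameters for a single "spam" class
idf = {"free": 1.2, "offer": 0.8}
ll_spam = {"free": math.log(0.3), "offer": math.log(0.2)}
print(nb_score(["free", "free", "offer"], math.log(0.5), ll_spam, idf))
```

The normalization step is what keeps long documents from accumulating arbitrarily confident scores, which is the overconfidence failure mode the conclusions describe.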