Local sparsity control for Naive Bayes with extreme misclassification costs #20

flrngel opened this issue Sep 12, 2018

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.5667&rep=rep1&type=pdf

1. Introduction

  • In the text domain there is an excessive number of features
  • To control sparsity, a global "cut-off threshold" on features was traditionally used (a minimal sketch follows this list)
  • This paper suggests that local approaches (document-specific feature selection) have potential benefits
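
A minimal sketch (my own, not the paper's code) of the traditional global cut-off: rank every feature by a corpus-level statistic, here plain document frequency, and drop everything below a threshold. The `min_df` name and the toy data are illustrative assumptions.

```python
from collections import Counter

def global_feature_cutoff(docs, min_df=2):
    """Keep features whose document frequency is at least min_df
    (illustrative threshold, not a value from the paper)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term for count, term in ((c, t) for t, c in df.items()) if count >= min_df}

docs = [["free", "offer", "spam"], ["free", "meeting"], ["free", "offer"]]
print(global_feature_cutoff(docs))  # {'free', 'offer'}
```

The cut is global: the same vocabulary is applied to every document, regardless of how many of the surviving features that document actually contains.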

4. Sparsity control via feature selection

  • A global sparsity cut-off (via feature ranking) is better than a plain feature-count cut-off
  • The local approach cannot be said to always beat the global one, but it seems better in many cases (see the per-document sketch after this list)
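
A hedged sketch of what "local" means here, assuming a precomputed per-feature utility score (e.g., information gain; the `utility` dict below is made up): rather than one global vocabulary cut, each document keeps only its own top-k highest-utility terms at classification time.

```python
def local_feature_selection(doc_terms, utility, k=2):
    """Per-document (local) selection: keep this document's k terms
    with the highest global utility score."""
    ranked = sorted(set(doc_terms), key=lambda t: utility.get(t, 0.0),
                    reverse=True)
    return ranked[:k]

utility = {"free": 0.9, "offer": 0.7, "meeting": 0.2, "the": 0.01}
print(local_feature_selection(["the", "free", "offer", "meeting"], utility))
# ['free', 'offer']
```

Sparsity is then controlled directly: k bounds the number of active features per document no matter how large the vocabulary is.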

6. Datasets

6.2. Model comparison

  • NBLOC (Naive Bayes with local feature selection) performs best

7. Results


8. Conclusions

  • The standard Naive Bayes classifier has a propensity to make errors with high confidence
    • especially in the text domain, where overconfidence can come from the large dimensionality of the feature space
  • The paper advocates a local, document-specific approach
  • Local feature selection may be preferable, depending on which dataset and feature-ranking function are considered
  • Naive Bayes could perform better with document-specific feature selection under extreme cost settings
  • The paper shows that Naive Bayes benefits from document-length normalization and TF-IDF term weighting (sketch below)
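
A minimal sketch of that last point, assuming a multinomial-style log-score with pre-trained parameters; the names (`nb_score`, `target_len`) and the exact normalization scheme are my assumptions, not the paper's formulation.

```python
import math
from collections import Counter

def nb_score(doc, log_prior, log_likelihood, idf, target_len=10.0):
    """Log-score one class: TF-IDF-weighted term counts, rescaled so every
    document contributes the same total weight (length normalization)."""
    tf = Counter(doc)
    weights = {t: tf[t] * idf.get(t, 0.0) for t in tf}  # TF-IDF weighting
    total = sum(weights.values()) or 1.0
    scale = target_len / total                          # length normalization
    return log_prior + sum(w * scale * log_likelihood.get(t, math.log(1e-6))
                           for t, w in weights.items())

# toy usage with made-up parameters for a single "spam" class
idf = {"free": 1.2, "offer": 0.8}
ll_spam = {"free": math.log(0.3), "offer": math.log(0.2)}
print(nb_score(["free", "free", "offer"], math.log(0.5), ll_spam, idf))
```

The normalization step is what keeps long documents from accumulating arbitrarily confident scores, which is the overconfidence failure mode the conclusions describe.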