Skip to content

MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

License

Notifications You must be signed in to change notification settings

jamescporter/MISSION

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

MISSION

MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

An ICML 2018 paper by Amirali Aghazadeh*, Ryan Spring*, Daniel LeJeune, Gautam Dasarathy, Anshumali Shrivastava, Richard G. Baraniuk

* These authors contributed equally and are listed alphabetically.

Code Versions

  1. Mission Logistic Regression
  2. Mission Softmax Regression
  3. Feature Hashing Softmax Regression

Optimizations

  • Mission streams in the dataset via Memory-Mapped I/O instead of loading everything directly into memory -
    Necessary for Tera-Scale Datasets
  • AVX SIMD optimization for fast Softmax Regression
  • The code is currently optimized for the Splice-Site and DNA Metagenomics datasets.

Datasets

  1. KDD 2012
  2. RCV1
  3. Webspam - Trigram
  4. DNA Metagenomics
  5. Criteo 1TB
  6. Splice-Site 3.2TB

About

MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 96.5%
  • Makefile 2.0%
  • Objective-C 1.5%