To build the project, please refer to the original BMXNet-v2.
This codebase optimizes the binary GEMM kernel of BMXNet-v2 using AVX instructions. The optimization has two parts:
- The xnor + popcount GEMM. First, OpenMP is removed because it slows this kernel down. Second, the xnor + popcount step is implemented with AVX, which gives a ~2x speedup.
- Float-to-bit conversion. This step takes a lot of time and is easy to optimize; the optimized version gives a 5x speedup.
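
The two steps above can be illustrated with a minimal scalar sketch. This is not the AVX kernel from this codebase; it only shows the idea that the kernel accelerates. The function names `pack_signs`, `popcount64`, and `xnor_dot` are hypothetical, and the bit convention (bit 1 for values >= 0, bit 0 for negative values, 64-bit words, zero padding in the last word) is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Float-to-bit conversion (assumed convention): bit = 1 when the value
// is >= 0 (treated as +1), bit = 0 when it is negative (treated as -1).
// Unused bits in the last word stay 0.
std::vector<std::uint64_t> pack_signs(const float* x, std::size_t n) {
    std::vector<std::uint64_t> words((n + 63) / 64, 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (x[i] >= 0.0f) {
            words[i / 64] |= std::uint64_t{1} << (i % 64);
        }
    }
    return words;
}

// Portable population count (number of set bits) of one 64-bit word.
inline int popcount64(std::uint64_t v) {
    int count = 0;
    while (v != 0) {
        v &= v - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Dot product of two {-1, +1} vectors of length n, given their packed bits.
// xnor marks positions where the signs agree, so
//   dot = matches - mismatches = 2 * matches - n.
int xnor_dot(const std::vector<std::uint64_t>& a,
             const std::vector<std::uint64_t>& b,
             std::size_t n) {
    int matches = 0;
    for (std::size_t w = 0; w < a.size(); ++w) {
        std::uint64_t agree = ~(a[w] ^ b[w]);
        std::size_t remaining = n - w * 64;
        if (remaining < 64) {
            // Mask out padding bits, which would otherwise count as matches.
            agree &= (std::uint64_t{1} << remaining) - 1;
        }
        matches += popcount64(agree);
    }
    return 2 * matches - static_cast<int>(n);
}
```

A binary GEMM applies `xnor_dot` to every row/column pair of the packed operands. A vectorized version would process several words per instruction instead of one, which is where the speedups quoted above would come from.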
In addition to the optimized binary kernel, a Binary IDF for lossless compression is supplied under example.