FastNCD is a simple C++ library to compute the Normalized Compression Distance (NCD) between two strings.
FastNCD currently supports the following compression methods:
Compressor | Main Algorithm | Compression Level | Flag |
---|---|---|---|
Gzip | DEFLATE | 9 (best) | z_gzip_bc |
Zlib | DEFLATE | 9 (best) | z_zlib_bc |
Zlib | DEFLATE | 1 (fastest) | z_zlib_fc |
Snappy | LZ77 | Default | z_snappy_ds |
Bzip2 | Burrows–Wheeler Transform | Default | z_bzip2_ds |
Use the following command to create a shared library:
make all
Then, you can use the library in your project by including the header file `fast_ncd.h` in your source code and linking your code with the shared library (the `libnewFastNCD.so` file). Also, don't forget to link your code with `libsnappy` and `libboost_iostreams` when building your project.
To run the tests, use the following command:
bash scripts/run_tests.sh
Example code for calculating the NCD between two identical C++ strings by using different compressors:
#include "fast_ncd.h"
#include <iostream>
#include <string>
using namespace std;
int main() {
// Two identical strings (over 500 characters each) from The American Crisis by Thomas Paine
string x =
"THESE are the times that try men's souls. The summer soldier and the sunshine patriot will, in this crisis, shrink from the service of their country; but he that stands by it now deserves the love and thanks of man and woman. Tyranny, like hell, is not easily conquered; yet we have this consolation with us, that the harder the conflict, the more glorious the triumph. What we obtain too cheap, we esteem too lightly: it is dearness only that gives everything its value. Heaven knows how to put a proper price upon its goods, and it would be strange indeed if so celestial an article as FREEDOM should not be highly rated";
string y =
"THESE are the times that try men's souls. The summer soldier and the sunshine patriot will, in this crisis, shrink from the service of their country; but he that stands by it now deserves the love and thanks of man and woman. Tyranny, like hell, is not easily conquered; yet we have this consolation with us, that the harder the conflict, the more glorious the triumph. What we obtain too cheap, we esteem too lightly: it is dearness only that gives everything its value. Heaven knows how to put a proper price upon its goods, and it would be strange indeed if so celestial an article as FREEDOM should not be highly rated";
// First, create an instance of the NCD class
NCD ncd;
cout << "string x is: " << x << endl << endl;
cout << "string y is: " << y << endl << endl;
// Snappy with default compression settings used as Z
cout << "snappy with default compression settings: " << ncd.calculate_ncd(x, y, z_snappy_ds)
<< endl;
// GZip with best compression (level 9 compression) used as Z
cout << "gzip with the best compression: "
<< ncd.calculate_ncd(x, y, z_gzip_bc) << endl;
// ZLib with best compression (level 9 compression) used as Z
cout << "zlib with the best compression: "
<< ncd.calculate_ncd(x, y, z_zlib_bc) << endl;
// ZLib with fastest compression (level 1 compression) used as Z
cout << "zlib with fastest compression: "
<< ncd.calculate_ncd(x, y, z_zlib_fc) << endl;
// Bzip2 with default compression settings used as Z
cout << "bzip2 with default compression settings: "
<< ncd.calculate_ncd(x, y, z_bzip2_ds) << endl;
return 0;
}
If you are using a Debian-based operating system, you can install the required dependencies by using the following command:
sudo apt-get install libboost-iostreams-dev libsnappy-dev zlib1g-dev libbz2-dev
The main dependencies of the project are:
- Boost.Iostreams
- Snappy Compression Library
- Zlib Compression Library
- Bzip2 Compression Library
- Gzip Compression Library
Other dependencies such as g++, GNU Make, and Bash are required to build the project and run the tests.
This project is licensed under the MIT License - see the LICENSE file for details.