This repository provides benchmarking tools and results for evaluating the performance of BtrBlocks, an efficient columnar compression library designed for data lakes. The goal of this project is to analyze compression ratios and decompression speed.
To ensure accurate benchmarking results, this repository follows a structured approach:
- Dataset Preparation
  - Uses public-bi datasets to test compression efficiency.
- Benchmarking Metrics
  - Compression ratio (original size vs. compressed size).
  - Decompression speed (MB/s).
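The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's own analysis code: the function names are hypothetical, and it assumes that speed is measured against the uncompressed size and that 1 MB means 10^6 bytes.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Original size divided by compressed size; higher means better compression."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return original_bytes / compressed_bytes


def decompression_speed_mb_s(original_bytes: int, seconds: float) -> float:
    """Uncompressed megabytes produced per second (assumes 1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return original_bytes / 1e6 / seconds


if __name__ == "__main__":
    # Illustrative numbers only: 1 GB of raw data compressed to 250 MB
    # and decompressed in half a second.
    ratio = compression_ratio(1_000_000_000, 250_000_000)
    speed = decompression_speed_mb_s(1_000_000_000, 0.5)
    print(f"ratio={ratio:.2f}x speed={speed:.0f} MB/s")
```

With these made-up inputs the ratio is 4.0 and the speed is 2000 MB/s. When comparing formats, the same definition of "MB" and the same reference size (uncompressed) should be used for every format.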
- benchmarks/ - Scripts and configurations for running benchmarks.
- results/ - Stored benchmark results and reports.
- scripts/ - Helper scripts for data generation and automation.
- doc/ - Documentation and findings.
Ensure you have the following dependencies installed:
- CMake (>=3.16)
- GCC or Clang with C++20 support
- Python 3.x (for result analysis)
- BtrBlocks (ensure it's installed or built from source)
mkdir build
cd build
cmake ..
make
Execute the benchmarking tool with:
./benchmarks/run_benchmarks
To compare results with Parquet and ORC:
./benchmarks/compare_with_parquet_orc
Detailed results and analyses are available in the results/ directory. We provide breakdowns of compression ratios, speed, and efficiency across different data types and workloads.
- Your Name
- Other Contributors
MIT - See License File.
** MODIFICATION **
- The flag -DCMAKE_CXX_FLAGS="-Wno-error=deprecated-declarations" is needed when running cmake.
- Install the TBB development package first: sudo apt install libtbb-dev