QAT Codec

QAT Codec project provides compression and decompression library for Apache Hadoop to make use of the Intel® QuickAssist Technology for compression/decompression.

Big data analytics are commonly performed on large data sets that are moved within a Hadoop cluster containing high-volume, industry-standard servers. A significant amount of time and network bandwidth can be saved when the data is compressed before it is passed between servers, as long as the compression/ decompression operations are efficient and require negligible CPU cycles. This is possible with the hardware-based compression delivered by Intel® QuickAssist Technology, which is easy to integrate into existing systems and networks using the available Intel drivers and patches.

Notice

[Intel QATCodec, version 2.3.0] has been updated to include functional and security updates. Users should update to the latest version.

Online Documentation

http://www.intel.com/content/www/us/en/embedded/technology/quickassist/overview.html

Building QAT Codec

1. Building with Maven

This option assumes that you have installed maven in your build machine. Also assumed to have java installed and set JAVA_HOME

Run the following command for building qatcodec.jar and libqatcodec.so

mvn clean install -Dqatzip.libs=QATZIP_LIBRARIES_PATH -Dqatzip.src=QATZIP_SOURCE_CODE PATH

Here

 qatzip.libs - A path where qatzip libraries placed. This is needed because QATCodec depends on qatzip libraries.
 qatzip.src  - A path where qatzip source code placed. This is needed because QATCodec needs qatzip exposed h files for building.

Native code building will be skipped in Windows machine as QATCodec native code can not be build in Windows.

When you run the build in Linux os, native code will be build automatically when run the above command.

If you want native building to be skipped in linux os explicitly, then you need to mention -DskipNative

ex: mvn clean install -Dqatzip.libs=QATZIP_LIBRARIES_PATH -Dqatzip.src=QATZIP_SOURCE_CODE PATH -DskipNative

By default above commands will run the test cases as well. TO skip the test cases to run use the following command

mvn clean install -DskipTests Dqatzip.libs=QATZIP_LIBRARIES_PATH -Dqatzip.src=QATZIP_SOURCE_CODE_PATH

To run the specific test cases

mvn clean test -Dtest=TestQatCompressorDecompressor Dqatzip.libs=QATZIP_LIBRARIES_PATH -Dqatzip.src=QATZIP_SOURCE_CODE PATH

2. Building with Makefile

1. Building qatcodec.jar

Set the below env variables,

JAVA_HOME - Java home

HADOOPJARS - Cloudera Hadoop jars

After exporting above parameters execute the following commands

cd QATCodec/build/

make

2. Building libqatcodec.so

Set the below env variables,

JAVA_HOME - Java home

QATZIPSRC - QATZIP source code path

LD_LIBRARY_PATH - make sure to export LD_LIBRARY_PATH with qatzip libraries

After exporting above parameters execute the following commands

cd QATCodec/build/native/

make

Build for CDP DC 7.0

Build the hive module for QAT

1. Run the scripts

$ cd columnar_format_qat_wrapper
$ ./apply_hive_jars.sh 7.0.0 $PATH/TO/IntelQATCodec

After this, we can see that in the folder under columnar_format_qat_wrapper/target, there have four parts: (1) parquet-format (2) parquet-mr (3) orc (4) hive

2. Building the Parquet-format

go to the folder target/parquet-format
Please refer to documentation at building for detailed prerequisites and guidance on building parquet-format.

3. Building the Parquet-mr

Install Protobuf 3.5.1
Install Thrift 0.9.3
go to the folder target/parquet-mr
Please refer to the documentation at building for detailed prerequisites and guidance on building parquet-mr.

4. Building the ORC

Install java 1.7 or higher
Install maven 3 or higher
Install cmake
go to the folder target/orc
Please refer to the documentation at building for detailed prerequisites and guidance on buidling orc.

5. Building the Hive

go to the folder target/hive
Please refer to the documentation at getting-started for detailed prerequisites and guidance on buidling hive.

6. Copy the jars to the CDP

Please copy the following jars obtained in the previous steps to appropriate location in CDP.

parquet-format-2.4.0.jar
parquet-common-1.10.0.jar
parquet-hadoop-1.10.0.jar
orc-core-1.5.1.jar
orc-shims-1.5.1.jar
hive-exec-3.1.0.jar

How to use QATCodec for Spark SQL Parquet Datasource

1. Copy the jars to the Spark

Please copy the following jars obtained in the previous steps to appropriate location in Spark

parquet-format-2.4.0.jar
parquet-common-1.10.1.jar
parquet-hadoop-1.10.1.jar

2. Configuration to enable QATCodec

Put below configurations to $SPARK_HOME/conf/spark-defaults.conf or via spark-shell --conf

spark.sql.parquet.compression.codec gzip
spark.hadoop.io.compression.codec.qat.enable true

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
assembly		assembly
carbondata_qat_wrapper		carbondata_qat_wrapper
columnar_format_qat_wrapper		columnar_format_qat_wrapper
docs		docs
hadoop_qat_wrapper		hadoop_qat_wrapper
kafka_qat_wrapper		kafka_qat_wrapper
spark_qat_wrapper		spark_qat_wrapper
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
pom.xml		pom.xml
scalastyle-config.xml		scalastyle-config.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

QAT Codec

Notice

Online Documentation

Building QAT Codec

1. Building with Maven

2. Building with Makefile

1. Building qatcodec.jar

2. Building libqatcodec.so

Build for CDP DC 7.0

Build the hive module for QAT

1. Run the scripts

2. Building the Parquet-format

3. Building the Parquet-mr

4. Building the ORC

5. Building the Hive

6. Copy the jars to the CDP

How to use QATCodec for Spark SQL Parquet Datasource

1. Copy the jars to the Spark

2. Configuration to enable QATCodec

For any security concerns, please visit https://01.org/security.

About

Releases 7

Packages

Contributors 6

Languages

License

Intel-bigdata/IntelQATCodec

Folders and files

Latest commit

History

Repository files navigation

QAT Codec

Notice

Online Documentation

Building QAT Codec

1. Building with Maven

2. Building with Makefile

1. Building qatcodec.jar

2. Building libqatcodec.so

Build for CDP DC 7.0

Build the hive module for QAT

1. Run the scripts

2. Building the Parquet-format

3. Building the Parquet-mr

4. Building the ORC

5. Building the Hive

6. Copy the jars to the CDP

How to use QATCodec for Spark SQL Parquet Datasource

1. Copy the jars to the Spark

2. Configuration to enable QATCodec

For any security concerns, please visit https://01.org/security.

About

Resources

License

Security policy

Stars

Watchers

Forks

Releases 7

Packages 0

Contributors 6

Languages

Packages