ChaCha20 128-Bit Vectorized Implementation

This is an optimized implementation of the ChaCha20 algorithm that uses SIMD (Single Instruction, Multiple Data) instructions to process several words of the state at once, instead of one word at a time as in the regular (scalar) ChaCha20 implementation.

This version uses Intel intrinsics through the header immintrin.h.

It is called 128-bit vectorization because the state is held in four 128-bit vectors, one per row of the 4 x 4 state matrix (4 words x 4 bytes per word x 8 bits per byte = 128 bits).
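As a minimal sketch of that layout, the snippet below loads a 16-word state into four row vectors, stores it back, and shows that one vector addition performs four 32-bit additions. The helper names (state_to_rows, rows_to_state, add_rows) are hypothetical and chosen for illustration; they are not taken from this repository's sources. Only SSE2 intrinsics are used, which immintrin.h also provides.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also available through immintrin.h */

/* Hypothetical helper: load the 16-word state as four 128-bit row vectors. */
static void state_to_rows(const uint32_t state[16], __m128i rows[4])
{
    for (int i = 0; i < 4; i++) {
        /* Unaligned load of words 4*i .. 4*i+3, i.e. one row of the matrix. */
        rows[i] = _mm_loadu_si128((const __m128i *)(state + 4 * i));
    }
}

/* Hypothetical helper: store the four row vectors back into the state. */
static void rows_to_state(const __m128i rows[4], uint32_t state[16])
{
    for (int i = 0; i < 4; i++) {
        _mm_storeu_si128((__m128i *)(state + 4 * i), rows[i]);
    }
}

/* Lane-wise a + b: four independent 32-bit additions in one instruction. */
static __m128i add_rows(__m128i a, __m128i b)
{
    return _mm_add_epi32(a, b);
}
```

Calling add_rows(rows[0], rows[1]) adds each word of row 1 to the word above it in row 0, which is the first step of a quarter round applied to all four columns at once.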

This vectorization combines 4 consecutive operations of the non-vectorized version of ChaCha20 into a single vectorized operation.

The resulting operation, which I call a "double whole round", reduces to just two vectorized operations (one for the column round and one for the diagonal round), because each vector operation handles four quarter rounds at once.
It runs the column and diagonal permutation rounds one after the other.
The name breaks down as: Double (columns + diagonals) Whole (4 parallel quarter rounds) Round.
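The idea can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual code: the names whole_round and double_whole_round are hypothetical, the state is assumed to be held as four row vectors, and rotations are done with SSE2 shift-and-or (an SSSE3 byte shuffle would be another option for the 16-bit and 8-bit rotations). The diagonal round is obtained by rotating rows 1, 2 and 3 by one, two and three lanes so that each diagonal lines up as a column.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also available through immintrin.h */

/* Rotate each 32-bit lane left by n bits (n is a constant, 0 < n < 32). */
#define ROTL32_V(v, n) \
    _mm_or_si128(_mm_slli_epi32((v), (n)), _mm_srli_epi32((v), 32 - (n)))

/* Four quarter rounds at once: lane i works on (a[i], b[i], c[i], d[i]). */
static void whole_round(__m128i *a, __m128i *b, __m128i *c, __m128i *d)
{
    *a = _mm_add_epi32(*a, *b); *d = _mm_xor_si128(*d, *a); *d = ROTL32_V(*d, 16);
    *c = _mm_add_epi32(*c, *d); *b = _mm_xor_si128(*b, *c); *b = ROTL32_V(*b, 12);
    *a = _mm_add_epi32(*a, *b); *d = _mm_xor_si128(*d, *a); *d = ROTL32_V(*d, 8);
    *c = _mm_add_epi32(*c, *d); *b = _mm_xor_si128(*b, *c); *b = ROTL32_V(*b, 7);
}

/* Column round followed by diagonal round on the four row vectors. */
static void double_whole_round(__m128i rows[4])
{
    /* Column round: lane i of every row already belongs to column i. */
    whole_round(&rows[0], &rows[1], &rows[2], &rows[3]);

    /* Rotate rows 1, 2, 3 left by 1, 2, 3 lanes: diagonals become columns. */
    rows[1] = _mm_shuffle_epi32(rows[1], _MM_SHUFFLE(0, 3, 2, 1));
    rows[2] = _mm_shuffle_epi32(rows[2], _MM_SHUFFLE(1, 0, 3, 2));
    rows[3] = _mm_shuffle_epi32(rows[3], _MM_SHUFFLE(2, 1, 0, 3));

    /* Diagonal round, expressed as another column round. */
    whole_round(&rows[0], &rows[1], &rows[2], &rows[3]);

    /* Rotate the rows back so the state is in its normal layout again. */
    rows[1] = _mm_shuffle_epi32(rows[1], _MM_SHUFFLE(2, 1, 0, 3));
    rows[2] = _mm_shuffle_epi32(rows[2], _MM_SHUFFLE(1, 0, 3, 2));
    rows[3] = _mm_shuffle_epi32(rows[3], _MM_SHUFFLE(0, 3, 2, 1));
}
```

Under these assumptions, the 20 rounds of ChaCha20 correspond to calling double_whole_round ten times on the row vectors, after which the original state is added back as described in RFC 8439.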

Resources:

These resources were used throughout the development of the project outlined above.

Paper: Original research paper on ChaCha20 by Daniel J. Bernstein.
RFC 8439: Standardized specification of ChaCha20 and its use in internet protocols.
SIMD Programming Blog entry: In-depth explanation on how to program with SIMD instructions.

Tests

The compiled executable, chacha20, supports the following options for testing and usage:

Run all available tests: test vectors (encryption and decryption) and clock cycle tests.

./chacha20 --all-tests

Encrypt test vector N (where N is from 1 to 5, one for each test vector in RFC 8439).

./chacha20 --enc-tv N

Decrypt test vector N (where N is from 1 to 5).

./chacha20 --dec-tv N

Run clock cycle test N (where N is from 1 to 5, one for each plaintext length used by ECRYPT).

./chacha20 --clock-ct N

Encrypt a custom input provided by the user.

./chacha20 --enc-ci

Decrypt a custom input provided by the user.

./chacha20 --dec-ci

Performance

This implementation of ChaCha20 has been optimized for performance and includes benchmarking scripts to measure its efficiency. The results are comparable to established benchmarks, such as those available at ECRYPT. To measure the cycles per byte, use the provided automation scripts:

Windows

Run the clock_cycles_tests.bat script:

clock_cycles_tests.bat

Linux

Run the clock_cycles_tests.sh script:

clock_cycles_tests.sh

These scripts perform automated benchmarking of the ChaCha20 implementation, similar to how benchmarks are conducted for cryptographic algorithms at ECRYPT. The average and median results in cycles per byte are as follows:

Plaintext length (bytes)	Average (cycles/byte)	Median (cycles/byte)
8	167.80	158.88
64	15.32	14.96
576	10.73	9.65
1536	10.78	9.22
4096	10.30	9.39
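For readers who want to reproduce figures of this kind, the following is a minimal sketch of how cycles per byte could be derived from timestamp-counter samples. It is not the code used by the provided scripts; the names time_once, mean_u64, median_u64 and cycles_per_byte are hypothetical. It also assumes that the timestamp counter read by __rdtsc() is an acceptable proxy for CPU cycles, which is only approximately true on processors where the counter runs at a fixed reference frequency.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <intrin.h>     /* __rdtsc() on MSVC */
#else
#include <x86intrin.h>  /* __rdtsc() on GCC and Clang */
#endif

/* Hypothetical signature for the routine being timed. */
typedef void (*work_fn)(uint8_t *buf, size_t len);

/* Timestamp-counter ticks spent in one call of fn(buf, len). */
static uint64_t time_once(work_fn fn, uint8_t *buf, size_t len)
{
    uint64_t t0 = __rdtsc();
    fn(buf, len);
    uint64_t t1 = __rdtsc();
    return t1 - t0;
}

/* Comparison callback for qsort over uint64_t samples. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n samples (n > 0). */
static double mean_u64(const uint64_t *s, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += (double)s[i];
    return sum / (double)n;
}

/* Median of n samples (n > 0); sorts the array in place. */
static double median_u64(uint64_t *s, size_t n)
{
    qsort(s, n, sizeof *s, cmp_u64);
    if (n % 2) return (double)s[n / 2];
    return ((double)s[n / 2 - 1] + (double)s[n / 2]) / 2.0;
}

/* Normalize a cycle count by the number of bytes processed. */
static double cycles_per_byte(double cycles, size_t bytes)
{
    return cycles / (double)bytes;
}
```

A typical use would collect many time_once samples for one plaintext length, then report cycles_per_byte(mean_u64(samples, n), len) and cycles_per_byte(median_u64(samples, n), len). The median is less sensitive to outliers such as interrupts, which may explain why it sits below the average in the table.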

