Fast lossless entropy-based compressor for FASTQ files

iam28th/fqcomp28


A reimplementation of the fqzcomp4 compression tool with the carryless range coder replaced by an FSE codec and a static mode. It aims at higher throughput for cases where speed is especially important, such as disk-based read reordering and real-time compression.

Build from source

git clone git@github.com:iam28th/fqcomp28.git
cd fqcomp28
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

./fqcomp28 --help-all

Usage

./fqcomp28 c --input reads.fastq --output archive.fqc --threads 4
./fqcomp28 d --input archive.fqc --output1 decompressed.fastq --threads 4
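
To check a round trip, the original and decompressed files can be compared while ignoring any header repeated after `+`, since fqcomp28 discards it (see Limitations). The sketch below is not part of fqcomp28: `normalize` and `same_reads` are hypothetical helper names, and it assumes plain four-line FASTQ records.

```python
from itertools import zip_longest


def normalize(lines):
    """Yield FASTQ lines with any text after '+' on the 3rd record line dropped.

    Assumes every record is exactly four lines long.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # The third line of each record (index 2 modulo 4) is the '+' separator.
        yield "+" if i % 4 == 2 else line


def same_reads(original, decompressed):
    """Return True if both inputs hold the same headers, bases and qualities."""
    pairs = zip_longest(normalize(original), normalize(decompressed))
    # zip_longest pads the shorter input with None, which never equals a line.
    return all(a == b for a, b in pairs)
```

With the file names from the commands above, this could be called as `same_reads(open("reads.fastq"), open("decompressed.fastq"))`.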

Comparison with fqzcomp{4,5}

fqcomp28 appears to offer a better compression ratio/speed tradeoff than fqzcomp4 and fqzcomp5 in default mode (-5), but looks inferior to the faster modes of fqzcomp5 (-3 and -1), particularly because of longer decompression times (the cause of which is under investigation).

All tools noticeably outperform gzip/pigz in both speed and compression ratio.

See Results.md for details.

Limitations

  • headers
    • must all have same structure (same number of "fields" consisting of alphanumeric characters with non-alphanumeric separators between the "fields");
    • a "field" can be no more than 255 characters long;
    • each header must end with an alphanumeric character;
  • read lengths must fit into uint32_t (that is, be <= 4294967295);
  • bases must be one of ACGTN;
  • quality headers (that is, repeated headers on the 3rd line of the record, after +), if present, are discarded in decompressed data (but otherwise compression is lossless).
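
The limitations above can be checked before compression with a short script. This is a minimal sketch rather than fqcomp28's own validation: `check_record` and `check_fastq` are hypothetical names, and it assumes four-line records, ASCII alphanumerics, upper-case bases, and that "same structure" means the same field count as the first header.

```python
import re

UINT32_MAX = 2**32 - 1  # largest value a uint32_t can hold (4294967295)
FIELD_RE = re.compile(r"[A-Za-z0-9]+")  # a "field" is a run of alphanumerics


def check_record(header, bases, expected_fields=None):
    """Return a list of problems for one record (an empty list means it passes).

    `header` is the header line without its leading '@'.
    """
    problems = []
    fields = FIELD_RE.findall(header)
    if not re.fullmatch(r"[A-Za-z0-9]", header[-1:]):
        problems.append("header does not end with an alphanumeric character")
    if any(len(field) > 255 for field in fields):
        problems.append("header field longer than 255 characters")
    if expected_fields is not None and len(fields) != expected_fields:
        problems.append("header field count differs from the first header")
    if len(bases) > UINT32_MAX:
        problems.append("read length does not fit into uint32_t")
    if set(bases) - set("ACGTN"):
        problems.append("bases outside ACGTN")
    return problems


def check_fastq(lines):
    """Check FASTQ lines and return a list of (record_number, problem) pairs."""
    lines = [line.rstrip("\n") for line in lines]
    found = []
    expected = None
    for number, i in enumerate(range(0, len(lines) - 3, 4), start=1):
        header, bases = lines[i][1:], lines[i + 1]
        if expected is None:
            # The first header defines the field count for the whole file.
            expected = len(FIELD_RE.findall(header))
        found += [(number, p) for p in check_record(header, bases, expected)]
    return found
```

An empty result from `check_fastq(open("reads.fastq"))` suggests the file stays within these limits; it does not guarantee that fqcomp28 accepts it.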
