Reece-Ausmus/File-Compression

TXT-Compression

This Python script compresses text files using Huffman coding. It can also decompress .bin files, provided they were produced by this program. The script builds a Huffman tree from the frequency of each character in the text, encodes the text with the resulting Huffman codes, and writes the compressed data to a binary file.
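The tree-building and encoding steps described above can be sketched as follows. This is a minimal illustration, not the code in huffman_compression.py: the function names `build_codes` and `encode` and the tie-breaking rule are assumptions made for the example.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return a {character: bit string} map built from character frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    # Heap entries are (frequency, tie_breaker, tree). A tree is either a
    # single character (leaf) or a (left, right) tuple (internal node).
    # The unique tie_breaker keeps Python from ever comparing two trees.
    heap = [(count, i, ch) for i, (ch, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent trees into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            # An input with a single distinct character still needs one bit.
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    """Concatenate the code of every character into one bit string."""
    return "".join(codes[ch] for ch in text)


if __name__ == "__main__":
    codes = build_codes("aaaabbc")
    print(codes)
    print(encode("aaaabbc", codes))
```

The most frequent character receives the shortest code, which is where any size reduction comes from.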

Disclaimer

This script does not guarantee that the compressed file will be smaller than the original input file. Huffman coding is the only form of compression used, and the Huffman codes are stored in the output file alongside the encoded text, so a short input may not shrink once this overhead is counted. If the output file would be larger than the input file, no output file is created or written.
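A rough size estimate shows why a small input can grow. The sketch below assumes the layout described under "Huffman Code Storage" (pickled dictionary, one separator byte, a 4-byte length, then the packed bits). The helper `compressed_size` and the hand-written code table are hypothetical, and the script's own accounting may differ.

```python
import pickle


def compressed_size(codes, text):
    """Estimate the output size in bytes for the assumed file layout."""
    header = len(pickle.dumps(codes)) + 1 + 4  # table + separator + length field
    bits = sum(len(codes[ch]) for ch in text)
    return header + (bits + 7) // 8  # round the payload up to whole bytes


# A hand-written prefix-free code table, used only for illustration.
codes = {"a": "1", "b": "01", "c": "00"}
text = "aaaabbc"

small_original = len(text.encode("utf-8"))
small_packed = compressed_size(codes, text)

large_text = text * 1000
large_original = len(large_text.encode("utf-8"))
large_packed = compressed_size(codes, large_text)

if __name__ == "__main__":
    print(small_original, small_packed)   # the stored table dominates
    print(large_original, large_packed)   # the savings outweigh the table
```

For the 7-character input the stored table alone is larger than the text, while for the longer input the fixed overhead becomes negligible.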

Usage

python3 huffman_compression.py <file_name> -c/-d [<output_file>]
  • <file_name>: The name of the input file
    • Use a .txt file for compression
    • Use a .bin file for decompression
  • -c/-d: Use "-c" for compression, "-d" for decompression
  • <output_file>: The name of the output file (optional)

Compression Example

python3 huffman_compression.py input.txt -c

This command compresses "input.txt" and writes a compressed binary file named "input_compressed.bin".

With an Output File Name

python3 huffman_compression.py input.txt -c output.bin

This command compresses "input.txt" and writes the compressed binary data to "output.bin".

Decompression Example

python3 huffman_compression.py input.bin -d

This command decompresses "input.bin" and writes a decompressed text file named "input_decompressed.txt".

With an Output File Name

python3 huffman_compression.py input.bin -d output.txt

This command decompresses "input.bin" and writes the decompressed text to "output.txt".
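The default output names in the examples above follow a simple pattern: the input file's stem plus a suffix. A minimal sketch of that rule is shown here. The function `default_output_name` and its `mode` values are hypothetical and are not taken from the script.

```python
import os


def default_output_name(input_name, mode):
    """Derive an output name from the input name when none is given.

    `mode` is "compress" or "decompress" (names chosen for this sketch).
    """
    stem, _ext = os.path.splitext(input_name)
    if mode == "compress":
        return stem + "_compressed.bin"
    if mode == "decompress":
        return stem + "_decompressed.txt"
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(default_output_name("input.txt", "compress"))
    print(default_output_name("input.bin", "decompress"))
```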

Huffman Code Storage

The Huffman codes are stored at the beginning of the compressed binary file, serialized with the pickle module. The file consists of, in order: the Huffman codes as a pickled dictionary, a separator byte (0x00), 4 bytes containing the length of the encoded text, and the encoded text itself.
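The layout above can be written and read back roughly as follows. Several details are assumptions, since the description does not specify them: the 4-byte length is treated as a big-endian count of bits, the last byte is zero-padded, and the names `write_container`, `read_container`, and `decode` are invented for this sketch.

```python
import io
import pickle
import struct


def write_container(stream, codes, bits):
    """Write: pickled codes, 0x00 separator, 4-byte bit count, packed bits."""
    pickle.dump(codes, stream)
    stream.write(b"\x00")
    stream.write(struct.pack(">I", len(bits)))  # assumed: big-endian bit count
    padded = bits + "0" * (-len(bits) % 8)      # pad to a whole number of bytes
    stream.write(bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8)))


def read_container(stream):
    """Read the layout back and return (codes, bit string)."""
    # pickle.load consumes exactly one pickled object, so the separator
    # does not need to be searched for inside the pickle bytes.
    codes = pickle.load(stream)
    if stream.read(1) != b"\x00":
        raise ValueError("missing separator byte")
    (n_bits,) = struct.unpack(">I", stream.read(4))
    payload = stream.read()
    bits = "".join(f"{byte:08b}" for byte in payload)[:n_bits]  # drop padding
    return codes, bits


def decode(bits, codes):
    """Turn a bit string back into text using a prefix-free code table."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_container(buf, {"a": "1", "b": "01", "c": "00"}, "1111010100")
    buf.seek(0)
    codes, bits = read_container(buf)
    print(decode(bits, codes))
```

Because the code table is loaded with pickle, a .bin file from an untrusted source should not be decompressed: unpickling can execute arbitrary code.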
