On-the-Fly Decompression of Compressed Document Formats for Improved Deduplication and Compression #8627
Comments
IIRC there was a similar discussion about image recompression, maybe you can find it.
I believe the conclusion was that bit-identical reconstruction was more important, and that different compression implementations sometimes produce different but valid results.
My issue was more about exposing the internal structures of those files: they may change little, but due to compression they result in very different documents that do not deduplicate well.
IMHO that's a job for document management. Backup needs to be bit-identical.
@Beiri22 A backup tool must restore the bit-identical file, as it was at the time of backup. So, if we "unzip" to expose the raw files / boundaries between files and back that up, we would need to "zip" at extract time. That could easily result in a non-identical zip archive, even if the contents are the same.
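To illustrate the point about re-zipping, here is a minimal sketch using only Python's standard `zipfile` module. It is not Borg code; the helper names, the member name `word/document.xml`, and the sample payload are made up for the example. The same content is packed at two deflate levels, which is one of several ways two valid archives of identical content can differ.

```python
import io
import random
import zipfile

def make_zip(payload: bytes, level: int) -> bytes:
    """Pack one member with a fixed name and timestamp at the given deflate level."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Fixed metadata, so any byte difference comes from the compressor only.
        info = zipfile.ZipInfo("word/document.xml", date_time=(2024, 1, 1, 0, 0, 0))
        zf.writestr(info, payload,
                    compress_type=zipfile.ZIP_DEFLATED, compresslevel=level)
    return buf.getvalue()

def read_member(archive: bytes) -> bytes:
    """Return the decompressed content of the single member."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("word/document.xml")

# Hypothetical sample content: pseudo-random words, reproducible via the seed.
rng = random.Random(0)
words = ["backup", "chunk", "dedup", "archive", "repo", "segment", "index", "key"]
payload = " ".join(rng.choice(words) for _ in range(20000)).encode()

fast = make_zip(payload, level=1)
best = make_zip(payload, level=9)

same_content = read_member(fast) == read_member(best) == payload
same_bytes = fast == best
print("same content:", same_content)
print("same archive bytes:", same_bytes)
```

Both archives are valid and unpack to the same content, but the containers themselves are different byte strings. A tool that only kept the unpacked content would have to guess the original compressor and its settings to give back the original file.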
I see; but for archival purposes this could be a good, let's say, separate tool to prepare your data.
Related: #63
Have you checked borgbackup docs, FAQ, and open GitHub issues?
yes, hopefully good enough
Is this a BUG / ISSUE report or a QUESTION?
Feature Proposal
Problem Description
Modern document formats are compressed containers (e.g., ZIP-based) that hinder deduplication and compression in BorgBackup. Minor changes in content often result in significantly different binary representations, reducing storage efficiency. Could we improve deduplication and compression efficiency for inherently compressed document formats like .docx, .pptx, and .odp by decompressing and recompressing them without compression (store) during the backup process?
Proposed Solution
Implement an optional feature to:
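The core step from the problem description, repacking a ZIP-based document with the "store" method, could look roughly like the sketch below. This is an illustration using Python's standard `zipfile` module, not an existing Borg feature; `repack_stored` is a hypothetical helper name. The repacked file is not bit-identical to the input, so it only suits a separate preparation step.

```python
import io
import zipfile

def repack_stored(data: bytes) -> bytes:
    """Rewrite a ZIP container so that every member is stored uncompressed."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
         zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # Keep name, timestamp and attributes; change only the method.
            stored = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            stored.compress_type = zipfile.ZIP_STORED
            stored.external_attr = info.external_attr
            dst.writestr(stored, src.read(info))
    return out.getvalue()

# Build a small deflated container to stand in for a .docx file.
xml = b"<w:p>hello borg</w:p>" * 500
original = io.BytesIO()
with zipfile.ZipFile(original, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", xml)

repacked = repack_stored(original.getvalue())

with zipfile.ZipFile(io.BytesIO(repacked)) as zf:
    methods = {i.compress_type for i in zf.infolist()}
    content = zf.read("word/document.xml")
```

After repacking, the member bytes sit verbatim inside the container, so a chunker and the repository compressor can work on the real content. The trade-off is the one raised in the comments: the original compressed container cannot be reproduced from the repacked file alone.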
Benefits
Challenges and Mitigation