Replies: 1 comment
-
Dealing with a few big files usually performs better than dealing with huge numbers of small files. With borg, of course, the boundaries between emails get lost when all emails are concatenated into one huge file, so it will not create chunks of roughly one email each (as it would for maildir format), but at boundaries determined by the chunking algorithm. As long as you do not shuffle or delete/compact large numbers of emails frequently, I'd guess mbox is still more efficient and deduplication will still work quite well. Compression might also work better for mbox, since the chunks can be bigger.
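For anyone curious how chunk boundaries can survive a growing file, here is a toy sketch of content-defined chunking, the general technique borg's chunker is based on. This is not borg's actual implementation (borg uses a buzhash rolling hash with much larger, configurable chunk sizes); the window size, mask, and chunk sizes below are made up purely for illustration:

```python
# Toy content-defined chunker: boundaries are chosen by a rolling hash
# over the content itself, not by fixed byte offsets. Simplified sketch,
# NOT borg's real chunker (borg uses a buzhash with far larger chunks).
import os

WINDOW = 16   # bytes in the rolling window (illustrative value)
MASK = 0x1F   # boundary when the low 5 bits match -> ~32-byte avg chunks

def chunk(data: bytes):
    """Split `data` into content-defined chunks (concatenation == data)."""
    out, start, s = [], 0, 0
    for i, b in enumerate(data):
        s += b                       # rolling sum over the last WINDOW bytes
        if i >= WINDOW:
            s -= data[i - WINDOW]
        if i - start + 1 >= WINDOW and (s & MASK) == MASK:
            out.append(data[start:i + 1])   # boundary found here
            start = i + 1
    if start < len(data):
        out.append(data[start:])            # final partial chunk
    return out

# Simulate an mbox growing by one appended message: every chunk except
# the last one of the old file reappears unchanged, so a deduplicating
# backup only has to store the new data.
old_mbox = os.urandom(4096)             # stand-in for the existing mail
new_mbox = old_mbox + os.urandom(512)   # one "new email" appended
old_chunks, new_chunks = chunk(old_mbox), chunk(new_mbox)
```

Because the boundary decision looks only at a small sliding window of content, appending mail to the end of an mbox leaves all earlier boundaries (and thus the stored chunks) unchanged, while inserting or deleting in the middle only disturbs the chunks around the edit.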
-
I'm working on backing up my Gmail. Right now I'm using Thunderbird to back it up via IMAP to an mbox file, which is one big monolithic 15TB file. The mbox file format is plain concatenated text.
An alternative is to use a program that saves in maildir format. This would result in 200k+ small files.
Which would be more efficient to back up? Presumably since the mbox is concatenated text and not encrypted, large parts of it should be identical when chunked (though I admit I don't really understand how that works). Weigh that against the overhead of scanning 200k files.
Thoughts? I'm running my borg routine against local and remote repos twice daily.
Thanks