Alexander Regueiro edited this page Nov 22, 2014 · 3 revisions

ddar

store multiple files efficiently in a de-duplicated archive

Synopsis

ddar [-]c [-f] archive member [member...]
ddar [-]c [-f] [server:]archive [-N member-name] < member
ddar [-]c [-f] [server:]archive [-N member-name] member
ddar [-]c [-f] archive server:member [-N member-name]
ddar [-]c [-f] archive server:!cmd [-N member-name]
ddar [-]x [options] [-f] archive > member
ddar [-]x [options] [-f] archive member-name > member
ddar [-]t [-f] archive
ddar [-]d [-f] archive member-name [member-name...]
ddar --fsck [-f] archive

Description

ddar creates, modifies and extracts from archives. An archive is a directory holding a collection of other files in a structure that makes it possible to retrieve the original individual files (called members of the archive).

Members of an archive are stored efficiently: ddar finds regions that are the same across members and stores them only once.

ddar follows the Unix philosophy, dealing only with the task of de-duplication. It is intended to be used in conjunction with other standard Unix tools such as tar (1) and gzip (1). For example, it can be used to augment the standard tar | gzip-style pipeline, eliminating the need for, and the complexity of, incremental and differential backups by making the storage of multiple large full backups equally feasible.

ddar may be considered a companion to tarsnap (1). It is useful for local archive storage in cases where the available upload bandwidth is too low to deal with the size of the archive that needs to be created.

Mandatory arguments

One operation argument from c, x, t, d or --fsck must be specified. As specifying an operation is mandatory, the prefix - is optional if the operation is the first argument.

<dt>c</dt>
<dd>Add member to <em>archive</em>. If <em>archive</em> does not exist, it will be created. If <em>member-name</em> is not specified, then the base name of member is used; if member is not specified either, then a suitable name is automatically generated. If <em>server</em> is specified (where permitted), then the archive or member is accessed from <em>server</em> as appropriate, with de-duplication taking place at the source to save bandwidth. If <em>cmd</em> is specified, then it is called to generate the member data on its stdout. All remote options require ssh (1) to be available and ddar to be installed at the remote end as well.</dd>

<dt>x</dt>
<dd>Extract <em>member-name</em> from <em>archive</em>. The member is written to stdout. If stdout is a terminal, then ddar will refuse unless <em>--force-stdout</em> is used. If <em>member-name</em> is not specified, then the last member to be added to the archive is used (based on addition order, not time).</dd>

<dt>t</dt>
<dd>List the members of <em>archive</em>.</dd>

<dt>d</dt>
<dd>Delete <em>member-name</em> from <em>archive</em>.</dd>

<dt>--fsck</dt>
<dd>Check <em>archive</em> for internal consistency. This also verifies that all members match the checksum computed when they were first stored. This operation is extremely time consuming, requiring two passes over the data.</dd>

<dt>[-f] <em>archive</em></dt>
<dd>Specify the archive upon which the operation will take effect. <em>-f</em> is optional here as all operations require <em>archive</em> and so this is not really an option. It exists for attempted option parity with <strong>tar (1)</strong> for ease of use.</dd>

Options

<dt>-N <em>member-name</em></dt>
<dd>(create/append only) Override the member-name to be used when adding a member to an archive.</dd>

<dt>--force-stdout</dt>
<dd>(extract only) Force ddar to extract a member to stdout even when stdout is a terminal.</dd>
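
The operations and options above can be combined as in the following sketch. It is an illustration only, not part of ddar: the wrapper function names are hypothetical, as are the archive and member names in the usage comments, and only flags listed in the synopsis are used.

```shell
#!/bin/sh
# Hypothetical wrappers around the ddar operations described above.
# The function names are illustrative; the flags come from the synopsis.

# Add FILE to ARCHIVE under an explicit member name (c with -N).
ddar_add() {        # usage: ddar_add ARCHIVE MEMBER_NAME FILE
    ddar c -f "$1" -N "$2" < "$3"
}

# List the members of ARCHIVE (t).
ddar_list() {       # usage: ddar_list ARCHIVE
    ddar t -f "$1"
}

# Extract a named member to OUTFILE (x writes the member to stdout).
ddar_extract() {    # usage: ddar_extract ARCHIVE MEMBER_NAME OUTFILE
    ddar x -f "$1" "$2" > "$3"
}

# Delete a named member from ARCHIVE (d).
ddar_delete() {     # usage: ddar_delete ARCHIVE MEMBER_NAME
    ddar d -f "$1" "$2"
}

# Check ARCHIVE for internal consistency (--fsck; slow on large archives).
ddar_check() {      # usage: ddar_check ARCHIVE
    ddar --fsck -f "$1"
}

# Example session (hypothetical names):
#   ddar_add     /tmp/demo_archive monday backup.tar.gz
#   ddar_list    /tmp/demo_archive
#   ddar_extract /tmp/demo_archive monday restored.tar.gz
#   ddar_delete  /tmp/demo_archive monday
#   ddar_check   /tmp/demo_archive
```

Because ddar_extract redirects stdout to a file, --force-stdout is not needed here; it only matters when stdout is a terminal.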

Notes

Unlike tarsnap (1), tar (1) includes access times (atimes) by default. Because atimes change whenever files are read, this causes many changes to its output, thus making ddar ineffective at de-duplicating its archives. On Linux, mounting filesystems with the relatime option, which has been the default since kernel 2.6.30, helps to avoid this.

ddar will run more efficiently if the archive is stored on a filesystem that does not degrade as directories become large, such as ext3 or ext4 with dir_index enabled.

Normally, compression and encryption algorithms have the property that a single bit flip in the input causes a complete apparent change in the output. This works against de-duplication, which looks for regions that are the same in order to optimise storage. To avoid this issue, use gzip --rsyncable for compression and, if encryption is required, encrypt the entire archive, for example with cryptsetup, encdrive, or BitLocker.
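
Some gzip builds do not accept --rsyncable. As a minimal sketch, assuming a POSIX shell, the hypothetical helper below probes for the option first and otherwise passes the data through uncompressed, rather than silently producing output that de-duplicates poorly. The function names are illustrative and are not part of ddar or gzip.

```shell
#!/bin/sh
# Hypothetical helper: compress with gzip --rsyncable when available.

# Return success if the installed gzip accepts --rsyncable.
gzip_supports_rsyncable() {
    gzip --rsyncable < /dev/null > /dev/null 2>&1
}

# Filter stdin to stdout, compressing only in a de-duplication-friendly way.
compress_for_ddar() {
    if gzip_supports_rsyncable; then
        gzip --rsyncable
    else
        # Assumption: uncompressed data is preferable to ordinary gzip
        # output, whose contents shift widely after small input changes.
        echo "gzip lacks --rsyncable; passing data through uncompressed" >&2
        cat
    fi
}

# Example (hypothetical archive path):
#   tar c ~ | compress_for_ddar | ddar cf /mnt/external_disk/home_backup
```

A member stored through the fallback branch is not gzip data, so it should be restored without tar's z flag.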

Examples

Back up your home directory to an external disk daily:

tar c ~ | gzip --rsyncable | ddar cf /mnt/external_disk/home_backup

Back up your home directory to a remote server daily (using ssh, with ddar also installed remotely):

tar c ~ | gzip --rsyncable | ddar cf server:home_backup

Back up your home directory on a remote server to a local machine daily:

ddar cf server_home_backup server:\!"'tar c ~ | gzip --rsyncable'"

Restore your home directory from a local disk after a disaster:

ddar xf /mnt/external_disk/home_backup | tar xzC/

Restore your home directory from a remote server after a disaster:

ssh server ddar xf home_backup | tar xzC/
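
The daily backup examples above leave member naming to ddar. The following is a minimal sketch, assuming a POSIX shell, of giving each day's backup a dated member name with -N and restoring a chosen day. The function names, the home-YYYY-MM-DD naming scheme and the paths in the usage comments are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for dated daily backups; names are illustrative.

# Store today's home directory backup as member "home-YYYY-MM-DD".
backup_home() {     # usage: backup_home ARCHIVE
    tar -cf - "$HOME" | gzip --rsyncable |
        ddar c -f "$1" -N "home-$(date +%Y-%m-%d)"
}

# Restore the backup made on a given day into TARGET_DIR.
restore_home() {    # usage: restore_home ARCHIVE YYYY-MM-DD TARGET_DIR
    ddar x -f "$1" "home-$2" | tar -xzf - -C "$3"
}

# Example (hypothetical archive path and date):
#   backup_home  /mnt/external_disk/home_backup
#   restore_home /mnt/external_disk/home_backup 2014-11-22 /
```

Since every member is a full backup, any single day can be restored on its own; ddar stores the regions shared between days only once.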

Security

ddar relies on system security and does not encrypt data. For remote archives, it relies on ssh (1) to encrypt communications. You may wish to keep your ddar archives in a directory to which only you have access, adjust the permission bits on the ddar archive directories themselves and/or encrypt the entire filesystem on which your ddar archives are stored. For a cryptographically secure, hosted de-duplicating backup solution, try tarsnap (1) http://www.tarsnap.com/.

Bugs

ddar currently assumes that SHA-256 never collides. Although no collisions are known to date (2011), collisions must nevertheless exist due to the pigeonhole principle. However, the probability of such a collision occurring is so low that a random computation error by the computer is considered far more likely. So, unless a weakness is found in SHA-256, ddar should be safe from data corruption of this kind. A future version of ddar may enhance the archive format to automatically detect and accommodate collisions, although checking for collisions would impact performance.

As an extra precaution, ddar computes the SHA-256 of the entire member on insertion into an archive, and verifies this on extraction. However, this means that a failure would only be detected on extraction and not on insertion.
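
Because such a failure would only surface on extraction, one option is to extract a member immediately after adding it and compare it with a checksum computed independently of ddar. The sketch below assumes a POSIX shell with either sha256sum or shasum installed. The helper names are hypothetical, and the check is an illustration rather than a ddar feature.

```shell
#!/bin/sh
# Hypothetical post-insertion check; not part of ddar.

# Print the SHA-256 of stdin as lowercase hex.
stream_sha256() {
    if command -v sha256sum > /dev/null 2>&1; then
        sha256sum | cut -d ' ' -f 1
    else
        shasum -a 256 | cut -d ' ' -f 1
    fi
}

# Succeed only if the extracted member hashes to the expected digest.
verify_member() {   # usage: verify_member ARCHIVE MEMBER_NAME EXPECTED_SHA256
    actual=$(ddar x -f "$1" "$2" | stream_sha256)
    [ "$actual" = "$3" ]
}

# Example (hypothetical file, archive and member names):
#   expected=$(stream_sha256 < backup.tar.gz)
#   ddar c -f /tmp/demo_archive -N today < backup.tar.gz
#   verify_member /tmp/demo_archive today "$expected" || echo "mismatch" >&2
```

This costs one extra read of the member, which is cheaper than the two passes over the whole archive that --fsck requires.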

See Also

ar (1), tar (1), gzip (1), tarsnap (1).

Author

ddar and this man page were both written by Robie Basak.

Credits

ddar is sponsored by Synctus http://www.synctus.com/, a multi-master, conflict-free, real-time file replication system.

ddar was inspired by tarsnap http://www.tarsnap.com/, a cloud-based de-duplicating backup tool.

Copyright

Copyright 2010-2011 True Blue Logic Ltd.

This program is free software: you can redistribute it and/or modify it under the terms of version 3 of the GNU General Public License as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

This manpage is in part derived from ar (1); thus the following notices apply:

Copyright (c) 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.