monthly offsite backup to Amazon glacier #3

Open
jdimatteo opened this issue Oct 7, 2013 · 8 comments

@jdimatteo (Member Author)

It would probably make sense to plan on transferring the data to the Amazon cloud in such a way that it is available both for backups (e.g. in Amazon Glacier) and also for computation (e.g. Amazon Elastic MapReduce).
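
One way this split could look (just a sketch; the bucket name, prefix, and the 30-day window are placeholders, not anything we've decided): keep the data as plain S3 objects so Elastic MapReduce can read them directly, and attach a lifecycle rule that archives older objects to Glacier.

# Sketch: data stays in S3 (readable by EMR); objects older than 30 days
# in the given prefix are transitioned to Glacier for cheap archival.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-old-data-to-glacier",
      "Filter": { "Prefix": "grail/" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ]
    }
  ]
}
EOF
# Apply the rule to the (placeholder) bucket
aws s3api put-bucket-lifecycle-configuration \
    --bucket bradnerlab_backup \
    --lifecycle-configuration file://lifecycle.json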

@charlesylin (Member)

Agreed. I will get the ball rolling.

-Charles

@jdimatteo (Member Author)

We might want to consider duply (http://duply.net), which is supposed to ease duplicity use.

This might be a useful overview: http://blog.phusion.nl/2013/11/11/duplicity-s3-easy-cheap-encrypted-automated-full-disk-backups-for-your-servers/

After we get duplicity working and see how well it works, we might want to consider making that our sole backup solution so we don't have to maintain both.
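
If we do try duply, the workflow from that overview boils down to roughly this (sketch only; the profile name grail_backup is a placeholder):

# Create a new profile; this writes ~/.duply/grail_backup/conf for us to edit
duply grail_backup create
# After filling in GPG_KEY, TARGET, TARGET_USER and TARGET_PASS in that conf,
# run the backup (duply does a full run first, then incrementals):
duply grail_backup backup
# List what duplicity has stored at the target so far
duply grail_backup status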

@charlesylin (Member)

This sounds like an awesome idea.

Do you want me to set up an S3 bucket?

-Charles

@jdimatteo (Member Author)

@bradnerComputation: would you like to create a bradnerlab_backup bucket?

I'm testing out duplicity now. I just installed duplicity/duply, and I think that was successful, but it seems like the unrelated libpam-systemd package is in a broken state. Have you seen this error on TOD before?

jdm@tod:~$ sudo apt-get install duplicity duply python-boto
...
Setting up libpam-systemd:amd64 (204-5ubuntu20.6) ...
Can't locate Debconf/Client/ConfModule.pm in @INC (you may need to install the Debconf::Client::ConfModule module) (@INC contains: /usr/local/lib/perl5/site_perl/5.18.1/x86_64-linux /usr/local/lib/perl5/site_perl/5.18.1 /usr/local/lib/perl5/5.18.1/x86_64-linux /usr/local/lib/perl5/5.18.1 .) at /usr/sbin/pam-auth-update line 28.
BEGIN failed--compilation aborted at /usr/sbin/pam-auth-update line 28.
dpkg: error processing package libpam-systemd:amd64 (--configure):
 subprocess installed post-installation script returned error exit status 2
Setting up python-lockfile (1:0.8-2ubuntu2) ...
Setting up duplicity (0.6.23-1ubuntu4.1) ...
Setting up duply (1.5.10-1) ...
Setting up python-boto (2.20.1-2ubuntu2) ...
Errors were encountered while processing:
 libpam-systemd:amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)
jdm@tod:~$

I saw this related thread. I guess we can just ignore it for now, since it doesn't seem to be fixed in Ubuntu Trusty yet.
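
For what it's worth, the packages we actually need do appear to have configured successfully; a quick sanity check (not a fix for libpam-systemd, just a verification):

# Confirm the packages we care about are installed and configured
dpkg -l duplicity duply python-boto
# duplicity should print its version if the install is functional
duplicity --version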

@jdimatteo (Member Author)

@bradnerComputation I finally got duplicity running with a small sample.

I don't think a full backup to S3 of something as big as tod:/grail is feasible. I just measured tod's upload bandwidth to be about 30 megabits/second. Grail is about 12TB, and at that rate it would take about 37 days of uninterrupted upload. Your IT department might not appreciate you using all this upload bandwidth for 37 days either.
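
For reference, the 37 days comes straight from the size and the measured upload rate:

# 12 TB expressed in bits, divided by a 30 megabit/s upload, converted to days
echo "12 * 10^12 * 8 / (30 * 10^6) / 86400" | bc -l
# => ~37 days of uninterrupted upload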

We might want to consider instead copying /grail to 6 or 7 internal 2TB disks and mailing them to Amazon via their import program (http://calculator.s3.amazonaws.com/index.html?s=importexport). You could probably buy the drives for about $500, and I think the Amazon import cost would be about $200, so around $700 total (and they would return the drives so you could use them for something else). This would let us get the initial backup done, and then we could do incremental backups over TOD's internet connection. This seems like a significant hassle -- are we sure Dana-Farber IT doesn't have some other offsite solution (e.g. maybe an internal network to storage in another building that could effectively count as offsite)? If going forward we anticipate processing bams on EC2, then maybe the hassle is worth it.
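
The rough arithmetic behind the drive option, using the estimates above:

# ~12 TB of data onto 2 TB drives: 6 drives, or 7 with one spare for overhead
echo "12 / 2" | bc
# One-time cost estimate: ~$500 for drives + ~$200 Amazon import fees
echo "500 + 200" | bc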

Please let me know your thoughts. I think last time we discussed this you suggested just trying the backup over the internet and seeing how it goes -- let me know if you'd like me to just start the ~37-day transfer.


I successfully followed the instructions at http://blog.phusion.nl/2013/11/11/duplicity-s3-easy-cheap-encrypted-automated-full-disk-backups-for-your-servers/ after figuring out the following:

  1. The TARGET_USER and TARGET_PASS correspond to my jdimatteo IAM "Access Key ID" and "Secret Access Key"
  2. The target should be specified in the following format (see the conf sketch below):
TARGET='s3://s3.amazonaws.com/bradnerlab_private/jdimatteo/gunk/backup-test/'
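
Putting those two points together, the relevant part of the duply profile conf ends up looking roughly like this (the GPG key, passphrase, and SOURCE path are illustrative placeholders; only the TARGET format and the credential mapping come from the notes above):

# ~/.duply/<profile>/conf -- excerpt
GPG_KEY='_KEY_ID_'                    # key used to encrypt the backup volumes
GPG_PW='_GPG_PASSPHRASE_'
TARGET='s3://s3.amazonaws.com/bradnerlab_private/jdimatteo/gunk/backup-test/'
TARGET_USER='_IAM_ACCESS_KEY_ID_'     # per point 1 above
TARGET_PASS='_IAM_SECRET_ACCESS_KEY_'
SOURCE='/grail'                       # what to back up (placeholder path)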

@charlesylin (Member)

I think I can get 40TB of storage from them. Would that be sufficient?

-Charles

@jdimatteo (Member Author)

40TB sounds good. You're talking about 40TB of storage in another building at Dana-Farber, right? Have you tested the upload bandwidth to that storage?
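
One quick way to measure the upload rate to that storage once there's a host on the other side (hostname is a placeholder; this just streams 1 GB of zeros over SSH and lets dd report the throughput):

# Stream 1 GB of zeros to the remote host; dd prints the effective rate at the end
dd if=/dev/zero bs=1M count=1024 | ssh backup-host 'cat > /dev/null'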
