First Backup Sync Implementation #115

Closed
wants to merge 1 commit into from

@fboucquez fboucquez commented Jan 14, 2021

Added `--backupSync` to the `config` command. This flag downloads a zip file of the Mongo and RocksDB databases from a known, trusted HTTP location (in this case S3, but it could be somewhere else).

The zip file is cached locally. `--backupSync` only re-downloads it if there is a new remote backup (detected by comparing the local and remote file sizes).

The first example of the backup is:

https://symbol-bootstrap.s3-eu-west-1.amazonaws.com/testnet/testnet-partial-backup.zip
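
The cache check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `shouldDownload`, `localFileSize`, and `remoteFileSize` are hypothetical names, and reading the remote size from the `Content-Length` header of a `HEAD` request is an assumption about how the comparison could be done.

```typescript
import * as fs from "fs";
import * as https from "https";

// Hypothetical helper: re-download when there is no cached file
// or when the cached size differs from the remote size.
export function shouldDownload(localSize: number | undefined, remoteSize: number): boolean {
    return localSize === undefined || localSize !== remoteSize;
}

// Size of the cached zip in bytes, or undefined if it has not been downloaded yet.
export function localFileSize(path: string): number | undefined {
    return fs.existsSync(path) ? fs.statSync(path).size : undefined;
}

// Assumption: the server reports the file size in Content-Length on a HEAD request,
// so the size can be read without downloading the body.
export function remoteFileSize(url: string): Promise<number> {
    return new Promise((resolve, reject) => {
        const request = https.request(url, { method: "HEAD" }, (response) => {
            const length = Number(response.headers["content-length"]);
            response.resume(); // discard any body so the socket is released
            if (response.statusCode === 200 && Number.isFinite(length)) {
                resolve(length);
            } else {
                reject(new Error(`Could not read the size of ${url}`));
            }
        });
        request.on("error", reject);
        request.end();
    });
}
```

A size comparison is cheap, but it would miss a new backup that happens to have the same byte size; comparing a checksum or the `ETag` header would be stricter.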

@Jaguar0625 @gimer could you validate the zip file and check whether it contains any node-specific information that should not be there?

Note that I need both the Mongo and RocksDB backups, as Mongo cannot be recreated from the RocksDB database.
https://github.com/nemtech/catapult-server/issues/123

This backup only covers testnet up to height 25k. Once bootstrap starts, it resumes synchronizing from that height.
Eventually, we will have a full-size backup of the current network. The current testnet db size is 35 GB, too large for this initial testing.

To create a backup (to be automated by NGL):

  1. Have a fully synced testnet node running
  2. Stop the server/bootstrap
  3. Run a script/command to create the zip file
  4. Push the ZIP into S3 keeping the same s3 file path
  5. Start bootstrap

Then, when a new bootstrap node starts with `--backupSync`, it will download this backup and start the node from the backup height rather than from 0.

A very basic script to create a backup can be found in backup-sync-testnet-backup.sh.

I'll convert this script to a bootstrap command for easier use.

Fixes #70

@fboucquez fboucquez requested review from Wayonb and rg911 January 14, 2021 03:33
Contributor

@segfaultxavi segfaultxavi left a comment

Minor grammar fix

src/commands/config.ts (outdated, resolved)
docs/config.md (outdated, resolved)
docs/start.md (outdated, resolved)
Collaborator

@rg911 rg911 left a comment

Does the state of the backup need to be validated?

presets/testnet/network.yml (outdated, resolved)
@fboucquez fboucquez changed the title First Fast Sync Implementation First Backup Sync Implementation Jan 14, 2021
@fboucquez
Owner Author

fboucquez commented Jan 17, 2021

There are two possible approaches for the backup-sync feature:

Full Sync Backup

RocksDB and Mongo databases are backed up and restored.

Pros:

  • It works!
  • There is no need to call the server's recovery
  • Once downloaded, it is faster to restore and less CPU intensive.

Cons:

  • Backup zip file is around 50% bigger.
  • Slower to upload and download
  • It requires more disk space for backup management.

I've added the `symbol-bootstrap backup` command, which generates a backup zip from a node after stopping it. The zip can then be uploaded to the public S3 location. Backup, at least for testnet/public net, should only be run by NGL.

Numbers:

  • 35 GB: the size of the testnet target folder
  • 21 GB: the size of the Full Sync Backup, https://symbol-bootstrap.s3-eu-west-1.amazonaws.com/testnet/backup.zip
  • 10 minutes to download the current full backup (S3 to an EC2 box in Ireland; hours if I download it down here)
  • 4 minutes to unzip the RocksDB backup
  • 3 minutes to unzip the Mongo database
  • 20 minutes for a full resync!!!

My restored node http://54.155.40.57:3000/chain/info is fully synced, up and running.

If you want to use the `--backupSync` feature, your node would need at least 80 GB for the current testnet size: 21 GB for the downloaded zip file, 35 GB for the generated target folder, and some spare space just in case. Note that once the node is restored, the backup zip file can be removed to reclaim the 21 GB for future use by the node.
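
The disk estimate above is simple arithmetic, sketched here with the reported numbers. The names `BackupSizes`, `peakRestoreGb`, and `hasEnoughDisk` are hypothetical and only illustrate the calculation; they are not part of bootstrap.

```typescript
// Sizes in GB, taken from the numbers reported in this comment.
interface BackupSizes {
    zipGb: number;    // downloaded backup zip
    targetGb: number; // restored target folder
}

// Peak usage during a restore: the zip and the unzipped target folder coexist on disk.
export function peakRestoreGb(sizes: BackupSizes): number {
    return sizes.zipGb + sizes.targetGb;
}

// Hypothetical check: is the disk large enough for the peak plus a safety margin?
export function hasEnoughDisk(diskGb: number, sizes: BackupSizes, marginGb: number): boolean {
    return diskGb >= peakRestoreGb(sizes) + marginGb;
}

const testnet: BackupSizes = { zipGb: 21, targetGb: 35 };
console.log(peakRestoreGb(testnet));      // 56
console.log(80 - peakRestoreGb(testnet)); // 24, the headroom left by the 80 GB recommendation
```

Once the zip is deleted after the restore, steady-state usage drops back to the size of the target folder.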

Partial Sync Backup

Only the RocksDB database is backed up and restored. Mongo is regenerated locally from the restored RocksDB.

Pros:

  • Smaller zip file
  • Faster to upload and download.
  • Less space required for backup management

Cons:

  • It doesn't work yet!!!
  • It requires calling the server recovery command. I'm guessing a full restore/recovery of the Mongo DB can take significant time and CPU.

The question is whether we should keep trying to make Partial Sync Backup work, or whether Full Sync Backup is good enough (or even the best option), at least for a first version.

CHANGELOG.md (outdated, resolved)
README.md Outdated
@@ -264,6 +264,7 @@ General users should install this tool like any other node module.
<!-- commands -->
# Command Topics

* [`symbol-bootstrap backup`](docs/backup.md) - The command backups the Mongo and RocksDb data folder into a Zip file that can be used for --backupSync feature. Bootstrap compose services must be stopped before calling this command.
Contributor

Suggested change
* [`symbol-bootstrap backup`](docs/backup.md) - The command backups the Mongo and RocksDb data folder into a Zip file that can be used for --backupSync feature. Bootstrap compose services must be stopped before calling this command.
* [`symbol-bootstrap backup`](docs/backup.md) - The command backs up the Mongo and RocksDb data folder into a Zip file that can then be used by the `--backupSync` feature. Bootstrap compose services must be stopped before calling this command.

docs/backup.md Outdated
`symbol-bootstrap backup`
=========================

The command backups the Mongo and RocksDb data folder into a Zip file that can be used for --backupSync feature. Bootstrap compose services must be stopped before calling this command.
Contributor

Suggested change
The command backups the Mongo and RocksDb data folder into a Zip file that can be used for --backupSync feature. Bootstrap compose services must be stopped before calling this command.
The command backs up the Mongo and RocksDb data folder into a Zip file that can then be used by the `--backupSync` feature. Bootstrap compose services must be stopped before calling this command.

docs/backup.md Outdated

The command backups the Mongo and RocksDb data folder into a Zip file that can be used for --backupSync feature. Bootstrap compose services must be stopped before calling this command.

Note: this command is designed for NGL to be used when running public main or public test networks. It's not backing up any node specific information.
Contributor

Is this true? Couldn't it be used by a private network to speed up sync?

Owner Author

Yes, it can be used. Atm we don't have a well-documented process for a user joining a private network with bootstrap. The pieces are there, I just need to put them together. The main issue is how to easily share a private network preset and seed with new private nodes.

Owner Author

It's just a warning for current users who may want to use it on testnet backups. This backup does not back up a node (such as the node's configuration or keys); it's just the data that can be shared between multiple nodes.

docs/backup.md (outdated, resolved)
docs/config.md (resolved)
src/commands/backup.ts (outdated, resolved)
src/commands/backup.ts (outdated, resolved)
src/service/BackupSyncService.ts (outdated, resolved)
src/service/BackupSyncService.ts (outdated, resolved)
Added --backup-sync to config command. This command downloads a known location via s3.

First example back is

https://symbol-bootstrap.s3-eu-west-1.amazonaws.com/testnet/testnet-partial-backup.zip
@fboucquez
Owner Author

Superseded by #151

3 participants