-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/ timeline id in clone section for Spilo #760
base: master
Are you sure you want to change the base?
Feature/ timeline id in clone section for Spilo #760
Conversation
@@ -20,25 +20,45 @@ | |||
|
|||
def read_configuration(): | |||
parser = argparse.ArgumentParser(description="Script to clone from S3 with support for point-in-time-recovery") | |||
parser.add_argument('--scope', required=True, help='target cluster name') | |||
parser.add_argument('--datadir', required=True, help='target cluster postgres data directory') | |||
parser.add_argument('--scope', required=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parser.add_argument('--scope', required=True, | |
parser.add_argument('--scope', required=True, |
parser.add_argument('--datadir', required=True, help='target cluster postgres data directory') | ||
parser.add_argument('--scope', required=True, | ||
help='target cluster name') | ||
parser.add_argument('--datadir', required=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parser.add_argument('--datadir', required=True, | |
parser.add_argument('--datadir', required=True, |
parser.add_argument('--recovery-target-time', | ||
help='the timestamp up to which recovery will proceed (including time zone)', | ||
dest='recovery_target_time_string') | ||
parser.add_argument('--dry-run', action='store_true', help='find a matching backup and build the wal-e ' | ||
parser.add_argument('--dry-run', action='store_true', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parser.add_argument('--dry-run', action='store_true', | |
parser.add_argument('--dry-run', action='store_true', |
parser.add_argument('--recovery-target-timeline', | ||
help='the timeline up to which recovery will proceed. Leave empty for latest.', | ||
dest='recovery_target_timeline', | ||
type=lambda timeline_id: int(timeline_id,16)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type=lambda timeline_id: int(timeline_id,16)) | |
type=lambda timeline_id: int(timeline_id, 16)) |
if args.recovery_target_time_string: | ||
recovery_target_time = parse(args.recovery_target_time_string) | ||
if recovery_target_time.tzinfo is None: | ||
raise Exception("recovery target time must contain a timezone") | ||
else: | ||
recovery_target_time = None | ||
|
||
return options(args.scope, args.datadir, recovery_target_time, args.dry_run) | ||
if args.recovery_target_timeline == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if args.recovery_target_timeline == None: | |
if args.recovery_target_timeline is None: |
|
||
def get_latest_timeline(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_latest_timeline(): | |
def get_latest_timeline(): |
for backup in backup_list: | ||
if int(backup["name"][5:13], 16) > latest_timeline_id: | ||
latest_timeline_id = int(backup["name"][5:13], 16) | ||
return latest_timeline_id | ||
|
||
def build_wale_command(command, datadir=None, backup=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def build_wale_command(command, datadir=None, backup=None): | |
def build_wale_command(command, datadir=None, backup=None): |
def get_latest_timeline(): | ||
env = os.environ.copy() | ||
backup_list = list_backups(env) | ||
latest_timeline_id = int("00000000",16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest_timeline_id = int("00000000",16) | |
latest_timeline_id = int("00000000", 16) |
@Sudeepta92 let's step back and explain what problem are you trying to solve? The thing is that recovery_target_timeline has only very narrow use: https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES
This is exactly what postgres-operator avoids: every time you deploy a new cluster it gets a new and unique archive location. That is, we will never have the situation when two recovery attempts are performed with the same archive location, what makes this feature effectively useless... |
@CyberDem0n I'm not sure if I understand you correctly. It's not always true that a unique archive location is created for a database. This can be because of two reasons. First, you can specify that there should be no prefix and suffix within the archive location path (S3), which leads to a location which only holds the cluster name and version and no UID. Second, you can do an "in place restore" (described here) This will lead to a new cluster UID but in the case above where the UID is not represented in the archive location, this will lead to the same problem. On the Operator repository, there is also a open PR to add the needed configuration to the postgresql manifest, they seem to be willing to implement this. (See here) When having UIDs in the backup location path, this makes it nearly impossible to automate a restore process and make backups selectable over some kind of webinterface, cause you probably will not have a history of all UIDs that a database might had in the past, therefore you cannot make the backups selectable anymore using an automation. I think this is the reason why one would disable the prefix / suffix stuff. Just my thoughts. Philip |
Well, if you shoot your own foot, it hurts badly. Just don't do it. Two different clusters must not be writing to the same archive location. |
They are not different clusters, just a restored one which takes place from the original one (inplace restore). And second, I don't understand why you use an offensive term like "shoot in the foot", it's the concept of timelines in Postgres, so not a shady workaround. Philip |
If the old cluster is still up and running and the new cluster is forked from the old one (with the new timeline), they are two different clusters.
If you are unlucky enough, both clusters will end up on the same timeline and comparable LSNs, and there will be clashes in WAL file names. Even if the old cluster is down, there will be a mess in the archive after PITR. I don't understand the offensiveness of the "shoot your own foot". It is a well-known idiom and in this context, it means that we shouldn't be creating problems in the first place in order to make a significant effort to solve these problems later. |
To modify Spilo image to process timeline id in the clone section.