Feature/ timeline id in clone section for Spilo #760

Open
wants to merge 5 commits into master
Conversation

Sudeepta92
Contributor

Modify the Spilo image to process the timeline ID in the clone section.

@Sudeepta92 Sudeepta92 marked this pull request as draft August 2, 2022 08:47
@Sudeepta92 Sudeepta92 marked this pull request as ready for review August 2, 2022 11:36
@@ -20,25 +20,45 @@

def read_configuration():
parser = argparse.ArgumentParser(description="Script to clone from S3 with support for point-in-time-recovery")
parser.add_argument('--scope', required=True, help='target cluster name')
parser.add_argument('--datadir', required=True, help='target cluster postgres data directory')
parser.add_argument('--scope', required=True,
Member
Suggested change
parser.add_argument('--scope', required=True,

parser.add_argument('--datadir', required=True, help='target cluster postgres data directory')
parser.add_argument('--scope', required=True,
help='target cluster name')
parser.add_argument('--datadir', required=True,
Member
Suggested change
parser.add_argument('--datadir', required=True,

parser.add_argument('--recovery-target-time',
help='the timestamp up to which recovery will proceed (including time zone)',
dest='recovery_target_time_string')
parser.add_argument('--dry-run', action='store_true', help='find a matching backup and build the wal-e '
parser.add_argument('--dry-run', action='store_true',
Member
Suggested change
parser.add_argument('--dry-run', action='store_true',

parser.add_argument('--recovery-target-timeline',
help='the timeline up to which recovery will proceed. Leave empty for latest.',
dest='recovery_target_timeline',
type=lambda timeline_id: int(timeline_id,16))
Member
Suggested change
- type=lambda timeline_id: int(timeline_id,16))
+ type=lambda timeline_id: int(timeline_id, 16))
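For illustration, the hex `type=` hook reviewed above can be exercised on its own. This is a minimal sketch with a hypothetical parser object, not the PR's full `read_configuration`:

```python
import argparse

parser = argparse.ArgumentParser()
# WAL file names encode the timeline in hexadecimal, so parse base-16.
parser.add_argument('--recovery-target-timeline',
                    dest='recovery_target_timeline',
                    type=lambda timeline_id: int(timeline_id, 16))

args = parser.parse_args(['--recovery-target-timeline', '0000000A'])
print(args.recovery_target_timeline)  # hex "0000000A" -> 10
```

Because the conversion runs inside argparse, an invalid hex string produces a normal argparse usage error instead of a traceback later in the script.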

if args.recovery_target_time_string:
recovery_target_time = parse(args.recovery_target_time_string)
if recovery_target_time.tzinfo is None:
raise Exception("recovery target time must contain a timezone")
else:
recovery_target_time = None

return options(args.scope, args.datadir, recovery_target_time, args.dry_run)
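The timezone guard in the hunk above can be isolated as follows — a sketch using the stdlib's `datetime.fromisoformat` as a stand-in for the script's `dateutil.parser.parse`:

```python
from datetime import datetime

def parse_recovery_target_time(value):
    # Stand-in for dateutil.parser.parse used by the actual script.
    recovery_target_time = datetime.fromisoformat(value)
    # PITR needs an unambiguous point in time, so naive timestamps are rejected.
    if recovery_target_time.tzinfo is None:
        raise Exception("recovery target time must contain a timezone")
    return recovery_target_time

print(parse_recovery_target_time('2022-08-02T08:47:00+00:00'))
```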
if args.recovery_target_timeline == None:
Member
Suggested change
- if args.recovery_target_timeline == None:
+ if args.recovery_target_timeline is None:
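The suggestion follows PEP 8: comparisons to singletons like `None` should use `is`, because `==` can be overridden by the operand's class. A contrived example (hypothetical class) showing the difference:

```python
class AlwaysEqual:
    # A custom __eq__ makes `== None` return True even for a real object.
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj == None)  # True  -- misleading, __eq__ decided the result
print(obj is None)  # False -- identity check is always reliable
```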


def get_latest_timeline():
Member
Suggested change
def get_latest_timeline():

for backup in backup_list:
if int(backup["name"][5:13], 16) > latest_timeline_id:
latest_timeline_id = int(backup["name"][5:13], 16)
return latest_timeline_id

def build_wale_command(command, datadir=None, backup=None):
Member
Suggested change
def build_wale_command(command, datadir=None, backup=None):

def get_latest_timeline():
env = os.environ.copy()
backup_list = list_backups(env)
latest_timeline_id = int("00000000",16)
Member
Suggested change
- latest_timeline_id = int("00000000",16)
+ latest_timeline_id = int("00000000", 16)
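For context, the hunks above derive the latest timeline by parsing backup names. A standalone sketch, assuming WAL-E-style names of the form `base_<24-hex-char segment>_...`, where characters 5–13 carry the 8-hex-digit timeline ID:

```python
def latest_timeline_from_backups(backup_names):
    # "base_000000020000000000000005_...": chars [5:13] = timeline ID in hex.
    latest_timeline_id = int("00000000", 16)
    for name in backup_names:
        timeline_id = int(name[5:13], 16)
        if timeline_id > latest_timeline_id:
            latest_timeline_id = timeline_id
    return latest_timeline_id

print(latest_timeline_from_backups([
    'base_000000010000000000000002_00000040',
    'base_000000030000000000000007_00000040',  # timeline 3 is the latest
]))  # -> 3
```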

@CyberDem0n
Contributor

@Sudeepta92 let's step back: what problem are you trying to solve?

The thing is that recovery_target_timeline has only very narrow use: https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES

The default behavior of recovery is to recover to the latest timeline found in the archive. If you wish to recover to the timeline that was current when the base backup was taken or into a specific child timeline (that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify current or the target timeline ID in recovery_target_timeline. You cannot recover into timelines that branched off earlier than the base backup.
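For reference, these settings live in the standby's configuration at recovery time (PostgreSQL 12+); the values below are purely illustrative:

```
# postgresql.conf excerpt -- illustrative values only
restore_command = 'wal-e wal-fetch "%f" "%p"'
recovery_target_time = '2022-08-02 08:47:00+00'
recovery_target_timeline = 'latest'   # or 'current', or a specific timeline ID
```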

This is exactly what postgres-operator avoids: every time you deploy a new cluster it gets a new and unique archive location. That is, we will never have the situation where two recovery attempts are performed against the same archive location, which makes this feature effectively useless...

@thedatabaseme
Contributor

This is exactly what postgres-operator avoids: every time you deploy a new cluster it gets a new and unique archive location. That is, we will never have the situation where two recovery attempts are performed against the same archive location, which makes this feature effectively useless...

@CyberDem0n I'm not sure I understand you correctly. It's not always true that a unique archive location is created for a database, for two reasons. First, you can specify that there should be no prefix and suffix within the archive location path (S3), which leads to a location that holds only the cluster name and version, and no UID. Second, you can do an "in-place restore" (described here). That creates a new cluster UID, but in the case above, where the UID is not represented in the archive location, it leads to the same problem.

On the Operator repository there is also an open PR to add the needed configuration to the postgresql manifest; they seem willing to implement this. (See here)

Having UIDs in the backup location path makes it nearly impossible to automate a restore process and make backups selectable through some kind of web interface, because you probably will not have a history of all the UIDs a database might have had in the past; without that history, automation cannot offer the backups for selection. I think this is why one would disable the prefix/suffix.

Just my thoughts.

Philip

@CyberDem0n
Contributor

you can specify that there should be no prefix and suffix within the archive location path (S3), which leads to a location which only holds the cluster name and version and no UID.

Well, if you shoot your own foot, it hurts badly. Just don't do it. Two different clusters must not be writing to the same archive location.

@thedatabaseme
Contributor

you can specify that there should be no prefix and suffix within the archive location path (S3), which leads to a location which only holds the cluster name and version and no UID.

Well, if you shoot your own foot, it hurts badly. Just don't do it. Two different clusters must not be writing to the same archive location.

They are not different clusters, just a restored one created from the original (in-place restore). And second, I don't understand why you use an offensive term like "shoot in the foot"; this is the concept of timelines in Postgres, not a shady workaround.

Philip

@CyberDem0n
Contributor

If the old cluster is still up and running and the new cluster is forked from the old one (with the new timeline), they are two different clusters.
They just have two common things:

  1. cluster system identifier
  2. and the ancestor

If you are unlucky enough, both clusters will end up on the same timeline and comparable LSNs, and there will be clashes in WAL file names.

Even if the old cluster is down, there will be a mess in the archive after PITR.

I don't understand the offensiveness of "shoot your own foot". It is a well-known idiom, and in this context it means that we shouldn't create problems in the first place only to make a significant effort to solve them later.
