ctlog: write pending tiles to protect against concurrent sequencers #1
base: main
Conversation
Force-pushed from c770894 to 5e3c551
If two sequencers (somehow) end up running at the same time, they could scribble over each other's uploads to the backend storage.
Protect against that special case by writing the tiles in two steps. First, all tiles are staged in a unique tar file (keyed by the hash of the new tree), and only written to their final location after a successful lock update.
In the normal case, this will incur one extra upload and delete, adding one write operation, and double the amount of data written to backend storage.
To make sure that the (rare) recovery code path is well-tested, use the same code path in both the recovery and normal case.
A special IAM policy could allow deletes only of pending files.
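For illustration, here is a rough Go sketch of that flow; every identifier below (Backend, LockBackend, sequenceRound, the staging/ prefix, and so on) is a placeholder for this description, not the actual Sunlight API.

package main

import (
    "archive/tar"
    "bytes"
    "context"
    "fmt"
)

// Backend and LockBackend are minimal stand-ins for the real object-storage
// and lock backends; they exist only to make the sketch self-contained.
type Backend interface {
    Upload(ctx context.Context, key string, data []byte) error
    Delete(ctx context.Context, key string) error
}

type LockBackend interface {
    // Replace atomically swaps the stored checkpoint if it still matches oldCP.
    Replace(ctx context.Context, oldCP, newCP []byte) error
}

// sequenceRound sketches the two-step write: stage, commit the lock, publish,
// then delete the staging object.
func sequenceRound(ctx context.Context, b Backend, l LockBackend,
    oldCheckpoint, newCheckpoint []byte, treeHash [32]byte, tiles map[string][]byte) error {

    // 1. Pack every tile of this round into one tar archive keyed by the new
    //    tree hash, so concurrent sequencers can never collide on the
    //    staging object.
    var buf bytes.Buffer
    tw := tar.NewWriter(&buf)
    for key, data := range tiles {
        hdr := &tar.Header{Name: key, Mode: 0o644, Size: int64(len(data))}
        if err := tw.WriteHeader(hdr); err != nil {
            return err
        }
        if _, err := tw.Write(data); err != nil {
            return err
        }
    }
    if err := tw.Close(); err != nil {
        return err
    }
    stagingKey := fmt.Sprintf("staging/%x.tar", treeHash)
    if err := b.Upload(ctx, stagingKey, buf.Bytes()); err != nil {
        return err // nothing visible has changed yet, so a retry is safe
    }

    // 2. Commit by atomically advancing the lock. Only one sequencer can win
    //    this step, which is what protects against concurrent sequencers.
    if err := l.Replace(ctx, oldCheckpoint, newCheckpoint); err != nil {
        return err // lost the race: abandon the staged tar, publish nothing
    }

    // 3. Copy the staged tiles to their final, publicly visible keys. The
    //    recovery path replays exactly this step from the staged tar.
    for key, data := range tiles {
        if err := b.Upload(ctx, key, data); err != nil {
            return err // the staged tar survives, so recovery stays possible
        }
    }

    // 4. Only once everything is published is the staging object deleted.
    return b.Delete(ctx, stagingKey)
}

The delete in the last step is the only delete the sequencer ever needs, which is what the special IAM policy mentioned above would be scoped to.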
Force-pushed from 5e3c551 to b89dbd8
if err := uploadPending(ctx, l.c, checkpoint, tree.Tree); err != nil {
    // Return an error so we don't produce SCTs that, although committed,
    // aren't yet part of a publicly visible tree.
    return fmtErrorf("couldn't copy pending files to object storage: %w", err)
}
Hello! Sorry for resurrecting this so much later. I am finally doing a pass of large changes to Sunlight, and decided to go for a variant of this, thank you so much for bringing up the issue and proposing this solution.
I think this needs to be a fatal error, right? If we just fail the round and continue, the next round will make progress, but there will be missing tiles in the tree. (They will be in staging because we won't reach the Delete, so it will be recoverable, but still.)
Also, unsure how TestRecoverLog is passing, since it checks that Sequence returns an error, which only happens for fatal errors.
> Hello! Sorry for resurrecting this so much later. I am finally doing a pass of large changes to Sunlight, and decided to go for a variant of this, thank you so much for bringing up the issue and proposing this solution.
That's exciting to hear! No worries at all. Let me know if you would like me to update this PR or if you prefer your own code.
> I think this needs to be a fatal error, right? If we just fail the round and continue, the next round will make progress, but there will be missing tiles in the tree. (They will be in staging because we won't reach the Delete, so it will be recoverable, but still.)
That sounds right to me. It would be good to spell out exactly what the protocol guarantees at each write to either the lock or storage backends. With a fatal error here you get the guarantee that every lock update is followed by a successful uploadPending call, and I agree that without it, it's not clear that all tiles will be uploaded.
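Purely as an illustration of that distinction (the sentinel and wrapper below are hypothetical, not Sunlight's actual error handling), the split could look something like this:

package main

import (
    "context"
    "errors"
    "log"
)

// errFatal is a hypothetical sentinel: errors wrapped with it stop the
// sequencer entirely rather than just failing the current round.
var errFatal = errors.New("fatal sequencing error")

// runRound wraps one sequencing round (sequenceOnce is a placeholder) and
// decides whether a failure is survivable.
func runRound(ctx context.Context, sequenceOnce func(context.Context) error) error {
    err := sequenceOnce(ctx)
    if err == nil {
        return nil
    }
    if errors.Is(err, errFatal) {
        // The lock may already point at the new tree while some tiles are
        // missing from their final locations: stop issuing SCTs until the
        // staged bundle has been recovered.
        return err
    }
    // Non-fatal: nothing was committed, so the next round can simply retry.
    log.Printf("sequencing round failed: %v", err)
    return nil
}

With uploadPending failures in the fatal bucket, the sequencer never keeps issuing SCTs for a tree whose tiles were not fully published.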
> Also, unsure how TestRecoverLog is passing, since it checks that Sequence returns an error, which only happens for fatal errors.
It's been a while and so I don't remember the details, but I think TestRecoverLog passes because uploadPending returns a fatal error in the test.
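For what it's worth, here is a sketch of the kind of fault injection being discussed; the wrapper is hypothetical (not Sunlight's real test helpers) and assumes it lives in the same illustrative package as the Backend interface sketched in the description above.

// failFinalUploads wraps a Backend and rejects uploads outside the staging/
// prefix, so the post-lock publish step fails while staging still succeeds.
type failFinalUploads struct{ Backend }

func (f failFinalUploads) Upload(ctx context.Context, key string, data []byte) error {
    if !strings.HasPrefix(key, "staging/") {
        return errors.New("injected upload failure")
    }
    return f.Backend.Upload(ctx, key, data)
}

A recovery test could then run one round against this wrapper, expect Sequence to fail, restart the log against the unwrapped backend, and check that the staged tiles end up published.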
I implemented this in FiloSottile@1ed5201. If you have time to review it, that would be amazing.
Based on a design by @jellevandenhooff at jellevandenhooff#1. Fixes #11 Co-authored-by: Jelle van den Hooff <jelle@vandenhooff.name>