Introduce a new pgcopydb internal message: ENDPOS. #321

Merged: 3 commits, Jun 15, 2023

@dimitri dimitri commented Jun 14, 2023

When reaching endpos in the middle of a transaction we should stop processing the stream, transform what we have, and apply up to the last full transaction, ignoring the partial one at the end.

Because of the streaming approach taken by pgcopydb, the next best thing we can do is to ROLLBACK the last partial transaction while still being able to recognise that --endpos has been reached.

Before this patch, pgcopydb would use the endpos LSN to forge a keepalive message, meaning that we would then update our replay_lsn and replication origin tracking to a position in the middle of a transaction we did not replay.

If the user then restarted pgcopydb with a new endpos, we would skip one transaction, which had been written off as already replayed. This patch fixes this situation.
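The intended behaviour can be illustrated with a toy model. This is a minimal sketch, not pgcopydb's actual code: the `Message` and `Result` types, the action names, and the rule that a message with an LSN above endpos is not processed are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    action: str  # hypothetical, simplified actions: "BEGIN", "INSERT", "COMMIT"
    lsn: int     # LSNs are plain integers in this toy model
    xid: int


@dataclass
class Result:
    applied: List[int] = field(default_factory=list)  # xids fully applied
    replay_lsn: int = 0                               # advances on COMMIT only
    reached_endpos: bool = False
    rolled_back_xid: Optional[int] = None             # partial transaction, if any


def apply_until_endpos(stream, endpos):
    """Apply full transactions up to endpos and roll back a trailing partial one."""
    result = Result()
    open_xid = None
    for msg in stream:
        if msg.lsn > endpos:
            # endpos reached: remember that fact, but do NOT move replay_lsn
            # into the middle of a transaction that was never replayed.
            result.reached_endpos = True
            result.rolled_back_xid = open_xid
            return result
        if msg.action == "BEGIN":
            open_xid = msg.xid
        elif msg.action == "COMMIT":
            result.applied.append(msg.xid)
            result.replay_lsn = msg.lsn
            open_xid = None
    return result


STREAM = [
    Message("BEGIN", 10, 1), Message("INSERT", 11, 1), Message("COMMIT", 12, 1),
    Message("BEGIN", 20, 2), Message("INSERT", 21, 2),
    Message("INSERT", 30, 2), Message("COMMIT", 31, 2),
]

# endpos 25 falls inside transaction 2, so only transaction 1 is applied.
outcome = apply_until_endpos(STREAM, endpos=25)
```

In this model the tracked position stays at the last COMMIT (LSN 12) rather than jumping to endpos (25), which is the property the patch description asks for.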

It should be possible to reset endpos to a later point in time and resume
replaying changes up to the new point in time, even when the previously set
endpos fell in the middle of a transaction.
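As a rough illustration of resuming with a later endpos, here is a small sketch under stated assumptions: the `replay` function, the tuple layout `(action, lsn, xid)`, and the rule that a transaction is skipped when its COMMIT LSN is not past the tracked position are hypothetical simplifications, not pgcopydb internals.

```python
def replay(stream, resume_lsn, endpos):
    """Apply complete transactions up to endpos.

    Returns (applied_xids, new_resume_lsn). A transaction still open when
    endpos is reached is dropped, and the tracked position is left at the
    last applied COMMIT.
    """
    applied = []
    for action, lsn, xid in stream:
        if lsn > endpos:
            break  # endpos reached; a partial transaction is simply not applied
        if action == "COMMIT" and lsn > resume_lsn:
            applied.append(xid)
            resume_lsn = lsn
    return applied, resume_lsn


# The toy stream is re-read from the start on each run; already applied
# transactions are filtered out by comparing COMMIT LSNs with resume_lsn.
STREAM = [
    ("BEGIN", 10, 1), ("INSERT", 11, 1), ("COMMIT", 12, 1),
    ("BEGIN", 20, 2), ("INSERT", 21, 2), ("INSERT", 30, 2), ("COMMIT", 31, 2),
    ("BEGIN", 40, 3), ("INSERT", 41, 3), ("COMMIT", 42, 3),
]

# First run stops inside transaction 2; second run uses a later endpos.
first, lsn_after_first = replay(STREAM, resume_lsn=0, endpos=25)
second, lsn_after_second = replay(STREAM, resume_lsn=lsn_after_first, endpos=35)
```

Because the first run leaves the tracked position at LSN 12, the second run applies transaction 2 in full instead of treating it as already replayed.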

When comparing startpos with a previously written JSON file we need to avoid
writing the same LSN twice, but that guard doesn't apply when starting CDC
for the first time from the replication slot: then we want to write starting
at the startpos, not skip it.
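The guard described here can be sketched as a single comparison. The function name `lsns_to_write` and the `resuming_from_file` flag are hypothetical, and LSNs are modelled as integers; this only illustrates the rule, not how pgcopydb implements it.

```python
def lsns_to_write(lsns, startpos, resuming_from_file):
    """Select which LSNs to write to the JSON file.

    When resuming from a previously written file, startpos was already
    written, so it is skipped. On a first start from the replication slot,
    writing begins at startpos itself.
    """
    if resuming_from_file:
        return [lsn for lsn in lsns if lsn > startpos]
    return [lsn for lsn in lsns if lsn >= startpos]


incoming = [100, 101, 102]
resumed = lsns_to_write(incoming, startpos=100, resuming_from_file=True)
first_start = lsns_to_write(incoming, startpos=100, resuming_from_file=False)
```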
@dimitri dimitri merged commit b482276 into main Jun 15, 2023
@dimitri dimitri deleted the fix/endpos-within-transaction branch June 15, 2023 16:49