Introduce a new pgcopydb internal message: ENDPOS. #321

dimitri · 2023-06-14T11:41:17Z

When endpos is reached in the middle of a transaction, we should stop processing the stream, transform what we have, and apply changes up to the last complete transaction, ignoring the partial one at the end.

Because of the streaming approach taken by pgcopydb, the next best thing we can do is to ROLLBACK the last partial transaction while still being able to recognise that --endpos has been reached.
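A minimal sketch of the intended behaviour follows. It assumes a simplified stream where each message is a hypothetical `Message(kind, lsn)` with kinds `BEGIN`, `INSERT`, `COMMIT` and `ENDPOS`. This is not pgcopydb's actual message format or code. When an `ENDPOS` message arrives while a transaction is still open, the partial transaction is rolled back rather than committed, and the loop stops while reporting that endpos was reached.

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str  # hypothetical kinds: "BEGIN", "INSERT", "COMMIT", "ENDPOS"
    lsn: int


def apply_stream(messages):
    """Replay messages, returning (sql_statements, endpos_reached).

    Only fully committed transactions are kept; a transaction that is
    still open when ENDPOS arrives is rolled back.
    """
    sql = []
    in_txn = False
    for msg in messages:
        if msg.kind == "BEGIN":
            sql.append("BEGIN;")
            in_txn = True
        elif msg.kind == "INSERT":
            sql.append(f"-- insert at {msg.lsn}")
        elif msg.kind == "COMMIT":
            sql.append("COMMIT;")
            in_txn = False
        elif msg.kind == "ENDPOS":
            if in_txn:
                # endpos fell mid-transaction: discard the partial one
                sql.append("ROLLBACK;")
            # stop here, remembering that endpos has been reached
            return sql, True
    return sql, False
```

Keeping `ENDPOS` as its own message kind, instead of turning it into a keepalive, is what lets the loop both roll back and still know that it stopped because of endpos.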

Before this patch, pgcopydb would use the endpos LSN to forge a keepalive message. As a result, we would update our replay_lsn and replication origin tracking to a position in the middle of a transaction that we did not replay.

If the user then restarted pgcopydb with a new endpos, we would skip one transaction that had been wrongly recorded as already replayed. This patch fixes this situation.
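To illustrate the bug, here is a simplified model. It assumes transactions are described by hypothetical `(begin_lsn, commit_lsn)` pairs and that a restart skips every transaction beginning at or before the tracked LSN, which is a simplification of the real replication origin semantics. With the old forged keepalive, the tracked LSN lands mid-transaction and the restart drops that transaction; tracking only the commit LSN of the last fully applied transaction keeps it.

```python
from typing import List, NamedTuple


class Txn(NamedTuple):
    begin_lsn: int
    commit_lsn: int


def tracked_lsn(txns: List[Txn], start: int, endpos: int, forge_keepalive: bool) -> int:
    """LSN recorded as replayed when the stream stops at endpos."""
    if forge_keepalive:
        # old behaviour: endpos itself is reported, even mid-transaction
        return endpos
    # new behaviour: only the commit LSN of the last fully applied txn
    done = [t.commit_lsn for t in txns if t.commit_lsn <= endpos]
    return max(done, default=start)


def txns_to_replay_on_restart(txns: List[Txn], replay_lsn: int) -> List[Txn]:
    """Simplified resume rule (an assumption): anything beginning at or
    before replay_lsn is treated as already replayed and skipped."""
    return [t for t in txns if t.begin_lsn > replay_lsn]
```

With transactions (10, 12), (13, 20), (21, 25) and endpos 15, the old behaviour records 15 and a restart replays only (21, 25), losing (13, 20). The new behaviour records 12, so a restart with a later endpos replays both remaining transactions.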


It should be possible to reset endpos to a later point in time and resume replaying changes up to that new point, even when the previously set endpos fell in the middle of a transaction.

When comparing startpos with a previously written JSON file, we need to avoid writing the same LSN twice. That guard does not apply when starting CDC for the first time from the replication slot: in that case we want to write starting at startpos, not skip it.
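The guard can be sketched as a single comparison whose strictness depends on whether we are resuming. The function name `should_write` and its parameters are hypothetical, chosen for illustration only; pgcopydb's own implementation may structure this differently. When resuming, startpos is taken to be the last LSN already written to the previous JSON file, so the comparison is strict; on the first run from the slot it is inclusive.

```python
def should_write(lsn: int, startpos: int, resuming_from_file: bool) -> bool:
    """Decide whether a message at `lsn` is written to the JSON file.

    When resuming, startpos is the last LSN already written to the
    previous file, so it must not be written a second time. On the
    very first CDC run from the replication slot, startpos itself
    is the first message to keep.
    """
    if resuming_from_file:
        return lsn > startpos
    return lsn >= startpos
```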

dimitri added 3 commits June 14, 2023 13:36

Fix resuming CDC after changing endpos again.

c33043c


dimitri merged commit b482276 into main Jun 15, 2023

dimitri deleted the fix/endpos-within-transaction branch June 15, 2023 16:49
