Logical Replication transactions applied multiple times #6977
Comments
To achieve exactly-once semantics for incoming logical replication, Postgres stamps each commit record with a 10-byte tuple. That way Postgres always knows the remote LSN of the last applied record, and on a replication reconnect it can ask to continue replication from the proper point. For context: Postgres replication is somewhat of a pull model; it is up to the replica to reconnect and ask for data starting from some LSN. How can we support it in Neon:
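The mechanism described above can be sketched as a toy model. This is not Postgres or Neon code: the `Subscriber` class, the `stream` helper, and the LSN values are hypothetical names and data chosen for illustration. The only assumption taken from the comment is that the remote LSN is recorded together with the commit, so a reconnect can resume from it.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """Toy subscriber: rows and the origin LSN are committed together."""
    rows: dict = field(default_factory=dict)  # primary key -> value
    origin_lsn: int = 0                       # remote LSN of the last applied commit

    def apply_commit(self, remote_lsn, inserts):
        for pk in inserts:
            if pk in self.rows:
                raise KeyError(f"duplicate key {pk}")
        # Rows and origin_lsn change in one step, mimicking a single commit record.
        self.rows.update(inserts)
        self.origin_lsn = remote_lsn


def stream(publisher_log, start_lsn):
    """Return the commits whose remote LSN is strictly greater than start_lsn."""
    return [(lsn, ins) for lsn, ins in publisher_log if lsn > start_lsn]


# Hypothetical publisher history: (remote LSN, inserted rows).
publisher_log = [(100, {1: "a"}), (200, {2: "b"}), (300, {3: "c"})]

sub = Subscriber()
# First connection: only two commits are applied before the link drops.
for lsn, ins in stream(publisher_log, sub.origin_lsn)[:2]:
    sub.apply_commit(lsn, ins)

# Reconnect: the subscriber asks for data starting from its own origin LSN,
# so already-applied commits are never sent again.
for lsn, ins in stream(publisher_log, sub.origin_lsn):
    sub.apply_commit(lsn, ins)
```

Because the origin LSN advances in the same step as the rows, the second connection starts after LSN 200 and each commit is applied exactly once.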
Formally speaking, that is not precisely true. A commit record may include origin_lsn and origin_timestamp (16 bytes in total) if the XACT_XINFO_HAS_ORIGIN bit is set in xinfo. But that does not matter in principle.
Well, the presence of a WAL record doesn't remove the need for a key with which this WAL record will be associated. Please note that our storage is a key-value storage. For example, commit records are currently associated with CLOG pages.
No problem with that. I have already implemented this part.
Do you mean that we should add a REPL_ORIGIN key and associate commit records with it? That means we need to write each commit in two places: CLOG and the replication origin.
There is no efficient way to locate the last N commits, but I do not understand why we need to load the last N commits at all. If we introduce a REPL_ORIGIN(id) key, we can associate replication origin updates with it, and then we need to load just one value (the one preceding the basebackup LSN). The question is how to find all origins. In Postgres each slot has its own origin and we just iterate through all slots. PS knows nothing about replication slots, but it has the AUX files with slot state. In principle it can read these files, parse them, and so get the array of origins. Alternatively, we can use a range scan to find all available REPL_ORIGIN(id) keys. The latter approach seems less efficient, but it is easier to implement and doesn't require PS to know the format of the replication slot state file.
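A rough sketch of the range-scan idea follows. It assumes a toy versioned key-value store; `VersionedKV`, `scan_kind_at`, and the tuple key layout `("REPL_ORIGIN", id)` are hypothetical and are not the pageserver's actual API or key encoding. The point is only that one value per origin, the latest at or before the basebackup LSN, is enough.

```python
from collections import defaultdict


class VersionedKV:
    """Toy key-value store that keeps every version of a key, ordered by LSN."""

    def __init__(self):
        # key -> [(lsn, value), ...]; assumes puts arrive in increasing LSN order,
        # as they would during WAL ingest.
        self._versions = defaultdict(list)

    def put(self, key, lsn, value):
        self._versions[key].append((lsn, value))

    def get_at(self, key, lsn):
        """Return the latest value written at or before lsn, or None."""
        latest = None
        for version_lsn, value in self._versions.get(key, []):
            if version_lsn > lsn:
                break
            latest = value
        return latest

    def scan_kind_at(self, kind, lsn):
        """Range scan: collect every key of the given kind that exists at lsn."""
        found = {}
        for key in sorted(self._versions):
            if key[0] != kind:
                continue
            value = self.get_at(key, lsn)
            if value is not None:
                found[key[1]] = value
        return found


kv = VersionedKV()
# Each commit with an origin updates REPL_ORIGIN(id) with its origin_lsn.
kv.put(("REPL_ORIGIN", 1), lsn=10, value=100)
kv.put(("REPL_ORIGIN", 2), lsn=20, value=250)
kv.put(("REPL_ORIGIN", 1), lsn=30, value=300)

# At basebackup time, one scan yields the last committed origin_lsn per origin.
origins_at_25 = kv.scan_kind_at("REPL_ORIGIN", lsn=25)
origins_at_30 = kv.scan_kind_at("REPL_ORIGIN", lsn=30)
```

The scan needs no knowledge of the replication slot state file; it only has to recognize keys of the `REPL_ORIGIN` kind.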
Store logical replication origin in KV storage

## Problem

See #6977

## Summary of changes

* Extract origin_lsn from commit WAL record
* Add ReplOrigin key to KV storage and store origin_lsn
* In basebackup replace snapshot origin_lsn with last committed origin_lsn at basebackup LSN

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Alex Chi Z <chi@neon.tech>
This was fixed by #7099
Steps to reproduce
Make Neon a subscriber to a workload that produces several inserts per second on a table with a primary key, then restart it. Replication will fail because of a duplicate key insert.
Expected result
No duplicate key insert
Actual result
Duplicate key insert
Environment
Logs, links
Discussion: https://neondb.slack.com/archives/C04DGM6SMTM/p1708363190710839
It seems that the pageserver isn't applying replorigin advances, so the compute's origin when it restarts is whatever was in the last checkpoint.
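The suspected failure mode can be reproduced in a small simulation. Everything here is a hypothetical model, not Neon code: the `Compute` class, the `DuplicateKeyError` exception, and the LSN values are invented for illustration. It assumes that inserted rows are durable immediately while the origin LSN survives a restart only as of the last checkpoint.

```python
class DuplicateKeyError(Exception):
    pass


class Compute:
    """Toy compute node: rows live in durable storage, origin_lsn lives in memory."""

    def __init__(self, durable_rows, origin_lsn):
        self.rows = durable_rows      # survives restarts
        self.origin_lsn = origin_lsn  # restored from the last checkpoint on restart

    def apply(self, remote_lsn, pk):
        if pk in self.rows:
            raise DuplicateKeyError(pk)
        self.rows[pk] = remote_lsn
        self.origin_lsn = remote_lsn


# Hypothetical publisher history: (remote LSN, primary key of the inserted row).
publisher_log = [(100, 1), (200, 2), (300, 3)]

durable_rows = {}
compute = Compute(durable_rows, origin_lsn=0)
compute.apply(100, 1)
checkpoint_origin_lsn = compute.origin_lsn  # a checkpoint captures origin_lsn = 100
compute.apply(200, 2)                       # these advances are never persisted
compute.apply(300, 3)

# Restart: the rows are all there, but the origin falls back to the checkpoint value.
restarted = Compute(durable_rows, checkpoint_origin_lsn)
failed_pk = None
try:
    for remote_lsn, pk in publisher_log:
        if remote_lsn > restarted.origin_lsn:
            restarted.apply(remote_lsn, pk)  # replays LSN 200, which was already applied
except DuplicateKeyError as err:
    failed_pk = err.args[0]
```

In this model the restarted node asks for data after LSN 100, receives the commit at LSN 200 again, and fails on primary key 2, which matches the duplicate key insert reported above.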