-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assorted streaming fixes, including skipping of empty transactions. #257
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitri With a mild load and an idle system, the lag is almost zero, which is a major improvement. However, when the system experiences higher loads, the problem still remains, as you can expect. Currently, this improvement seems adequate considering the scope of this PR for the messages that are empty transactions, but the problem will be entirely solved once the issue raised in this comment #254 (comment) is addressed.
I have an interesting case I observed earlier (migration using {"action":"C","xid":"64395","lsn":"16/A601C0B0","timestamp":"2023-04-18 08:45:03.608755+0530","message":{"action":"C","xid":64395}}
{"action":"B","xid":"64396","lsn":"16/A601C0E8","timestamp":"2023-04-18 08:45:13.624321+0530","message":{"action":"B","xid":64396}}
{"action":"C","xid":"64396","lsn":"16/A601C190","timestamp":"2023-04-18 08:45:13.624321+0530","message":{"action":"C","xid":64396}}
{"action":"X","lsn":"16/A7000028"}
{"action":"B","xid":"64397","lsn":"16/A601D618","timestamp":"2023-04-18 08:45:29.842951+0530","message":{"action":"B","xid":64397}}
{"action":"X","lsn":"16/A7003600"} which led to, COMMIT; -- {"xid":64395,"lsn":"16/A601C0B0","timestamp":"2023-04-18 08:45:03.608755+0530"}
BEGIN; -- {"xid":64396,"lsn":"16/A601C0E8","timestamp":"2023-04-18 08:45:13.624321+0530"}
COMMIT; -- {"xid":64396,"lsn":"16/A601C190","timestamp":"2023-04-18 08:45:13.624321+0530"}
-- SWITCH WAL {"lsn":"16/A7000028"}
-- SWITCH WAL {"lsn":"16/A7003600"} and hence migration failed,
I can't understand how this could happen. Also, can you please elaborate a bit what was earlier missed around |
We missed updating the “latest” symlink at close file time, meaning that when restarting streaming later we would re-fetch LSN positions that made it to disk locally already in the previous round. I believe that could explain double SWITCH entries in the JSON files. |
Now that we need to keep around the previous LogicalMessageMetadata, we also need to keep around its JSON buffer. This means we can't re-use the pointer to the current context buffer, and we need to duplicate memory even when using wal2json and keeping the same JSON buffer as received. In passing, adjust the JSON files in the test suite according to the new empty transactions filtering and synthetic KEEPALIVE situation.
The exit code 141 is 128 + 13, and 13 is SIGPIPE. It means that in our cdc-low-level test case when doing receive | transform | apply, one of the processes is trying to write to its output pipe when the next process has already exited, or at least closed its pipe input file descriptor. By ignoring SIGPIPE we ask the system to return EPIPE at write time. Let's now see how our code handles that.
Should fix #254. Or at least help with it.
Also should fix issues with SWITCH WAL messages found in the middle of JSON files, when it comes from a situation where streaming has been restarted.