back-pressure / keep alive reply #66
-
Implementation proposal done in #65.
-
Any thoughts on this, @DohanKim?
-
Some additional information may be useful here. When a big transaction occurs, WalEx receives a large number of requests in a row. For that reason, message_middleware must reply as fast as possible, and its processing must not consume too many resources; otherwise the situation described in the issue will occur. (FYI: we are currently testing the WalEx integration on our side and are running into this case.)
-
issue:
Currently, once Replication.Server decodes the data payload from Postgres, it casts it to Replication.Publisher.
When a big change occurs (for instance a delete or insert of X million records),
the messages are stored in memory,
so WalEx can end up consuming all of its available memory.
For temporary slots this is not that critical: when WalEx restarts, it ignores the problematic transaction and starts at the latest transaction instead.
For durable replication slots, however, WalEx will retry that transaction multiple times, putting strain on the CPU of the Postgres database, and in the meantime the WAL files will start to pile up.
After a long while, as replies to the keep-alive advance the restart_lsn field of the durable slot, the transaction will get dropped, but this can take a very long time.
proposal:
To fix this, we propose making the way Server communicates with Publisher configurable.
The idea is to add a message_middleware option that takes the current decoded server message and the app_name.
This callback would then become responsible for publishing the message to Publisher.
This would allow for user-defined back-pressure.
For instance, the callback could write the messages to the local disk, an SQS FIFO queue, etc.
It would then become the responsibility of the end user to read these messages and send them to WalEx.Replication.Publisher
to continue the processing (using Broadway, for instance).
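To make the idea concrete, here is a minimal sketch of what a disk-backed middleware could look like. Everything in it is an assumption for illustration: the module name `SpoolMiddleware`, the `handle_message/3` and `drain/3` functions, the extra `dir` argument and the spool file format are hypothetical and are not part of WalEx or of the implementation in #65. The `publish_fun` passed to `drain/3` stands in for whatever ends up forwarding the message to WalEx.Replication.Publisher.

```elixir
defmodule SpoolMiddleware do
  # Hypothetical middleware: instead of casting each decoded message to the
  # publisher, append it to a per-app spool file so it does not stay in memory.

  # Called once per decoded message. It only does a file append, so it
  # returns quickly and the server can keep answering keep-alives.
  def handle_message(message, app_name, dir \\ System.tmp_dir!()) do
    payload = :erlang.term_to_binary(message)
    # Length-prefixed frame: 4-byte size followed by the serialized term.
    frame = <<byte_size(payload)::32, payload::binary>>
    File.write!(spool_path(dir, app_name), frame, [:append])
    :ok
  end

  # Run by the end user at their own pace: reads the spool in order, one
  # message at a time, and hands each message to `publish_fun`.
  # Returns the number of messages drained.
  # Limitation of this sketch: it assumes nothing is appended while draining.
  def drain(app_name, publish_fun, dir \\ System.tmp_dir!()) do
    path = spool_path(dir, app_name)

    if File.exists?(path) do
      count = File.open!(path, [:read, :binary], &read_frames(&1, publish_fun, 0))
      File.rm!(path)
      count
    else
      0
    end
  end

  defp spool_path(dir, app_name), do: Path.join(dir, "#{app_name}.spool")

  defp read_frames(io, publish_fun, count) do
    case IO.binread(io, 4) do
      <<size::32>> ->
        message = io |> IO.binread(size) |> :erlang.binary_to_term()
        publish_fun.(message)
        read_frames(io, publish_fun, count + 1)

      :eof ->
        count
    end
  end
end
```

With something like this, the middleware side stays cheap (one append per message) and the consumer side decides how fast messages reach the publisher, which is where the back-pressure comes from. A queue such as SQS FIFO would follow the same split, with the append replaced by an enqueue.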
Not fixed by this proposal:
WalEx will still require that the full transaction be held in memory by Publisher before it's sent to Destination,
but that's a separate issue, and the in-memory size of the final transaction should be much smaller than
that of all the binary payloads with the associated Relation, Commit, Begin, etc.