Problem: CDC replication/apply is slow #704
Conversation
I will keep this as a draft until I fix all CI failures.
@dimitri Would you still suggest conditionally enabling the pipeline implementation for libpq >= 14?
Thanks @arajkumar for working on this, that's much appreciated! My understanding is that we need libpq 14+ at build time to enable this, yes. Unfortunately, not all build systems out there understand that we can build with libpq 16 and then work with different Postgres versions (RPM is one of these). Another aspect: is it possible to have a “sync” step in pipeline mode for the COMMIT and LSN tracking? If that were possible, we could skip using a second transaction entirely, and I think we have to do that for correctness.
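For reference, a build-time gate along these lines can lean on the pipelining feature macro that libpq 14+ ships in libpq-fe.h. This is a sketch, not the PR's actual guard; the `PGCOPYDB_USE_PIPELINE` flag name is hypothetical:

```
/* Sketch: compile-time gate for pipeline support. PostgreSQL 14's
 * libpq-fe.h defines LIBPQ_HAS_PIPELINING, so builds against older
 * client libraries fall back to the existing synchronous code path. */
#include <libpq-fe.h>

#ifdef LIBPQ_HAS_PIPELINING
#define PGCOPYDB_USE_PIPELINE 1    /* hypothetical feature flag */
#else
#define PGCOPYDB_USE_PIPELINE 0
#endif
```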
Force-pushed from f36aead to 9bfbc58.
@dimitri I've added this in the recent commit.
This would make the pipeline sync for each commit and reduce the overall throughput of the pipeline implementation. It would essentially restrict pipeline mode to use within a txn. We have exactly the same implementation on our internal fork, which branched out around 0.13. The throughput is not so great when you have lots of txns with a single DML statement. Since we already do …
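For context, the per-commit sync under discussion boils down to queueing COMMIT and then a libpq synchronization point back to back in the same pipeline. A sketch against the raw libpq API, with error handling trimmed:

```
/* Sketch: queue COMMIT followed by a pipeline sync point; results are
 * later drained with PQgetResult() up to PGRES_PIPELINE_SYNC. Done per
 * transaction, this forces a round trip at every commit, which is the
 * throughput cost described above. */
if (PQsendQueryParams(conn, "COMMIT", 0, NULL, NULL, NULL, NULL, 0) != 1 ||
	PQpipelineSync(conn) != 1)
{
	/* connection-level failure while queueing the commands */
}
```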
Here are more details about the overhead caused by the current LSN tracking implementation. This is kind of a micro benchmark where it uses only …
With pipeline mode, apply took 4m 34s to ingest 122,987 records, i.e. 448 records/sec. If we dissect those steps further, the DML inserts took (06:06:29.934 - 06:05:02.982) => 1m 27s, i.e. 1,413 records/sec. The pgcopydb sentinel took (06:09:37.642 - 06:06:29.934) => 3m 8s. You can find the CPU profiling taken during the above benchmark here: https://pprof.me/46714b5a2a244609af5e72e035427a5c/ The profile shows that a considerable amount (~80%) of the time is spent in stream_apply_sync_sentinel and stream_apply_track_insert_lsn. Is there an opportunity to optimize stream_apply_sync_sentinel & stream_apply_track_insert_lsn? Probably yes; I didn't really spend time on that, but should we? Why not make replication_origin work and simply use replication_origin_progress as replay_lsn?
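If the replication-origin route pans out, reading the progress back is a single catalog function call. A sketch on the dedicated synchronous connection; the origin name "pgcopydb" is an illustrative placeholder:

```
/* Sketch: fetch pg_replication_origin_progress() to use as replay_lsn.
 * The second argument (true) asks for the flushed position. */
const char *sql = "select pg_replication_origin_progress($1, true)";
const char *paramValues[1] = { "pgcopydb" };  /* illustrative origin name */

PGresult *res = PQexecParams(conn, sql, 1, NULL, paramValues, NULL, NULL, 0);

if (PQresultStatus(res) == PGRES_TUPLES_OK && !PQgetisnull(res, 0, 0))
{
	const char *replayLSN = PQgetvalue(res, 0, 0);  /* e.g. "0/15E7F28" */
	/* ... report replayLSN as the sentinel replay_lsn ... */
}

PQclear(res);
```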
First round of review is in: lots of naming issues, the build-time vs run-time thing with libpq 14 for pipeline mode, and not much after that. I understand why we need to stay in pipeline mode on the connection and can't have sync queries in there, but I failed to find a comment explaining why we need two connections now, what they are used for, etc. It's also a problem with the PGSQL client connection names.
src/bin/pgcopydb/pgsql.c (outdated):

```
/*
 * pgsql_pipeline_enter enables the pipeline mode in the given PGSQL
 * connection. It also sets the connection to non-blocking mode.
 */
bool
pgsql_pipeline_enter(PGSQL *pgsql)
```
I think I would prefer another function name, such as pgsql_enable_pipeline_mode.
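Whatever the final name, the body amounts to flipping the connection to non-blocking mode and entering pipeline mode. A minimal sketch on a raw PGconn, with pgcopydb's PGSQL wrapper omitted (libpq >= 14 assumed):

```
#include <stdbool.h>
#include <libpq-fe.h>

/* Sketch: enable libpq pipeline mode. Non-blocking mode keeps queued
 * commands from deadlocking on a full send buffer. */
bool
pgsql_enable_pipeline_mode(PGconn *conn)
{
	if (PQsetnonblocking(conn, 1) == -1)
	{
		return false;
	}

	if (PQenterPipelineMode(conn) != 1)
	{
		return false;
	}

	return PQpipelineStatus(conn) == PQ_PIPELINE_ON;
}
```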
src/bin/pgcopydb/pgsql.c (outdated):

```
if (!is_response_ok(res))
{
	(void) pgcopy_log_error(pgsql, res, "Read after pipeline sync failed");
```
Can we do better in terms of error message here?
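One way to make the failure more descriptive is to surface the server's own error while draining results up to the sync point. A sketch using plain libpq calls rather than pgcopydb's helpers; `pipeline_drain` is an illustrative name:

```
#include <stdbool.h>
#include <stdio.h>
#include <libpq-fe.h>

/* Sketch: drain every pending result up to PGRES_PIPELINE_SYNC and
 * report the server's error message instead of a generic string. */
static bool
pipeline_drain(PGconn *conn)
{
	bool ok = true;

	for (;;)
	{
		PGresult *res = PQgetResult(conn);

		if (res == NULL)
		{
			continue;  /* NULL separates results of consecutive commands */
		}

		ExecStatusType status = PQresultStatus(res);

		if (status == PGRES_PIPELINE_SYNC)
		{
			PQclear(res);
			return ok;  /* every queued command is accounted for */
		}

		if (status == PGRES_FATAL_ERROR || status == PGRES_PIPELINE_ABORTED)
		{
			/* include the server's message rather than a generic one */
			fprintf(stderr, "pipeline command failed: %s",
					PQresultErrorMessage(res));
			ok = false;
		}

		PQclear(res);
	}
}
```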
Force-pushed from 9bfbc58 to 5be4b2a.
Looks like we're almost there, thanks again!
Our existing implementation is basic: it executes a statement and waits for its result to arrive before issuing the next statement. We already improved DML statements targeting the same table within a transaction by coalescing them into a multi-value insert statement, but that only benefits logical decoding transactions which have multiple DML statements within a txn. However, we have encountered cases where the transaction had only one DML statement. This kind of txn is slower because statements are executed sequentially from the client's perspective.

Here is how a single statement transaction is executed now:

1. Execute "BEGIN"
2. Execute a bunch of "SET" statements related to replication session setup
3. Prepare the DML statement
4. Execute the prepared statement with values
5. Execute the procedure to update the replication origin progress
6. Execute "COMMIT"
7. Map the target's current insert LSN to the commit LSN for feedback reporting (i.e. the sentinel replay_lsn)

**Solution**: Use the [pipeline API](https://www.postgresql.org/docs/current/libpq-pipeline-mode.html) from the libpq client library.

The proposed implementation enters pipeline mode as soon as a new libpq connection is created for the ld_apply/ld_replay process. All statements are executed on the pipeline by default, except for the few statements which need a response immediately (e.g. step 7). A dedicated connection serves step 7, because it needs a synchronous response.

The following functions are called on the target PGSQL handle from ld_apply & ld_replay:

* pgsql_begin
* pgsql_set_gucs
* pgsql_execute
* pgsql_replication_origin_xact_setup
* pgsql_prepare
* pgsql_execute_prepared
* pgsql_current_wal_insert_lsn
* pgsql_current_wal_flush_lsn

Among all of the above functions, only pgsql_current_wal_insert_lsn and pgsql_current_wal_flush_lsn return values; the other functions are write only. The idea is to have two PGSQL connection handles: one for all write activity, which can go through the pipeline, and another one used for reading.

The pipeline connection has to be synced/drained at some point to avoid accumulating results on the server and client, which would end up eating lots of heap memory. The current implementation syncs based on a time interval (i.e. every 1s). There are other methods, like statement/txn count based sync, which may or may not be more efficient.

The following commands can be used to generate load to understand the performance improvement made by this commit:

```
CREATE TABLE metrics (
    time TIMESTAMP NOT NULL,
    name TEXT,
    id NUMERIC,
    value FLOAT
);
```

```
-- insert_metric.sql
\set id random(1, 1000000)
\set value random(0,100)
INSERT INTO metrics (time, name, value) VALUES (NOW(), 'metric_' || :id, :value);
```

```
pgbench -n -c 40 -j 1 -t 10000 -f insert_metric.sql $SOURCE
```

- `-c` number of database connections to utilize (i.e. server side concurrency)
- `-j` number of threads to create on the client machine (i.e. client side concurrency)

Direct ingestion into the target with a single connection (synchronous_commit=off):

```
pgbench -n -c 1 -j 1 -t 10000 -f insert_metric.sql $TARGET
```

```
tps = 1177 (without initial connection time)
```

Apply throughput without pipeline mode:

```
tps = 175
```

Apply throughput with pipeline mode:

```
tps = 1652
```

**This commit improves the single statement txn throughput by 10x.**

Ideally, we should aim to get performance numbers close to direct ingestion (i.e. ~1200 txn/s). We are performing 40% better than that baseline in this iteration. However, we can aim higher, as in a real system more than one connection will be utilized. We can't really race against multiple connections doing steady ingestion at around 1000 txn/s per connection, but let's optimize single connection throughput to the max! This change will be a foundation for the future improvements which steer towards that:

1. Optimize step 2: instead of executing a bunch of SET statements for every txn, run them once at the beginning of the session.
2. Optimize step 7: probably we can simply use pg_replication_origin_progress as replay_lsn?

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Force-pushed from 2ccaa5c to a935f3d.
Thanks a lot @dimitri for reviewing and providing detailed review comments. Much appreciated! Next, I'm planning to change the Dockerfile to install a newer version of the postgres client (14?) libraries to enable pipeline mode by default.
Sounds good. I have also been wondering if switching the copy protocol calls that we do to asynchronous would yield any performance improvement... would that be an area you'd be willing to investigate?
@dimitri Do you mean the copy protocol used for the initial data copying?
I was thinking about …
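For reference, the asynchronous variant of the COPY data path in libpq looks roughly like this. A sketch under the assumption that a COPY IN is already in progress; the helper name is illustrative and the writability wait is elided to a comment:

```
#include <stdbool.h>
#include <stdio.h>
#include <libpq-fe.h>

/* Sketch: send one chunk of COPY data without blocking. With
 * PQsetnonblocking(conn, 1), PQputCopyData returns 0 instead of
 * blocking when the send buffer is full. */
static bool
copy_send_chunk(PGconn *conn, const char *buffer, int len)
{
	int rc;

	/* rc == 0 means "would block": wait for PQsocket(conn) to become
	 * writable (select()/poll()) and retry instead of stalling */
	while ((rc = PQputCopyData(conn, buffer, len)) == 0)
	{
		/* e.g. select()/poll() on PQsocket(conn) here */
	}

	if (rc == -1)
	{
		fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
		return false;
	}

	return true;
}
```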
We have implemented pipeline mode in dimitri#704, but it is not yet enabled by default as the current docker image has PG13. Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Our existing implementation is basic: it executes a statement and waits for its result to arrive before issuing the next statement. We already improved DML statements targeting the same table within a transaction by coalescing them into a multi-value insert statement, but that is beneficial only for a transaction which has multiple DML statements.
However, we have encountered cases where the transaction had only one DML statement. This kind of txn is slower because statements are executed sequentially from the client's perspective, and network round trip latency is added for each statement. For example, with a network round trip of 10ms, executing 100 statements adds 1000ms (1s) to the overall response time.
Here is how a single statement transaction is executed now:

1. Execute "BEGIN"
2. Execute a bunch of "SET" statements related to replication session setup
3. Prepare the DML statement
4. Execute the prepared statement with values
5. Execute the procedure to update the replication origin progress
6. Execute "COMMIT"
7. Map the target's current insert LSN to the commit LSN for feedback reporting (i.e. the sentinel replay_lsn)
Solution: Use pipeline mode from libpq. It facilitates sending commands without waiting for their results.

The goal of this implementation is to reduce network latency as much as possible while applying logical messages.

The proposed implementation enters pipeline mode as soon as a new libpq connection is created for the ld_apply/ld_replay process. All statements are executed on the pipeline by default, except for the few statements which need a response immediately (e.g. step 7). A dedicated connection serves step 7, because it needs a synchronous response.
The following functions are called on the target PGSQL handle from ld_apply & ld_replay:

- pgsql_begin
- pgsql_set_gucs
- pgsql_execute
- pgsql_replication_origin_xact_setup
- pgsql_prepare
- pgsql_execute_prepared
- pgsql_current_wal_insert_lsn
- pgsql_current_wal_flush_lsn

Among all of the above functions, only pgsql_current_wal_insert_lsn and pgsql_current_wal_flush_lsn return values; the other functions are write only.
The idea is to have two PGSQL connection handles: one for all write activity, which can go through the pipeline, and another one used for reading.
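A sketch of that split, with illustrative names:

```
/* Sketch: one pipelined connection carries all the write traffic,
 * a second plain connection answers the synchronous LSN lookups. */
typedef struct ApplyConnections
{
	PGconn *write;  /* pipeline mode: BEGIN, SET, DML, COMMIT */
	PGconn *read;   /* blocking mode: pg_current_wal_insert_lsn() etc. */
} ApplyConnections;
```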
The pipeline connection has to be synced/drained at some point to avoid accumulating results on the server and client, which would end up eating lots of heap memory. The current implementation syncs based on a time interval (i.e. every 1s). There are other methods, like statement/txn count based sync, which may or may not be more efficient. The sketch below illustrates the time-based policy.
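A sketch of the 1s policy with illustrative names, reusing the `pipeline_drain` helper sketched earlier in this thread:

```
#include <time.h>

#define SYNC_INTERVAL_SECS 1    /* sync/drain the pipeline every second */

/* Sketch: time-based pipeline flushing. Issues PQpipelineSync() and
 * drains results at most once per interval, bounding the results
 * buffered on both the server and the client. */
static bool
maybe_sync_pipeline(PGconn *conn, time_t *lastSync)
{
	time_t now = time(NULL);

	if (now - *lastSync < SYNC_INTERVAL_SECS)
	{
		return true;  /* keep queueing, no sync point needed yet */
	}

	*lastSync = now;

	if (PQpipelineSync(conn) != 1)
	{
		return false;
	}

	return pipeline_drain(conn);  /* drain helper sketched earlier */
}
```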
The following pgbench-driven load can be used to understand the performance improvement made by this commit (the table, script, and full command lines are in the commit message above):

- `-c` number of database connections to utilize (i.e. server side concurrency)
- `-j` number of threads to create on the client machine (i.e. client side concurrency)

The direct ingestion baseline uses a single connection against the target with synchronous_commit=off.
This commit improves the single statement txn throughput by 10x.

Ideally, we should aim to get performance numbers close to direct ingestion (i.e. ~1200 txn/s). We are performing 40% better than that baseline in this iteration. However, we can aim higher, as in a real system more than one connection will be utilized. We can't really race against multiple connections doing steady ingestion at around 1000 txn/s per connection, but let's optimize single connection throughput to the max!

This change will be a foundation for the future improvements which steer towards that.
TODO