Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory usage surges during redo apply recently #10900

Closed
fubinzh opened this issue Apr 11, 2024 · 3 comments · Fixed by #10904
Closed

Memory usage surges during redo apply recently #10900

fubinzh opened this issue Apr 11, 2024 · 3 comments · Fixed by #10904
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. severity/major type/enhancement The issue or PR belongs to an enhancement.

Comments

@fubinzh
Copy link

fubinzh commented Apr 11, 2024

What did you do?

During recent testing we found that cdc consumes more memory during redo apply and results in OOM issue.

What did you expect to see?

redo apply memory usage should be stable.

What did you see instead?

4/11 memory usage:

[root@upstream-ticdc-0 /]# /cdc  redo  apply "--sink-uri=mysql://root:@downstream-tidb.cdc-testbed-tps-7566087-1-319:4000" "--storage=s3://tmp/test-infra-redolog/redo-apply-multiple-workload-node-restart-sync-90c0bc8f-323c-41f6-b662-706b847c6d4e?access-key=xxx&secret-access-key=xxx&endpoint=http://minio-peer:9000&force-path-style=true" "--tmp-dir=/tmp/1"
[2024/04/10 11:04:13.471 +00:00] [INFO] [helper.go:54] ["init log"] [file=] [level=info]
[2024/04/10 11:04:13.475 +00:00] [INFO] [s3.go:407] ["succeed to get bucket region from s3"] ["bucket region"=us-east-1]
[2024/04/10 11:04:13.737 +00:00] [INFO] [mem_quota.go:82] ["New memory quota"] [namespace=default] [changefeed=redo-applier] [total=1073741824]
[2024/04/10 11:04:13.737 +00:00] [INFO] [redo.go:163] ["apply redo log starts"] [checkpointTs=448985877505376323] [resolvedTs=448986017647558699]
[2024/04/10 11:04:13.738 +00:00] [INFO] [s3.go:407] ["succeed to get bucket region from s3"] ["bucket region"=us-east-1]
[2024/04/10 11:04:13.738 +00:00] [INFO] [tz.go:35] ["Use the timezone of the TiCDC server machine"] [timezoneName=System] [timezone=UTC]
[2024/04/10 11:04:13.738 +00:00] [WARN] [config.go:381] ["Because time-zone is not specified, the timezone of the TiCDC server will be used. We recommend that you specify the time-zone explicitly. Please make sure that the timezone of the TiCDC server, sink-uri and the downstream datab
ase are consistent. If the downstream database does not load the timezone information, you can refer to https://dev.mysql.com/doc/refman/8.0/en/mysql-tzinfo-to-sql.html."] [timezone=UTC]
[2024/04/10 11:04:13.739 +00:00] [INFO] [s3.go:407] ["succeed to get bucket region from s3"] ["bucket region"=us-east-1]
[2024/04/10 11:04:13.745 +00:00] [INFO] [db_helper.go:173] ["sink uri is configured"] [dsn="root:******@tcp(downstream-tidb.cdc-testbed-tps-7566087-1-319:4000)/?interpolateParams=true&multiStatements=true&allow_auto_random_explicit_insert=1&charset=utf8mb4&foreign_key_checks=0&maxAllow
edPacket=0&readTimeout=2m&sql_mode=%22ALLOW_INVALID_DATES%2CNO_ENGINE_SUBSTITUTION%2CONLY_FULL_GROUP_BY%2CIGNORE_SPACE%2CNO_AUTO_VALUE_ON_ZERO%22&tidb_enable_external_ts_read=%22OFF%22&tidb_placement_mode=%22ignore%22&tidb_txn_mode=optimistic&time_zone=%22UTC%22&timeout=2m&transaction_
isolation=%22READ-COMMITTED%22&writeTimeout=2m"]
[2024/04/10 11:04:13.749 +00:00] [INFO] [mysql.go:197] ["MySQL backends is created"] [changefeed=.] [workerCount=16] [forceReplicate=false]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=0]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=1]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=3]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=2]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=5]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=4]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=8]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=6]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=7]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=9]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=10]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=11]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=13]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=12]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=14]
[2024/04/10 11:04:13.749 +00:00] [INFO] [worker.go:105] ["Transaction dmlSink worker starts"] [changefeedID=.] [workerID=15]
[2024/04/10 11:04:13.750 +00:00] [INFO] [tz.go:35] ["Use the timezone of the TiCDC server machine"] [timezoneName=System] [timezone=UTC]
[2024/04/10 11:04:13.750 +00:00] [WARN] [config.go:381] ["Because time-zone is not specified, the timezone of the TiCDC server will be used. We recommend that you specify the time-zone explicitly. Please make sure that the timezone of the TiCDC server, sink-uri and the downstream datab
ase are consistent. If the downstream database does not load the timezone information, you can refer to https://dev.mysql.com/doc/refman/8.0/en/mysql-tzinfo-to-sql.html."] [timezone=UTC]
[2024/04/10 11:04:13.755 +00:00] [INFO] [db_helper.go:173] ["sink uri is configured"] [dsn="root:******@tcp(downstream-tidb.cdc-testbed-tps-7566087-1-319:4000)/?interpolateParams=true&multiStatements=true&allow_auto_random_explicit_insert=1&charset=utf8mb4&foreign_key_checks=0&maxAllow
edPacket=0&readTimeout=2m&sql_mode=%22NO_ENGINE_SUBSTITUTION%2CONLY_FULL_GROUP_BY%2CIGNORE_SPACE%2CNO_AUTO_VALUE_ON_ZERO%2CALLOW_INVALID_DATES%22&tidb_enable_external_ts_read=%22OFF%22&tidb_placement_mode=%22ignore%22&tidb_txn_mode=optimistic&time_zone=%22UTC%22&timeout=2m&transaction_
isolation=%22READ-COMMITTED%22&writeTimeout=2m"]
[2024/04/10 11:04:13.758 +00:00] [INFO] [mysql_ddl_sink.go:105] ["MySQL DDL sink is created"] [namespace=] [changefeed=]
[2024/04/10 11:04:14.135 +00:00] [INFO] [file.go:114] ["succeed to download and sort redo logs"] [type=ddl] [duration=397.463695ms]
^B%[2024/04/10 11:04:41.134 +00:00] [INFO] [file.go:114] ["succeed to download and sort redo logs"] [type=row] [duration=27.396073836s]

Killed

img_v3_029q_ecebff40-127f-428f-8811-a0cb7874e0bg

4/3 memory usage:
img_v3_029r_43d63712-4bb1-4930-bad1-3b97527d143g

Versions of the cluster

Release Version: v8.1.0-alpha
Git Commit Hash: 8a3b776
Git Branch: heads/refs/tags/v8.1.0-alpha
UTC Build Time: 2024-04-09 11:36:29
Go Version: go version go1.21.6 linux/amd64
Failpoint Build: false

@fubinzh fubinzh added area/ticdc Issues or PRs related to TiCDC. type/bug The issue is confirmed as a bug. labels Apr 11, 2024
@fubinzh
Copy link
Author

fubinzh commented Apr 11, 2024

/severity major

@fubinzh
Copy link
Author

fubinzh commented Apr 11, 2024

/label affects-8.1

@ti-chi-bot ti-chi-bot bot added affects-8.1 This bug affects the 8.1.x(LTS) versions. and removed may-affects-8.1 labels Apr 11, 2024
@CharlesCheung96
Copy link
Contributor

The root cause of the OOM problem is the inappropriate gc policy, which can be addressed when the GOMEMLIMIT is limited to 8GiB.
img_v3_029r_0fec910d-edf9-4cfe-99b1-6a77238ca64g

In addition, we found some operations that may allocate too much memory, and we will fix them later:

img_v3_029r_c79a92d2-fe1a-4593-9495-54035cfa4a7g

@CharlesCheung96 CharlesCheung96 added affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. and removed may-affects-5.4 may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 labels Apr 12, 2024
@CharlesCheung96 CharlesCheung96 added affects-6.5 This bug affects the 6.5.x(LTS) versions. and removed affects-6.1 This bug affects the 6.1.x(LTS) versions. labels Apr 27, 2024
@flowbehappy flowbehappy added type/enhancement The issue or PR belongs to an enhancement. and removed type/bug The issue is confirmed as a bug. labels Apr 30, 2024
hicqu added a commit to ti-chi-bot/tiflow that referenced this issue Jun 12, 2024
commit c092599
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 12 00:26:59 2024 +0800

    pkg/config, sink(ticdc): support output raw change event for mq and cloud storage sink (pingcap#11226) (pingcap#11290)

    close pingcap#11211

commit 3426e46
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 19:40:29 2024 +0800

    puller(ticdc): fix wrong update splitting behavior after table scheduling (pingcap#11269) (pingcap#11282)

    close pingcap#11219

commit 2a28078
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 16:40:37 2024 +0800

    mysql(ticdc): remove error filter when check isTiDB in backend init (pingcap#11214) (pingcap#11261)

    close pingcap#11213

commit 2425d54
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 16:40:30 2024 +0800

    log(ticdc): Add more error query information to the returned error to facilitate users to know the cause of the failure (pingcap#10945) (pingcap#11257)

    close pingcap#11254

commit 053cdaf
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 15:34:30 2024 +0800

    cdc: log slow conflict detect every 60s (pingcap#11251) (pingcap#11287)

    close pingcap#11271

commit 327ba7b
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 11:42:00 2024 +0800

    redo(ticdc): return internal error in redo writer (pingcap#11011) (pingcap#11091)

    close pingcap#10124

commit d82ae89
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Mon Jun 10 22:28:29 2024 +0800

    ddl_puller (ticdc): handle dorp pk/uk ddl correctly (pingcap#10965) (pingcap#10981)

    close pingcap#10890

commit f15bec9
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Fri Jun 7 16:16:28 2024 +0800

    redo(ticdc): enable pprof and set memory limit for redo applier (pingcap#10904) (pingcap#10996)

    close pingcap#10900

commit ba50a0e
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 19:58:26 2024 +0800

    test(ticdc): enable sequence test (pingcap#11023) (pingcap#11037)

    close pingcap#11015

commit 94b9897
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 17:08:56 2024 +0800

    mounter(ticdc): timezone fill default value should also consider tz. (pingcap#10932) (pingcap#10946)

    close pingcap#10931

commit a912d33
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 10:49:25 2024 +0800

    mysql (ticdc): Improve the performance of the mysql sink by refining the transaction event batching logic (pingcap#10466) (pingcap#11242)

    close pingcap#11241

commit 6277d9a
Author: dongmen <20351731+asddongmen@users.noreply.github.com>
Date:   Wed May 29 20:13:22 2024 +0800

    kvClient (ticdc): revert e5999e3 to remove useless metrics (pingcap#11184)

    close pingcap#11073

commit 54e93ed
Author: dongmen <20351731+asddongmen@users.noreply.github.com>
Date:   Wed May 29 17:43:22 2024 +0800

    syncpoint (ticdc): make syncpoint support base64 encoded password (pingcap#11162)

    close pingcap#10516

commit 0ba9329
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed May 29 09:07:21 2024 +0800

    (redo)ticdc: fix the event orderliness in redo log (pingcap#11117) (pingcap#11180)

    close pingcap#11096

Signed-off-by: qupeng <qupeng@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. severity/major type/enhancement The issue or PR belongs to an enhancement.
Projects
Development

Successfully merging a pull request may close this issue.

3 participants