-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disagg: Fix the issue that lifetime of read-snapshots in WN may be longer than expected #9297
Disagg: Fix the issue that lifetime of read-snapshots in WN may be longer than expected #9297
Conversation
/retest |
3cbfaa7
to
09a2ccd
Compare
36d5d78
to
9dc0288
Compare
…lizing the snapshot data.
7769eb6
to
c1e5333
Compare
if (request.page_ids_size() == 0 && !needFetchMemTableSet()) | ||
return; | ||
|
||
// No matter all delta data is cached or not, call FetchDisaggPages to release snapshot in WN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this cause optimizations like in https://github.com/tidbcloud/tiflash-cse/pull/236/files get reverted? (The PR skips occupySpace to reduce lock contention)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JinheLin @breezewish
What about adding another unary RPC method "CancelDisaggTask". When the MPP request on CN is finished or canceled, CN calls "CancelDisaggTask" to let the WN release the whole snapshot. Instead of sending "fetchPages" segment by segment?
This can
- decouple the optimization of 'fetchPages' and release WN snapshot
- reduce the total RPC calls between CN and RN
- handle the situation when the request is canceled in the CN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to this code: https://github.com/tidbcloud/tiflash-cse/pull/236/files#diff-415598e006988b3cadcac5e3a4eda7fdbed5ac29cc1ce8d11211a5d26a4d8449 ?
If there are no ColumnFileTiny to fetch, skip here: https://github.com/pingcap/tiflash/blob/master/dbms/src/Storages/DeltaMerge/SegmentReadTask.cpp#L332.
Code in https://github.com/pingcap/tiflash/pull/9297/files/c1e5333660e180072ff658e2cd45349b6d2db28b#diff-9fc375427641c2044d5949241ef2f173ee8b4d24e252216a6cba0018ad5c0413L545-L547 means all pages are cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JinheLin Thanks, it looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
RPCs that can be reduced is when all pages are cached and there is no
ColumnFileInMemory
since v8.1, because ColumnFileInMemory is not cached.- If there are no write, segments are stable-only, snapshots are released quickly in WN.
- If there are some writes, the probability that there are no
ColumnFileInMemory
is low, so these RPCs likely cannot be reduced.
-
The coupling problem also looks fine because the logic is simple, and this code does not spread outward.
-
The situation that
CancelDisaggTask
to handle is abnormal requests.CancelDisaggTask
can only serve as a fallback. If a request takes a long time to execute, the lifecycle of the snapshot will still be long than expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another optimization is use just one stream RPC to fetch pages of all segments. But this should be quite troublesome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JaySon-Huang, Lloyd-Pottiger The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
@JinheLin: Your PR was out of date, I have automatically updated it for you. At the same time I will also trigger all tests for you: /run-all-tests
If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch |
…nger than expected (#9297) (#9317) close #9298 1. In WN, release snapshots of SegmentReadTasks that do not have ColumnFileTiny and ColumnFileInMemory. 2. In CN, no matter all ColumnFileTinys and ColumnFileInMemorys are cached or not, call FetchDisaggPages to release snapshot in WN. Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Co-authored-by: jinhelin <linjinhe33@gmail.com>
…nger than expected (#9297) (#9316) close #9298 1. In WN, release snapshots of SegmentReadTasks that do not have ColumnFileTiny and ColumnFileInMemory. 2. In CN, no matter all ColumnFileTinys and ColumnFileInMemorys are cached or not, call FetchDisaggPages to release snapshot in WN. Co-authored-by: jinhelin <linjinhe33@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
What problem does this PR solve?
Issue Number: close #9298
Problem summary:
Snapshot of each SegmentReadTask will be released after FetchDisaggPages is called. But for some segments with no ColumnFileTiny or ColumnFileInMemory to fetch, FetchDisaggPages will never be called.
What is changed and how it works?
Notes:
Check List
Tests
Side effects
Documentation
Release note