Support cancel in AsyncTasks #8581

Merged
merged 24 commits into from
Dec 26, 2023

Conversation

CalvinNeo
Member

@CalvinNeo CalvinNeo commented Dec 25, 2023

What problem does this PR solve?

Issue Number: close #8382

Problem Summary:

AsyncTasks is a very simple manager for running tasks on a non-work-stealing thread pool and fetching their results. Each task can have only one producer and one consumer.

This PR adds support for canceling running or queued tasks.

Cancel is added to AsyncTasks, so the state transitions become:

  • NotScheduled -> InQueue, Running, NotScheduled
  • InQueue -> Running
  • Running -> Finished, NotScheduled(canceled)
  • Finished -> NotScheduled(fetched)
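
The transition list above can be read as a small state machine. The following is only a sketch of that reading: the `TaskState` enumerator names are taken from the list, while `canTransition` is a hypothetical helper and not part of AsyncTasks. It encodes the list literally, so it does not add any transition the description does not name.

```cpp
#include <cassert>

// States as named in the PR description.
enum class TaskState { NotScheduled, InQueue, Running, Finished };

// Hypothetical helper: true if `from -> to` appears in the transition list.
constexpr bool canTransition(TaskState from, TaskState to)
{
    switch (from)
    {
    case TaskState::NotScheduled:
        // May stay NotScheduled, be queued, or start running.
        return to == TaskState::InQueue || to == TaskState::Running
            || to == TaskState::NotScheduled;
    case TaskState::InQueue:
        return to == TaskState::Running;
    case TaskState::Running:
        // Running -> NotScheduled is the cancel path.
        return to == TaskState::Finished || to == TaskState::NotScheduled;
    case TaskState::Finished:
        // Finished -> NotScheduled happens once the result is fetched.
        return to == TaskState::NotScheduled;
    }
    return false;
}

// Checked at compile time: cancel moves a running task back to NotScheduled.
static_assert(canTransition(TaskState::Running, TaskState::NotScheduled));
```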

Callers must guarantee that a task is no longer accessed once it has been canceled or its result has been fetched.

AsyncTasks is designed to have one consumer. Once a task is canceled, it is unregistered from AsyncTasks immediately, so the caller has to keep track of the state and avoid canceling the same task or fetching its result again.

AsyncTasks will panic if another task is registered with the same key. However, if a task is canceled asynchronously, the worker thread may not reach a cancel checkpoint immediately. If a new task is added in the meantime, two tasks could be manipulating the same shared resources. To avoid this situation, block and wait for the cancel to complete.
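
To make the blocking-cancel argument concrete, here is a much reduced sketch. `TinyTasks`, `add`, and `blockedCancel` are hypothetical names, not the AsyncTasks API. It assumes one `std::thread` per task instead of a thread pool, throws where the real code would panic, and has no result fetching. The point it illustrates is that the key stays registered until the worker has actually quit, so a second task with the same key cannot start early.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for AsyncTasks, keyed by int.
class TinyTasks
{
public:
    ~TinyTasks()
    {
        // Cancel and join whatever is still registered.
        for (auto & kv : tasks)
        {
            kv.second.canceled->store(true);
            if (kv.second.worker.joinable())
                kv.second.worker.join();
        }
    }

    // Registers and starts a task; a duplicate key is rejected.
    // `body` is called with the cancel flag and should poll it.
    template <typename F>
    void add(int key, F body)
    {
        std::lock_guard<std::mutex> lock(mtx);
        if (tasks.count(key) != 0)
            throw std::logic_error("duplicate key");
        Entry entry;
        entry.canceled = std::make_shared<std::atomic<bool>>(false);
        entry.worker = std::thread([body, flag = entry.canceled]() mutable { body(*flag); });
        tasks.emplace(key, std::move(entry));
    }

    // Requests cancellation and waits for the worker to quit.
    void blockedCancel(int key)
    {
        std::thread worker;
        {
            std::lock_guard<std::mutex> lock(mtx);
            auto it = tasks.find(key);
            if (it == tasks.end())
                throw std::out_of_range("no such task");
            if (it->second.canceled->load())
                throw std::logic_error("canceled twice");
            it->second.canceled->store(true);
            worker = std::move(it->second.worker);
        }
        // The key is still registered while we wait, so it cannot be reused yet.
        worker.join();
        std::lock_guard<std::mutex> lock(mtx);
        tasks.erase(key);
    }

private:
    struct Entry
    {
        std::shared_ptr<std::atomic<bool>> canceled;
        std::thread worker;
    };

    std::mutex mtx;
    std::map<int, Entry> tasks;
};
```

Joining outside the lock keeps other keys usable while one cancel is waiting. That is a design choice of this sketch rather than something the PR description states.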

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 25, 2023
@CalvinNeo
Member Author

/run-all-tests

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Contributor

@Lloyd-Pottiger Lloyd-Pottiger left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Dec 25, 2023
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/run-all-tests

CalvinNeo and others added 2 commits December 25, 2023 16:32
f
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/run-all-tests

Comment on lines +343 to +348
auto maybe_elapsed = fap_ctx->tasks_trace->queryElapsed(region_id);
if unlikely (!maybe_elapsed.has_value())
{
    GET_METRIC(tiflash_fap_task_result, type_failed_cancel).Increment();
    LOG_INFO(log, "FAP is canceled at beginning region_id={} new_peer_id={}", region_id, new_peer_id);
}
Contributor

In what situation will we run into this branch?

Member Author

@CalvinNeo CalvinNeo Dec 25, 2023

When a cancel happens in addTaskWithCancel after the if (cancel_handle->canceled()) check but before the inner invocable object (*p)() is actually called (that is, after addTaskWithCancel has returned and released the lock).

The chance of this is very small.
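
The branch under discussion can be illustrated with a deterministic sketch. Everything here is an assumption for illustration: `Trace`, `runBody`, and `Outcome` are hypothetical, and this `queryElapsed` takes an explicit timestamp, which the real one in the quoted snippet does not. The idea is only that the task body first looks up its own entry and treats a missing entry as "canceled at the beginning".

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>

// Hypothetical registry: maps a task key to its enqueue timestamp (ms).
class Trace
{
public:
    void add(uint64_t key, uint64_t now_ms)
    {
        std::lock_guard<std::mutex> lock(mtx);
        start[key] = now_ms;
    }

    // Canceling unregisters the task immediately.
    void cancel(uint64_t key)
    {
        std::lock_guard<std::mutex> lock(mtx);
        start.erase(key);
    }

    // Returns the elapsed time, or std::nullopt if the key is not registered.
    std::optional<uint64_t> queryElapsed(uint64_t key, uint64_t now_ms) const
    {
        std::lock_guard<std::mutex> lock(mtx);
        auto it = start.find(key);
        if (it == start.end())
            return std::nullopt;
        return now_ms - it->second;
    }

private:
    mutable std::mutex mtx;
    std::map<uint64_t, uint64_t> start;
};

enum class Outcome { Ran, CanceledAtBeginning };

// First step of the task body: check that our own entry still exists.
Outcome runBody(const Trace & trace, uint64_t key, uint64_t now_ms)
{
    auto maybe_elapsed = trace.queryElapsed(key, now_ms);
    if (!maybe_elapsed.has_value())
        return Outcome::CanceledAtBeginning; // canceled between enqueue and start
    return Outcome::Ran;
}
```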

@CalvinNeo
Member Author

/run-unit-test

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
/// `throw_if_noexist` controls whether to throw.
/// NOTE: The task element will be removed after calling this function.
template <typename ResultDropper>
TaskState asyncCancelTask(Key k, ResultDropper result_dropper, bool throw_if_noexist)
Member Author

It could also return std::future<std::optional<R>> to support an await-like call pattern. However, we avoid making it too complicated. If you want to wait until the worker thread quits, just use blockedCancelRunningTask.
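
For reference, the await-like alternative mentioned here might look roughly like the following. This is a sketch of the idea that was not adopted. `CancelableTask` and `asyncCancel` are hypothetical, the result type is fixed to `int` for brevity, and a plain `std::thread` stands in for the thread pool.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <optional>
#include <thread>

// Hypothetical task whose cancel call returns a future the caller may await.
class CancelableTask
{
public:
    // `work` receives the cancel flag and returns a value,
    // or std::nullopt if it stopped early.
    template <typename F>
    explicit CancelableTask(F work)
        : canceled(std::make_shared<std::atomic<bool>>(false))
    {
        std::promise<std::optional<int>> promise;
        result = promise.get_future();
        worker = std::thread(
            [work, flag = canceled, p = std::move(promise)]() mutable {
                p.set_value(work(*flag));
            });
    }

    ~CancelableTask()
    {
        canceled->store(true);
        if (worker.joinable())
            worker.join();
    }

    // Non-blocking: sets the flag and hands back the future.
    // Call at most once; the future is moved out.
    std::future<std::optional<int>> asyncCancel()
    {
        canceled->store(true);
        return std::move(result);
    }

private:
    std::shared_ptr<std::atomic<bool>> canceled;
    std::future<std::optional<int>> result;
    std::thread worker;
};
```

Calling `get()` on the returned future blocks until the worker has quit, which is the same guarantee a blocking cancel gives, at the cost of one more type for callers to manage.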

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/run-all-tests

Comment on lines 226 to 227
// Only one thread can block cancel and wait.
RUNTIME_CHECK_MSG(!cancel_handle->canceled(), "Try block cancel running task twice");
Contributor

Why do we need this limitation?

Member Author

@CalvinNeo CalvinNeo Dec 26, 2023

Because AsyncTasks supports only one consumer, why would that single consumer cancel twice? I think it must be a mistake.
Checking for a duplicated key in AsyncTasks and for a duplicated cancel in blockedCancelRunningTask also guards the "one consumer" assumption.

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
f
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
}

bool isReady(Key key) const
TaskState unsafeQueryState(Key key) const
Contributor

Should this be private?

Member Author

It seems the friend declaration does not work here either.

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/run-all-tests

Contributor

@JaySon-Huang JaySon-Huang left a comment

LGTM with small comments

{
    std::unique_lock<std::mutex> l(mtx);
    auto it = tasks.find(key);
    RUNTIME_CHECK(it != tasks.end());
Contributor

Suggested change
RUNTIME_CHECK(it != tasks.end());
RUNTIME_CHECK(it != tasks.end(), key);

Member Author

@CalvinNeo CalvinNeo Dec 26, 2023

What if the key cannot be formatted?
https://en.cppreference.com/w/cpp/utility/format/formattable
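
One possible way to reconcile the suggestion with this concern is to format the key only when its type allows it. The sketch below is an assumption for illustration, not what the PR does. `IsStreamable` and `describeKey` are hypothetical, and the check is a C++17 detection of `operator<<` on `std::ostream` rather than `std::formattable` or the formatting library the project uses.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Detects whether `os << value` compiles for T.
template <typename T, typename = void>
struct IsStreamable : std::false_type
{
};

template <typename T>
struct IsStreamable<
    T,
    std::void_t<decltype(std::declval<std::ostream &>() << std::declval<const T &>())>>
    : std::true_type
{
};

// Hypothetical helper: renders a key for an error message, with a fallback
// for key types that cannot be printed.
template <typename Key>
std::string describeKey(const Key & key)
{
    if constexpr (IsStreamable<Key>::value)
    {
        std::ostringstream os;
        os << key;
        return os.str();
    }
    else
    {
        (void)key; // unused in this branch
        return "<unprintable key>";
    }
}

// Example key type with no operator<<.
struct OpaqueKey
{
    int id;
};
```

A check such as `RUNTIME_CHECK(it != tasks.end(), describeKey(key))` would then compile for any key type, assuming the macro accepts a string argument in that position.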

@ti-chi-bot ti-chi-bot bot added the lgtm label Dec 26, 2023
Contributor

ti-chi-bot bot commented Dec 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Dec 26, 2023
Contributor

ti-chi-bot bot commented Dec 26, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-12-25 05:14:13.008002137 +0000 UTC m=+1456344.045229057: ☑️ agreed by Lloyd-Pottiger.
  • 2023-12-26 09:47:38.282068063 +0000 UTC m=+1559149.319294982: ☑️ agreed by JaySon-Huang.

Comment on lines 87 to 93
void doCancel()
{
    // Use lock here to prevent losing signal.
    std::scoped_lock<std::mutex> lock(mut);
    inner.store(true);
    cv.notify_all();
}
Contributor

nit: Why would not acquiring the lock lead to a lost signal?
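
As general background on condition variables, the lost-signal case arises when the flag is an atomic written outside the mutex. A waiter can evaluate its predicate (still false), then the canceling thread stores true and notifies while nobody is blocked yet, and only then does the waiter go to sleep, missing the notification. The sketch below is a reduction of the quoted `doCancel`. The class name `CancelHandle` and the method `waitForCancel` are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical reduction of the cancel handle quoted above.
class CancelHandle
{
public:
    bool canceled() const { return inner.load(); }

    void doCancel()
    {
        // Holding `mut` means the store cannot land between a waiter's
        // predicate check and the moment it starts blocking.
        std::scoped_lock<std::mutex> lock(mut);
        inner.store(true);
        cv.notify_all();
    }

    // Waits up to `timeout` for cancellation; returns true if canceled.
    bool waitForCancel(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mut);
        // The predicate is checked while holding `mut`; the wait then
        // atomically releases `mut` and blocks.
        return cv.wait_for(lock, timeout, [this] { return inner.load(); });
    }

private:
    std::atomic<bool> inner{false};
    std::mutex mut;
    std::condition_variable cv;
};
```

With the lock taken in `doCancel`, the store happens either before the waiter checks the predicate (so it never blocks) or after it is already blocked (so the notification reaches it).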

Co-authored-by: JaySon <tshent@qq.com>
@CalvinNeo
Member Author

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 26, 2023
CalvinNeo and others added 4 commits December 26, 2023 18:02
f
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
a
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 26, 2023
@CalvinNeo
Member Author

/run-all-tests

@ti-chi-bot ti-chi-bot bot merged commit 03c3c49 into pingcap:master Dec 26, 2023
6 checks passed
windtalker pushed a commit to windtalker/tiflash that referenced this pull request Dec 27, 2023
JaySon-Huang pushed a commit to JaySon-Huang/tiflash that referenced this pull request Dec 27, 2023
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Introduce two stages FAP
3 participants