Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support blocking migration for the cluster migrate command #1418

Merged
merged 11 commits into from
May 15, 2023

Conversation

ellutionist
Copy link
Contributor

This PR closes 1350.

@ellutionist
Copy link
Contributor Author

src/commands/cmd_cluster.cc Outdated Show resolved Hide resolved
@ellutionist
Copy link
Contributor Author

ellutionist commented May 4, 2023

@torwig @PragmaTwice The reason why I use shared_ptr of SyncMigrateContext as a field of CommandClusterX is that I have to keep the lifetime of SyncMigrateContext as long as the command object.

When the slot_migrator wake up the SyncMigrateContext with a migration result status, it has to release the pointer so that no context will "occupy" the migrator any more (see here).
However, at this time point, the WriteCB have not been triggered by the event loop yet, thus the context cannot be deconstructed yet. Using a shared pointer will bind the life cycle of the SyncMigrateContext with the CommandClusterX object, which will solve this issue.

@PragmaTwice
Copy link
Member

PragmaTwice commented May 4, 2023

@ellutionist Thanks for your explanation.

I have not dived into these changes deeply, but if you just want to make lifetime of SyncMigrateContext as long as the command object, you can store an instance of SyncMigrateContext in CommandClusterX directly (class CommandClusterX { SyncMigrateContext ctx_; ... }), and pass some non-owning pointer (like raw pointer, no ownership) to the migrator.

src/commands/cmd_cluster.cc Outdated Show resolved Hide resolved
src/commands/cmd_cluster.cc Outdated Show resolved Hide resolved
src/cluster/slot_migrate.h Outdated Show resolved Hide resolved
@xiaobiaozhao
Copy link
Contributor

@ellutionist Thanks for your explanation.

I have not dived into these changes deeply, but if you just want to make lifetime of SyncMigrateContext as long as the command object, you can store an instance of SyncMigrateContext in CommandClusterX directly (class CommandClusterX { SyncMigrateContext ctx_; ... }), and pass some non-owning pointer (like raw pointer, no ownership) to the migrator.

I agree with that

@ellutionist
Copy link
Contributor Author

ellutionist commented May 5, 2023

@ellutionist Thanks for your explanation.

I have not dived into these changes deeply, but if you just want to make lifetime of SyncMigrateContext as long as the command object, you can store an instance of SyncMigrateContext in CommandClusterX directly (class CommandClusterX { SyncMigrateContext ctx_; ... }), and pass some non-owning pointer (like raw pointer, no ownership) to the migrator.

@PragmaTwice @xiaobiaozhao Thank you for the advice. This is another reasonable solution. However, there are two things to note here:

  1. The CommandClusterX does not always have a SyncMigrateContext (only when migrating with a "sync" flag). Making the context a non-pointer member of the CommandClusterX object may not make good sense here. If we decide to do this, I suggest to adopt std::optional<SyncMigrateContext>.
  2. Due to the concern of memory safety, I am not a fan of raw pointers. Of course, in this PR we could use it, but we may set up some potential risk for the future. Because not every maintainer could get to know that "theCommandClusterX object must live long enough to complete the migration". If some change is made to "how the command objects work" in the future, it may lead to crash.

@PragmaTwice
Copy link
Member

PragmaTwice commented May 5, 2023

The CommandClusterX does not always have a SyncMigrateContext (only when migrating with a "sync" flag). Making the context a non-pointer member of the CommandClusterX object may not make good sense here. If we decide to do this, I suggest to adopt std::optional.

Sure, you can use something like std::optional or std::unique_ptr.

Due to the concern of memory safety, I am not a fan of raw pointers. Of course, in this PR we could use it, but we may set up some potential risk for the future. Because not every maintainer could get to know that "theCommandClusterX object must live long enough to complete the migration". If some change is made to "how the command objects work" in the future, it may lead to crash.

I think in kvrock, we use std::shared_ptr if and only if the object need to be "shared" between multiple references, and the lifetime of this object cannot be determined statically (e.g. given an object o and two reference a and b, the lifetime of o is equal to the maximum of lifetime of a and b, which need to be determined dynamically).

So currently, just "in the future ..." seems not to be an accredited reason to me, since if this reason holds, we may need to change many raw pointer in the codebase to shared_ptr. If you just need a non-owning pointer (or called an observer/view/span, in modern C++), raw pointer is currently the available choice, since std::observer_ptr or std::optional<T&> (optional with reference type) is not in the standard.

@ellutionist
Copy link
Contributor Author

The CommandClusterX does not always have a SyncMigrateContext (only when migrating with a "sync" flag). Making the context a non-pointer member of the CommandClusterX object may not make good sense here. If we decide to do this, I suggest to adopt std::optional.

Sure, you can use something like std::optional or std::unique_ptr.

Due to the concern of memory safety, I am not a fan of raw pointers. Of course, in this PR we could use it, but we may set up some potential risk for the future. Because not every maintainer could get to know that "theCommandClusterX object must live long enough to complete the migration". If some change is made to "how the command objects work" in the future, it may lead to crash.

I think in kvrock, we use std::shared_ptr if and only if the object need to be "shared" between multiple references, and the lifetime of this object cannot be determined statically (e.g. given an object o and two reference a and b, the lifetime of o is equal to the maximum of lifetime of a and b, which need to be determined dynamically).

So currently, just "in the future ..." seems not to be an accredited reason to me, since if this reason holds, we may need to change many raw pointer in the codebase to shared_ptr. If you just need a non-owning pointer (or called an observer/view/span, in modern C++), raw pointer is currently the available choice, since std::observer_ptr or std::optional<T&> (optional with reference type) is not in the standard.

Thank you. I got the point of the difference between dynamic and static lifetime. From this point of view the share_ptr is not proper in this case. Perhaps the raw pointer is the best solution here.

@PragmaTwice PragmaTwice changed the title support sync migration Support blocking migration for the cluster migrate command May 8, 2023
@git-hulk
Copy link
Member

Hi @ellutionist Need to use ./x.py format to correct the code format

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with networking logic in kvrocks, so just some style comments


void SyncMigrateContext::EventCB(bufferevent *bev, int16_t events, void *ctx) {
auto self = reinterpret_cast<SyncMigrateContext *>(ctx);
auto &&slot_migrator = self->svr_->slot_migrator;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using auto && here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reference to std::unique_ptr<SlotMigrator>, which cannot be copied.

Technically auto& would suffice, but I personally prefer auto&& since it accepts more kinds of value (e.g. const, not this case though).

Of course, it is also okay to use auto slot_migrator = self->svr_->slot_migrator.get() (raw pointer) here. Just a choice of style.


void SyncMigrateContext::TimerCB(int, int16_t events, void *ctx) {
auto self = reinterpret_cast<SyncMigrateContext *>(ctx);
auto &&slot_migrator = self->svr_->slot_migrator;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@ellutionist
Copy link
Contributor Author

Hi @ellutionist Need to use ./x.py format to correct the code format

Hi @ellutionist Need to use ./x.py format to correct the code format

Done. It took me a while to adapt to the recent changes from [#1420].

Also some more go test cases were added.

@git-hulk
Copy link
Member

git-hulk commented May 10, 2023

The implementation generally looks good to me, left a few comments:

  • The blockingLock function looks unnecessary, it's fine to use std::unique_lock<std::mutex>lock(blocking_mutex_); directly
  • For the naming issue, SyncMigrateContext::(StartBlock|Wakeup) is a bit inconsistent with Migrator::(StartBlocking|WakeupBlocking|CancelBlocking). And for the function name, how about using Suspend|Resume|Cancel?
  • Those functions in SyncMigrateContext can be private:
    void OnWrite(bufferevent *bev); => void onWrite(bufferevent *bev);
    void OnEvent(bufferevent *bev, int16_t events); => void onEvent(bufferevent *bev, int16_t events);
    void TimerCB(int, int16_t events); => void timerCB(int, int16_t events);

@PragmaTwice
Copy link
Member

PragmaTwice commented May 10, 2023

Seems need run ./x.py check golangci-lint.

Those functions in SyncMigrateContext can be private:

Currently we need to make them public for the CRTP base classes to use them.

@git-hulk
Copy link
Member

Seems need run ./x.py check golangci-lint.

Those functions in SyncMigrateContext can be private:

Currently we need to make them public for the CRTP base classes to use them.

OK, I didn't notice this limitation.

@git-hulk
Copy link
Member

Looks like the migration test case has a segment fault. https://github.com/apache/incubator-kvrocks/actions/runs/4951001502/jobs/8855626634?pr=1418

@ellutionist
Copy link
Contributor Author

Looks like the migration test case has a segment fault. https://github.com/apache/incubator-kvrocks/actions/runs/4951001502/jobs/8855626634?pr=1418

I think I found the reason. When building with ASAN the pointer of context won't initialize with nullptr by default. Should explicitly assign it a default value.

Copy link
Member

@PragmaTwice PragmaTwice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@PragmaTwice
Copy link
Member

Merging... Thanks for your contribution!

@PragmaTwice PragmaTwice merged commit 75db3e4 into apache:unstable May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the support of the blocking migration for the cluster migrate command
6 participants