
[DNM] Raft integration and TMT storage review #32

Closed
wants to merge 37 commits

Conversation

@innerr (Contributor) commented Apr 2, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

@innerr changed the title from "RaftService / KVStore review" to "[DNM] RaftService / KVStore review" on Apr 2, 2019
@@ -25,6 +25,7 @@ RaftService::RaftService(const std::string & address_, DB::Context & db_context_

RaftService::~RaftService()
{
// REVIEW: can we remove this mutex?
Contributor:

Will there be any upcoming commits to fix this? Or are you just putting some questions in the codebase?

Contributor:

Because I think messages like this should be put somewhere else, like Jira, instead of here.

Contributor (author):

All these messages will be addressed (PRs/issues will be created) after the online video discussion; before that, nothing needs to be done.

@@ -42,6 +42,8 @@ RegionMeta RegionMeta::deserialize(ReadBuffer & buf)
return RegionMeta(peer, region, apply_state, applied_term, pending_remove);
}

// REVIEW: lock? Be careful, this can easily deadlock.
// or use member `const RegionID region_id`
RegionID RegionMeta::regionId() const { return region.id(); }
Contributor:

It should return region_id. I added the member region_id but forgot to change this part.

Contributor:

Fixed.

@innerr changed the title from "[DNM] RaftService / KVStore review" to "[DNM] Raft integration and TMT storage review" on Apr 8, 2019
@@ -44,6 +48,8 @@ void applySnapshot(KVStorePtr kvstore, RequestReader read, Context * context)
auto cf_name = data.cf();
auto key = TiKVKey();
auto value = TiKVValue();
// REVIEW: the batch inserting logic is OK, but the calling stack is weird.
// Maybe we should just lock on each node insert; that is also fast when lock contention is low.
Contributor:

For now, we don't need a specific function to deal with batch inserting. I implemented this function only because there was a TODO.

@@ -21,6 +23,7 @@ void RegionPersister::persist(const RegionPtr & region, enginepb::CommandRespons
// Support only one thread persisting.
std::lock_guard<std::mutex> lock(mutex);

// REVIEW: can we just region->resetPersistParm(void)?
Contributor:

I store the value first because it may be changed during doPersist.

@@ -110,7 +112,9 @@ class RegionMeta

// When we create a region peer, we should initialize its log term/index > 0,
// so that we can force the follower peer to sync the snapshot first.
// REVIEW: why initialize term = 5?
Contributor:

Ask menglong; this is a magic number.

@@ -176,6 +181,7 @@ void RegionMeta::doSetPendingRemove() { pending_remove = true; }
void RegionMeta::waitIndex(UInt64 index)
{
std::unique_lock<std::mutex> lock(mutex);
// REVIEW: should we lock inside the closure function?
Contributor:

std::condition_variable itself will lock.

@@ -127,9 +130,11 @@ enginepb::CommandResponse RegionMeta::toCommandResponse() const

RegionMeta::RegionMeta(RegionMeta && rhs) : region_id(rhs.regionId())
{
// REVIEW: lock rhs
Contributor:

Maybe no need; I just added a lock in case.

peer = std::move(rhs.peer);
region = std::move(rhs.region);
apply_state = std::move(rhs.apply_state);
// REVIEW: set rhs.* to init state
Contributor:

Even if a region is at its initial state, its data is not empty. Since we've decided to move it, it should not be used any more.

@@ -15,12 +15,16 @@ void applySnapshot(KVStorePtr kvstore, RequestReader read, Context * context)

enginepb::SnapshotRequest request;
auto ok = read(&request);
// REVIEW: use two 'throw' here, 'not ok' and 'no state'
Contributor:

TODO

@@ -40,6 +43,7 @@ void RegionPersister::doPersist(const RegionPtr & region, enginepb::CommandRespo
}

MemoryWriteBuffer buffer;
// REVIEW: add some warning log here if region is too large
Contributor:

TODO

@@ -11,6 +11,8 @@ extern const int LOGICAL_ERROR;

void RegionPersister::drop(RegionID region_id)
{
// REVIEW: need mutex?

@@ -70,6 +70,7 @@ class RegionMeta
bool isPendingRemove() const;
void setPendingRemove();

// REVIEW: assign is a better name
Contributor:

Can be considered.

@@ -94,6 +95,7 @@ class RegionMeta
void doSetApplied(UInt64 index, UInt64 term);

private:
// REVIEW: we should make sure all these members are deep copies (e.g. region.peers())
Contributor:

Yes, they are.

@@ -225,6 +232,7 @@ void RegionMeta::execChangePeer(

switch (change_peer_request.change_type())
{
// REVIEW: throws when meet `AddNode`?
Contributor:

Any raft command is sent to all peers. We should discard those we don't need.

@@ -58,6 +58,7 @@ class KVStore final : private boost::noncopyable

std::atomic<Timepoint> last_try_persist_time = Clock::now();

// REVIEW: the task_mutex looks like it should be one mutex per region rather than one big lock in kvstore
Contributor:

Because of operations like split and merge, we cannot regard regions as independent of each other.

@@ -272,6 +282,9 @@ void KVStore::removeRegion(RegionID region_id, Context * context)
}

region_persister.drop(region_id);

// REVIEW: if the process crashed here, then when the process starts again, the region_table will not find this region; is that OK?
Contributor:

In region_table, we don't store the table range to disk. At any time, we should ensure that the range in region_table contains the one in region_persister, to avoid losing data. After a restart, we can recompute the table range.

}
else
{
// REVIEW: is the persisting order OK?
Contributor:

updateRegion can only add range info; it can't change existing ranges.

@@ -160,6 +163,7 @@ void KVStore::onServiceCommand(const enginepb::CommandRequestBatch & cmds, RaftC
auto [it, ok] = regions.emplace(new_region->id(), new_region);
if (!ok)
{
// REVIEW: do we need to compare the old one and the new one?
@solotzg (Contributor) commented Apr 9, 2019:

Because the index of the split region is definitely in the initial state.

@@ -82,11 +83,13 @@ void KVStore::onSnapshot(RegionPtr new_region, Context * context)

if (new_region->isPendingRemove())
{
// REVIEW: here we remove the region in persister, then below the region is added to persister again
Contributor:

It's a kind of ABA problem. I have avoided the wrong selection process in applySnapshot and MergeTreeDataSelectExecutor::read.
However, one region may appear in two stores, because rngine may crash while sending a raft log that contains a ChangePeer cmd. That would bring disaster to our learner read process.

@@ -66,6 +66,7 @@ void KVStore::onSnapshot(RegionPtr new_region, Context * context)
{
LOG_DEBUG(log, "KVStore::onSnapshot: previous " << old_region->toString(true) << " ; new " << new_region->toString(true));

// REVIEW: in what case can this happen? rngine crashed?
Contributor:

Not sure; this is just for idempotence.

@zyguan (Contributor) commented Apr 11, 2019:

/rebuild

fd = ::open(path.c_str(), flags, 0666);
if (-1 == fd)
{
ProfileEvents::increment(ProfileEvents::FileOpenFailed);
// REVIEW: this seems not right
Contributor:

It is a bug. It should be:
if constexpr (!must_exist)
