Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mutation support for StorageMemory #15127

Merged

Conversation

ucasfl
Copy link
Collaborator

@ucasfl ucasfl commented Sep 22, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

fix #9117

Detailed description / Documentation draft:

...

By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in docs folder.

If you are doing this for the first time, it's recommended to read the lightweight Contributing to ClickHouse Documentation guide first.

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Sep 22, 2020
@ucasfl ucasfl force-pushed the add-mutation-for-storagememory branch from 322d4dc to ec683fe Compare September 22, 2020 10:42
fix

fix
@ucasfl ucasfl force-pushed the add-mutation-for-storagememory branch from ec683fe to c0b5f84 Compare September 22, 2020 12:10
@Akazz Akazz self-assigned this Sep 22, 2020
@Akazz
Copy link
Contributor

Akazz commented Sep 29, 2020

Hey there, @ucasfl!
Currently StorageMemory uses the following mechanisms for resolving possible conflicts between concurrent operations:

  • SELECTs vs DROP/TRUNCATEs are resolved via table-level locks (lockForShare() vs lockExclusively())
  • SELECTs vs INSERTs and INSERTs vs DROP/TRUNCATEs are resolved via internal mutex

Implementing mutate() method would be very neat and we really appreciate your effort, but what you currently suggest in this PR is just not enough. And that is due to the fact that the current code cannot be easily extended to support efficient resolving possible conflicts of type MUTATEs vs <anything else>. One would either need to copy the whole list of blocks (in other words to implement some sort of versioned storage), or to put in some blocking synchronization with exclusiveLock(), and the latter is much less desirable.

@alexey-milovidov
Copy link
Member

We have discussed it and the variant with copying the list of blocks to readers looks Ok.

@ucasfl
Copy link
Collaborator Author

ucasfl commented Oct 1, 2020

If implement multiversion storage, do we still need table-level locks for storagememory? It doesn't seem to be needed anymore?@alexey-milovidov

@alexey-milovidov
Copy link
Member

If implement multiversion storage, do we still need table-level locks for storagememory? It doesn't seem to be needed anymore?

Yes, there will be no table-level locks inside StorageMemory.
We only need a mutex, under which we will copy the list of blocks.

@ucasfl
Copy link
Collaborator Author

ucasfl commented Oct 4, 2020

Hi, I have added MultiVersion storage in StorageMemory, you can take a look at it if you have time. @Akazz @alexey-milovidov

Copy link
Contributor

@Akazz Akazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to exist a race condition when concurrent INSERTs each try to add new blocks to their own copy of data.

Comment on lines 111 to 113
auto new_data = std::make_unique<BlocksList>(*(storage.data.get()));
new_data->push_back(block);
storage.data.set(std::move(new_data));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is going to be a problem due to a race condition here. Because we do not employ any CAS technique here, data can be lost when concurrent INSERTs each add a block to its own copy of data.

Copy link
Collaborator Author

@ucasfl ucasfl Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in order to solve this problem, we still need a table-level lock? @Akazz

@Akazz Akazz added the st-waiting-for-fix We are waiting for fixes in this issue or pull request label Oct 7, 2020
@ucasfl
Copy link
Collaborator Author

ucasfl commented Oct 8, 2020

If implement multiversion storage, do we still need table-level locks for storagememory? It doesn't seem to be needed anymore?

Yes, there will be no table-level locks inside StorageMemory.
We only need a mutex, under which we will copy the list of blocks.

It seems that we still need a table-level lock to resolve concurrent mutation/insert. @alexey-milovidov

fix
@ucasfl ucasfl force-pushed the add-mutation-for-storagememory branch from f7dab0b to 677787f Compare October 8, 2020 13:37
@alexey-milovidov alexey-milovidov self-assigned this Nov 23, 2020
@alexey-milovidov alexey-milovidov removed the st-waiting-for-fix We are waiting for fixes in this issue or pull request label Nov 23, 2020
data.set(std::move(new_data));
}
}


void StorageMemory::truncate(
const ASTPtr &, const StorageMetadataPtr &, const Context &, TableExclusiveLockHolder &)
{
std::lock_guard lock(mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like mutex is not needed anymore?

@@ -102,7 +106,9 @@ class MemoryBlockOutputStream : public IBlockOutputStream
metadata_snapshot->check(block, true);
{
std::lock_guard lock(storage.mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the mutex also looks unneeded.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need it to resolve concurrent insert, mutation, drop...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure... Concurrent INSERT and mutation is Ok: mutation will mutate previously inserted data and concurrently inserted data will be left unmutated.

For DROP there is table-level lock: IStorage::lockExclusively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's write a test for race-condition (you can find similar .sh tests).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have race-condition test 01454_storagememory_data_race_challenge.sh.

@@ -201,17 +208,83 @@ BlockOutputStreamPtr StorageMemory::write(const ASTPtr & /*query*/, const Storag
void StorageMemory::drop()
{
std::lock_guard lock(mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here mutex also looks unneeded.

auto new_data = std::make_unique<BlocksList>(*(data.get()));
auto data_it = new_data->begin();
auto out_it = out.begin();
while (data_it != new_data->end() && out_it != out.end())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that data has the same number of blocks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify in comments that deletion of whole blocks cannot happen here.

rows += buffer.rows();
bytes += buffer.bytes();
}
data.set(std::make_unique<BlocksList>(out));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move calculation of rows/bytes and assignment out of the branch to avoid code duplication.

void StorageMemory::mutate(const MutationCommands & commands, const Context & context)
{
std::lock_guard lock(mutex);
auto metadata_snapshot = getInMemoryMetadataPtr();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test checking that ALTER ADD/REMOVE column is not supported right now (that is Ok).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only a few changes remain.

@alexey-milovidov
Copy link
Member

Alright.

@alexey-milovidov alexey-milovidov merged commit 1cd09fa into ClickHouse:master Nov 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Mutations in storage Memory.
4 participants