
[c++/python] Use a shared threadpool for the reindexer #2148

Merged
1 commit merged into main on Mar 1, 2024

Conversation

@beroy (Collaborator) commented Feb 15, 2024

Issue and/or context: #2149

Changes:

Notes for Reviewer:

codecov bot commented Feb 15, 2024

Codecov Report

Merging #2148 (d068793) into main (84f0815) will decrease coverage by 9.36%.
The diff coverage is 75.86%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2148      +/-   ##
==========================================
- Coverage   81.37%   72.01%   -9.36%     
==========================================
  Files          84      103      +19     
  Lines        6335     6872     +537     
  Branches      213      214       +1     
==========================================
- Hits         5155     4949     -206     
- Misses       1081     1824     +743     
  Partials       99       99              
Flag            Coverage Δ
libtiledbsoma   67.43% <75.86%> (-0.70%) ⬇️
python          ?
r               74.68% <ø> (?)

Flags with carried forward coverage won't be shown.

Component       Coverage Δ
python_api      ∅ <ø> (∅)
libtiledbsoma   48.76% <50.00%> (-0.91%) ⬇️

@thetorpedodog (Contributor) left a comment

This looks like it is going to have one single, global thread pool, across contexts and everything else. Is this what we actually want?

Comment on lines 136 to 145
if soma_indexer_concurrency is not None:
clib.IntIndexer.start_thread_pool(soma_indexer_concurrency)
elif tiledb_ctx is not None:
tdb_concurrency = int(
tiledb_ctx.config().get("sm.compute_concurrency_level", 10)
)
thread_count = max(1, tdb_concurrency // 2)
clib.IntIndexer.start_thread_pool(thread_count)
else:
clib.IntIndexer.start_thread_pool(5)
Contributor:

This and __del__ lead to several race conditions—if I have two threads, and an indexer running in one, then starting a second one will lead to the following:

  1. Start indexer A. It starts the global thread pool.
  2. Indexer A runs.
  3. Start indexer B. It starts a new global thread pool, thus deleting the old one.
  4. Indexer B runs. (What happens to the threads of indexer A that are still running under the old pool?)
  5. Indexer A completes and is garbage collected (assuming some other problem does not arise). It deletes the thread pool entirely in __del__.
  6. Indexer B cannot continue to run. (Does it crash? Just freeze?)

@beroy (Collaborator, Author) commented Feb 16, 2024:

@thetorpedodog thanks for the comments. Thinking about them, some clarifications:

  1. The thread pool lifetime is tied to the creation and deletion of the soma context, not the indexer.
  2. The thread pool currently supports parallel indexers with no race issues, assuming they are both attached to the same context (it uses indexers as keys when running lookups). I can add unit tests for parallel lookups and constructions to show that.
  3. I believe the only situation that can lead to the race condition you mentioned is when we have two parallel indexers, each with a different context. In that case the scenario you mentioned can take place.

If you think (3) is likely, I can change the design to use one concurrent hashmap relating each soma context object to its corresponding threadpool (context_map). Here are the steps:

  1. When a context starts, it creates a new threadpool and puts <context, threadpool> into the context_map.
  2. When a new indexer is initialized (map_locations), it creates its hash table and, using its context, looks up its threadpool.
  3. When looking up values, the indexer uses its threadpool to submit its lookup tasks.
  4. When the context gets deleted, it kills the corresponding threadpool.

Please let me know what you think.

@nguyenv (Member) commented Feb 21, 2024:

Already discussed in DMs but x-posting here: Instead of a global map, we could add tiledb_threadpool as a property of TileDBSOMAContext. It could also be lazy-loaded, just like TileDBSOMAContext.tiledb_ctx. Unless I'm misunderstanding something below, there isn't a preexisting threadpool that can be directly accessed via tiledb::Context, so one has to be instantiated within TileDBSOMAContext in Python or the upcoming SOMAContext in C++.
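A lazy-loaded threadpool property along these lines might look like this. This is a simplified, hypothetical sketch, not the actual TileDBSOMAContext, and `ThreadPoolExecutor` stands in for the real pool type:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SOMATileDBContext:  # simplified stand-in for illustration
    def __init__(self, concurrency=4):
        self._concurrency = concurrency
        self._threadpool = None
        self._lock = threading.Lock()

    @property
    def threadpool(self):
        # Lazily create the pool on first access, mirroring how
        # tiledb_ctx is lazily instantiated; the lock ensures only
        # one pool is ever created even under concurrent access.
        with self._lock:
            if self._threadpool is None:
                self._threadpool = ThreadPoolExecutor(self._concurrency)
            return self._threadpool
```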

Contributor:

We have the start of a thread pool in somacore (the threadpool member on the ContextBase protocol), but it seems not to have fully made it into TileDB-SOMA yet? https://github.com/single-cell-data/SOMA/blob/76b2d2bbca7c8baf543b5cc9cf71fc5c9138a39a/python-spec/src/somacore/types.py#L87

Member:

OIC - that makes more sense now. Yeah, SOMATileDBContext doesn't inherit from soma.ContextBase yet, but that'll be easy to add, and then we can start utilizing the context's threadpool.

Collaborator (Author):

OK, for the time being I'm going to implement the design with the threadpool on top of the C++ SOMAContext.

Comment on lines 49 to 62

std::shared_ptr<tiledbsoma::ThreadPool> IntIndexer::tiledb_thread_pool_ = nullptr;

void IntIndexer::start_thread_pool(int thread_count) {
    tiledb_thread_pool_ = std::make_unique<tiledbsoma::ThreadPool>(thread_count);
    LOG_DEBUG(fmt::format(
        "[Re-indexer] Thread pool started with {} threads", thread_count));
}

void IntIndexer::stop_thread_pool() {
    tiledb_thread_pool_.reset();
    LOG_DEBUG("[Re-indexer] Thread stopped");
}
Contributor:

Apart from the problems noted above, the way this thread pool is accessed globally would need lock protection—otherwise, the same pointer can be racily accessed and reset across multiple threads with no guarantee of atomicity, leading to the dreaded Undefined Behavior.

@beroy (Collaborator, Author) commented Feb 16, 2024:

I'll work on implementing the design I mentioned as an add-on to this PR. @thetorpedodog @bkmartinjr

@bkmartinjr (Member):

> This looks like it is going to have one single, global thread pool, across contexts and everything else. Is this what we actually want?

I think it would be better to have a per-SOMATileDBContext thread pool, following the tiledb.Ctx design pattern.

Open question: is the context passed to the indexer's init and bound at indexer-creation time, or passed to get_indexer? For Pandas compat, I prefer the former.

@beroy (Collaborator, Author) commented Feb 16, 2024:

@bkmartinjr, this PR assumes there is only one SOMATileDBContext existing at any given point. In other words, we support parallel indexers as long as there is no overlapping SOMATileDBContext. Does this assumption make sense? And wrt your question, the context is not bound to get_indexer.

@thetorpedodog (Contributor):

> @bkmartinjr, this PR assumes there is only one SOMATileDBContext existing at any given point. In other words, we support parallel indexers as long as there is no overlapping SOMATileDBContext. Does this assumption make sense? And wrt your question, the context is not bound to get_indexer.

No—the idea of the SOMATileDBContext is that multiple contexts may exist which have access to different sets of resources, etc., within the same process. Having implicit global state very much contravenes that design.

Since it’s possible to access Python objects from within C++ code, it seems like the indexer should be able to run its tasks using the thread pool that already exists within the context. One way to do this might be:

  1. Expose the function we run within the thread pool to Python as a private (i.e., _-prefixed) member of the IntIndexer (something like IntIndexer._lookup_internal(...), maybe?). (This might not be needed if you can easily pass an inner function as a Callable from C++ to Python; I am not sure how advanced pybind11 is.)
  2. Where the thread task is run, instead of using the internal threadpool, get the threadpool from the SOMATileDBContext and call soma_thread_pool.submit(our_task, params, as, needed).
  3. Join on the Future returned by the threadpool and examine the result.

@bkmartinjr (Member):

Apologies for slow response - out of pocket today. I agree with Paul's summary of the design goal:

> idea of the SOMATileDBContext is that multiple contexts may exist which have access to different sets of resources

@beroy (Collaborator, Author) commented Feb 20, 2024:

@thetorpedodog and @bkmartinjr: @nguyenv is working on a way for the context threadpool to be directly accessible from the C++. With that in place, on the C++ side, when creating an indexer I just need to grab the tiledb context threadpool and keep it in the indexer object to be used at lookup time. This would require minimal changes to the indexer as well.

@nguyenv (Member) commented Feb 21, 2024:

> @thetorpedodog and @bkmartinjr: @nguyenv is working on a way for the context threadpool to be directly accessible from the C++. With that in place, on the C++ side, when creating an indexer I just need to grab the tiledb context threadpool and keep it in the indexer object to be used at lookup time. This would require minimal changes to the indexer as well.

I think I need clarification. Currently, there is a unique ThreadPool created for each IntIndexer? And we want to modify this so that we create a ThreadPool for each SOMATileDBContext? And the IntIndexer uses the threadpool associated with the context it's been given?

@bkmartinjr (Member):

> clarification

Yes, I believe you have it correct. Goal is a thread pool associated with each context, and indexer lookups occur within that context (much like queries, etc), submitting to the context's threadpool.

@beroy (Collaborator, Author) commented Feb 21, 2024:

Now, with @nguyenv's work on sharing the context between C++ and Python, I think I can achieve this goal by adding threadpools to the C++ TileDBSOMAContext, so each indexer keeps track of its own context + threadpool (lazily creating the pool when a lookup is needed for the first time).

@beroy beroy force-pushed the shared_threadpool_indexer branch 2 times, most recently from 980ece9 to af06b25 Compare February 22, 2024 21:01
@beroy (Collaborator, Author) commented Feb 22, 2024:

@bkmartinjr, @thetorpedodog and @nguyenv: I updated the PR, rebasing onto @nguyenv's C++ context and using that to lazily create the threadpool.

@bkmartinjr (Member):

@beroy - can I suggest you base this PR on the shared_threadpool_indexer branch so your changes are more evident?

@beroy beroy changed the base branch from main to viviannguyen/reduce-thread-count February 23, 2024 01:29
@bkmartinjr (Member) left a comment:

Primary request is for the handling of the IntIndexer.context_ state -- it should be set during construction and only accessed read-only by map_locations and lookup. The current code, which sets it in map_locations, will potentially race if we have multiple threads calling map_locations simultaneously (I know this is currently unsupported API use, but the change will protect us if we ever allow that).

@beroy beroy force-pushed the shared_threadpool_indexer branch 3 times, most recently from 9d4aaf3 to 4e1cbcb Compare February 26, 2024 00:20
Comment on lines 75 to 81

std::shared_ptr<ThreadPool>& thread_pool() {
    return thread_pool_;
}
Member:

I would still prefer to see create_thread_pool get called in this SOMAContext class rather than in the IntIndexer. As mentioned above by @bkmartinjr, we'll also want this to be thread-safe - I would probably look into using std::unique_lock.

context->create_thread_pool();
thread_pool_ = context->thread_pool();
Member:

I don't think we should call create_thread_pool in here. See other comment.

@beroy beroy force-pushed the shared_threadpool_indexer branch 2 times, most recently from ac51913 to 79a386e Compare February 29, 2024 18:08
@johnkerl (Member):

[sc-42198]


This pull request has been linked to Shortcut Story #42198: tiledbsoma 1.7.3.

@beroy (Collaborator, Author) commented Feb 29, 2024:

@nguyenv and @bkmartinjr, I updated the PR based on your last set of reviews. I'd appreciate feedback.

@bkmartinjr (Member) left a comment:

My primary remaining question is around lazy-creation of the thread pool. It looks like @nguyenv recommended this, and it makes sense to me. As I read the code, it looks like that was not completed?

@beroy beroy force-pushed the shared_threadpool_indexer branch 3 times, most recently from c6a77c6 to 2486537 Compare March 1, 2024 01:07
Each C++ SOMAContext has it's own lazily created thread pool
@beroy beroy force-pushed the shared_threadpool_indexer branch from 2486537 to d068793 Compare March 1, 2024 04:27
@beroy beroy merged commit a8da3a8 into main Mar 1, 2024
21 checks passed
@beroy beroy deleted the shared_threadpool_indexer branch March 1, 2024 05:57

github-actions bot commented Mar 1, 2024

The backport to release-1.7 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-1.7 release-1.7
# Navigate to the new working tree
cd .worktrees/backport-release-1.7
# Create a new branch
git switch --create backport-2148-to-release-1.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 a8da3a8a735307a730a8384d6a7e5ca1030a7f73
# Push it to GitHub
git push --set-upstream origin backport-2148-to-release-1.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-1.7

Then, create a pull request where the base branch is release-1.7 and the compare/head branch is backport-2148-to-release-1.7.


github-actions bot commented Mar 1, 2024

The backport to release-1.8 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-1.8 release-1.8
# Navigate to the new working tree
cd .worktrees/backport-release-1.8
# Create a new branch
git switch --create backport-2148-to-release-1.8
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 a8da3a8a735307a730a8384d6a7e5ca1030a7f73
# Push it to GitHub
git push --set-upstream origin backport-2148-to-release-1.8
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-1.8

Then, create a pull request where the base branch is release-1.8 and the compare/head branch is backport-2148-to-release-1.8.


  SOMAContext(std::map<std::string, std::string> platform_config)
-     : ctx_(std::make_shared<Context>(Config(platform_config))){};
+     : ctx_(std::make_shared<Context>(Config(platform_config)))
+     , thread_pool_mutex_(){};
Member:

Does thread_pool_mutex_() need to be explicitly constructed here?

Collaborator (Author):

I'm not really sure it is necessary.

Comment on lines +74 to +81

std::shared_ptr<ThreadPool>& thread_pool() {
    const std::lock_guard<std::mutex> lock(thread_pool_mutex_);
    // The first thread that gets here will create the context thread pool
    if (thread_pool_ == nullptr) {
        create_thread_pool();
    }
    return thread_pool_;
}
Member:

My preference would be to have just a declaration here and for the implementation to actually go in soma_context.cc.

return thread_pool_;
}

void create_thread_pool();
Member:

This should not be in the header file.

@beroy (Collaborator, Author) commented Mar 1, 2024:

Why is that? It is just a declaration and not global or static.

Comment on lines 76 to +78

IntIndexer(){};
/**
 * Constructor with the same arguments as map_locations
 */
IntIndexer(const int64_t* keys, int size, int threads);
IntIndexer(const std::vector<int64_t>& keys, int threads)
    : IntIndexer(keys.data(), keys.size(), threads) {
IntIndexer(std::shared_ptr<tiledbsoma::SOMAContext> context) {
    context_ = context;
Member:

I think that we should always have a context.

    IntIndexer() : context_(make_shared<SOMAContext>()){}; // or IntIndexer() = delete;?
    IntIndexer(std::shared_ptr<tiledbsoma::SOMAContext> context) : context_(context) {};

@beroy (Collaborator, Author) commented Mar 1, 2024:

I wouldn't create a new context in IntIndexer; I agree with the other suggestions. Also, we must keep the default constructor needed by pybind.

Comment on lines +86 to +87

if (context_ == nullptr || context_->thread_pool() == nullptr ||
    context_->thread_pool()->concurrency_level() == 1) {
Member:

I think we can just check context_->thread_pool()->concurrency_level() == 1

Collaborator (Author):

I get your point, but in cases like this I really prefer to be ultra-conservative.

@johnkerl (Member) commented Mar 1, 2024:

Backports failed, again :(

5 participants