Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Drop ParallelIterable's queue low water mark #10978

Conversation

findepi
Copy link
Member

@findepi findepi commented Aug 20, 2024

As part of the change in commit
7831a8d, queue low water mark was introduced. However, it resulted in increased number of manifests being read when planning LIMIT queries in Trino Iceberg connector. To avoid increased I/O, back out the change for now.

As part of the change in commit
7831a8d, queue low water mark was
introduced. However, it resulted in increased number of manifests being
read when planning LIMIT queries in Trino Iceberg connector. To avoid
increased I/O, back out the change for now.
@github-actions github-actions bot added the core label Aug 20, 2024
@findepi findepi changed the title Drop ParallelIterable's queue low water mark Core: Drop ParallelIterable's queue low water mark Aug 20, 2024
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this reverts to the original behavior 👍

@findepi findepi merged commit bcb3281 into apache:main Aug 21, 2024
46 checks passed
@findepi findepi deleted the findepi/drop-paralleliterable-s-queue-low-water-mark-1fd7b3 branch August 21, 2024 22:05
szehon-ho pushed a commit to szehon-ho/iceberg that referenced this pull request Sep 16, 2024
* Core: Fix ParallelIterable memory leak where queue continues to be populated even after iterator close (apache#9402)

(cherry picked from commit d3cb1b6)

* Core: Limit ParallelIterable memory consumption by yielding in tasks (apache#10691)

ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for
processing input iterables. This defaults to 2 * # CPU cores.  When one
or some of the input iterables are considerable in size and the
ParallelIterable consumer is not quick enough, this could result in
unbounded allocation inside `ParallelIterator.queue`. This commit bounds
the queue. When queue is full, the tasks yield and get removed from the
executor. They are resumed when consumer catches up.

(cherry picked from commit 7831a8d)

* Drop ParallelIterable's queue low water mark (apache#10978)

As part of the change in commit
7831a8d, queue low water mark was
introduced. However, it resulted in increased number of manifests being
read when planning LIMIT queries in Trino Iceberg connector. To avoid
increased I/O, back out the change for now.

(cherry picked from commit bcb3281)

---------

Co-authored-by: Helt <heltman@qq.com>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants