Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: clean up buffered state when operators spilled to disk #60022

Closed
yuzefovich opened this issue Feb 9, 2021 · 1 comment · Fixed by #127093
Closed

colexec: clean up buffered state when operators spilled to disk #60022

yuzefovich opened this issue Feb 9, 2021 · 1 comment · Fixed by #127093
Assignees
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team

Comments

@yuzefovich
Copy link
Member

yuzefovich commented Feb 9, 2021

Currently, when buffering in-memory operator exports its buffered tuples in ExportBuffered, we still keep references to all of that buffered state (thus prohibiting it from being GCed) as well as we don't clear the corresponding memory accounts. In all cases except for the unordered distinct, we actually can lose those references and we should update the memory accounting accordingly.

Jira issue: CRDB-3198
Epic: CRDB-34180

@yuzefovich yuzefovich added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Feb 9, 2021
@yuzefovich yuzefovich self-assigned this Feb 9, 2021
@yuzefovich yuzefovich added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Feb 9, 2021
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@yuzefovich yuzefovich removed their assignment May 31, 2022
craig bot pushed a commit that referenced this issue Jul 13, 2022
84229: colexechash: improve memory accounting in the hash table r=yuzefovich a=yuzefovich

This commit improves the memory accounting in the hash table to be more
precise in the case when the `distsql_workmem` limit is exhausted.
Previously, we would allocate large slices first only to perform the
memory accounting after the fact, possibly running out of budget which
would result in a error being thrown. We'd end up in a situation where
the hash table is still referencing larger newly-allocated slices while
only the previous memory usage is accounted for. This commit makes it so
that we account for the needed capacity upfront, then perform the
allocation, and then reconcile the accounting if necessary. This way
we're much more likely to encounter the budget error before making the
large allocations.

Additionally, this commit accounts for some internal slices in the hash
table used only in the hash joiner case.

Also, now both the hash aggregator and the hash joiner eagerly release
references to these internal slices of the hash table when the spilling
to disk occurs (we cannot do the same for the unordered distinct because
there the hash table is actually used after the spilling too).

This required a minor change to the way the unordered distinct spills to
disk. Previously, the memory error could only occur in two spots (and
one of those would leave the hash table in an inconsistent state and we
were "smart" in how we repaired that). However, now the memory error
could occur in more spots (and we could have several different
inconsistent states), so this commit chooses a slight performance
regression of simply rebuilding the hash table from scratch, once, when
the unordered distinct spills to disk.

Addresses: #60022.
Addresses: #64906.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@yuzefovich yuzefovich self-assigned this Nov 26, 2023
@yuzefovich yuzefovich added the E-quick-win Likely to be a quick win for someone experienced. label Nov 26, 2023
@craig craig bot closed this as completed in 8578ca3 Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants