limits memory and cpu used by compaction reservation request #5185

Merged: 3 commits, Dec 15, 2024

Conversation

keith-turner
Contributor

Added thread pools to execute compaction reservation requests in order to limit the memory and CPU used by executing reservations. Requests queued up for the pool could still potentially use a lot of memory, so did two things to control the memory used by requests in the queue. First, only allow a compactor process to have one reservation processing at a time. Second, made the data related to a reservation request a soft reference, which should allow it to be garbage collected if memory gets low while it is sitting in the queue. Once the request starts executing, it obtains a strong reference to the data so it can no longer be garbage collected.

fixes #5177
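
A minimal sketch of how the three pieces in the description could fit together, for readers who want to see the shape of the idea. This is not the code from this PR: the class name `ReservationSketch`, the `reserve` method, and the use of a `StringBuilder` as the request payload are all hypothetical, and the sketch relies on the atomic return value of `Set.add` rather than the separate `contains` check used in the diff.

```java
import java.lang.ref.SoftReference;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for the behavior described above; not Accumulo code. */
final class ReservationSketch {
  // Fixed-size pool: caps how many reservations execute at once.
  private final ExecutorService pool;
  // Addresses of compactors that currently have a reservation queued or running.
  private final Set<String> active = ConcurrentHashMap.newKeySet();

  ReservationSketch(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /**
   * Queues a reservation. The future yields an empty Optional when the request
   * is rejected (compactor already active) or its payload was collected.
   */
  CompletableFuture<Optional<String>> reserve(String compactorAddress, StringBuilder jobData) {
    // Set.add is atomic and returns false if the address is already present.
    if (!active.add(compactorAddress)) {
      return CompletableFuture.completedFuture(Optional.<String>empty());
    }
    // The queued task holds only a soft reference to the payload.
    SoftReference<StringBuilder> ref = new SoftReference<>(jobData);
    return CompletableFuture.supplyAsync(() -> {
      try {
        // Taking a strong reference keeps the data alive while the work runs.
        StringBuilder data = ref.get();
        if (data == null) {
          return Optional.<String>empty(); // collected while queued
        }
        return Optional.of("reserved:" + data);
      } finally {
        // Always release the compactor so it can submit another request.
        active.remove(compactorAddress);
      }
    }, pool);
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

In this sketch the soft reference only helps if the caller also drops its own strong reference to the payload after submitting; otherwise the data stays strongly reachable and cannot be collected.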

@keith-turner keith-turner added this to the 4.0.0 milestone Dec 14, 2024
protected CompactionMetadata reserveCompaction(CompactionJobQueues.MetaJob metaJob,
String compactorAddress, ExternalCompactionId externalCompactionId) {

if (activeCompactorReservationRequest.contains(compactorAddress)) {
Contributor

I was trying to think whether there would be a race condition here if the thread pool was configured with more than 1 thread to handle multiple requests. I don't think there is, because the same compactor should not be sending multiple requests at the exact same time. If there were multiple attempts and that led to one of the Preconditions.checkState() checks on the add/remove failing, that is probably a good thing, as it means there is an issue.

Contributor Author

Yeah, if something really unusual is going on, that Preconditions check would catch it. In the normal case of a compactor retrying, the first check will catch it. It would probably be good to add some info to the Preconditions check in case it does fire.
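
As an illustration of what adding info to the check could look like, here is a small sketch, assuming the active set is a concurrent set keyed by compactor address. The class `ActiveCompactorGuard` and its methods are hypothetical, and the local `checkState` helper is only a dependency-free stand-in for Guava's `Preconditions.checkState`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard tracking which compactors have a reservation in progress. */
final class ActiveCompactorGuard {
  private final Set<String> active = ConcurrentHashMap.newKeySet();

  // Local stand-in for Guava's Preconditions.checkState, so the sketch has no dependencies.
  private static void checkState(boolean ok, String message) {
    if (!ok) {
      throw new IllegalStateException(message);
    }
  }

  /** Marks the compactor active; fails with the address in the message if it already was. */
  void begin(String compactorAddress) {
    checkState(active.add(compactorAddress),
        "Compactor already has a reservation in progress: " + compactorAddress);
  }

  /** Marks the compactor inactive; fails with the address in the message if it was not active. */
  void end(String compactorAddress) {
    checkState(active.remove(compactorAddress),
        "Compactor had no reservation in progress: " + compactorAddress);
  }

  boolean isActive(String compactorAddress) {
    return active.contains(compactorAddress);
  }
}
```

Including the compactor address in the message means that if the check ever fires, the log line identifies which compactor sent the overlapping request.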

// Use a soft reference for this in case free memory gets low while this is sitting in the queue
// waiting to process. This object can contain the tablet's list of files, and if there are lots
// of tablets with lots of files then that could start to cause memory problems.
private final SoftReference<CompactionJobQueues.MetaJob> metaJobRef;
Contributor

This is a neat way to help with the memory issue. I haven't looked, but I am curious if there's some way to get statistics from the JVM on how often this reference is being freed, which could be useful for monitoring. It may not be necessary though, as just watching total memory usage is probably good enough, since this reference should only get cleared when memory is low.
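
One possible way to observe how often soft references are cleared, offered only as a sketch and not something this PR does: register each `SoftReference` with a `java.lang.ref.ReferenceQueue` and count what the garbage collector enqueues. The class name `SoftRefClearCounter` and its methods are hypothetical.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that counts soft references enqueued after being cleared. */
final class SoftRefClearCounter<T> {
  private final ReferenceQueue<T> queue = new ReferenceQueue<>();
  private final AtomicLong cleared = new AtomicLong();

  /** Wraps the value in a soft reference registered with this counter's queue. */
  SoftReference<T> track(T value) {
    return new SoftReference<>(value, queue);
  }

  /** Drains references enqueued so far and returns the running total. */
  long drainAndCount() {
    Reference<? extends T> r;
    while ((r = queue.poll()) != null) {
      cleared.incrementAndGet();
    }
    return cleared.get();
  }
}
```

A caveat with this approach: a reference is only enqueued if the `SoftReference` object itself is still reachable when its referent is cleared, so requests that complete normally and are discarded would not be counted. The total could be exposed as a metric by calling `drainAndCount()` periodically.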

Contributor Author

@keith-turner keith-turner Dec 15, 2024

This was trying to deal with a problem I have been seeing while testing with the FlakyBulkBatchWriter in accumulo-testing, which will sometimes submit a large number of files for a tablet. This can cause a lot of memory pressure on the manager when lots of tablets have lots of files and the manager is trying to keep much of that in memory. For this case it's really hard to reason about what will be on the thread pool queue and how much memory it is using. So I decided to use a soft reference for now, but I am uncertain about it. Opened #5188 and added a comment pointing to that issue in this PR. I think if #5188 were implemented, this soft reference could be removed. Would still want to restrict compactors from having multiple requests queued/running. I think if compactors can only have one thing queued for reservation, and the queued data is only compaction jobs and no tablet metadata, then the memory usage would be unlikely to cause a problem.

@keith-turner keith-turner merged commit 632dbc2 into apache:main Dec 15, 2024
8 checks passed
@keith-turner keith-turner deleted the accumulo-5177 branch December 15, 2024 19:27
Development

Successfully merging this pull request may close these issues.

Unlimited threads for compaction reservation can oversubscribe manager memory