
Adds some missing FATE functionality #5263

Merged

Conversation

kevinrr888
Member

@kevinrr888 kevinrr888 commented Jan 15, 2025

Adds some missing FATE functionality

  • Previously, if a user configured fewer FATE threads to work on transactions (via the MANAGER_FATE_THREADPOOL_SIZE property), the pool size would not actually be decreased. These changes safely stop excess workers in the case where the property value is decreased.
  • Added test FatePoolResizeIT (tests both META and USER transactions: MetaFatePoolResizeIT and UserFatePoolResizeIT) to ensure the pool size is correctly increased and decreased with configuration changes.
  • Fate.shutdown() was not waiting on termination of the fate pool watcher when needed. Added the missing wait.

Another potential solution to this problem is disallowing decreasing the pool size, but I'm not sure if that's the way we want to go about this.

Confirmation of or changes to the thread safety of the new code would be appreciated. I would also be curious if someone has a better idea of how we could go about stopping the transaction runners. Currently, it just tries to stop a random one each second.

This is another prereq for #5130.
This also addresses the test mentioned in #5171.
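
For illustration, here is a minimal sketch of the stop-excess-runners idea described above. The stop flag and the exact shape of TransactionRunner and runningTxRunners are assumptions based on this discussion, not the actual Fate code:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only; the real Fate implementation differs.
class PoolResizeSketch {

  // Hypothetical runner that re-checks a stop flag between transactions.
  static class TransactionRunner implements Runnable {
    private volatile boolean stop = false;

    void requestStop() {
      stop = true;
    }

    @Override
    public void run() {
      while (!stop) {
        // work on the next available FATE transaction, then re-check the flag
      }
    }
  }

  private final Set<TransactionRunner> runningTxRunners = ConcurrentHashMap.newKeySet();

  // Called by the pool watcher when the configured size drops below the running count.
  void stopExcessRunners(int excess) {
    for (TransactionRunner runner : runningTxRunners) {
      if (excess <= 0) {
        break;
      }
      runner.requestStop();
      excess--;
    }
  }
}

The Fate.shutdown() fix mentioned above presumably amounts to the standard ExecutorService shutdown()/awaitTermination() pattern applied to the pool watcher executor as well.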

@kevinrr888 kevinrr888 self-assigned this Jan 15, 2025
@kevinrr888 kevinrr888 added this to the 4.0.0 milestone Jan 15, 2025
core/src/main/java/org/apache/accumulo/core/fate/Fate.java (outdated review thread)
final int configured = conf.getCount(Property.MANAGER_FATE_THREADPOOL_SIZE);
final int needed = configured - pool.getActiveCount();
final int needed = configured - transactionExecutor.getActiveCount();
Contributor

Maybe instead of using the active count, could use the size of the list and do cleanup on it beforehand if the exited flag was added.

Suggested change
final int needed = configured - transactionExecutor.getActiveCount();
// remove any threads that have exited
runningTxRunners.removeIf(tr->tr.isExited());
final int needed = configured - runningTxRunners.size();

Also wondering about creating a cached thread pool (see Executors.newCachedThreadPool()) and not bothering with resizing the pool; just add and stop stuff in the pool and it will create threads as needed. That may be too much of a change for this PR though, but it would help simplify the code by reducing the number of things that need to be kept in sync.
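
For context, a minimal sketch of the cached-thread-pool idea, using plain JDK Executors rather than Accumulo's ThreadPools utility (illustrative only; names like transactionExecutor and runningTxRunners follow the discussion above):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only; not the actual Fate or ThreadPools code.
class CachedPoolSketch {
  // A cached pool creates threads on demand and reclaims idle ones after 60s,
  // so there is no fixed pool size to keep in sync with the property.
  private final ExecutorService transactionExecutor = Executors.newCachedThreadPool();
  private final Set<Runnable> runningTxRunners = ConcurrentHashMap.newKeySet();

  // Hypothetical watcher step: reconcile the runner count, not the pool size.
  void reconcile(int configured) {
    while (runningTxRunners.size() < configured) {
      Runnable runner = () -> {
        // work on FATE transactions until asked to stop
      };
      runningTxRunners.add(runner);
      transactionExecutor.execute(runner);
    }
    // Stopping excess runners would reuse the stop mechanism from this PR.
  }
}

With a cached pool, the resize call in the watcher would presumably no longer be needed; the runner set alone would track how many workers exist.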

Member Author

Yes good idea.
Re the cached thread pool: this may be a good solution (I'm not sure about the implications of this yet), however it would require quite a bit of refactoring and wasn't the original intention of this PR. If that is a route we want to take, we could close this PR. This would also mean the #5130 changes would be different. I assume each set of fate ops may have its own cached thread pool. Or this may make 5130 completely obsolete: all fate ops would just share one cached thread pool.

Contributor
@keith-turner keith-turner Jan 17, 2025

Definitely want to keep this PR; using a cached thread pool would not solve the problems this PR is solving. Using the cached thread pool would be complementary to these changes and may make the overall code simpler, but not sure. Was thinking we could rely only on the new runningTxRunners list added in this PR and not have to worry about resizing the thread pool if using a cached thread pool. Don't worry about using it in this PR.

The cached thread pool could simplify #5130 also, as you mentioned. I think the changes in this PR would still be built on in #5130 though.

Member Author

Right, I was misinterpreting the impact this change would have. This probably wouldn't be much refactoring. It may be as simple as changing it to create the cached pool instead of a fixed pool and deleting the ThreadPools.resizePool(pool, conf, Property.MANAGER_FATE_THREADPOOL_SIZE) call in the pool watcher task. In that case, it could be changed in this PR. I may be overlooking something though.

Contributor

The thing I was uncertain about w/ using a cached thread pool was how that would work w/ our utility code for creating thread pools. If large changes are required for the utility code, then I would not want to make that change in this PR. If not, then it would probably be good to make the change in this PR.

Member Author

Will see about changing the thread pools util code to support this change.

@keith-turner
Contributor

Another potential solution to this problem is disallowing decreasing the pool size, but I'm not sure if that's the way we want to go about this.

I agree that is not a good way to go. This is a good find; we definitely want to be able to adjust the number of threads up and down.

kevinrr888 and others added 4 commits January 17, 2025 11:12
- Implemented better way to stop excess runners
	- This allowed the code to be simplified: was able to decouple exited and stop, no longer wait for a runner to stop

Co-authored-by: Keith Turner <kturner@apache.org>
new tests were calling a method which has since been moved in main
- removed 'exited' which is no longer used in TransactionRunner
- runningTxRunners from arraylist -> hashset
@kevinrr888
Member Author

Since this is approved, I'll merge this in. Regarding the cached thread pool, I'll make a note to look into changing this for #5130, or maybe as another small PR.

@kevinrr888 kevinrr888 merged commit ee81a3c into apache:main Jan 21, 2025
8 checks passed
@kevinrr888 kevinrr888 deleted the 4.0-feature-fate-decrease-tx-runners branch January 21, 2025 15:09