[GOBBLIN-1919] Rework a few more elements of MR-related job exec for reuse in Temporal-based execution #3879
Codecov Report
Attention:
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3879      +/-   ##
============================================
- Coverage     46.74%   46.74%   -0.01%
- Complexity    11152    11154       +2
============================================
  Files          2216     2216
  Lines         87503    87523      +20
  Branches       9617     9616       -1
============================================
+ Hits          40904    40913       +9
- Misses        42908    42920      +12
+ Partials       3691     3690       -1
public static int readConfigNumParallelRunnerThreads(Properties props) {
  return Integer.parseInt(props.getProperty(PARALLEL_RUNNER_THREADS_KEY, Integer.toString(DEFAULT_PARALLEL_RUNNER_THREADS)));
}
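For reference, here is a self-contained sketch of the method under review that makes its fallback behavior concrete: an explicit value for the key wins, otherwise the default is used. The key string "parallel.runner.threads" and the default of 10 are placeholder assumptions, not necessarily the values of Gobblin's actual ParallelRunner constants.

```java
import java.util.Properties;

public class ParallelRunnerConfigSketch {
  // Placeholder values (assumed): the real key string and default live in Gobblin's
  // ParallelRunner and may differ from these.
  static final String PARALLEL_RUNNER_THREADS_KEY = "parallel.runner.threads";
  static final int DEFAULT_PARALLEL_RUNNER_THREADS = 10;

  /** Returns the configured thread count, or the default when the key is absent. */
  public static int readConfigNumParallelRunnerThreads(Properties props) {
    return Integer.parseInt(props.getProperty(
        PARALLEL_RUNNER_THREADS_KEY, Integer.toString(DEFAULT_PARALLEL_RUNNER_THREADS)));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // No explicit setting: falls back to the placeholder default.
    System.out.println(readConfigNumParallelRunnerThreads(props)); // 10
    // Explicit setting takes precedence over the default.
    props.setProperty(PARALLEL_RUNNER_THREADS_KEY, "4");
    System.out.println(readConfigNumParallelRunnerThreads(props)); // 4
  }
}
```

Because the default is only consulted when the key is missing, the constant acts as a fallback rather than as a separate "default" setting.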
Does it make sense to have this config/function when many other classes pass in their own config to control the number of threads in this class? For example, the metadata writer defines its own config for how many registration threads it uses with this class. This class seems to be a generic parallel runner that can be used in many different areas, so it makes less sense for it to have its own dedicated thread-count config, unless that config only defines a default.
We can have it as getDefaultParallelRunnerConcurrency?
The primary reason I moved it here is that MRJobRunner was referencing two configs, both defined in this class.
We can certainly rename. I'd prefer not "default", though, because the default value is only used as a fallback when no specific config is given.
As for multiple classes all leveraging this utility while wanting their own setting, I recommend hierarchical naming, à la Typesafe Configs:
String thisClassConfigPrefix = getClass().getName(); // potential convention...
Config thisClassConfigs = ConfigUtils.getConfig(myConfig, thisClassConfigPrefix, ConfigFactory.empty());
int numThreads = ParallelRunner.getNumThreadsConfig(thisClassConfigs);
At the moment I'd continue using Properties, since my objective was just to consolidate cross-class references... but if we find more widespread use, we could update it to take Config instead.
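Since the plan is to stay with Properties for now, the hierarchical-naming convention can also be sketched with the JDK alone. This is a minimal illustration under stated assumptions: `scopedProperties` is a hypothetical helper playing the role of `ConfigUtils.getConfig(myConfig, prefix, ConfigFactory.empty())`, the class-name prefixes are made up, and the key string and default are placeholders rather than Gobblin's actual constants.

```java
import java.util.Properties;

public class PrefixedThreadConfigSketch {
  // Placeholder constants (assumed); the real key and default in ParallelRunner may differ.
  static final String PARALLEL_RUNNER_THREADS_KEY = "parallel.runner.threads";
  static final int DEFAULT_PARALLEL_RUNNER_THREADS = 10;

  /** Hypothetical helper: copies every "<prefix>.<suffix>" entry into new Properties keyed by <suffix>. */
  static Properties scopedProperties(Properties props, String prefix) {
    String fullPrefix = prefix + ".";
    Properties scoped = new Properties();
    for (String name : props.stringPropertyNames()) {
      if (name.startsWith(fullPrefix)) {
        scoped.setProperty(name.substring(fullPrefix.length()), props.getProperty(name));
      }
    }
    return scoped; // empty when nothing is set under the prefix
  }

  /** Same shape as the method under review: explicit value if present, else the default. */
  static int readConfigNumParallelRunnerThreads(Properties props) {
    return Integer.parseInt(props.getProperty(
        PARALLEL_RUNNER_THREADS_KEY, Integer.toString(DEFAULT_PARALLEL_RUNNER_THREADS)));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Each caller sets its own thread count under its (hypothetical) class-name prefix.
    props.setProperty("com.example.MetadataWriter." + PARALLEL_RUNNER_THREADS_KEY, "20");
    props.setProperty("com.example.MRJobRunner." + PARALLEL_RUNNER_THREADS_KEY, "4");

    System.out.println(readConfigNumParallelRunnerThreads(
        scopedProperties(props, "com.example.MetadataWriter"))); // 20
    System.out.println(readConfigNumParallelRunnerThreads(
        scopedProperties(props, "com.example.OtherClass"))); // 10 (no scoped value, so default)
  }
}
```

With this shape, the runner class owns only the key name and its fallback, while each caller controls its own thread count through its prefix, which addresses the concern about classes such as the metadata writer wanting their own setting.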
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Continue reworking MR code to enable reuse; continuation of #3784
Tests
Existing unit tests