Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-1919] Rework a few more elements of MR-related job exec for reuse in Temporal-based execution #3879

Merged
merged 3 commits into from
Feb 21, 2024

Conversation

phet
Copy link
Contributor

@phet phet commented Feb 15, 2024

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Continue reworking MR code to enable reuse; continuation of #3784

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

existing unit tests

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@phet phet changed the title Rework a few more elements of MR-related job exec for reuse in Temporal-based execution [GOBBLIN-1919] Rework a few more elements of MR-related job exec for reuse in Temporal-based execution Feb 15, 2024
@phet phet force-pushed the cleanup-mr-gen-wus branch from 6a8a22a to 89145e8 Compare February 21, 2024 18:07
@codecov-commenter
Copy link

codecov-commenter commented Feb 21, 2024

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (492fea2) 46.74% compared to head (2dc0155) 46.74%.

Files Patch % Lines
...main/java/org/apache/gobblin/runtime/JobState.java 0.00% 10 Missing ⚠️
...java/org/apache/gobblin/util/JobLauncherUtils.java 0.00% 7 Missing ⚠️
...rg/apache/gobblin/runtime/AbstractJobLauncher.java 69.23% 3 Missing and 1 partial ⚠️
...n/java/org/apache/gobblin/util/ParallelRunner.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3879      +/-   ##
============================================
- Coverage     46.74%   46.74%   -0.01%     
- Complexity    11152    11154       +2     
============================================
  Files          2216     2216              
  Lines         87503    87523      +20     
  Branches       9617     9616       -1     
============================================
+ Hits          40904    40913       +9     
- Misses        42908    42920      +12     
+ Partials       3691     3690       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 90 to 93
public static int readConfigNumParallelRunnerThreads(Properties props) {
return Integer.parseInt(props.getProperty(PARALLEL_RUNNER_THREADS_KEY, Integer.toString(DEFAULT_PARALLEL_RUNNER_THREADS)));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have this config/function when a lot of other classes pass in a different config to control the number of threads in this class? Example: Metadata writer defines their own config for how many registration threads it wants when using this class. Seems to me that this class is just a generic parallel runner class and can be used in a lot of different areas, so it makes less sense for it to have its own dedicated config for number of threads unless it's defining a default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have it as getDefaultParallelRunnerConcurrency ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the primary reason I moved it here is that MRJobRunner was referencing two configs both defined in this class.

we can certainly rename. I'd prefer not "default", just because the default value is only used as a fallback, in the case of no specific config.

as for multiple classes all leveraging this utility, while wanting their own setting, I recommend hierarchical naming, a la typesafe Configs -

Sting thisClassConfigPrefix = getClass().getName(); // potential convention...
Config thisClassConfigs = ConfigUtils.getConfig(myConfig, thisClassConfigPrefix, ConfigFactory.empty());
int numThreads = ParallelRunner.getNumThreadsConfig(thisClassConfigs);

at the moment I'd continue using Properties, since my objective was just to consolidate cross-class references... but if we find more widespread use, we could update to take Config instead.

@Will-Lo Will-Lo merged commit 436cf26 into apache:master Feb 21, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants