[GOBBLIN-1826] Change isAssignableFrom() to isSuperTypeOf() per Guava 20 javadocs to… #3688

Merged

Conversation

Contributor

@Will-Lo Will-Lo commented Apr 27, 2023

… fix bug in Hive registration where classes can escape type casts

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    After the Guava 20 upgrade, there is a bug in the Hive registration code.
    When setting table properties, HiveTable calls this function in HiveRegistrationUnit to populate each property:
  protected static <T> Optional<T> populateField(State state, String key, TypeToken<T> token) {

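Since it returns a generically typed Optional built with an unchecked cast, it is possible to assign a value of the wrong class to a field. An Optional declared to hold one type (a Long, for example) can actually hold a String value, which causes a ClassCastException when the value is read.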
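A minimal sketch of how the wrong class escapes the cast, using only the JDK. `wrapUnchecked` and `readAsLongFails` are hypothetical names for illustration and are not Gobblin code; the point is only that the unchecked cast succeeds and the failure shows up later, at the read site.

```java
import java.util.Optional;

class UncheckedOptionalDemo {
    // Hypothetical stand-in for the cast inside populateField: because of
    // erasure, nothing verifies T at runtime, so this always succeeds.
    @SuppressWarnings("unchecked")
    static <T> Optional<T> wrapUnchecked(Object value) {
        return (Optional<T>) Optional.of(value);
    }

    // Returns true if reading the wrapped value as a Long throws ClassCastException.
    static boolean readAsLongFails(Object value) {
        Optional<Long> field = wrapUnchecked(value); // no error here, even for a String
        try {
            Long unused = field.get(); // the compiler-inserted cast to Long runs here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readAsLongFails("123")); // true
        System.out.println(readAsLongFails(123L));  // false
    }
}
```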

In Guava 20, isAssignableFrom is deprecated on the TypeToken class, so the following code pattern was used instead:

      } else if (new TypeToken<Long>() {}.getRawType().isAssignableFrom(token.getClass())) {

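However, in this case getRawType() returns java.lang.Long, while token.getClass() is the class of the token object itself (a TypeToken subclass), not the type the token describes. Long is never assignable from that class, so the check is always false. The code therefore falls through to the default branch, and every property is wrapped in an Optional as a String.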
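The failing check can be reproduced without Guava. This sketch assumes a hypothetical `MiniToken` class that captures its type argument the same way an anonymous TypeToken subclass does; it is a stand-in for illustration, not Guava's implementation.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class RawTypeCheckDemo {
    // Hypothetical, minimal stand-in for a type token: an anonymous subclass
    // records its type argument in its generic superclass.
    abstract static class MiniToken<T> {
        final Type type;

        MiniToken() {
            ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superType.getActualTypeArguments()[0];
        }

        // Erased class of the described type, e.g. List for List<String>.
        Class<?> rawType() {
            Type raw = (type instanceof ParameterizedType)
                ? ((ParameterizedType) type).getRawType()
                : type;
            return (Class<?>) raw;
        }
    }

    static MiniToken<Long> longToken() {
        return new MiniToken<Long>() {};
    }

    static MiniToken<List<String>> stringListToken() {
        return new MiniToken<List<String>>() {};
    }

    // Buggy shape: compares the expected class with the token object's own class.
    static boolean buggyCheck(Class<?> expected, MiniToken<?> token) {
        return expected.isAssignableFrom(token.getClass());
    }

    // Raw-to-raw shape: compares the expected class with the class the token describes.
    static boolean rawToRawCheck(Class<?> expected, MiniToken<?> token) {
        return expected.isAssignableFrom(token.rawType());
    }

    public static void main(String[] args) {
        System.out.println(buggyCheck(Long.class, longToken()));          // false
        System.out.println(buggyCheck(List.class, stringListToken()));    // false
        System.out.println(rawToRawCheck(Long.class, longToken()));       // true
        System.out.println(rawToRawCheck(List.class, stringListToken())); // true
    }
}
```

Because the buggy check is false for every token, no typed branch is ever taken.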

To avoid this, and to validate the List token type properly, we should use .isSupertypeOf() instead of isAssignableFrom().
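A rough illustration of why the full generic type matters for the List case. It reuses a hypothetical `MiniToken` stand-in and plain `Type.equals`, which is a simplification: Guava's `isSupertypeOf` performs a generics-aware subtype check rather than an equality test.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class ListTokenDemo {
    // Hypothetical stand-in for a type token (not Guava's TypeToken).
    abstract static class MiniToken<T> {
        final Type type;

        MiniToken() {
            ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superType.getActualTypeArguments()[0];
        }
    }

    // Erased class of a type, e.g. List for List<String>.
    static Class<?> rawOf(Type t) {
        Type raw = (t instanceof ParameterizedType) ? ((ParameterizedType) t).getRawType() : t;
        return (Class<?>) raw;
    }

    static MiniToken<List<String>> stringListToken() {
        return new MiniToken<List<String>>() {};
    }

    static MiniToken<List<Long>> longListToken() {
        return new MiniToken<List<Long>>() {};
    }

    // Raw comparison: the element type is erased, so any two List tokens match.
    static boolean rawTypesMatch(MiniToken<?> a, MiniToken<?> b) {
        return rawOf(a.type) == rawOf(b.type);
    }

    // Full comparison: the element type takes part in the check.
    static boolean fullTypesMatch(MiniToken<?> a, MiniToken<?> b) {
        return a.type.equals(b.type);
    }

    public static void main(String[] args) {
        System.out.println(rawTypesMatch(stringListToken(), longListToken()));    // true
        System.out.println(fullTypesMatch(stringListToken(), longListToken()));   // false
        System.out.println(fullTypesMatch(stringListToken(), stringListToken())); // true
    }
}
```

With Guava itself, the corresponding check would have the shape `new TypeToken<List<String>>() {}.isSupertypeOf(token)`, which compares the described types directly instead of the token object's class.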

This time I also wrote a unit test to validate the runtime behavior for each type.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Unit tests

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"


codecov-commenter commented Apr 27, 2023

Codecov Report

Merging #3688 (6a18c0d) into master (6338910) will increase coverage by 0.58%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #3688      +/-   ##
============================================
+ Coverage     46.97%   47.56%   +0.58%     
+ Complexity    10784     8695    -2089     
============================================
  Files          2138     1727     -411     
  Lines         84040    66319   -17721     
  Branches       9340     7181    -2159     
============================================
- Hits          39480    31546    -7934     
+ Misses        40969    32042    -8927     
+ Partials       3591     2731     -860     
Impacted Files Coverage Δ
.../org/apache/gobblin/hive/HiveRegistrationUnit.java 60.11% <100.00%> (+4.62%) ⬆️

... and 414 files with indirect coverage changes

Contributor

@phet phet left a comment

very nice to see that test... but minor query on the appropriate class to put under test

fieldValue = (Optional<T>) Optional.of(state.getPropAsLong(key));
} else if (new TypeToken<List<String>>() {}.getRawType().isAssignableFrom(token.getClass())) {
Contributor

I believe what happened during the recent guava 20.0 bump is that this became

x.isAssignableFrom(y)
// where:
//   Class<List<?>> x
//   Class<TypeToken<?>> y

so it just fell through to Optional.of(state.getProp(key)) on line 137

is that your interpretation as well?

Contributor Author

Yup that's what happened, shouldn't have taken the raw type unless I was also getting the raw type from the incoming token as well.


import org.apache.gobblin.configuration.State;

public class HiveTableTest {
Contributor

wondering... could this not be a test of HiveRegistrationUnit, rather than its HiveTable derived type?

Contributor Author

Yeah I originally had that too, the issue was that the HiveRegistrationUnit Builder is abstract and only implemented in the HiveTable class. I could create a mock or test class but would rather have a test that goes through more of the production code to make sure less things are missed.

Contributor

fair enough

Contributor

@phet phet left a comment

nice work tracking this down and testing to demonstrate the fix!

Contributor

@umustafi umustafi left a comment

Nice job with finding this small bug and adding a unit test

@Will-Lo Will-Lo merged commit e47aef7 into apache:master Apr 27, 2023
phet added a commit to phet/gobblin that referenced this pull request Aug 15, 2023
* upstream/master:
  [GOBBLIN-1832] Emit warning instead of failing job for retention of Hive Table Views (apache#3695)
  [GOBBLIN-1831] Use flowexecutionid in kafka monitor and jobnames (apache#3694)
  [GOBBLIN-1824]Improving the Efficiency of Work Planning in Manifest-Based DistCp Jobs (apache#3686)
  [GOBBLIN-1829] Fixes bug where the wrong workunit event was being tracked for keepin… (apache#3691)
  [GOBBLIN-1828] Implement Timeout for Creating Writer Functionality (apache#3690)
  [GOBBLIN-1827] Add check that if nested field is optional and has a non-null default… (apache#3689)
  [GOBBLIN-1826] Change isAssignableFrom() to isSuperTypeOf() per Guava 20 javadocs to… (apache#3688)
  [GOBBLIN-1822]Logging Abnormal Helix Task States (apache#3685)
  [GOBBLIN-1819] Log helix workflow information and timeout information during submission wait / polling (apache#3681)
  [GOBBLIN-1821] Let flow execution ID propagate to the Job ID if it exists (apache#3684)
  [GOBBLIN-1810] Support general iceberg catalog (support configurable behavior for metadata retention policy) (apache#3680)
  Add null default value to observability events that are additionally added (apache#3682)
  [GOBBLIN-1816] Add job properties and GaaS instance ID to observability event (apache#3676)
  [GOBBLIN-1785] add MR_JARS_BASE_DIR and logic to delete old mr jar dirs (apache#3642)
  initiliaze yarn clients in yarn app launcher so that a child class can override the yarn client creation logic (apache#3679)
  [GOBBLIN-1811]Fix Iceberg Registration Serialization (apache#3673)
  [GOBBLIN-1817] change some deprecated code and fix minor codestyle (apache#3678)
  [GOBBLIN-1812] Mockito should only be test compile (apache#3674)
  [GOBBLIN-1813] Helix workflows submission timeouts are configurable (apache#3677)
  [GOBBLIN-1810] Support general iceberg catalog in icebergMetadataWriter (apache#3672)
  Refactor yarn app launchers to support extending these classes (apache#3671)
  [GOBBLIN-1808] Bump Guava version from 15.0 to 20.0 (apache#3669)
  [GOBBLIN-1806] Submit dataset summary event post commit and integrate them into GaaSObservabilityEvent (apache#3667)
  [GOBBLIN-1814] Add `MRJobLauncher` configurability for any failing mapper to be fatal to the MR job (apache#3675)
  Add new lookback version finder for use with iceberg retention (apache#3670)