Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CachingHiveMetastore.getPartitionsByNames failing while cache invalidation #12439

Merged
merged 3 commits into from
May 23, 2022

Conversation

findepi
Copy link
Member

@findepi findepi commented May 17, 2022

CachingHiveMetastore uses HivePartitionName for keys in partition
cache. HivePartitionName has a peculiar, semi- value-based equality.
HivePartitionName may or may not be missing a name and it matters for
bulk load, but it doesn't matter for a single-partition load. Because
of equality semantics, the cache keys may gets mixed during bulk load.
This happens within EvictableCache.getAll where we compute tokens. We
may get an existing token and normally it's fine, since existing entry
is satisfied without a bulk load. However, if existing token has no data
in the EvictableCache.dataCache (e.g. due to explicit or implicit
invalidation), the loading gets invoked. Thus, a bulk loading may get a
HivePartitionName without a name, which is unexpected (and
unsupported) in the bulk loading.

Fixes #12513

@findepi
Copy link
Member Author

findepi commented May 17, 2022

This is just a test at this moment, not a solution.

My favorite solution would be to kill this hack

private final Optional<String> partitionName; // does not participate in hashCode/equals

private final Optional<String> partitionName; // does not participate in hashCode/equals

since it caused me too much problems already.

cc @sopel39 @losipiuk @electrum

@findepi findepi force-pushed the findepi/fix-hms-cache-equality-race branch from cf16f5d to 0176528 Compare May 18, 2022 10:38
@findepi findepi force-pushed the findepi/fix-hms-cache-equality-race branch 2 times, most recently from b1b8f23 to d6c42e0 Compare May 18, 2022 10:58
@findepi findepi marked this pull request as ready for review May 18, 2022 10:59
@findinpath
Copy link
Contributor

@findepi can you attach to the description of the PR a series of SQL statements that reproduce the scenario?

@findepi
Copy link
Member Author

findepi commented May 18, 2022

@findepi can you attach to the description of the PR a series of SQL statements that reproduce the scenario?

No. It cannot be observed in sequential process, unless there is a very specific timing.
See however TestCachingHiveMetastore, it should convey information about the problematic scenario well.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (just skimmed tests)

@findepi findepi force-pushed the findepi/fix-hms-cache-equality-race branch from c285a97 to 44209e1 Compare May 19, 2022 13:44
@findepi
Copy link
Member Author

findepi commented May 19, 2022

AC

@findepi
Copy link
Member Author

findepi commented May 20, 2022

The change broke load sharing for concurrent single-value loads. Will fix address that.

…alidation

`CachingHiveMetastore` uses `HivePartitionName` for keys in partition
cache.  `HivePartitionName` has a peculiar, semi- value-based equality.
HivePartitionName may or may not be missing a name and it matters for
bulk load, but it doesn't matter for a single-partition load. Because
of equality semantics, the cache keys may gets mixed during bulk load.
This happens within `EvictableCache.getAll` where we compute tokens. We
may get an existing token and normally it's fine, since existing entry
is satisfied without a bulk load. However, if existing token has no data
in the `EvictableCache.dataCache` (e.g. due to explicit or implicit
invalidation), the loading gets invoked. Thus, a bulk loading may get a
`HivePartitionName` without a name, which is unexpected (and
unsupported) in the bulk loading.
@findepi findepi force-pushed the findepi/fix-hms-cache-equality-race branch from df7b4cd to 4a31b9e Compare May 20, 2022 14:29
@findepi
Copy link
Member Author

findepi commented May 20, 2022

The change broke load sharing for concurrent single-value loads. Will fix address that.

Added a test and retracted change in EvictableCache.get.
The EvictableCache.getAll changes are retained, but modified a bit.
For example, the dangling tokens are cleaned up now.

@findepi findepi requested review from losipiuk and alexjo2144 May 20, 2022 14:32
@findepi
Copy link
Member Author

findepi commented May 20, 2022

cc @electrum @martint if you want to indulge yourself with some concurrency related stuff

Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation of the problem in the commit message. Would you mind describing, at a high level, the approach to fixing it?

* the test exists primarily to document current state and support discussion, should the current state change.
*/
@Test(timeOut = TEST_TIMEOUT_MILLIS)
public void testConcurrentGetWithCallableShareLoad()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should look into jcstress (https://github.com/openjdk/jcstress) for testing concurrency (for later, not as part of this PR).

@findepi
Copy link
Member Author

findepi commented May 20, 2022

Thanks for the detailed explanation of the problem in the commit message. Would you mind describing, at a high level, the approach to fixing it?

Well, the fix is the inverse of the problem -- pass the keys provided to getAll further to loadAll, without "exchanging" them into equal (but not same) keys based on cache state.

@findepi findepi merged commit dacd302 into trinodb:master May 23, 2022
@github-actions github-actions bot added this to the 382 milestone May 23, 2022
@findepi findepi mentioned this pull request May 23, 2022
@findepi findepi deleted the findepi/fix-hms-cache-equality-race branch May 23, 2022 10:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
6 participants