Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure caches are not used unsafely #10691

Merged
merged 8 commits into from
Jan 21, 2022

Conversation

findepi
Copy link
Member

@findepi findepi commented Jan 19, 2022

Guava's Cache and LoadingCache have concurrency issues around
invalidation and ongoing loads.

Ensure that

  • code uses EvictableCache or EvictableLoadingCache which fix the
    probem, or
  • code uses safety wrappers, NonEvictableCache,
    NonEvictableLoadingCache, which fail when unsafe invalidation is
    called. Additionally, the interfaces have the unimplemented methods
    marked as @Deprecated, to signal the problem as early as possible.

Follows #10512

@findepi findepi requested review from losipiuk and kokosing January 19, 2022 14:12
@cla-bot cla-bot bot added the cla-signed label Jan 19, 2022
@findepi findepi force-pushed the findepi/protect-caches branch from 1ea184f to 6eb77b1 Compare January 19, 2022 17:06
@@ -64,21 +61,12 @@
private final BigQuery bigQuery;
private final ViewMaterializationCache materializationCache;
private final boolean caseInsensitiveNameMatching;
private final Cache<String, Optional<RemoteDatabaseObject>> remoteDatasets;
private final Cache<TableId, Optional<RemoteDatabaseObject>> remoteTables;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @hashhar see Remove unused caches in BigQueryClient.

Copy link
Member

@hashhar hashhar Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️

Indeed. It has been present since the initial commit. I compute a fresh mapping near // explicitly cache the information if the requested dataset doesn't exist but don't put it into the cache at all.

I'd prefer fixing this instead of removing it though (in separate PR).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer fixing this instead of removing it though (in separate PR).

understood, except that i didn't want to spend time fixing this particular one.

i can drop the change. note that this is not the only problem, probably -- what about cache invalidation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused caches in BigQueryClient dropped

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grantatspothero is both fixing this to actually populate the cache and converting to use the EvictableCache to prevent leaking keys that are already being loaded.

Note that we don't have a procedure to clear the entire cache so it's a better situation than the JDBC connectors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the cache for now. We ran into some isssues - bulk loading not playing well with loading caches, no invalidation logic when we make changes via Trino etc.

@findepi Can you restore your commit? Sorry for the back and forth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ran into some isssues - bulk loading not playing well with loading caches, no invalidation logic when we make changes via Trino etc.

@hashhar @grantatspothero
bulk list + cache.put is inherently not playing well with invalidation

in JDBC, we 'solved' this problem by not having invalidation (CachingIdentifierMapping)
cc @kokosing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you restore your commit?

#10725

@@ -496,6 +497,8 @@ public void removeOldTasks()
try {
DateTime endTime = taskInfo.getStats().getEndTime();
if (endTime != null && endTime.isBefore(oldestAllowedTask)) {
// The removal here is concurrency safe with respect to any concurrent loads: the cache has no expiration,
// the taskId is in the cache, so there mustn't be an ongoing load.
tasks.asMap().remove(taskId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that make sense to assert that task was actually removed to follow what comment above says.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, but i don't feel comfortable adding assertions in this class. do you want to address this as a followup?

@grantatspothero
Copy link
Contributor

grantatspothero commented Jan 20, 2022

@findepi a pattern that often comes up with the connectors is we want a cache that also keeps track of negative cache entries.

For example, in the bigquery connector we can only bulk list tables in the bigquery API to populate the table name cache, so if someone requests reading a table that does not exist we have to list all the tables in the schema. It would be nice to keep track of this nonexistent table so we do not keep hitting the bigquery API.
#10697

Is there a cache implementation we should be using for bulk loads that handles concurrency and invalidation of a negative cache without races?

@findepi findepi force-pushed the findepi/protect-caches branch from 17fa8e1 to 194cdef Compare January 21, 2022 08:50
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@findepi findepi force-pushed the findepi/protect-caches branch from 194cdef to 2b2c3b4 Compare January 21, 2022 11:20
@findepi
Copy link
Member Author

findepi commented Jan 21, 2022

@grantatspothero

For example, in the bigquery connector we can only bulk list tables in the bigquery API to populate the table name cache

bulk list is not bulk load

you don't ask a cache for multiple potentially existing entries at the same time

Is there a cache implementation we should be using for bulk loads that handles concurrency and invalidation of a negative cache without races?

EvictableLoadingCache is supposed to be that (is intended to support bulk loads, invalidation, as used in CachingHiveMetastore for 2 caches).

not sure about "negative cache". from cache perspective, there is no such thing, and "negativeness" is handled by using a marker, right?

let's continue BigQuery caching needs under BigQuery caching PR

@findepi findepi requested review from losipiuk and hashhar January 21, 2022 11:29
@findepi
Copy link
Member Author

findepi commented Jan 21, 2022

let's continue BigQuery caching needs under BigQuery caching PR

oh, i thought there is a PR. Let's talk under the issue #10697

They were never written to, so always empty.

This also removes `bigquery.case-insensitive-name-matching.cache-ttl`
config property, which was accepted, but didn't do anything.
The cache has no `refreshAfterWrite`, so `asyncReloading` does not
change anything.
`UniformNodeSelectorFactory` and `TopologyAwareNodeSelectorFactory`
attempt to warn only once per 30 seconds about a node being
inaccessible. Fix a race around that which allowed multiple threads to
issue a warning at once.
The `.getIfPresent()  compute .put()` can be replaced with a `.get(key,
loader)`.
The cache has no expiration, so it's equivalent to `ConcurrentHashMap`
with `Map.computeIfAbsent`.
Guava's `Cache` and `LoadingCache` have concurrency issues around
invalidation and ongoing loads.

Ensure that
- code uses `EvictableCache` or `EvictableLoadingCache` which fix the
  probem, or
- code uses safety wrappers, `NonEvictableCache`,
  `NonEvictableLoadingCache`, which fail when unsafe invalidation is
  called. Additionally, the interfaces have the unimplemented methods
  marked as `@Deprecated`, to signal the problem as early as possible.
@findepi findepi force-pushed the findepi/protect-caches branch from 2b2c3b4 to 9c5c238 Compare January 21, 2022 12:28
@findepi
Copy link
Member Author

findepi commented Jan 21, 2022

rebased on top of #10725

@github-actions github-actions bot added the docs label Jan 21, 2022
@findepi findepi merged commit 0747a37 into trinodb:master Jan 21, 2022
@findepi findepi deleted the findepi/protect-caches branch January 21, 2022 15:54
@github-actions github-actions bot added this to the 369 milestone Jan 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants