Verify the file presence for cached directory lister and retry #20414
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
3b5f2e9
to
abfa031
Compare
@i-93 could you submit a signed CLA, please?
@raunaqmorarka @electrum any idea who could help with review here?
abfa031
to
6e6aae8
Compare
Stream<TrinoFileStatus> fileStream = paths.stream()

// If file statuses came from cache verify that all are present
if (isCached) {
It doesn't feel right to me to ask each time whether the location is cached.
You are adding handling for a corner case in the happy flow this way.
Maybe it would be better to add a procedure that clears the directory listing cache for a specified location.
The code under this 'if' verifies that all the listed files are present in the directory listing. We know that a discrepancy could be caused by a stale cache, and in that case there is a way to handle it (invalidate the cache and retry). It makes no sense to do this if the location is not cached: invalidation is a no-op and the retry would return the same results.
I did add an invalidate(Location) call to the directory lister, so the conditional code would work in any case. The check is just a performance optimization: it avoids verifying and retrying when those steps are not going to change anything anyway.
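For readers following the thread, the verify-then-retry flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchCachingLister`, `ManifestListingSketch`, and `listForManifest` are hypothetical names, directories and files are plain strings rather than Trino's `Location` and `TrinoFileStatus`, and the cache is a simple map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for a caching directory lister; not Trino's actual API.
class SketchCachingLister {
    private final Function<String, Set<String>> storage; // lists a directory from storage
    private final Map<String, Set<String>> cache = new HashMap<>();
    int loads; // how many listings were loaded from storage

    SketchCachingLister(Function<String, Set<String>> storage) {
        this.storage = storage;
    }

    boolean isCached(String directory) {
        return cache.containsKey(directory);
    }

    void invalidate(String directory) {
        cache.remove(directory);
    }

    Set<String> list(String directory) {
        Set<String> listing = cache.get(directory);
        if (listing == null) {
            loads++;
            listing = Set.copyOf(storage.apply(directory)); // snapshot of storage
            cache.put(directory, listing);
        }
        return listing;
    }
}

class ManifestListingSketch {
    // Lists a directory and reloads it once if a cached listing lacks a manifest path.
    static Set<String> listForManifest(SketchCachingLister lister, String directory, List<String> manifestPaths) {
        boolean servedFromCache = lister.isCached(directory);
        Set<String> listing = lister.list(directory);
        // Retrying only helps when the first listing came from a possibly stale cache.
        if (servedFromCache && !listing.containsAll(manifestPaths)) {
            lister.invalidate(directory);
            listing = lister.list(directory);
        }
        return listing;
    }
}
```

When the directory was not cached, the first listing is already fresh, so the sketch skips the retry, which mirrors the performance argument made in the comment.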
// If file statuses came from cache verify that all are present
if (isCached) {
    boolean missing = paths.stream()
Rather than fully reloading the whole cache it'd be nice if we could just check any missing paths directly.
Yes, that would be great. Unfortunately, the directory lister can't list files individually; it lists by folder and caches the same way. We are invalidating the cache for the parent folder (not the whole cache!), which causes its content to be reloaded.
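To illustrate the point about folder-level granularity, here is a small sketch of a listing cache keyed by directory, where invalidating for one file drops only its parent folder's entry. All names (`PerDirectoryCacheSketch`, `parentOf`, `invalidateParentOf`) are hypothetical, and paths are treated as plain '/'-separated strings, which is an assumption rather than how Trino's `Location` works.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical directory-keyed listing cache; names are illustrative only.
class PerDirectoryCacheSketch {
    private final Map<String, List<String>> listingsByDirectory = new HashMap<>();

    void put(String directory, List<String> files) {
        listingsByDirectory.put(directory, files);
    }

    boolean isCached(String directory) {
        return listingsByDirectory.containsKey(directory);
    }

    // Parent folder of a file path: everything before the last '/'.
    static String parentOf(String filePath) {
        int slash = filePath.lastIndexOf('/');
        return slash < 0 ? "" : filePath.substring(0, slash);
    }

    // Drops only the entry for the file's parent folder; other folders stay cached.
    void invalidateParentOf(String filePath) {
        listingsByDirectory.remove(parentOf(filePath));
    }
}
```

The next listing of the invalidated folder would have to go back to storage, while every other cached folder is served as before.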
6e6aae8
to
6ee23db
Compare
6ee23db
to
ebfa775
Compare
ebfa775
to
2668be9
Compare
@findinpath, @alexjo2144, @electrum Does anybody have any concerns about merging this? It has been tested in our preprod environment, addresses #20344, and didn't show any side effects.
9392592
to
954817f
Compare
Sorry for the delay on this. Generally this seems fine to me, with one small nitpick on the interface changes.
Besides that, have you tried adding a test? There are some existing ones in the product test suite that use symlink tables.
        .anyMatch(path -> !fileStatuses.containsKey(path.path()));
// Invalidate the cache and reload
if (missing) {
    directoryLister.invalidate(location);
I think the general approach here is fine, I would suggest changing the interfaces in a slightly different way though.
Exposing isCached via TableInvalidationCallback seems fine to me.
What I'd do differently: rather than exposing invalidate(Location), can we try adding an additional parameter to the HiveFileIterator constructor, something like boolean invalidateCaches, that gets passed through DirectoryLister#listFilesRecursively to force a hard load when set to true?
My reason for that is you're assuming here that the cache key is a Location, but that's an internal detail of the caching DirectoryListers that's not guaranteed to be stable.
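The alternative suggested here, a flag on the listing call that forces a hard load, might look roughly like the sketch below. It is only an illustration of the idea: `ForceReloadListerSketch` and its `listFiles(String, boolean)` method are hypothetical and do not reproduce the real `DirectoryLister#listFilesRecursively` signature.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical lister where the caller requests a hard load through a flag,
// so the cache key never leaks outside the lister.
class ForceReloadListerSketch {
    private final Function<String, List<String>> storage; // lists a directory from storage
    private final Map<String, List<String>> cache = new HashMap<>();
    int loads; // how many listings were loaded from storage

    ForceReloadListerSketch(Function<String, List<String>> storage) {
        this.storage = storage;
    }

    List<String> listFiles(String directory, boolean invalidateCaches) {
        if (invalidateCaches) {
            cache.remove(directory); // the lister decides which key to drop
        }
        List<String> listing = cache.get(directory);
        if (listing == null) {
            loads++;
            listing = List.copyOf(storage.apply(directory));
            cache.put(directory, listing);
        }
        return listing;
    }
}
```

With this shape the caller never names a cache key; how the lister maps the request to cache entries stays an internal detail, which is the stability concern raised in the comment.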
Hi @alexjo2144.
I looked into your proposal to remove invalidate(Location) from the interface. That doesn't look right to me. Directory listers are chained using a delegate model; some of them cache (by Location), some do not (they have a no-op invalidate). If we remove invalidate from the interface, we won't be able to push it down the chain.
We only call invalidate(location) if isCached(location) is true, so in a way that verifies that the particular directory lister supports caching by location.
What do you think?
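The delegate-chain argument can be made concrete with a small sketch. The interface and classes below (`ListerSketch`, `StorageListerSketch`, `CachingListerSketch`, `PassThroughListerSketch`) are hypothetical and far simpler than Trino's directory listers; they only show why `invalidate` has to be part of the shared interface for the call to reach a caching layer further down the chain.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shared interface for every lister in the chain.
interface ListerSketch {
    List<String> list(String directory);
    boolean isCached(String directory);
    void invalidate(String directory);
}

// Terminal lister that reads "storage" directly; it caches nothing.
class StorageListerSketch implements ListerSketch {
    final Map<String, List<String>> files = new HashMap<>();

    public List<String> list(String directory) {
        return List.copyOf(files.getOrDefault(directory, List.of()));
    }

    public boolean isCached(String directory) {
        return false;
    }

    public void invalidate(String directory) {
        // no-op: nothing is cached here
    }
}

// Caching layer keyed by directory.
class CachingListerSketch implements ListerSketch {
    private final ListerSketch delegate;
    private final Map<String, List<String>> cache = new HashMap<>();

    CachingListerSketch(ListerSketch delegate) {
        this.delegate = delegate;
    }

    public List<String> list(String directory) {
        return cache.computeIfAbsent(directory, delegate::list);
    }

    public boolean isCached(String directory) {
        return cache.containsKey(directory);
    }

    public void invalidate(String directory) {
        cache.remove(directory);
        delegate.invalidate(directory); // keep pushing down the chain
    }
}

// Non-caching wrapper that simply forwards every call to its delegate.
class PassThroughListerSketch implements ListerSketch {
    private final ListerSketch delegate;

    PassThroughListerSketch(ListerSketch delegate) {
        this.delegate = delegate;
    }

    public List<String> list(String directory) {
        return delegate.list(directory);
    }

    public boolean isCached(String directory) {
        return delegate.isCached(directory);
    }

    public void invalidate(String directory) {
        delegate.invalidate(directory);
    }
}
```

In this sketch a caller holding only the outermost lister can still invalidate the inner cache, because every layer either handles the call or forwards it.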
0355a13
to
74ff57f
Compare
@alexjo2144 I have added a test case that fails on the stock code but passes on the modified one.
@alexjo2144, @findinpath, @electrum, @raunaqmorarka
74ff57f
to
d87cf74
Compare
Description
These changes address the problem where new files in a manifest are not immediately visible to directoryLister because of the caching delay (see #20344).
The buildManifestFileIterator() method now checks whether any referenced file is missing from the listing. If so, it tries to invalidate the cache and reload the listing.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(*) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: