/vsis3/ Using cached response when creds changed #2294

Closed
glostis opened this issue Mar 4, 2020 · 2 comments
Contributor

glostis commented Mar 4, 2020

Expected behavior and actual behavior.

This issue was first reported at rasterio/rasterio#1877, but after closer investigation the problem comes from GDAL and not rasterio.

When accessing a file via /vsis3/, if the first open fails (because of incorrect credentials, for example), all subsequent opens of the same file keep failing, even after the credentials are changed.

The problem disappears if CPL_VSIL_CURL_NON_CACHED=/vsis3/ is set.
This leads me to think that the cause is GDAL's VSI caching, which serves a cached response even though the request headers have changed. If possible, the cache should be invalidated for that file whenever a change in the headers (AWS credentials or AWS region name, for example) is detected.
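
For code that sets options programmatically rather than through the environment, the workaround can be sketched as below. This is a minimal sketch, not part of the original report: the helper name `disable_vsis3_cache` is hypothetical, and the GDAL module is passed in as an argument (an assumption made so that the helper can be exercised without GDAL installed). Only `SetConfigOption` and the `CPL_VSIL_CURL_NON_CACHED` option come from GDAL itself.

```python
def disable_vsis3_cache(gdal_module):
    """Bypass the VSI cache for /vsis3/ paths, as in the workaround above.

    `gdal_module` is expected to be `osgeo.gdal`. The helper name and the
    idea of passing the module in are illustrative assumptions.
    """
    # Equivalent in intent to running with CPL_VSIL_CURL_NON_CACHED=/vsis3/
    # set in the environment.
    gdal_module.SetConfigOption("CPL_VSIL_CURL_NON_CACHED", "/vsis3/")
```

Typical use would be `from osgeo import gdal` followed by `disable_vsis3_cache(gdal)` before the first `gdal.Open` call. Disabling the cache presumably means more requests to S3, so this is a stopgap rather than a fix.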

Steps to reproduce the problem.

Write the following Python snippet to a directory as snippet.py:

import sys

from osgeo import gdal

path = "/vsis3/landsat-pds/L8/210/124/LC82101242013325LGN00/LC82101242013325LGN00_B1.TIF"
for key in sys.argv[1:]:
    print("Setting AWS_ACCESS_KEY_ID to `{}`".format(key))
    gdal.SetConfigOption("AWS_ACCESS_KEY_ID", key)
    ds = gdal.Open(path)
    if not ds:
        print("Failed to open file")
    else:
        print("Succeeded to open file")
    print()

On a machine with the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables set, run from the directory containing snippet.py:

$ docker run -e AWS_SECRET_ACCESS_KEY -v $(pwd):/root/snippet -it osgeo/gdal python3 /root/snippet/snippet.py dummy $AWS_ACCESS_KEY_ID
Setting AWS_ACCESS_KEY_ID to `dummy`
Warning 1: HTTP response code on https://landsat-pds.s3.amazonaws.com/L8/210/124/LC82101242013325LGN00/LC82101242013325LGN00_B1.TIF: 403
ERROR 17: The AWS Access Key Id you provided does not exist in our records.
Failed to open file

Setting AWS_ACCESS_KEY_ID to `***<removed secret>***`
ERROR 4: `/vsis3/landsat-pds/L8/210/124/LC82101242013325LGN00/LC82101242013325LGN00_B1.TIF' does not exist in the file system, and is not recognized as a supported dataset name.
Failed to open file
$ docker run -e AWS_SECRET_ACCESS_KEY -v $(pwd):/root/snippet -it osgeo/gdal python3 /root/snippet/snippet.py $AWS_ACCESS_KEY_ID dummy
Setting AWS_ACCESS_KEY_ID to `***<removed secret>***`
Succeeded to open file

Setting AWS_ACCESS_KEY_ID to `dummy`
Succeeded to open file

As you can see, depending on the order in which the credentials are set, opening the file either fails both times or succeeds both times.
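
The order dependence is what one would expect from a cache keyed only on the file path. The toy model below is a sketch of that hypothesis, not GDAL's actual code: `NaiveStatusCache`, `fake_s3`, and the key value `"good"` are all made up for illustration.

```python
class NaiveStatusCache:
    """Toy cache keyed only by path, ignoring which credentials were used."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable(path, key) -> bool, the "real" request
        self._status = {}     # path -> cached open result

    def open(self, path, key):
        # The first result for a path is stored and reused for every later call.
        if path not in self._status:
            self._status[path] = self._fetch(path, key)
        return self._status[path]


def fake_s3(path, key):
    # Hypothetical stand-in for S3: only the key "good" is accepted.
    return key == "good"


def run(keys):
    cache = NaiveStatusCache(fake_s3)
    return [cache.open("/vsis3/bucket/file.tif", key) for key in keys]


print(run(["dummy", "good"]))  # [False, False]
print(run(["good", "dummy"]))  # [True, True]
```

Both runs reproduce the pattern in the console output above: whichever result is obtained first is returned for every later open, regardless of the key.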

Operating system

osgeo/gdal:latest docker image

GDAL version and provenance

osgeo/gdal:latest docker image

rouault added a commit that referenced this issue Mar 11, 2020
/vsis3/: invalidate cached non-existing file if AWS_ config options are changed in the meantime (fixes #2294)
Contributor Author

glostis commented Mar 11, 2020

Thanks for the fix @rouault !

I've tried it with the example from the first comment of this issue, and your fix solves one of the two cases I gave, but not both.

  1. Running python snippet.py dummy $AWS_ACCESS_KEY_ID now works:

    • GDAL fails to open the file the first time because the given access key is wrong
    • GDAL succeeds in opening it once the key is changed to the right one, because the fix invalidates the cache.
  2. Running python snippet.py $AWS_ACCESS_KEY_ID dummy still doesn't work as expected:

    • GDAL opens the file the first time because the given access key is correct
    • GDAL also opens the file the second time, even though the access key has been changed to a wrong one.

Case number 2 is a bit far-fetched though, so maybe you had good reasons not to address it?

Member

rouault commented Mar 11, 2020

Case number 2 is a bit far-fetched though, so maybe you had good reasons not to address it?

I didn't have it in mind. The fix works by ignoring cached data that indicates a failed open when AWS_ settings have changed. "Fixing" 2 would require different logic, and I don't see 2) as a practical concern.
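
The behaviour described in this comment can be illustrated with a small model. This is a sketch under assumptions, not the code from the commit: the class name `NegativeInvalidatingCache`, the helper `accepts`, and the key value `"good"` are hypothetical, and a single key stands in for the whole set of AWS_ settings.

```python
class NegativeInvalidatingCache:
    """Toy cache that drops a cached failure when the settings have changed."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable(path, key) -> bool
        self._entries = {}    # path -> (result, key used for that request)

    def open(self, path, key):
        entry = self._entries.get(path)
        if entry is not None:
            ok, cached_key = entry
            # A cached success is always trusted; a cached failure is trusted
            # only if the settings are the same as when it was recorded.
            if ok or cached_key == key:
                return ok
        ok = self._fetch(path, key)
        self._entries[path] = (ok, key)
        return ok


def accepts(path, key):
    # Hypothetical stand-in for S3: only the key "good" is accepted.
    return key == "good"


def replay(keys):
    cache = NegativeInvalidatingCache(accepts)
    return [cache.open("/vsis3/bucket/file.tif", key) for key in keys]


print(replay(["dummy", "good"]))  # [False, True]  (case 1: now recovers)
print(replay(["good", "dummy"]))  # [True, True]   (case 2: unchanged)
```

In this model, case 2 stays as it was because a cached success is never re-validated; changing that would mean re-checking successful entries too, which is the "different logic" mentioned above.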
