Expand options for accessing secured thredds data #272

Open
tlogan2000 opened this issue Dec 15, 2022 · 4 comments

@tlogan2000
Contributor

tlogan2000 commented Dec 15, 2022

Currently, accessing secured PAVICS THREDDS data only seems possible via the pydap backend (e.g. as shown here: https://pavics-sdi.readthedocs.io/en/latest/notebooks/pavics_thredds.html), or more simply via xr.open_dataset(secure_url, session=session, engine='pydap').

However, pydap has issues working with dask.distributed that make it a less desirable option. See pydata/xarray#4348 and pydap/pydap#256.

Ideally, there would be ways to access the secured data with at least the default netcdf4 engine, and perhaps others.

@aulemahal
Contributor

aulemahal commented Dec 15, 2022

AFAIU, complex authentication with netCDF4 relies on mechanisms implemented down in the compiled libraries (netcdf-c/libdap) and on an RC file such as .dodsrc, which netcdf-c looks for in the current working directory or the home directory. This makes it awkward to manage from a JupyterLab notebook. However, I believe simple HTTP authentication could work, like: ds = xr.open_dataset("http://USER:PASS@pavics.ouranos.ca/...").
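For a server that does accept HTTP basic authentication, credentials embedded in such a URL have to be percent-encoded, otherwise characters like @ or : in a password break URL parsing. A minimal stdlib sketch; the helper name with_basic_auth is hypothetical, and whether netcdf-c honours URL-embedded credentials for a given server is not verified here:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url, user, password):
    """Return `url` with percent-encoded USER:PASS embedded in its netloc.

    Assumes a plain hostname (IPv6 literals would lose their brackets).
    Any credentials already present in `url` are replaced.
    """
    parts = urlsplit(url)
    # safe='' forces '@', ':' and '/' in the credentials to be escaped
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


url = with_basic_auth("https://pavics.ouranos.ca/twitcher/data.nc", "me", "p@ss:word")
```

Here url becomes https://me:p%40ss%3Aword@pavics.ouranos.ca/twitcher/data.nc, which could then be handed to xr.open_dataset.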

But Magpie doesn't work like that: it uses an authentication call that returns a cookie to be used in subsequent calls. I have not been able to pass this cookie in a way that makes netcdf-c aware of it. I tried both on PAVICS and on my own machine.
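The cookie returned by a sign-in call arrives in a Set-Cookie response header, and only its name=value pair needs to be sent back on later requests. A stdlib sketch of extracting it; the header string is a made-up example and the cookie name auth_tkt is an assumption, not taken from Magpie's documentation:

```python
from http.cookies import SimpleCookie


def cookie_pairs(set_cookie_header):
    """Return {name: value} from a Set-Cookie header, dropping Path/Secure/etc."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_header)
    return {name: morsel.value for name, morsel in parsed.items()}


# Made-up header; "auth_tkt" is an assumed cookie name
example = "auth_tkt=EXAMPLE; Path=/; Secure; HttpOnly"
pairs = cookie_pairs(example)
# Value suitable for a "Cookie:" request header on subsequent calls
cookie_header = "; ".join(f"{name}={value}" for name, value in pairs.items())
```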

FYI: netCDF4 and pydap are the only official xarray engines that support OpenDAP.

@fmigneault
Contributor

@huard FYI

Magpie does not support simple HTTP authentication (http://USER:PASS@pavics.ouranos.ca/), and even if it did, it would not work with Twitcher, which is the component that enforces authentication. The logic would need to be added to the MagpieAdapter for an auth-reference to take effect on any URL of the platform (instead of only on Magpie's /signin endpoint).

What Magpie does support, and which is parsed by the MagpieAdapter used in Twitcher, is either of those forms in the headers:

I'm not too familiar with the internals of xr.open_dataset and which engine/backend ends up performing the HTTP request, but if there were a way to inject those headers (even with a dirty mock hack in the worst case), authorization would work. The notebook would only need to authenticate once at the start.

Alternatively, if there is a way to pass an auth parameter down to a call like requests.get, then the https://github.com/Ouranosinc/requests-magpie/ callable can be used instead, as a pre-authentication hook run before the request.
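requests lets any callable subclassing requests.auth.AuthBase modify each request before it is sent, which is the hook mechanism described above. The sketch below is a hypothetical stand-in, not the requests-magpie API: CookieHeaderAuth does not sign in by itself and simply attaches a cookie assumed to come from an earlier Magpie sign-in (the name auth_tkt and value EXAMPLE are placeholders):

```python
import requests
from requests.auth import AuthBase


class CookieHeaderAuth(AuthBase):
    """Hypothetical auth hook: attach an already obtained cookie to each request."""

    def __init__(self, cookie_name, cookie_value):
        self.cookie_name = cookie_name
        self.cookie_value = cookie_value

    def __call__(self, request):
        # requests invokes this while preparing every request sent with this auth.
        # Note: it replaces any Cookie header built from the session's cookie jar.
        request.headers["Cookie"] = f"{self.cookie_name}={self.cookie_value}"
        return request


session = requests.Session()
session.auth = CookieHeaderAuth("auth_tkt", "EXAMPLE")  # hypothetical name/value
```

A session prepared this way is what the pydap route already consumes, e.g. xr.open_dataset(secure_url, session=session, engine='pydap'); the netCDF4 engine never sees it, since its HTTP requests are made inside netcdf-c.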

@aulemahal
Contributor

aulemahal commented Dec 15, 2022

Sadly, this is exactly what pydap allows: you can pass it a requests.Session() into which the Auth object from requests-magpie has been inserted. I find this solution quite easy, but pydap breaks in a multi-process setup.

The other choice is netCDF4, but there the whole web request mechanism runs outside Python, in the compiled libraries cited above, which rules out any mock- or requests-based solution (AFAIU).

The easiest way out might be fixing pydap... The pickling error actually looks avoidable.
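dask.distributed ships tasks and their arguments to workers by pickling them, so an object holding a live, unpicklable resource fails there. One generic way such errors are commonly avoided is to drop the resource in __getstate__ and rebuild it in __setstate__. This sketch is not pydap's code and does not reproduce the exact error from pydap/pydap#256; RemoteHandle is hypothetical and a threading.Lock stands in for the unpicklable resource:

```python
import pickle
import threading


class RemoteHandle:
    """Hypothetical object holding a picklable URL plus an unpicklable resource."""

    def __init__(self, url):
        self.url = url
        self._lock = threading.Lock()  # stand-in for a live connection/session

    def __getstate__(self):
        # Send only the reconstructible parts to the other process
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Rebuild the dropped resource on the receiving side
        self.__dict__.update(state)
        self._lock = threading.Lock()


handle = RemoteHandle("<url>")
clone = pickle.loads(pickle.dumps(handle))  # would raise TypeError without the hooks
```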

@fmigneault
Contributor

Indeed. Fixing pydap would be the safest approach.

I've been reading https://docs.unidata.ucar.edu/netcdf/documentation/4.8.0/md__home_wfisher_Desktop_v4_88_80_netcdf-c_docs_auth.html#auth_redir.

If I understand correctly, the netCDF4 alternative would look something like this:

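```python
import os
import tempfile
from unittest import mock
import xarray as xr

with tempfile.TemporaryDirectory() as tmp_dir:
    auth_path = os.path.join(tmp_dir, ".daprc")
    with open(auth_path, mode="w", encoding="utf-8") as auth_file:
        # HTTP.COOKIEJAR takes the *path* of a cookie file, not the cookie value itself
        auth_file.write(f"HTTP.COOKIEJAR={MAGPIE_COOKIE_FILE}\n")  # file saved from a previous auth request
    with mock.patch.dict("os.environ", {"DAPRCFILE": auth_path}):
        ds = xr.open_dataset("<url>", engine="netcdf4")
```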

If everything resolves correctly, this should work (remains to be tested).
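As we read the netcdf-c documentation linked above, HTTP.COOKIEJAR names a cookie file that is handed to libcurl, rather than holding the cookie value itself. A stdlib sketch of producing such a file in the Netscape cookie format (which libcurl reads) together with the matching RC file; the helper names, cookie name and value are hypothetical placeholders, and the end-to-end behaviour against PAVICS remains untested:

```python
import http.cookiejar
import os
import tempfile
import time


def write_cookie_file(path, domain, name, value, lifetime_s=3600):
    """Save one cookie to `path` in the Netscape format that libcurl can read."""
    cookie = http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=True,
        expires=int(time.time()) + lifetime_s,  # explicit expiry, not a session cookie
        discard=False,
        comment=None, comment_url=None, rest={},
    )
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.set_cookie(cookie)
    jar.save()  # writes to the filename given to the constructor
    return path


def write_daprc(path, cookie_file):
    """Write a netcdf-c RC file whose HTTP.COOKIEJAR points at `cookie_file`."""
    with open(path, mode="w", encoding="utf-8") as rc_file:
        rc_file.write(f"HTTP.COOKIEJAR={cookie_file}\n")
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical cookie: "auth_tkt"/"EXAMPLE" stand in for the real sign-in result
    cookie_path = write_cookie_file(
        os.path.join(tmp_dir, "cookies.txt"), "pavics.ouranos.ca", "auth_tkt", "EXAMPLE"
    )
    rc_path = write_daprc(os.path.join(tmp_dir, ".daprc"), cookie_path)
    with open(cookie_path, encoding="utf-8") as f:
        cookie_text = f.read()
    with open(rc_path, encoding="utf-8") as f:
        rc_text = f.read()
```

cookie_path is what the HTTP.COOKIEJAR entry in the snippet above should point to.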

The danger with RC files is that, if a user leaves them in the workspace, they are a potential security leak.
However, this is somewhat mitigated because cookies should have a maximum expiration lifetime (configurable in the platform settings / Magpie INI file).
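The exposure can be narrowed further by creating the RC and cookie files with owner-only permissions, inside a directory that is removed afterwards; tempfile.TemporaryDirectory already creates its directory accessible only by the current user. A stdlib sketch (the helper names are hypothetical, and the permission check is only meaningful on POSIX systems):

```python
import os
import stat
import tempfile


def write_private_file(path, text):
    """Create `path` readable/writable by its owner only, then write `text`."""
    # O_EXCL refuses to reuse an existing file that may have looser permissions
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def is_owner_only(path):
    """True if group and others have no permission bits on `path` (POSIX)."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 == 0


with tempfile.TemporaryDirectory() as tmp_dir:  # removed, with its contents, on exit
    rc_path = write_private_file(os.path.join(tmp_dir, ".daprc"), "HTTP.COOKIEJAR=cookies.txt\n")
    private = is_owner_only(rc_path)
    with open(rc_path, encoding="utf-8") as f:
        content = f.read()
```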
