Refactor ESO authentication and download #2681

szampier · 2023-03-10T14:20:00Z

Dear astroquery maintainers, @almicol, @Pharisaeus

this PR is a complete rewrite of the authentication and download part of astroquery.eso.
As you can see in my initial commits, I first tried to reuse ALMA's download_files and astroquery's _download_file, but I ended up re-implementing it from scratch because it doesn't support well our use case:

we don't want to send a HEAD request just to get the filename from content-disposition (unnecessary overhead)
~~we don't support range queries / partial downloads~~ (actually we do except for cases where we correct the FITS header upon download)
the progress bar used in _download_file creates a too verbose output ( ESO downloads: no 'verbose' option #1357)

Public interface

The public interface remains retrieve_data but with two arguments removed because they are no longer necessary. This method is now quite small, essentially calling find_associated_files, _download_eso_files and _unzip_files in sequence.

Authentication and Authorization

Authentication (login) is no longer necessary for public data.
We switched to token-based authentication and authorization. The token is retrieved when calling login (more specifically in the _authenticate method) and saved in self._auth_info, together with username and password so we can re-authenticate automatically when the token is about to expire. The token normally expires after 8 hours and we re-authenticate 10 minutes before the expiration date.
NOTE: since PyJWT is not a dependency of astroquery, to extract the expiration time from the token we need to "manually" decode it but this is relatively straightforward.

Support for .Z files

The current implementation unfortunately is not able to uncompress .Z files which is a big issue for ESO since most of our raw data are Z-compressed. At ESO we are considering the option of recompressing files upon download on-the-fly from .Z to .gz, which is a better supported format especially on Windows where the gunzip command is normally not available. For now I've re-implemented the uncompression using the system command if available. I know it's not ideal but at least it works, except for Windows which is not a very popular platform among our users anyways.

Test and documentation

test_nologin has been fixed since we now support anonymous download
test_retrieve_data_and_calib does not pass because of missing calselector (see TODO)
test_each_instrument_SgrAstar does not pass for unknown reasons independent from this PR
when executing the remote tests, pytest hangs at the end of the tests, I don't know why
documentation has been updated to reflect the changes

Solved issues

I believe this PR will solve at least partially the following issues:

keflavich · 2023-03-10T14:31:53Z

we don't support range queries / partial downloads

Is this an intentional choice? When downloading large files, partial downloads have historically been pretty common for me at least, and it's nice to be able to pick up where you left off.

szampier · 2023-03-10T14:46:42Z

Is this an intentional choice? When downloading large files, partial downloads have historically been pretty common for me at least, and it's nice to be able to pick up where you left off.

apologies, my statement about range queries was wrong and I've deleted it. I'll investigate how to support partial downloads. However we don't always return content-length because in some cases we do stream-processing on the files upon download so the server doesn't know the size of the downloaded file. A notable case of stream-processing is header correction, i.e. when we find issues in the FITS header we store the corrected header in a database and we apply the changes on-the-fly when downloading the file.

keflavich · 2023-03-10T15:00:02Z

OK, then I think it makes sense to check for content-length. If it is present, then a range query makes sense and partial downloads can be resumed. But if there are streamed corrections that change the content length, partial downloads are not possible.

codecov · 2023-03-10T15:31:13Z

Codecov Report

Attention: 49 lines in your changes are missing coverage. Please review.

Comparison is base (885734b) 66.53% compared to head (9ea4138) 67.28%.
Report is 8 commits behind head on main.

Files	Patch %	Lines
astroquery/eso/core.py	76.32%	49 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2681      +/-   ##
==========================================
+ Coverage   66.53%   67.28%   +0.74%     
==========================================
  Files         235      235              
  Lines       18114    18136      +22     
==========================================
+ Hits        12052    12202     +150     
+ Misses       6062     5934     -128

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Pharisaeus · 2023-03-10T16:16:26Z

At ESO we are considering the option of recompressing files upon download on-the-fly from .Z to .gz

Keep in mind this would also mean none of those files would support partial download, because, similarly to header corrections, we would not know the size of the file if we pass it through compress and gzip and stream the response.

szampier · 2023-03-13T18:36:37Z

I've added two small files to the test data but they don't seem to be installed when running the tests on the server. Should I add them to some setup/configuration file? Also I've extended astroquery.utils.mocks.MockResponse to be able to test file download without remote server, I hope it's OK.

szampier · 2023-03-15T12:43:00Z

@keflavich @bsipocz can you please help with the issue I reported in my previous comment? I've added some files in the test data folder but they are not found when running the tests in github. See failed tests in the last couple of commits.

keflavich · 2023-03-15T14:28:24Z

@szampier yes, you need to add those files under the setup_package.py file, e.g. as in Vizier:
https://github.com/astropy/astroquery/blob/main/astroquery/vizier/setup_package.py

bsipocz · 2023-10-09T19:01:51Z

@szampier @Pharisaeus - There have been a few reports of bugs with the current ESO module of astroquery, and it reminded me to double-check the planned timeline for swapping the currently used API out. Do you have some ETAs or roadmaps, however far it might be?

pep8speaks · 2023-10-20T13:31:43Z

Hello @szampier! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2023-12-08 11:45:50 UTC

szampier · 2023-10-24T15:56:05Z

@bsipocz in the past days I made a few additional changes and I now consider this PR ready for review. There are still two failing checks, one in the cadc module and one because of a missing changelog entry. I'm not sure what to do with the changelog. As written in the initial comment I've reimplemented file download mainly to avoid sending a HEAD request each time we download a file just to get the filename from content-disposition. Files are downloaded to a temp (part) file and moved to the final destination after the download has been completed. Resuming partial downloads using range queries has not been implemented and I don't consider it a critical feature for the moment, also considering that we stream-process about 20% of the archive files. I would prefer to implement this feature in a separate PR.
The bug reports you have mentioned refer to the query part which I didn't change in this PR. Once this PR is merged we will refactor also the query part and address the relevant tickets.
Do you agree to remove the Draft status from this PR?

jwoillez · 2023-11-25T08:24:29Z

I tested the PR on macOS and the following works fine:

from astroquery.eso import Eso
eso.login(username="xxxx")  # <== use your own login
eso.retrieve_data("GRAVI.2023-11-24T15:03:58.695")

Both identification and decompression work (with gunzip is available on macOS).

bsipocz

I did one quick review, without trying to run any of the changes or looking into the coverage stats. There are only some superficial comments, some of them are open ended, and a lot are just heads up, fyi-s.

Also, I see one failing test, but that one is spotted on main, too, so it can/should be fixed separately (I expect it to be something trivial on the bookkeeping level, e.g. one instrument will need to be skipped for that test.

And this will need a rebase in order to pick up recent changes that would make the unrelated CI failures gone.

bsipocz · 2023-12-07T06:47:01Z

CHANGES.rst

+  just to get the original filename. [#1580]
+- Restore support for .Z files. [#1818]


FYI: We list the number of the PR in the changelog, not the issues fixed. I'll fix this up when cleaning up the changelog for release

bsipocz · 2023-12-07T06:56:31Z

astroquery/eso/core.py

+    CALSELECTOR_URL = "https://archive.eso.org/calselector/v1/associations"
+    DOWNLOAD_URL = "https://dataportal.eso.org/dataPortal/file/"
+    AUTH_URL = "https://www.eso.org/sso/oidc/token"
+    GUNZIP = "gunzip"


I don't see tests or examples changing this, what would be the use case, windows users?

Actually I didn't foresee to change these values, that's why I've defined them as "constants". Regarding gunzip, we might change the handling of .Z files again in the future since we have now the possibility of changing compression on-the-fly upon download so we could return only .gz files to astroquery users.

bsipocz · 2023-12-07T07:00:40Z

astroquery/eso/core.py

-    def retrieve_data(self, datasets, *, continuation=False, destination=None,
-                      with_calib='none', request_all_objects=False,
-                      unzip=True, request_id=None):
+    def retrieve_data(self, datasets, *, overwrite=False, destination=None,
+                      with_calib=None, unzip=True):


there have been quite a number of keywords removed, without deprecation. Given that this is a significant rewrite of the module, as well as changes are upstream, I would think it's OK to break API without warning, though it may then need another half sentence in the changelog.

I think this is the only remaining question that I would like to address before merging. Given request_id was featured in an example in the documentation, I'm somewhat inclined towards adding a deprecation warning for all the removed parameters.
But I'm also OK if you say it has to be a clean cut for the users, I would really just like to see it's a decision whether giving warnings or exception and not a side effect of the code refactoring.
(knowing that in practice warnings are mostly ignored by users, with the deprecation warning they could run something that has no effect any more, while with the exception they face the changes straight away).

I've restored the original API and added a deprecation warning using deprecated_renamed_argument decorator from astropy, as done for example in irsa/core.py.
BTW I think the since argument should be corrected in irsa/core.py at line 63 and replaced with an array of 3 elements since there are 3 changed arguments.

BTW I think the since argument should be corrected in irsa/core.py at line 63 and replaced with an array of 3 elements since there are 3 changed arguments.

Thanks for the heads up, I suppose I was working with the assumption that it will be broadcasted, but I'll have a look.

Btw, what happened with overwrite instead of continuation?

Btw, what happened with overwrite instead of continuation?

I've changed it back to the original argument to keep the same API, but the meaning is the same, and indeed the description of this argument has not changed

Force the retrieval of data that are present in the destination directory.

astroquery/eso/core.py

bsipocz · 2023-12-07T07:06:03Z

astroquery/eso/core.py

+                return True
+        return False
+
+    def _download_eso_file(self, file_link: str, destination: str,


heads up that we're in the middle of adding/upstreaming download functionality to pyvo, in the long term this module may benefit from that, too.

good to know, I'm happy to use the upstream version if it supports our requirements.

astroquery/eso/core.py

… support .Z files

…ach file

bsipocz

Thank you, I think this should go in now.

I still think that renaming continuation to overwrite makes sense, as the functionality changed, but that can come (or not) in a follow-up PR, no need to bikeshed this one any further.

bsipocz · 2023-12-08T16:56:26Z

Thank you @szampier!

I haven't triaged the linked issues, let me know which ones to close either because they are resolved or because they won't be supported.

szampier · 2023-12-11T12:23:11Z

Thank you @szampier!

I haven't triaged the linked issues, let me know which ones to close either because they are resolved or because they won't be supported.

The following tickets can be closed:

ESO: decompressing .fits.Z files is broken #1818 : reverted to system-wide gunzip because Python's gzip doesn't support .Z files. Switched to token-based authentication and made sure the token is refreshed before it expires (another issue mentioned in this ticket).
Get error when trying to download data using astroquery.eso #1580 : the download code has been completely rewritten and this issue doesn't occur anymore.
ESO downloads: no 'verbose' option #1357 : the verbosity was due to the progress bar which is not used anymore

bsipocz added the eso label Mar 10, 2023

szampier force-pushed the refactor_eso_download branch from 87f1537 to 217af29 Compare October 13, 2023 19:57

szampier marked this pull request as ready for review October 27, 2023 10:55

Pharisaeus mentioned this pull request Nov 26, 2023

ESO retrieve_data does not download ancillary data? #2885

Open

keflavich approved these changes Nov 30, 2023

View reviewed changes

bsipocz reviewed Dec 7, 2023

View reviewed changes

szampier added 12 commits December 7, 2023 10:35

refactor eso data retrieval using alma's 'download_files'

7a1e1e4

fix authentication, remove unused method

8128eb4

uncompress files with external command because Python's gzip does not…

279a920

… support .Z files

reimplement file download so we don't need to send two requests for e…

454f02d

…ach file

use token authentication for download

8f1827a

re-authenticate when token expires

b18c286

remove argument request_all_objects

51d7ffc

rename argument continuation to overwrite and implement this feature

f904533

fix r-string, fix test

ac2a3ea

create destination directory

b12d5d7

add logs, update doc

8190085

annotate modified methods with type hints

1a1fdc2

szampier added 10 commits December 7, 2023 10:35

implement retrieval of associated calibration files

a367ca2

rebase MockResponse

42c8885

parse calselector multipart response, test calselector

65af273

fix filename for windows

b4338ac

add new test data files to setup_package.py

0e53440

batch calselector requests

2d26f0e

fix code style errors

4bb3f5a

download to .part file

82f8925

sanitize Content-Disposition's filename (rfc6266#section-4.3)

bc634dc

Update ChangeLog

1b927ee

szampier force-pushed the refactor_eso_download branch from e16465d to 1b927ee Compare December 7, 2023 09:35

szampier added 5 commits December 7, 2023 14:17

implement review comments

8a46168

fix test_each_instrument_SgrAstar and try to increase coverage

43a5940

add test

5ccd28d

skip test_unzip on Windows

006aca0

add test to increase coverage

f8ef983

bsipocz added this to the v0.4.7 milestone Dec 7, 2023

szampier added 2 commits December 8, 2023 10:42

keep original API and add deprecation warning for removed arguments

a1efe74

increase test coverage

9ea4138

bsipocz approved these changes Dec 8, 2023

View reviewed changes

bsipocz merged commit 60e6353 into astropy:main Dec 8, 2023
10 checks passed

This was referenced Dec 11, 2023

ESO: decompressing .fits.Z files is broken #1818

Closed

Get error when trying to download data using astroquery.eso #1580

Closed

ESO downloads: no 'verbose' option #1357

Closed

bsipocz mentioned this pull request Dec 19, 2023

TST BUG: ESO test files should be in tmp dirs #2902

Closed

szampier mentioned this pull request Dec 21, 2023

TST: parametrize surveys to see which one fails #2904

Closed

szampier deleted the refactor_eso_download branch April 5, 2024 23:42

vandalt mentioned this pull request Apr 23, 2024

IndexError when trying to use "special access" ESO archive query ("cas" url) #2994

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor ESO authentication and download #2681

Refactor ESO authentication and download #2681

szampier commented Mar 10, 2023 •

edited

Loading

keflavich commented Mar 10, 2023

szampier commented Mar 10, 2023

keflavich commented Mar 10, 2023

codecov bot commented Mar 10, 2023 •

edited

Loading

Pharisaeus commented Mar 10, 2023

szampier commented Mar 13, 2023

szampier commented Mar 15, 2023

keflavich commented Mar 15, 2023

bsipocz commented Oct 9, 2023

pep8speaks commented Oct 20, 2023 •

edited

Loading

szampier commented Oct 24, 2023

jwoillez commented Nov 25, 2023

bsipocz left a comment

bsipocz Dec 7, 2023

bsipocz Dec 7, 2023

szampier Dec 7, 2023

bsipocz Dec 7, 2023

bsipocz Dec 7, 2023

szampier Dec 8, 2023

bsipocz Dec 8, 2023

szampier Dec 8, 2023

bsipocz Dec 7, 2023

szampier Dec 7, 2023

bsipocz left a comment

bsipocz commented Dec 8, 2023

szampier commented Dec 11, 2023

		just to get the original filename. [#1580]
		- Restore support for .Z files. [#1818]

Refactor ESO authentication and download #2681

Refactor ESO authentication and download #2681

Conversation

szampier commented Mar 10, 2023 • edited Loading

Public interface

Authentication and Authorization

Support for .Z files

Test and documentation

Solved issues

keflavich commented Mar 10, 2023

szampier commented Mar 10, 2023

keflavich commented Mar 10, 2023

codecov bot commented Mar 10, 2023 • edited Loading

Codecov Report

Pharisaeus commented Mar 10, 2023

szampier commented Mar 13, 2023

szampier commented Mar 15, 2023

keflavich commented Mar 15, 2023

bsipocz commented Oct 9, 2023

pep8speaks commented Oct 20, 2023 • edited Loading

Comment last updated at 2023-12-08 11:45:50 UTC

szampier commented Oct 24, 2023

jwoillez commented Nov 25, 2023

bsipocz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bsipocz left a comment

Choose a reason for hiding this comment

bsipocz commented Dec 8, 2023

szampier commented Dec 11, 2023

szampier commented Mar 10, 2023 •

edited

Loading

codecov bot commented Mar 10, 2023 •

edited

Loading

pep8speaks commented Oct 20, 2023 •

edited

Loading