Refactor ESO authentication and download #2681
Is this an intentional choice? When downloading large files, partial downloads have historically been pretty common for me at least, and it's nice to be able to pick up where you left off.
Apologies, my statement about range queries was wrong and I've deleted it. I'll investigate how to support partial downloads. However we don't always return
OK, then I think it makes sense to check for
Codecov Report
Attention:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2681      +/-   ##
==========================================
+ Coverage   66.53%   67.28%   +0.74%
==========================================
  Files         235      235
  Lines       18114    18136      +22
==========================================
+ Hits        12052    12202     +150
+ Misses       6062     5934     -128
View full report in Codecov by Sentry.
Keep in mind this would also mean none of those files would support partial download, because, similarly to header corrections, we would not know the size of the file if we pass it through compress and gzip and stream the response.
I've added two small files to the test data but they don't seem to be installed when running the tests on the server. Should I add them to some setup/configuration file? Also I've extended
@keflavich @bsipocz can you please help with the issue I reported in my previous comment? I've added some files in the test data folder but they are not found when running the tests on GitHub. See the failed tests in the last couple of commits.
@szampier yes, you need to add those files under the
@szampier @Pharisaeus - There have been a few reports of bugs with the current ESO module of astroquery, and it reminded me to double-check the planned timeline for swapping out the currently used API. Do you have some ETAs or roadmaps, however far off they might be?
@bsipocz in the past days I made a few additional changes and I now consider this PR ready for review. There are still two failing checks, one in the cadc module and one because of a missing changelog entry. I'm not sure what to do with the changelog. As written in the initial comment I've reimplemented file download mainly to avoid sending a HEAD request each time we download a file just to get the filename from content-disposition. Files are downloaded to a temp (part) file and moved to the final destination after the download has been completed. Resuming partial downloads using range queries has not been implemented and I don't consider it a critical feature for the moment, also considering that we stream-process about 20% of the archive files. I would prefer to implement this feature in a separate PR.
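The "download to a part file, then move" step described in the comment above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name save_via_part_file and the ".part" suffix are assumptions, and the chunks are taken from any iterable of bytes so that the sketch does not depend on a particular HTTP library.

```python
import os
from typing import Iterable


def save_via_part_file(chunks: Iterable[bytes], destination: str) -> str:
    """Write chunks to '<destination>.part', then move the file into place.

    Hypothetical helper: the name and the '.part' suffix are assumptions.
    """
    part_path = destination + ".part"
    try:
        with open(part_path, "wb") as part_file:
            for chunk in chunks:
                part_file.write(chunk)
    except BaseException:
        # Remove the incomplete file so it is never mistaken for a finished one.
        if os.path.exists(part_path):
            os.remove(part_path)
        raise
    # Only a fully written file ever appears under the final name.
    os.replace(part_path, destination)
    return destination
```

With the requests library, the chunks could come from a streamed response, for example response.iter_content(chunk_size=...). Because the final name only appears after the move, an interrupted download never leaves a truncated file that looks complete.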
I tested the PR on macOS and the following works fine:
Both identification and decompression work (gunzip is available on macOS).
I did one quick review, without trying to run any of the changes or looking into the coverage stats. There are only some superficial comments, some of them are open ended, and a lot are just heads up, fyi-s.
Also, I see one failing test, but that one is spotted on main, too, so it can/should be fixed separately (I expect it to be something trivial on the bookkeeping level, e.g. one instrument will need to be skipped for that test).
And this will need a rebase in order to pick up recent changes that would make the unrelated CI failures go away.
  just to get the original filename. [#1580]
- Restore support for .Z files. [#1818]
FYI: We list the number of the PR in the changelog, not the issues fixed. I'll fix this up when cleaning up the changelog for release
CALSELECTOR_URL = "https://archive.eso.org/calselector/v1/associations"
DOWNLOAD_URL = "https://dataportal.eso.org/dataPortal/file/"
AUTH_URL = "https://www.eso.org/sso/oidc/token"
GUNZIP = "gunzip"
I don't see tests or examples changing this, what would be the use case, windows users?
Actually I didn't foresee changing these values, that's why I've defined them as "constants". Regarding gunzip, we might change the handling of .Z files again in the future since we now have the possibility of changing compression on-the-fly upon download, so we could return only .gz files to astroquery users.
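For readers wondering what "uncompress with the system command if available" might look like, here is a minimal sketch. It is not the PR's _unzip_files implementation: the function name unzip_if_possible and its fallback behaviour (return the compressed file name unchanged) are assumptions. Only the GUNZIP constant comes from the diff above.

```python
import shutil
import subprocess

GUNZIP = "gunzip"  # same constant name as in the diff above


def unzip_if_possible(filename: str, gunzip: str = GUNZIP) -> str:
    """Return the uncompressed file name, or `filename` unchanged on failure.

    Hypothetical helper, shown only to illustrate the approach.
    """
    if not filename.endswith((".gz", ".Z")):
        return filename
    if shutil.which(gunzip) is None:
        # Command not found (typical on Windows): keep the compressed file.
        return filename
    # -f overwrites an existing uncompressed file instead of prompting.
    result = subprocess.run([gunzip, "-f", filename], check=False)
    if result.returncode != 0:
        return filename
    # Strip the final '.gz' or '.Z' extension.
    return filename.rsplit(".", 1)[0]
```

Passing the command name as a parameter makes the "command not available" branch easy to exercise in tests without touching the real gunzip.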
astroquery/eso/core.py
Outdated
-    def retrieve_data(self, datasets, *, continuation=False, destination=None,
-                      with_calib='none', request_all_objects=False,
-                      unzip=True, request_id=None):
+    def retrieve_data(self, datasets, *, overwrite=False, destination=None,
+                      with_calib=None, unzip=True):
There have been quite a number of keywords removed, without deprecation. Given that this is a significant rewrite of the module, and that there are changes upstream, I would think it's OK to break the API without warning, though it may then need another half sentence in the changelog.
I think this is the only remaining question that I would like to address before merging. Given request_id was featured in an example in the documentation, I'm somewhat inclined towards adding a deprecation warning for all the removed parameters.
But I'm also OK if you say it has to be a clean cut for the users; I would really just like it to be a deliberate decision between giving warnings or raising an exception, and not a side effect of the code refactoring.
(Knowing that in practice warnings are mostly ignored by users: with the deprecation warning they could run something that no longer has any effect, while with the exception they face the changes straight away.)
I've restored the original API and added a deprecation warning using the deprecated_renamed_argument decorator from astropy, as done for example in irsa/core.py.
BTW I think the since argument should be corrected in irsa/core.py at line 63 and replaced with an array of 3 elements since there are 3 changed arguments.
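The "warn instead of raise" behaviour discussed in this thread can be illustrated without astropy. The decorator below is a hypothetical, stdlib-only stand-in, not astropy's deprecated_renamed_argument; the helper name, the placeholder version string "X.Y", and the choice of request_all_objects and request_id as the removed keywords (read off the signature diff above) are all assumptions.

```python
import functools
import warnings


def warn_on_removed_arguments(removed, since):
    """Hypothetical helper: accept removed keywords, warn, and drop them."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in removed:
                if name in kwargs:
                    # The value is discarded: the keyword no longer does anything.
                    kwargs.pop(name)
                    warnings.warn(
                        f"'{name}' is deprecated since {since} and has no effect.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# "X.Y" is a placeholder version, not a real release number.
@warn_on_removed_arguments(("request_all_objects", "request_id"), since="X.Y")
def retrieve_data(datasets, *, continuation=False, destination=None,
                  with_calib=None, unzip=True):
    # Placeholder body standing in for the real download logic.
    return list(datasets)
```

Old calls such as retrieve_data(ids, request_id=42) keep running and emit a DeprecationWarning, which is the trade-off raised in the review: users are not blocked, but they may overlook that the keyword is now ignored.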
> BTW I think the since argument should be corrected in irsa/core.py at line 63 and replaced with an array of 3 elements since there are 3 changed arguments.
Thanks for the heads up, I suppose I was working with the assumption that it will be broadcasted, but I'll have a look.
Btw, what happened with overwrite instead of continuation?
> Btw, what happened with overwrite instead of continuation?
I've changed it back to the original argument to keep the same API, but the meaning is the same, and indeed the description of this argument has not changed:
> Force the retrieval of data that are present in the destination directory.
return True
return False

def _download_eso_file(self, file_link: str, destination: str,
heads up that we're in the middle of adding/upstreaming download functionality to pyvo, in the long term this module may benefit from that, too.
good to know, I'm happy to use the upstream version if it supports our requirements.
Thank you, I think this should go in now.
I still think that renaming continuation to overwrite makes sense, as the functionality changed, but that can come (or not) in a follow-up PR, no need to bikeshed this one any further.
Thank you @szampier! I haven't triaged the linked issues, let me know which ones to close either because they are resolved or because they won't be supported.
The following tickets can be closed:
Dear astroquery maintainers, @almicol, @Pharisaeus
this PR is a complete rewrite of the authentication and download part of astroquery.eso.
As you can see in my initial commits, I first tried to reuse ALMA's download_files and astroquery's _download_file, but I ended up re-implementing it from scratch because it doesn't support our use case well:
- content-disposition (unnecessary overhead)
- we don't support range queries / partial downloads (actually we do, except for cases where we correct the FITS header upon download)
- _download_file creates too verbose an output (ESO downloads: no 'verbose' option #1357)
Public interface
The public interface remains retrieve_data but with two arguments removed because they are no longer necessary. This method is now quite small, essentially calling find_associated_files, _download_eso_files and _unzip_files in sequence.
Authentication and Authorization
Authentication (login) is no longer necessary for public data.
We switched to token-based authentication and authorization. The token is retrieved when calling login (more specifically in the _authenticate method) and saved in self._auth_info, together with username and password so we can re-authenticate automatically when the token is about to expire. The token normally expires after 8 hours and we re-authenticate 10 minutes before the expiration date.
NOTE: since PyJWT is not a dependency of astroquery, to extract the expiration time from the token we need to "manually" decode it but this is relatively straightforward.
Support for .Z files
The current implementation unfortunately is not able to uncompress .Z files, which is a big issue for ESO since most of our raw data are Z-compressed. At ESO we are considering the option of recompressing files on-the-fly upon download from .Z to .gz, which is a better supported format, especially on Windows where the gunzip command is normally not available. For now I've re-implemented the uncompression using the system command if available. I know it's not ideal but at least it works, except for Windows which is not a very popular platform among our users anyways.
Test and documentation
- test_nologin has been fixed since we now support anonymous download
- test_retrieve_data_and_calib does not pass because of missing calselector (see TODO)
- test_each_instrument_SgrAstar does not pass for unknown reasons independent from this PR
Solved issues
I believe this PR will solve at least partially the following issues:
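As an illustration of the NOTE in the Authentication and Authorization section above, the following sketch shows one way to read the expiration time from a JWT without PyJWT and to decide when to re-authenticate. It is not the code in this PR: the function names are hypothetical, the token is assumed to carry the standard "exp" claim, and the signature is deliberately not verified.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiration(token: str) -> int:
    """Return the 'exp' claim (seconds since the epoch); no signature check."""
    # A JWT has three dot-separated segments: header.payload.signature
    payload_b64 = token.split(".")[1]
    # Segments are base64url-encoded with the '=' padding stripped.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def needs_reauthentication(token: str, now: Optional[float] = None,
                           margin: int = 600) -> bool:
    """True once we are within `margin` seconds (default 10 minutes) of expiry."""
    if now is None:
        now = time.time()
    return now >= jwt_expiration(token) - margin
```

Skipping signature verification is acceptable here only because the value is used to schedule a new login, not to make a trust decision; the server still validates the token on every request.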