-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factorize client interface and provide "download all in range" #64
base: master
Are you sure you want to change the base?
Factorize client interface and provide "download all in range" #64
Conversation
There is no need to authenticate is order to search for available products
So far, `query_orbit` and `query_orbit_by_dt` (on Dataspace side) return only one orbit file: the one that contains the requested time range. The new `query_orbits_by_dt_range` method returns all orbit file contained in the requested time range. In order to factorize code, internal `_QueryOrbitFile` classes are defined.
EOF files on ASF server are listed once only in tests. Ideally this should have been mocked, but interrogation is still kept. Other tests will rely on a snapshot of precise orbit files that is cached. A test dedicated to testing the use of the cached baseline has been defined.
...instead of global and shared among all ASFClient instances.
And factorize out the creation of CDSE Access Token in order to to be able to inject 2FA token from the command line. In the end, no token nor secrets are recorded.
This will concern tests in test_eof.py. DB searching is tested in test_asf_client.py
Also. I've curated the cassettes from all authentication related secrets. The strange design in test_eof around |
This is quite a PR @LucHermitte ! it might take a a little awhile to get through it all, but it looks like a lot of improvements. my only initial questions are
do you think the interface changes would still accommodate that addition (which I expect to start using as my own default, since not needing a password is nice, and it'll go much faster on EC2 instances)? |
Of course. I know. :(
In the tests definitively. I guess
Quite certainly. The idea behind the changes was to keep a single and unique interface which is quite simple:
It should even be possible to add a new Also, IIRC, I've seen another REST based interface to request references to filenames on earthdata server. As I didn't want to bring more changes, I didn't try to replace the current approach. I didn't have much time to investigate if there is indeed another way to search for files according to whatever time and mission based criteria. |
what about something like this using candidates = [
item
for item in data
if item.start_time <= (t0 - margin0) and item.stop_time >= (t1 + margin1)
]
if not candidates:
raise ValidityError(
"none of the input products completely covers the requested "
"time interval: [t0={}, t1={}]".format(t0, t1)
)
# Sort candidates by all attributes except created_time
candidates.sort(key=lambda x: (x.start_time, x.stop_time, x.id))
# Group by everything except created_time and select the one with the latest created_time
result = []
for _, group in groupby(candidates, key=lambda x: (x.start_time, x.stop_time, x.id)):
result.append(max(group, key=lambda x: x.created_time))
return result (unsure if that's the exact key we want to group on) |
I've removed the dependency to sortedcontainers. Using Regarding my manual implementation of (I can join my flawed benchmarking test if you're interested -- I don't think it makes sense in sentineleof code base) |
This PR is quite big, I had to make some choices while trying to leave the current API as it is. More on the topic below.
The main changes I propose are:
ASFClient
andDataspaceClient
by:Client
that factorizes a common interface for querying and authenticatingauthenticate()
method on each clients we obtain a session object that provides the uniquedownload_all
methodA few smaller changes are included:
.netrc
file is now either explicitly given as parameter, or obtained from$NETRC
environment variable (name already used by other tools), or defaulted to~/.netrc
ASFClient
API avoid to query the list of available files again and again. Instead, a cached baseline is used. And only one test queries the lists of available files, and makes sure the new list contains that the cached baseline. EDIT: at this point I missed the usage of VCR. Yet I think we don't need multiple and redundant slow requests when recording the cassettes.ASFClient
are no longer static and global, but local to each client instance -- this way we know exactly what can be expected from each test, and we can compare the EOF list in the cache from the list of remote EOFs available on ASF server.OrbitType
to avoid typo errors and assertions to silence pylint.Finally, there are things that are half-way done as I didn't want to completely modify the current API in case you were explicitly relying on it.
List[str]
in ASF case andList[dict]
in Copernicus dataspace case. Ideally, we would return aList[SentinelOrbit]
which alldownload*()
methods would know how to convert into something that makes sense at their levelClient
level, but I'd prefer to know exactly how you want it donedownload_all
should be atSession
level, and calldownload_one
which would be specialized for each client.