
Dramatically different download speeds between versions #371

Closed · 1 of 2 tasks
irm-codebase opened this issue Aug 13, 2024 · 15 comments · Fixed by #372
Comments

@irm-codebase

Version Checks (indicate both or one)

  • I have confirmed this bug exists on the latest release of Atlite.

  • I have confirmed this bug exists on the current master branch of Atlite.

Issue Description

It seems like older atlite versions achieved faster downloads... somehow?

I've been comparing download speeds between atlite=0.2.1 and atlite=0.2.13, and the former consistently beats the latter (tested around half a dozen times).

The difference is quite dramatic: minutes to hours.

Reproducible Example

# with atlite 0.2.1
import atlite

cutout = atlite.Cutout(
    path="tmp/cutout.nc",
    module="era5",
    xs=slice(-15.8, 37),
    ys=slice(30, 75),
    time=slice("2015-01", "2016-12")
)
cutout.prepare(["runoff"])

#################################
# with atlite 0.2.13
import atlite
cutout = atlite.Cutout(
    path="tmp/cutout.nc",
    module=["era5"],
    x=slice(-15.8, 37),
    y=slice(30, 75),
    time=slice("2015-01", "2016-12")
)
cutout.prepare(["runoff"])

Expected Behavior

Download speeds should generally improve between versions, or remain unchanged.

Installed Versions

old: 0.2.1
new: 0.2.13

@irm-codebase
Author

irm-codebase commented Aug 13, 2024

Example with the old version. Completed in seconds.

[screenshot: atlite 0.2.1 download log]

@irm-codebase
Author

Meanwhile, a 0.2.13 run has not finished after one and a half hours...
The 0.2.1 run was even started later!

[screenshot: atlite 0.2.13 download still running]

@irm-codebase
Author

More testing: I identified v0.2.10 as the last "fast" version (although 0.2.1 felt even faster, I cannot say for sure).

CDS logs tell me that the main difference is that v0.2.13 requests a year of data one month at a time, while v0.2.10 requests it all at once.

My guess is that the new approach is penalized by the CDS scheduler, putting jobs back in the queue instead of just executing them.
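
For illustration, here is a minimal sketch of the difference (this is not atlite's actual retrieval code; retrieval_chunks is a made-up name): splitting the cutout's two years into monthly chunks turns one CDS job into 24 queued jobs.

import pandas as pd

def retrieval_chunks(start, end, monthly=False):
    """Illustrative only: split a time span into CDS request chunks."""
    if not monthly:
        return [(start, end)]  # one request covering the whole span
    months = pd.period_range(start, end, freq="M")
    return [(p.start_time, p.end_time) for p in months]  # one request per month

print(len(retrieval_chunks("2015-01", "2016-12")))                # 1 request
print(len(retrieval_chunks("2015-01", "2016-12", monthly=True)))  # 24 requests

Each of those queued requests then waits in the CDS scheduler separately, which would match what the request logs show.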

@irm-codebase
Author

One last test with different features (["height", "wind", "influx", "temperature", "runoff"]) shows a similar problem: several requests are made instead of one, resulting in more queue time.

I am not familiar enough with CDS to know if avoiding this one is possible, though.

@FabianHofmann
Contributor

great initiative @irm-codebase! so can we narrow it down to the way we are doing the feature requests with the CDS package?

@irm-codebase
Author

@FabianHofmann thanks!
That is my guess, yeah. I am not familiar enough with CDS, but my gut tells me the issue is somewhere in the requests.

Also, I keep seeing a message to "move to CDS-Beta", so I'd check whether this behavior applies there as well.

@FabianHofmann
Contributor

yes, the beta version is an important upcoming step which we have to take in the near future. If you are really motivated, you could also have a look at this and at the (hopefully) resulting performance improvements.

Also note one more thing: CDS keeps recently downloaded files "warm" for fast re-downloading, so whenever you have downloaded a feature recently, fetching it again will be much faster.

@fneum
Member

fneum commented Aug 13, 2024

For reference, this is the PR where the monthly chunking was introduced (#236) and I think this is the one where requests were split by features (#86).

The CDS is undergoing a significant migration at the moment, which has resulted in throttled requests. The new CDS-Beta has been out for a couple of weeks, and the latest master of atlite can handle the new infrastructure (it requires a changed API key; see #361, #364). But I have not tested the performance of the new infrastructure yet.

Will also do some investigation in the following days.

@irm-codebase
Author

Thanks for giving this priority!

I did test with the same dataset every time. A successful download (with older versions) did not seem to help the newer one, but I did not test this thoroughly.

@fneum
Member

fneum commented Aug 14, 2024

I benchmarked a similar setup in the new CDS infrastructure (https://cds-beta.climate.copernicus.eu/); the old one will be shut down in September. The major aspect that changed between the atlite versions mentioned above is that we switched from annual to monthly requests, so I varied this setting using #372. I varied the x-y coordinates so that CDS would not use cached artifacts, but kept the extent (5°×5°) identical.

import atlite

# annual request

cutout = atlite.Cutout(
    path="cutout-annual.nc",
    module="era5",
    xs=slice(-10, -5),
    ys=slice(35, 40),
    time=slice("2015-01", "2015-12")
)
cutout.prepare(["wind"], monthly_requests=False)

# completed in 35 minutes

# sequential monthly request

cutout = atlite.Cutout(
    path="cutout-annual.nc",
    module="era5",
    xs=slice(0, 5),
    ys=slice(40, 45),
    time=slice("2015-01", "2015-12")
)
cutout.prepare(["wind"], monthly_requests=True)

# completed in 49 minutes

It indeed seems that reverting to annual requests was faster in these test cases and may make sense for cutouts with a smaller geographical extent, though the exact trade-off is unclear.

One thing I noticed is that the monthly requests are made sequentially rather than in parallel. I don't think it used to be like that in the past, and it could add to the queuing times.

Will test next if we can do parallel requests and if this achieves a speed-up.

Logs

Here are the request logs from https://cds-beta.climate.copernicus.eu/requests?tab=all (the last item is the annual request):

[screenshot of the CDS request logs]

@fneum
Member

fneum commented Aug 14, 2024

Using dask.delayed and dask.compute, I investigated whether concurrent requests of the time chunks might achieve a speedup.

In era5.get_data:

datasets = map(
    retrieve_once, retrieval_times(coords, monthly_requests=monthly_requests)
)

I changed:

-    datasets = map(
-        retrieve_once, retrieval_times(coords, monthly_requests=monthly_requests)
-    )
+    time_chunks = retrieval_times(coords, monthly_requests=monthly_requests)
+    delayed_datasets = [delayed(retrieve_once)(chunk) for chunk in time_chunks]
+    datasets = compute(*delayed_datasets)

xref: 1866fc3
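
As a self-contained illustration of what this pattern does (with a dummy retrieve_once that just sleeps instead of calling the CDS API), the delayed calls are set up first and then executed concurrently by dask's threaded scheduler:

import time
from dask import compute, delayed

def retrieve_once(chunk):
    """Stand-in for a CDS retrieval: each request just waits a second."""
    time.sleep(1)
    return chunk

time_chunks = list(range(12))  # e.g. one chunk per month

start = time.perf_counter()
delayed_datasets = [delayed(retrieve_once)(chunk) for chunk in time_chunks]
datasets = compute(*delayed_datasets)  # threaded scheduler by default
print(f"{len(datasets)} chunks in {time.perf_counter() - start:.1f} s")
# far less than the 12 s a sequential map() would need, since the waits overlap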

I tested this in a setup similar to the cases above:

# concurrent monthly request

cutout = atlite.Cutout(
    path="cutout-monthly-parallel.nc",
    module="era5",
    xs=slice(-5, 0),
    ys=slice(45, 50),
    time=slice("2015-01", "2015-12")
)
cutout.prepare(["wind"], monthly_requests=True)

# completed in 23 minutes

At 23 minutes including queueing time, this is even faster. However, it should be noted that the overall CDS queue could have had a different length at the time, so the numbers can only be indicative.

The concurrent time chunk requests should be optional though, as too many parallel requests may get you throttled (https://confluence.ecmwf.int/display/CEMS/CDS+-+Best+Practices):

Indeed the CDS system penalises users that submit too many requests, decreasing the priority of their requests. In short: Too many parallel requests will eventually result in a slower overall download time. For this reason, we suggest limiting to a maximum of 10 parallel requests.

This might especially happen if you request 5 different features, since atlite already uses dask.delayed to post requests for different features in parallel. Downloading a year month by month would then land you at 60 (5 x 12) parallel requests, which is well above the recommendation.
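
If the concurrent path is kept, one way to respect that limit would be to cap the number of dask workers; this is just a sketch of the idea, not necessarily what #372 does:

import time
from dask import compute, delayed

def retrieve_once(request):
    time.sleep(1)  # stand-in for a CDS request
    return request

# 5 features x 12 months = 60 potential requests
delayed_requests = [
    delayed(retrieve_once)((feature, month))
    for feature in range(5)
    for month in range(12)
]

# cap concurrency at the ~10 parallel requests recommended by ECMWF
datasets = compute(*delayed_requests, scheduler="threads", num_workers=10)

num_workers is simply the threaded scheduler's worker count, so it bounds how many retrievals run at the same time.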

Logs

[screenshot of the CDS request logs]

@fneum
Member

fneum commented Aug 14, 2024

Final follow-up (and with that, I'm quite happy with the solutions proposed in #372).

I tried a bulk concurrent submission with 50 simultaneous requests for different features:

cutout = atlite.Cutout(
    path="cutout-annual-parallel-features.nc",
    module="era5",
    xs=slice(0, 5),
    ys=slice(45, 50),
    time=slice("2015-01", "2015-12")
)
cutout.prepare(
    ["height", "wind", "influx", "temperature", "runoff"],
    monthly_requests=True,
    concurrent_requests=True
)

This one finished in 26 minutes, so it took a comparable time to downloading a single feature concurrently.

@irm-codebase
Author

irm-codebase commented Aug 14, 2024

Thank you for checking it so fast!

@fneum does this mean I won't experience a slowdown if I want to download several individual cutouts in tandem?

Some of my scripts follow this approach, and it would be great if I did not have to reorder them (e.g., first download a big dataset, then process it).
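
For illustration, such a "tandem" workflow could look roughly like the sketch below (the bounding boxes and paths are made up, and whether running several prepares at once is advisable depends on the CDS parallel-request limits discussed above):

from concurrent.futures import ThreadPoolExecutor

import atlite

def prepare_cutout(path, x, y):
    """Build and prepare one cutout for a given bounding box (illustrative)."""
    cutout = atlite.Cutout(
        path=path,
        module="era5",
        x=slice(*x),
        y=slice(*y),
        time=slice("2015-01", "2015-12"),
    )
    cutout.prepare(["runoff"])
    return path

jobs = [
    ("tmp/iberia.nc", (-10, 0), (35, 44)),
    ("tmp/nordics.nc", (5, 30), (55, 70)),
]

# download two cutouts in tandem instead of one after the other
with ThreadPoolExecutor(max_workers=2) as pool:
    paths = list(pool.map(lambda job: prepare_cutout(*job), jobs))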

@fneum
Member

fneum commented Aug 14, 2024

I think that's right.

@fneum
Member

fneum commented Aug 17, 2024

Further speedups can be achieved by
