🧐 Subset WeatherBench2 to one year and select variables #3

Merged: weiji14 merged 2 commits into main from subset_weatherbench2 on Oct 5, 2023

weiji14 (Owner) commented on Oct 3, 2023

Obtaining a temporal subset of WeatherBench2 for the year 2000 only, at pressure level 500hPa, with data variables: geopotential, u_component_of_wind, v_component_of_wind. Disabling compression when saving to Zarr so that it can be read back using cupy-xarray's kvikIO engine developed at xarray-contrib/cupy-xarray#10 (as of Oct 2023).

References:

Obtaining a temporal subset of WeatherBench2 from 2020-2022, at pressure level 500hPa, with data variables: geopotential, u_component_of_wind, v_component_of_wind. Disabling compression when saving to Zarr so that it can be read back using cupy-xarray's kvikIO engine developed at xarray-contrib/cupy-xarray#10.
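
A minimal sketch of the subsetting step described above, assuming the dataset uses dimensions named time, level, latitude and longitude. The function name `subset_500hpa`, the toy dataset helper and the extra `temperature` variable are hypothetical, added only so the example runs without downloading WeatherBench2. The `compressor` encoding key mirrors the line in this PR's diff.

```python
import numpy as np
import pandas as pd
import xarray as xr

VARIABLES = ["geopotential", "u_component_of_wind", "v_component_of_wind"]


def subset_500hpa(ds, start="2000-01-01", end="2000-12-31"):
    """Select the three variables at 500hPa within a time range (hypothetical helper)."""
    subset = ds[VARIABLES].sel(time=slice(start, end), level=500)
    # Disable compression in the encoding, as done in this PR, so that a
    # later write to Zarr would store uncompressed chunks.
    for var in VARIABLES:
        subset[var].encoding["compressor"] = None
    return subset


def make_toy_dataset():
    """Build a tiny stand-in dataset (assumption: not the real WeatherBench2 layout)."""
    time = pd.date_range("1999-12-30", "2001-01-02", freq="D")
    dims = ("time", "level", "latitude", "longitude")
    coords = {
        "time": time,
        "level": [500, 850],
        "latitude": [-10.0, 0.0, 10.0],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    }
    shape = (len(time), 2, 3, 4)
    data_vars = {
        name: (dims, np.zeros(shape, dtype="float32"))
        for name in VARIABLES + ["temperature"]
    }
    return xr.Dataset(data_vars, coords=coords)


toy_subset = subset_500hpa(make_toy_dataset())
print(toy_subset.sizes)
```

Passing different `start` and `end` values covers the other time ranges tried in this PR.
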
weiji14 self-assigned this on Oct 3, 2023
weiji14 changed the title from "🧐 Subset WeatherBench2 to two years and select variables" to "🧐 Subset WeatherBench2 to twenty years and select variables" on Oct 5, 2023
Changing from the 240x121 spatial resolution grid to 1464x721, so that a single time chunk is larger (about 4MB). Decreased time range from twenty years to one year (the previous commit, c63bc43, actually covered 20 years, not 2). Also had to rechunk along the latitude and longitude dims because the original WeatherBench2 Zarr store had 2 chunks per time slice.
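
The "about 4MB" figure can be checked with a rough calculation. This sketch assumes 4-byte (float32) values, one variable and one time step per chunk, and no compression. None of these are stated in the commit message, and `chunk_size_mb` is a hypothetical helper name.

```python
def chunk_size_mb(n_lon, n_lat, bytes_per_value=4):
    """Uncompressed size in MB of one 2D chunk (assumes float32 by default)."""
    return n_lon * n_lat * bytes_per_value / 1e6


low_res_mb = chunk_size_mb(240, 121)    # grid used in the previous commit
high_res_mb = chunk_size_mb(1464, 721)  # grid used in this commit

print(f"240x121:  {low_res_mb:.2f} MB per chunk")   # about 0.12 MB
print(f"1464x721: {high_res_mb:.2f} MB per chunk")  # about 4.22 MB
```

Under these assumptions, the higher resolution grid gives chunks roughly 36 times larger than the 240x121 grid.
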
weiji14 changed the title from "🧐 Subset WeatherBench2 to twenty years and select variables" to "🧐 Subset WeatherBench2 to one year and select variables" on Oct 5, 2023
weiji14 merged commit 2958bf3 into main on Oct 5, 2023
1 check passed
weiji14 deleted the subset_weatherbench2 branch on October 5, 2023, 01:49
ds_500hpa_zuv[var].encoding["compressor"] = None

# Save to Zarr with chunks of size 1 along time dimension
# Can take about 1 hour to save 10.7GB of data at 40MB/s
weiji14 (Owner, Author) commented on Oct 5, 2023

Oops, forgot to update this line 😅

Benchmark download time at commit c63bc43 for 10.7GB of data was:

real	78m14.295s
user	25m23.062s
sys	5m0.360s

At commit a904131, dataset is 18.2GB and took:

real	793m42.489s
user	279m16.092s
sys	51m52.820s

Internet speed on my side averaged around 40MB/s, and rechunking on the fly probably took some time 🙂
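
For comparison with the 40MB/s figure, the effective throughput implied by the two timings can be worked out as below. This is a rough sketch that assumes 1GB = 1000MB and uses the `real` wall-clock times quoted above. The helper names are hypothetical.

```python
def parse_real_time(text):
    """Convert a `time` output string such as '78m14.295s' into seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def throughput_mb_per_s(size_gb, real_time):
    """Effective end-to-end rate in MB/s (assumes 1GB = 1000MB)."""
    return size_gb * 1000 / parse_real_time(real_time)


rate_c63bc43 = throughput_mb_per_s(10.7, "78m14.295s")
rate_a904131 = throughput_mb_per_s(18.2, "793m42.489s")

print(f"c63bc43: {rate_c63bc43:.2f} MB/s")  # roughly 2.3 MB/s
print(f"a904131: {rate_a904131:.2f} MB/s")  # roughly 0.38 MB/s
```

Both rates are well below 40MB/s, which fits the remark that something other than raw download speed, such as rechunking on the fly, accounted for much of the time.
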
