Enable download of large (spatial extent) cutouts from ERA5 via cdsapi. #236

Merged
merged 19 commits on Apr 5, 2023
11 changes: 10 additions & 1 deletion RELEASE_NOTES.rst
@@ -16,10 +16,19 @@ Upcoming Release
* Added 1-axis vertical and 2-axis tracking option for solar pv and trigon_model = "simple"
* Added small documentation for get_windturbineconfig
* The deprecated functions `grid_cells` and `grid_coordinates` were removed.
* Feature: Cutouts are now compressed differently during the `.prepare(...)` step using the native compression feature of netCDF files.
This increases time to build a cutout but should reduce cutout file sizes.
Existing cutouts are not affected. To also compress existing cutouts, load and save them using `xarray` with
compression specified, see `the xarray documentation <https://docs.xarray.dev/en/stable/generated/xarray.Dataset.to_netcdf.html>`_
for details.
* Feature: Cutouts from `ERA5` are now downloaded for each month rather than for each year.
This allows for spatially larger cutouts (up to worldwide) which previously exceeded the maximum
download size from ERA5.
* Doc: A subsection on how to reduce `cutout` sizes has been added to the documentation.
* Bug notice: A bug in one of `atlite`'s package dependencies (`xarray`) can lead to `nan` values when using `atlite`.
A workaround is implemented in `atlite` which reduces performance when building cutouts, especially ERA5 cutouts.
The `nan` values in `cutouts` affected by the bug cannot be recovered; the `cutout` needs to be downloaded again
(a sketch for checking an existing cutout is shown below).
For more details on the bug, see the `xarray issue tracker <https://github.com/pydata/xarray/issues/7691>`_.

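A minimal sketch for checking an existing cutout for `nan` values introduced by this bug, assuming `xarray` is installed; the file name is a placeholder:

```python
import xarray as xr

ds = xr.open_dataset("cutout.nc")  # hypothetical cutout path
for name, da in ds.data_vars.items():
    n_nan = int(da.isnull().sum())
    if n_nan:
        print(f"{name}: {n_nan} nan values - cutout may be affected")
```
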
Version 0.2.10
==============
16 changes: 10 additions & 6 deletions atlite/data.py
@@ -115,7 +115,7 @@ def cutout_prepare(
    features=None,
    tmpdir=None,
    overwrite=False,
    compression={"zlib": True, "complevel": 9, "shuffle": True},
):
"""
Prepare all or a selection of features in a cutout.
@@ -144,9 +144,10 @@ def cutout_prepare(
compression : None/dict, optional
Compression level to use for all features which are being prepared.
The compression is handled via xarray.Dataset.to_netcdf(...), for details see:
https://docs.xarray.dev/en/stable/generated/xarray.Dataset.to_netcdf.html.
To efficiently reduce cutout sizes, additionally specify 'least_significant_digit': n here.
To disable compression, set "complevel" to None.
Default is {'zlib': True, 'complevel': 9, 'shuffle': True}.

Returns
-------
@@ -194,9 +194,12 @@ def cutout_prepare(
fd, tmp = mkstemp(suffix=filename, dir=directory)
os.close(fd)

logger.debug("Writing cutout to file...")
# Delayed writing for large cutout
# cf. https://stackoverflow.com/questions/69810367/python-how-to-write-large-netcdf-with-xarray
write_job = ds.to_netcdf(tmp, compute=False)
with ProgressBar():
    write_job.compute()
if cutout.path.exists():
    cutout.data.close()
    cutout.path.unlink()
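
A short usage sketch of the new compression option, assuming an existing `atlite.Cutout`; the path and the `least_significant_digit` value are illustrative assumptions:

```python
import atlite

cutout = atlite.Cutout("cutout-path.nc")  # hypothetical path
# Strongest lossless level, plus optional lossy rounding to 3 decimal digits
cutout.prepare(
    compression={"zlib": True, "complevel": 9, "shuffle": True, "least_significant_digit": 3}
)
```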
39 changes: 26 additions & 13 deletions atlite/datasets/era5.py
@@ -252,9 +252,10 @@ def retrieval_times(coords, static=False):
"""
Get list of retrieval cdsapi arguments for time dimension in coordinates.

If static is False, this function creates a query for each month and year
in the time axis in coords. This ensures not running into size query limits
of the cdsapi even with very (spatially) large cutouts.
If static is True, the function returns only one set of parameters
for the very first time point.

Parameters
@@ -274,16 +275,18 @@
"time": time[0].strftime("%H:00"),
}

# Prepare request for all months and years
times = []
for year in time.year.unique():
    t = time[time.year == year]
    for month in t.month.unique():
        query = {
            "year": str(year),
            "month": str(month),
            "day": list(t[t.month == month].day.unique()),
            "time": ["%02d:00" % h for h in t[t.month == month].hour.unique()],
        }
        times.append(query)
return times
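
A standalone sketch of the per-month grouping, using a plain pandas `DatetimeIndex` in place of the cutout's time coordinate:

```python
import pandas as pd

# Hourly time axis spanning two months, standing in for coords["time"]
time = pd.date_range("2013-01-01", "2013-02-28 23:00", freq="h")

times = []
for year in time.year.unique():
    t = time[time.year == year]
    for month in t.month.unique():
        times.append(
            {
                "year": str(year),
                "month": str(month),
                "day": list(t[t.month == month].day.unique()),
                "time": ["%02d:00" % h for h in t[t.month == month].hour.unique()],
            }
        )

print(len(times))  # 2 queries, one per month, each below the CDS request size limit
```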


@@ -324,17 +327,27 @@ def retrieve_data(product, chunks=None, tmpdir=None, lock=None, **updates):
fd, target = mkstemp(suffix=".nc", dir=tmpdir)
os.close(fd)

# Inform user about data being downloaded as "* variable (year-month)"
timestr = f"{request['year']}-{request['month']}"
variables = atleast_1d(request["variable"])
varstr = "\n\t".join([f"{v} ({timestr})" for v in variables])
logger.info(f"CDS: Downloading variables\n\t{varstr}\n")
result.download(target)

ds = xr.open_dataset(target, chunks=chunks or {})
if tmpdir is None:
    logger.debug(f"Adding finalizer for {target}")
    weakref.finalize(ds._file_obj._manager, noisy_unlink, target)

# Remove default encoding we get from CDSAPI, which can lead to NaN values after loading with
# subsequent saving due to how xarray handles netCDF compression (only floats encoded as short
# int seem affected). Fixes the issue by keeping "float32" encoded as "float32" instead of
# internally saving as "short int", see:
# https://stackoverflow.com/questions/75755441/why-does-saving-to-netcdf-without-encoding-change-some-values-to-nan
# Hopefully fixed upstream soon (could then be removed), see https://github.com/pydata/xarray/issues/7691
for v in ds.data_vars:
    if ds[v].encoding["dtype"] == "int16":
        ds[v].encoding.clear()

return ds
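
A hedged sketch for inspecting the packed encoding on a freshly downloaded CDS file; the file name is a placeholder:

```python
import xarray as xr

ds = xr.open_dataset("era5-download.nc")  # hypothetical CDS download
for v in ds.data_vars:
    enc = ds[v].encoding
    # CDS typically ships float fields packed as int16 with scale/offset,
    # which is the encoding cleared by the workaround above
    print(v, enc.get("dtype"), enc.get("scale_factor"), enc.get("add_offset"))
```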


41 changes: 36 additions & 5 deletions examples/create_cutout.ipynb
@@ -1456,11 +1456,9 @@
"plotting functionality from `xarray` to plot features from\n",
"the cutout's data.\n",
"\n",
"<div class=\"alert alert-info\">\n",
"\n",
"**Warning:** This will trigger `xarray` to load all the corresponding data from disk into memory!\n",
"\n",
"</div>"
"> **Warning**\n",
"> This will trigger `xarray` to load all the corresponding data from disk into memory!\n",
"\n"
]
},
{
@@ -1469,6 +1467,39 @@
"source": [
"Now that your cutout is created and prepared, you can call conversion functions as `cutout.pv` or `cutout.wind`. Note that this requires a bit more information, like what kind of pv panels to use, where do they stand etc. Please have a look at the other examples to get a picture of application cases."
]
},
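
A minimal sketch of such a conversion call, assuming the prepared `cutout` from above and the "CSi" panel config shipped with `atlite`:

```python
# Per-grid-cell PV capacity factors for the cutout period (assumption-based example)
cap_factors = cutout.pv(panel="CSi", orientation="latitude_optimal", capacity_factor=True)
```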
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reducing Cutout file sizes\n",
"\n",
"Cutouts can become quite large, depending on the spatial and temporal scope they cover.\n",
"By default `atlite` uses a trade-off between speed and compression to reduce the file size of cutouts.\n",
"\n",
"Stronger compression can be selected when creating a new cutout by choosing a higher `complevel` (`1` to `9`, default: `4`)\n",
"```\n",
"cutout.prepare(compression={\"zlib\": True, \"complevel\": 9})\n",
"```\n",
"\n",
"To change the compression for an existing cutout:\n",
"```\n",
"cutout = atlite.Cutout(\"cutout-path.nc\")\n",
"\n",
"compression = {\"zlib\": True, \"complevel\": 9}\n",
"for var in cutout.data.data_vars:\n",
" cutout.data[var].encoding.update(compression)\n",
"\n",
"cutout.to_file()\n",
"```\n",
"For details and more arguments for `compression`, see the [xarray documentation](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.to_netcdf.html) for details.\n",
"\n",
"Alternatively a cutout can also be compressed by using the `netcdf` utility `nccopy` from the commandline:\n",
"\n",
"```\n",
"nccopy -d4 -s <input cutout .nc file> <output cutout .nc file>\n",
"```"
]
}
],
"metadata": {