Impossible to download Earth-scale data #221

davide-f opened this issue Feb 21, 2022 · 2 comments


davide-f commented Feb 21, 2022


While extending PyPSA-Africa towards PyPSA-Earth, I noticed that creating a global cutout with atlite fails: Copernicus is not able to convert the GRIB files into the netCDF format that atlite requests, and the workflow crashes.

Expected Behavior

Atlite should be able to build the desired cutout for the whole Earth.
The error message below can be reproduced with the following code:

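import atlite

cutout = atlite.Cutout(path="",
                       x=slice(-179.4, 179.7),
                       y=slice(-59.1, 87.3),
                       # remaining arguments inferred from the log below, not from the original post
                       module="era5",
                       time="2013")
cutout.prepare()  # the call that fails in the traceback below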

Actual Behavior

The workflow stops because atlite is not able to download the data from Copernicus.
The raw GRIB files seem to be produced, but they cannot be converted into netCDF format, so the procedure stops.
This happens when the cutout covers (almost) the whole world; the request sent to Copernicus uses exactly "area": [87.3, -179.4, -59.1, 179.7].
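
For reference, the "area" list follows the CDS ordering [north, west, south, east], so it can be derived directly from the x and y slices of the cutout. A minimal sketch; the helper name `slices_to_cds_area` is hypothetical and is not part of atlite:

```python
def slices_to_cds_area(x, y):
    """Return [north, west, south, east] from longitude (x) and latitude (y) slices."""
    return [y.stop, x.start, y.start, x.stop]


# Bounds taken from the reproduction code above
area = slices_to_cds_area(slice(-179.4, 179.7), slice(-59.1, 87.3))
print(area)  # [87.3, -179.4, -59.1, 179.7]
```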

Error Message

The request you have submitted is not valid

Reason:
grib_to_netcdf ERROR: line 4334, nc_enddef: NetCDF: One or more variable sizes violate format constraints

Cannot create netCDF classic format, dataset is too large!
Try splitting the input GRIB(s).
grib_to_netcdf: Version 2.24.2
grib_to_netcdf: Processing input file '/cache/tmp/8599e6f4-dc3e-423e-a66e-72e4d44ac365-adaptor.mars.internal-1645467199.6672807-10228-8-tmp.grib'.
grib_to_netcdf: Found 17520 GRIB fields in 1 file.
grib_to_netcdf: Ignoring key(s): method, type, stream, refdate, hdate
grib_to_netcdf: Creating netCDF file '/cache/data3/'
grib_to_netcdf: NetCDF library version: of Dec 10 2015 16:44:18 $
grib_to_netcdf: Creating large (64 bit) file format.
grib_to_netcdf: Defining variable 't2m'.
grib_to_netcdf: Defining variable 'stl4'.

Your Environment

  • The atlite version used: 0.2.5
  • How you installed atlite (conda, pip or github): conda
  • Operating System: Linux
  • My environment: CentOS Linux
Can reproduce. At first glance this might be an issue on the CDS API side. Classic netCDF is limited in size to ~2 GB; maybe the individual requests exceed this size and the API does not internally switch to netCDF v4 (which allows for larger sizes)? We'd need to investigate a bit more; maybe someone else has a clearer view?
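
To get a feeling for the numbers, here is a rough back-of-the-envelope estimate of the size of one variable in such a request. It is only a sketch under assumptions that are not stated in this thread: a 0.25° grid, one full non-leap year of hourly data, and 2 bytes per value (packed 16-bit integers). The field count it produces matches the 17520 GRIB fields reported in the log.

```python
import math

RES = 0.25            # assumption: ERA5 grid spacing in degrees
BYTES_PER_VALUE = 2   # assumption: values packed as 16-bit integers
HOURS = 365 * 24      # one non-leap year of hourly fields (2013)


def n_points(lo, hi, res=RES):
    """Count grid points located at multiples of `res` inside [lo, hi]."""
    return math.floor(hi / res) - math.ceil(lo / res) + 1


nx = n_points(-179.4, 179.7)   # longitudes
ny = n_points(-59.1, 87.3)     # latitudes
n_fields = HOURS * 2           # two variables: t2m and stl4
bytes_per_variable = HOURS * ny * nx * BYTES_PER_VALUE

print(nx, ny, n_fields)  # 1436 586 17520
print(f"{bytes_per_variable / 2**30:.1f} GiB per variable")
```

Under these assumptions a single variable is well above 10 GiB. As far as I know, the classic and 64-bit offset netCDF formats cap an individual variable at roughly 2 GiB and 4 GiB respectively, which would be consistent with the "variable sizes violate format constraints" message.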

Full stack trace:

2022-02-22 08:15:33,413 INFO Requesting data for feature influx...
2022-02-22 08:23:52,061 INFO CDS: Downloading variables
	 * geopotential (2013)

2022-02-22 08:59:55,175 ERROR Message: the request you have submitted is not valid
2022-02-22 08:59:55,176 ERROR Reason:  
grib_to_netcdf ERROR: line 4334, nc_enddef: NetCDF: One or more variable sizes violate format constraints

Cannot create netCDF classic format, dataset is too large!
Try splitting the input GRIB(s).
grib_to_netcdf: Version 2.24.2
grib_to_netcdf: Processing input file '/cache/tmp/74fcce81-8102-4388-b11b-f71e8c4d56a3-adaptor.mars.internal-1645514525.1224709-4276-8-tmp.grib'.
grib_to_netcdf: Found 17520 GRIB fields in 1 file.
grib_to_netcdf: Ignoring key(s): method, type, stream, refdate, hdate
grib_to_netcdf: Creating netCDF file '/cache/data5/'
grib_to_netcdf: NetCDF library version: of Dec 10 2015 16:44:18 $
grib_to_netcdf: Creating large (64 bit) file format.
grib_to_netcdf: Defining variable 't2m'.
grib_to_netcdf: Defining variable 'stl4'.

2022-02-22 08:59:55,177 ERROR   Traceback (most recent call last):
2022-02-22 08:59:55,178 ERROR     File "/opt/cdstoolbox/cdscompute/cdscompute/cdshandlers/services/", line 55, in handle_request
2022-02-22 08:59:55,179 ERROR       result = cached(context.method, proc, context, context.args, context.kwargs)
2022-02-22 08:59:55,179 ERROR     File "/opt/cdstoolbox/cdscompute/cdscompute/", line 108, in cached
2022-02-22 08:59:55,180 ERROR       result = proc(context, *context.args, **context.kwargs)
2022-02-22 08:59:55,180 ERROR     File "/opt/cdstoolbox/cdscompute/cdscompute/", line 118, in __call__
2022-02-22 08:59:55,180 ERROR       return p(*args, **kwargs)
2022-02-22 08:59:55,181 ERROR     File "/opt/cdstoolbox/cdscompute/cdscompute/", line 59, in __call__
2022-02-22 08:59:55,181 ERROR       return self.proc(context, *args, **kwargs)
2022-02-22 08:59:55,181 ERROR     File "/home/cds/cdsservices/services/mars/", line 47, in internal
2022-02-22 08:59:55,181 ERROR       return mars(context, request, **kwargs)
2022-02-22 08:59:55,182 ERROR     File "/home/cds/cdsservices/services/mars/", line 25, in mars
2022-02-22 08:59:55,182 ERROR       grib_to_netcdf(context, requests, info)
2022-02-22 08:59:55,182 ERROR     File "/home/cds/cdsservices/services/mars/", line 42, in grib_to_netcdf
2022-02-22 08:59:55,183 ERROR       context.run_command(*cmd, exception=NetcdfException)
2022-02-22 08:59:55,183 ERROR     File "/opt/cdstoolbox/cdscompute/cdscompute/", line 209, in run_command
2022-02-22 08:59:55,184 ERROR       raise exception(call, proc.returncode, output)
2022-02-22 08:59:55,184 ERROR 
2022-02-22 08:59:55,185 ERROR   grib_to_netcdf ERROR: line 4334, nc_enddef: NetCDF: One or more variable sizes violate format constraints

Exception                                 Traceback (most recent call last)
Input In [6], in <module>
----> 1 cutout.prepare()

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/atlite/, in maybe_remove_tmpdir.<locals>.wrapper(*args, **kwargs)
    100 kwargs["tmpdir"] = mkdtemp()
    101 try:
--> 102     res = func(*args, **kwargs)
    103 finally:
    104     rmtree(kwargs["tmpdir"])

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/atlite/, in cutout_prepare(cutout, features, tmpdir, overwrite)
    162 logger.info(f"Calculating and writing with module {module}:")
    163 missing_features = missing_vars.index.unique("feature")
--> 164 ds = get_features(cutout, module, missing_features, tmpdir=tmpdir)
    165 prepared |= set(missing_features)

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/atlite/, in get_features(cutout, module, features, tmpdir)
     41     feature_data = delayed(get_data)(
     42         cutout, feature, tmpdir=tmpdir, lock=lock, **parameters
     43     )
     44     datasets.append(feature_data)
---> 46 datasets = compute(*datasets)
     48 ds = xr.merge(datasets, compat="equals")
     49 for v in ds:

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/dask/, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    568     keys.append(x.__dask_keys__())
    569     postcomputes.append(x.__dask_postcompute__())
--> 571 results = schedule(dsk, keys, **kwargs)
    572 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/dask/, in get(dsk, result, cache, num_workers, pool, **kwargs)
     76     elif isinstance(pool, multiprocessing.pool.Pool):
     77         pool = MultiprocessingPoolExecutor(pool)
---> 79 results = get_async(
     80     pool.submit,
     81     pool._max_workers,
     82     dsk,
     83     result,
     84     cache=cache,
     85     get_id=_thread_get_id,
     86     pack_exception=pack_exception,
     87     **kwargs,
     88 )
     90 # Cleanup pools associated to dead threads
     91 with pools_lock:

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/dask/, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    505         _execute_task(task, data)  # Re-execute locally
    506     else:
--> 507         raise_exception(exc, tb)
    508 res, worker_id = loads(res_info)
    509 state["cache"][key] = res

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/dask/, in reraise(exc, tb)
    313 if exc.__traceback__ is not tb:
    314     raise exc.with_traceback(tb)
--> 315 raise exc

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/dask/, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    218 try:
    219     task, data = loads(task_info)
--> 220     result = _execute_task(task, data)
    221     id = get_id()
    222     result = dumps((result, id))

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/dask/, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/dask/, in apply(func, args, kwargs)
     38 def apply(func, args, kwargs=None):
     39     if kwargs:
---> 40         return func(*args, **kwargs)
     41     else:
     42         return func(*args)

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/atlite/datasets/, in get_data(cutout, feature, tmpdir, lock, **creation_parameters)
    374     return retrieve_once(retrieval_times(coords, static=True)).squeeze()
    376 datasets = map(retrieve_once, retrieval_times(coords))
--> 378 return xr.concat(datasets, dim="time").sel(time=coords["time"])

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/xarray/core/, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    217 from .dataset import Dataset
    219 try:
--> 220     first_obj, objs = utils.peek_at(objs)
    221 except StopIteration:
    222     raise ValueError("must supply at least one object to concatenate")

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/xarray/core/, in peek_at(iterable)
    192 """Returns the first value from iterable, as well as a new iterator with
    193 the same content as the original iterable
    194 """
    195 gen = iter(iterable)
--> 196 peek = next(gen)
    197 return peek, itertools.chain([peek], gen)

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/atlite/datasets/, in get_data.<locals>.retrieve_once(time)
    367 def retrieve_once(time):
--> 368     ds = func({**retrieval_params, **time})
    369     if sanitize and sanitize_func is not None:
    370         ds = sanitize_func(ds)

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/atlite/datasets/, in get_data_temperature(retrieval_params)
    190 def get_data_temperature(retrieval_params):
    191     """Get wind temperature for given retrieval parameters."""
--> 192     ds = retrieve_data(
    193         variable=["2m_temperature", "soil_temperature_level_4"], **retrieval_params
    194     )
    196     ds = _rename_and_clean_coords(ds)
    197     ds = ds.rename({"t2m": "temperature", "stl4": "soil temperature"})

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/atlite/datasets/, in retrieve_data(product, chunks, tmpdir, lock, **updates)
    294 assert {"year", "month", "variable"}.issubset(
    295     request
    296 ), "Need to specify at least 'variable', 'year' and 'month'"
    298 client = cdsapi.Client(
    299     info_callback=logger.debug, debug=logging.DEBUG >= logging.root.level
    300 )
--> 301 result = client.retrieve(product, request)
    303 if lock is None:
    304     lock = nullcontext()

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/cdsapi/, in Client.retrieve(self, name, request, target)
    347 def retrieve(self, name, request, target=None):
--> 348     result = self._api("%s/resources/%s" % (self.url, name), request, "POST")
    349     if target is not None:

File ~/miniconda3/envs/atlite/lib/python3.8/site-packages/cdsapi/, in Client._api(self, url, request, method)
    504             break
    505         self.error("  %s", n)
--> 506     raise Exception(
    507         "%s. %s."
    508         % (reply["error"].get("message"), reply["error"].get("reason"))
    509     )
    511 raise Exception("Unknown API state [%s]" % (reply["state"],))

Exception: the request you have submitted is not valid. 
grib_to_netcdf ERROR: line 4334, nc_enddef: NetCDF: One or more variable sizes violate format constraints

Cannot create netCDF classic format, dataset is too large!
Try splitting the input GRIB(s).
grib_to_netcdf: Version 2.24.2
grib_to_netcdf: Processing input file '/cache/tmp/74fcce81-8102-4388-b11b-f71e8c4d56a3-adaptor.mars.internal-1645514525.1224709-4276-8-tmp.grib'.
grib_to_netcdf: Found 17520 GRIB fields in 1 file.
grib_to_netcdf: Ignoring key(s): method, type, stream, refdate, hdate
grib_to_netcdf: Creating netCDF file '/cache/data5/'
grib_to_netcdf: NetCDF library version: of Dec 10 2015 16:44:18 $
grib_to_netcdf: Creating large (64 bit) file format.
grib_to_netcdf: Defining variable 't2m'.
grib_to_netcdf: Defining variable 'stl4'.
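
The error text suggests splitting the input. One possible reading is to issue one request per month instead of one per year, so that each converted file is roughly a twelfth of the size. The sketch below only builds the request dictionaries; the helper name `split_by_month` is hypothetical, the "day" and "time" keys are left out for brevity, and whether a month of global data stays within the format limits is an assumption that would need to be checked.

```python
def split_by_month(base_request, year):
    """Yield one copy of `base_request` per calendar month of `year`."""
    for month in range(1, 13):
        yield {**base_request, "year": str(year), "month": f"{month:02d}"}


# Request keys as used for ERA5 on the CDS; "day" and "time" omitted here
base = {
    "product_type": "reanalysis",
    "variable": ["2m_temperature", "soil_temperature_level_4"],
    "area": [87.3, -179.4, -59.1, 179.7],
    "format": "netcdf",
}

requests = list(split_by_month(base, 2013))
print(len(requests), requests[0]["month"], requests[-1]["month"])  # 12 01 12
```

Each dictionary could then be passed to `cdsapi.Client().retrieve(...)` and the monthly results concatenated along the time dimension.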

Contributor Author

davide-f commented Feb 22, 2022

I'm wondering whether atlite really needs to download the Copernicus data in netCDF format; the netCDF conversion is also marked as experimental by Copernicus.
A possible solution could be to download the raw GRIB files from Copernicus instead; xarray should be able to load them, although I'm not sure whether the loading would need some adaptations.
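
A minimal sketch of what this could look like, assuming the optional cfgrib package is installed as the GRIB backend for xarray. The helper names `grib_request` and `open_grib` are hypothetical and not part of atlite; the coordinate and variable names returned by cfgrib may differ from the netCDF output, so atlite's renaming steps might need changes.

```python
def grib_request(request):
    """Return a copy of a CDS request dict that asks for GRIB instead of netCDF."""
    return {**request, "format": "grib"}


def open_grib(path):
    """Open a downloaded GRIB file with xarray (requires the cfgrib package)."""
    import xarray as xr  # imported lazily so grib_request has no dependencies

    return xr.open_dataset(path, engine="cfgrib")


original = {"variable": ["2m_temperature"], "format": "netcdf"}
req = grib_request(original)
print(req["format"])  # grib
```

Files that mix variables on different level types may need to be opened in several passes; that has not been tested here.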

Update: maybe not possible; see

