NCZarr memory leak with NetCDF 4.9.0 #2733
Comments
The ZARR dataset is only spatially chunked and not in time.
The data that is read is stored in a per-variable cache. If you have been accessing many variables, you may be holding a separate cache for each of them, which can add up to a lot of memory.
Is it possible to limit the size of the cache to a user-defined value?
Yes. The function nc_set_var_chunk_cache() does this. It sets parameters for the per-variable cache. So I would suggest calling the function for the given variable with the size set to the space you are willing to use, and with the nelems parameter set to some large number (say 1000) so that the size parameter is the only one that will have an effect. The preemption parameter can be set to 0.5 since it is unused.
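A minimal C sketch of that suggestion; the path and variable name are placeholders (they do not appear in the issue), while the size, nelems, and preemption values follow the advice above:

```c
#include <netcdf.h>

/* Minimal sketch: cap the per-variable chunk cache for one variable.
 * The NCZarr path and variable name are placeholders. */
int main(void)
{
    int ncid, varid, status;

    if ((status = nc_open("file:///path/to/example.zarr#mode=nczarr,file",
                          NC_NOWRITE, &ncid)))
        return status;
    if ((status = nc_inq_varid(ncid, "temperature", &varid)))
        return status;

    /* size: the bytes you are willing to spend on this variable's cache;
     * nelems: large (1000) so that size is the limit that takes effect;
     * preemption: 0.5, since it is unused by NCZarr. */
    if ((status = nc_set_var_chunk_cache(ncid, varid,
                                         64UL * 1024 * 1024 /* 64 MiB */,
                                         1000, 0.5f)))
        return status;

    /* ... read the variable as usual ... */

    return nc_close(ncid);
}
```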
re: Unidata#2733

When addressing the above issue, I noticed that there was a disconnect in NCZarr between nc_set_chunk_cache and nc_set_var_chunk_cache. Specifically, setting nc_set_chunk_cache had no impact on the per-variable cache parameters when nc_set_var_chunk_cache was not used. So I modified the NCZarr code so that the per-variable cache parameters are set in this order (#1 is first choice; a sketch follows the list):
1. The values set by nc_set_var_chunk_cache
2. The values set by nc_set_chunk_cache
3. The defaults set by configure.ac
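A minimal sketch of that precedence; the helper function, path handling, and cache sizes are illustrative assumptions, while nc_set_chunk_cache and nc_set_var_chunk_cache are the calls named above:

```c
#include <netcdf.h>

/* Minimal sketch of the precedence above. The helper and its arguments are
 * illustrative; error handling is abbreviated. */
int open_with_cache_defaults(const char *path, int *ncidp)
{
    /* (2) Library-wide default, set before opening, used for variables
     *     that get no explicit per-variable setting. */
    nc_set_chunk_cache(32UL * 1024 * 1024, 1000, 0.5f);

    int status = nc_open(path, NC_NOWRITE, ncidp);
    if (status) return status;

    /* (1) An explicit per-variable call would override that default, e.g.
     *     nc_set_var_chunk_cache(*ncidp, varid, size, nelems, 0.5f); */

    /* (3) With neither call, the configure.ac defaults apply. */
    return NC_NOERR;
}
```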
I have now tested nc_set_var_chunk_cache() and nc_set_chunk_cache() with different parameters and could not see any difference in overall memory usage for my workflow.
re: PR Unidata#2734
re: Issue Unidata#2733

As a result of an investigation by https://github.com/uweschulzweida, I discovered a significant bug in the NCZarr cache management. This PR extends the above PR to fix that bug.

## Change Overview
* Insert extra checks for cache overflow.
* Added test cases contingent on the --enable-large-file-tests option.
* The Columbia server is down, so it has been temporarily disabled.
I have a large ZARR dataset. I want to read it time step by time step. This causes me to exceed the memory limit (1 TB) on my machine. It looks like all data that has been read is kept uncompressed in memory. Is this intentional, or is it a memory leak?
I am using netCDF 4.9.0. In my application only one time step is stored at a time. When I read the same dataset as NetCDF-4, my application only needs 100 MB.
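A minimal sketch of the access pattern described above, assuming a 3-D (time, y, x) float variable; the function name and layout are illustrative, not taken from the reporter's application:

```c
#include <netcdf.h>
#include <stdlib.h>

/* Minimal sketch: read a 3-D (time, y, x) float variable one time step at a
 * time, so only a single step is resident in the application's own buffer. */
int read_stepwise(int ncid, int varid, size_t ntime, size_t ny, size_t nx)
{
    float *slab = malloc(ny * nx * sizeof *slab);
    if (slab == NULL)
        return NC_ENOMEM;

    for (size_t t = 0; t < ntime; t++) {
        size_t start[3] = { t, 0, 0 };
        size_t count[3] = { 1, ny, nx };
        int status = nc_get_vara_float(ncid, varid, start, count, slab);
        if (status) { free(slab); return status; }
        /* ... process this time step ... */
    }

    free(slab);
    return NC_NOERR;
}
```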