I have a multifile dataset made up of month-long, 8-hourly netcdf files spanning nearly 30 years. The files are available from ftp://ftp.ifremer.fr/ifremer/ww3/HINDCAST/GLOBAL/, and I'm specifically looking at e.g. 1990_CFSR/hs/ww3.199001_hs.nc for each year and month. Each file is about 45 MB, for about 15 GB total.
I want to calculate some lognormal distribution parameters of the Hs variable at each grid point (actually, only at a smallish subset of points, using a mask). However, if I load the data with open_mfdataset and try to read a single lat/lon grid cell, my computer tanks and python gets killed for running out of memory (I have 16 GB, but even if I only open one year of data, ~500 MB, python ends up using 27% of my memory).
Is there a way in xarray/dask to force dask to only read single sub-arrays at a time? I have tried using lat/lon chunking along the lines of the snippet below, but that doesn't seem to improve things.
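(The original snippet didn't survive the copy; this is a minimal sketch of the kind of call I mean. The dimension names `latitude`/`longitude` and the selected indices are assumptions, not taken from the original report.)

```python
import xarray as xr

# Sketch only: chunk along lat/lon so reading one grid cell should, in
# principle, only touch one chunk per file. Dimension names are assumed.
ds = xr.open_mfdataset("ww3.19????_hs.nc",
                       chunks={"latitude": 1, "longitude": 1})

# Placeholder indices for a single grid cell.
hs_point = ds["hs"].isel(latitude=100, longitude=200)
```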
Is there any way around this problem? I guess I could try using preprocess= to sub-select grid cells (something like the sketch below) and loop over that, but that seems like it would require opening and reading each file 317*720 times, which sounds like a recipe for a long wait.
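For reference, a rough sketch of what I mean by the preprocess= approach; the indices are placeholders, and in practice this would have to be repeated for every masked cell:

```python
import xarray as xr

def select_cell(ds, ilat=100, ilon=200):
    # Placeholder indices; these would be looped over the masked cells.
    return ds.isel(latitude=ilat, longitude=ilon)

# Each call opens and reads every file just to extract one grid cell.
ds_cell = xr.open_mfdataset("ww3.*_hs.nc", preprocess=select_cell)
```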
Have you seen #1823? It sounds like you might be having the same issue: xarray loads coordinate information into memory to check that alignment is correct, but for many datasets with large coordinate arrays this can be prohibitive.
You know your variables are aligned, so you could try the workaround suggested in that thread: pass the coordinate names to drop_variables, then update them from a single master dataset (because presumably your latitude and longitude don't depend on time!).
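An untested sketch of that workaround; the coordinate variable names are a guess for these WW3 files:

```python
import xarray as xr
from glob import glob

files = sorted(glob("ww3.*_hs.nc"))

# Read the static lat/lon coordinates once, from a single "master" file.
master = xr.open_dataset(files[0])

# Skip reading/aligning those coordinates in every file...
ds = xr.open_mfdataset(files, drop_variables=["latitude", "longitude"])

# ...then put them back from the master dataset.
ds = ds.assign_coords(latitude=master["latitude"],
                      longitude=master["longitude"])
```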