to_netcdf() fails to append to an existing file #1215
An even simpler example:

```python
import os
import xarray as xr

path = 'test.nc'
if os.path.exists(path):
    os.remove(path)

ds = xr.Dataset()
ds['dim'] = ('dim', [0, 1, 2])
ds['var1'] = ('dim', [10, 11, 12])
ds['var2'] = ('dim', [13, 14, 15])
ds[['var1']].to_netcdf(path)
ds[['var2']].to_netcdf(path, 'a')
```
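To make the failure concrete, here is a toy model in plain Python (no xarray or netCDF involved). It assumes, as the discussion in this thread suggests, that each subset passed to `to_netcdf` re-defines its coordinate `dim` in the file, and that netCDF refuses to define a name that already exists. `NameInUseError` and `define_names` are hypothetical names for this sketch, and the error message is made up rather than copied from netCDF.

```python
class NameInUseError(RuntimeError):
    """Hypothetical stand-in for the netCDF error raised when a name is reused."""


def define_names(file_names, new_names):
    """Toy model of writing a subset to a file in mode='a'.

    ``file_names`` is the set of dimension/variable names already in the file.
    Every name in ``new_names`` (data variables plus their coordinates) gets
    defined again and, as in netCDF, redefining an existing name is an error.
    Returns the file's new set of names.
    """
    for name in new_names:
        if name in file_names:
            raise NameInUseError(f"name already in use: {name!r}")
    return file_names | set(new_names)


# Both ds[['var1']] and ds[['var2']] carry the coordinate 'dim' along with them.
on_disk = define_names(set(), ["dim", "var1"])   # first to_netcdf(path)
try:
    define_names(on_disk, ["dim", "var2"])       # second to_netcdf(path, 'a')
except NameInUseError as err:
    print(err)  # name already in use: 'dim'
```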
Note that the problem occurs because the backend wants to write the
Good catch! Marking this as a bug.
I did a few tests: the regression happened in #1017. Something in the way coordinate variables are handled has changed, so the writing now happens differently. The question is whether this should be handled downstream (in the netCDF backend) or upstream (at the dataset level).
OK, I understand what's going on now. Previously, we had a hack that disabled writing variables along dimensions of the form

So although your example worked in v0.8.2, this small variation did not, because we call

```python
ds = xr.Dataset()
ds['dim'] = ('dim', [1, 2, 3])
ds['var1'] = ('dim', [10, 11, 12])
ds.to_netcdf(path)

ds = xr.Dataset()
ds['dim'] = ('dim', [1, 2, 3])
ds['var2'] = ('dim', [10, 11, 12])
ds.to_netcdf(path, 'a')
```

I find it reassuring that this only worked in limited cases before, so it is unlikely that many users are depending on this functionality. It would be nice if

My main concern with squeezing this in is that the proper behavior is not entirely clear and will need to go through some review:
I see.
Agreed, but it would be good to get this working some day. For now I can see an easy workaround for my purposes. Another possibility would be to give the user control over whether existing variables should be ignored, overwritten, or raise an error when appending to a file.
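The three options (ignore, overwrite, raise) can be sketched as a small planning step that runs before anything is written. `OnExisting` and `plan_append` are hypothetical and not part of xarray; the sketch only compares variable names, assuming the names already in the file are known.

```python
from enum import Enum


class OnExisting(Enum):
    """Hypothetical policy for variables that already exist in the target file."""
    IGNORE = "ignore"        # keep the on-disk copy, skip the new one
    OVERWRITE = "overwrite"  # replace the on-disk copy with the new one
    RAISE = "raise"          # refuse to append at all


def plan_append(existing, incoming, on_existing=OnExisting.RAISE):
    """Split ``incoming`` variable names into (to_create, to_overwrite).

    ``existing`` holds the names already in the file. With RAISE, any clash
    aborts the append before anything is written.
    """
    clashes = [name for name in incoming if name in existing]
    to_create = [name for name in incoming if name not in existing]
    if clashes and on_existing is OnExisting.RAISE:
        raise ValueError(f"variables already in file: {clashes}")
    to_overwrite = clashes if on_existing is OnExisting.OVERWRITE else []
    return to_create, to_overwrite


# Appending ds[['var2']] (which carries 'dim') to a file holding 'dim' and 'var1':
print(plan_append({"dim", "var1"}, ["dim", "var2"], OnExisting.IGNORE))     # (['var2'], [])
print(plan_append({"dim", "var1"}, ["dim", "var2"], OnExisting.OVERWRITE))  # (['var2'], ['dim'])
```

Keeping the decision separate from the writing means the RAISE case fails before the file is touched.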
@fmaussion and @shoyer - I have a use case that could use this. Have either of you looked at this any further since January? If not, I'll propose a path forward that fits my use case, and we can iterate on the details until we're satisfied:

- I don't think loading variables already written to disk is practical. My preference would be to only append missing variables/coordinates.
- differing dims: raise an error

I'd like to implement this but keep it as simple as possible. A trivial use case like this should work:

```python
import numpy as np
import pandas as pd
import xarray as xr

fname = 'out.nc'
dates = pd.date_range('2016-01-01', freq='1D', periods=45)
ds = xr.Dataset()
for var in ['A', 'B', 'C']:
    ds[var] = xr.DataArray(np.random.random((len(dates), 4, 5)),
                           dims=('time', 'x', 'y'), coords={'time': dates})

# Append one variable at a time; each subset also carries the 'time' coordinate.
for var in ds.data_vars:
    ds[[var]].to_netcdf(fname, mode='a')
```
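The proposed rules (append only missing variables/coordinates, raise on differing dims, never read values back from disk) could look roughly like the sketch below. `variables_to_append` is a hypothetical helper, and a "schema" here is just a dict from variable name to its dimension names. For simplicity it skips any existing name whose dims match; what should happen to an existing *data* variable was left open in this thread.

```python
def variables_to_append(file_schema, new_schema):
    """Names from ``new_schema`` that should be written in append mode.

    Both arguments map a variable name to its tuple of dimension names, e.g.
    {"time": ("time",), "A": ("time", "x", "y")}. Only names and dims are
    compared; values on disk are never read back.
    """
    to_write = []
    for name, dims in new_schema.items():
        if name not in file_schema:
            to_write.append(name)  # missing variable/coordinate: append it
        elif file_schema[name] != dims:
            raise ValueError(
                f"{name!r} has dims {dims} but the file has {file_schema[name]}"
            )
        # same name and same dims: assume it is already written and skip it
    return to_write


# Replaying the use case above: A, B and C all share the 'time' coordinate.
file_schema = {}
for var in ["A", "B", "C"]:
    new = {"time": ("time",), var: ("time", "x", "y")}
    names = variables_to_append(file_schema, new)
    print(var, names)  # A ['time', 'A'], then B ['B'], then C ['C']
    file_schema.update({n: new[n] for n in names})
```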
@jhamman no, I haven't looked into this any further (and I also forgot what my workaround at that time actually was). I also think your example should work, and that we should never check values on disk: if the dimension and coordinate names match, write the variable and assume the coordinates are OK. If the variable already exists in the file, match the behavior of netCDF4 (I actually don't know what netCDF4 does in that case).
+1, we probably don't want to read coordinates back from disk
(In reply to Fabien Maussion's comment of Wed, Oct 4, 2017, 12:09 PM.)
Is it now possible to append to a netCDF file using xarray? I have some tabular data that is read into a dataframe in chunks from a large file. The goal is to write it to netCDF in chunks. If so, could someone please provide a simple code example? I am also receiving a RuntimeError: 'NetCDF: String match to name in use'. Thank you.
No, it is not. This issue is about appending new variables to an existing netCDF file. I think what you are looking for is appending along existing dimensions of a netCDF file. This is possible in the netCDF data model, but not yet supported by xarray. See #1398 for some discussion. For these types of use cases, I would generally recommend writing a new netCDF file, and then loading everything afterwards using
Thank you! I will give xarray.open_mfdataset a shot. Just one question: is this approach memory-efficient? My reason for chunking in the first place is the large file size.
Yes,
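The one-file-per-chunk approach recommended above can be sketched without committing to a particular reader or writer. `write_chunks` and the `part_0000.nc` naming are hypothetical. In practice `chunks` might be the iterator returned by `pandas.read_csv(..., chunksize=...)`, and `write_chunk` might call `xr.Dataset.from_dataframe(chunk).to_netcdf(path)`; the returned paths are what you would then hand to `xarray.open_mfdataset`. Memory stays bounded by the chunk size only if `chunks` is lazy and the writer keeps no references to earlier chunks.

```python
def write_chunks(chunks, write_chunk, prefix="part"):
    """Write each chunk from an iterable to its own new file.

    ``write_chunk(chunk, path)`` does the actual writing. Chunks are handled
    one at a time, and no file is ever opened in append mode. Returns the
    paths in order so they can be opened together afterwards.
    """
    paths = []
    for i, chunk in enumerate(chunks):
        path = f"{prefix}_{i:04d}.nc"  # hypothetical naming scheme
        write_chunk(chunk, path)
        paths.append(path)
    return paths


# Dry run with a fake writer that only records what it was given.
written = {}
paths = write_chunks([[1, 2], [3, 4], [5]], lambda chunk, path: written.update({path: chunk}))
print(paths)  # ['part_0000.nc', 'part_0001.nc', 'part_0002.nc']
```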
The following code used to work well in v0.8.2:
On master, it fails with: