Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

' ' at end of variable name causes to_netcdf() to crash #8029

Closed
4 tasks done
28raining opened this issue Jul 28, 2023 · 6 comments
Closed
4 tasks done

' ' at end of variable name causes to_netcdf() to crash #8029

28raining opened this issue Jul 28, 2023 · 6 comments

Comments

@28raining
Copy link

What happened?

If variable name ends in a ' ' (space) then to_netcdf crashes.

In my opinion

  • At least the error message could tell which variable name has the issue
  • If space at the end is not allowed, then could you rstrip everything
  • It works with scipy xarr.to_netcdf(newName, engine="scipy")

This is related to #7943

Our tool converts csv files to XARRAY.
So these kind of errors need to self-heal, otherwise we have to feedback to the user and get them to change the csv file.
Which would be frustrating for everyone

What did you expect to happen?

No response

Minimal Complete Verifiable Example

with open('endSpace.csv','w') as f:
  f.write('''PASS ,temperature
PASS, 10
FAIL, 20''')

import pandas as pd
df = pd.read_csv('endSpace.csv')
index = ['temperature']
df = df.set_index(index)
xarr = df.to_xarray()
xarr.to_netcdf("new.nc")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Traceback (most recent call last):
  File "/Users/will/Documents/GitHub/czdashboard/old/coordNameBug.py", line 11, in <module>
    xarr.to_netcdf("new.nc")
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/core/dataset.py", line 1911, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/backends/api.py", line 1217, in to_netcdf
    dump_to_store(
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/backends/api.py", line 1264, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/backends/common.py", line 271, in store
    self.set_variables(
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/backends/common.py", line 309, in set_variables
    target, source = self.prepare_variable(
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/backends/netCDF4_.py", line 488, in prepare_variable
    nc4_var = self.ds.createVariable(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2945, in netCDF4._netCDF4.Dataset.createVariable
  File "src/netCDF4/_netCDF4.pyx", line 4164, in netCDF4._netCDF4.Variable.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2014, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Name contains illegal characters

Anything else we need to know?

No response

Environment

commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.11.2.final.0
python-bits : 64
OS : Darwin
OS-release : 22.5.0
Version : Darwin Kernel Version 22.5.0: Thu Jun 8 22:21:34 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T8112
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.3
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.6.3
pip : 23.1.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : 2023.2.0
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@28raining 28raining added bug needs triage Issue that has not been reviewed by xarray team member labels Jul 28, 2023
@headtr1ck
Copy link
Collaborator

Thanks for the report.

According to the netcdf standard:

"Names that have trailing space characters are also not permitted."

See:
https://docs.unidata.ucar.edu/nug/current/netcdf_data_set_components.html

Maybe you could add a quick cleanup script before writing?

@28raining
Copy link
Author

I have done that. I
have over 100 variables so to figure it was a trailing space was time consuming.

And as in other ticket, how many of these random gotchas are there.

At least the error message needs to be better.
Ideally the tool should just handle it; either to_netcdf cleans up, or to_xarray cleans up

@headtr1ck
Copy link
Collaborator

This error is raised by netCDF4, xarray doesn't care about the variable names, they don't even need to be strings.

You could open an issue with netCDF4 to improve the error message. I agree that it should at least contain the invalid name and possibly even the violated rule.

@headtr1ck headtr1ck added topic-error reporting upstream issue and removed bug needs triage Issue that has not been reviewed by xarray team member labels Jul 30, 2023
@TomNicholas
Copy link
Member

TomNicholas commented Jul 30, 2023

Ideally the tool should just handle it; either to_netcdf cleans up, or to_xarray cleans up

to_xarray shouldn't clean it up because spaces (and forward slashes) are (currently) valid characters in xarray variable names.

to_netcdf probably shouldn't automatically clean it up either because then you will silently write a file with different metadata than specified, which would break round-tripping.

What we could do is raise a better error message (as @headtr1ck says), which ideally would be implemented upstream in netCDF4 but could also be implemented as a check within the corresponding xarray backend, similar to #7953.

@28raining if you want to open a pull request to add an error message like #7953 does then I think that would be welcome (others please point out if you see an issue with that).

@headtr1ck
Copy link
Collaborator

I totally agree.

In general I would advise against implementing the same checks on xarray side, this just leads to divergent behavior in the future.
But in this case, I would say that the netcdf rules are put in stone and additionally we can advice users to try a different backend that allows invalid netcdfs.

@ZedThree
Copy link
Contributor

ZedThree commented Aug 2, 2023

I've opened a PR upstream to at least make the error messages more helpful: Unidata/netcdf4-python#1268.

Suggestions welcome on further improvements!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants