Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interest in extension for complex numbers? #1280

Closed
ZedThree opened this issue Oct 6, 2023 · 15 comments · Fixed by #1295
Closed

Interest in extension for complex numbers? #1280

ZedThree opened this issue Oct 6, 2023 · 15 comments · Fixed by #1295

Comments

@ZedThree
Copy link
Contributor

ZedThree commented Oct 6, 2023

I have a (currently proof-of-concept) library extending netCDF to support complex numbers, including all of the most common conventions that people are currently using. This is implemented as a C library so that I can consistently extend all the language bindings. The Python API is built on top of netcdf4-python and is an entirely drop-in replacement/extension. Here's an example of writing a complex array and reading it back, demonstrating the correct dtype both ways:

import nc_complex as netCDF4
import numpy as np

complex_array = np.array([0 + 0j, 1 + 0j, 0 + 1j, 1 + 1j, 0.25 + 0.75j], dtype="c16")

with netCDF4.Dataset(filename, "w") as f:
    f.createDimension("x", size=len(complex_array))
    complex_var = f.createVariable("complex_data", "c16", ("x",))
    complex_var[:] = complex_array

with netCDF4.Dataset(filename, "r") as f:
    print(f["complex_data"])
    print(f["complex_data"][:])

# <class 'nc_complex.Variable'>
# compound data_dim(x)
# compound data type: complex128
# unlimited dimensions:
# current shape = (5,)
# [0.  +0.j   1.  +0.j   0.  +1.j   1.  +1.j   0.25+0.75j]

Is there potentially any interest in merging the modifications for the Python API in here? It would require using my C library and switching out some calls to my wrappers. It could also be possible to make this behaviour optional if that was a concern.

As a side note, I found scikit-build-core really, really useful for building the Cython module. Might simplify the build system here too?

@jswhit
Copy link
Collaborator

jswhit commented Oct 10, 2023

what does an ncdump of the file in your example look like?

@ZedThree
Copy link
Contributor Author

The files created by nc_complex are all completely standard netCDF4:

netcdf nc_complex_test {
types:
  compound _PFNC_DOUBLE_COMPLEX_TYPE {
    double r ;
    double i ;
  }; // _PFNC_DOUBLE_COMPLEX_TYPE
dimensions:
        x = 5 ;
variables:
        _PFNC_DOUBLE_COMPLEX_TYPE complex_data(x) ;
data:

 complex_data = {0, 0}, {1, 0}, {0, 1}, {1, 1}, {0.25, 0.75} ;
}

and here's what it looks like when read by netcdf4-python:

import netCDF4

with netCDF4.Dataset("nc_complex_test.nc", "r") as f:
    print(f["complex_data"])
    print(f["complex_data"][:])

# <class 'netCDF4._netCDF4.Variable'>
# compound complex_data(x)
# compound data type: {'names': ['r', 'i'], 'formats': ['<f8', '<f8'], 'offsets': [0, 8], 'itemsize': 16, 'aligned': True}
# unlimited dimensions: 
# current shape = (5,)
# [(0.  , 0.  ) (1.  , 0.  ) (0.  , 1.  ) (1.  , 1.  ) (0.25, 0.75)]

The other important feature is that nc_complex also reads variables created with a complex dimension as native complex types:

filename = "complex_dim.nc"

with netCDF4.Dataset(filename, "w") as f:
    f.createDimension("x", size=len(complex_array))
    f.createDimension("ri", size=2)
    c_ri = f.createVariable("data_dim", np.float64, ("x", "ri"))
    as_dim_array = np.vstack((complex_array.real, complex_array.imag)).T
    c_ri[:] = as_dim_array
    print(c_ri)
    print(c_ri[:])

# <class 'netCDF4._netCDF4.Variable'>
# float64 data_dim(x, ri)
# unlimited dimensions: 
# current shape = (5, 2)
# filling on, default _FillValue of 9.969209968386869e+36 used
# [[0.   0.  ]
#  [1.   0.  ]
#  [0.   1.  ]
#  [1.   1.  ]
#  [0.25 0.75]]

with nc_complex.Dataset(filename, "r") as f:
    print(f["data_dim"])
    print(f["data_dim"][:])

# <class 'nc_complex.Variable'>
# compound data_dim(x)
# compound data type: complex128
# unlimited dimensions: 
# current shape = (5,)
# [0.  +0.j   1.  +0.j   0.  +1.j   1.  +1.j   0.25+0.75j]

And here's ncdump for that file to show that the underlying file is unchanged:

$ ncdump complex_dim.nc 
netcdf complex_dim {
dimensions:
        x = 5 ;
        ri = 2 ;
variables:
        double data_dim(x, ri) ;
data:

 data_dim =
  0, 0,
  1, 0,
  0, 1,
  1, 1,
  0.25, 0.75 ;
}

@davidhassell
Copy link

Hi @ZedThree,

Thanks for these examples. I don't see how you know to read a complex type from the double array with shape (5, 2) in complex_dim.nc - what am I missing?

Thanks,
David

@ZedThree
Copy link
Contributor Author

@davidhassell I check for a dimension of length 2 that has a recognised name, currently either ri or complex -- strictly, it should also be the last dimension.. These are based on real codes that use these names (other examples I found are two and depth (!), but I don't check for these).

This could be configurable, and I certainly plan to expand this to cover other cases as I come across them!

@davidhassell
Copy link

Thanks, @ZedThree - I'm not keen on deciding whether or not something is complex just based on its netCDF dimension names, though!

@ZedThree
Copy link
Contributor Author

Well, it must also be length 2 and I could also check for certain attributes, but yes it's possible there would be false positives, so that behaviour will be configurable. The point is really that it can handle that convention as well.

@jswhit
Copy link
Collaborator

jswhit commented Oct 11, 2023

Is this compatible with the h5py complex type, so that hdf5 files with complex data written by h5py would be readable by netcdf4-python and vice versa?

@davidhassell
Copy link

@ZedThree wrote:

... so that behaviour will be configurable ...

That's good. Would the default be to not apply any such convention?

@ZedThree
Copy link
Contributor Author

@jswhit Yes, though only in the next release of netcdf-C (4.9.3, fixed in PR #2655). I'm building wheels of nc-complex here which already support this. Netcdf4-python built using that version of netcdf-C will also be able to read them, though only as a compound type.

If I find time, I may also see about fixing h5py to be compatible with older versions of netcdf-C.

@davidhassell Depends how strong people's feelings are one way or the other :) I would definitely be interested in knowing if there are any false positives, that would make up my mind quite quickly!

@davidhassell
Copy link

Hie @ZedThree, I think the default should not go against what the data model and conventions that defined netCDF, so I would say the default should be to not interpret such (n, 2) variables as complex. However, I would have no problem with the netCDF4-python API offering options to change this.

Any extra attributes to indicate such variables are complex should be part of CF.

@kmuehlbauer
Copy link

Coming from pydata/xarray#8288 I think that handling complex numbers should be a first class citizen of netcdf4-python.

@ZedThree has made great efforts to investigate current implementations in this extensive write-up (https://github.com/PlasmaFAIR/nc-complex/blob/main/README.md) and make them accessible by his netcdf4-python extension.

+1 for considering this for implementation into netcdf4-python.

@jswhit
Copy link
Collaborator

jswhit commented Oct 26, 2023

@ZedThree just to clarify, this feature will require netcdf-c 4.9.3 (or current netcdf-c develop), correct?

@ZedThree
Copy link
Contributor Author

@jswhit Nope, not required, but using current netcdf-c main does enable compatibility with h5netcdf

@jswhit
Copy link
Collaborator

jswhit commented Oct 26, 2023

@ZedThree thanks. I think this would be a great addition (as long as the auto-detection is off by default). Please do submit a PR when you have time.

@ZedThree
Copy link
Contributor Author

Great, thanks! Yep, I'll add a runtime option (probably to Dataset?) to enable auto-detection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants