-
Notifications
You must be signed in to change notification settings - Fork 16
Netcdf files
Caveats of reading and writing netcdf files in different programming languages
-
Problem illustration
Consider writing a 2D array to a netcdf file in
Python
,
from scipy.io import netcdf
import numpy as np
with netcdf.netcdf_file('test.nc','w') as f:
f.createDimension('time',2)
f.createDimension('space',3)
var = f.createVariable('A','d',('time','space')) # create a A variable
# with double precision
var[:] = np.arange(6).reshape((2,3)) # var = array([[0,1,2],[3,4,5]])
However, when you read variable A
in test.nc
from Matlab
,
>> ncdisp('test.nc') % display variables in netcdf file
Source:
test.nc
Format:
classic
Dimensions:
time = 2
space = 3
Variables:
A
Size: 3x2
Dimensions: space,time
Datatype: double
>> ncread('test.nc','A')
ans =
0 3
1 4
2 5
Note the dimensions of A
are switched compared to those in Python
. In fact, A
in Matlab
is the transpose of A
in Python
. Here we just use Python
and Matlab
as an example to demonstrate what could happen when reading and writing netcdf file in different programming languages. More generally, this transposition effect can take place when writing the netcdf file in C or Python
but reading it in Matlab or Fortran
or vice versa.
The fact that the matrix gets transposed could result in some nasty bug, especially when the matrix is a square matrix, in which the transposition cannot be easily identified.
-
Explanation
The inconsistent behavior of dealing with netcdf files across different programming languages is in essence due to different conventions of treating 2D array.
C or Python
uses so-calledrow-majoring
convention whereasMatlab or Fortran
usescolumn-majoring
.Now let's go back to the example above.
When
A
is written totest.nc
inPython
, internally the netcdf file saves a 1D array[0,1,2,3,4,5]
due torow-majoring
convention, which means the 2D array is flatten row by row. Forrow-majoring
, the fastest varying dimension is in the row direction (i.e. dimensionspace
of size 3). The slowest varying dimension istime
, which has a size of 2.When
A
is read fromtest.nc
inMatlab
, sinceMatlab
iscolumn-majoring
, the fastest varying dimension is in the column direction so the 2D array is filled by 1D array column by column with the size of the column (i.e. number of rows) being equal to 3. As a result, the 2D array gets transposed. -
Takeaway
-
When we try to deduce the dimensions of a 2D array in a netcdf file by looking at the source code that writes the array, we need to not only look at the dimensions but also check the consistency of programming languages in terms of majoring convention (i.e.
column-majoring
orrow-majoring
). -
The same caveats apply to arrays with dimensions higher than 2 (i.e. the order of dimensions is exactly the opposite).
-
Reading and writing 1D array across different programming languages is always safe.
-
-
References
- More on column-majoring and row-majoring: http://www.fftw.org/doc/Multi_002ddimensional-Array-Format.html#Multi_002ddimensional-Array-Format
- Stack overflow discussion: http://stackoverflow.com/questions/28275442/netcdf4-python-dimensions-reordered