Netcdf files

Caveats of reading and writing netcdf files in different programming languages

Problem illustration

Consider writing a 2D array to a netcdf file in Python,

from scipy.io import netcdf
import numpy as np
with netcdf.netcdf_file('test.nc','w') as f:
   f.createDimension('time',2)
   f.createDimension('space',3)
   var = f.createVariable('A','d',('time','space')) # create a A variable 
                                                    # with double precision
   var[:] = np.arange(6).reshape((2,3)) # var = array([[0,1,2],[3,4,5]])

However, when you read variable A in test.nc from Matlab,

>> ncdisp('test.nc') % display variables in netcdf file
Source:
           test.nc
Format:
           classic
Dimensions:
           time  = 2
           space = 3
Variables:
    A
           Size:       3x2
           Dimensions: space,time
           Datatype:   double
>> ncread('test.nc','A')
ans =

     0     3
     1     4
     2     5

Note the dimensions of A are switched compared to those in Python. In fact, A in Matlab is the transpose of A in Python. Here we just use Python and Matlab as an example to demonstrate what could happen when reading and writing netcdf file in different programming languages. More generally, this transposition effect can take place when writing the netcdf file in C or Python but reading it in Matlab or Fortran or vice versa.

The fact that the matrix gets transposed could result in some nasty bug, especially when the matrix is a square matrix, in which the transposition cannot be easily identified.

Explanation

The inconsistent behavior of dealing with netcdf files across different programming languages is in essence due to different conventions of treating 2D array. C or Python uses so-called row-majoring convention whereas Matlab or Fortran uses column-majoring.

Now let's go back to the example above.

When A is written to test.nc in Python, internally the netcdf file saves a 1D array [0,1,2,3,4,5] due to row-majoring convention, which means the 2D array is flatten row by row. For row-majoring, the fastest varying dimension is in the row direction (i.e. dimension space of size 3). The slowest varying dimension is time, which has a size of 2.

When A is read from test.nc in Matlab, since Matlab is column-majoring, the fastest varying dimension is in the column direction so the 2D array is filled by 1D array column by column with the size of the column (i.e. number of rows) being equal to 3. As a result, the 2D array gets transposed.
Takeaway
- When we try to deduce the dimensions of a 2D array in a netcdf file by looking at the source code that writes the array, we need to not only look at the dimensions but also check the consistency of programming languages in terms of majoring convention (i.e. column-majoring or row-majoring).
- The same caveats apply to arrays with dimensions higher than 2 (i.e. the order of dimensions is exactly the opposite).
- Reading and writing 1D array across different programming languages is always safe.
References
- More on column-majoring and row-majoring: http://www.fftw.org/doc/Multi_002ddimensional-Array-Format.html#Multi_002ddimensional-Array-Format
- Stack overflow discussion: http://stackoverflow.com/questions/28275442/netcdf4-python-dimensions-reordered

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Netcdf files

Clone this wiki locally