Questions of netcdf-4 read performance on GFS surface and atmosphere data files #1543
Comments
OK, I have been looking into this a bit. I have been looking at the file gfs.t00z.sfcf024.nc. This file is 1087655544 bytes, or 1.09 GB. It is a netCDF-4 classic model file with 153 vars: 5 coordinate vars and 148 NC_FLOAT data vars. The data vars have a size of 1 x 1536 x 3072 (4718592 values, or 18874368 bytes, ~19 MB per var uncompressed). In a typical data var, the shuffle filter is in use and the deflate level is set to 1.
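One way to confirm those per-variable settings programmatically is with the netCDF-C inquiry calls. This is a minimal sketch, not code from the thread; it assumes only the standard netCDF-C API and the file name given above:

#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

/* Print the shuffle/deflate/chunking settings for every variable in the file. */
int main(void)
{
    int ncid, nvars, v;
    if (nc_open("gfs.t00z.sfcf024.nc", NC_NOWRITE, &ncid)) exit(1);
    if (nc_inq_nvars(ncid, &nvars)) exit(1);
    for (v = 0; v < nvars; v++)
    {
        char name[NC_MAX_NAME + 1];
        int shuffle, deflate, deflate_level, storage;
        size_t chunksizes[NC_MAX_VAR_DIMS];
        if (nc_inq_varname(ncid, v, name)) exit(1);
        if (nc_inq_var_deflate(ncid, v, &shuffle, &deflate, &deflate_level)) exit(1);
        if (nc_inq_var_chunking(ncid, v, &storage, chunksizes)) exit(1);
        printf("%s: shuffle=%d deflate=%d level=%d chunked=%d\n",
               name, shuffle, deflate, deflate_level, storage == NC_CHUNKED);
    }
    nc_close(ncid);
    return 0;
}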
@junwang-noaa in the other discussion you say:
I have looked at the sfcf file, and the data are compressed (deflated). Do I have something incorrect here?
Some quick results from the surface data file:
- Reading the original netCDF-4 compressed file takes 8 s.
- Reading a classic version of the same file (uncompressed) takes 4 s.
- Reading a netCDF-4 uncompressed version of the file takes 1.5 s.
So compression is a significant factor here. However, it brings considerable benefit. Uncompressed, the surface file is 2.87 GB; compressed, it is only 1.09 GB.
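As a rough throughput calculation from those numbers (my arithmetic, assuming the uncompressed netCDF-4 copy is also about 2.9 GB): the uncompressed netCDF-4 read delivers roughly 2.9 GB in 1.5 s, close to 2 GB/s; the classic read delivers 2.87 GB in 4 s, about 0.7 GB/s; and the compressed read delivers the same logical data in 8 s, about 0.36 GB/s, even though only 1.09 GB comes off disk.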
Thanks a lot for the analysis! As you pointed out, it's critical to have compressed data as we are increasing vertical resolution. For the atm file with 3D fields, we use lossy compression, which further reduces the data size, but it also takes even longer to read.
I find no indication of lossy compression in the file. Here's what I am seeing:

float cld_amt(time, pfull, grid_yt, grid_xt) ;
    cld_amt:long_name = "cloud amount" ;
    cld_amt:units = "1" ;
    cld_amt:missing_value = -1.e+10f ;
    cld_amt:_FillValue = -1.e+10f ;
    cld_amt:cell_methods = "time: point" ;
    cld_amt:output_file = "atm" ;
    cld_amt:max_abs_compression_error = 3.057718e-05f ;
    cld_amt:nbits = 14 ;
    cld_amt:_Storage = "chunked" ;
    cld_amt:_ChunkSizes = 1, 22, 308, 615 ;
    cld_amt:_DeflateLevel = 1 ;
    cld_amt:_Endianness = "little" ;
We are using nbits=14 for lossy compression and deflate=1; the data itself is real(4). See the attribute:
cld_amt:nbits = 14
OK, what do you mean by nbits=14 for lossy compression? I see that the deflate level is 1, but the netCDF-4 deflation of vars has nothing to do with nbits... Did you apply some transformation to the data before you wrote it? Each value is currently stored as a deflated 32-bit floating point. Is that what you intend?
We actually did transform the data before writing them. The code was written by Jeff Whitaker using the method from doi:10.5194/gmd-10-413-2017. We are using nbits=14 for the data sets I provided.

elemental real function quantized(dataIn, nbits, dataMin, dataMax)
  integer, intent(in) :: nbits
  real(4), intent(in) :: dataIn, dataMin, dataMax
  real(4) offset, scale_fact
  ! Convert data to 32-bit integers in the range 0 to 2**nbits-1, then cast
  ! back to 32-bit floats (data is then quantized in steps proportional to
  ! 2**nbits, so the last 32-nbits bits of the floating-point representation
  ! should be zero for efficient zlib compression).
  scale_fact = (dataMax - dataMin) / (2**nbits-1); offset = dataMin
  quantized = scale_fact*(nint((dataIn - offset) / scale_fact)) + offset
end function quantized
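As a rough consistency check on those settings (my arithmetic, assuming the cld_amt values span roughly [0, 1], as the units = "1" attribute suggests): with nbits = 14 the quantization step is (dataMax - dataMin)/(2**14 - 1) ≈ 6.1e-05, so the largest rounding error should be about half a step, ≈ 3.05e-05, which is consistent with the cld_amt:max_abs_compression_error = 3.057718e-05f attribute in the listing above.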
OK, thanks for that info. Here's a chart showing the read rate (MB/s) for various combinations of chunksizes, deflation, and shuffle. The thing to note is the gap between the combinations with no deflation (which are all very fast) and the rest. As we see, uncompressing the data is a major factor in read rate.
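For reference, one combination in such a sweep might be written like this. This is a minimal sketch, not the benchmark code used in the thread; the file name "sweep_test.nc", the variable name "tmpsfc", and the chunk shape are my own illustrative choices, while the dimension names and lengths come from the file described above:

#include <stdlib.h>
#include <netcdf.h>

/* Define one float variable with a chosen chunking/compression combination. */
int main(void)
{
    int ncid, dimids[3], varid;
    size_t chunks[3] = {1, 768, 1536};   /* one candidate chunk shape */

    if (nc_create("sweep_test.nc", NC_NETCDF4 | NC_CLOBBER, &ncid)) exit(1);
    if (nc_def_dim(ncid, "time", 1, &dimids[0])) exit(1);
    if (nc_def_dim(ncid, "grid_yt", 1536, &dimids[1])) exit(1);
    if (nc_def_dim(ncid, "grid_xt", 3072, &dimids[2])) exit(1);
    if (nc_def_var(ncid, "tmpsfc", NC_FLOAT, 3, dimids, &varid)) exit(1);

    /* The settings varied in the sweep: chunk sizes, shuffle on/off, deflate level. */
    if (nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks)) exit(1);
    if (nc_def_var_deflate(ncid, varid, 1 /* shuffle */, 1 /* deflate */, 1 /* level */)) exit(1);

    /* ... write the data with nc_put_var_float(), then time nc_get_var_float() on re-read ... */
    if (nc_close(ncid)) exit(1);
    return 0;
}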
Sorry, I just realized that you are working on the surface file. The two files are actually model forecast output (history files); they are not restart files. The surface file uses lossless compression due to the land-sea mask issue, so nbits is not applied. The atmf file is lossy-compressed using the method I just described.
Does the figure below mean the surface file is compressed more with the shuffle filter and can also be read faster?
(Quoting Edward Hartnett:) Does the shuffle filter help when compressing these float data? Yes it does, across a wide range of chunksize choices.
[image: Effect of Shuffle Filter on GFS Surface Restart File]
<https://user-images.githubusercontent.com/38856240/69279962-a3dd0580-0ba2-11ea-8c09-f82380697cd1.png>
I thought that the shuffle filter would make reads slower, but it has the opposite effect:
[image: Read Rate vs Shuffle Filter for GFS Surface Restart]
<https://user-images.githubusercontent.com/38856240/69280819-5f526980-0ba4-11ea-9e68-7d7179dacb78.png>
Yes, the surface file is smaller and faster to read because you have used the shuffle filter. (The shuffle filter reorders the bytes of the values in a chunk so that corresponding bytes are stored together, which typically helps zlib compress float data.) I am going to take a look at the atmosphere file today. So far it looks like netCDF-4 is a good deal faster than netCDF classic, until you turn on compression. There's no free lunch: when you compress the data, it takes extra time to uncompress. However, you should not be seeing the slowdown that you are; it does not take me anything like 5 minutes to read these files. Does the GFS do all output in netCDF-4 compressed now?
I wondered last night whether some buffering might be going on that was affecting the re-read time for these files. So this morning I changed the program to make a copy of the written file and re-read that copy; this defeats any buffering that is going on. But the results are the same. When comparing netCDF-4 and netCDF classic, netCDF-4 is 2 or 3 times faster reading the file. When compression is turned on, netCDF-4 is much slower reading the file, but the file is much smaller, which makes it easier to store and to transfer around. I am continuing to experiment and will keep this issue updated with my results.
We are working on GFSv16, which will be implemented next year; it uses netCDF-4.
May I ask how long it takes you to read the two files? Would you please send us the code you use to read them? If you like, I can send you our code for reading the files, which is taking 5 minutes. Thanks a lot for the analysis.
Yes, please send me your code. Here's code that reads the surface file:

/*
  Copyright 2019, UCAR/Unidata
  See COPYRIGHT file for copying and redistribution conditions.

  This program benchmarks the reading of a GFS restart file in netCDF-4.

  Ed Hartnett 11/19/19
*/
#include <nc_tests.h>
#include <err_macros.h>
#include <time.h>
#include <sys/time.h> /* Extra high precision time info. */
#include <math.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define FILE_NAME "gfs.t00z.sfcf024.nc"
#define MILLION 1000000
#define NDIM3 3
#define DIMLEN_1 3072
#define DIMLEN_2 1536

/* Prototype from tst_utils.c. */
int nc4_timeval_subtract(struct timeval *result, struct timeval *x,
                         struct timeval *y);

int
main(int argc, char **argv)
{
    printf("Benchmarking GFS restart file.\n");
    printf("Reading a GFS restart file...\n");
    {
        int ncid;
        int ndims, nvars, ngatts, unlimdimid;
        char name[NC_MAX_NAME + 1];
        size_t dimlen[NDIM3];
        float *data;
        struct timeval start_time, end_time, diff_time;
        float read_us;
        int d, v;

        /* Start timer. */
        if (gettimeofday(&start_time, NULL)) ERR;

        /* if (nc_set_chunk_cache(DIMLEN_1 * DIMLEN_2, 10, .75)) ERR; */

        /* Open the file. */
        if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
        if (nc_inq(ncid, &ndims, &nvars, &ngatts, &unlimdimid)) ERR;
        printf("ndims %d nvars %d ngatts %d unlimdimid %d\n", ndims, nvars,
               ngatts, unlimdimid);

        /* Check dims. */
        for (d = 0; d < ndims; d++)
        {
            if (nc_inq_dim(ncid, d, name, &dimlen[d])) ERR;
            printf("read dimid %d name %s len %ld\n", d, name, dimlen[d]);
        }

        /* Allocate storage for one timestep of a 3D var. */
        if (!(data = malloc(DIMLEN_1 * DIMLEN_2 * sizeof(float)))) ERR;

        /* Read var data. */
        for (v = 0; v < nvars; v++)
        {
            nc_type xtype;
            int natts;
            int dimids[NDIM3];
            int nvdims;

            if (nc_inq_var(ncid, v, name, &xtype, &nvdims, dimids, &natts)) ERR;

            /* Skip reading the coord vars. */
            if (nvdims != 3 || xtype != NC_FLOAT)
                continue;
            printf("reading var %s xtype %d nvdims %d dimids %d %d %d\n", name,
                   xtype, nvdims, dimids[0], dimids[1], dimids[2]);
            /* if (nc_set_var_chunk_cache(ncid, v, DIMLEN_1 * DIMLEN_2, 10, 0)) ERR; */
            if (nc_get_var_float(ncid, v, data)) ERR;
        }

        /* Free data storage. */
        free(data);

        /* Close the file. */
        if (nc_close(ncid)) ERR;

        /* Stop timer. */
        if (gettimeofday(&end_time, NULL)) ERR;
        if (nc4_timeval_subtract(&diff_time, &end_time, &start_time)) ERR;
        read_us = (int)diff_time.tv_sec + (float)diff_time.tv_usec / MILLION;
        printf("reading took %g seconds.\n", read_us);
    }
    SUMMARIZE_ERR;
    FINAL_RESULTS;
}
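(A note on building it, which the thread does not spell out: nc_tests.h, err_macros.h, and nc4_timeval_subtract() come from the netcdf-c test suite, so the program is presumably compiled inside the netcdf-c source tree; a standalone build would need the ERR/SUMMARIZE_ERR/FINAL_RESULTS macros replaced and a link against the netCDF library.)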
Thanks for sharing the code. Our code is in Fortran; I may convert your code to Fortran to see if it changes the reading speed.
One piece of our code that reads these files is in the POST processing. You can find it on GitHub at:
https://github.com/NOAA-EMC/EMC_post/blob/develop/sorc/ncep_post.fd/INITPOST_GFS_NETCDF.f
Please let us know if you see anything that slows down the reading process. Thanks.
Seems like everything is happening in subroutine read_netcdf_2d_scatter(), but that is not included with the code. Where is it?
Changing to Fortran should not and will not change the speed vs. the C library. Fortran is just a thin wrapper around the C functions. It just changes the order of dimensions and adds 1 where C is 0-based (e.g., in the start indices).
The subroutine read_netcdf_2d_scatter is in INITPOST_NETCDF.f:
https://github.com/NOAA-EMC/EMC_post/blob/develop/sorc/ncep_post.fd/INITPOST_NETCDF.f
It's good to know the Fortran wrapper does not change the speed. Thank you!
Jun
One good test would be to turn off compression. Can you easily do that? If so, then we can see how much of your delays are caused by uncompressing the data.
Seems like you are reading the values, then copying them one by one into another array. Why not just read them to the destination array in one operation?

do l=1,lm
  do j=1,jm
!   jj=jm-j+1
    jj=j
    do i=1,im
      dummy(i,j,l)=dummy2(i,jj,l)
      if(dummy(i,j,l)==spval_netcdf) dummy(i,j,l)=spval
    end do
  end do
end do
end if
Ed,
The destination array is on the decomposed domain. So what we do is read the field into a temporary array, check for undefined values, then scatter it onto the decomposed domain. I agree the dummy array in the code you quoted is not needed. I guess it's there because previous GFS output had a bottom-up vertical structure, which is different from UPP (top-down), which is used by several models.
I am traveling back to DC today. I will run some tests and get back to you next Monday. Thanks.
Jun
Jun, I have not abandoned this, but I need to get my poster together for AGU! ;-) If you are at AGU, I hope you come by and see me. I'll be in the poster section Wednesday morning.
Ed,
Thank you very much for thinking of this issue. I am not going to AGU; I understand you have other commitments, and I really appreciate that you took the time to work on the data sets. We plan to switch to netCDF format for the GFSv16 implementation, and it will benefit all the downstream jobs if we can speed up the reading. I have another question: does netCDF-4 now use HDF5 1.10.2 or a later version, which support parallel writes of compressed data sets? If not, may I ask what the timeline is for that? Thank you!
Jun
Yes, netcdf-c does work fine with HDF5 1.10.5, and you would be well advised to upgrade to that if you have not already (1.10.4 has a parallel I/O bug). We cannot yet write compressed data in parallel, but I hope to get that working early in the next year (after AGU). When you say you plan to switch to netCDF format for GFSv16, what was being used before that? (Or did you mean you are switching from netCDF to netCDF-4?) Did you try writing without compression to see what kind of performance you get?
@junwang-noaa what is your NOAA email? Can you email me at Edward.Hartnett@noaa.gov?
Much progress has been made here, and we are continuing this work in other GitHub projects, so I will close this issue.
We have two sample restart files from the Global Forecast System (GFS) from NOAA. They are slow to read. What's up with that?
See comments and sample files from @junwang-noaa in #1520.