
count argument in H5Sselect_hyperslab #2296

Merged: 4 commits, Dec 13, 2023
Conversation

@wkliao (Contributor) commented Apr 20, 2022

The 'count' argument in NetCDF is not exactly the same as the 'count' in
H5Sselect_hyperslab(space_id, op, start, stride, count, block).
When the 'stride' argument is NULL, NetCDF's 'count' should be passed as the
'block' argument, for example,

   H5Sselect_hyperslab(space_id, op, start, NULL, ones, count);

where 'ones' is an array of all 1s. Although passing a NULL 'block', as below,

   H5Sselect_hyperslab(space_id, op, start, NULL, count, NULL);

selects the same elements, HDF5 then internally stores the dataspace of the
subarray as a list of single elements instead of one "block", which can
affect performance.
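Not part of the PR: a pure-Python sketch of the documented (start, stride, count, block) hyperslab semantics (not HDF5 itself), just to show that the two calls above select exactly the same elements. The helper name and the 5x3 example shape are illustrative.

```python
from itertools import product

def hyperslab_elements(rank, start, stride=None, count=None, block=None):
    """Enumerate the coordinates selected by an H5Sselect_hyperslab-style
    (start, stride, count, block) tuple. A NULL (None) stride or block
    defaults to all ones, matching HDF5's documented behavior."""
    stride = stride or [1] * rank
    block = block or [1] * rank
    selected = set()
    for idx in product(*(range(c) for c in count)):      # which block
        for off in product(*(range(b) for b in block)):  # offset inside it
            selected.add(tuple(start[d] + idx[d] * stride[d] + off[d]
                               for d in range(rank)))
    return selected

start, count, ones = [2, 3], [5, 3], [1, 1]
old_call = hyperslab_elements(2, start, None, count, None)  # count, NULL
new_call = hyperslab_elements(2, start, None, ones, count)  # ones, count
assert old_call == new_call   # same 15 elements either way
print(len(old_call))          # 15
```

The selections are identical; only HDF5's internal block-list representation differs between the two argument orders.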

@wkliao wkliao requested a review from WardF as a code owner April 20, 2022 05:30
@WardF WardF self-assigned this Apr 21, 2022
@WardF WardF added this to the 4.9.0 milestone Apr 21, 2022
@@ -138,7 +139,7 @@ int dump_hdf_file(const float *data, int docompression)
for (start[1] = 0; start[1] < Y_LEN; start[1]++)
{
 if (H5Sselect_hyperslab(file_spaceid, H5S_SELECT_SET, start, NULL,
-                count, NULL) < 0) ERR_RET;
+                ones, count) < 0) ERR_RET;
@WardF (Member) commented Apr 21, 2022

So we are replacing count, {1,1,128} with ones, {1,1,1}, and passing {1,1,128} as the 6th argument to H5Sselect_hyperslab() instead of NULL. As per the assertion in the conversation, NULL would be treated as {1,1,1} when passed to H5Sselect_hyperslab(). This difference may not matter (I need to refer to the HDF5 API documentation); given that the tests still pass, it's probably a safe bet that it doesn't. However, I just wanted to raise this inconsistency when I noticed it. @wkliao, is this intentional/inconsequential?

@wkliao (Contributor, Author) replied:

> So we are replacing count, {1,1,128} with ones, {1,1,1}, and passing {1,1,128} as the 6th argument to H5Sselect_hyperslab() instead of NULL. [...] @wkliao, is this intentional/inconsequential?

The use of count, NULL in the current NetCDF implementation does not cause any error. My intention in this PR is to point out the correct usage of H5Sselect_hyperslab(). Please refer to the HDF5 reference manual. There is also an example there that helps explain the count and block arguments, quoted below:

> For example, consider a 2-dimensional dataspace with hyperslab
> selection settings as follows: the start offset is specified as [1,1],
> stride is [4,4], count is [3,7], and block is [2,2]. In C, these settings
> will specify a hyperslab consisting of 21 2x2 blocks of array elements
> starting with location (1,1) with the selected blocks at locations
> (1,1), (5,1), (9,1), (1,5), (5,5), etc.;
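As a quick check, not from the PR: the block origins in the manual's example can be computed directly from start + index * stride, reproducing the 21 blocks and the locations it lists.

```python
# Block origins implied by start=[1,1], stride=[4,4], count=[3,7], block=[2,2].
# Each origin is the lower corner of one 2x2 block of selected elements;
# the loop order varies the first coordinate fastest, matching the text.
origins = [(1 + 4 * i, 1 + 4 * j)   # start[d] + index[d] * stride[d]
           for j in range(7)        # count[1] = 7
           for i in range(3)]       # count[0] = 3
assert len(origins) == 21
assert origins[:5] == [(1, 1), (5, 1), (9, 1), (1, 5), (5, 5)]
print(len(origins), origins[:5])
```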

My concern is for performance only. Even though the outcomes are the
same, the internal HDF5 implementation will treat each block as a single
element block if NULL is used. For example, a subarray of size 5x3 will be
stored as 15 blocks of size 1x1. If this PR is used, then the subarray will be
stored as 1 block of size 5x3.

@DennisHeimbigner (Collaborator):

Can you add a test case showing a failure in the existing code that is fixed
with your change?

@DennisHeimbigner (Collaborator):

Ward- here is some additional information:
https://stackoverflow.com/questions/46539084/what-is-the-block-size-in-hdf5

@DennisHeimbigner (Collaborator):

I guess I am dense, but I cannot see the purpose behind the block concept.
Can't we just set block to be a vector of ones?

@edwardhartnett (Contributor):

I actually got pretty confused looking at this PR. I don't understand it at all!

@wkliao (Contributor, Author) commented Apr 21, 2022

I assume you all can see my reply to @WardF 's post.

Just to clarify one of my statements there:

> For example, a subarray of size 5x3 will be stored as 15 blocks of size 1x1.
> If this PR is used, then the subarray will be stored as 1 block of size 5x3.

The "stored" in my example refers to the HDF5 internal data structures
held in memory. It does not go into the file.

If you are curious, try calling H5Sget_select_hyper_nblocks
to verify the number of blocks. In addition, H5Sget_select_hyper_blocklist()
can give you the size of each block.
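Not from the PR: a sketch of that verification using h5py's low-level h5s bindings (assuming h5py is installed), selecting the same 5x3 subarray both ways and printing the internal block count each time.

```python
import h5py

# A 10x10 dataspace; select the same 5x3 subarray at offset (2, 3) two ways.
sid = h5py.h5s.create_simple((10, 10))

# 'block' omitted (the NULL case): shape passed via 'count'
sid.select_hyperslab((2, 3), (5, 3))
nblocks_null_block = sid.get_select_hyper_nblocks()

# count of all ones, with the 5x3 shape passed via 'block'
sid.select_hyperslab((2, 3), (1, 1), block=(5, 3))
nblocks_one_block = sid.get_select_hyper_nblocks()

# Both selections cover the same 15 elements; the printed block counts
# show how HDF5 represents each selection internally.
print(sid.get_select_npoints(), nblocks_null_block, nblocks_one_block)
```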

@DennisHeimbigner (Collaborator):

Do you happen to know the relationship between blocks and chunks?

@wkliao (Contributor, Author) commented Apr 21, 2022

'block' in H5Sselect_hyperslab() is not required to be the same as 'chunk'.

@DennisHeimbigner (Collaborator):

I must confess I do not see why they added the block concept.
As you say it provides some kind of performance improvement,
I assume that is why they did it. But the documentation on it
is abominable, completely opaque.

@wkliao (Contributor, Author) commented Apr 21, 2022

The concept of block in HDF5 is to allow a single H5Dwrite/H5Dread to
write/read multiple subarrays. From the example figure in the stackoverflow
URL you referred to earlier, there are 8 subarrays (blocks). All 8 can be
written/read in a single API call. But in NetCDF, this must be done by 8
calls to nc_put_vara/nc_get_vara. For parallel I/O, HDF5 has an advantage
over NetCDF, when the aggregate access region of all processes is a
contiguous block, such as the entire variable.
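Not from the PR: a small h5py low-level sketch (assuming h5py is installed) of the multi-block idea described above, OR-ing two disjoint 2x2 blocks into one selection that a single H5Dread/H5Dwrite could then service.

```python
import h5py

# Two disjoint 2x2 blocks combined into one selection with H5S_SELECT_OR;
# in netCDF this region would need two nc_put_vara/nc_get_vara calls.
sid = h5py.h5s.create_simple((10, 10))
sid.select_hyperslab((0, 0), (1, 1), block=(2, 2))
sid.select_hyperslab((4, 4), (1, 1), block=(2, 2), op=h5py.h5s.SELECT_OR)

print(sid.get_select_npoints())         # 8 elements in total
print(sid.get_select_hyper_nblocks())   # 2 blocks
```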

@DennisHeimbigner (Collaborator):

Well, I trust https://github.com/wkliao to get this right.
But I wish I had a better understanding of the semantics of blocks.

@DennisHeimbigner (Collaborator):

re: http://davis.lbl.gov/Manuals/HDF5-1.8.7/UG/UG_frame12Dataspaces.html

From this I gather that blocks are used in the actual disk storage. Why on earth
would they expose this when they don't expose chunks. I would have thought that
the library would handle the mapping transparently.

@edwardhartnett (Contributor):

OK, I've studied this a bit more and it makes sense.

@wkliao have you measured any performance difference with these changes?

@wkliao (Contributor, Author) commented Apr 25, 2022

I don't have a performance evaluation for this PR.

@edwardhartnett (Contributor):

@WardF I am doing a bunch of extra testing before your release. It would be great if this change was merged, so it could be included in all my extra testing.

@edwardhartnett (Contributor):

@WardF if this is going to be merged before 4.9.0, could you merge it please, so I can do some extra testing?

@WardF WardF modified the milestones: 4.9.0, 4.9.1 Jun 15, 2022
@wkliao wkliao force-pushed the H5Sselect_hyperslab branch from 4518ae8 to 64d3f2b Compare June 18, 2022 18:04
@WardF WardF modified the milestones: 4.9.1, 4.9.2 Feb 13, 2023
@WardF WardF modified the milestones: 4.9.2, 4.9.3 May 16, 2023
WardF previously approved these changes Dec 12, 2023
@WardF (Member) commented Dec 12, 2023

Reverted a merge that ended up being a bit more trouble than expected, will attempt to merge main back in more carefully.

wkliao and others added 3 commits December 12, 2023 16:45
@WardF WardF merged commit bace524 into Unidata:main Dec 13, 2023
97 checks passed