-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
count argument in H5Sselect_hyperslab #2296
Conversation
@@ -138,7 +139,7 @@ int dump_hdf_file(const float *data, int docompression) | |||
for (start[1] = 0; start[1] < Y_LEN; start[1]++) | |||
{ | |||
if (H5Sselect_hyperslab(file_spaceid, H5S_SELECT_SET, start, NULL, | |||
count, NULL) < 0) ERR_RET; | |||
ones, count) < 0) ERR_RET; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we are replacing count, {1,1,128}
with ones, {1,1,1}
, and passing {1,1,128}
as the 6th argument to H5Select_hyperslabs()
instead of NULL
. As per the assertion in the conversation, NULL
would be treated as {1,1,1}
when passed to H5Select_hyperslabs()
. This difference may not matter (I need to go refer to the HDF5 API documentation); given that the tests still pass, it's probably a safe bet that it doesn't matter. However, I just wanted to raise this inconsistency when it was noticed. @wkliao, is this intentional/inconsequential?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we are replacing count,
{1,1,128}
with ones,{1,1,1}
, and passing{1,1,128}
as the 6th argument toH5Select_hyperslabs()
instead ofNULL
. As per the assertion in the conversation,NULL
would be treated as{1,1,1}
when passed toH5Select_hyperslabs()
. This difference may not matter (I need to go refer to the HDF5 API documentation); given that the tests still pass, it's probably a safe bet that it doesn't matter. However, I just wanted to raise this inconsistency when it was noticed. @wkliao, is this intentional/inconsequential?
The use of count, NULL
in the NetCDF current implementation does not cause any error. My intension of this PR is to point out the correct usage of H5Sselect_hyperslabs()
. Please refer to the HDF5 reference manual. There is also an example that helps understand arguments count and block, quoted below
For example, consider a 2-dimensional dataspace with hyperslab
selection settings as follows: the start offset is specified as [1,1],
stride is [4,4], count is [3,7], and block is [2,2]. In C, these settings
will specify a hyperslab consisting of 21 2x2 blocks of array elements
starting with location (1,1) with the selected blocks at locations
(1,1), (5,1), (9,1), (1,5), (5,5), etc.;
My concern is for performance only. Even though the outcomes are the
same, the internal HDF5 implementation will treat each block as a single
element block if NULL is used. For example, a subarray of size 5x3 will be
stored as 15 blocks of size 1x1. If this PR is used, then the subarray will be
stored as 1 block of size 5x3.
Can you add a test case showing a failure in the existing code that is fixed |
Ward- here is some additional information: |
I guess I am dense, but I cannot see the purpose behind the block concept. |
I actually got pretty confused looking at this PR. I don't understand it at all! |
I assume you all can see my reply to @WardF 's post. Just to clarify one of my statement there.
The "stored" in my example is referring to the HDF5 internal data structures If you are curious, try calling |
Do you happen to know the relationship between blocks and chunks? |
'block' in H5Select_hyperslabs() is not required to be the same as 'chunk'. |
I must confess I do not see why they added the block concept. |
The concept of block in HDF5 is to allow a single H5Dwrite/H5Dread to |
Well,, I trust https://github.com/wkliao to get this right. |
re: http://davis.lbl.gov/Manuals/HDF5-1.8.7/UG/UG_frame12Dataspaces.html From this I gather that blocks are used in the actual disk storage. Why on earth |
OK, I've studied this a bit more and it make sense. @wkliao have you measured any performance difference with these changes? |
I don't have a performance evaluation for this PR. |
@WardF I am doing a bunch of extra testing before your release. It would be great if this change was merged, so it could be included in all my extra testing. |
@WardF if this is going to be merged before 4.9.0, could you merge it please, so I can do some extra testing? |
Argument 'count' in NetCDF is not exactly the same as the 'count' in H5Sselect_hyperslabs(space_id, op, start, stride, count, block). When the argument 'stride' is NULL, NetCDF's 'count' should be used in argument 'block', for example, H5Sselect_hyperslabs(space_id, op, start, NULL, ones, count); where 'one' is an array of all 1s. Although using NULL 'block' below H5Sselect_hyperslabs(space_id, op, start, NULL, count, NULL); has the same effect, HDF5 internally stores the space of a subarray as a list of single elements, instead of a "block", which can affect the performance.
4518ae8
to
64d3f2b
Compare
1947e4a
to
64d3f2b
Compare
Reverted a merge that ended up being a bit more trouble than expected, will attempt to merge |
Argument 'count' in NetCDF is not exactly the same as the 'count' in H5Sselect_hyperslabs(space_id, op, start, stride, count, block). When the argument 'stride' is NULL, NetCDF's 'count' should be used in argument 'block', for example, H5Sselect_hyperslabs(space_id, op, start, NULL, ones, count); where 'one' is an array of all 1s. Although using NULL 'block' below H5Sselect_hyperslabs(space_id, op, start, NULL, count, NULL); has the same effect, HDF5 internally stores the space of a subarray as a list of single elements, instead of a "block", which can affect the performance.
… H5Sselect_hyperslab
Argument 'count' in NetCDF is not exactly the same as the 'count' in
H5Sselect_hyperslabs(space_id, op, start, stride, count, block).
When the argument 'stride' is NULL, NetCDF's 'count' should be used in
argument 'block', for example,
where 'one' is an array of all 1s. Although using NULL 'block' below
has the same effect, HDF5 internally stores the space of a subarray as a
list of single elements, instead of a "block", which can affect the
performance.