Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement selection vector I/O with collective chunk filling #3826

Merged
merged 14 commits into from
Nov 16, 2023

Conversation

vchoi-hdfgroup
Copy link
Contributor

@vchoi-hdfgroup vchoi-hdfgroup commented Nov 2, 2023

Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization when computing max address.

Addresses ECP-344

…hunk filling.

Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization
when computing max address.
@vchoi-hdfgroup vchoi-hdfgroup added WIP Work in progress (please also convert PRs to draft PRs) Component - C Library Core C library issues (usually in the src directory) Type - Improvement Improvements that don't add a new feature or functionality labels Nov 2, 2023
src/H5Dchunk.c Outdated
H5MM_xfree(io_addrs);
H5MM_xfree(io_wbufs);
if (!all_same_block_len)
H5MM_xfree(io_sizes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment: I'd prefer to just always call H5MM_xfree here to make the code more robust against future changes. It's a no-op if io_sizes in NULL so this won't cause problems

Copy link
Member

@derobins derobins Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C standard explicitly says that calling free() on NULL pointers is acceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

src/H5FDmpio.c Outdated
if (count > 0)
max_addr = s_addrs[count - 1] + (haddr_t)(s_sizes[count - 1]);
/* Bug: need to account for fixed size optimization */
if (count > 0) {
Copy link
Member

@fortnern fortnern Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things here:

  • If count == 1 this reads off the end of the s_sizes array (maybe just change the condition to count > 1)
  • The sizes can actually end anywhere in the array, not just at index 1. I'd recommend checking the last element to see if it's 0, then scanning from index 1 and up until you see a 0. So maybe something like:
    if (count > 1) {
    uint32_t idx = count - 1;
    if (s_sizes[idx] == 0)
    for (idx = 0; idx < count - 1; idx++)
    if (s_sizes[idx + 1] == 0)
    break;
    max_addr = s_addrs[count - 1] + (haddr_t)(s_sizes[idx])
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the comment shouldn't proclaim it as a bug, just explain what it's doing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually my code is wrong too, I'll try to fix it later tonight. We might need to change the H5FD__mpio_vector_build_types function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think <s_sizes> returned from H5FD__mpio_vector_build_types() can be (1) or (2) as follows:

  1. entries of sizes
  2. Two entries for same size optimization: entry 0 is the size and entry 1 is 0
    Maybe the better way is to change H5FD__mpio_vector_build_types().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be any number of entries between 1 (with a 0 terminator if count > 1) and count

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have time it would be great to add a test for this case. We would have to use the public H5FD interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied the patch from Neil.
I will work on adding a test for this case.

@derobins derobins changed the title Changes for ECP-344: Implement selection vector I/O with collective c… Implement selection vector I/O with collective c… Nov 3, 2023
@derobins derobins changed the title Implement selection vector I/O with collective c… Implement selection vector I/O with collective chunk filling Nov 3, 2023
vchoi-hdfgroup and others added 3 commits November 3, 2023 11:28
For H5Dchunk.c: fix H5MM_xfree()
For H5FDmpio.c:
1) Revert the fix to H5FD__mpio_write_vector()
2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()
@jhendersonHDF
Copy link
Collaborator

jhendersonHDF commented Nov 3, 2023

The code looks fine and is much simpler, but it does remove the part where the old code divided up the chunks among all the MPI ranks. The current approach will have every MPI rank writing all of the chunks out. Some MPI implementations may optimize for this, but I think we ran into the case where this doesn't necessarily happen and so implemented the rank 0 bcast feature. I'd suggest we should either have only rank 0 write out the chunks and all the other ranks should set their I/O sizes to 0 (and maybe set the bufs to NULL and addrs to HADDR_UNDEF) or try to put back the logic of dividing up the work. Ideally the latter since rank 0 only writes may not scale well.

Copy link
Collaborator

@jhendersonHDF jhendersonHDF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment for discussion

@vchoi-hdfgroup
Copy link
Contributor Author

I will work on putting back the logic of dividing up the work like the original H5D__chunk_collective_fill() routine

src/H5Dchunk.c Outdated
@@ -5603,156 +5607,73 @@ H5D__chunk_collective_fill(const H5D_t *dset, H5D_chunk_coll_fill_info_t *chunk_

/* Check if we have any chunks to write on this rank */
if (num_blocks > 0 || (leftover && leftover > mpi_rank)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick with pre-existing code - the first leftover is redundant since mpi_rank will never be less than 0. This could be if (num_blocks > 0 || leftover > mpi_rank) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with this and subsequent comments.


for (i = 0; i < (size_t)blocks; i++) {
size_t idx = i + (size_t)(mpi_rank * blocks);
if (need_sort)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some more comments to the blocks that don't have them, explaining what the code is doing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for this block in particular, add something like this note (I just discussed this with Scot):

Note that we sort all of the chunks here, and not just a subset corresponding to this rank. We do this since we have found MPI I/O to work better when each rank writes blocks that are contiguous in the file, and by sorting the full list we maximize the chance of that happening.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we could potentially be making extra work for ourselves now by sorting here since the MPI I/O VFD is already scanning the chunk list to check for this anyway and we don't necessarily need to impose the monotonically non-decreasing file displacement requirement on other parallel VFDs. That said, since the sort has to happen at some point anyway if the addresses are out of order, I'm not sure if the above note is really needed here.

@@ -556,10 +556,20 @@ verify_chunk_opt_status(size_t num_dsets, test_mode_t test_mode, bool any_io, bo
* elements (because no elements were raw data), which is what happens when performing I/O on a
* filtered dataset with no selection. Vector I/O does report an I/O call was made if passed a raw
* data element of size 0, so this is consistent. */
/* Changes: for allocation, the reported I/O is vector I/O */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call out "changes" in the code comment here. Just explain why we expect the different modes under each condition as the code stands with this PR

VRFY((H5D_SCALAR_IO | H5D_VECTOR_IO) == actual_sel_io_mode_reduced,
"verified actual selection I/O mode was scalar and vector I/O");
else if (unalloc_read)
/* Changes: for allocation, the reported I/O is vector I/O */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Explain that both collective filling/allocation and filtered I/O use vector I/O so only that is reported whether or not allocation happened.

@jhendersonHDF
Copy link
Collaborator

@vchoi-hdfgroup Thanks for putting pack the logic for dividing up the chunks among the ranks. This looks good to me and I don't really have anything to add to @fortnern's comments.

@fortnern
Copy link
Member

Looks good to me.

@lrknox lrknox removed the WIP Work in progress (please also convert PRs to draft PRs) label Nov 16, 2023
@lrknox lrknox merged commit ed31aac into HDFGroup:develop Nov 16, 2023
jhendersonHDF pushed a commit to jhendersonHDF/hdf5 that referenced this pull request Dec 7, 2023
…p#3826)

* Changes for ECP-344: Implement selection vector I/O with collective chunk filling.
Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization
when computing max address.

* Fixes based on PR review comments:
For H5Dchunk.c: fix H5MM_xfree()
For H5FDmpio.c:
1) Revert the fix to H5FD__mpio_write_vector()
2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()

* Put back the logic of dividing up the work among all the mpi ranks similar to the
original H5D__chunk_collective_fill() routine.

* Add a test to verify the fix for the illegal reference problem in H5FD__mpio_write_vector().
lrknox pushed a commit that referenced this pull request Dec 7, 2023
* Preserve MPI-I/O file hints when fapl is closed (#3755)

* Fix for issue #3025: Save the MPI info in the file struct so H5Fget_access_plist() can retrieve it from there.

* Add compression tests for subfiling (#3769)

* Fix typo in comment (#3775)

* Fixed a file handle leak in the core VFD (#3779)

When opening a file with the core VFD and a file image, if the file
already exists, the file check would leak the POSIX file handle.

Fixes GitHub issue #635

* Fix a format string warning in the C++ examples (#3776)

* Cancel running GitHub workflows on push to same PR (#3772)

* Cancel running GitHub workflows on push to same PR

* Remove github.sha from workflow concurrency groups

* Print some messages in parallel tests on MPI rank 0 only (#3785)

Avoids overly verbose output from all processes emitting progress, etc. info.

* Avoid attempted use of NULL pointer in parallel compression code (#3786)

The parallel compression test code tests for the case where all MPI ranks have no selection in a dataset when writing to it. Add an early exit to the code to avoid attempting to use a NULL pointer due to there being no work to do.

* Don't install h5tools_test_utils test program on system (#3793)

* Add Doxygen to H5FDsplitter.h (#3794)

* H5FD_CURR_SPLITTER_VFD_CONFIG_VERSION
* H5FD_SPLITTER_PATH_MAX
* H5FD_SPLITTER_MAGIC
* H5FD_splitter_vfd_config_t
* H5Pset_fapl_splitter()
* H5Pget_fapl_splitter()

* Update Doxygen initializers & identifiers in VFDs (#3795)

* Add Doxygen for all H5FD_<VFD> initializers
* Add Doxygen for all H5FD_<VFD>_VALUE values
* Mark H5FD_<vfd>_init() calls private in Doxygen

* Fix memory corruption in 'MPI I/O FAPL preserve' test (#3806)

* Fix usage of h5_clean_files in t_pflush2.c (#3807)

* Fix parallel driver check in h5_fixname_real (#3808)

* Fix a couple usages of MPI_Info_get (#3809)

* Remove H5system.c warning on Windows oneAPI. (#3812)

* Add processing of NVHPC flags in linux-gnulibc1 file (#3804)

* Disable testing as tests are failing the same as in CMake

* Use the current toolchain for examples as default (#3810)

* Fix misc. warnings from GCC when compiling with -fsanitize=undefined (#3787)

* Set NVHPC maximum optimization level to -O1 for now (#3800)

* Set NVHPC maximum optimization level to -O1 for now

Compiling HDF5 with NVHPC 23.5 - 23.9 results in test failures in
4 different test files that need to be resolved. Since those tests
pass with an optimization level of -O1 (and -O0) and it is currently
unclear whether the test failures are due to issues in HDF5 or issues
in the 'nvc' compiler, set the maximum optimization level for NVHPC
to -O1 until the test failures are resolved.

* Disable nvhpc Java testing in CMake and amend known issues

* Re-enable testing of Autotools nvhpc

* Update some doxygen links to local refs (#3814)

* Rework MPI Info FAPL preserve PR to use VFD 'ctl' operations (#3782)

* Removed the use of C wrappers from H5P APIs. (#3824)

* fix seg fault on frontier/cray

* fix seg fault on frontier/cray

* fix seg fault on frontier/cray

* removed the use of h5pclose_c

* removed the use of h5pclose_c

* Fortran Wrappers H5VLnative_addr_to_token_f and H5VLnative_token_to_address_f (#3801)

* Added H5VLnative_addr_to_token_f and H5VLnative_token_to_address_f

* Added H5VLnative_addr_to_token_f and H5VLnative_token_to_address_f tests

* Create test for H5Pget_dxpl_mpio (#3825)

* Create test and add to testphdf5

* Renamed h5fuse.sh to h5fuse (#3834)

* provide an alternative to mapfile for older bash

* Disable FP exceptions in H5T init code (#3837)

The H5T floating-point datatype initialization code can raise exceptions when handling signaling NaNs. This change disables FE_INVALID exceptions during initialization.

Also removes the -ieee=full change for NAG Fortran as that shouldn't be necessary anymore.

Fixes #3831

* Add intel oneapi windows build to CI CMake (#3836)

* Remove printf format warning on Windows oneAPI. (#3838)

* Correct ENV variables (#3841)

* Remove Autotools sed hack (#3848)

configure.ac contains a sed line that cleans up incorrect library
flags which was added to paper over some bugs in earlier versions
of the Autotools. These issues are not a problem with the current
versions of the Autootols.

The sed line causes problems on MacOS, so it has been removed.

Fixes #3843

* Make filter unregister callbacks safe for VOL connectors (#3629)

* Make filter callbacks use top-level API functions

When using VOL connectors, H5I_iterate may not provide
valid object pointers to its callback. This change keeps
existing functionality in H5Zunregister() without using
potentially unsafe pointers.

* Filter callbacks use internal API

* Skip MPI work on non-native VOL

* Add extra space in comments for consistency (#3852)

* Add extra space in comments for consistency

* uncomment tfloatsattrs test

* Update Actions badges to link to relevant workflow (#3850)

* Add CMake long double cross-compile defaults (#3683)

HDF5 performs a couple of checks at build time to see if long double
values can be converted correctly (IBM's Power architecture uses a
special format for long doubles). These checks were performed using
TRY_RUN, which is a problem when cross-compiling.

These checks now use default values appropriate for most non-Power
systems when cross-compiling. The cache values can be pre-set if
necessary, which will preempt both the TRY_RUN and the default.

Affected values:
    H5_LDOUBLE_TO_LONG_SPECIAL      (default no)
    H5_LONG_TO_LDOUBLE_SPECIAL      (default no)
    H5_LDOUBLE_TO_LLONG_ACCURATE    (default yes)
    H5_LLONG_TO_LDOUBLE_CORRECT     (default yes)
    H5_DISABLE_SOME_LDOUBLE_CONV    (default no)

Fixes GitHub #3585

* Updates for building and testing VOL connectors

* Fix issue with HDF5_VOL_ALLOW_EXTERNAL CMake variable

* Initialize parallel testing with MPI_THREAD_MULTIPLE when testing API

* Add CMake variable to allow specifying a VOL connector's package name

* Remove call to MPI_Init in serial API tests

While previously necessary, it now interferes with VOL connectors that
may need to be initialized with MPI_THREAD_MULTIPLE

* Fixes for CI and presets (#3853)

* Change dest for doxygen (#3856)

* Implement selection vector I/O with collective chunk filling (#3826)

* Changes for ECP-344: Implement selection vector I/O with collective chunk filling.
Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization
when computing max address.

* Fixes based on PR review comments:
For H5Dchunk.c: fix H5MM_xfree()
For H5FDmpio.c:
1) Revert the fix to H5FD__mpio_write_vector()
2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()

* Put back the logic of dividing up the work among all the mpi ranks similar to the
original H5D__chunk_collective_fill() routine.

* Add a test to verify the fix for the illegal reference problem in H5FD__mpio_write_vector().

* Do not publish compression headers or docs (#3865)

* Fix typo: look -> loop (#3866)

* Moved the README to markdown and expanded its overview of the files, file generation, and other Fortran wrapper development practices as mentioned in the HDF5 architectural document. I added a new figure and included the SVG file and the original xfig file it was generated from. (#3862)

* Add HDF5_DISABLE_TESTS_REGEX option to skip tests (#3859)

* Fix typo in error message for `MPI_Type_dup`. (#3867)

* Complete the `if command line option` sentence. (#3868)

* Fix h5dump segmentation fault when --vfd-value option is used (#3873)

* Updated URL in funding.yml (#3882)

Using new shortened URL, might look better.

* Remove unused variable from unmerged changes

* Update src/H5Tinit_float.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - C Library Core C library issues (usually in the src directory) Type - Improvement Improvements that don't add a new feature or functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants