Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi Dataset #2120

Merged
merged 175 commits into from
Oct 19, 2022
Merged

Multi Dataset #2120

merged 175 commits into from
Oct 19, 2022

Conversation

fortnern
Copy link
Member

@fortnern fortnern commented Sep 24, 2022

The big PR for multi dataset changes. A couple notes:

  • There is an apparently unrelated bug in parallel compression that is restricting the number of cases for the parallel mdset test. Jordan will investigate this soon.

Resolved notes:

  • I was able to disable H5D_pre_read/write without breaking it. However, it made the mdset test run substantially slower (shouldn't affect single dataset I/O). Jordan and I are investigating this. If we can't fix it, we will need to decide whether to still remove it and accept the lower performance when using multi dataset in serial. The advantage to still removing these calls is it allows the library to pass the full I/O to selection/vectore enabled VFDs even in serial. Resolved - type conversion buffers were exceeding the free list limits. We decided this issue is really specific to the test and isn't a reason to prevent full use of the multi dataset path, and removed H5D__pre_read/write. I will try out a performance fix for this.
  • There is an unrelated bug in dataspace code that I am investigating that is causing a restriction on the serial mdset test. I am working on this. Resolved - bug fixed and test re-enabled.

fortnern and others added 30 commits September 30, 2021 16:11
Update to new multi dataset API
* Update to new multi dataset Fortran API and tests.
* Sync Fortran with develop.
* skipping h5pget_mpio_actual_io_mode_f for now
@@ -0,0 +1,767 @@
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Copyright by The HDF Group. *
* Copyright by the Board of Trustees of the University of Illinois. *
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is new code, we probably want to avoid the University of Illinois portion here. Otherwise, it's unclear whether/how that portion applies.

src/H5Defl.c Outdated Show resolved Hide resolved
src/H5Dio.c Outdated Show resolved Hide resolved
src/H5Dio.c Outdated Show resolved Hide resolved
src/H5Dio.c Outdated Show resolved Hide resolved
src/H5Dio.c Outdated Show resolved Hide resolved
@@ -2060,21 +2110,44 @@ H5VL__dataset_read(void *obj, const H5VL_class_t *cls, hid_t mem_type_id, hid_t
*-------------------------------------------------------------------------
*/
herr_t
H5VL_dataset_read(const H5VL_object_t *vol_obj, hid_t mem_type_id, hid_t mem_space_id, hid_t file_space_id,
hid_t dxpl_id, void *buf, void **req)
H5VL_dataset_read(size_t count, const H5VL_object_t *vol_obj[], hid_t mem_type_id[], hid_t mem_space_id[],
Copy link
Collaborator

@jhendersonHDF jhendersonHDF Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's only one place remaining in the feature branch where H5VL_dataset_read is used and it could probably be converted to the form that the new H5VL_dataset_read_direct is expecting. It also looks like H5VL_dataset_write is no longer used, unless I missed a spot while searching. Based on that, does it seem worth it to just modify H5VL_dataset_read and H5VL_dataset_write rather than having two different internal H5VL I/O calls for read and write?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left those in for consistency with how other the other VOL callbacks work. We could maybe eliminate one of each function but I don't think the remaining functions should use the same naming convention as the other single underscore H5VL callback functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I don't think this needs to hold up a merge but it's something to think about to lower some of the maintenance overhead.

src/H5Dchunk.c Outdated
HGOTO_ERROR(H5E_DATASPACE, H5E_CANTINSERT, FAIL, "can't insert chunk into skip list")
} /* end if */

/* get chunk file address */
if (H5D__chunk_lookup(di->dset, new_piece_info->scaled, &udata) < 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing these chunk lookups have been moved from somewhere else? Just checking that we aren't adding extra metadata overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just spent a few hours trying to eliminate this. I think we're going to have to accept that a small amount of memory will be temporarily wasted for unallocated chunks (1 pointer per chunk) and skip this lookup here. Caching the result doesn't work because the chunk can be evicted between io_init and read/write. This was in here for the previous algorithm which used a global skip list sorted by address for all the pieces in the I/O. Addresses are no longer needed here since this is now an unsorted array. We'll have to add this in for the parallel code path to eventually get the file address.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call (and similar ones throughout the io_init functions) has been eliminted by commit 37cdba4

Copy link
Collaborator

@jhendersonHDF jhendersonHDF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I've reviewed this PR as thoroughly as I can. I still have a couple of lower order concerns (the conversations left unresolved), but I don't think they're anything that should hold up a merge.

src/H5Dio.c Outdated Show resolved Hide resolved
src/H5Dio.c Outdated Show resolved Hide resolved
@derobins derobins merged commit 93754ca into develop Oct 19, 2022
brtnfld added a commit to brtnfld/hdf5 that referenced this pull request Oct 25, 2022
* Fix bug with cross platform compatibility of references within vlens.
No testing yet.

* Merge from multi_rd_wd_coll_io to a more recent branch from develop.
Untested, probably does not work yet.

* Committing clang-format changes

* Committing clang-format changes

* Fix many bugs in multi dataset branch.  Mostly works, some issues in
SWMR tests.

* Committing clang-format changes

* Disable test in swmr.c that was failing due to bug in HDF5 unrelated to
multi dataset.

* Committing clang-format changes

* Fixed fortran multi-dataset tests

* Fixed xlf errors

* Added benchmark code for multi-datasets

* loops over datasets

* added missing error arg.

* Added gnuplot formatting

* Jonathan Kim original MD benchmarking code

* updated MD benchmarking code

* code clean-up

* Only make files in feature test mode

* misc clean-up

* removed TEST_MDSET_NO_LAST_DSET_2ND_PROC option

* Committing clang-format changes

* Change multi dataset API to use arrays of individual parameters instead
of the parameter struct.

* Committing clang-format changes

* Update to new multi dataset Fortran API and tests. (HDFGroup#1724)

* Update to new multi dataset Fortran API and tests.
* Sync Fortran with develop.
* skipping h5pget_mpio_actual_io_mode_f for now

* Fixed issue with dxpl_id, changed to variable size dim. (HDFGroup#1770)

* Remove "is_coll_broken" field from H5D_io_info_t struct

* Committing clang-format changes

* Minor cleanup in multi dataset code.

* Committing clang-format changes

* Clean up in multi dataset code.

* Committing clang-format changes

* Committing clang-format changes

* Fix speeling

* Fix bug in parallel compression. Switch base_maddr in io_info to be a
union.

* Committing clang-format changes

* Implement selection I/O support with multi dataset.  Will be broken in
parallel until PR 1803 is merged to develop then the MDS branch.

* Committing clang-format changes

* Spelling

* Fix bug in multi dataset that could cause errors when only some of the
datasets in the multi dataset I/O used type conversion.

* Committing clang-format changes

* Integrate multi dataset APIs with VOL layer.  Add async versions of
multi dataset APIs.

* Committing clang-format changes

* Spelling fixes

* Fix bug in non-parallel HDF5 compilation.

* Committing clang-format changes

* Fix potential memory/free list error. Minor performance fix. Other minor
changes.

* Committing clang-format changes

* Fix memory leak with memory dataspace for I/O.

* Committing clang-format changes

* Fix stack variables too large.  Rename H5D_dset_info_t to
H5D_dset_io_info_t.

* Committing clang-format changes

* Remove mem_space_alloc field from H5D_dset_io_info_t.  Each function is
now responsible for freeing any spaces it adds to dset_info.

* Committing clang-format changes

* fixed _multi Fortran declaration

* Refactor various things in (mostly) the serial I/O code path to make
things more maintainable.

* Committing clang-format changes

* updated to array based, doxygen, and examples

* Reinstate H5D_chunk_map_t, stored (via pointer) inside
H5D_dset_io_info_t.

* Change from calloc to malloc for H5D_dset_io_info_t and H5D_chunk_map_t.
Switch temporary dset_infos to be local stack variables.

* Committing clang-format changes

* format cleanup

* format cleanup

* added coll and ind

* Modify all parallel I/O paths to take dset_info instead of assuming
dset_info[0].

* Committing clang-format changes

* fixed output

* Rework parallel I/O code to work properly with multi dataset in more
cases.  Fix bug in parallel compression.

* Committing clang-format changes

* Prevent H5D__multi_chunk_collective_io() from messing up collective opt
property for other datasets in I/O.  Other minor cleanup.  Add new test
case to t_pmulti_dset.c for H5FD_MPIO_INDIVIDUAL_IO, disabled for now
due to failures apparently unrelated to multi dataset code.

* Fix spelling

* Committing clang-format changes

* Replace N log N algorithm for finding chunk in
H5D__multi_chunk_collective_io() with O(N) algorithm, and remove use of
io_info->sel_pieces in that function.

* Committing clang-format changes

* Replace sel_pieces skiplist in io_info with flat array of pointers, use
qsort in I/O routine only when necessary.

* Committing clang-format changes

* Add new test case to mdset.c

* Committing clang-format changes

* Fix spelling

* Very minor fix in H5VL__native_dataset_read()

* Fix bug that could affect filtered parallel multi-dataset I/O.

* Add RM entries for H5Dread_multi(), H5Dread_multi_async(),
H5Dwrite_multi(), and H5Dwrite_multi_async()

* Unskip test in swmr.c

* Committing clang-format changes

* Eliminate H5D__pre_read and H5D__pre_write

* Remove examples/ph5mdsettest.c. Will fix and re-add as a test.

* Enable hyperslab combinations in mdset test

* Committing clang-format changes

* Clarify H5Dread/write_multi documentation.

* Fix bugs in multi-dataset I/O.  Expand serial multi dataset test.
Update macro in parallel multi dataset test.

* Committing clang-format changes

* Spelling

* Remove obsolete entry in bin/trace

* Rework type conversion buffer allocation. Only one buffer is shared
between datasets in mdset mode, and it is malloced instead of calloced.

* Committing clang-format changes

* Fix bug in error handling in H5D__read/write

* added multi-dataset fortran check with optional dataset creation id (HDFGroup#2150)

* removed dup. dll entry

* Address comments from code review.

* Remove spurious changes in H5Fmpi.c

* Fix issue with reading unallocated datasets in multi-dataset mode.
Address other comments from code review.

* Committing clang-format changes

* Delay chunk index lookup from io_init to mdio_init so it doesn't add
overhead to single dataset I/O.

* Committing clang-format changes

* Fix inappropriate use of piece_count

* updated copyright on new file, removed benchmark from testing dir.

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: M. Scot Breitenfeld <brtnfld@hdfgroup.org>
Co-authored-by: Dana Robinson <43805+derobins@users.noreply.github.com>
@derobins derobins deleted the feature/multi_dataset branch April 18, 2023 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants