Fix an issue where compound datatype member IDs can be leaked during conversion (#4459) (#4484)

* Fix an issue where compound datatype member IDs can be leaked during conversion (#4459)

Also fixes issues with handling of partially initialized datatypes during library shutdown

* Fix dead documentation links in 4 .md files.
jhendersonHDF authored May 14, 2024
1 parent f0ecc8b commit 56da09d
Showing 8 changed files with 274 additions and 40 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -122,7 +122,7 @@ Please make sure that you check the items applicable to your pull request:
* [ ] If changes were done to Autotools build, were they added to CMake and vice versa?
* [ ] Is the pull request applicable to any other branches? If yes, which ones? Please document it in the GitHub issue.
* [ ] Is the new code sufficiently documented for future maintenance?
-* [ ] Does the new feature require a change to an existing API? See "API Compatibility Macros" document (https://docs.hdfgroup.org/hdf5/v1_14_4/api-compat-macros.html)
+* [ ] Does the new feature require a change to an existing API? See "API Compatibility Macros" document (https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/api-compat-macros.html)
* Documentation
* [ ] Was the change described in the release_docs/RELEASE.txt file?
* [ ] Was the new function documented in the corresponding public header file using [Doxygen](https://hdfgroup.github.io/hdf5/v1_14_4/_r_m_t.html)?
6 changes: 3 additions & 3 deletions README.md
@@ -21,15 +21,15 @@ DOCUMENTATION
-------------
This release is fully functional for the API described in the documentation.

-https://docs.hdfgroup.org/hdf5/v1_14_4/_l_b_a_p_i.html
+https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/_l_b_a_p_i.html

Full Documentation and Programming Resources for this release can be found at

-https://docs.hdfgroup.org/hdf5/v1_14_4/index.html
+https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/index.html

The latest doxygen documentation generated on changes to HDF5 1.14.x is available at:

-https://hdfgroup.github.io/hdf5/
+https://hdfgroup.github.io/hdf5/v1_14

See the [RELEASE.txt](release_docs/RELEASE.txt) file in the [release_docs/](release_docs/) directory for information specific
to the features and updates included in this release of the library.
28 changes: 14 additions & 14 deletions doc/parallel-compression.md
@@ -79,7 +79,7 @@ participate in the collective write call.
## Multi-dataset I/O support

The parallel compression feature is supported when using the
-multi-dataset I/O API routines ([H5Dwrite_multi](https://hdfgroup.github.io/hdf5/group___h5_d.html#gaf6213bf3a876c1741810037ff2bb85d8)/[H5Dread_multi](https://hdfgroup.github.io/hdf5/group___h5_d.html#ga8eb1c838aff79a17de385d0707709915)), but the
+multi-dataset I/O API routines ([H5Dwrite_multi](https://hdfgroup.github.io/hdf5/v1_14_4/group___h5_d.html#gaf6213bf3a876c1741810037ff2bb85d8)/[H5Dread_multi](https://hdfgroup.github.io/hdf5/v1_14_4/group___h5_d.html#ga8eb1c838aff79a17de385d0707709915)), but the
following should be kept in mind:

- Parallel writes to filtered datasets **must** still be collective,
@@ -99,7 +99,7 @@ following should be kept in mind:

## Incremental file space allocation support

-HDF5's [file space allocation time](https://hdfgroup.github.io/hdf5/group___d_c_p_l.html#ga85faefca58387bba409b65c470d7d851)
+HDF5's [file space allocation time](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_c_p_l.html#ga85faefca58387bba409b65c470d7d851)
is a dataset creation property that can have significant effects
on application performance, especially if the application uses
parallel HDF5. In a serial HDF5 application, the default file space
@@ -118,15 +118,15 @@ While this strategy has worked in the past, it has some noticeable
drawbacks. For one, the larger the chunked dataset being created,
the more noticeable overhead there will be during dataset creation
as all of the data chunks are being allocated in the HDF5 file.
-Further, these data chunks will, by default, be [filled](https://hdfgroup.github.io/hdf5/group___d_c_p_l.html#ga4335bb45b35386daa837b4ff1b9cd4a4)
+Further, these data chunks will, by default, be [filled](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_c_p_l.html#ga4335bb45b35386daa837b4ff1b9cd4a4)
with HDF5's default fill data value, leading to extraordinary
dataset creation overhead and resulting in pre-filling large
portions of a dataset that the application might have been planning
to overwrite anyway. Even worse, there will be more initial overhead
from compressing that fill data before writing it out, only to have
it read back in, unfiltered and modified the first time a chunk is
written to. In the past, it was typically suggested that parallel
-HDF5 applications should use [H5Pset_fill_time](https://hdfgroup.github.io/hdf5/group___d_c_p_l.html#ga6bd822266b31f86551a9a1d79601b6a2)
+HDF5 applications should use [H5Pset_fill_time](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_c_p_l.html#ga6bd822266b31f86551a9a1d79601b6a2)
with a value of `H5D_FILL_TIME_NEVER` in order to disable writing of
the fill value to dataset chunks, but this isn't ideal if the
application actually wishes to make use of fill values.
@@ -220,14 +220,14 @@ chunks to end up at addresses in the file that do not align
well with the underlying file system, possibly leading to
poor performance. As an example, Lustre performance is generally
good when writes are aligned with the chosen stripe size.
-The HDF5 application can use [H5Pset_alignment](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
+The HDF5 application can use [H5Pset_alignment](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
to have a bit more control over where objects in the HDF5
file end up. However, do note that setting the alignment
of objects generally wastes space in the file and has the
potential to dramatically increase its resulting size, so
caution should be used when choosing the alignment parameters.

-[H5Pset_alignment](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
+[H5Pset_alignment](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
has two parameters that control the alignment of objects in
the HDF5 file, the "threshold" value and the alignment
value. The threshold value specifies that any object greater
@@ -264,19 +264,19 @@ in a file, this can create significant amounts of free space
in the file over its lifetime and eventually cause performance
issues.

-An HDF5 application can use [H5Pset_file_space_strategy](https://hdfgroup.github.io/hdf5/group___f_c_p_l.html#ga167ff65f392ca3b7f1933b1cee1b9f70)
+An HDF5 application can use [H5Pset_file_space_strategy](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_c_p_l.html#ga167ff65f392ca3b7f1933b1cee1b9f70)
with a value of `H5F_FSPACE_STRATEGY_PAGE` to enable the paged
aggregation feature, which can accumulate metadata and raw
data for dataset data chunks into well-aligned, configurably
sized "pages" for better performance. However, note that using
the paged aggregation feature will cause any setting from
-[H5Pset_alignment](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
+[H5Pset_alignment](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
to be ignored. While an application should be able to get
-comparable performance effects by [setting the size of these pages](https://hdfgroup.github.io/hdf5/group___f_c_p_l.html#gad012d7f3c2f1e1999eb1770aae3a4963) to be equal to the value that
-would have been set for [H5Pset_alignment](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a),
+comparable performance effects by [setting the size of these pages](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_c_p_l.html#gad012d7f3c2f1e1999eb1770aae3a4963) to be equal to the value that
+would have been set for [H5Pset_alignment](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a),
this may not necessarily be the case and should be studied.

-Note that [H5Pset_file_space_strategy](https://hdfgroup.github.io/hdf5/group___f_c_p_l.html#ga167ff65f392ca3b7f1933b1cee1b9f70)
+Note that [H5Pset_file_space_strategy](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_c_p_l.html#ga167ff65f392ca3b7f1933b1cee1b9f70)
has a `persist` parameter. This determines whether or not the
file free space manager should include extra metadata in the
HDF5 file about free space sections in the file. If this
@@ -300,12 +300,12 @@ hid_t file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, fcpl_id, fapl_id);

While the parallel compression feature requires that the HDF5
application set and maintain collective I/O at the application
-interface level (via [H5Pset_dxpl_mpio](https://hdfgroup.github.io/hdf5/group___d_x_p_l.html#ga001a22b64f60b815abf5de8b4776f09e)),
+interface level (via [H5Pset_dxpl_mpio](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_x_p_l.html#ga001a22b64f60b815abf5de8b4776f09e)),
it does not require that the actual MPI I/O that occurs at
the lowest layers of HDF5 be collective; independent I/O may
perform better depending on the application I/O patterns and
parallel file system performance, among other factors. The
-application may use [H5Pset_dxpl_mpio_collective_opt](https://hdfgroup.github.io/hdf5/group___d_x_p_l.html#gacb30d14d1791ec7ff9ee73aa148a51a3)
+application may use [H5Pset_dxpl_mpio_collective_opt](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_x_p_l.html#gacb30d14d1791ec7ff9ee73aa148a51a3)
to control this setting and see which I/O method provides the
best performance.

Expand All @@ -318,7 +318,7 @@ H5Dwrite(..., dxpl_id, ...);

### Runtime HDF5 Library version

-An HDF5 application can use the [H5Pset_libver_bounds](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gacbe1724e7f70cd17ed687417a1d2a910)
+An HDF5 application can use the [H5Pset_libver_bounds](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gacbe1724e7f70cd17ed687417a1d2a910)
routine to set the upper and lower bounds on library versions
to use when creating HDF5 objects. For parallel compression
specifically, setting the library version to the latest available
17 changes: 17 additions & 0 deletions release_docs/RELEASE.txt
@@ -499,6 +499,23 @@ Bug Fixes since HDF5-1.14.3 release

Library
-------
+    - Fixed a leak of datatype IDs created internally during datatype conversion
+
+      Fixed an issue where the library could leak IDs that it creates internally
+      for compound datatype members during datatype conversion. When the library's
+      table of datatype conversion functions is modified (such as when a new
+      conversion function is registered with the library from within an application),
+      the compound datatype conversion function has to recalculate data that it
+      has cached. When recalculating that data, the library was registering new
+      IDs for each of the members of the source and destination compound datatypes
+      involved in the conversion process and was overwriting the old cached IDs
+      without first closing them. This would result in use-after-free issues due
+      to multiple IDs pointing to the same internal H5T_t structure, as well as
+      crashes due to the library not gracefully handling partially initialized or
+      partially freed datatypes on library termination.
+
+      Fixes h5py GitHub #2419

- Fixed many (future) CVE issues

A partner organization corrected many potential security issues, which
13 changes: 5 additions & 8 deletions src/H5T.c
@@ -1643,12 +1643,11 @@ H5T__unlock_cb(void *_dt, hid_t H5_ATTR_UNUSED id, void *_udata)
     FUNC_ENTER_PACKAGE_NOERR
 
     assert(dt);
-    assert(dt->shared);
 
-    if (H5T_STATE_IMMUTABLE == dt->shared->state) {
+    if (dt->shared && (H5T_STATE_IMMUTABLE == dt->shared->state)) {
         dt->shared->state = H5T_STATE_RDONLY;
         (*n)++;
-    } /* end if */
+    }
 
     FUNC_LEAVE_NOAPI(SUCCEED)
 } /* end H5T__unlock_cb() */
@@ -1862,7 +1861,6 @@ H5T__close_cb(H5T_t *dt, void **request)

     /* Sanity check */
     assert(dt);
-    assert(dt->shared);
 
     /* If this datatype is VOL-managed (i.e.: has a VOL object),
      * close it through the VOL connector.
@@ -4162,10 +4160,10 @@ H5T_close_real(H5T_t *dt)
     FUNC_ENTER_NOAPI(FAIL)
 
     /* Sanity check */
-    assert(dt && dt->shared);
+    assert(dt);
 
     /* Clean up resources, depending on shared state */
-    if (dt->shared->state != H5T_STATE_OPEN) {
+    if (dt->shared && (dt->shared->state != H5T_STATE_OPEN)) {
         if (H5T__free(dt) < 0)
             HGOTO_ERROR(H5E_DATATYPE, H5E_CANTFREE, FAIL, "unable to free datatype");
 
@@ -4202,10 +4200,9 @@ H5T_close(H5T_t *dt)

     /* Sanity check */
     assert(dt);
-    assert(dt->shared);
 
     /* Named datatype cleanups */
-    if (dt->shared->state == H5T_STATE_OPEN) {
+    if (dt->shared && (dt->shared->state == H5T_STATE_OPEN)) {
         /* Decrement refcount count on open named datatype */
         dt->shared->fo_count--;
 
35 changes: 23 additions & 12 deletions src/H5Tconv.c
@@ -2241,21 +2241,32 @@ H5T__conv_struct_init(const H5T_t *src, const H5T_t *dst, H5T_cdata_t *cdata, co
             if (need_ids) {
                 hid_t tid;
 
-                if ((tid = H5I_register(H5I_DATATYPE, priv->src_memb[i], false)) < 0) {
-                    H5T__conv_struct_free(priv);
-                    cdata->priv = NULL;
-                    HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
-                                "can't register ID for source compound member datatype");
+                /* Only register new IDs for the source and destination member datatypes
+                 * if IDs weren't already registered for them. If the cached conversion
+                 * information has to be recalculated (in the case where the library's
+                 * table of conversion functions is modified), the same IDs can be reused
+                 * since the only information that needs to be recalculated is the conversion
+                 * paths used.
+                 */
+                if (priv->src_memb_id[i] == H5I_INVALID_HID) {
+                    if ((tid = H5I_register(H5I_DATATYPE, priv->src_memb[i], false)) < 0) {
+                        H5T__conv_struct_free(priv);
+                        cdata->priv = NULL;
+                        HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
+                                    "can't register ID for source compound member datatype");
+                    }
+                    priv->src_memb_id[i] = tid;
                 }
-                priv->src_memb_id[i] = tid;
 
-                if ((tid = H5I_register(H5I_DATATYPE, priv->dst_memb[src2dst[i]], false)) < 0) {
-                    H5T__conv_struct_free(priv);
-                    cdata->priv = NULL;
-                    HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
-                                "can't register ID for destination compound member datatype");
+                if (priv->dst_memb_id[src2dst[i]] == H5I_INVALID_HID) {
+                    if ((tid = H5I_register(H5I_DATATYPE, priv->dst_memb[src2dst[i]], false)) < 0) {
+                        H5T__conv_struct_free(priv);
+                        cdata->priv = NULL;
+                        HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
+                                    "can't register ID for destination compound member datatype");
+                    }
+                    priv->dst_memb_id[src2dst[i]] = tid;
                 }
-                priv->dst_memb_id[src2dst[i]] = tid;
             } /* end if */
         } /* end for */
Expand Down
4 changes: 2 additions & 2 deletions test/API/README.md
@@ -1,7 +1,7 @@
# HDF5 API Tests

This directory contains several test applications that exercise HDF5's
-public API and serve as regression tests for HDF5 [VOL Connectors](https://docs.hdfgroup.org/hdf5/v1_14_4/_h5_v_l__u_g.html).
+public API and serve as regression tests for HDF5 [VOL Connectors](https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/_h5_v_l__u_g.html).

## Build Process and options

@@ -42,7 +42,7 @@ Currently unsupported

These API tests currently only support usage with the native HDF5 VOL connector and HDF5 VOL
connectors that can be loaded dynamically as a plugin. For information on how to build a VOL
-connector in this manner, refer to section 2.3 of the [HDF5 VOL Connector Author Guide](https://docs.hdfgroup.org/hdf5/v1_14_4/_v_o_l__connector.html).
+connector in this manner, refer to section 2.3 of the [HDF5 VOL Connector Author Guide](https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/_v_o_l__connector.html).

TODO: section on building VOL connectors alongside HDF5 for use with tests
