Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedded nulls not correctly read via DataSet::p_read_fixed_len #3034

Closed
mkitti opened this issue Jun 2, 2023 · 6 comments · Fixed by #3083
Closed

Embedded nulls not correctly read via DataSet::p_read_fixed_len #3034

mkitti opened this issue Jun 2, 2023 · 6 comments · Fixed by #3083
Assignees
Labels
Branch - 1.14 PRs to hdf5_1_14 Component - C++ C++ wrappers Priority - 3. Low 🔽 Code cleanup, small feature change requests, etc. Type - Bug Please report security issues to help@hdfgroup.org instead of creating an issue on GitHub

Comments

@mkitti
Copy link
Contributor

mkitti commented Jun 2, 2023

Describe the bug
Fixed length strings are copied to H5std_string (std::string) as if they were null terminated C strings. This incorrect if the fixed length string contains embedded null bytes.

This is because string& operator= (const char* s) is used on line 732 here:

hdf5/c++/src/H5DataSet.cpp

Lines 711 to 735 in 81bc34a

DataSet::p_read_fixed_len(const hid_t mem_type_id, const hid_t mem_space_id, const hid_t file_space_id,
const hid_t xfer_plist_id, H5std_string &strg) const
{
// Only allocate for fixed-len string.
// Get the size of the dataset's data
size_t data_size = getInMemDataSize();
// If there is data, allocate buffer and read it.
if (data_size > 0) {
// Create buffer for C string
char *strg_C = new char[data_size + 1]();
herr_t ret_value = H5Dread(id, mem_type_id, mem_space_id, file_space_id, xfer_plist_id, strg_C);
if (ret_value < 0) {
delete[] strg_C; // de-allocate for fixed-len string
throw DataSetIException("DataSet::read", "H5Dread failed for fixed length string");
}
// Get string from the C char* and release resource allocated locally
strg = strg_C;
delete[] strg_C;
}
}

Expected behavior
Fixed length strings with embedded null bytes should be correctly read into the std::string.

Line 732 should probably be

strg = H5std_string(strg_C, data_size);

Platform (please complete the following information)

  • HDF5 version 1.14.0
  • macOS 13.3.1
  • Apple clang version 14.0.3 (clang-1403.0.22.14.1)
  • cmake

Additional context
Add any other context about the problem here.

@mkitti
Copy link
Contributor Author

mkitti commented Jun 2, 2023

To illustrate the issue, here's an example C++ program.

#include <string>
#include <iostream>

int main ()
{
    std::string str1, str2;
    const char* str_data = "Test\0string";
    const int str_data_len = 11;

    /* Uses string& operator= (const char* s), truncates at first null byte */
    str1 = str_data;

    /* uses string& operator= (const string& str), embedded null byte is included */
    str2 = std::string(str_data, str_data_len);

    std::cout << "str1: " << str1 << '\n';
    std::cout << "str2: " << str2 << '\n';
    return 0;
}

The output of this is as follows.

str1: Test
str2: Teststring

@hyoklee hyoklee added the Component - C++ C++ wrappers label Jun 2, 2023
@hyoklee hyoklee added the Type - Bug Please report security issues to help@hdfgroup.org instead of creating an issue on GitHub label Jun 2, 2023
@hyoklee
Copy link
Member

hyoklee commented Jun 2, 2023

@mkitti , thank you so much for your report.

@derobins modified the code 2 years ago so I assigned this to him.

@mkitti
Copy link
Contributor Author

mkitti commented Jun 6, 2023

@steven-varga What are your thoughts about this?

@mkitti
Copy link
Contributor Author

mkitti commented Jun 6, 2023

Ultrafeel seems to have recognized the same issue:
c1ad367#r88347289

@steven-varga
Copy link

@mkitti Hi Mark, from the DDL

<string> ::= H5T_STRING {
                 STRSIZE <strsize>;
                 STRPAD <strpad>;
                 CSET <cset>;
                 CTYPE <type>;
             }
<strsize> ::= <int_value>
<strpad> ::= H5T_STR_NULLTERM | H5T_STR_NULLPAD | H5T_STR_SPACEPAD
<cset> ::= H5T_CSET_ASCII | H5T_CSET_UTF8

The DDL resembles to an EBNF variant, and doesn't give a precise definition of H5T_CSET_ASCII and H5T_CSET_UTF8 nor to H5T_STR_NULLTERM The official set for both contain the \0 byte. From here this really goes downhill for me: <ctype> ::= H5T_C_S1 | H5T_FORTRAN_S1 allows both C or Fortran style strings, and Fortran allows \0 so my current interpretation of the above DDL has changed: yes HDF5 allows \0 in some strings, and the inverse mapping should consider if H5T_FORTRAN_S1 set or not.

@mkitti
Copy link
Contributor Author

mkitti commented Jun 8, 2023

Proposed fix:
003935e

#3083

@derobins derobins changed the title [BUG] Embedded nulls not correctly read via DataSet::p_read_fixed_len Embedded nulls not correctly read via DataSet::p_read_fixed_len Jun 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Branch - 1.14 PRs to hdf5_1_14 Component - C++ C++ wrappers Priority - 3. Low 🔽 Code cleanup, small feature change requests, etc. Type - Bug Please report security issues to help@hdfgroup.org instead of creating an issue on GitHub
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants