Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Fix String Handling #522

Closed
wants to merge 1 commit into from
Closed

[C++] Fix String Handling #522

wants to merge 1 commit into from

Conversation

markaylett
Copy link

This patch fixes several string handling issues in the current C++ code-generator.
The following Gist describes these issues in detail:

https://gist.github.com/marayl/729a6305335e547acc927a0428d77e25

See related issue #517.

indent + " std::uint64_t minlen = std::min<std::uint64_t>(length, %3$d);\n" +
indent + " const char* src = m_buffer + m_offset + %2$d;\n" +
indent + " const char* end = sbe_util::stpncpy(dst, src, minlen);\n" +
indent + " if (length > 6)\n" +

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the hard coded value of 6 here intentional? If the intention is to null terminate dst when possible, I would expect that the compare would be against minlen.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right. Well spotted! Pushing update now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to help. Getting this accepted would help overcome the issues described your Gist.

This patch fixes several issues with string handling in the current C++ code-generator.
The following Gist describes these issues in detail:

https://gist.github.com/marayl/729a6305335e547acc927a0428d77e25
@markaylett
Copy link
Author

This PR has been open for some time now. Is there anything I can do to help move this forward?

@tmontgomery
Copy link
Contributor

Sorry for the lack of feedback and delay. Just busy.

The std::string "get" changes take some liberties that are not desirable. Specifically, assuming the string length is not the actual length and assuming ASCII encoding breaks some use cases.

Also, the utility functions are not needed because those semantics are not desired.

What was discussed was adding a single method to check length copied. Would rather see that instead.

@markaylett
Copy link
Author

Thanks for getting back to me. I understand that you're busy. My main concern was that the PR would be closed due to lack of activity.

Both std::string and std::str[n]len deal with byte strings only; they make no assumptions regarding the character encoding. Some std::string functions assume a null-terminator, such as the const char* constructor, but that's about it. Take a look a std::char_traits<char> if you are unconvinced.

If std::string byte strings break some use-cases, then I'd be happy to review those use-cases along with the desired semantics if you can provide them.

It's probably worth taking a step back at this point to consider two of the main issues that I am trying to address:

  1. the "put" function may result in a read overrun;
  2. the "put" function does not zero pad the buffer.

Do you accept that these issues need addressing?

If my particular solution does not work for you, then I can always file separate bug reports instead, although the test-cases in my PR should still be useful for verification purposes.

@tmontgomery
Copy link
Contributor

Several changes made for #506 and #530 have probably invalidated this PR.

Would still rather see a single method added instead of any changes to existing methods.

The way your functions work is not how some usages of std::string work to hold arbitrary data. It breaks some users current usage. I can not provide details as I don't know them. Only that I have been told they do. I suspect that things such as std::string(const CharT*, size_t), which can hold arbitrary data according to constructor (4) at http://en.cppreference.com/w/cpp/string/basic_string/basic_string is probably in use.

Personally, the schema specifies a fixed length array and the app is responsible for providing it. So, read overrun and no zero padding is the apps responsibility. But I am willing to add methods to aid the app as long as it doesn't break behavior that is depended upon.

@markaylett
Copy link
Author

I don't think we're going to reach a consensus on this one, so I'll continue with my local work-arounds for now. I may take a fresh look in the new year when I have more time.

@markaylett markaylett closed this Dec 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants