-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request] Better handling of null terminator in strings #2678
Comments
It seems to me, that representing String via pointer and size could be much more reliable and performant |
We should clearly document the rationale for our choice, regardless of what the final decision may be. |
For references to prior work, we can look at Rust and C++. Rust treats c-style strings as a separate type: CString C++ also recommends this in its core guidelines: F.25: Use a SL.str.3: Use See zstring in the Guidelines Support Library for more details. |
Whether or not I think it's fine for people who are implementing That said, I have no opinions on whether or not |
[External] [stdlib] Add the `normalize_index` function This PR is some kind of mix between #2386 and #2384 which have issues (aborting) or have too many conflicts because the PR is too big. This PR solves part of #2251 and #2337 We try here to give the ground work for indexing correctly. This function added can then be used wherever we work with sequences. Two things I noticed during development: 1) The `debug_assert` does not run in unit tests. Is there any way to enable it? We currently have out-of-bounds bugs in our test suite. 2) The null terminator is causing pain, again, again, and again. Do we have any plans to make it optional when working with String? I opened #2678 to discuss this. To avoid to fix those issues in this PR, I used the `normalize_index` on the `__refitem__` of `InlineArray` which doesn't have widespread use yet and isn't impacted by out-of-bounds bugs. My recommendation would be to merge this PR then to rebase #2386 and #2384 on it. We should also afterwards fix the out of bounds bugs that can be triggered in the test suite by enabling debug_assert. The diff might seem big, but no worries, it's mostly the licenses and docstrings :) Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2677 MODULAR_ORIG_COMMIT_REV_ID: 66e7121a6a333c16284eb33a89eb85c034c296c3
[External] [stdlib] Add the `normalize_index` function This PR is some kind of mix between #2386 and #2384 which have issues (aborting) or have too many conflicts because the PR is too big. This PR solves part of #2251 and #2337 We try here to give the ground work for indexing correctly. This function added can then be used wherever we work with sequences. Two things I noticed during development: 1) The `debug_assert` does not run in unit tests. Is there any way to enable it? We currently have out-of-bounds bugs in our test suite. 2) The null terminator is causing pain, again, again, and again. Do we have any plans to make it optional when working with String? I opened #2678 to discuss this. To avoid to fix those issues in this PR, I used the `normalize_index` on the `__refitem__` of `InlineArray` which doesn't have widespread use yet and isn't impacted by out-of-bounds bugs. My recommendation would be to merge this PR then to rebase #2386 and #2384 on it. We should also afterwards fix the out of bounds bugs that can be triggered in the test suite by enabling debug_assert. The diff might seem big, but no worries, it's mostly the licenses and docstrings :) Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2677 MODULAR_ORIG_COMMIT_REV_ID: 66e7121a6a333c16284eb33a89eb85c034c296c3
Review Mojo's priorities
What is your request?
I would like a discussion to answer the following questions:
String
aligned with the goals of Mojo?String
less painful concerning the null terminator? And makeString
more safe?What is your motivation for this change?
Once again, I have hit the issue of the null terminator when working with String (I was creating the String from an empty list instead of a list with one zero element).
I cannot count the number of bugs we have had so far because of the null terminator.
The ones I can gather so far:
str(String())
; printing corrupts future prints ofString()
#2392@always_inline
changes the behavior of the code #2353test_file.mojo
#2283 (comment)and I'm sure there are more.
Some might say "skill issue", then it seems we're all "unskilled" 😆
I don't have enough knowledge to say if it's absolutely necessary or not. But I'm sure that IF it is necessary to have the null terminator in string, we can make it a bit more fool-proof. I believe it also closely relates with the goals of Mojo about memory safety as the null terminator caused many memory bugs so far. Can we add guardrails here?
The null terminator has also an impact on permformance. As we may need to resort to clever tricks when implementing SSO to squeeze one more byte out of the inline buffer
This adds some instructions to every access to a String size, and String creation, and it also takes one more byte for every String created (when we have SSO, that's one more byte to set in the inline buffer per String).
The text was updated successfully, but these errors were encountered: