-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Add more safeguards in the String
constructors
#2691
[stdlib] Add more safeguards in the String
constructors
#2691
Conversation
@@ -609,6 +613,7 @@ struct String( | |||
fn __init__(inout self): | |||
"""Construct an uninitialized string.""" | |||
self._buffer = Self._buffer_type() | |||
self._buffer.append(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this change means that constructing an empty string will now allocate, where it did not before. That may be what we want, but its a notable change to the default representation of a String
.
E.g. a List[String]
default initialized to empty String
and then later filled in would now do O(N) allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. a List[String] default initialized to empty String and then later filled in would now do O(N) allocations.
Can you give a small code example? I'm not sure I see the regression you are talking about.
Pratically speaking, when we have SSO, empty strings will have null termintors allocated on the stack, so it will be very cheap. I'd advise that until then, we err on the side of caution about safety in String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a compelling point regarding the small-string optimization.
I'm convinced now that NUL terminating by default is not necessarily a bad behavior to aim for long term🙂
Can you give a small code example? I'm not sure I see the regression you are talking about.
Not a regression per se, but consider:
var list = List[String](String(), String(), String(), String(), String())
for count in len(list):
list[count] = str(count)
Before this change, this program only allocates a String
for each element once.
After this change, first five empty strings will be allocated, and then overwritten with the final content of the list as initialized by the for
loop.
I don't necessarily think this is a convincing reason on its own not to move forward with this change. I'm just offering this as an example of the change in behavior this PR brings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the insight. Indeed it's a good example where we may have degraded performance. With SSO, that shouldn't be an issue. I'd be interested to know if a null terminator needs to be a requirement for String, but I guess this can be discussed after SSO is done
|
||
assert_equal(String(), String(List[Int8]())) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the main benefits I can see from this change is that an empty String
is now validly NUL terminated, so it can be passed to functions that are expecting a non-null string pointer.
So if thats the desired behavior, we should add some tests for that🙂
(But, I'd argue most functions that could accept an empty string should arguably be written to accept a null pointer, because as mentioned then they'll optimally allow the caller to avoid allocating space just to store an empty string.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if thats the desired behavior, we should add some tests for that🙂
Do you mean adding a test where if do assert_equal(String()._buffer[0], 0)
?
EDIT: Nevermind, I understand now. I'll add the test tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
65fe057
to
2c47dd7
Compare
…str__` method (#40421) [External] [stdlib] Fix: Add null terminator to the buffer inside `__str__` method String really has a hard time working without this null terminator and I feel this PR is a step towards fixing #2687 without requirering controversial changes like the ones in #2691. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2787 MODULAR_ORIG_COMMIT_REV_ID: 95f438ebbb6e076fe6ffb79debf7cd4fd70248de
…str__` method (#40421) [External] [stdlib] Fix: Add null terminator to the buffer inside `__str__` method String really has a hard time working without this null terminator and I feel this PR is a step towards fixing modularml#2687 without requirering controversial changes like the ones in modularml#2691. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes modularml#2787 MODULAR_ORIG_COMMIT_REV_ID: 95f438ebbb6e076fe6ffb79debf7cd4fd70248de
I'll close this PR in favor of the SSO work. We can always see after SSO if this is still relevant. |
…str__` method (#40421) [External] [stdlib] Fix: Add null terminator to the buffer inside `__str__` method String really has a hard time working without this null terminator and I feel this PR is a step towards fixing #2687 without requirering controversial changes like the ones in #2691. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2787 MODULAR_ORIG_COMMIT_REV_ID: 95f438ebbb6e076fe6ffb79debf7cd4fd70248de
Related to #2687
Fix #2392
Also related to #2678
Description of the problem:
EDIT: Please note that now, an empty string will allocate one byte on the heap for the null terminator. This will degrade the performance of some programs, the tradeoff is more safety. This performance penalty will go away with SSO, as the null terminator will be on the stack, which I hope can be implemented soon-ish when #2637 is fixed