-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Add the normalize_index
function
#2677
[stdlib] Add the normalize_index
function
#2677
Conversation
Re your (1) comment about I think we can do something like |
Here you go: #2687 :) |
|
@laszlokindrat Feel free to review when you have the time. The rebase is done :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch! Let's have a discussion on some of these design and implementation questions, before we move forward.
stdlib/src/utils/static_tuple.mojo
Outdated
var normalized_idx = int(index) | ||
if normalized_idx < 0: | ||
normalized_idx += size | ||
var normalized_index = normalize_index[container_name="InlineArray"]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is subtly different behavior: debug_assert
can be disabled (in which case there is no bound checking, no overhead, but bad things can happen); the implementation you introduced, however, always performs the bounds check, which guarantees that this becomes more expensive.
@JoeLoser Any thoughts on whether this should be a "safe" method or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. This is intentional. I followed the instructions from @JoeLoser here: #2386 (comment)
Since having a performant access to the elements in any array is desirable, we could get some inspiration from rust which does have the same challenges. Rust solves it by having safe access with [i]
and unsafe access with get_unchecked
.
What I would recommend here is to do something similar, as unsafe things must be explicitly unsafe using an appropriate name (from the stdlib vision document). When this PR gets merged, I can open an issue (which can be a good first issue!) to add a second method to the InlineArray unsafe_get()
where normalized_index
isn't called, and we don't wrap the index when there are negative values. A debug_assert will be added in unsafe_get()
for better debuggability.
I'll answer here concerning the heap memory allocation. I think there are a few points to add:
- The heap allocation will be used only when the "safe" get is used. Thus if the platform doesn't support it, the developper can use
unsafe_get()
- We should strive for good error messages in the general/safe case. It's fine to have bad/no error messages in the other situations.
- When we have SSO and can control the size of the internal String buffer, we can make it big enough in this precise code path to make sure that the String bytes are always allocated on the stack (buffer) and never on the heap. When this is done, the platforms where heap allocations aren't possible will be able to use the safe method to get an item by index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for clarifying, I'm glad you already had this discussion. I'm okay with the goal here, but it might be risky to merge this since it will introduce overhead for existing (internal) users. I can push this forward, but it would be great if you could fast follow with the implementation of an unsafe accessors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it in this PR now instead. Let's not take unnecessary risks with internal breakages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laszlokindrat I added unsafe_get to have fast access :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amazing, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you! Please rebase on the latest nightly, and I'll pull this in.
stdlib/src/utils/static_tuple.mojo
Outdated
var normalized_idx = int(index) | ||
if normalized_idx < 0: | ||
normalized_idx += size | ||
var normalized_index = normalize_index[container_name="InlineArray"]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for clarifying, I'm glad you already had this discussion. I'm okay with the goal here, but it might be risky to merge this since it will introduce overhead for existing (internal) users. I can push this forward, but it would be great if you could fast follow with the implementation of an unsafe accessors.
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
9439280
to
5d859b1
Compare
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
!sync |
[External] [stdlib] Apply Indexer trait to stdlib containers The second half of #2384 I have intentionally omitted `static_tuple.mojo` to avoid conflicting with #2677 Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com> Closes #2722 MODULAR_ORIG_COMMIT_REV_ID: 365298938b36c3f4f9ad0067dfcb3856de369875
[External] [stdlib] Apply Indexer trait to stdlib containers The second half of modularml#2384 I have intentionally omitted `static_tuple.mojo` to avoid conflicting with modularml#2677 Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com> Closes modularml#2722 MODULAR_ORIG_COMMIT_REV_ID: 365298938b36c3f4f9ad0067dfcb3856de369875
[External] [stdlib] Add method `unsafe_get` to `List` See #2677 (comment) for the background about this method. We are currently missing methods to access elements in collections without any bounds checks or wraparound, for maximum performance. I suggest that we introduce `unsafe_get` and `unsafe_set` to our List-like collections. This is equivalent to * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked_mut in Rust. This should prove useful to makers of high performance libraries like Max :p We can then make `__getitem__` and `__setitem__` as safe as we want without impacting the power users. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2800 MODULAR_ORIG_COMMIT_REV_ID: 5321c191d83262f240c5d8d5ac77afccaa00a7ba
@gabrieldemarmiesse Sorry this has been taking so long, but we are running into several internal CI failures related to this and a lot has changed recently. Could you please rebase this on the next nightly? Note that there will be a |
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@laszlokindrat done! |
[External] [stdlib] Add method `unsafe_get` to `List` See modularml#2677 (comment) for the background about this method. We are currently missing methods to access elements in collections without any bounds checks or wraparound, for maximum performance. I suggest that we introduce `unsafe_get` and `unsafe_set` to our List-like collections. This is equivalent to * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked_mut in Rust. This should prove useful to makers of high performance libraries like Max :p We can then make `__getitem__` and `__setitem__` as safe as we want without impacting the power users. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes modularml#2800 MODULAR_ORIG_COMMIT_REV_ID: 5321c191d83262f240c5d8d5ac77afccaa00a7ba
[External] [stdlib] Add method `unsafe_get` to `List` See modularml#2677 (comment) for the background about this method. We are currently missing methods to access elements in collections without any bounds checks or wraparound, for maximum performance. I suggest that we introduce `unsafe_get` and `unsafe_set` to our List-like collections. This is equivalent to * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked_mut in Rust. This should prove useful to makers of high performance libraries like Max :p We can then make `__getitem__` and `__setitem__` as safe as we want without impacting the power users. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes modularml#2800 MODULAR_ORIG_COMMIT_REV_ID: 5321c191d83262f240c5d8d5ac77afccaa00a7ba Signed-off-by: Avinag <udayagiriavinag@gmail.com>
[External] [stdlib] Add method `unsafe_get` to `List` See modularml#2677 (comment) for the background about this method. We are currently missing methods to access elements in collections without any bounds checks or wraparound, for maximum performance. I suggest that we introduce `unsafe_get` and `unsafe_set` to our List-like collections. This is equivalent to * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked_mut in Rust. This should prove useful to makers of high performance libraries like Max :p We can then make `__getitem__` and `__setitem__` as safe as we want without impacting the power users. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes modularml#2800 MODULAR_ORIG_COMMIT_REV_ID: 5321c191d83262f240c5d8d5ac77afccaa00a7ba
[External] [stdlib] Add method `unsafe_get` to `List` See modularml#2677 (comment) for the background about this method. We are currently missing methods to access elements in collections without any bounds checks or wraparound, for maximum performance. I suggest that we introduce `unsafe_get` and `unsafe_set` to our List-like collections. This is equivalent to * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked_mut in Rust. This should prove useful to makers of high performance libraries like Max :p We can then make `__getitem__` and `__setitem__` as safe as we want without impacting the power users. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes modularml#2800 MODULAR_ORIG_COMMIT_REV_ID: 5321c191d83262f240c5d8d5ac77afccaa00a7ba Signed-off-by: Avinag <udayagiriavinag@gmail.com>
!sync |
✅🟣 This contribution has been merged 🟣✅ Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours. We use Copybara to merge external contributions, click here to learn more. |
[External] [stdlib] Add the `normalize_index` function This PR is some kind of mix between #2386 and #2384 which have issues (aborting) or have too many conflicts because the PR is too big. This PR solves part of #2251 and #2337 We try here to give the ground work for indexing correctly. This function added can then be used wherever we work with sequences. Two things I noticed during development: 1) The `debug_assert` does not run in unit tests. Is there any way to enable it? We currently have out-of-bounds bugs in our test suite. 2) The null terminator is causing pain, again, again, and again. Do we have any plans to make it optional when working with String? I opened #2678 to discuss this. To avoid to fix those issues in this PR, I used the `normalize_index` on the `__refitem__` of `InlineArray` which doesn't have widespread use yet and isn't impacted by out-of-bounds bugs. My recommendation would be to merge this PR then to rebase #2386 and #2384 on it. We should also afterwards fix the out of bounds bugs that can be triggered in the test suite by enabling debug_assert. The diff might seem big, but no worries, it's mostly the licenses and docstrings :) Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2677 MODULAR_ORIG_COMMIT_REV_ID: 66e7121a6a333c16284eb33a89eb85c034c296c3
Landed in cb2e75b! Thank you for your contribution 🎉 |
Now that this has been merged, we should start using it more widely. @gabrieldemarmiesse @laszlokindrat is there any reason I might have missed that means I should hold off on filing an issue for using this more widely? |
I think we can open an issue. At the time of merge there were a few PRs in flight, we didn't know if there was work left to do on the subject. Now that the dust has settled, it's true that it would be a good issue to beginners. Let's not forget in the issue to mention that we would like on function change per PR to avoid a big PR changing all the indexing for the stdlib |
Thanks Gabriel. Filed #2948 to track this. From grepping through the stdlib code it looks like there aren't that many uses of this pattern overall, so the size of resulting PRs may end up not to be too huge. I included a note though asking contributors not to do more than 4-5 call-sites in one PR. |
[External] [stdlib] Apply Indexer trait to stdlib containers The second half of #2384 I have intentionally omitted `static_tuple.mojo` to avoid conflicting with #2677 Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com> Closes #2722 MODULAR_ORIG_COMMIT_REV_ID: 365298938b36c3f4f9ad0067dfcb3856de369875
[External] [stdlib] Add method `unsafe_get` to `List` See #2677 (comment) for the background about this method. We are currently missing methods to access elements in collections without any bounds checks or wraparound, for maximum performance. I suggest that we introduce `unsafe_get` and `unsafe_set` to our List-like collections. This is equivalent to * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked * https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked_mut in Rust. This should prove useful to makers of high performance libraries like Max :p We can then make `__getitem__` and `__setitem__` as safe as we want without impacting the power users. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2800 MODULAR_ORIG_COMMIT_REV_ID: 5321c191d83262f240c5d8d5ac77afccaa00a7ba
[External] [stdlib] Add the `normalize_index` function This PR is some kind of mix between #2386 and #2384 which have issues (aborting) or have too many conflicts because the PR is too big. This PR solves part of #2251 and #2337 We try here to give the ground work for indexing correctly. This function added can then be used wherever we work with sequences. Two things I noticed during development: 1) The `debug_assert` does not run in unit tests. Is there any way to enable it? We currently have out-of-bounds bugs in our test suite. 2) The null terminator is causing pain, again, again, and again. Do we have any plans to make it optional when working with String? I opened #2678 to discuss this. To avoid to fix those issues in this PR, I used the `normalize_index` on the `__refitem__` of `InlineArray` which doesn't have widespread use yet and isn't impacted by out-of-bounds bugs. My recommendation would be to merge this PR then to rebase #2386 and #2384 on it. We should also afterwards fix the out of bounds bugs that can be triggered in the test suite by enabling debug_assert. The diff might seem big, but no worries, it's mostly the licenses and docstrings :) Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2677 MODULAR_ORIG_COMMIT_REV_ID: 66e7121a6a333c16284eb33a89eb85c034c296c3
This PR is some kind of mix between #2386 and #2384 which have issues (aborting) or have too many conflicts because the PR is too big.
This PR solves part of #2251 and #2337
We try here to give the ground work for indexing correctly. This function added can then be used wherever we work with sequences.
Two things I noticed during development:
debug_assert
does not run in unit tests. Is there any way to enable it? We currently have out-of-bounds bugs in our test suite.To avoid to fix those issues in this PR, I used the
normalize_index
on the__refitem__
ofInlineArray
which doesn't have widespread use yet and isn't impacted by out-of-bounds bugs.My recommendation would be to merge this PR then to rebase #2386 and #2384 on it. We should also afterwards fix the out of bounds bugs that can be triggered in the test suite by enabling debug_assert.
The diff might seem big, but no worries, it's mostly the licenses and docstrings :)