[stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` #2896

gabrieldemarmiesse · 2024-05-30T21:08:26Z

Finally found the culprit in the flakyness that plagued us since a few week in the test_reversed.mojo.

The actual bug:

When iterating over a list in reverse order, we should start at len(my_list) - 1 not at len(my_list).
That triggered an out of bounds access and thus was undefined behavior.

The effect on our CI

As you know, we have been seeing flakyness lately. It was documented a number of times and always related to reverse:

[BUG] Flaky test, test dict popteim ...get: wrong variant type #2866 (comment)
[BUG] builtin/test_reversed.mojo for stdlib works with mojo build, non-deterministically crashes when run in JIT mode via mojo <file> #2369

Why was it passing sometimes?

This is because there were Optional[...] in the List. Thus if the flag of the Optional says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: ["a", "b", "c", "d"] None
but the last None is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get wrong variant type since we don't know what happens to the memory between None check and access.

Why wasn't it found earlier?

First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs.

Well we did have debug_assert before getting the element of the List, but this debug_assert looked like this in the dict iterator:

            @parameter
            if forward:
                debug_assert(
                    self.index < self.src[]._reserved, "dict iter bounds"
                )
            else:
                debug_assert(self.index >= 0, "dict iter bounds")

So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better debug_assert would have been

debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds")

When I worked on my PR #2718 the condition self.index < self.src[]._reserved didn't trigger anything since it was in the wrong branch, it was never executed.

Also before, __get_ref didn't have any bounds checks, even when assertions were enabled.

A recent commit 8d0870e adds unsafe_get() in List and make __get_ref use it. It also adds debug_assert to unsafe_get(), which means that now __get_ref has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable.

Since we have this, the debug_assert in dict.mojo isn't necessary anymore.

Consequences on ongoing work:

This fix have been also added to [stdlib] Enable assertions in unit tests #2718
The PR [stdlib] Support Dict popitem #2701 that we did with @jayzhan211 was actually correct. It was just using reverse(Dict.items()) which was buggy at the time. After the fix is merged, we can re-revert this PR.
[stdlib] Implement Dict.popitem() method #2794 is not necessary anymore since the implementation by @jayzhan211 was correct.
The real cause of [BUG] Flaky test, test dict popteim ...get: wrong variant type #2866 was found, the issue has already been closed though.
[BUG] builtin/test_reversed.mojo for stdlib works with mojo build, non-deterministically crashes when run in JIT mode via mojo <file> #2369 can be closed for good.
[stdlib] Add some logging to test_reverse.mojo to flush out a flaky bug #2832 can be closed for good.

Closing thoughts

We really need to run the unit tests with assertions enabled and add assertions whenever necessary
The dict implementation is a bit too complicated. For example, self._reserved is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use len(self._entries) instead. I guess this is a relic of the time when List wasn't completely flushed out. If had done so, it would have been ovious that we can't do my_list.__get_ref(len(my_list))
Iterating manually over a list like this is bug-prone. The implementation we have especially is, since

                @parameter
                if forward:
                    self.index += 1
                else:
                    self.index -= 1

is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs.

We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the dict.mojo.

…())` Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>

gabrieldemarmiesse · 2024-05-30T21:09:27Z

stdlib/src/collections/dict.mojo

@@ -85,15 +85,6 @@ struct _DictEntryIter[
    @always_inline
    fn __next__(inout self) -> Self.ref_type:
        while True:
-
-            @parameter
-            if forward:


This is not necessary anymore since __get_ref has bounds checks, see the detailed explanation of the bug.

JoeLoser

This is amazing! Thanks so much for digging into this and fixing it, @gabrieldemarmiesse! ❤️

JoeLoser · 2024-05-30T21:15:06Z

!sync

Mogball · 2024-05-30T21:23:50Z

Nice! Great work!

rd4com · 2024-05-30T23:00:04Z

🎉 Great work! 🏆

jayzhan211

Nice catch! 💯

modularbot · 2024-05-31T04:44:15Z

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

modularbot · 2024-05-31T13:03:46Z

Landed in 4d089aa! Thank you for your contribution 🎉

@parameter

…(Dict.items())` (#40974) [External] [stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` Finally found the culprit in the flakyness that plagued us since a few week in the `test_reversed.mojo`. ### The actual bug: When iterating over a list in reverse order, we should start at `len(my_list) - 1` not at `len(my_list)`. That triggered an out of bounds access and thus was undefined behavior. ### The effect on our CI As you know, we have been seeing flakyness lately. It was documented a number of times and always related to `reverse`: * #2866 (comment) * #2369 ### Why was it passing sometimes? This is because there were `Optional[...]` in the List. Thus if the flag of the `Optional` says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: `["a", "b", "c", "d"] None` but the last `None` is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get `wrong variant type` since we don't know what happens to the memory between None check and access. ### Why wasn't it found earlier? First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs. Well we did have `debug_assert` before getting the element of the `List`, but this `debug_assert` looked like this in the dict iterator: ```mojo @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") ``` So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better `debug_assert` would have been ```mojo debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds") ``` When I worked on my PR #2718 the condition `self.index < self.src[]._reserved` didn't trigger anything since it was in the wrong branch, it was never executed. Also before, `__get_ref` didn't have any bounds checks, even when assertions were enabled. A recent commit 8d0870e adds `unsafe_get()` in List and make `__get_ref` use it. It also adds `debug_assert` to `unsafe_get()`, which means that now `__get_ref` has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable. Since we have this, the `debug_assert` in `dict.mojo` isn't necessary anymore. ### Consequences on ongoing work: * This fix have been also added to #2718 * The PR #2701 that we did with @jayzhan211 was actually correct. It was just using `reverse(Dict.items())` which was buggy at the time. After the fix is merged, we can re-revert this PR. * #2794 is not necessary anymore since the implementation by @jayzhan211 was correct. * The real cause of #2866 was found, the issue has already been closed though. * #2369 can be closed for good. * #2832 can be closed for good. ### Closing thoughts * We really need to run the unit tests with assertions enabled and add assertions whenever necessary * The dict implementation is a bit too complicated. For example, `self._reserved` is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use `len(self._entries)` instead. I guess this is a relic of the time when `List` wasn't completely flushed out. If had done so, it would have been ovious that we can't do `my_list.__get_ref(len(my_list))` * Iterating manually over a list like this is bug-prone. The implementation we have especially is, since ```mojo @parameter if forward: self.index += 1 else: self.index -= 1 ``` is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs. * We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the `dict.mojo`. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2896 MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87

JoeLoser · 2024-05-31T21:34:41Z

stdlib/src/collections/list.mojo

-                "The index provided must be within the range [0, len(List) -1]"
-                " when using List.unsafe_get()"
+                "The index provided must be within the range [0, "
+                + str(len(self[]) - 1)


FYI I had to drop these internally since you can't use str(...) here as it will fail to interpret for cases where List is used at compile time since str(...) doesn't work at compile time which is a known issue.

@parameter

…(Dict.items())` (#40974) [External] [stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` Finally found the culprit in the flakyness that plagued us since a few week in the `test_reversed.mojo`. ### The actual bug: When iterating over a list in reverse order, we should start at `len(my_list) - 1` not at `len(my_list)`. That triggered an out of bounds access and thus was undefined behavior. ### The effect on our CI As you know, we have been seeing flakyness lately. It was documented a number of times and always related to `reverse`: * #2866 (comment) * #2369 ### Why was it passing sometimes? This is because there were `Optional[...]` in the List. Thus if the flag of the `Optional` says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: `["a", "b", "c", "d"] None` but the last `None` is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get `wrong variant type` since we don't know what happens to the memory between None check and access. ### Why wasn't it found earlier? First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs. Well we did have `debug_assert` before getting the element of the `List`, but this `debug_assert` looked like this in the dict iterator: ```mojo @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") ``` So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better `debug_assert` would have been ```mojo debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds") ``` When I worked on my PR #2718 the condition `self.index < self.src[]._reserved` didn't trigger anything since it was in the wrong branch, it was never executed. Also before, `__get_ref` didn't have any bounds checks, even when assertions were enabled. A recent commit 8d0870e adds `unsafe_get()` in List and make `__get_ref` use it. It also adds `debug_assert` to `unsafe_get()`, which means that now `__get_ref` has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable. Since we have this, the `debug_assert` in `dict.mojo` isn't necessary anymore. ### Consequences on ongoing work: * This fix have been also added to #2718 * The PR #2701 that we did with @jayzhan211 was actually correct. It was just using `reverse(Dict.items())` which was buggy at the time. After the fix is merged, we can re-revert this PR. * #2794 is not necessary anymore since the implementation by @jayzhan211 was correct. * The real cause of #2866 was found, the issue has already been closed though. * #2369 can be closed for good. * #2832 can be closed for good. ### Closing thoughts * We really need to run the unit tests with assertions enabled and add assertions whenever necessary * The dict implementation is a bit too complicated. For example, `self._reserved` is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use `len(self._entries)` instead. I guess this is a relic of the time when `List` wasn't completely flushed out. If had done so, it would have been ovious that we can't do `my_list.__get_ref(len(my_list))` * Iterating manually over a list like this is bug-prone. The implementation we have especially is, since ```mojo @parameter if forward: self.index += 1 else: self.index -= 1 ``` is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs. * We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the `dict.mojo`. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2896 MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87

[stdlib] Fix UB in reversed(Dict.values()) and `reversed(Dict.items…

deb0eb1

…())` Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>

gabrieldemarmiesse requested a review from a team as a code owner May 30, 2024 21:08

gabrieldemarmiesse commented May 30, 2024

View reviewed changes

JoeLoser self-assigned this May 30, 2024

JoeLoser approved these changes May 30, 2024

View reviewed changes

modularbot added the imported-internally Signals that a given pull request has been imported internally. label May 30, 2024

msaelices mentioned this pull request May 30, 2024

[stdlib] Implement Dict.popitem() method #2794

Closed

2 tasks

jayzhan211 approved these changes May 30, 2024

View reviewed changes

modularbot added merged-internally Indicates that this pull request has been merged internally merged-externally Merged externally in public mojo repo labels May 31, 2024

modularbot closed this May 31, 2024

JoeLoser reviewed May 31, 2024

View reviewed changes

gabrieldemarmiesse deleted the fix_flaky_reversed_test branch May 31, 2024 22:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` #2896

[stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` #2896

gabrieldemarmiesse commented May 30, 2024

gabrieldemarmiesse May 30, 2024 •

edited

Loading

JoeLoser left a comment

JoeLoser commented May 30, 2024

Mogball commented May 30, 2024

rd4com commented May 30, 2024

jayzhan211 left a comment

modularbot commented May 31, 2024

modularbot commented May 31, 2024

JoeLoser May 31, 2024

[stdlib] Fix UB in reversed(Dict.values()) and reversed(Dict.items()) #2896

[stdlib] Fix UB in reversed(Dict.values()) and reversed(Dict.items()) #2896

Conversation

gabrieldemarmiesse commented May 30, 2024

The actual bug:

The effect on our CI

Why was it passing sometimes?

Why wasn't it found earlier?

Consequences on ongoing work:

Closing thoughts

gabrieldemarmiesse May 30, 2024 • edited Loading

Choose a reason for hiding this comment

JoeLoser left a comment

Choose a reason for hiding this comment

JoeLoser commented May 30, 2024

Mogball commented May 30, 2024

rd4com commented May 30, 2024

jayzhan211 left a comment

Choose a reason for hiding this comment

modularbot commented May 31, 2024

modularbot commented May 31, 2024

JoeLoser May 31, 2024

Choose a reason for hiding this comment

[stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` #2896

[stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` #2896

gabrieldemarmiesse May 30, 2024 •

edited

Loading