Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempting to add debug logs for ENQUEUING an invalid object #49741

Merged
merged 3 commits into from
May 19, 2023

Conversation

NHDaly
Copy link
Member

@NHDaly NHDaly commented May 10, 2023

We were seeing GC Corruption when popping an invalid object, but we don't know how we got this invalid pointer. (And we can't repro on linux so can't use RR.)

This is an attempt to check for the object's validity before enqueuing so that we can hopefully give a more useful error message (which object's pointer was corrupted).

But for some reason it's currently complaining about the very first push....

src/gc.c Outdated Show resolved Hide resolved
@d-netto
Copy link
Member

d-netto commented May 19, 2023

Fixed some compiler warnings and inserted the gc_assert_parent_validity in the places where the parent validity should be checked when enqueueing an object.

On the example from #21590 (with GC_ASSERT_PARENT_VALIDITY defined):

a = Ref(1)
ptr = pointer_from_objref(a) - 8
unsafe_store!(Ptr{Int}(ptr), ptr)
GC.gc()
GC error (probable corruption)
Allocations: 15177 (Pool: 15157; Big: 20); GC: 0
Parent 0x11809dba0
of type:
Core.Binding
While marking child at 0x1180f4350
of type:
Core.ReturnNode(val=SSAValue(1))

The performance of keeping GC_ASSERT_PARENT_VALIDITY enabled needs to be further assessed in order to justify the decision of keeping under a debug flag or enabling it by default.

@d-netto d-netto marked this pull request as ready for review May 19, 2023 19:58
NHDaly and others added 3 commits May 19, 2023 16:59
We were seeing GC Corruption when _popping_ an invalid object, but we
don't know how we got this invalid pointer. (And we can't repro on linux
so can't use RR.)

This is an attempt to check for the object's validity _before enqueuing_
so that we can hopefully give a more useful error message (which
object's pointer was corrupted).

But for some reason it's currently complaining about the very first push....
- Remove JL_NORETURN
- Account for changes in #49556
@d-netto d-netto force-pushed the nhd-debugging-gc-corruption-macos branch from d5bd127 to 9fdc2d1 Compare May 19, 2023 19:59
Copy link
Member Author

@NHDaly NHDaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Thanks again for picking this up and carrying it forward, @d-netto!! :)

The hardest part to me will be ensuring that we don't ever miss a spot and push an object without including this assertion there. (Whereas before, when it was on the pop side, it was easier to ensure it always happens.) But I think it should be pretty doable.

Thanks!

src/gc.c Outdated Show resolved Hide resolved
@d-netto d-netto merged commit 6d70d2a into master May 19, 2023
@d-netto d-netto deleted the nhd-debugging-gc-corruption-macos branch May 19, 2023 21:49
d-netto added a commit to RelationalAI/julia that referenced this pull request Sep 18, 2023
…ng#49741)

* Attempting to add debug logs for ENQUEUING an invalid object

Check for the object's validity _before enqueuing_
so that we can hopefully give a more useful error message (which
object's pointer was corrupted).

---------

Co-authored-by: Diogo Netto <diogonetto.dcn@gmail.com>
d-netto pushed a commit to RelationalAI/julia that referenced this pull request Sep 18, 2023
…ng#49741)

* Attempting to add debug logs for ENQUEUING an invalid object

Check for the object's validity _before enqueuing_
so that we can hopefully give a more useful error message (which
object's pointer was corrupted).

---------

Co-authored-by: Diogo Netto <diogonetto.dcn@gmail.com>

show mark-queue on GC critical error (JuliaLang#49902)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants