Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport parent validity check for the GC mark loop #53

Closed
wants to merge 1 commit into from

Conversation

d-netto
Copy link
Member

@d-netto d-netto commented Sep 18, 2023

Backports JuliaLang#49741 and JuliaLang#49902.

I found more convenient to bundle these two upstream PRs into a single backport PR since one is a follow-up for the other, but I can split into separate backports if it makes reviewing easier.

Example of how it works in practice:

a = Ref(1)
ptr = pointer_from_objref(a) - 8
unsafe_store!(Ptr{Int}(ptr), ptr)
GC.gc()

gives us the following error message:

GC error (probable corruption)
Allocations: 15522 (Pool: 15497; Big: 25); GC: 0
Parent 0x11dd4fc50
of type:
Module
While marking child at 0x10eb9c310
of type:
Core.ReturnNode(val=SSAValue(1))

[79389] signal (6): Abort trap: 6
in expression starting at /Users/dnetto/RAI/julia-RAI/test.jl:4
signal (6) thread (0) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 15522 (Pool: 15497; Big: 25); GC: 0
zsh: abort      ./julia test.jl

…ng#49741)

* Attempting to add debug logs for ENQUEUING an invalid object

Check for the object's validity _before enqueuing_
so that we can hopefully give a more useful error message (which
object's pointer was corrupted).

---------

Co-authored-by: Diogo Netto <diogonetto.dcn@gmail.com>

show mark-queue on GC critical error (JuliaLang#49902)
@@ -1856,14 +1856,37 @@ STATIC_INLINE uintptr_t gc_read_stack(void *_addr, uintptr_t offset,
return *(uintptr_t*)real_addr;
}

JL_NORETURN NOINLINE void gc_assert_datatype_fail(jl_ptls_t ptls, jl_datatype_t *vt,
jl_gc_markqueue_t *mq) JL_NOTSAFEPOINT
#define GC_ASSERT_PARENT_VALIDITY
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not defined upstream I think? So this is not an exact backport? If this is added only for us, then it should be called out explicitly and maybe be a separate commit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is added only for us, then it should be called out explicitly and maybe be a separate commit?

Sounds good. I left it defined for now since I want to run a few benchmarks to study what is performance penalty of leaving this defined.

Though I should have been more explicit about it. Will add a note on that. Thanks for catching it.

@d-netto
Copy link
Member Author

d-netto commented Sep 18, 2023

NOTE: GC_ASSERT_PARENT_VALIDITY is defined for now, but it's not defined upstream. We should analyze what is the performance penalty of leaving this defined before merging.

@@ -1856,14 +1856,37 @@ STATIC_INLINE uintptr_t gc_read_stack(void *_addr, uintptr_t offset,
return *(uintptr_t*)real_addr;
}

JL_NORETURN NOINLINE void gc_assert_datatype_fail(jl_ptls_t ptls, jl_datatype_t *vt,
jl_gc_markqueue_t *mq) JL_NOTSAFEPOINT
#define GC_ASSERT_PARENT_VALIDITY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-netto this is you setting this option to enable the flag, right?

Please separate that out, so that this commit is only the backport of the option, and then we should have a separate commit that actually enables the option. That way, we could drop this patch when upgrading to 1.10, but keep the patch that enables the feature.

Can you put this in a separate commit, and also in a separate file? I think this should be in our Make.user or something like that, right? Or in a nix file? Is there a place where we aggregate these build flags?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This option was set by me: I just want to run some benchmarks to see what is the performance penalty of doing that, though I plan to disable it before merging. See note.

@github-actions
Copy link

This PR is stale because it has been open 30 days with no activity. Comment or remove stale label, or this PR will be closed in 5 days.

@github-actions github-actions bot added the stale This pull request is inactive label Oct 19, 2023
kpamnany pushed a commit that referenced this pull request Oct 19, 2023
This should be squashed with #53
when that is merged.
@github-actions github-actions bot closed this Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale This pull request is inactive
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants