Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make backtrace buffer handling more systematic #33277

Merged
merged 1 commit into from
Oct 15, 2019
Merged

Conversation

c42f
Copy link
Member

@c42f c42f commented Sep 16, 2019

Increase expressibility of what can be stored in backtrace buffers, while ensuring that the GC can find roots without knowing about the detail.

To do this, introduce a new "extended backtrace entry" format which carries along the number of roots and other data in a bitpacked format. This allows the backtrace buffer to be traversed and the roots collected in a general way, without the GC knowing about interpreter frames. Use
this to add the module to InterperterIP so that the module of interpreted top level thunks can be known (this is infrastructure to help with #33065.)

In the future the extended entry format should allow us to be a lot more flexible with what can be stored in a backtrace. For example, we could

  • Compress the backtrace cycles of runaway recursive functions so that stack overflows are much more likely to fit in the fixed-size bt_data array.
  • Integrate external or other types of interpreter frames into the backtrace machinery.

Implementation details

[Edit] To quote the code comments regarding the buffer format:

A backtrace buffer conceptually contains a stack of instruction pointers ordered from the inner-most frame to the outermost. We store them in a special raw format for two reasons:

  • Efficiency: Every throw() must populate the trace so it must be as efficient as possible.
  • Signal safety: For signal-based exceptions such as StackOverflowError the trace buffer needs to be filled from a signal handler where most operations are not allowed (including malloc) so we choose a flat preallocated buffer.

The raw buffer layout contains "frame entries" composed of one or several jl_bt_element_t values. From the point of view of the GC, an entry is either:

  1. A single instruction pointer to native code, not GC-managed.
  2. An "extended entry": a mixture of raw data and pointers to julia objects which must be treated as GC roots.

An extended entry e is made up of several jl_bt_element_t values:

  • e[0] JL_BT_NON_PTR_ENTRY - Special marker to distinguish extended entries
  • e[1] tags - A bit packed uintptr_t containing a tag and the number of GC- managed and non-managed values
  • e[2+j] - GC managed data
  • e[2+ngc+i] - Non-GC-managed data

The format of tags is, from LSB to MSB:

  • 0:2 ngc - Number of GC-managed pointers for this frame entry
  • 3:5 nptr - Number of non-GC-managed buffer elements
  • 6:9 tag - Entry type
  • 10:... header - Entry-specific header data

@c42f c42f added error handling Handling of exceptions by Julia or the user bugfix This change fixes an existing bug labels Sep 16, 2019
@c42f c42f requested a review from vtjnash September 16, 2019 14:42
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

base/stacktraces.jl Outdated Show resolved Hide resolved
src/julia_internal.h Outdated Show resolved Hide resolved
@c42f
Copy link
Member Author

c42f commented Sep 17, 2019

linux32 test failure looked like it could be real, but the test logs leave no clue which actual line failed or why (https://build.julialang.org/#/builders/13/builds/5336/steps/2/logs/stdio). I've updated the tests to be more concrete and hopefully provide a bit more of a clue what is wrong.

@c42f
Copy link
Member Author

c42f commented Sep 17, 2019

Welp, linux32 test passed this time around and the original failure did seem like it would be surprising if it were related. The win64 packaging error is a problem with code signing (tests passed).

Should I go ahead and merge this? I'd like to use the module tracking as a base for trying out better formatting heuristics for backtraces, eg #33065 (comment). (Also as a base for trying out some changes in #33065 (comment).)

@c42f
Copy link
Member Author

c42f commented Sep 17, 2019

Ah, unfortunate clash with the nice work in #33190. That's a lot of conflicts.

@vtjnash
Copy link
Member

vtjnash commented Sep 17, 2019

Yeah, mostly trivial, but we both seem to have added an argument to one of the functions

src/stackwalk.c Outdated Show resolved Hide resolved
@c42f
Copy link
Member Author

c42f commented Sep 25, 2019

Part of this has been extracted into #33380. I'll rebase this once that one is merged.

@c42f c42f force-pushed the cjf/extended-bt-entries branch 2 times, most recently from 273ee60 to c3d498a Compare October 3, 2019 05:14
@c42f
Copy link
Member Author

c42f commented Oct 3, 2019

Rebased. I think this is basically good to go. Should be almost entirely internal cleanup / refactoring, with the slight exception of adding a field to Base.InterpreterIP.

@c42f
Copy link
Member Author

c42f commented Oct 8, 2019

Bump. Does anyone want a closer look at this before merging? @maleadt perhaps you might be interested as this seems relevant to your memory profiler work.

Musing over the design here a little, another way to look at this is that it evolves backtrace buffers toward a special purpose serialization format which is:

  • async signal safe for writing
  • fairly fast and compact for the specific purpose of recording backtraces of mostly-native frames
  • understood by the GC

I still think it's worth merging more or less as it is, but I do wonder where this leads in the longer term (how many serialization formats do we want, really?)

Copy link
Member

@maleadt maleadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

For allocation tracking, I guess it would be best to add another extended entry frame containing the allocation info (as opposed to making the top frame an allocation one and putting the instruction pointer in the header field like InterpreterIP does, since allocations can come from the interpreter too and an extended entry can't have multiple tags)?

src/julia_internal.h Outdated Show resolved Hide resolved
src/julia_internal.h Outdated Show resolved Hide resolved
src/julia_internal.h Outdated Show resolved Hide resolved
@c42f
Copy link
Member Author

c42f commented Oct 15, 2019

Thanks for having a look! I'll rebase to fix the conflicts again (looks like a collision with #33524) and attempt a little rewording for clarification.

For allocation tracking, I guess it would be best to add another extended entry frame containing the allocation info

Yes I think that would probably be good. If I understand correctly the allocation profiling just dumps the records in the buffer and doesn't need to do anything with them until the end of the profiling run. So serializing the allocation info interleaved/associated with the backtrace data like this should be fairly natural and convenient I guess.

Increase expressibility of what can be stored in backtrace buffers,
while ensuring that the GC can find roots without knowing about the
detail.

To do this, introduce a new "extended backtrace entry" format which
carries along the number of roots and other data in a bitpacked format.
This allows the backtrace buffer to be traversed and the roots collected
in a general way, without the GC knowing about interpreter frames. Use
this to add the module to InterperterIP so that the module of
interpreted top level thunks can be known.

In the future the extended entry format should allow us to be a lot more
flexible with what can be stored in a backtrace. For example, we could
* Compress the backtrace cycles of runaway recursive functions so that
  stack overflows are much more likely to fit in the fixed-size bt_data
  array.
* Integrate external or other types of interpreter frames into the
  backtrace machinery.
@c42f
Copy link
Member Author

c42f commented Oct 15, 2019

Ok I've tried a little renaming and rewording as suggested; hopefully that will be clearer now.

I'll go ahead and merge once CI has run.

@c42f c42f merged commit 643ec18 into master Oct 15, 2019
@c42f c42f deleted the cjf/extended-bt-entries branch October 15, 2019 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug error handling Handling of exceptions by Julia or the user
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants