Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Revise breadcrumb_writer_t ownership #6766

Merged
merged 1 commit into from
Jun 18, 2019
Merged

Revise breadcrumb_writer_t ownership #6766

merged 1 commit into from
Jun 18, 2019

Conversation

sdmaclea
Copy link

@sdmaclea sdmaclea commented Jun 11, 2019

Make breadcrumb_writer_t own m_files
Manage breadcrumb_writer_t with a shared pointer
Prevent premature destruction by adding a shared pointer for the background thread

Fixes #4646 on master

@sdmaclea sdmaclea added this to the 3.0 milestone Jun 11, 2019
@sdmaclea sdmaclea self-assigned this Jun 11, 2019
Copy link

@lpereira lpereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you able to reproduce this race condition? I'm curious to understand why this happened in the first place.

The patch LGTM with comments, though.

{
if (m_enabled)
auto instance = std::make_shared<breadcrumb_writer_t>(files);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're cleaning this up, would it be possible to get rid of begin_write() and end_write(), and use only the constructor/destructor of breadcrumb_writer_t? If that's possible, then make_shared() can be called in run_app_for_context(), and the branch to determine if end_write() has to be called can be junked. This would simplify this class a little bit.

Another thing that might be possible here is to get rid of breadcrumb_writer_t entirely and capture the breadcrumbs with a lambda passed directly to std::thread. I like the call to swap() to own the breadcrumbs map, though, and I don't know if there's a good way to do something similar without it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this could potentially simplify things: https://gist.github.com/lpereira/48403e8540524b7854743d68b6dd879d

@sdmaclea
Copy link
Author

sdmaclea commented Jun 13, 2019

@lpereira

Were you able to reproduce this race condition? I'm curious to understand why this happened in the first place.

I was not able to reproduce the failure.

My working hypothesis is that the startup failure in coreclr is leaving the stack in a corrupted state as the process is exiting. Coreclr does lots of magic to handle exceptions so there could be a case which we missed.

I believe this is actually a symptom of a fatal failure in coreclr. The AV is masking the real failure.

My strategy is to move all the thread state off the main stack.

To allow for the join I needed a shared pointer, or a thread object.

I chose a shared pointer because I know exactly what state is stored within it. I can reason about what will happen if

  • its destructor is prematurely called
  • its destructor is not called
  • it is corrupted through a random stack overwrite (but destructor is not called)
  • it is corrupted through a random stack overwrite (then destructor is called)

In general I like your gist, but I have a much harder time reasoning about the layout of the main stack. I don't really know what assumptions the compiler is allowed to make. Or what state the thread object contains.

Also for backporting to the 2.2 branch, I wanted to keep the major functional changes in the breadcumb* files.

Copy link
Member

@elinor-fung elinor-fung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the existing code / original fix, it looks like the intention is to always finish writing out the breadcrumbs if exception handling is done is such a way that the destructor is called. Would that no longer be the case now?

Re: how the failure happend in the first place. Maybe silly question, but are we sure that the aspnet side is compiled in such a way that the destructor of breadcrumb_writer_t would have been called? I think that means compiling with \EHa, not just using __try/__except (not sure - the docs are a bit complicated and I have not really dealt with the exception handling model).

~breadcrumb_writer_t();

void begin_write();
breadcrumb_writer_t(std::unordered_set<pal::string_t> &files);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the constructor private now? It should always be created through begin_write now, right?

Copy link
Author

@sdmaclea sdmaclea Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my first approach. Unfortunately, std::make_shared requires a public constructor. We could use new and assign a bare pointer to the shared_ptr.

@sdmaclea
Copy link
Author

are we sure that the aspnet side is compiled in such a way that the destructor of breadcrumb_writer_t would have been called?

I don't really know. I doubt it. I know when I was working on adding tracing of unhandled exceptions for this same host, it was very difficult to get the exception handling right (when prototyping in corerun). I was only able to get it right when I did it inside coreclr with the pal TRY/EXCEPT macros.

I was trying to make sure breadcrumb writer didn't fail even if the host exception handling was wrong.

@sdmaclea
Copy link
Author

sdmaclea commented Jun 13, 2019

the destructor is called. Would that no longer be the case now?

It would be the case if the smart_ptr destructor is called.
For the apparent case where the destructor is not called, it still would not be the case.

Edit - it would no longer be the case. If we cared, It could possibly be added back.

I am not particularly worried about the breadcrumbs finishing. These fatal exceptions should really only occur in development scenarios. I think of breadcrumbs as being important in a deployed scenario.

@sdmaclea
Copy link
Author

@pakrym Do you know how the asp.net iis host is configured to catch exceptions? Or how can we find out?

@pakrym
Copy link

pakrym commented Jun 17, 2019

cc @jkotalik

@sdmaclea
Copy link
Author

sdmaclea commented Jun 17, 2019

I think that means compiling with \EHa, not just using __try/__except (not sure - the docs are a bit complicated and I have not really dealt with the exception handling model).

My read of the docs says you are correct.

@steveharter's design for #4315 explicitly stated \EHa was required in the host

Looking at the IIS source the exception method seems to be set to \EHs. The destructors are not guaranteed to be called in this case and #4315 will have no affect.

This PR should fix the breadcrumb AV when there is a SEH (asynchronous exception) on the main thread.

Make breadcrumb_writer_t own m_files
Manage breadcrumb_writer_t with a shared pointer
Prevent premature destruction by adding a shared pointer for the background thread
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AV and AppVerifier failure when exception is being thrown in main
5 participants