Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flush instruction cache after thunk pool allocation #75393

Merged
merged 2 commits into from
Sep 10, 2022

Conversation

jkotas
Copy link
Member

@jkotas jkotas commented Sep 10, 2022

Fixes #74710

@filipnavara
Copy link
Member

I will rebase #75264 if this lands first.

@filipnavara
Copy link
Member

I noticed the same issue when fixing the code for macOS ARM64 so I am actually glad to have a more complete fix. Not being too familiar with the ARM64 architecture I have a follow-up question. Since the data pages are written separately (from managed code) is there any need to flush data cache for these pages too? I originally came to the conclusion that it should not be necessary but since you are way more familiar with the code I assume you'll know better.

@jkotas
Copy link
Member Author

jkotas commented Sep 10, 2022

The data pages will get written to when the marshalled delegate is created. Locks taken during that process should take care of the data catches flushing.

@VSadov
Copy link
Member

VSadov commented Sep 10, 2022

We probably see Windows issues due to this as well, perhaps not as often for some reason.

@jkotas
Copy link
Member Author

jkotas commented Sep 10, 2022

We probably see Windows issues due to this as well, perhaps not as often for some reason.

Yes.

This path is only hit by marshalled delegates. We tried to rid of marshalled delegates from everywhere in product code and replace them with function pointers. We have hit the bug in libraries test only because this one marshalled delegate was missed during the conversion. I am going to submit a separate PR to convert this marshalled delegate to a function pointer.

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! It was probably hard to figure the root cause.

@VSadov
Copy link
Member

VSadov commented Sep 10, 2022

With this and the dealings with JIT memory on OSX in the other PR, I wonder - is there a configuration where NativeAOT does not use executable memory? (even if with some tradeoffs)

@jkotas
Copy link
Member Author

jkotas commented Sep 10, 2022

is there a configuration where NativeAOT does not use executable memory? (even if with some tradeoffs)

We have a configuration under ifdef (that is currently disabled) that compiles these entrypoints into the binary as static code, and then creates copies of this static code if necessary using file mapping. It is not very portable and makes the binary bigger. We can make it work if somebody asks for it.

@MichalPetryka
Copy link
Contributor

is there a configuration where NativeAOT does not use executable memory? (even if with some tradeoffs)

We have a configuration under ifdef (that is currently disabled) that compiles these entrypoints into the binary as static code, and then creates copies of this static code if necessary using file mapping. It is not very portable and makes the binary bigger. We can make it work if somebody asks for it.

I'd assume that mode is used by the teams that made NativeAOT work on consoles that are AOT only.

@jkotas
Copy link
Member Author

jkotas commented Sep 10, 2022

I'd assume that mode is used by the teams that made NativeAOT work on consoles that are AOT only.

Yep. Or they just avoid marshalled delegates to avoid the problem altogether.

@jkotas jkotas merged commit 1f94e11 into dotnet:main Sep 10, 2022
@jkotas
Copy link
Member Author

jkotas commented Sep 10, 2022

/backport to release/7.0

@jkotas jkotas deleted the flushinstructioncache branch September 10, 2022 23:14
@github-actions
Copy link
Contributor

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3029769600

@bartonjs
Copy link
Member

This path is only hit by marshalled delegates. We tried to rid of marshalled delegates from everywhere in product code and replace them with function pointers.

Does that mean there's some test coverage missing somewhere for the scenario? If it took us two weeks to figure out what was wrong I'd hate to think what it would mean for the average customer.

@jkotas
Copy link
Member Author

jkotas commented Sep 12, 2022

Does that mean there's some test coverage missing somewhere for the scenario?

We do have a test coverage. The crash is caused by a race condition with processor instruction cache. It is impossible to write tests that would reliably catch these types of race conditions.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

System.Net.* and System.Security.* crash on NativeAOT arm64
6 participants