AssemblyLoadContext crash on linux during exception processing #81108
Comments
@elinor-fung are you aware of this bug? It's getting to a point where we're having to disable most assembly loading tests on Linux because it's blocking our PRs. That's a significant gap I'd like to avoid if possible, but no one seems to be looking at this.
This also has the added benefit that the entire .NET Core set of tests runs in less than one second. Previously it took close to half a second per test. This will hopefully alleviate the pain for the runtime issue we're hitting running `AssemblyLoadContext` tests in CI. This reduces the number of compilations on this code path, which seems to be the trigger for the failure. - dotnet#66621 - dotnet/runtime#81108
Thanks @jaredpar - I had not seen this. Taking a look.
For the core dump that has the stack in the description: precode.cpp@676 is when we try to assign to … (runtime/src/coreclr/vm/precode.cpp, lines 670 to 676 at 82f7100), where the data should be one page after this = 0x00007f9b3ef04ff0. That maps to …
If I'm looking at this correctly, it seems like part of the memory for that … (runtime/src/coreclr/vm/precode.h, lines 199 to 204 at 4f01044).
I'm not actually sure if …
@janvorli any ideas / does this look familiar at all?
I will try the repro @jaredpar shared and patch the runtime with the ported fix to see if it is the same thing.
@jaredpar which branch is your commit in? Just cloning your repo and running …
Looking at the dump in detail, it actually looks like a slightly different issue than the one fixed by #81192, although at the same place. It looks like the memory region allocated from the new stub heap crossed the page boundary, which should never happen as the heap uses interleaved code / data pages.
@janvorli did you clone roslyn or my fork? My fork is where the commit is: https://github.com/jaredpar/roslyn/commits/e107da471048f014c8ddf3517c9c79a24a1a5adb
I've cloned your fork (using the url you've shared before)
By default it cloned branch …
When I open the commit on github, it says: |
I was able to repro it locally after @jaredpar helped me to get the right state of the repo. The primary cause of the issue is OOM, but a special kind of it. Linux has a maximum number of memory mappings per process that is by default set to about 64k mappings. This test ends up creating more of them, so further mapping attempts fail.

There is a workaround though. The maximum number of mappings per process can be enlarged. For example, raising the number to 2048000 can be done as follows:

sudo bash -c "sysctl -w 'vm.max_map_count=2048000'"

I have verified that with the setting raised to 128000, the failing test consistently succeeds. 96000 was not sufficient; I didn't try to bisect it further. Setting the limit higher has no impact on memory consumption or performance.
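As a side note (not from the thread, just a hedged sketch): on Linux you can compare a process's current mapping count against the `vm.max_map_count` limit by reading procfs. The file paths below are standard procfs locations; the warning threshold and everything else is illustrative.

```csharp
// Rough diagnostic sketch (Linux only): compare this process's mapping count
// against the system-wide vm.max_map_count limit. Paths are standard procfs
// files; the 90% warning threshold is an arbitrary illustrative choice.
using System;
using System.IO;
using System.Linq;

int limit = int.Parse(File.ReadAllText("/proc/sys/vm/max_map_count").Trim());
int used = File.ReadLines("/proc/self/maps").Count();

Console.WriteLine($"mappings in use: {used}, limit: {limit}");
if (used > limit * 0.9)
    Console.WriteLine("warning: close to the mapping limit; further mmap calls may fail");
```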
This change should also alleviate the underlying OOM situation as it dramatically reduces the number of allocations that happen and reduces the assemblies that get loaded into the `AssemblyLoadContext` instances.

Still, I'm a bit surprised that we were hitting an OOM in the original code. Yes, we create a lot of compilations, but in the grand scheme of our unit tests it's a drop in the bucket. The item that is significantly different from the rest of our tests is the number of non-collectible `AssemblyLoadContext` instances.
For every 170 jitted functions, we need 2 mappings for precode stubs. For each assembly, we need one mapping per section. @jaredpar do you have an estimate of how many assemblies get loaded when running the test and how many methods get jitted? From what you've said, it seems that the version I was testing was using non-collectible AssemblyLoadContexts. So each assembly that got loaded into one, and all of its dependencies that were loaded in there too, would contribute to that.
I think there were roughly 20-30 assemblies that got loaded into every `AssemblyLoadContext`. If the suspicion is that JIT'd methods are the issue, then that lines up with what I'm seeing. What makes this group of tests different from our others is the amount of …
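To make the arithmetic above concrete, here is a hedged back-of-the-envelope sketch. Only the "2 mappings per 170 jitted methods" and "one mapping per section per assembly" ratios come from the comments above; every numeric input below is an illustrative guess, not a measurement from the Roslyn test run.

```csharp
// Back-of-the-envelope estimate of memory mappings. The ratios come from the
// thread above; the input numbers are hypothetical placeholders.
using System;

int contexts = 300;              // guessed number of AssemblyLoadContexts created in a run
int assembliesPerContext = 25;   // "roughly 20-30 assemblies" per context (from the thread)
int sectionsPerAssembly = 4;     // assumed sections per loaded assembly
int jittedMethods = 100_000;     // assumed total jitted methods across the run

int assemblyMappings = contexts * assembliesPerContext * sectionsPerAssembly;
int precodeMappings = (jittedMethods + 169) / 170 * 2;   // 2 mappings per 170 jitted methods

Console.WriteLine($"estimated mappings: {assemblyMappings + precodeMappings}");
Console.WriteLine("typical default vm.max_map_count is roughly 65530");
```

Even with modest guesses like these, the total lands in the tens of thousands, which is the same order of magnitude as the default limit.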
* Continued analyzer assembly loading work

  This change responds to several concerns raised in #66492:
  - Collapse the `AnalyzerAssemblyLoader` hierarchy by one level and move more logic into the base type.
  - Fix `AssemblyLoadContext` in tests to not replicate types. That means we can safely pass `ITestOutputHelper` into the custom `AssemblyLoadContext`.
  - Further strengthen the test separation.

  I have deliberately not changed the file names to reflect the new type names because I wanted the review process to be smooth. Once I have two sign-offs for the change I will send another commit which does the following:
  - Rename:
    - DefaultAnalyzerAssemblyLoaderTests.cs -> AnalyzerAssemblyLoaderTests.cs
    - DefaultAnalyzerAssemblyLoader.Core.cs -> AnalyzerAssemblyLoader.Core.cs
    - DefaultAnalyzerAssemblyLoader.Desktop.cs -> AnalyzerAssemblyLoader.Desktop.cs
    - DefaultAnalyzerAssemblyLoader.cs -> AnalyzerAssemblyLoader.cs
  - Move:
    - `DefaultAnalyzerAssemblyLoader` into DefaultAnalyzerAssemblyLoader.cs
    - `InvokeUtil` to InvokeUtil.cs. As constructed, every edit to this type causes Test Explorer to lose context on the theory branches, and it doesn't come back until a rebuild. Editing will be smoother in its own file.

  That should get us into the style we expect here. After that I will merge once tests are green.

* Unload AssemblyLoadContext after test completion

* Reuse AssemblyLoadTestFixture across test runs

  This also has the added benefit that the entire .NET Core set of tests runs in less than one second. Previously it took close to half a second per test. This will hopefully alleviate the pain for the runtime issue we're hitting running `AssemblyLoadContext` tests in CI. This reduces the number of compilations on this code path, which seems to be the trigger for the failure. - #66621 - dotnet/runtime#81108

* Apply suggestions from code review

* PR feedback

* Rename the files

* Fixup

Co-authored-by: Rikki Gibson <rikkigibson@gmail.com>
Is this issue still actionable, or can it be closed?
This is no longer a blocker for the compiler after we moved to collectible `AssemblyLoadContext` instances.
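For reference, a minimal sketch of that pattern (not the actual Roslyn test code; the context name, assembly path, and helper names are hypothetical): a collectible `AssemblyLoadContext` that is unloaded when the test finishes, so its assemblies and their mappings can eventually be reclaimed.

```csharp
// Sketch of a per-test collectible AssemblyLoadContext that is unloaded afterwards.
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.Loader;

sealed class TestLoadContext : AssemblyLoadContext
{
    public TestLoadContext() : base(isCollectible: true) { }
}

static class Example
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference RunIsolated(string assemblyPath)
    {
        var alc = new TestLoadContext();
        Assembly asm = alc.LoadFromAssemblyPath(assemblyPath);
        // ... exercise the test code against types from 'asm' here ...
        alc.Unload();                       // allowed because the context is collectible
        return new WeakReference(alc);
    }

    static void Main()
    {
        WeakReference wr = RunIsolated("/path/to/SomeAnalyzer.dll"); // hypothetical path
        for (int i = 0; wr.IsAlive && i < 10; i++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
        Console.WriteLine($"context collected: {!wr.IsAlive}");
    }
}
```

The `NoInlining` helper plus a `WeakReference` is the usual way to make sure no local reference keeps the context alive while waiting for the GC to actually collect it.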
We can close it then.
Description
The compiler team is seeing a runtime crash when running our analyzer loading tests on Linux. The same tests pass on Windows and Mac. The basic setup is that our tests do the following:

- Create an `AssemblyLoadContext` for the purpose of isolating the individual test. This loads the compiler binaries and many test assets.
- Create several more `AssemblyLoadContext` instances in order to load our analyzer dependencies (sketched below).

On Linux this is resulting in a runtime crash.
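A rough sketch of that setup, with hypothetical paths and context names (this is not the actual Roslyn test code):

```csharp
// One AssemblyLoadContext per test for isolation, plus several more contexts
// created by the test itself for analyzer dependencies. All paths are hypothetical.
using System.Runtime.Loader;

var testContext = new AssemblyLoadContext("TestIsolation");
testContext.LoadFromAssemblyPath("/path/to/Microsoft.CodeAnalysis.dll");        // compiler binaries

for (int i = 0; i < 3; i++)
{
    var analyzerContext = new AssemblyLoadContext($"Analyzer{i}");
    analyzerContext.LoadFromAssemblyPath($"/path/to/analyzers/Analyzer{i}.dll"); // analyzer dependency
}
```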
Reproduction Steps
Run the test `Microsoft.CodeAnalysis.UnitTests.DefaultAnalyzerAssemblyLoaderTests.AssemblyLoading_DependencyInDifferentDirectory` on Linux. Couple of notes:

- The repro is in https://github.com/jaredpar/roslyn
- This is a `Theory` test and we're unsure if the crash happens when the `bool` is `true` or `false`.
Expected behavior
The test should pass. Based on the stack, it may be throwing an exception that indicates an error on our part. Even in the case of an exception, though, we would expect it to exit with an exception that has a stack trace.
Note: if there is an exception being thrown, then it's quite possible that it's an exception type whose assembly is rooted in one of the child `AssemblyLoadContext` instances. The same assembly is going to be loaded in the default `AssemblyLoadContext`. Wanted to call that out in case it helps with the investigation.
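To illustrate why that matters (a hedged sketch, not from the issue; the assembly path and type name are hypothetical): when the same assembly is loaded into two different `AssemblyLoadContext`s, the runtime treats the types as distinct identities, so for example a catch clause compiled against the default-context type would not match an instance created from the child-context copy.

```csharp
// Hypothetical demonstration of type identity splitting across AssemblyLoadContexts.
using System;
using System.Reflection;
using System.Runtime.Loader;

string path = "/path/to/MyExceptions.dll";                     // hypothetical assembly

Assembly inDefault = AssemblyLoadContext.Default.LoadFromAssemblyPath(path);
var child = new AssemblyLoadContext("child", isCollectible: true);
Assembly inChild = child.LoadFromAssemblyPath(path);

Type a = inDefault.GetType("MyExceptions.CustomException")!;   // hypothetical type name
Type b = inChild.GetType("MyExceptions.CustomException")!;

Console.WriteLine(a == b);                                     // False: different identities
Console.WriteLine(a.Assembly == b.Assembly);                   // also False
```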
Actual behavior
The runtime crashes. The stack of the crash is as follows:
Regression?
Unknown
Known Workarounds
None
Configuration
Here is the SDK / runtime info
Other information
@AaronRobinsonMSFT helped a lot in getting this info