Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in GlobalizationNative_GetSortHandle related to System.Globalization.GlobalizationMode #49073

Closed
radekdoulik opened this issue Mar 3, 2021 · 38 comments · Fixed by #53453
Assignees
Labels
area-System.Globalization linkable-framework Issues associated with delivering a linker friendly framework os-android
Milestone

Comments

@radekdoulik
Copy link
Member

Context: dotnet/android#5669

In above PR we are experiencing crash in GlobalizationNative_GetSortHandle function. I found out that it happens only in trimmed apps. After further investigation, it can be worked around by preserving System.Globalization.GlobalizationMode type.

It might be related to these recent changes: #47999, dotnet/android@9ac280c

The workaround: dotnet/android@438afbb

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Globalization untriaged New issue has not been triaged by the area owner labels Mar 3, 2021
@ghost
Copy link

ghost commented Mar 3, 2021

Tagging subscribers to this area: @tarekgh, @safern
See info in area-owners.md if you want to be subscribed.

Issue Details

Context: dotnet/android#5669

In above PR we are experiencing crash in GlobalizationNative_GetSortHandle function. I found out that it happens only in trimmed apps. After further investigation, it can be worked around by preserving System.Globalization.GlobalizationMode type.

It might be related to these recent changes: #47999, dotnet/android@9ac280c

The workaround: dotnet/android@438afbb

Author: radekdoulik
Assignees: -
Labels:

area-System.Globalization, untriaged

Milestone: -

@safern safern added os-android and removed untriaged New issue has not been triaged by the area owner labels Mar 3, 2021
@safern safern added this to the 6.0.0 milestone Mar 3, 2021
@safern
Copy link
Member

safern commented Mar 3, 2021

cc: @eerhardt @marek-safar

@safern safern added the linkable-framework Issues associated with delivering a linker friendly framework label Mar 3, 2021
@ghost
Copy link

ghost commented Mar 3, 2021

Tagging subscribers to 'linkable-framework': @eerhardt, @vitek-karas, @LakshanF, @tannergooding, @sbomer
See info in area-owners.md if you want to be subscribed.

Issue Details

Context: dotnet/android#5669

In above PR we are experiencing crash in GlobalizationNative_GetSortHandle function. I found out that it happens only in trimmed apps. After further investigation, it can be worked around by preserving System.Globalization.GlobalizationMode type.

It might be related to these recent changes: #47999, dotnet/android@9ac280c

The workaround: dotnet/android@438afbb

Author: radekdoulik
Assignees: -
Labels:

area-System.Globalization, linkable-framework, os-android

Milestone: 6.0.0

@eerhardt
Copy link
Member

eerhardt commented Mar 3, 2021

Just guessing by looking at the code, but is anything explicitly loading ICU on the xamarin android app?

What could be happening is the following line getting trimmed away:

And nothing else is loading ICU. So the ucol_open is NULL:

(*ppSortHandle)->collatorsPerOption[0] = ucol_open(lpLocaleName, &err);

@tarekgh
Copy link
Member

tarekgh commented Mar 3, 2021

@eerhardt is it possible to trim the line you pointed at even Invariant mode is off?

@eerhardt
Copy link
Member

eerhardt commented Mar 3, 2021

My assumption is that LoadICU() line is getting trimmed because I changed Xamarin apps to explicitly set <InvariantGlobalization>false</InvariantGlobalization>. That feature switch rewrites the method that calls into this line to just return false; all the time. Thus, this line isn't getting called anymore.

@eerhardt
Copy link
Member

eerhardt commented Mar 3, 2021

I can repro this outside of Xamarin, just on a plain linux machine using the latest SDK.

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <InvariantGlobalization>false</InvariantGlobalization>
  </PropertyGroup>

</Project>
using System;
using System.Configuration;

namespace ConsoleApp7
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(DateTime.Now);
        }
    }
}
$ dotnet publish -c Release -r linux-x64 -p:PublishTrimmed=true -p:TrimMode=link
$ ./bin/Release/net6.0/linux-x64/publish/ConsoleApp7
Segmentation fault (core dumped)

radekdoulik added a commit to dotnet/android that referenced this issue Mar 4, 2021
Bump net6 preview2 version, be in sync with iOS https://github.com/xamarin/xamarin-macios/blob/871e7b1cd0ca0e8434e94c8eedb168f33d5da2e8/Make.config#L500

Bump runtime pack version.

Added `System.Private.CoreLib.xml` as workaround for the crash in `GlobalizationNative_GetSortHandle`.
Context: dotnet/runtime#49073

Stop using `mono_register_config_for_assembly` and `mono_config_parse_memory` functions,
they are gone from net6 runtime.

`BuildReleaseArm64` test, net6 apk size difference before/after

Simple XA:
```
Summary:
  +           0 Other entries 0.00% (of 58,410)
  +           0 Dalvik executables 0.00% (of 316,728)
  -       7,699 Assemblies -1.10% (of 696,896)
  -      41,704 Shared libraries -0.80% (of 5,196,608)
  -      14,848 Uncompressed assemblies -1.09% (of 1,361,408)
  -      36,864 Package size difference -1.25% (of 2,946,926)
```
XF/XA:
```
Summary:
  +           0 Other entries 0.00% (of 917,033)
  +           0 Dalvik executables 0.00% (of 3,455,720)
  -      15,978 Assemblies -0.28% (of 5,682,747)
  -      41,704 Shared libraries -0.79% (of 5,271,800)
  -      31,232 Uncompressed assemblies -0.25% (of 12,712,960)
  -      45,056 Package size difference -0.45% (of 9,952,454)
```

Co-authored-by: Jonathan Peppers <jonathan.peppers@microsoft.com>
Co-authored-by: Marek Habersack <grendel@twistedcode.net>
@tannergooding
Copy link
Member

@eerhardt assumption looks to be correct. Modifying the program to be:

using System;
using System.Runtime.InteropServices;

namespace ConsoleApp7
{
    class Program
    {
        static void Main(string[] args)
        {
            LoadICU();
            Console.WriteLine(DateTime.Now);
        }

        [DllImport("libSystem.Globalization.Native", EntryPoint = "GlobalizationNative_LoadICU")]
        internal static extern int LoadICU();
    }
}

Allows everything to work as expected:

tagoo@GOODING-RYZEN:/mnt/d/Users/tagoo/Source/local/tmp$ /mnt/c/Users/tagoo/Downloads/dotnet-sdk-6.0.100-preview.3.21158.3-linux-x64/dotnet publish -c Release -r linux-x64 -p:PublishTrimmed=true -p:TrimMode=link
Microsoft (R) Build Engine version 16.10.0-preview-21126-01+6819f7ab0 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  All projects are up-to-date for restore.
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  tmp -> /mnt/d/Users/tagoo/Source/local/tmp/bin/Release/net6.0/linux-x64/tmp.dll
  Optimizing assemblies for size, which may change the behavior of the app. Be sure to test after publishing. See: https://aka.ms/dotnet-illink
  tmp -> /mnt/d/Users/tagoo/Source/local/tmp/bin/Release/net6.0/linux-x64/publish/
tagoo@GOODING-RYZEN:/mnt/d/Users/tagoo/Source/local/tmp$ ./bin/Release/net6.0/linux-x64/publish/tmp
03/08/2021 11:15:51

SortHandleCache.GetCachedSortHandle and/or the native code likely needs to be updated to account for invariant mode.

@tannergooding
Copy link
Member

Actually, I believe this is a linker bug in that it is trimming the static initialization logic for Invariant where it should not be given that it has side effects: dotnet/linker#1876

@MichalStrehovsky
Copy link
Member

The illink XML doesn't look right:

    <type fullname="System.Globalization.GlobalizationMode">
      < ... >
      <method signature="System.Boolean get_Invariant()" body="stub" value="false" feature="System.Globalization.Invariant" featurevalue="false" />
    </type>

I don't think we can make a promise that Invariant will always be false.

On Browser, we will fall back to Invariant if ICU cannot be loaded. Taking the ICU codepaths in that case won't lead to happiness:

int loaded = Interop.Globalization.LoadICU();
if (loaded == 0 && !OperatingSystem.IsBrowser())
{
// This can't go into resources, because a resource lookup requires globalization, which requires ICU
string message = "Couldn't find a valid ICU package installed on the system. " +
"Please install libicu using your package manager and try again. " +
"Alternatively you can set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support. " +
"Please see https://aka.ms/dotnet-missing-libicu for more information.";
Environment.FailFast(message);
}
// fallback to Invariant mode if LoadICU failed (Browser).
return loaded == 0;

This is not a publish-time constant, at least not in browser.

@marek-safar
Copy link
Contributor

On Browser, we will fall back to Invariant if ICU cannot be loaded.

This doesn't look like a good user experience to me and we should fix that. The message needs to be updated anyway as it does not make sense of other platforms too (e.g tvOS).

/cc @lewing

@eerhardt
Copy link
Member

eerhardt commented Mar 9, 2021

The illink XML doesn't look right:
I don't think we can make a promise that Invariant will always be false.

I don't follow. This is a feature switch that the developer (or SDK) set and explicitly said Invariant=false. When they set Invariant=false, we substitute GlobalizationMode.get_Invariant() to just return false.

@MichalStrehovsky
Copy link
Member

I don't follow. This is a feature switch that the developer (or SDK) set and explicitly said Invariant=false

Right, but the code as it is there can fallback to this returning true if ICU cannot load on Browser targets (see the quoted code above). If we want to keep that behavior, Invariant==false cannot be a publish-time constant because the generated code needs to be able to handle the "we wanted to use ICU but ICU didn't load so we use Invariant" case.

ICU doesn't load at all right now, so that needs to be fixed first of course.

@eerhardt
Copy link
Member

eerhardt commented Mar 9, 2021

cc @EgorBo who added the fallback case for Browser in #40765

@tannergooding
Copy link
Member

This would be trivial to handle and would integrate with existing code if we had a way to substitute app config switches rather than method bodies.

Most of the BCL, and other apps in several cases, rely on logic similar or in addition to the following initializing a static readonly field (including GlobalizationMode.get_Invariant):

if (!AppContext.TryGetSwitch(switchName, out bool ret))
{
    string? switchValue = Environment.GetEnvironmentVariable(envVariable);
    if (switchValue != null)
    {
        ret = bool.IsTrueStringIgnoreCase(switchValue) || switchValue.Equals("1");
    }
}

return ret;

This works and works nicely because the JIT optimizes static readonly bool to be a constant and cause downstream dead code elimination.

Many of the method bodies we are substituting have corresponding app config switches and potentially logic (like the above) that executes in addition to the replaced method body.

@MichalStrehovsky
Copy link
Member

MichalStrehovsky commented Mar 9, 2021

This would be trivial to handle and would integrate with existing code if we had a way to substitute app config switches rather than method bodies.

Yes, it would be the ideal user experience and avoid a lot of issues. It was discussed e.g. here: dotnet/linker#1112 (comment). It's problematic because illink needs to basically interpret the entire class constructors involved (the key thing is that it needs to understand the entire cctor - not just the small part that assigns the value - because the assignment could be behind complex control flow).

@tannergooding
Copy link
Member

because the assignment could be behind complex control flow

Could this be handled in two parts?

That is, allow the user to specify that an app config switch is x. For simple cases (like how most of the BCL uses the value) AppContext.TryGetSwitch(switchName, out bool ret) is should be able to flow the value.
For more complex cases, the substituter could be told that GlobalizationMode.get_Invariant is returning switchName. The backing field and associated initializer would still be preserved, but it tells the linker the association where it may not be able to determine it itself.
The logic should still ultimately be correct because the relevant AppContext.TryGetSwitch() call returned the correct value and therefore the field should have been assigned accordingly.

@marek-safar
Copy link
Contributor

What is the pattern the JIT can optimize? Are you saying this https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/LocalAppContextSwitches.cs is not something JIT can see through?

@tannergooding
Copy link
Member

What is the pattern the JIT can optimize?

The JIT optimizes static readonly fields for primitive types and has limited support for certain other types.

Are you saying this https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/LocalAppContextSwitches.cs is not something JIT can see through?

Right, that will not be optimized AFAIK. The JIT has to assume non readonly static fields could be mutated and so will not optimize them.

If it was instead the following (like GlobalizationMode.Invariant does), then it would be a static readonly field and could be:

public static bool SerializationGuard { get; } = GetSwitchValue("Switch.System.Runtime.Serialization.SerializationGuard");

@tannergooding
Copy link
Member

What are the other options?

I don't think there are many if any at all. I believe you either end up with something like the linker is doing today which is a "new concept" and allows the root block to be optimized as "appropriate" but which may require the .NET code to be refactored to account for it or you teach the linker to work with the existing .NET optimization concepts.

For this particular case, it could be transitioned to an explicit static constructor that ensures LoadIcu happens, but that may have undesirable impact to the JIT scenario.
The relevant area owners would need to chime in on what they think is appropriate.

@lewing
Copy link
Member

lewing commented Mar 9, 2021

cc @EgorBo who added the fallback case for Browser in #40765

If memory serves, the browser fallback here was primarily added to keep things limping along while we were integrating the icu data loading changes through the stack. If the fallback is causing issues we can probably turn it back into an error now (and update the error message).

@tannergooding
Copy link
Member

@safern, @tarekgh; do you see any problems with refactoring this to have an explicit static constructor that calls LoadIcu?

As best as I can tell, GlobalizationMode is internal and only has two non-private things exposed (the UseNls and Invariant properties). Both of these cause the static constructor to run on Windows but only Invariant does on Unix.
So I believe switching to a static constructor here is "safe".

I think it would be refactored so that GetGlobalizationInvariantMode() only reads the app context switch/environment variable
The static ctor itself would then call LoadIcu and would fail fast if it couldn't be loaded (just like it does today). How it behaves for Browser when it can't be loaded seems like its being discussed by @eerhardt and @lewing above.

@eerhardt
Copy link
Member

eerhardt commented Mar 9, 2021

If memory serves, the browser fallback here was primarily added to keep things limping along while we were integrating the icu data loading changes through the stack. If the fallback is causing issues we can probably turn it back into an error now (and update the error message).

I've opened #49391 to track this.

@tarekgh
Copy link
Member

tarekgh commented Mar 9, 2021

@tannergooding I don't see a problem if we guarantee the order of execution of other static globalization properties. We need to ensure LoadICU is called as early as possible.
CC @jkotas to advise if he sees any issue moving LoadICU to a static constructor.

One side point, there is more properties will be added to the GlobalizationMode #47301.

@jkotas
Copy link
Member

jkotas commented Mar 9, 2021

If you move LoadIcu to explicit static constructor on Unix, internal static bool UseNls => false; will need to trigger that static constructor (it does not need to trigger it today). It would have bad side-effects as well. It won't be possible to optimize UseNls into nothing as it gets optimized today.

@eerhardt
Copy link
Member

eerhardt commented Mar 9, 2021

Just throwing this out a possible solution:

We could change my initial change (#47999) to be browser-wasm specific. It saves roughly 4 KB (.br compressed) in the default blazorwasm template, which is worthwhile for that environment. But for other targets (even Xamarin), it doesn't seem like a significant gain.

@tannergooding
Copy link
Member

We could change my initial change (#47999) to be browser-wasm specific.

Is it possible for LoadIcu to fail on wasm? The current code seems to indicate that it might be and so returning false is potentially problematic.

Likewise, I think it would be good to determine how to handle these issues long term. This is one case where the method body replacement is problematic and more may come up as the targeted trimmings get more complex.

@jkotas
Copy link
Member

jkotas commented Mar 9, 2021

I think you can make the explicit static constructor work well if you split GlobalizationMode into two types: One type that has just the invariant mode flag and that will be responsible for initializing the ICU as appropriate, and the second type has the rest of the globalization properties.

@jkotas
Copy link
Member

jkotas commented Mar 9, 2021

This is one case where the method body replacement is problematic and more may come up as the targeted trimmings get more complex.

Right, you need to be creative to fit the pattern that method body replacement is able to handle. We have to do refactoring pretty much every time we try to make something trimmable.

@sbomer
Copy link
Member

sbomer commented Mar 9, 2021

Another alternative, if we want to keep the fallback behavior where Invariant=false isn't really constant, might be to substitute GetInvariantSwitchValue to return false instead. The downside is that it goes against the established pattern for feature switches - I'd prefer removing the fallback for Invariant=false if it's reasonable.

@eerhardt
Copy link
Member

eerhardt commented Mar 9, 2021

might be to substitute GetInvariantSwitchValue to return false instead

The issue there is all the code in CoreLib is switching off of GlobalizationMode.Invariant. So unless we substitute that property getter, we won't trim any of the unused code paths that do:

if (GlobalizationMode.Invariant)
{
return InvariantGetSortKey(source, destination, options);
}
else
{
return GetSortKeyCore(source, destination, options);
}

@sbomer
Copy link
Member

sbomer commented Mar 10, 2021

Right - I'm only mentioning it as another option in case we want to keep the browser fallback that tries to set Invariant=true if ICU fails to load. In this case I assume we'd need to keep both branches when InvariantGlobalization=false at publish time, one in case the load succeeds, one in case it fails. (The substitution for InvariantGlobalization=true would stay as-is and still lets the false branch be trimmed.)

@eerhardt
Copy link
Member

In this case I assume we'd need to keep both branches when InvariantGlobalization=false at publish time, one in case the load succeeds, one in case it fails.

That would completely undo any size gains made by #47999

@safern
Copy link
Member

safern commented Mar 10, 2021

Another scenario to consider here is App Local ICU. App Local ICU does more than just calling LoadICU and that is also handled when initializing the Invariant property value. So we should support the scenario of a trimmed app using App Local ICU as well.

That currently happens here:

if (TryGetAppLocalIcuSwitchValue(out string? icuSuffixAndVersion))
{
LoadAppLocalIcu(icuSuffixAndVersion);
}

@jonpryor
Copy link
Member

The Xamarin.Android team re-encountered this issue, or a variation on it, in: dotnet/android#5954

Our "workaround" was to effectively revert @eerhardt 's change in dotnet/android@9ac280c, and instead leave $(InvariantGlobalization) as "unset" (empty string) unless it has the value true:

<InvariantGlobalization Condition="'$(InvariantGlobalization)' != 'true'"></InvariantGlobalization>

The crash we saw @eerhardt 's previous comment:

And nothing else is loading ICU. So the ucol_open is NULL

Is there anything we need to do to properly fix and resolve this issue?

@steveisok
Copy link
Member

For what it's worth, in a P5 runtime branch, a trimmed GlobalizationMode.Invariant with InvariantGlobalization=false looks like:

internal static bool Invariant {
        [CompilerGenerated]
        get {
            return false;
        }
    } = GetGlobalizationInvariantMode ();

Whereas the trimmed version in XA looks like:

internal static bool Invariant => false;

Are we (or am I) pointing at different versions of the linker?

@eerhardt eerhardt assigned eerhardt and unassigned tannergooding May 28, 2021
eerhardt added a commit to eerhardt/runtime that referenced this issue May 28, 2021
Move the LoadICU logic into a static ctor, so it runs early in the process, but not as part of querying the GlobalizationMode.Invariant property. This allows for LoadICU to still be invoked, even when the app is trimmed and GlobalizationMode.Invariant is hard-coded by the linker.

While I was changing this code, I also removed the workaround in Browser builds for swallowing errors being returned from LoadICU.

Fix dotnet#49073
Fix dotnet#49391
@eerhardt
Copy link
Member

FYI - I am taking a look at this. I have the beginnings of a fix here: https://github.com/eerhardt/runtime/tree/FixInvariantModeTrimming. Writing tests now.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 28, 2021
eerhardt added a commit that referenced this issue Jun 8, 2021
* Fix InvariantGlobalization=false in a trimmed app.

Move the LoadICU logic into a static ctor, so it runs early in the process, but not as part of querying the GlobalizationMode.Invariant property. This allows for LoadICU to still be invoked, even when the app is trimmed and GlobalizationMode.Invariant is hard-coded by the linker.

While I was changing this code, I also removed the workaround in Browser builds for swallowing errors being returned from LoadICU.

Fix #49073
Fix #49391

* Add trimming tests for InvariantGlobalization

* Update the LoadICU message for mobile workloads.

* Respond to PR feedback.

- Split the substitutions of GlobalizationMode.Invariant between true and false
- Add a trimming test that ensures Invariant=true trims everything it is supposed to

* Add checks for all mobile platforms
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 8, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 8, 2021
jonathanpeppers added a commit to jonathanpeppers/xamarin-android that referenced this issue Aug 27, 2021
The linked runtime issue is fixed:

dotnet/runtime#49073

We should be able to remove this now.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Globalization linkable-framework Issues associated with delivering a linker friendly framework os-android
Projects
None yet
Development

Successfully merging a pull request may close this issue.