Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TPA map hash calculation in case of "the Turkish-I Problem" if locale was changed. #37910

Closed
viewizard opened this issue Jun 15, 2020 · 5 comments · Fixed by #38016
Closed

TPA map hash calculation in case of "the Turkish-I Problem" if locale was changed. #37910

viewizard opened this issue Jun 15, 2020 · 5 comments · Fixed by #38016
Labels
area-AssemblyLoader-coreclr untriaged New issue has not been triaged by the area owner

Comments

@viewizard
Copy link
Member

We are faced with issue, when CoreCLR calculate TPA hash for map entry with not Turkish and not Azeri locale and call TPA map lookup after locale was changed to Turkish or Azeri:

const SimpleNameToFileNameMapEntry *pTpaEntry = tpaMap->LookupPtr(simpleName.GetUnicode());

The point of issue is "the Turkish-I Problem" (https://docs.microsoft.com/en-us/previous-versions/dotnet/articles/ms973919(v=msdn.10)?redirectedfrom=MSDN#the-motivation-the-turkish-i-problem). After locale changed, towupper() provide another result for i and different hash are calculated in case if file name have iletter.

All works fine (for Linux) if hash function HashiString() replaced by HashString() (case-sensitive, without towupper() call).
Is the any restrictions for HashString() usage here instead of HashiString() in case of Linux with case sensitive filesystems? Or, could we fix this in another way?

CC @alpencolt @jkotas

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-VM-coreclr untriaged New issue has not been triaged by the area owner labels Jun 15, 2020
@ghost
Copy link

ghost commented Jun 15, 2020

Tagging subscribers to this area: @vitek-karas
Notify danmosemsft if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented Jun 15, 2020

Comparison of assembly names is case insensitive throughout the stack. Switching this one to case sensitive comparison would introduce a different set of inconsistencies.

Can this be fixed by using SString::HashCaseInsensitive instead of HashiString?

Ideally, the assembly binder would be written in C# where we have the well tested globalization functions available, but that's quite a bit of work.

@HJLeee
Copy link
Contributor

HJLeee commented Jun 15, 2020

I think both SString::HashCaseInsensitive and HashiString are using tolower series.
(In case of SELF_NO_HOST for SString::HashCaseInsensitive)

Basically, if string comparisons(or calculation for the strings) are made between a cached case insensitive one based on the initial locale and a newly converted string based on the current locale, there is a chance of malfuntion regardless of changing this to case sensitive.
If there is a background reason for comparing strings should be done case insensitive way, tolower_l and towlower_l with a cached locale could be an alternative.

@jkotas
Copy link
Member

jkotas commented Jun 16, 2020

The SELF_NO_HOST is not used for the runtime itself. It is used only for tools.

@viewizard
Copy link
Member Author

I tested SString::HashCaseInsensitive in SimpleNameToFileNameMapTraits::Hash with some simple changes like this:

        static count_t Hash(const key_t &str) { 
            SString tmp(str);
            return tmp.HashCaseInsensitive();
        }

looks like all works fine with this changes. I could see, that generated hashes not connected to locale any more (tested with us, tr and az locales), we could avoid "the Turkish-I Problem" in this way.

jkotas pushed a commit that referenced this issue Jun 25, 2020
* Fix TPA map hash calculation.

The point of issue is "the Turkish-I Problem". After locale changed, towupper() provide another result for "i" and different hash are calculated in case if file name have "i" letter.

* Regression test for #37910
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-AssemblyLoader-coreclr untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants