Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encapsulate Field refactoring can result in incorrect case conversions depending on the user's locale #5524

Closed
pkanavos opened this issue Sep 29, 2015 · 17 comments
Assignees
Labels
Area-IDE Bug Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented Tenet-Localization Some piece of UI isn’t localized, often due to hard-coding of strings or other visible elements.
Milestone

Comments

@pkanavos
Copy link

As shown in this SO question, applying the Encapsulate field on a field starting with i, eg inside, results in a property named İnside instead of Inside if the user's locale is set to Turkish.

This is caused because AbstractEncapsulateFieldService.GeneratePropertyName makes a call to char.ToUpper() without specifying the invariant culture. As a result, the case conversion is culture-specific, using the current user's locale. The case of the Turkish i is even described in the MSDN Article Writing Culture-Safe Managed Code.

The fix should be to change line 362 to pass CultureInfo.InvariantCulture to ToUpper(), eg:

return char.ToUpper(baseName[0],CultureInfo.InvariantCulture).ToString() + baseName.Substring(1);

instead of :

return char.ToUpper(baseName[0]).ToString() + baseName.Substring(1);
@Pilchie Pilchie added the Resolution-By Design The behavior reported in the issue matches the current design label Sep 29, 2015
@Pilchie
Copy link
Member

Pilchie commented Sep 29, 2015

This is actually By Design. As identifiers are user code, they should obey they user's locale. The bug here is that VS2013 did NOT follow the culture.

@Pilchie Pilchie closed this as completed Sep 29, 2015
@svick
Copy link
Contributor

svick commented Sep 29, 2015

@Pilchie Many (vast majority?) developers use English identifiers, even if their native language is not English and they're not working in an English-speaking country (so it is likely that their OS locale is not English).

Are you saying that any developer who uses English identifiers should change their OS locale to en-US?

@pkanavos
Copy link
Author

@Pilchie I think you misunderstood the issue, it's about language not culture. The question in SO was caused because the refactoring incorrectly converted an english field name while his account was using Turkish locale settings. This was a complete and unexpected surprise to the developer. The developer typed inside and obviously expected a property called Inside, not İnside. At the very least, this is unsafe behavior according to MSDN's own guidelines.

I seriously doubt whether any developer wants a refactoring to arbitrarily modify characters from one codepage (Latin) to another, completely unsuitable codepage. Or from correctly spelled English to a mixed-language work.

In any case this has nothing to do with culture - just because the user's culture is set to one locale, doesn't mean that any text typed is in that language. Word's spell checking for example, doesn't assume that all text is in a specific language simply because the user's locale is set to it. It tries to detect the correct language before marking spelling errors.

If you want to provide language-specific refactorings but can't detect the correct language to use, add a setting that will allow the user to select the correct language. Or add a setting to disable the locale-specific behavior. Otherwise, the refactoring should do the expected thing and not change the text's codepage.

I should also note that in the past decade at least, Microsoft tried to localize various development settings, tools, languages, tutorials etc and found that non-US developers simply weren't interested.

@thepeekay
Copy link

@Pilchie I kinda like to chip in here as well. This does look like a bug and not by design.

This either has to be a setting or we should auto-detect the language and don't convert English strings during refactorings.

Lastly, I believe the fix should be Ordinal not Invariant.

@davkean
Copy link
Member

davkean commented Oct 2, 2015

@pkanavos @pkefal The behavior you are see is currently acting "by design" when it uses the current culture to convert user typed strings and when searching (via Ctrl+, or inside the IntelliSense drop down). This is how applications work across Windows; they use the currently set Region settings (represented in .NET by "CurrentCulture") and use it to format, parse, and convert anything that displayed to, or entered by, the user.

In saying that, however, I can see how this behavior might be unexpected for those that have their locale settings set to a particular region for the rest of Windows, but code in English or use English identifiers. There's not a really a good/easy way for us to detect the language of given identifier, but I'm in agreement that we should probably reconsider this, but in a way that does not regress existing behavior. For example, if we blindly changed to it use InvariantCulture (which is effectively en-US) we would incorrectly convert the Turkish word; işte to Işte instead of the real word İşte.

As a follow-up to a couple of comments above:

At the very least, this is unsafe behavior according to MSDN's own guidelines.

I think you might be misinterpreting what we call out on MSDN, it is only unsafe to use the current culture for comparisons when using it to make security decisions. Our way of using CurrentCulture is the intended usage of it.

Lastly, I believe the fix should be Ordinal not Invariant.

There's no concept of an ordinal ToUpper/ToLower conversion, ordinal means "byte-to-byte" and is only relevant when comparing the bytes of particular sequence. OrdinalIgnoreCase really means "do a byte-to-byte comparison after converting it to upper using Invariant".

@davkean davkean reopened this Oct 2, 2015
@davkean davkean removed the Resolution-By Design The behavior reported in the issue matches the current design label Oct 2, 2015
@thepeekay
Copy link

@davkean After your explanation, I understand why you mention the OrdinalIgnoreCase is not the way. This not really a comparison in the end. Glad you did recognize that this behavior might be unexpected and reconsider this.

@davkean
Copy link
Member

davkean commented Oct 2, 2015

Tagging @rchande, this is very similar to the Microsoft Connect bug you are currently looking at.

@rchande
Copy link
Contributor

rchande commented Oct 2, 2015

I wonder if we should try @jasonmalinowski's suggestion of doing invariant culture conversion if the identifier contains only Latin characters. That seems like it might be safe here, but I'm not sure if we could do the same for completion.

@davkean
Copy link
Member

davkean commented Oct 2, 2015

@rchande That's making the assumption that all words in Turkish (or other language) are made up of non-Latin characters, which is not true. For example, this test would fail for the Turkish word for good ("iyi").

@jinujoseph jinujoseph added the Tenet-Localization Some piece of UI isn’t localized, often due to hard-coding of strings or other visible elements. label Oct 2, 2015
@Pilchie Pilchie added this to the 1.1 milestone Oct 5, 2015
@yunyilmaz
Copy link

In addition to this issue there is also another problem. When I make a property Inside whose backing field is inside Intellisense doesn't work as expected
Lets say i key is pressed from Turkish(Q) keyboard(There is also an ı key in the keyboard) Intellisense doesn't choose Inside property. It selects it only when ı key is pressed. This is rather disturbing as well as this issue because as in the previous issue it is the usual behavior pre 2015 releases

@davkean
Copy link
Member

davkean commented Oct 5, 2015

@yunyilmaz Thanks. That's the bug/behavior that's been referred to in the Connect bug that I hinted to @rchande about.

As it appears you use Turkish settings, can I ask you a question, what if the property name was "İstanbul" (spelt with a dotted i), would you expect the dotless "ı" to select it?

@yunyilmaz
Copy link

@davkean No I would still expect dotted i to select it. Even we use Turkish locale we still expect the environment to behave as our locale is English

@davkean
Copy link
Member

davkean commented Oct 5, 2015

To try and understand expectations, I've drawn a table with my understandings from this thread.

Under Turkish locale settings, typing the letter on the vertical column, should select the word on the horizontal column in IntelliSense (currently assuming no other word in IntelliSense starting with dotless/dotted I):

↓Letter/Word→ index (en) Index (en) ırak (tr) Irak (tr) işte (tr) İşte (tr)
i
İ
ı
I

Above is not a design (I'm not even sure we're able to do above), just trying to understand expectations from which we can then come up with a design.

Turkish in the thread, does this match your expectation? Or is your expectation something different?

@SedarG
Copy link

SedarG commented Oct 5, 2015

@davkean, the table reflects what I'd expect.

@davkean
Copy link
Member

davkean commented Oct 5, 2015

@SedarG What about 'İ' matching to 'index' - I wrote that in the above table, but I'm now second guessing myself. Should that match?

@yunyilmaz
Copy link

@davkean Yes I think it should match. Because Upper version of i is İ . But to provide this kind of functionality is not a trival task. Lets say you press little i key you checked Index(en) but not Irak(tr) in the table. I don't think it is possible for IDE to know Irak is a turkish word.

@Pilchie
Copy link
Member

Pilchie commented Oct 30, 2015

Fixed in stabilization by #6404

@Pilchie Pilchie closed this as completed Oct 30, 2015
@Pilchie Pilchie added the Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented label Oct 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE Bug Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented Tenet-Localization Some piece of UI isn’t localized, often due to hard-coding of strings or other visible elements.
Projects
None yet
Development

No branches or pull requests