Encapsulate Field refactoring can result in incorrect case conversions depending on the user's locale #5524

pkanavos · 2015-09-29T09:24:30Z

As shown in this SO question, applying the Encapsulate field on a field starting with i, eg inside, results in a property named İnside instead of Inside if the user's locale is set to Turkish.

This is caused because AbstractEncapsulateFieldService.GeneratePropertyName makes a call to char.ToUpper() without specifying the invariant culture. As a result, the case conversion is culture-specific, using the current user's locale. The case of the Turkish i is even described in the MSDN Article Writing Culture-Safe Managed Code.

The fix should be to change line 362 to pass CultureInfo.InvariantCulture to ToUpper(), eg:

return char.ToUpper(baseName[0],CultureInfo.InvariantCulture).ToString() + baseName.Substring(1);

instead of :

return char.ToUpper(baseName[0]).ToString() + baseName.Substring(1);

The text was updated successfully, but these errors were encountered:

Pilchie · 2015-09-29T16:55:20Z

This is actually By Design. As identifiers are user code, they should obey they user's locale. The bug here is that VS2013 did NOT follow the culture.

svick · 2015-09-29T20:02:36Z

@Pilchie Many (vast majority?) developers use English identifiers, even if their native language is not English and they're not working in an English-speaking country (so it is likely that their OS locale is not English).

Are you saying that any developer who uses English identifiers should change their OS locale to en-US?

pkanavos · 2015-09-29T20:19:05Z

@Pilchie I think you misunderstood the issue, it's about language not culture. The question in SO was caused because the refactoring incorrectly converted an english field name while his account was using Turkish locale settings. This was a complete and unexpected surprise to the developer. The developer typed inside and obviously expected a property called Inside, not İnside. At the very least, this is unsafe behavior according to MSDN's own guidelines.

I seriously doubt whether any developer wants a refactoring to arbitrarily modify characters from one codepage (Latin) to another, completely unsuitable codepage. Or from correctly spelled English to a mixed-language work.

In any case this has nothing to do with culture - just because the user's culture is set to one locale, doesn't mean that any text typed is in that language. Word's spell checking for example, doesn't assume that all text is in a specific language simply because the user's locale is set to it. It tries to detect the correct language before marking spelling errors.

If you want to provide language-specific refactorings but can't detect the correct language to use, add a setting that will allow the user to select the correct language. Or add a setting to disable the locale-specific behavior. Otherwise, the refactoring should do the expected thing and not change the text's codepage.

I should also note that in the past decade at least, Microsoft tried to localize various development settings, tools, languages, tutorials etc and found that non-US developers simply weren't interested.

thepeekay · 2015-10-01T14:54:56Z

@Pilchie I kinda like to chip in here as well. This does look like a bug and not by design.

This either has to be a setting or we should auto-detect the language and don't convert English strings during refactorings.

Lastly, I believe the fix should be Ordinal not Invariant.

davkean · 2015-10-02T15:49:35Z

@pkanavos @pkefal The behavior you are see is currently acting "by design" when it uses the current culture to convert user typed strings and when searching (via Ctrl+, or inside the IntelliSense drop down). This is how applications work across Windows; they use the currently set Region settings (represented in .NET by "CurrentCulture") and use it to format, parse, and convert anything that displayed to, or entered by, the user.

In saying that, however, I can see how this behavior might be unexpected for those that have their locale settings set to a particular region for the rest of Windows, but code in English or use English identifiers. There's not a really a good/easy way for us to detect the language of given identifier, but I'm in agreement that we should probably reconsider this, but in a way that does not regress existing behavior. For example, if we blindly changed to it use InvariantCulture (which is effectively en-US) we would incorrectly convert the Turkish word; işte to Işte instead of the real word İşte.

As a follow-up to a couple of comments above:

At the very least, this is unsafe behavior according to MSDN's own guidelines.

I think you might be misinterpreting what we call out on MSDN, it is only unsafe to use the current culture for comparisons when using it to make security decisions. Our way of using CurrentCulture is the intended usage of it.

Lastly, I believe the fix should be Ordinal not Invariant.

There's no concept of an ordinal ToUpper/ToLower conversion, ordinal means "byte-to-byte" and is only relevant when comparing the bytes of particular sequence. OrdinalIgnoreCase really means "do a byte-to-byte comparison after converting it to upper using Invariant".

thepeekay · 2015-10-02T15:56:39Z

@davkean After your explanation, I understand why you mention the OrdinalIgnoreCase is not the way. This not really a comparison in the end. Glad you did recognize that this behavior might be unexpected and reconsider this.

davkean · 2015-10-02T16:24:05Z

Tagging @rchande, this is very similar to the Microsoft Connect bug you are currently looking at.

rchande · 2015-10-02T17:18:22Z

I wonder if we should try @jasonmalinowski's suggestion of doing invariant culture conversion if the identifier contains only Latin characters. That seems like it might be safe here, but I'm not sure if we could do the same for completion.

davkean · 2015-10-02T17:50:53Z

@rchande That's making the assumption that all words in Turkish (or other language) are made up of non-Latin characters, which is not true. For example, this test would fail for the Turkish word for good ("iyi").

yunyilmaz · 2015-10-05T19:57:54Z

In addition to this issue there is also another problem. When I make a property Inside whose backing field is inside Intellisense doesn't work as expected
Lets say i key is pressed from Turkish(Q) keyboard(There is also an ı key in the keyboard) Intellisense doesn't choose Inside property. It selects it only when ı key is pressed. This is rather disturbing as well as this issue because as in the previous issue it is the usual behavior pre 2015 releases

davkean · 2015-10-05T20:17:27Z

@yunyilmaz Thanks. That's the bug/behavior that's been referred to in the Connect bug that I hinted to @rchande about.

As it appears you use Turkish settings, can I ask you a question, what if the property name was "İstanbul" (spelt with a dotted i), would you expect the dotless "ı" to select it?

yunyilmaz · 2015-10-05T20:21:06Z

@davkean No I would still expect dotted i to select it. Even we use Turkish locale we still expect the environment to behave as our locale is English

davkean · 2015-10-05T21:08:21Z

To try and understand expectations, I've drawn a table with my understandings from this thread.

Under Turkish locale settings, typing the letter on the vertical column, should select the word on the horizontal column in IntelliSense (currently assuming no other word in IntelliSense starting with dotless/dotted I):

↓Letter/Word→	index (en)	Index (en)	ırak (tr)	Irak (tr)	işte (tr)	İşte (tr)
i	✓	✓			✓	✓
İ	✓				✓	✓
ı			✓	✓
I	✓	✓	✓	✓

Above is not a design (I'm not even sure we're able to do above), just trying to understand expectations from which we can then come up with a design.

Turkish in the thread, does this match your expectation? Or is your expectation something different?

SedarG · 2015-10-05T21:47:08Z

@davkean, the table reflects what I'd expect.

davkean · 2015-10-05T22:40:11Z

@SedarG What about 'İ' matching to 'index' - I wrote that in the above table, but I'm now second guessing myself. Should that match?

yunyilmaz · 2015-10-06T08:46:21Z

@davkean Yes I think it should match. Because Upper version of i is İ . But to provide this kind of functionality is not a trival task. Lets say you press little i key you checked Index(en) but not Irak(tr) in the table. I don't think it is possible for IDE to know Irak is a turkish word.

Pilchie · 2015-10-30T15:57:33Z

Fixed in stabilization by #6404

gafter added the Area-IDE label Sep 29, 2015

Pilchie added the Resolution-By Design The behavior reported in the issue matches the current design label Sep 29, 2015

Pilchie closed this as completed Sep 29, 2015

davkean reopened this Oct 2, 2015

davkean removed the Resolution-By Design The behavior reported in the issue matches the current design label Oct 2, 2015

jinujoseph added the Tenet-Localization Some piece of UI isn’t localized, often due to hard-coding of strings or other visible elements. label Oct 2, 2015

Pilchie added this to the 1.1 milestone Oct 5, 2015

Pilchie assigned rchande Oct 5, 2015

Pilchie added the Bug label Oct 23, 2015

Pilchie assigned brettfo and unassigned rchande Oct 23, 2015

brettfo mentioned this issue Oct 28, 2015

always use the 'en-US' culture for .ToUpper() when creating the new property name #6404

Merged

Pilchie closed this as completed Oct 30, 2015

Pilchie added the Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented label Oct 30, 2015

svick mentioned this issue Feb 5, 2016

Proposal: Positional deconstruction based on existing constructors and properties #8415

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Encapsulate Field refactoring can result in incorrect case conversions depending on the user's locale #5524

Encapsulate Field refactoring can result in incorrect case conversions depending on the user's locale #5524

pkanavos commented Sep 29, 2015

Pilchie commented Sep 29, 2015

svick commented Sep 29, 2015

pkanavos commented Sep 29, 2015

thepeekay commented Oct 1, 2015

davkean commented Oct 2, 2015

thepeekay commented Oct 2, 2015

davkean commented Oct 2, 2015

rchande commented Oct 2, 2015

davkean commented Oct 2, 2015

yunyilmaz commented Oct 5, 2015

davkean commented Oct 5, 2015

yunyilmaz commented Oct 5, 2015

davkean commented Oct 5, 2015

SedarG commented Oct 5, 2015

davkean commented Oct 5, 2015

yunyilmaz commented Oct 6, 2015

Pilchie commented Oct 30, 2015

Encapsulate Field refactoring can result in incorrect case conversions depending on the user's locale #5524

Encapsulate Field refactoring can result in incorrect case conversions depending on the user's locale #5524

Comments

pkanavos commented Sep 29, 2015

Pilchie commented Sep 29, 2015

svick commented Sep 29, 2015

pkanavos commented Sep 29, 2015

thepeekay commented Oct 1, 2015

davkean commented Oct 2, 2015

thepeekay commented Oct 2, 2015

davkean commented Oct 2, 2015

rchande commented Oct 2, 2015

davkean commented Oct 2, 2015

yunyilmaz commented Oct 5, 2015

davkean commented Oct 5, 2015

yunyilmaz commented Oct 5, 2015

davkean commented Oct 5, 2015

SedarG commented Oct 5, 2015

davkean commented Oct 5, 2015

yunyilmaz commented Oct 6, 2015

Pilchie commented Oct 30, 2015