-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encapsulate Field refactoring can result in incorrect case conversions depending on the user's locale #5524
Comments
This is actually By Design. As identifiers are user code, they should obey they user's locale. The bug here is that VS2013 did NOT follow the culture. |
@Pilchie Many (vast majority?) developers use English identifiers, even if their native language is not English and they're not working in an English-speaking country (so it is likely that their OS locale is not English). Are you saying that any developer who uses English identifiers should change their OS locale to en-US? |
@Pilchie I think you misunderstood the issue, it's about language not culture. The question in SO was caused because the refactoring incorrectly converted an english field name while his account was using Turkish locale settings. This was a complete and unexpected surprise to the developer. The developer typed inside and obviously expected a property called Inside, not İnside. At the very least, this is unsafe behavior according to MSDN's own guidelines. I seriously doubt whether any developer wants a refactoring to arbitrarily modify characters from one codepage (Latin) to another, completely unsuitable codepage. Or from correctly spelled English to a mixed-language work. In any case this has nothing to do with culture - just because the user's culture is set to one locale, doesn't mean that any text typed is in that language. Word's spell checking for example, doesn't assume that all text is in a specific language simply because the user's locale is set to it. It tries to detect the correct language before marking spelling errors. If you want to provide language-specific refactorings but can't detect the correct language to use, add a setting that will allow the user to select the correct language. Or add a setting to disable the locale-specific behavior. Otherwise, the refactoring should do the expected thing and not change the text's codepage. I should also note that in the past decade at least, Microsoft tried to localize various development settings, tools, languages, tutorials etc and found that non-US developers simply weren't interested. |
@Pilchie I kinda like to chip in here as well. This does look like a bug and not by design. This either has to be a setting or we should auto-detect the language and don't convert English strings during refactorings. Lastly, I believe the fix should be Ordinal not Invariant. |
@pkanavos @pkefal The behavior you are see is currently acting "by design" when it uses the current culture to convert user typed strings and when searching (via Ctrl+, or inside the IntelliSense drop down). This is how applications work across Windows; they use the currently set Region settings (represented in .NET by "CurrentCulture") and use it to format, parse, and convert anything that displayed to, or entered by, the user. In saying that, however, I can see how this behavior might be unexpected for those that have their locale settings set to a particular region for the rest of Windows, but code in English or use English identifiers. There's not a really a good/easy way for us to detect the language of given identifier, but I'm in agreement that we should probably reconsider this, but in a way that does not regress existing behavior. For example, if we blindly changed to it use InvariantCulture (which is effectively en-US) we would incorrectly convert the Turkish word; işte to Işte instead of the real word İşte. As a follow-up to a couple of comments above:
I think you might be misinterpreting what we call out on MSDN, it is only unsafe to use the current culture for comparisons when using it to make security decisions. Our way of using CurrentCulture is the intended usage of it.
There's no concept of an ordinal ToUpper/ToLower conversion, ordinal means "byte-to-byte" and is only relevant when comparing the bytes of particular sequence. OrdinalIgnoreCase really means "do a byte-to-byte comparison after converting it to upper using Invariant". |
@davkean After your explanation, I understand why you mention the OrdinalIgnoreCase is not the way. This not really a comparison in the end. Glad you did recognize that this behavior might be unexpected and reconsider this. |
Tagging @rchande, this is very similar to the Microsoft Connect bug you are currently looking at. |
I wonder if we should try @jasonmalinowski's suggestion of doing invariant culture conversion if the identifier contains only Latin characters. That seems like it might be safe here, but I'm not sure if we could do the same for completion. |
@rchande That's making the assumption that all words in Turkish (or other language) are made up of non-Latin characters, which is not true. For example, this test would fail for the Turkish word for good ("iyi"). |
In addition to this issue there is also another problem. When I make a property Inside whose backing field is inside Intellisense doesn't work as expected |
@yunyilmaz Thanks. That's the bug/behavior that's been referred to in the Connect bug that I hinted to @rchande about. As it appears you use Turkish settings, can I ask you a question, what if the property name was "İstanbul" (spelt with a dotted i), would you expect the dotless "ı" to select it? |
@davkean No I would still expect dotted i to select it. Even we use Turkish locale we still expect the environment to behave as our locale is English |
To try and understand expectations, I've drawn a table with my understandings from this thread. Under Turkish locale settings, typing the letter on the vertical column, should select the word on the horizontal column in IntelliSense (currently assuming no other word in IntelliSense starting with dotless/dotted I):
Above is not a design (I'm not even sure we're able to do above), just trying to understand expectations from which we can then come up with a design. Turkish in the thread, does this match your expectation? Or is your expectation something different? |
@davkean, the table reflects what I'd expect. |
@SedarG What about 'İ' matching to 'index' - I wrote that in the above table, but I'm now second guessing myself. Should that match? |
@davkean Yes I think it should match. Because Upper version of i is İ . But to provide this kind of functionality is not a trival task. Lets say you press little i key you checked Index(en) but not Irak(tr) in the table. I don't think it is possible for IDE to know Irak is a turkish word. |
Fixed in stabilization by #6404 |
As shown in this SO question, applying the Encapsulate field on a field starting with
i
, eginside
, results in a property namedİnside
instead ofInside
if the user's locale is set to Turkish.This is caused because AbstractEncapsulateFieldService.GeneratePropertyName makes a call to
char.ToUpper()
without specifying the invariant culture. As a result, the case conversion is culture-specific, using the current user's locale. The case of the Turkishi
is even described in the MSDN Article Writing Culture-Safe Managed Code.The fix should be to change line 362 to pass
CultureInfo.InvariantCulture
toToUpper()
, eg:instead of :
The text was updated successfully, but these errors were encountered: