-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dart String toUpperCase and toLowerCase methods are incorrect for Turkish. #28
Comments
This comment was originally written by @ahmetaa I think without proper Locale (is there any?) settings 'I' cannot be converted to 'ı'. |
This comment was originally written by jat@google.com String.toLowerCase/toUpperCase will not be locale aware. The plan (which I do not personally agree with) is to provide a separate i18n library that has basically the important functionality of ICU (TBD). Added WontFix label. |
This comment was originally written by @mdakin So in order to make Dart applications that use Strings properly in Turkish (and Azeri etc.), programmers will have to use an external library? This is a huge disappointment. |
This comment was originally written by @ahmetaa If it is well documented, I guess it is o.k. Otherwise it would cause subtle bugs. |
This comment was originally written by @mdakin So in order to make Dart applications that use Strings properly in Turkish (and Azeri etc.), programmers will have to use an external library? This is a huge disappointment. |
This comment was originally written by jat@google.com The i18n library will be provided, but you will have to #import it, and use its methods rather than the ones on String. |
This comment was originally written by jat@google.com I agree personally - I think it would be better to leave off toLowerCase/toUpperCase from String than to provide one that produces unexpected behavior for non-ASCII characters and incorrect behavior in some locales even for ASCII characters, and we should be learning from i18n mistakes in Java and elsewhere rather than repeating them. |
This comment was originally written by mda...@google.com If plan for Dart is to use an external library to support basic string case operations for Turkic languages with latin script, will it look like this: (lets assume an application that gets username and lowercases it before processing it further, a common use case) name = get Name I don't expect anybody to write code like this to convert cases, and as a result most applications written will be automatically broken for several Turkic languages. I actually reported same error for go, http://code.google.com/p/go/issues/detail?id=703 they ended up adding toLowerSpecial and toUpperSpecial methods to Strings package (http://code.google.com/p/go/source/detail?r=477b3015c0 ) However I am not sure this solution is any better. Would you consider reopening the bug, maybe in a different form? What can we do to fix this problem? |
This comment was originally written by jat@google.com Personally, I agree with you, but I think it is going to take more than a few people to change this plan, as the VM guys do not want to have to bundle something like ICU into the VM, as that would preclude implementation on very small platforms. Personally, I think any program that doesn't consider localization is broken at this day and age, and it shouldn't even be possible to write one that doesn't. Regarding your example, the expectation is that anyone who does care about writing internationalized apps will always use the i18n library and not use String.toLowercase/toUpperCase, so it would look something like this: normalized = I18n.toLowerCase(name); There is work to be done on how locales are specified (on the browser it will almost certainly be just one locale at a time by default, the server gets a little messier but by default one locale per request), but the idea is if you care about localization you always use the localizable API. |
Added Area-Library label. |
This comment was originally written by @mdakin jat@, then, as you said before, maybe the safest path is to remove toLowerCase and toUpperCase methods from String class. Would Dart team consider removing "WontFix" status for this bug? |
This comment was originally written by jat@google.com Set owner to @floitschG. |
This comment was originally written by @mdakin After thinking about it, I have one last comment on this, If toLowerCase and toUpperCase are by default locale sensitive, this would introduce a new class of terrible bugs, as Turkish case conversion i <-> İ and ı <-> I converts normal Ascii strings into non-Ascii and this breaks other things like URLs, database column names etc. Java's default locale sensitive case conversion approach caused tons of issues , most applications stopped working on Turkish locale. The best scenario, as you already mentioned, would be providing both locale sensitive and insensitive case converson methods in a separate core library. |
The core Dart language avoids Unicode as much as possible. It has toLower/UpperCase but not much more. As John said: for i18n support one will need to import the i18n library and not use the toLower/UpperCase functions for locale dependent strings. |
This comment was originally written by @ahmetaa Just to add some info, |
Here's a minimal repro that this CL fixes: `ui.dart` ```dart library dart.ui; import 'dart:ffi'; part 'foo.dart'; ``` `foo.dart` ```dart part of dart.ui; @Native<Void Function()>(symbol: 'foo_func', isLeaf: true) external void foo_func(); ``` When compiling with `compile_platform.dart` with `--target=dart2wasm`, the following error appears: ``` Unhandled exception: Verification error: Target=wasm, VerificationStage.afterModularTransformations: Invalid location with target 'wasm' on FunctionNode() (FunctionNode): RangeError (offset): Invalid value: Not in inclusive range 0..56: 91 Context: 'foo_func_$import'. Node: 'FunctionNode()'. #0 VerificationErrorListener.reportError (package:kernel/verifier.dart:81:5) #1 VerifyingVisitor.problem (package:kernel/verifier.dart:222:14) #2 VerifyingVisitor._getLocation (package:kernel/verifier.dart:1361:7) #3 VerifyingVisitor._hasLocation (package:kernel/verifier.dart:1370:26) #4 VerifyingVisitor.getSameLibraryLastSeenTreeNode (package:kernel/verifier.dart:1342:28) #5 VerifyingVisitor.localContext (package:kernel/verifier.dart:1382:24) #6 VerifyingVisitor.defaultDartType (package:kernel/verifier.dart:1491:41) #7 Visitor.visitVoidType (package:kernel/visitor.dart:1309:37) #8 VoidType.accept (package:kernel/ast.dart:11190:42) #9 FunctionNode.visitChildren (package:kernel/ast.dart:3919:16) #10 VerifyingVisitor.visitChildren (package:kernel/verifier.dart:259:10) #11 VerifyingVisitor.visitWithLocalScope (package:kernel/verifier.dart:266:5) #12 VerifyingVisitor.visitFunctionNode (package:kernel/verifier.dart:721:5) #13 FunctionNode.accept (package:kernel/ast.dart:3908:38) #14 VerifyingVisitor.visitProcedure (package:kernel/verifier.dart:620:19) #15 Procedure.accept (package:kernel/ast.dart:3311:40) #16 visitList (package:kernel/ast.dart:14488:14) #17 Library.visitChildren (package:kernel/ast.dart:591:5) #18 VerifyingVisitor.visitChildren (package:kernel/verifier.dart:259:10) #19 VerifyingVisitor.defaultTreeNode (package:kernel/verifier.dart:196:5) #20 TreeVisitor.visitLibrary (package:kernel/visitor.dart:503:35) #21 VerifyingVisitor.visitLibrary (package:kernel/verifier.dart:367:11) #22 Library.accept (package:kernel/ast.dart:577:38) #23 visitList (package:kernel/ast.dart:14488:14) #24 Component.visitChildren (package:kernel/ast.dart:14320:5) #25 VerifyingVisitor.visitChildren (package:kernel/verifier.dart:259:10) #26 VerifyingVisitor.visitComponent (package:kernel/verifier.dart:342:7) #27 Component.accept (package:kernel/ast.dart:14313:38) #28 VerifyingVisitor.check (package:kernel/verifier.dart:171:15) #29 verifyComponent (package:kernel/verifier.dart:69:20) ... ``` The issue seems to be that after doing this native transformation, the node's `fileUri` references the enclosing library (`ui.dart` above), but the `node.location` references the actual source file (`foo.dart` above) indirectly through `node.fileOffset`. This ends up being an issue when compiling the platform dill in Google3, but I didn't look into why `flutter build web --wasm` isn't broken. Internal bug: b/292172146 Change-Id: I2b8d7d215b2c36354860257ce651d50168e9523d Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/315360 Reviewed-by: Ömer Ağacan <omersa@google.com> Commit-Queue: Jia Hao Goh <jiahaog@google.com>
This issue was originally filed by @mdakin
String toLowerCase and toUpperCase does not work correctly for Turkish dotless i and capital dotted i.
Run this application (Unfortunately http://try-dart-lang.appspot.com/ loses Turkish characters after I tried to link it):
main() {
// Expected conversions
String trUpper = "A,B,C,Ç,D,E,F,G,Ğ,H,I,İ,J,K,L,M,N,O,Ö,P,R,S,Ş,T,U,Ü,V,Y,Z";
String trLower = "a,b,c,ç,d,e,f,g,ğ,h,ı,i,j,k,l,m,n,o,ö,p,r,s,ş,t,u,ü,v,y,z";
// Actual conversions
String dartTrUpper = trLower.toUpperCase();
String dartTrLower = trUpper.toLowerCase();
if (dartTrUpper != trUpper) {
print ("Incorrect Turkish toUpper conversion. \nExpected: ${trUpper} \nFound: ${dartTrUpper}");
}
if (dartTrLower != trLower) {
print ("Incorrect Turkish toLower conversion. \nExpected: ${trLower} \nFound: ${dartTrLower}");
}
}
Expected: Program does not print anything.
Actual: Prints 2 messages with outputs.
The text was updated successfully, but these errors were encountered: