UTF-8 support #151
It's bizarre: these tests are passing on travis but failing on my machine. Here's what I get:
|
What does this say?
ruby -e 'p Encoding.default_internal ; p Encoding.default_external'
echo $LANG
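For anyone who wants the same diagnostics in one place, here is a minimal Ruby sketch that gathers the values asked about above. The method name `encoding_report` is made up for illustration; the calls it uses (`Encoding.default_external`, `Encoding.default_internal`, `Encoding.find("locale")`) are standard Ruby 1.9+.

```ruby
# Hypothetical helper: collect the locale and encoding settings that
# influence how Ruby tags strings read from files and child processes.
def encoding_report
  internal = Encoding.default_internal
  {
    "LANG" => ENV["LANG"],
    "LC_ALL" => ENV["LC_ALL"],
    # Encoding Ruby derives from the current locale.
    "locale" => Encoding.find("locale").name,
    "default_external" => Encoding.default_external.name,
    # default_internal is nil unless it has been set explicitly.
    "default_internal" => internal && internal.name
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Comparing this output between a passing and a failing machine should show whether the locale is what differs.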
Does |
I'm getting an error as well when running it like this:
LANG=C cucumber features/utf-8.feature
but it goes green when invoked this way:
LANG=en_US.UTF-8 cucumber features/utf-8.feature
@mattwynne +1 for @chrismdp's comment, I believe the default file encoding is different on Linux than on OS X. |
On 15 May 2013, at 22:14, Chris Parsons notifications@github.com wrote:
|
|
On 15 May 2013, at 22:04, Arne Brasseur notifications@github.com wrote:
I'm using OS X 10.8.3 and RVM, in case that's relevant. |
On 15 May 2013, at 22:14, Arne Brasseur notifications@github.com wrote:
|
I have added another UTF-16 test. I suppose the goal is to make them work oblivious of the |
To be honest, I think it is just as much an issue in cucumber as it is in aruba (at least for the UTF-16 case). Is there any requirement (from cucumber usage) that feature files must be UTF-8 encoded? |
On 16 May 2013, at 08:00, Jarl Friis notifications@github.com wrote:
|
On 16 May 2013, at 08:05, Jarl Friis notifications@github.com wrote:
This issue has arisen because Cucumber's pre-aruba tests were forcing the encoding[1] of STDOUT to UTF-8. When we migrate the same test to use Aruba, it fails (on some machines). Why do you say it's an issue in Cucumber? [1] https://github.com/cucumber/cucumber/blob/master/legacy_features/support/env.rb#L104 |
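To illustrate what "forcing the encoding" of a stream means in Ruby, here is a small sketch that uses a pipe as a stand-in for a command's STDOUT. It is not the code at the linked env.rb line; `read_through_pipe` is a hypothetical helper, and `IO#set_encoding` is the standard call it relies on.

```ruby
# Hypothetical helper: write a string into a pipe and read it back,
# with the reading end tagged with the given external encoding.
def read_through_pipe(text, encoding)
  reader, writer = IO.pipe
  reader.set_encoding(encoding)  # strings read from here get this tag
  writer.write(text)
  writer.close
  reader.read
ensure
  reader.close if reader && !reader.closed?
end

expected  = "caf\u00e9"  # UTF-8 string with one non-ASCII character
as_binary = read_through_pipe(expected, Encoding::ASCII_8BIT)
as_utf8   = read_through_pipe(expected, Encoding::UTF_8)

puts as_binary == expected  # false: same bytes, incompatible encoding tags
puts as_utf8 == expected    # true
```

The bytes are identical in both reads; only the encoding tag on the resulting string differs, and that is enough to make the comparison fail.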
@mattwynne : Based on the stack trace if you run |
I have just pushed a patch (beb5fe9) on this branch. I am not sure this is the right solution, but it gives a clue of the utf-8 problem at least. However it doesn't seem to solve the UTF-16 problem. The patch will make |
On 16 May 2013, at 08:30, Jarl Friis notifications@github.com wrote:
Ace. Can we ship it? |
Hang on regarding shipping... |
That Encoding:ASCII-8BIT is the culprit, because of that Ruby will read The real solution would be for Cucumber to support a magic comment like |
@arnebrasseur the behaviour of my machine seems inconsistent with the Ruby docs: http://ruby-doc.org/core-2.0/Encoding.html#label-External+encoding
WTF. Maybe my Ruby installation is just broken. It seems like Ruby is ignoring the value of the |
I get the same behaviour from 1.9.3 too. |
@arnebrasseur Cucumber does have that magic comment, but I don't think Aruba pays attention to it. I think it's simply there to ensure the Gherkin parser uses the correct encoding when parsing the feature file. So are you suggesting that, instead of @jarl-dk's fix, we should force the encoding of STDOUT/STDERR to use the encoding specified in the feature? |
@arnebrasseur : your comments are good. I guess what you are pointing out is that the feature file is always read as UTF-8, and that is why the UTF-16 case is a cucumber issue, am I right? However, my fix forces the output from a command into the same encoding as the expected output (which, due to cucumber, is always UTF-8). The output from a command will have the encoding corresponding to the
I don't mind shipping my patch, but I consider it a "workaround" that only solves the problem for feature files encoded in UTF-8. I therefore suggest that the UTF-16 test is taken out (for now) and maybe moved to the cucumber project...
@arnebrasseur : I think your proposal that cucumber honor magic comments is not bad; cucumber already honors magic comments like |
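As a rough illustration of the kind of workaround described here (not the actual patch in beb5fe9, whose contents are not shown in this thread), one could re-tag a command's output with the encoding of the expected string before comparing the two. The helper name `retag_like` is hypothetical.

```ruby
# Hypothetical helper: reinterpret the bytes of `actual` using the
# encoding of `expected`. No bytes change; only the encoding tag does.
def retag_like(actual, expected)
  actual.dup.force_encoding(expected.encoding)
end

expected   = "caf\u00e9"  # UTF-8, as read from the feature file
raw_output = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)  # same bytes, binary tag

puts raw_output == expected                        # false
puts retag_like(raw_output, expected) == expected  # true
```

This only helps when the output bytes really are in the expected encoding. Output in a different encoding such as UTF-16 would need transcoding (for example with `String#encode`) rather than re-tagging, which fits the point that the workaround covers only the UTF-8 case.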
@mattwynne : what if you set |
On 16 May 2013, at 10:26, Jarl Friis notifications@github.com wrote:
Nevertheless, it could happen if someone's LANG env variable is set wrong, am I correct? Could we reproduce it for the UTF-8 case on Travis by setting that variable in the cat command? I'll try it. |
I'm not actually very familiar with Cucumber or Aruba; I just got intrigued by your tweet since I've spent quite some time digging into encoding issues. So for a specific solution you guys can use your best judgement. In general, though, it's a good idea to use UTF-8 unless there's an explicit reason to do otherwise, especially when the default encoding is ASCII-8BIT: it's a subset of UTF-8, but Ruby still refuses to convert one to the other when needed, so utf8_string + ascii_string blows up, and the same goes for gsub, string interpolation, etc. This is a deliberate choice of the Ruby devs: even when there's a lossless conversion to a compatible encoding, Ruby doesn't do it for you. This seems to be mostly due to the Japanese being sceptical of Unicode, because of something the Unicode consortium did called Han unification. But for the rest of the world it's quite annoying.
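A minimal sketch of the failure mode described above, using only core Ruby 1.9+ calls. One refinement worth hedging: as far as Ruby's compatibility rules go, the concatenation only raises when both strings contain non-ASCII bytes; a binary-tagged string that happens to be pure ASCII still combines fine. `try_concat` is a hypothetical helper.

```ruby
utf8              = "caf\u00e9"  # UTF-8 with a non-ASCII character
ascii_only_binary = "abc".dup.force_encoding(Encoding::ASCII_8BIT)
high_byte_binary  = "\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)

# Hypothetical helper: report the encoding of the concatenation,
# or the error class if Ruby refuses to combine the two strings.
def try_concat(left, right)
  (left + right).encoding.name
rescue Encoding::CompatibilityError => e
  e.class.name
end

puts try_concat(utf8, ascii_only_binary)  # UTF-8
puts try_concat(utf8, high_byte_binary)   # Encoding::CompatibilityError
```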
@mattwynne : Are you sure cucumber honors magic comment like If I add that to the UTF-16 test, the stack trace indicates that cucumber pukes. Seems like aruba step definitions are never reached. |
On 16 May 2013, at 10:32, Jarl Friis notifications@github.com wrote:
|
I agree, let's focus on UTF-8 for this issue and create another for UTF-16 (which may depend on a cucumber issue). |
Shall we squash the addition and removal of the UTF-16 test when merging to master so they cancel out? I vote for that. |
Yep, sounds good to me.
On 16 May 2013, at 10:46, Jarl Friis notifications@github.com wrote:
|
Note, my fix works only for ruby-1.9 :-( |
Do we care about 1.8.7 anymore?
On 16 May 2013, at 10:50, Jarl Friis notifications@github.com wrote:
|
So, with the UTF-16 commits removed, #152 is good to go, right? |
Thanks @jarl-dk! |
This reverts commit 5d315b6. @chrismdp there's something funny going on with Aruba and UTF-8. I presume these features are passing on your machine, and I can see they're passing on Travis, but they fail on my machine. See cucumber/aruba#151 for details. I'm reverting this commit while we sort this out.
Migrating over some of Cucumber's old features into Aruba, I've noticed a problem where we compare strings with UTF-8 characters in them.
This feature reproduces the problem (on my machine at least).