Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Deprecate Int-Char comparisons, e.g. 'x' == 120 #16024

Merged
merged 2 commits into from
May 9, 2016

Conversation

StefanKarpinski
Copy link
Member

@StefanKarpinski
Copy link
Member Author

What's going on the CI here?

@yuyichao
Copy link
Contributor

yuyichao commented Apr 23, 2016

I don't think the PR CI will run with merge conflict (since it runs on merge commit)

@yuyichao yuyichao added the breaking This change will break code label Apr 23, 2016
@@ -51,7 +51,9 @@ function parseinline(stream::IO, md::MD, config::Config)
content = []
buffer = IOBuffer()
while !eof(stream)
char = peek(stream)
# FIXME: this is broken if we're looking for non-ASCII
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woudn't peekchar fix this problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need peek to match the current read interface:

read(io, UInt8)
read(io, Char)

peek(io, UInt8)
peek(io, Char)

with the defaults matching.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, peekchar only works for the IOStream type, so this fails.

@stevengj
Copy link
Member

I have to say that I'm not convinced that this change is a good idea. Comparison of Char with integers is a pretty well-defined operation, thanks to Unicode, is familiar from many other languages, and is convenient (witness all of the annoying conversions required by this patch, when the original code was perfectly clear).

@Tetralux
Copy link
Contributor

@stevengj Is not c == 'x' clearer than c == 120 ?

@stevengj
Copy link
Member

@H-225, that's not the problem. You can already write c == 'x', and any sane programmer would do so. But if you look at Stefan's patch, you'll see that there are lots of cases where you read in a single UInt8 byte b, and you want to check b == '\n' and similar. And b == '\n' is much clearer than b == 10 and more convenient than b == UInt8('\n'), which are what this patch requires.

@Tetralux
Copy link
Contributor

@stevengj My bad - totally should have noticed that.
Devils Advocate: Maybe some kind of Char prefix is in order? b'x' -> UInt8('x') @StefanKarpinski @JeffBezanson
Or would that be considered unnecessary?

@nalimilan
Copy link
Member

Adding a new prefix system just to avoid typing a few chars? That doesn't sound worth it.

@stevengj
Copy link
Member

Put it another way: has this change (a) caught any errors or (b) made the Base code cleaner? Looks like "no" on both counts?

@Tetralux
Copy link
Contributor

I think these are the important points here:

  • c = Char('x'); c == 120 is bad; I don't want to have to know that UInt8('x') == 120.
  • n = UInt8('x'); n == 'x' is good; the intent is clear. Plus, it covers the quote below.

there are lots of cases where you read in a single UInt8 byte b, and you want to check b == '\n' and similar.

I therefore agree with @stevengj; we should not remove UInt8/Char comparisons.
However, I would be OK with removing Int/Char comparisons.

@nalimilan
Copy link
Member

@H-225 These two examples don't make sense. They should be c = UInt8('x'); c == 120 and n = Char(120); n == 'x'. Both can still be written with or without this PR. What the PR does is force you to use one of these two forms, while before that you could have written 'x' == 120 directly.

@stevengj
Copy link
Member

stevengj commented Apr 25, 2016

@nalimilan, the point is that there are lots of cases (e.g. see this PR!) where you have already read in a byte as a b::UInt8, and you want to check b == '\n' or similar. This PR forces you to do an explicit cast somewhere first. What do we gain by it?

@H-225, it would seem rather odd and arbitrary if you could compare Char to UInt8 but not to other Integer types. (Also a bit short-sighted -- the same reasoning applies to UInt16 for processing UTF-16 data, and also to UInt32 for processing UTF-32 data, especially if Stefan goes ahead with #14383.)

@StefanKarpinski
Copy link
Member Author

@stevengj: there are actually several places where there were actual or potential bugs here due to comparing UInt8 values with Char and assuming that if the byte value equals the char value (Unicode code point) then you have a match. This isn't true, of course, unless the stream is encoded as Latin-1. Converting from UInt8 to Char doesn't actually fix that issue, of course, it just makes it a little more obvious. The correct solution is to make sure that I/O APIs like peek return chars correctly, not bytes.

That was also not the motivation for this change. Currently we have isequal('x', 120) which means that Set(Any[120, 'x']) == Set(['x']) which is clearly not good. The only alternative to fixing this while keeping 'x' == 120 is to make 'x' == 120 but !isequal('x', 120) – which would be the only case aside from NaNs and ±0.0 where == and isequal currently differ. We can do that, but it's a biggish step semantically and begins to widen the crack between == and isequal.

@StefanKarpinski StefanKarpinski changed the title Deprecate Int-Char comparisons, e.g. 'x' == 120 RFC: Deprecate Int-Char comparisons, e.g. 'x' == 120 Apr 25, 2016
@stevengj
Copy link
Member

stevengj commented Apr 25, 2016

@StefanKarpinski, I don't see how having peek etcetera return Char will help with the problem of accidentally reading a stream in some non-Unicode-compatible encoding.

Making == and isequal differ here makes a lot of sense to me, actually, but I can see why you would be reluctant.

@StefanKarpinski
Copy link
Member Author

@StefanKarpinski, I don't see how having peek etcetera return Char will help with the problem of accidentally reading a stream in some non-Unicode-compatible encoding.

Because it decodes a Char value in whatever way is appropriate for the stream. Currently we don't support streams with different encodings, but that's something we should support in the future. For now it at least fixes the common problem for UTF-8 streams.

@nalimilan
Copy link
Member

Currently we don't support streams with different encodings, but that's something we should support in the future.

See https://github.com/nalimilan/StringEncodings.jl#advanced-usage-stringencoder-and-stringdecoder

@StefanKarpinski
Copy link
Member Author

@stevengj: I'm ok with making == and isequal more different, but that requires a bit of thought about meaning. Why is this the one case, aside from IEEE 754 weirdness, where it's ok for == and isequal to differ? In general, I've been contemplating whether == should raise an error when you try to compare objects of "incomparable types"; whereas isequal must support that (and === does too).

@stevengj
Copy link
Member

Because comparison of chars to integers is well-defined (Unicode), convenient, and a long-standing convention in many languages? They aren't really "incomparable types".

@StefanKarpinski
Copy link
Member Author

So it's ok to compare 'x' == 120 but !isequal('x', 120). What about 'x' == 120.0? Is that ok too? So Char is not a numeric type, but it has some canonical numerical embedding? Just deciding to do this feels very ad-hoc unless we have some broader concept of why it's ok to do this.

@JeffBezanson
Copy link
Member

The recent issue about comparing Exprs containing NaN is a good example. :(print(65)) and :(print('A')) clearly should not be equal in any sense. For this particular case, separating chars and ints with isequal would be sufficient, but like Stefan I worry about having many ad-hoc differences between == and isequal.

@stevengj
Copy link
Member

stevengj commented Apr 27, 2016

@StefanKarpinski, having 'x' == 120 return false seems like an invitation for future bugs. Deprecating it is one thing, but taking a construct that has a widespread clear meaning in many other programming languages and turning it on its head seems far worse than the problem you are trying to fix here. I think the only sane alternatives here are either to make == different from isequal or to plan on a permanent deprecation warning.

@ararslan, the mapping of Char to Int is unambiguous, literally standardized, and widespread in programming languages. Automatic mapping of strings to integers is none of those things.

@stevengj
Copy link
Member

@ararslan, defining zero(Char) seems like a hack to me: the zero function is supposed to return the additive identity, but + is not defined for Char and so it has no additive identity.

@StefanKarpinski StefanKarpinski added this to the 0.5.0 milestone May 5, 2016
@StefanKarpinski StefanKarpinski self-assigned this May 5, 2016
@StefanKarpinski StefanKarpinski added the needs decision A decision on this change is needed label May 5, 2016
@vtjnash
Copy link
Member

vtjnash commented May 5, 2016

have to say that I think we should do this. We clearly need to make !isequal('x', 120) and there are two approaches: this or making == and isequal different.

while reviewing/fixup tests, I found one that asserts '/' != "/home/me" (

@unix_only @test expanduser("~")[1] != ENV["HOME"]
). it might have been helpful if that was a NotComparableError.

@StefanKarpinski
Copy link
Member Author

it might have been helpful if that was a NotComparableError.

If we're going to go down the road of having NotComparableError, then we should really go all in. I'm all for it, but I don't think it belongs in this fairly minimal PR to make Set(Any['x',120]) work right.

@@ -478,6 +478,6 @@ precompile(Base.launch, (Base.LocalManager, Dict, Array{Base.WorkerConfig, 1}, B
precompile(Base.set_valid_processes, (Array{Int, 1}, ))

# Speed up repl help
sprint(Markdown.term, @doc mean)
# sprint(Markdown.term, @doc mean)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was wrong here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch. I commented it out because bootstrap was failing and then forgot to try putting it back in.

@StefanKarpinski
Copy link
Member Author

Any idea why we're not getting AppVeyor CI on this?

@StefanKarpinski
Copy link
Member Author

Nevermind, it kicked in after a while.

@StefanKarpinski StefanKarpinski merged commit bf73102 into master May 9, 2016
@StefanKarpinski StefanKarpinski deleted the sk/char_neq_int branch May 9, 2016 08:46
@tkelman
Copy link
Contributor

tkelman commented May 9, 2016

Needs a NEWS.md item.

@ararslan
Copy link
Member

ararslan commented May 9, 2016

My understanding is that once things are deprecated, they'll be removed in a future release. So if char-int comparison is deprecated, what are we envisioning the eventual behavior to be once it's removed entirely? A MethodError from ==?

@JeffBezanson
Copy link
Member

Saying that deprecation implies removal is a bit too narrow. Sometimes we need to change a behavior, e.g. changing a true result to false as here, and the deprecation mechanism is also useful for this.

@ararslan
Copy link
Member

ararslan commented May 9, 2016

@JeffBezanson That's good to know, thanks! So the eventual behavior here is that 0 == '\0' (for example) will be false?

@StefanKarpinski
Copy link
Member Author

It's also possible that it will be an error. Either way, this is the intermediate step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.