WIP: Remove RevString #23612
@nanosoldier
Your benchmark job has completed, but no benchmarks were actually executed. Perhaps your tag predicate contains misspelled tags? cc @ararslan
f9cad23 to 18d1ef9
Whoops. @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
base/strings/types.jl
@@ -99,32 +99,18 @@ end

## reversed strings without data movement ##
This comment should be removed as well?
Thanks! It's hard to check that all computations of indices are correct, but this looks good and tests pass...
base/repl/REPLCompletions.jl
@@ -239,18 +238,19 @@ function find_start_brace(s::AbstractString; c_start='(', c_end=')')
in_back_ticks = true
end
else
if !in_back_ticks && !in_double_quotes && c == '\'' && !done(r, i) && next(r, i)[1]!='\\'
if !in_back_ticks && !in_double_quotes && c == '\'' && i > 0 && s[prevind(s, i)] != '\\'
It would make sense to save the result of `prevind(s, i)` since it's also used below.
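A minimal sketch of that suggestion, written against Julia 1.x rather than the code under review. `escaped_at` is a hypothetical helper, not part of REPLCompletions; it computes `prevind(s, i)` once and returns it so the caller can reuse the value.

```julia
# Hypothetical helper (not the REPLCompletions code): check whether the
# character at index `i` is preceded by a backslash, computing the previous
# index only once so the caller can reuse it.
function escaped_at(s::AbstractString, i::Integer)
    p = prevind(s, i)                  # index of the previous character, or 0
    escaped = p >= 1 && s[p] == '\\'   # guard against stepping off the front
    return escaped, p
end

escaped_at("a\\'b", 3)   # (true, 2)
escaped_at("β'", 3)      # (false, 1): 'β' occupies bytes 1-2
```

Returning `p` alongside the flag avoids a second `prevind` call, which matters for variable-width encodings where stepping backwards is not a plain subtraction.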
base/strings/search.jl
e-j+1
function rsearch(s::AbstractString, c::Chars, i::Integer=start(s))
if isempty(c)
return 1 <= i <= nextind(s, endof(s)) ? i : throw(BoundsError(s, i))
`endof(s)` is the last valid index, so `1 <= i <= endof(s)` should be sufficient.
Sounds good. I was just mimicking the structure of the `search` code here.
Ah, OK. So maybe better keep the same (weird) approach as `search` and fix that later.
Seems like I might as well go with the better approach now and update `search` to match.
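To make the two bounds checks discussed in this thread concrete, here is a rough sketch for Julia 1.x, where `lastindex` is the current name for `endof`. `empty_pattern_index` and its `allow_past_end` keyword are hypothetical names used only for illustration; this is not the Base implementation.

```julia
# Hypothetical helper contrasting the two upper bounds discussed above.
# allow_past_end = true  mirrors `1 <= i <= nextind(s, endof(s))`
# allow_past_end = false mirrors `1 <= i <= endof(s)`
function empty_pattern_index(s::AbstractString, i::Integer; allow_past_end::Bool = true)
    hi = allow_past_end ? nextind(s, lastindex(s)) : lastindex(s)
    return 1 <= i <= hi ? i : throw(BoundsError(s, i))
end

empty_pattern_index("abc", 4)   # 4: one past the end is accepted
empty_pattern_index("", 1)      # 1: the only accepted start in an empty string
```

With the stricter bound, an empty string has no acceptable index at all, because `lastindex("")` is `0`.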
base/strings/search.jl
end

function _rsearchindex(s, t, i)
if isempty(t)
return 1 <= i <= nextind(s,endof(s)) ? i :
throw(BoundsError(s, i))
end
t = RevString(t)
rs = RevString(s)
t = reverse(t)
Why not implement the reversed algorithm just like above?
Wasn't sure how best to do it. Suggestions welcome.
I don't know... Doesn't changing every `next` into `prevind` work like you did above? I admit that method looks a bit scary. Or just keep `RevString` as an internal helper type just for that method. ;-)
@@ -1,6 +1,6 @@
# This file is a part of Julia. License is MIT: https://julialang.org/license

# SubString and RevString types
# SubString type
This file is kind of inconsistent now. Maybe better rename it to substring.jl and move unrelated functions elsewhere?
Sounds good. Once I'm done with implementations and cleanup here I'll make that a separate commit.
base/strings/types.jl
@@ -99,32 +99,18 @@ end

## reversed strings without data movement ##
"without data movement"?
@@ -158,7 +158,6 @@
<item> RegexMatch </item>
<item> RegexMatchIterator </item>
<item> RepString </item>
Could remove `RepString` too while you're at it (probably in other files too; but are they maintained?).
Ah, that must have been an oversight in #19867. I don't even know what this file is for.
test/strings/types.jl
@@ -174,11 +165,6 @@ for T in (String, GenericString)
s = convert(T, string(prefix, c, suffix))
r = reverse(s)
ri = search(r, c)
@test r == RevString(s)
Hmm, you don't check that `r` is correct now. Should probably hardcode the results.
We have more extensive tests for the correctness of `reverse` in test/strings/unicode.jl; I assume those would be sufficient.
18d1ef9 to a33730d
@nanosoldier
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan
To remove
julia> s = "X\U0001d4c1β\U0001d6a4\U0001d4c1β\U0001d6a4"
"X𝓁β𝚤𝓁β𝚤"
julia> b = SubString(s, 1, 8)
"X𝓁β𝚤"
julia> r = reverse(b)
"𝚤β𝓁X"
julia> search(r, 'X')
8
julia> search(String(r), 'X')
11
These results are both valid indices for their inputs and indeed they correspond to the same character (i.e.
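The two index conventions in that session can be approximated on Julia 1.x, where `reverse` on a `SubString{String}` returns a `String` and `findfirst` takes the place of `search`. The sketch below is illustrative only: `lazy_index` is a hypothetical function modelling the assumption that a lazy reversed view numbers positions backwards from the last valid index of the wrapped string, which is consistent with the `8` shown above.

```julia
s = "X\U0001d4c1β\U0001d6a4\U0001d4c1β\U0001d6a4"
b = SubString(s, 1, 8)          # "X𝓁β𝚤", 11 code units, last valid index 8
r = reverse(b)                  # materialized String "𝚤β𝓁X"

i = findfirst(==('X'), r)       # 11: byte offset of 'X' in the reversed copy
j = reverseind(b, i)            # 1: the matching index in the original `b`

# Assumed convention of a lazy reversed view (hypothetical helper):
# positions count back from the last valid index of the wrapped string.
lazy_index(str, k) = lastindex(str) - k + 1
lazy_index(b, j)                # 8
```

Both `8` and `11` identify the same character; they differ because one convention counts from the last valid index and the other from the last code unit.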
Bump. Any ideas about the indexing? It would also be good to hear from @stevengj, who introduced
Just to restate my position: let's get rid of
ff9744e to 679e7ee
I'd be okay keeping
BTW, I've noticed the following bug on current master:
julia> rsearch(SubString("", 1, 0), "")
ERROR: BoundsError: attempt to access ""
  at index [0]
Stacktrace:
 [1] _rsearchindex(::SubString{String}, ::String, ::Int64) at ./strings/search.jl:219
 [2] _rsearch(::SubString{String}, ::String, ::Int64) at ./strings/search.jl:357
 [3] rsearch(::SubString{String}, ::String) at ./strings/search.jl:365
 [4] macro expansion at ./REPL.jl:97 [inlined]
 [5] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
The basic motivation for
That's fine. What do you propose I do in this PR then, given the difference in indexing?
Remove
Yeah, I guess that would be okay. It'd be a pretty breaking change without providing a deprecation though. (Though I suppose not all that much more breaking than silently changing the behavior to convert to
Deprecate to
Hm, I just noticed that in deprecating
ec7f783 to ebc668d
base/strings/search.jl
matched = false
break
end
c, k = next(rs,k)
d, j = next(t,j)
# Using `reverseind` with `prevind` mimics `next` but for iteration over
I still wonder whether it wouldn't be more efficient to work with indices in the original string (using `prevind`), and only call `reverseind` once the match is found. Or maybe I'm missing something?
Probably, though this approach didn't introduce any regressions as far as Nanosoldier could tell. My problem with changing it is that it took me quite a while to figure out how to even get to this; translating this algorithm to be in reverse broke my brain a bit. I'd be happy to change it with some guidance though.
Yeah maybe just leave a `TODO` so that we can find it when looking for possible performance enhancements.
If you want to try changing it, I'd say you should just have to call `prevind` on both `rs` and `t`, extract the chars and compare them as currently? Actually I don't understand why `reverseind` is needed here, nor how it could be correct. In the first iteration, `k` contains the index in `s`, not in `reverse(s)`, so how can that work?
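As an illustration of the approach suggested in this thread (walking both strings backwards with `prevind` and never building a reversed copy), here is a deliberately naive sketch for Julia 1.x. `naive_rsearchindex` is a hypothetical name; this is not the algorithm used in Base, it runs in quadratic time in the worst case, and it rejects empty patterns rather than deciding what they should return.

```julia
# Hypothetical, naive right-to-left substring search working only with
# indices into the original strings. Returns the start index of the last
# occurrence of `t` in `s`, or 0 if there is none.
function naive_rsearchindex(s::AbstractString, t::AbstractString)
    isempty(t) && throw(ArgumentError("empty pattern is not handled in this sketch"))
    e = lastindex(s)                 # candidate position of the match's last char
    while e >= 1
        k, j = e, lastindex(t)
        start = 0
        matched = true
        while j >= 1                 # compare characters from the back of `t`
            if k < 1 || s[k] != t[j]
                matched = false
                break
            end
            start = k
            k = prevind(s, k)        # step backwards in `s`
            j = prevind(t, j)        # step backwards in `t`
        end
        matched && return start
        e = prevind(s, e)            # try the previous candidate end position
    end
    return 0
end

naive_rsearchindex("aβcaβc", "βc")   # 6
naive_rsearchindex("abc", "x")       # 0
```

Because every index stays in the coordinate system of `s`, no `reverseind` call is needed at all in this variant; the result is already an index into the original string.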
I wish I knew.
Yeah, why not. Converting to
ebc668d to 9debcf0
9debcf0 to 3e06ccd
I think things should be all sorted out now, if folks want to have another look.
This would look good... except that CI doesn't pass. :-p
Okay, well that's what I get for only running a subset of the tests locally. 😑
NEWS.md
@@ -243,6 +246,9 @@ This section lists changes that do not have deprecation warnings.
  * All command line arguments passed via `-e`, `-E`, and `-L` will be executed in the order
    given on the command line ([#23665]).

  * `reverse(::AbstractString)` now unconditionally returns a `String`. Previously it
    returned a `RepString`, which has been removed from Base ([#23612]).
`RepString` -> `RevString`
The `RevString` type for lazily reversed strings has been moved to the LegacyStrings package. Fixes #22611. Calling `reverse` on an `AbstractString` with no more specific method now unconditionally returns a `String`.
Move the non-substring related functionality to strings/basic.jl.
3e06ccd to 1693b04
Hey @ararslan, just so you know, I'm about to push a branch that deletes
These seem unrelated, but they're actually linked:
* If you reverse generic strings by wrapping them in `RevString` then this generic `reverseind` is incorrect.
* In order to have a correct generic `reverseind` one needs to assume that `reverse(s)` returns a string of the same type and encoding as `s` with code points in reverse order; one also needs to assume that the code units encoding each character remain the same when reversed. This is a valid assumption for UTF-8, UTF-16 and (trivially) UTF-32.

Reverse string search functions are pretty messed up by this and I've fixed them well enough to work but they may be quite inefficient for long strings now. I'm not going to spend too much time on this since there's other work going on to generalize and unify searching APIs.

Close #22611
Close #24613
See also: #10593 #23612 #24103
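The assumption described in that commit message can be checked directly for a UTF-8 `String` on Julia 1.x. The sketch below only exercises the invariant `reverse(s)[i] == s[reverseind(s, i)]`; the sample string is arbitrary and the check says nothing about other string types.

```julia
s = "Julia βx🚀"          # mixes 1-, 2- and 4-byte UTF-8 characters
r = reverse(s)

# Every valid index of the reversed string maps back, via `reverseind`,
# to the index of the same character in the original string.
ok = all(r[i] == s[reverseind(s, i)] for i in eachindex(r))
```

The invariant holds here because each character keeps the same code units when the string is reversed, so only the byte offsets have to be mirrored.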
This PR removes `RevString` from Base. It's the companion to JuliaStrings/LegacyStrings.jl#22, which adds `RevString` to LegacyStrings. Fixes #22611.