Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue on comparison between String and SubString #32401

Closed
chengchingwen opened this issue Jun 24, 2019 · 5 comments
Closed

Performance issue on comparison between String and SubString #32401

chengchingwen opened this issue Jun 24, 2019 · 5 comments
Labels
performance Must go faster strings "Strings!"

Comments

@chengchingwen
Copy link
Member

There is a performance issue when compare between SubString and String. Here is an example to bring out this issue.

using BenchmarkTools

function findsubstr(str, s, e, vocab, ::Val{use_substr}) where use_substr
  if use_substr
    target = SubString(str, s, e)
  else
    target = "$(str[s:e])"
  end

  findfirst(isequal(target), vocab)
end

vocab = ["abc", "def", "ghi"]

julia> @benchmark findsubstr("abcdefghi", 4, 6, vocab, Val(true))
BenchmarkTools.Trial:
  memory estimate:  224 bytes
  allocs estimate:  5
  --------------
  minimum time:     68.546 ns (0.00% GC)
  median time:      70.883 ns (0.00% GC)
  mean time:        87.784 ns (17.74% GC)
  maximum time:     43.507 μs (99.75% GC)
  --------------
  samples:          10000
  evals/sample:     976

julia> @benchmark findsubstr("abcdefghi", 4, 6, vocab, Val(false))
BenchmarkTools.Trial:
  memory estimate:  48 bytes
  allocs estimate:  2
  --------------
  minimum time:     41.083 ns (0.00% GC)
  median time:      41.733 ns (0.00% GC)
  mean time:        50.037 ns (14.60% GC)
  maximum time:     42.766 μs (99.87% GC)
  --------------
  samples:          10000
  evals/sample:     991
@rfourquet
Copy link
Member

rfourquet commented Jun 24, 2019

The problem is reduced to:

str = "abcdefghi"
s1 = SubString(str, 4, 6) # "def"
s2 = "$(str[4:6])"

julia> @btime $s1 == $("def");
  35.958 ns (2 allocations: 96 bytes)

julia> @btime $s2 == $("def");
  3.737 ns (0 allocations: 0 bytes)

The behavior is similar when the result is false, e.g. comparing to $("cdef").

@rfourquet rfourquet added performance Must go faster strings "Strings!" labels Jun 24, 2019
@StefanKarpinski
Copy link
Sponsor Member

Nice reduction, @rfourquet. It's because of calling generic string == methods that don't use memcmp:

julia> @which s1 == s1
==(a::AbstractString, b::AbstractString) in Base at strings/basic.jl:294

julia> @which s1 == s2
==(a::AbstractString, b::AbstractString) in Base at strings/basic.jl:294

julia> @which s2 == s2
==(a::String, b::String) in Base at strings/string.jl:105

I thought that @bkamins had fixed this at some point, but maybe I'm misremembering.

@bkamins
Copy link
Member

bkamins commented Jun 24, 2019

The proposal was the following #26921 (+ see the discussion on pros and cons). When I am done with pressing DataFrames.jl fixes which need to be shipped before JuliaCon I think I can go back to that issue (of course feel free to overtake it if you prefer - thank you).

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jun 27, 2019

I just posted this over there:

Some time having gone by to think about this, I think we may want to make comparison of invalid strings officially undefined behavior—not in the "we might crash or eat your hard disk" sense, but in the sense that you shouldn't rely on the results remaining consistent over time or across different string types. That would allow us to use the memcmp implementation for any pair of String and SubString{String} thereby fixing #32401 . It would also fix the original problem here since the behavior for String and SubString would become consistent.

@chengchingwen
Copy link
Member Author

The performance difference seems to be fixed already. Tested on v1.7.2:

str = "abcdefghi"
s1 = SubString(str, 4, 6) # "def"
s2 = "$(str[4:6])"

julia> @btime $s1 == $("def");
  3.273 ns (0 allocations: 0 bytes)

julia> @btime $s2 == $("def");
  3.776 ns (0 allocations: 0 bytes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster strings "Strings!"
Projects
None yet
Development

No branches or pull requests

4 participants