Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Specialize setdiff! for sets #27486

Closed
wants to merge 2 commits into from
Closed

[RFC] Specialize setdiff! for sets #27486

wants to merge 2 commits into from

Conversation

guilhermeleobas
Copy link
Contributor

Hi all,

This PR tries to solve the issue #25317. As pointed out there, the PR #23528 made setdiff as slow as setdiff!. This is still a work in progress and I would like some feedback on what has been done so far.

With the specialization, I got the following numbers

julia> s1 = Set([1.0])
Set([1.0])

julia> s = Set(rand(1_000_000))
Set([0.955284, 0.397492, , 0.00387804, 0.116504])

julia> @which setdiff!(s1, s)
setdiff!(s::Set, itr::Set) in Base at set.jl:298

julia> @time setdiff!(s1, s)
  0.059103 seconds (32.46 k allocations: 1.888 MiB)
Set([1.0])

julia> @time setdiff!(s1, s)
  0.000004 seconds (9 allocations: 656 bytes)
Set([1.0])

With the specialization, shouldn't the first call to setdiff! be as fast as the second one?

@guilhermeleobas guilhermeleobas changed the title [WIP] Specialize setdiff! for Sets [WIP] Specialize setdiff! for sets Jun 8, 2018
@fredrikekre
Copy link
Member

Instead of benchmarking with @time, use @btime or @benchmark from https://github.com/JuliaCI/BenchmarkTools.jl

@guilhermeleobas
Copy link
Contributor Author

guilhermeleobas commented Jul 17, 2018

Hi @fredrikekre, thanks for the feedback and sorry for the delay.

With my changes and using the same setdiff! call as before, I have the following results:

BenchmarkTools.Trial:
  memory estimate:  496 bytes
  allocs estimate:  5
  --------------
  minimum time:     173.827 ns (0.00% GC)
  median time:      184.953 ns (0.00% GC)
  mean time:        233.334 ns (17.84% GC)
  maximum time:     65.088 μs (99.66% GC)
  --------------
  samples:          10000
  evals/sample:     732

And below is the benchmark for Julia v. 0.6.4, which is the current release version:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     37.215 ms (0.00% GC)
  median time:      37.590 ms (0.00% GC)
  mean time:        37.701 ms (0.00% GC)
  maximum time:     40.461 ms (0.00% GC)
  --------------
  samples:          133
  evals/sample:     1

As can be seen, the specialized version of setdiff! is way faster. What do you think?

Edit: I should have added in my first comment what this PR tries to achieve. So, the goal is to make setdiff! faster by 1) specializing it for sets and 2) always iterating over the smallest set between the two passed to the function.

@guilhermeleobas guilhermeleobas changed the title [WIP] Specialize setdiff! for sets [RFC] Specialize setdiff! for sets Jul 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants