
Prefer `Hash#[]` over `Set#include?` for speed #312

Merged
1 commit merged into master on Jun 17, 2016

Conversation

schneems
Member


Playing with stackprof against codetriage and for an initial run with no cache `Set#include?` was the top called method, something around 8% of execution time spent there.

Did a microbenchmark to see if it would be faster to use a hash:

```
require 'benchmark/ips'

require 'set'

set  = Set.new [:foo, :bar]
hash = {foo: true, bar: true}

Benchmark.ips do |x|
  x.report("set ") {|num|
    i = 0
    while i < num
      set.include?(:foo)
      i += 1
    end

  }
  x.report("hash") {|num|
    i = 0
    while i < num
      hash[:foo]
      i += 1
    end
  }
  x.compare!
end

# Warming up --------------------------------------
#                 set    215.314k i/100ms
#                 hash   219.939k i/100ms
# Calculating -------------------------------------
#                 set      11.715M (±15.5%) i/s -     56.843M in   5.010837s
#                 hash     20.119M (±18.2%) i/s -     96.333M in   5.010977s

# Comparison:
#                 hash: 20118880.7 i/s
#                 set : 11714839.0 i/s - 1.72x slower
```

Yes, it is faster.
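One general Ruby caveat worth noting (not part of the patch itself): `Hash#[]` returns `nil` for a missing key, so it only works as a membership test when every stored value is truthy, as with the `true` values above. `Hash#key?` checks membership unambiguously:

```ruby
hash = { foo: true, bar: true }

hash[:foo]         # => true
hash[:baz]         # => nil (falsy, so it reads as "not a member")

# The pattern breaks if a stored value is itself falsy:
flags = { debug: false }
flags[:debug]      # => false, even though the key exists
flags.key?(:debug) # => true, an unambiguous membership check
```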

Anecdotally when running `RAILS_ENV=production time rake assets:precompile` against codetriage:

Before patch:

```
real    0m18.325s
user    0m14.564s
sys    0m2.729s
```

After patch:

```
real    0m17.981s
user    0m14.461s
sys    0m2.716s
```
@schneems schneems force-pushed the schneems/hash-metadata-lookup branch from 30d439f to f032821 on June 15, 2016 at 22:14
@schneems
Member Author

Actually looks like we can go even faster with case statements

```
Comparison:
                case:  6852426.1 i/s
                hash:  3622063.5 i/s - 1.89x slower
                set :  3400254.4 i/s - 2.02x slower
```
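For context, here is a minimal sketch of the three lookup styles being compared (the method names are invented for illustration, not from the patch); all three agree on the answer and differ only in mechanism:

```ruby
require 'set'

VALID_SET  = Set.new([:foo, :bar])
VALID_HASH = { foo: true, bar: true }

# Membership via Set#include?
def valid_via_set?(value)
  VALID_SET.include?(value)
end

# Membership via Hash#[] (values are all `true`, so nil is the only falsy miss)
def valid_via_hash?(value)
  VALID_HASH[value] || false
end

# Membership via a case/when literal list
def valid_via_case?(value)
  case value
  when :foo, :bar then true
  else false
  end
end

[:foo, :baz].map { |v| [valid_via_set?(v), valid_via_hash?(v), valid_via_case?(v)] }
# => [[true, true, true], [false, false, false]]
```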

@jeremy
Member

jeremy commented Jun 15, 2016

Was just writing to suggest case, since Ruby gives us a helping hand 😁

Wonder whether `validate_processor_result!` is over-defensive anyway, like running a linter in production code. Are a processor's results better validated by its test suite?

@schneems
Member Author

We need some kind of way to enable a "debug" or "strict" mode for these checks. They're not even really that useful for debugging: they only guarantee that a certain subset of types is used, not that the correct types are used in the correct places. I guess the checks are done to blow up before we accidentally try to serialize a non-serializable type to the cache. The deep checking of types is really bad for source maps, which are an array of hashes with values that hold arrays.

If this is really so expensive, we should definitely look for ways to work around having to do these checks in production.

I tried implementing the logic using a case statement:

```
def valid_processor_metadata_value?(value)
  case value
  when String, Symbol, Fixnum, Bignum, TrueClass, FalseClass, NilClass
    true
  when Set, Array
    value.all? { |v| valid_processor_metadata_value?(v) }
  when Hash
    value.all? do |(key, value)|
      valid_processor_metadata_value?(key) &&
        valid_processor_metadata_value?(value)
    end
  else
    false
  end
end
```
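For comparison, a hash-lookup variant of the same check might look like the sketch below. The constant name is invented, and `Integer` stands in for `Fixnum`/`Bignum` (unified in Ruby 2.4); note that indexing on `value.class` matches exact classes only, never subclasses:

```ruby
require 'set'

# Hypothetical lookup table; the constant name is made up for this sketch.
VALID_METADATA_TYPES = {
  String => true, Symbol => true, Integer => true,
  TrueClass => true, FalseClass => true, NilClass => true
}.freeze

def valid_processor_metadata_value?(value)
  if VALID_METADATA_TYPES[value.class]
    true
  elsif value.is_a?(Set) || value.is_a?(Array)
    value.all? { |v| valid_processor_metadata_value?(v) }
  elsif value.is_a?(Hash)
    value.all? do |key, val|
      valid_processor_metadata_value?(key) && valid_processor_metadata_value?(val)
    end
  else
    false
  end
end
```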

Then I set up my system to run assets:precompile 50 times and record the results for the codetriage website:

```
task "assets:bench" do
  measure = []

  50.times do
    measure << Benchmark.measure do
      `rm -rf tmp/cache/assets/sprockets/v4.0/ ; rm -rf public/assets; time RAILS_ENV=production bundle exec rake assets:precompile`
    end.real
  end
  puts "================ DONE ================"
  puts measure.join("\n")
end
```

I put the results in a spreadsheet

```
Name     Avg (seconds)   STDev (seconds)
----------------------------------------
Current  13.96280399     0.3321853933
Hash     12.54463898     0.250917801
Case     13.13691678     1.628405121
```

The case statement is faster than the current implementation on average, but not by more than the standard deviation. Hash is the fastest even after accounting for the stdev. In the average case my assets compile about 10% faster after switching to a hash.

The case statement might be sensitive to where the types sit in the statement, or to how many values each `when` clause has to compare. Or maybe `===` is slower than `thing.class == String`. I don't know. I'm curious to look into it more tomorrow.

If you're interested in messing around with optimizing the case statement, I wrote a script that compares case against hash and set for the non-"complex" (i.e. Set, Array, Hash) data types: https://gist.github.com/schneems/925ff1988066d4d582915fdf948ca142

@schneems
Member Author

Looked into it more, still not sure why case is slower. Need to move on. We can revisit some other time.

@schneems schneems merged commit 88fca20 into master Jun 17, 2016
@tenderlove
Member

Or maybe === is slower than thing.class == String. I don't know. I'm curious to look into it more tomorrow.

Yes. `Class#===(obj)` does an `obj.is_a?(Class)` check, whereas `thing.class == String` only compares the classes. IOW switching from a case statement to a hash lookup means you don't support subclasses of those classes. But it looks like that's not a real use case, so I'd just go with the hash.
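To illustrate the subclass difference (the `MyString` class is invented for this example):

```ruby
class MyString < String; end

s = MyString.new("hi")

String === s        # => true  — Class#=== is an is_a? check, so subclasses match
s.class == String   # => false — exact class comparison only

lookup = { String => true }
lookup[s.class]     # => nil   — a hash keyed on classes also misses subclasses
```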

@rafaelfranca
Member

👍

@vincentwoo

Whoa, this is a little weird to me. Any thoughts on why `Set#include?` is slower than `Hash#[]`? I thought Set was supposed to be a special-purpose class explicitly designed to make set operations like `include?` fast.

@schneems
Member Author

Calls to `Hash#[]` are optimized by the interpreter. Here's some disassembled code:

```
code = <<-END
  set = Set.new
  set.include?(foo.class)
END
puts RubyVM::InstructionSequence.compile(code).disasm
```

Produces

```
== disasm: #<ISeq:<compiled>@<compiled>>================================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] set
0000 trace            1                                               (   1)
0002 getinlinecache   9, <is:0>
0005 getconstant      :Set
0007 setinlinecache   <is:0>
0009 opt_send_without_block <callinfo!mid:new, argc:0, ARGS_SIMPLE>, <callcache>
0012 setlocal_OP__WC__0 2
0014 trace            1                                               (   2)
0016 getlocal_OP__WC__0 2
0018 putself
0019 opt_send_without_block <callinfo!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0022 opt_send_without_block <callinfo!mid:class, argc:0, ARGS_SIMPLE>, <callcache>
0025 opt_send_without_block <callinfo!mid:include?, argc:1, ARGS_SIMPLE>, <callcache>
0028 leave
```

While

```
code = <<-END
  hash = {}
  hash[foo.class]
END
puts RubyVM::InstructionSequence.compile(code).disasm
```

produces

```
== disasm: #<ISeq:<compiled>@<compiled>>================================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] hash
0000 trace            1                                               (   1)
0002 newhash          0
0004 setlocal_OP__WC__0 2
0006 trace            1                                               (   2)
0008 getlocal_OP__WC__0 2
0010 putself
0011 opt_send_without_block <callinfo!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0014 opt_send_without_block <callinfo!mid:class, argc:0, ARGS_SIMPLE>, <callcache>
0017 opt_aref         <callinfo!mid:[], argc:1, ARGS_SIMPLE>, <callcache>
```

I think it's that `opt_aref` is faster than `opt_send_without_block`, but IANA(Ruby)C developer.
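You can check for the specialized instruction yourself. Assuming CRuby, the hash lookup compiles to `opt_aref` while `include?` goes through a generic send:

```ruby
# Disassemble the two lookup styles and inspect the instructions used.
hash_code = RubyVM::InstructionSequence.compile("h = {}; h[:k]").disasm
set_code  = RubyVM::InstructionSequence.compile("s = []; s.include?(:k)").disasm

hash_code.include?("opt_aref")               # specialized instruction for Hash#[]
set_code.include?("opt_send_without_block")  # plain method dispatch for include?
```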

@vincentwoo

If this is true, is there ever an appropriate time to use Set and not a Hash in Ruby?

@schneems
Member Author

I think it's supposed to be faster for set operations such as unions and intersections.

I could be wrong but I think set is powered by a hash under the hood.
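For what it's worth, `Set`'s draw is largely its API: set-algebra operations read naturally and return `Set`s:

```ruby
require 'set'

a = Set[1, 2, 3]
b = Set[2, 3, 4]

a | b                       # union        => Set[1, 2, 3, 4]
a & b                       # intersection => Set[2, 3]
a - b                       # difference   => Set[1]
a.subset?(Set[1, 2, 3, 4])  # => true
```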

@vincentwoo

Ah, cool. Thanks for the info.

@jeremy jeremy deleted the schneems/hash-metadata-lookup branch June 22, 2016 06:26
@schneems
Member Author

Looks like Set is written in Ruby (https://github.com/ruby/ruby/blob/trunk/lib/set.rb) and it is powered by a hash. Based on that, a Hash will always be at least as fast, though it might not be as convenient.
