validate_processor_result! takes up a lot of time #383

bouk · 2016-09-15T18:42:22Z

We're working on porting Shopify to Sprockets 4, and I was looking into a big increase in compilation time. Currently compiling all the assets takes about 3x as long as on sprockets 3

I discovered that validate_processor_result! takes up 20% of the total compilation time.

Here is a stackprof for a cache-cleared complete compilation of all the assets in Shopify

System configuration

Sprockets master 99444c0
Ruby version 2.2.3

cc @rafaelfranca @josh

The text was updated successfully, but these errors were encountered:

schneems · 2016-09-16T20:00:18Z

I had some PRs that made this a little better. Here's the first PR #312

I'm not 100% sure, but I think we're accidentally "validating" the processor results twice, since all results will eventually be passed through "digest" and digest converts known classes to a digest, and raises an error on everything else

sprockets/lib/sprockets/digest_utils.rb

Lines 38 to 80 in 99444c0

    
           ADD_VALUE_TO_DIGEST = { 
        
             String     => ->(val, digest) { digest << val }, 
        
             FalseClass => ->(val, digest) { digest << 'FalseClass'.freeze }, 
        
             TrueClass  => ->(val, digest) { digest << 'TrueClass'.freeze  }, 
        
             NilClass   => ->(val, digest) { digest << 'NilClass'.freeze   }, 
        
             Symbol => ->(val, digest) { 
        
               digest << 'Symbol'.freeze 
        
               digest << val.to_s 
        
             }, 
        
             Fixnum => ->(val, digest) { 
        
               digest << 'Fixnum'.freeze 
        
               digest << val.to_s 
        
             }, 
        
             Bignum => ->(val, digest) { 
        
               digest << 'Bignum'.freeze 
        
               digest << val.to_s 
        
             }, 
        
             Array => ->(val, digest) { 
        
               digest << 'Array'.freeze 
        
               val.each do |element| 
        
                 ADD_VALUE_TO_DIGEST[element.class].call(element, digest) 
        
               end 
        
             }, 
        
             Hash => ->(val, digest) { 
        
               digest << 'Hash'.freeze 
        
               val.sort.each do |array| 
        
                 ADD_VALUE_TO_DIGEST[Array].call(array, digest) 
        
               end 
        
             }, 
        
             Set => ->(val, digest) { 
        
               digest << 'Set'.freeze 
        
               ADD_VALUE_TO_DIGEST[Array].call(val, digest) 
        
             }, 
        
             Encoding => ->(val, digest) { 
        
               digest << 'Encoding'.freeze 
        
               digest << val.name 
        
             }, 
        
           } 
        
           ADD_VALUE_TO_DIGEST.default_proc = ->(_, val) { 
        
             raise TypeError, "couldn't digest #{ val }" 
        
           } 
        
           private_constant :ADD_VALUE_TO_DIGEST

If that's the case then we can get rid of this validate processor method safely i think. That's my theory but I haven't spent much time looking into the viability of that solution.

Another option could be an API to disable/enable the validation, so maybe you would only turn it on at test time or when there is a problem.

schneems · 2016-09-16T20:00:31Z

btw josh doesn't work on this project anymore.

DerekStride · 2016-09-19T14:18:39Z

@schneems If it is the case that the digest alone is sufficient, I was working with @bouk last week to optimize that as well, seeing as how that took a lot of time too. I forked sprockets and have a PR up where I rewrote DIgestUtils#build_digest in C to see if I could get better performance.

You can check out the PR if you're interested.

schneems · 2016-09-19T14:39:17Z

If it is the case that the digest alone is sufficient

The digest is doing the exact same work, so it is sufficient. What I haven't done is gone through and proven that all results from the processor are guaranteed to go through the digest function. I'm guessing they will, but I need to prove it before we kill any code.

Interesting work with the c-extension. I don't think I would ever make sprockets a gem with a native extension, however i'm not against making it easier to hook into somehow to do that work in C, so we could maybe have a sprockets-c gem or something. I'm surprised the results aren't even better. It looks like the code is almost exclusively using Ruby's C api, so I'm guessing the win we are seeing is with manual memory management, or maybe by skipping the write-barrier?

bouk · 2016-09-19T14:42:46Z

@schneems I think a big thing is the way the type switch works, because there's quite a bit of overhead with the Hash-proc thing that Sprockets does right now for every thing in the value that is to be digested. I agree that having a separate C extension is a good idea

rafaelfranca · 2016-09-19T15:11:48Z

@tenderlove you might be interested on this one. I remember you were talking about this digest code.

tenderlove · 2016-09-19T15:15:34Z

I'll poke at this again, but IIRC we need to stop handling so many types for the digests and switch to polymorphism. Last time I poked at it I had an idea but it wasn't 100% backwards compatible. I'll try to post a patch that isn't backwards compat, but we can at least look at the results. @bouk thanks for reporting this!

schneems · 2016-09-19T15:17:04Z

Here is the PR where we switched to this recursive hash #320 which gave me 16% asset generation in code triage.

bouk · 2016-09-19T15:38:48Z

Last week I played around with some different approaches for implementing this, including refinements (I've validated that they all return the same digest): https://gist.github.com/bouk/2eef1fffaa88262c364da830dbabc65c. As always, keep in mind that benchmarks are deceiving (I didn't look much into whether the object I'm using is representative of things that are normally digested), but it seems that refinements are viable for this.

$ ruby -v
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]
$ ruby ./bench.rb
Rehearsal ----------------------------------------------
Hash         2.770000   0.000000   2.770000 (  2.768750)
Switch       2.230000   0.000000   2.230000 (  2.233324)
Refinement   1.930000   0.010000   1.940000 (  1.938744)
------------------------------------- total: 6.940000sec

                 user     system      total        real
Hash         2.800000   0.000000   2.800000 (  2.802943)
Switch       2.260000   0.000000   2.260000 (  2.272631)
Refinement   1.970000   0.010000   1.980000 (  1.970709)

Note that my switch implementation is different from the one that was in Sprockets before.

tenderlove · 2016-09-19T15:55:51Z

I'd really like to investigate why we're actually passing so many types to that one function and fix that. ISTM if we went up the callstack and found the source of these complex types and handled them at their source, then we could eliminate this "God function" and speed up the whole system. Does that make sense?

schneems · 2016-09-19T17:08:59Z

I think we are building digests out of the hashes that represent an asset, these contain arrays, strings, other hashes etc. I can be available for questions, ping me on campfire. I don't have all the answers but hopefully can help.

schneems · 2016-11-03T21:15:13Z

@bouk coming back to your comment on refinements, guess I didn't really read too closely. I re-worked it to be a more-indicative object (one from jquery-rails) and to use benchmark-ips. Thanks for your comment, wish I had investigated more when you posted. Here is the gist:

https://gist.github.com/schneems/8c69779766f1ab46a5a262a24de95dba

# Warming up --------------------------------------
#           Switch         1.000  i/100ms
#           Refinement     1.000  i/100ms
#           Hash           1.000  i/100ms
# Calculating -------------------------------------
#           Switch          9.038  (± 0.0%) i/s -     46.000  in   5.100806s
#           Refinement     11.931  (± 8.4%) i/s -     59.000  in   5.038682s
#           Hash            7.255  (±13.8%) i/s -     36.000  in   5.012351s

# Comparison:
#           Refinement:       11.9 i/s
#           Switch    :        9.0 i/s - 1.32x  slower
#           Hash      :        7.3 i/s - 1.64x  slower

Turns out refinements are WAYYY faster than the currently used hashes. I'm guessing that is because there is no overhead for context lookup versus with the procs in hashes. Which is what you mentioned. The other surprising thing is that a case statement is faster.

That wasn't what I was seeing with another bench in #312. Looking back it seems that while a hash is faster than a case statement in terms of lookup, the actual work done inside of the blocks is dominating the amount of time. Would love other thoughts here with regard to the switch statement.

I'm fine using refinements for a perf gain. I'm going to do some more real-world benches today and see what happens. The downside to the refinements is that a subclass of hash or string, etc would pass while currently it is a strict class check. I'm wondering if those things will blow up the cache in another location so maybe this behavior change is fine?

class StringFoo < String
end

puts RefinementDigestUtils.digest(StringFoo.new("foo"))
# => 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae

schneems · 2016-11-04T00:59:32Z

Here is a PR with some promising results #416

I get these numbers:

            Avg         STDev
Refinements 10.06318182 0.2621523045
Hash        12.08513782 1.944694921

Based on this refinements is faster than the current hash (8-16% in a real world assets:precompile app), but there is a high variance.

headius · 2016-11-08T07:25:14Z

JRuby results on your benchmark.

Warming up --------------------------------------
          Switch         1.000  i/100ms
          Refinement     1.000  i/100ms
          Hash           1.000  i/100ms
Calculating -------------------------------------
          Switch         12.420  (± 8.1%) i/s -     62.000  in   5.014637s
          Refinement      5.001  (± 0.0%) i/s -     25.000  in   5.009298s
          Hash           11.779  (± 8.5%) i/s -     59.000  in   5.020402s

Comparison:
          Switch    :       12.4 i/s
          Hash      :       11.8 i/s - same-ish: difference falls within error
          Refinement:        5.0 i/s - 2.48x  slower

And here's the MRI results:

Warming up --------------------------------------
          Switch         1.000  i/100ms
          Refinement     1.000  i/100ms
          Hash           1.000  i/100ms
Calculating -------------------------------------
          Switch          6.776  (± 0.0%) i/s -     34.000  in   5.028583s
          Refinement      8.672  (±11.5%) i/s -     44.000  in   5.090601s
          Hash            5.072  (± 0.0%) i/s -     26.000  in   5.132423s

Comparison:
          Refinement:        8.7 i/s
          Switch    :        6.8 i/s - 1.28x  slower
          Hash      :        5.1 i/s - 1.71x  slower

schneems · 2016-11-08T16:33:09Z

Thanks! Refinements seems to be faster all around.

eregon · 2016-11-08T22:34:27Z

@schneems It seems you read the numbers wrong, refinements are actually the slowest of the 3 on JRuby.

schneems · 2016-11-09T07:03:00Z

Ah, i did. Ugh. Thanks.

dgynn · 2016-11-12T04:42:51Z

Since the ADD_VALUE_TO_DIGEST Hash uses Classes as keys it can be sped up by using .compare_by_identity.rehash. That also works for the other Hashes that use integers or classes as well (DIGEST_SIZES and HASH_ALGORITHMS).

Here is a fork of the refinement_bench.rb gist that adds a benchmark for that which shows Hashes (with compare_by_identity) nearly as fast as Refinements, 50% faster than normal Hashes, and slightly faster than Switch.
https://gist.github.com/dgynn/c28cbefb514d26e057f37e60e03689f5

Note: rehash is needed for JRuby support. MRI automatically rehashes when compare_by_identity is called but JRuby does not.

Here are the MRI results

Warming up --------------------------------------
    Switch               1.000  i/100ms
    Refinement           1.000  i/100ms
    Hash                 1.000  i/100ms
    Hash by identity     1.000  i/100ms
Calculating -------------------------------------
    Switch               13.354  (± 7.5%) i/s -     67.000  in   5.031874s
    Refinement           17.478  (± 5.7%) i/s -     88.000  in   5.047458s
    Hash                 10.254  (± 0.0%) i/s -     52.000  in   5.075437s
    Hash by identity     15.274  (± 0.0%) i/s -     77.000  in   5.045927s

Comparison:
    Refinement      :       17.5 i/s
    Hash by identity:       15.3 i/s - 1.14x  slower
    Switch          :       13.4 i/s - 1.31x  slower
    Hash            :       10.3 i/s - 1.70x  slower

And JRuby results

Warming up --------------------------------------
    Switch               1.000  i/100ms
    Refinement           1.000  i/100ms
    Hash                 2.000  i/100ms
    Hash by identity     2.000  i/100ms
Calculating -------------------------------------
    Switch               20.850  (± 4.8%) i/s -    105.000  in   5.042976s
    Refinement            9.110  (± 0.0%) i/s -     46.000  in   5.057898s
    Hash                 20.180  (± 0.0%) i/s -    102.000  in   5.057621s
    Hash by identity     20.233  (± 4.9%) i/s -    102.000  in   5.051163s

Comparison:
    Switch          :       20.9 i/s
    Hash by identity:       20.2 i/s - same-ish: difference falls within error
    Hash            :       20.2 i/s - same-ish: difference falls within error
    Refinement      :        9.1 i/s - 2.29x  slower

@dgynn

Since we are always looking up these hashes with an identical duplicate (not just the same value, the same object) we can take advantage of Hash.compare_by_identity. Suggested by @dgynn in #383 (comment)

schneems · 2016-12-12T19:35:42Z

Implemented the compare_by_identity patch in #439. I tried adding on more hashes like VALID_METADATA_VALUE_TYPES_HASH but didn't get any perf difference. I still think I want to just remove valid_processor_metadata_value? checks or make them opt-in somehow.

schneems · 2017-11-17T16:18:44Z

@bouk are you interested in sending me a patch that just removes this method? We're already essentially doing the same class checks by passing processor results into digest utils. This is duplicated logic.

…_result [close #383] Remove duplicate check

schneems mentioned this issue Nov 4, 2016

Speed up digests with refinements #417

Closed

schneems mentioned this issue Dec 12, 2016

Speed up Hash lookups with constants #439

Merged

schneems mentioned this issue Dec 16, 2016

Sprockets 4 performance regressions #413

Closed

This was referenced Nov 14, 2017

master perf proper #510

Closed

3.x perf #506

Closed

schneems added enhancement performance labels Nov 17, 2017

schneems mentioned this issue Nov 20, 2017

3.x super #507

Closed

schneems closed this as completed in 5308efa May 20, 2019

schneems added a commit that referenced this issue May 20, 2019

Merge pull request #605 from rails/schneems/remove-validate_processor…

1844ee9

…_result [close #383] Remove duplicate check

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

validate_processor_result! takes up a lot of time #383

validate_processor_result! takes up a lot of time #383

bouk commented Sep 15, 2016

schneems commented Sep 16, 2016

schneems commented Sep 16, 2016

DerekStride commented Sep 19, 2016

schneems commented Sep 19, 2016

bouk commented Sep 19, 2016

rafaelfranca commented Sep 19, 2016

tenderlove commented Sep 19, 2016

schneems commented Sep 19, 2016

bouk commented Sep 19, 2016

tenderlove commented Sep 19, 2016

schneems commented Sep 19, 2016

schneems commented Nov 3, 2016

schneems commented Nov 4, 2016

headius commented Nov 8, 2016

schneems commented Nov 8, 2016

eregon commented Nov 8, 2016

schneems commented Nov 9, 2016

dgynn commented Nov 12, 2016

schneems commented Dec 12, 2016

schneems commented Nov 17, 2017

validate_processor_result! takes up a lot of time #383

validate_processor_result! takes up a lot of time #383

Comments

bouk commented Sep 15, 2016

System configuration

schneems commented Sep 16, 2016

schneems commented Sep 16, 2016

DerekStride commented Sep 19, 2016

schneems commented Sep 19, 2016

bouk commented Sep 19, 2016

rafaelfranca commented Sep 19, 2016

tenderlove commented Sep 19, 2016

schneems commented Sep 19, 2016

bouk commented Sep 19, 2016

tenderlove commented Sep 19, 2016

schneems commented Sep 19, 2016

schneems commented Nov 3, 2016

schneems commented Nov 4, 2016

headius commented Nov 8, 2016

schneems commented Nov 8, 2016

eregon commented Nov 8, 2016

schneems commented Nov 9, 2016

dgynn commented Nov 12, 2016

schneems commented Dec 12, 2016

schneems commented Nov 17, 2017