Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate_processor_result! takes up a lot of time #383

Closed
bouk opened this issue Sep 15, 2016 · 20 comments
Closed

validate_processor_result! takes up a lot of time #383

bouk opened this issue Sep 15, 2016 · 20 comments

Comments

@bouk
Copy link
Member

bouk commented Sep 15, 2016

We're working on porting Shopify to Sprockets 4, and I was looking into a big increase in compilation time. Currently compiling all the assets takes about 3x as long as on sprockets 3

I discovered that validate_processor_result! takes up 20% of the total compilation time.

Here is a stackprof for a cache-cleared complete compilation of all the assets in Shopify

image

System configuration

  • Sprockets master 99444c0
  • Ruby version 2.2.3

cc @rafaelfranca @josh

@schneems
Copy link
Member

I had some PRs that made this a little better. Here's the first PR #312

I'm not 100% sure, but I think we're accidentally "validating" the processor results twice, since all results will eventually be passed through "digest" and digest converts known classes to a digest, and raises an error on everything else

ADD_VALUE_TO_DIGEST = {
String => ->(val, digest) { digest << val },
FalseClass => ->(val, digest) { digest << 'FalseClass'.freeze },
TrueClass => ->(val, digest) { digest << 'TrueClass'.freeze },
NilClass => ->(val, digest) { digest << 'NilClass'.freeze },
Symbol => ->(val, digest) {
digest << 'Symbol'.freeze
digest << val.to_s
},
Fixnum => ->(val, digest) {
digest << 'Fixnum'.freeze
digest << val.to_s
},
Bignum => ->(val, digest) {
digest << 'Bignum'.freeze
digest << val.to_s
},
Array => ->(val, digest) {
digest << 'Array'.freeze
val.each do |element|
ADD_VALUE_TO_DIGEST[element.class].call(element, digest)
end
},
Hash => ->(val, digest) {
digest << 'Hash'.freeze
val.sort.each do |array|
ADD_VALUE_TO_DIGEST[Array].call(array, digest)
end
},
Set => ->(val, digest) {
digest << 'Set'.freeze
ADD_VALUE_TO_DIGEST[Array].call(val, digest)
},
Encoding => ->(val, digest) {
digest << 'Encoding'.freeze
digest << val.name
},
}
ADD_VALUE_TO_DIGEST.default_proc = ->(_, val) {
raise TypeError, "couldn't digest #{ val }"
}
private_constant :ADD_VALUE_TO_DIGEST

If that's the case then we can get rid of this validate processor method safely i think. That's my theory but I haven't spent much time looking into the viability of that solution.

Another option could be an API to disable/enable the validation, so maybe you would only turn it on at test time or when there is a problem.

@schneems
Copy link
Member

btw josh doesn't work on this project anymore.

@DerekStride
Copy link

@schneems If it is the case that the digest alone is sufficient, I was working with @bouk last week to optimize that as well, seeing as how that took a lot of time too. I forked sprockets and have a PR up where I rewrote DIgestUtils#build_digest in C to see if I could get better performance.

You can check out the PR if you're interested.

@schneems
Copy link
Member

If it is the case that the digest alone is sufficient

The digest is doing the exact same work, so it is sufficient. What I haven't done is gone through and proven that all results from the processor are guaranteed to go through the digest function. I'm guessing they will, but I need to prove it before we kill any code.

Interesting work with the c-extension. I don't think I would ever make sprockets a gem with a native extension, however i'm not against making it easier to hook into somehow to do that work in C, so we could maybe have a sprockets-c gem or something. I'm surprised the results aren't even better. It looks like the code is almost exclusively using Ruby's C api, so I'm guessing the win we are seeing is with manual memory management, or maybe by skipping the write-barrier?

@bouk
Copy link
Member Author

bouk commented Sep 19, 2016

@schneems I think a big thing is the way the type switch works, because there's quite a bit of overhead with the Hash-proc thing that Sprockets does right now for every thing in the value that is to be digested. I agree that having a separate C extension is a good idea

@rafaelfranca
Copy link
Member

@tenderlove you might be interested on this one. I remember you were talking about this digest code.

@tenderlove
Copy link
Member

I'll poke at this again, but IIRC we need to stop handling so many types for the digests and switch to polymorphism. Last time I poked at it I had an idea but it wasn't 100% backwards compatible. I'll try to post a patch that isn't backwards compat, but we can at least look at the results. @bouk thanks for reporting this!

@schneems
Copy link
Member

Here is the PR where we switched to this recursive hash #320 which gave me 16% asset generation in code triage.

@bouk
Copy link
Member Author

bouk commented Sep 19, 2016

Last week I played around with some different approaches for implementing this, including refinements (I've validated that they all return the same digest): https://gist.github.com/bouk/2eef1fffaa88262c364da830dbabc65c. As always, keep in mind that benchmarks are deceiving (I didn't look much into whether the object I'm using is representative of things that are normally digested), but it seems that refinements are viable for this.

$ ruby -v
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]
$ ruby ./bench.rb
Rehearsal ----------------------------------------------
Hash         2.770000   0.000000   2.770000 (  2.768750)
Switch       2.230000   0.000000   2.230000 (  2.233324)
Refinement   1.930000   0.010000   1.940000 (  1.938744)
------------------------------------- total: 6.940000sec

                 user     system      total        real
Hash         2.800000   0.000000   2.800000 (  2.802943)
Switch       2.260000   0.000000   2.260000 (  2.272631)
Refinement   1.970000   0.010000   1.980000 (  1.970709)

Note that my switch implementation is different from the one that was in Sprockets before.

@tenderlove
Copy link
Member

I'd really like to investigate why we're actually passing so many types to that one function and fix that. ISTM if we went up the callstack and found the source of these complex types and handled them at their source, then we could eliminate this "God function" and speed up the whole system. Does that make sense?

@schneems
Copy link
Member

I think we are building digests out of the hashes that represent an asset, these contain arrays, strings, other hashes etc. I can be available for questions, ping me on campfire. I don't have all the answers but hopefully can help.

@schneems
Copy link
Member

schneems commented Nov 3, 2016

@bouk coming back to your comment on refinements, guess I didn't really read too closely. I re-worked it to be a more-indicative object (one from jquery-rails) and to use benchmark-ips. Thanks for your comment, wish I had investigated more when you posted. Here is the gist:

https://gist.github.com/schneems/8c69779766f1ab46a5a262a24de95dba

# Warming up --------------------------------------
#           Switch         1.000  i/100ms
#           Refinement     1.000  i/100ms
#           Hash           1.000  i/100ms
# Calculating -------------------------------------
#           Switch          9.038  (± 0.0%) i/s -     46.000  in   5.100806s
#           Refinement     11.931  (± 8.4%) i/s -     59.000  in   5.038682s
#           Hash            7.255  (±13.8%) i/s -     36.000  in   5.012351s

# Comparison:
#           Refinement:       11.9 i/s
#           Switch    :        9.0 i/s - 1.32x  slower
#           Hash      :        7.3 i/s - 1.64x  slower

Turns out refinements are WAYYY faster than the currently used hashes. I'm guessing that is because there is no overhead for context lookup versus with the procs in hashes. Which is what you mentioned. The other surprising thing is that a case statement is faster.

That wasn't what I was seeing with another bench in #312. Looking back it seems that while a hash is faster than a case statement in terms of lookup, the actual work done inside of the blocks is dominating the amount of time. Would love other thoughts here with regard to the switch statement.

I'm fine using refinements for a perf gain. I'm going to do some more real-world benches today and see what happens. The downside to the refinements is that a subclass of hash or string, etc would pass while currently it is a strict class check. I'm wondering if those things will blow up the cache in another location so maybe this behavior change is fine?

class StringFoo < String
end

puts RefinementDigestUtils.digest(StringFoo.new("foo"))
# => 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae

@schneems
Copy link
Member

schneems commented Nov 4, 2016

Here is a PR with some promising results #416

I get these numbers:

            Avg         STDev
Refinements 10.06318182 0.2621523045
Hash        12.08513782 1.944694921

Based on this refinements is faster than the current hash (8-16% in a real world assets:precompile app), but there is a high variance.

@headius
Copy link

headius commented Nov 8, 2016

JRuby results on your benchmark.

Warming up --------------------------------------
          Switch         1.000  i/100ms
          Refinement     1.000  i/100ms
          Hash           1.000  i/100ms
Calculating -------------------------------------
          Switch         12.420  (± 8.1%) i/s -     62.000  in   5.014637s
          Refinement      5.001  (± 0.0%) i/s -     25.000  in   5.009298s
          Hash           11.779  (± 8.5%) i/s -     59.000  in   5.020402s

Comparison:
          Switch    :       12.4 i/s
          Hash      :       11.8 i/s - same-ish: difference falls within error
          Refinement:        5.0 i/s - 2.48x  slower

And here's the MRI results:

Warming up --------------------------------------
          Switch         1.000  i/100ms
          Refinement     1.000  i/100ms
          Hash           1.000  i/100ms
Calculating -------------------------------------
          Switch          6.776  (± 0.0%) i/s -     34.000  in   5.028583s
          Refinement      8.672  (±11.5%) i/s -     44.000  in   5.090601s
          Hash            5.072  (± 0.0%) i/s -     26.000  in   5.132423s

Comparison:
          Refinement:        8.7 i/s
          Switch    :        6.8 i/s - 1.28x  slower
          Hash      :        5.1 i/s - 1.71x  slower

@schneems
Copy link
Member

schneems commented Nov 8, 2016

Thanks! Refinements seems to be faster all around.

@eregon
Copy link
Contributor

eregon commented Nov 8, 2016

@schneems It seems you read the numbers wrong, refinements are actually the slowest of the 3 on JRuby.

@schneems
Copy link
Member

schneems commented Nov 9, 2016

Ah, i did. Ugh. Thanks.

@dgynn
Copy link

dgynn commented Nov 12, 2016

Since the ADD_VALUE_TO_DIGEST Hash uses Classes as keys it can be sped up by using .compare_by_identity.rehash. That also works for the other Hashes that use integers or classes as well (DIGEST_SIZES and HASH_ALGORITHMS).

Here is a fork of the refinement_bench.rb gist that adds a benchmark for that which shows Hashes (with compare_by_identity) nearly as fast as Refinements, 50% faster than normal Hashes, and slightly faster than Switch.
https://gist.github.com/dgynn/c28cbefb514d26e057f37e60e03689f5

Note: rehash is needed for JRuby support. MRI automatically rehashes when compare_by_identity is called but JRuby does not.

Here are the MRI results

Warming up --------------------------------------
    Switch               1.000  i/100ms
    Refinement           1.000  i/100ms
    Hash                 1.000  i/100ms
    Hash by identity     1.000  i/100ms
Calculating -------------------------------------
    Switch               13.354  (± 7.5%) i/s -     67.000  in   5.031874s
    Refinement           17.478  (± 5.7%) i/s -     88.000  in   5.047458s
    Hash                 10.254  (± 0.0%) i/s -     52.000  in   5.075437s
    Hash by identity     15.274  (± 0.0%) i/s -     77.000  in   5.045927s

Comparison:
    Refinement      :       17.5 i/s
    Hash by identity:       15.3 i/s - 1.14x  slower
    Switch          :       13.4 i/s - 1.31x  slower
    Hash            :       10.3 i/s - 1.70x  slower

And JRuby results

Warming up --------------------------------------
    Switch               1.000  i/100ms
    Refinement           1.000  i/100ms
    Hash                 2.000  i/100ms
    Hash by identity     2.000  i/100ms
Calculating -------------------------------------
    Switch               20.850  (± 4.8%) i/s -    105.000  in   5.042976s
    Refinement            9.110  (± 0.0%) i/s -     46.000  in   5.057898s
    Hash                 20.180  (± 0.0%) i/s -    102.000  in   5.057621s
    Hash by identity     20.233  (± 4.9%) i/s -    102.000  in   5.051163s

Comparison:
    Switch          :       20.9 i/s
    Hash by identity:       20.2 i/s - same-ish: difference falls within error
    Hash            :       20.2 i/s - same-ish: difference falls within error
    Refinement      :        9.1 i/s - 2.29x  slower

schneems added a commit that referenced this issue Dec 12, 2016
Since we are always looking up these hashes with an identical duplicate (not just the same value, the same object) we can take advantage of Hash.compare_by_identity.

Suggested by @dgynn in #383 (comment)
@schneems
Copy link
Member

Implemented the compare_by_identity patch in #439. I tried adding on more hashes like VALID_METADATA_VALUE_TYPES_HASH but didn't get any perf difference. I still think I want to just remove valid_processor_metadata_value? checks or make them opt-in somehow.

This was referenced Nov 14, 2017
@schneems
Copy link
Member

@bouk are you interested in sending me a patch that just removes this method? We're already essentially doing the same class checks by passing processor results into digest utils. This is duplicated logic.

schneems added a commit that referenced this issue May 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants