-
-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce unnecessary allocations in Rule creation #204
Reduce unnecessary allocations in Rule creation #204
Conversation
The pattern of doing `[var].flatten` to coerce values to Arrays makes three Array allocations every time. The first is for the array enclosing var, the other two are in the C implementation of `flatten` to perform a depth-first traversal over any nested arrays found within the outer-most array[1]. Rules are allocated a *lot* during the typical execution of an application using CanCanCan for authorization. Since the parameters to Ability#can are expected to not be deeply nested, we can cut out the extra allocations by using Kernel#Array. This benchmark: [Array#flatten] Mem diff: 18.40625 MB [Kernel#Array] Mem diff: 1.01953125 MB Shows a savings of 17MB of allocations when instantiating 10,000 Rules[2]. This translates to a 2x speed improvement in Rule instantiation time: Calculating ------------------------------------- Array#flatten 16.147k i/100ms Kernel#Array 34.565k i/100ms ------------------------------------------------- Array#flatten 218.381k (± 9.7%) i/s - 1.098M Kernel#Array 460.892k (±30.7%) i/s - 2.108M Comparison: Kernel#Array: 460892.2 i/s Array#flatten: 218380.6 i/s - 2.11x slower [1] The first array used by flatten is a stack to push sub-arrays onto while traversing, the second is for the result array. [2] Memory figures are RSS, which doesn't consider shared memory, but we have none for this simple benchmark. GC was disabled during the benchmark eliminate the effects of unpredictable GC activity. Most of the intermediary arrays are immediately garbage, but the act of allocating them increases the work the garbage collector has to do. Benchmark/ips was used with GC enabled in the second benchmark. Benchmark script used is available here: https://gist.github.com/timraymond/8d7014e0c7804f0fe508
Thanks so much for this @timraymond ! I am currently putting together a roadmap for a new major version of Cancancan that will allow us to do some breaking changes, and hopefully drop support for some of the ORMs we support that are causing this build to fail (Datamapper in particular). I will keep this open and reference it in the new roadmap. |
This seems to break the Sequel specs. Are you still interested in getting this in @timraymond ? |
@Senjai I could—I only didn't because it seemed like there were larger changes going on for a major release and this would get reimplemented there? (see @bryanrite 's comment) I'll have some time tomorrow, and I'll see where I can get with the specs here. |
I'd be interested in merging this into next release. I'd be glad if you can fix the CI for Sequel but is a very strange behaviour from my point of view. Using Let me know if you can manage to solve it. Thanks |
Since v2 drops support for Mongoid and Sequel I can merge it there. |
I just tracked down some test failures after an upgrade as being due to this change. It sounds like you already knew this was going to be a breaking change for some specific cases, but here's another example to be aware of: I had some value object classes — let's call them But this change, despite its merits, somewhat unfortunately forces a conversion of a subject into an array, and counter-intuitively causes Old behavior:
New behavior:
Should I open a new issue for this? Workaround: Just had to wrap it in array to get it to preserve the value object. Changed from:
to:
Doesn't read as nice, but it works. Idea: I wonder if we could somehow use |
Thank you @TylerRick . Could you provide a gist to reproduce your issue? Take the following as a base: https://gist.github.com/coorasse/3f00f536563249125a37e15a1652648c. Thanks |
Sure! Here you go: https://gist.github.com/TylerRick/0a41a8f5b5dc21d5ae6d6ec1bc465b92 |
BTW @coorasse I posted a gist above but forgot to @mention you so don't know if you saw it. |
context: CanCanCommunity#204 (comment) `Kernel.#Array` calls not only `Object#to_ary` but also `Object#to_a` but some classes which represent cluster may implement `to_a`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I think that's why ActiveSupport has `Array.wrap`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I confirmed there's no performance regression via the following snippet: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips", "2.7.2" gem "cancancan", "2.3.0" gem "allocation_tracer", "0.6.3" end module CanCan class Rule def initialize(base_behavior, action, subject, conditions, block, __flag__:) both_block_and_hash_error = 'You are not able to supply a block with a hash of conditions in '\ "#{action} #{subject} ability. Use either one." raise Error, both_block_and_hash_error if conditions.is_a?(Hash) && block @match_all = action.nil? && subject.nil? @base_behavior = base_behavior case __flag__ when :array @actions = Array(action) @subjects = Array(subject) when :flatten @actions = [action].flatten @subjects = [subject].flatten when :wrap @actions = wrap(action) @subjects = wrap(subject) else raise end @conditions = conditions || {} @block = block end def wrap(object) if object.nil? [] elsif object.respond_to?(:to_ary) object.to_ary || [object] else [object] end end end end def memprof(name) GC.disable before = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i ObjectSpace::AllocationTracer.trace do 10_000.times do yield end end after = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i puts "[#{name}] Mem diff: #{(after - before).to_f/1024.0} MB" GC.enable GC.start end puts "Ruby #{RUBY_VERSION}" # I think memprof outputs wrong result because always first one gets the worst result # I just followed original one # https://gist.github.com/timraymond/8d7014e0c7804f0fe508 memprof("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } memprof("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } memprof("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } Benchmark.ips do |b| b.report("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } b.report("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } b.report("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } b.compare! end __END__ Ruby 2.6.1 [Kernel.#Array] Mem diff: 21.59765625 MB [Array#flatten] Mem diff: 2.21484375 MB [Array.wrap] Mem diff: 0.02734375 MB Warming up -------------------------------------- Kernel.#Array 5.024k i/100ms Array#flatten 3.863k i/100ms Array.wrap 5.217k i/100ms Calculating ------------------------------------- Kernel.#Array 204.954k (±21.6%) i/s - 964.608k in 5.003037s Array#flatten 143.892k (±17.3%) i/s - 695.340k in 5.005342s Array.wrap 195.569k (±18.5%) i/s - 933.843k in 5.003945s Comparison: Kernel.#Array: 204954.3 i/s Array.wrap: 195569.0 i/s - same-ish: difference falls within error Array#flatten: 143892.4 i/s - same-ish: difference falls within error Ruby 2.5.3 [Kernel.#Array] Mem diff: 21.98046875 MB [Array#flatten] Mem diff: 0.93359375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.816k i/100ms Array#flatten 4.235k i/100ms Array.wrap 4.610k i/100ms Calculating ------------------------------------- Kernel.#Array 166.440k (±30.7%) i/s - 722.400k in 5.010362s Array#flatten 136.055k (±23.3%) i/s - 639.485k in 5.009729s Array.wrap 173.409k (±18.6%) i/s - 820.580k in 5.019846s Comparison: Array.wrap: 173408.7 i/s Kernel.#Array: 166439.5 i/s - same-ish: difference falls within error Array#flatten: 136054.8 i/s - same-ish: difference falls within error Ruby 2.4.5 [Kernel.#Array] Mem diff: 17.0859375 MB [Array#flatten] Mem diff: 0.0234375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.746k i/100ms Array#flatten 4.019k i/100ms Array.wrap 4.347k i/100ms Calculating ------------------------------------- Kernel.#Array 195.099k (±17.4%) i/s - 939.708k in 4.994105s Array#flatten 117.881k (±19.3%) i/s - 562.660k in 5.012299s Array.wrap 160.679k (±13.9%) i/s - 782.460k in 5.005515s Comparison: Kernel.#Array: 195099.4 i/s Array.wrap: 160678.6 i/s - same-ish: difference falls within error Array#flatten: 117881.1 i/s - 1.66x slower ```
context: CanCanCommunity#204 (comment) `Kernel.#Array` calls not only `Object#to_ary` but also `Object#to_a` but some classes which represent cluster may implement `to_a`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I think that's why ActiveSupport has `Array.wrap`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I confirmed there's no performance regression via the following snippet: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips", "2.7.2" gem "cancancan", "2.3.0" gem "allocation_tracer", "0.6.3" end module CanCan class Rule def initialize(base_behavior, action, subject, conditions, block, __flag__:) both_block_and_hash_error = 'You are not able to supply a block with a hash of conditions in '\ "#{action} #{subject} ability. Use either one." raise Error, both_block_and_hash_error if conditions.is_a?(Hash) && block @match_all = action.nil? && subject.nil? @base_behavior = base_behavior case __flag__ when :array @actions = Array(action) @subjects = Array(subject) when :flatten @actions = [action].flatten @subjects = [subject].flatten when :wrap @actions = wrap(action) @subjects = wrap(subject) else raise end @conditions = conditions || {} @block = block end def wrap(object) if object.nil? [] elsif object.respond_to?(:to_ary) object.to_ary || [object] else [object] end end end end def memprof(name) GC.disable before = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i ObjectSpace::AllocationTracer.trace do 10_000.times do yield end end after = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i puts "[#{name}] Mem diff: #{(after - before).to_f/1024.0} MB" GC.enable GC.start end puts "Ruby #{RUBY_VERSION}" # I think memprof outputs wrong result because always first one gets the worst result # I just followed original one # https://gist.github.com/timraymond/8d7014e0c7804f0fe508 memprof("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } memprof("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } memprof("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } Benchmark.ips do |b| b.report("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } b.report("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } b.report("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } b.compare! end __END__ Ruby 2.6.1 [Kernel.#Array] Mem diff: 21.59765625 MB [Array#flatten] Mem diff: 2.21484375 MB [Array.wrap] Mem diff: 0.02734375 MB Warming up -------------------------------------- Kernel.#Array 5.024k i/100ms Array#flatten 3.863k i/100ms Array.wrap 5.217k i/100ms Calculating ------------------------------------- Kernel.#Array 204.954k (±21.6%) i/s - 964.608k in 5.003037s Array#flatten 143.892k (±17.3%) i/s - 695.340k in 5.005342s Array.wrap 195.569k (±18.5%) i/s - 933.843k in 5.003945s Comparison: Kernel.#Array: 204954.3 i/s Array.wrap: 195569.0 i/s - same-ish: difference falls within error Array#flatten: 143892.4 i/s - same-ish: difference falls within error Ruby 2.5.3 [Kernel.#Array] Mem diff: 21.98046875 MB [Array#flatten] Mem diff: 0.93359375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.816k i/100ms Array#flatten 4.235k i/100ms Array.wrap 4.610k i/100ms Calculating ------------------------------------- Kernel.#Array 166.440k (±30.7%) i/s - 722.400k in 5.010362s Array#flatten 136.055k (±23.3%) i/s - 639.485k in 5.009729s Array.wrap 173.409k (±18.6%) i/s - 820.580k in 5.019846s Comparison: Array.wrap: 173408.7 i/s Kernel.#Array: 166439.5 i/s - same-ish: difference falls within error Array#flatten: 136054.8 i/s - same-ish: difference falls within error Ruby 2.4.5 [Kernel.#Array] Mem diff: 17.0859375 MB [Array#flatten] Mem diff: 0.0234375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.746k i/100ms Array#flatten 4.019k i/100ms Array.wrap 4.347k i/100ms Calculating ------------------------------------- Kernel.#Array 195.099k (±17.4%) i/s - 939.708k in 4.994105s Array#flatten 117.881k (±19.3%) i/s - 562.660k in 5.012299s Array.wrap 160.679k (±13.9%) i/s - 782.460k in 5.005515s Comparison: Kernel.#Array: 195099.4 i/s Array.wrap: 160678.6 i/s - same-ish: difference falls within error Array#flatten: 117881.1 i/s - 1.66x slower ```
context: CanCanCommunity#204 (comment) `Kernel.#Array` calls not only `Object#to_ary` but also `Object#to_a` but some classes which represent cluster may implement `to_a`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I think that's why ActiveSupport has `Array.wrap`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I confirmed there's no performance regression via the following snippet: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips", "2.7.2" gem "cancancan", "2.3.0" gem "allocation_tracer", "0.6.3" end module CanCan class Rule def initialize(base_behavior, action, subject, conditions, block, __flag__:) both_block_and_hash_error = 'You are not able to supply a block with a hash of conditions in '\ "#{action} #{subject} ability. Use either one." raise Error, both_block_and_hash_error if conditions.is_a?(Hash) && block @match_all = action.nil? && subject.nil? @base_behavior = base_behavior case __flag__ when :array @actions = Array(action) @subjects = Array(subject) when :flatten @actions = [action].flatten @subjects = [subject].flatten when :wrap @actions = wrap(action) @subjects = wrap(subject) else raise end @conditions = conditions || {} @block = block end def wrap(object) if object.nil? [] elsif object.respond_to?(:to_ary) object.to_ary || [object] else [object] end end end end def memprof(name) GC.disable before = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i ObjectSpace::AllocationTracer.trace do 10_000.times do yield end end after = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i puts "[#{name}] Mem diff: #{(after - before).to_f/1024.0} MB" GC.enable GC.start end puts "Ruby #{RUBY_VERSION}" # I think memprof outputs wrong result because always first one gets the worst result # I just followed original one # https://gist.github.com/timraymond/8d7014e0c7804f0fe508 memprof("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } memprof("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } memprof("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } Benchmark.ips do |b| b.report("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } b.report("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } b.report("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } b.compare! end __END__ Ruby 2.6.1 [Kernel.#Array] Mem diff: 21.59765625 MB [Array#flatten] Mem diff: 2.21484375 MB [Array.wrap] Mem diff: 0.02734375 MB Warming up -------------------------------------- Kernel.#Array 5.024k i/100ms Array#flatten 3.863k i/100ms Array.wrap 5.217k i/100ms Calculating ------------------------------------- Kernel.#Array 204.954k (±21.6%) i/s - 964.608k in 5.003037s Array#flatten 143.892k (±17.3%) i/s - 695.340k in 5.005342s Array.wrap 195.569k (±18.5%) i/s - 933.843k in 5.003945s Comparison: Kernel.#Array: 204954.3 i/s Array.wrap: 195569.0 i/s - same-ish: difference falls within error Array#flatten: 143892.4 i/s - same-ish: difference falls within error Ruby 2.5.3 [Kernel.#Array] Mem diff: 21.98046875 MB [Array#flatten] Mem diff: 0.93359375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.816k i/100ms Array#flatten 4.235k i/100ms Array.wrap 4.610k i/100ms Calculating ------------------------------------- Kernel.#Array 166.440k (±30.7%) i/s - 722.400k in 5.010362s Array#flatten 136.055k (±23.3%) i/s - 639.485k in 5.009729s Array.wrap 173.409k (±18.6%) i/s - 820.580k in 5.019846s Comparison: Array.wrap: 173408.7 i/s Kernel.#Array: 166439.5 i/s - same-ish: difference falls within error Array#flatten: 136054.8 i/s - same-ish: difference falls within error Ruby 2.4.5 [Kernel.#Array] Mem diff: 17.0859375 MB [Array#flatten] Mem diff: 0.0234375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.746k i/100ms Array#flatten 4.019k i/100ms Array.wrap 4.347k i/100ms Calculating ------------------------------------- Kernel.#Array 195.099k (±17.4%) i/s - 939.708k in 4.994105s Array#flatten 117.881k (±19.3%) i/s - 562.660k in 5.012299s Array.wrap 160.679k (±13.9%) i/s - 782.460k in 5.005515s Comparison: Kernel.#Array: 195099.4 i/s Array.wrap: 160678.6 i/s - same-ish: difference falls within error Array#flatten: 117881.1 i/s - 1.66x slower ```
context: CanCanCommunity#204 (comment) `Kernel.#Array` calls not only `Object#to_ary` but also `Object#to_a` but some classes which represent cluster may implement `to_a`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I think that's why ActiveSupport has `Array.wrap`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I confirmed there's no performance regression via the following snippet: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips", "2.7.2" gem "cancancan", "2.3.0" gem "allocation_tracer", "0.6.3" end module CanCan class Rule def initialize(base_behavior, action, subject, conditions, block, __flag__:) both_block_and_hash_error = 'You are not able to supply a block with a hash of conditions in '\ "#{action} #{subject} ability. Use either one." raise Error, both_block_and_hash_error if conditions.is_a?(Hash) && block @match_all = action.nil? && subject.nil? @base_behavior = base_behavior case __flag__ when :array @actions = Array(action) @subjects = Array(subject) when :flatten @actions = [action].flatten @subjects = [subject].flatten when :wrap @actions = wrap(action) @subjects = wrap(subject) else raise end @conditions = conditions || {} @block = block end def wrap(object) if object.nil? [] elsif object.respond_to?(:to_ary) object.to_ary || [object] else [object] end end end end def memprof(name) GC.disable before = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i ObjectSpace::AllocationTracer.trace do 10_000.times do yield end end after = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i puts "[#{name}] Mem diff: #{(after - before).to_f/1024.0} MB" GC.enable GC.start end puts "Ruby #{RUBY_VERSION}" # I think memprof outputs wrong result because always first one gets the worst result # I just followed original one # https://gist.github.com/timraymond/8d7014e0c7804f0fe508 memprof("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } memprof("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } memprof("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } Benchmark.ips do |b| b.report("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } b.report("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } b.report("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } b.compare! end __END__ Ruby 2.6.1 [Kernel.#Array] Mem diff: 21.59765625 MB [Array#flatten] Mem diff: 2.21484375 MB [Array.wrap] Mem diff: 0.02734375 MB Warming up -------------------------------------- Kernel.#Array 5.024k i/100ms Array#flatten 3.863k i/100ms Array.wrap 5.217k i/100ms Calculating ------------------------------------- Kernel.#Array 204.954k (±21.6%) i/s - 964.608k in 5.003037s Array#flatten 143.892k (±17.3%) i/s - 695.340k in 5.005342s Array.wrap 195.569k (±18.5%) i/s - 933.843k in 5.003945s Comparison: Kernel.#Array: 204954.3 i/s Array.wrap: 195569.0 i/s - same-ish: difference falls within error Array#flatten: 143892.4 i/s - same-ish: difference falls within error Ruby 2.5.3 [Kernel.#Array] Mem diff: 21.98046875 MB [Array#flatten] Mem diff: 0.93359375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.816k i/100ms Array#flatten 4.235k i/100ms Array.wrap 4.610k i/100ms Calculating ------------------------------------- Kernel.#Array 166.440k (±30.7%) i/s - 722.400k in 5.010362s Array#flatten 136.055k (±23.3%) i/s - 639.485k in 5.009729s Array.wrap 173.409k (±18.6%) i/s - 820.580k in 5.019846s Comparison: Array.wrap: 173408.7 i/s Kernel.#Array: 166439.5 i/s - same-ish: difference falls within error Array#flatten: 136054.8 i/s - same-ish: difference falls within error Ruby 2.4.5 [Kernel.#Array] Mem diff: 17.0859375 MB [Array#flatten] Mem diff: 0.0234375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.746k i/100ms Array#flatten 4.019k i/100ms Array.wrap 4.347k i/100ms Calculating ------------------------------------- Kernel.#Array 195.099k (±17.4%) i/s - 939.708k in 4.994105s Array#flatten 117.881k (±19.3%) i/s - 562.660k in 5.012299s Array.wrap 160.679k (±13.9%) i/s - 782.460k in 5.005515s Comparison: Kernel.#Array: 195099.4 i/s Array.wrap: 160678.6 i/s - same-ish: difference falls within error Array#flatten: 117881.1 i/s - 1.66x slower ```
context: #204 (comment) `Kernel.#Array` calls not only `Object#to_ary` but also `Object#to_a` but some classes which represent cluster may implement `to_a`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I think that's why ActiveSupport has `Array.wrap`. https://api.rubyonrails.org/classes/Array.html#method-c-wrap I confirmed there's no performance regression via the following snippet: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips", "2.7.2" gem "cancancan", "2.3.0" gem "allocation_tracer", "0.6.3" end module CanCan class Rule def initialize(base_behavior, action, subject, conditions, block, __flag__:) both_block_and_hash_error = 'You are not able to supply a block with a hash of conditions in '\ "#{action} #{subject} ability. Use either one." raise Error, both_block_and_hash_error if conditions.is_a?(Hash) && block @match_all = action.nil? && subject.nil? @base_behavior = base_behavior case __flag__ when :array @actions = Array(action) @subjects = Array(subject) when :flatten @actions = [action].flatten @subjects = [subject].flatten when :wrap @actions = wrap(action) @subjects = wrap(subject) else raise end @conditions = conditions || {} @block = block end def wrap(object) if object.nil? [] elsif object.respond_to?(:to_ary) object.to_ary || [object] else [object] end end end end def memprof(name) GC.disable before = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i ObjectSpace::AllocationTracer.trace do 10_000.times do yield end end after = `ps -o rss -p #{Process.pid}`.split("\n").last.to_i puts "[#{name}] Mem diff: #{(after - before).to_f/1024.0} MB" GC.enable GC.start end puts "Ruby #{RUBY_VERSION}" memprof("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } memprof("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } memprof("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } Benchmark.ips do |b| b.report("Kernel.#Array") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :array) } b.report("Array#flatten") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :flatten) } b.report("Array.wrap") { CanCan::Rule.new(nil, [:read, :write], [:foo, :bar], nil, nil, __flag__: :wrap) } b.compare! end __END__ Ruby 2.6.1 [Kernel.#Array] Mem diff: 21.59765625 MB [Array#flatten] Mem diff: 2.21484375 MB [Array.wrap] Mem diff: 0.02734375 MB Warming up -------------------------------------- Kernel.#Array 5.024k i/100ms Array#flatten 3.863k i/100ms Array.wrap 5.217k i/100ms Calculating ------------------------------------- Kernel.#Array 204.954k (±21.6%) i/s - 964.608k in 5.003037s Array#flatten 143.892k (±17.3%) i/s - 695.340k in 5.005342s Array.wrap 195.569k (±18.5%) i/s - 933.843k in 5.003945s Comparison: Kernel.#Array: 204954.3 i/s Array.wrap: 195569.0 i/s - same-ish: difference falls within error Array#flatten: 143892.4 i/s - same-ish: difference falls within error Ruby 2.5.3 [Kernel.#Array] Mem diff: 21.98046875 MB [Array#flatten] Mem diff: 0.93359375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.816k i/100ms Array#flatten 4.235k i/100ms Array.wrap 4.610k i/100ms Calculating ------------------------------------- Kernel.#Array 166.440k (±30.7%) i/s - 722.400k in 5.010362s Array#flatten 136.055k (±23.3%) i/s - 639.485k in 5.009729s Array.wrap 173.409k (±18.6%) i/s - 820.580k in 5.019846s Comparison: Array.wrap: 173408.7 i/s Kernel.#Array: 166439.5 i/s - same-ish: difference falls within error Array#flatten: 136054.8 i/s - same-ish: difference falls within error Ruby 2.4.5 [Kernel.#Array] Mem diff: 17.0859375 MB [Array#flatten] Mem diff: 0.0234375 MB [Array.wrap] Mem diff: 0.0 MB Warming up -------------------------------------- Kernel.#Array 4.746k i/100ms Array#flatten 4.019k i/100ms Array.wrap 4.347k i/100ms Calculating ------------------------------------- Kernel.#Array 195.099k (±17.4%) i/s - 939.708k in 4.994105s Array#flatten 117.881k (±19.3%) i/s - 562.660k in 5.012299s Array.wrap 160.679k (±13.9%) i/s - 782.460k in 5.005515s Comparison: Kernel.#Array: 195099.4 i/s Array.wrap: 160678.6 i/s - same-ish: difference falls within error Array#flatten: 117881.1 i/s - 1.66x slower ```
Background
I was profiling our company's API with Stackprof to track down high memory usage issues that we have somewhere. CanCan::Rule.initialize came in as the top allocator of Objects:
Digging in a bit more with Stackprof, I found the offending lines:
The pattern of doing
[var].flatten
to coerce values to Arrays makesthree Array allocations every time. The first is for the array enclosing
var, the other two are in the C implementation of
flatten
to perform adepth-first traversal over any nested arrays found within the outer-most
array[1]. Rules are allocated a lot during the typical execution of an
application using CanCanCan for authorization. Since the parameters to
Ability#can are expected to not be deeply nested, we can cut out the
extra allocations by using Kernel#Array. This benchmark:
Shows a savings of 17MB of allocations when instantiating 10,000
Rules[2].
This translates to a 2x speed improvement in Rule instantiation time:
Known Issues
If there are users out there doing something like
then this will probably be a breaking change. I don't believe the migration path should be that bad... it should be no more than chaining on your own call to flatten like
can [default_params, :delete].flatten, :things
. You sacrifice the performance, but remain compatible.Miscellany
[1] The first array used by flatten is a stack to push sub-arrays onto
while traversing, the second is for the result array. See https://github.com/ruby/ruby/blob/trunk/array.c#L4385-L4386
[2] Memory figures are RSS, which doesn't consider shared memory, but we
have none for this simple benchmark. GC was disabled during the
benchmark eliminate the effects of unpredictable GC activity. Most of
the intermediary arrays are immediately garbage, but the act of
allocating them increases the work the garbage collector has to do.
Benchmark/ips was used with GC enabled in the second benchmark.
Benchmark script used is available here:
https://gist.github.com/timraymond/8d7014e0c7804f0fe508