-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix quadratic performance in FileTask#out_of_date? #224
Conversation
Thanks. Can you show the benchmark results? |
I don't have them, it was reported by one of my coworker. On our system, it went from our rake-based library doing 100% CPU for minutes to 2-3% CPU (the rest is taken by subcommands). It's so dramatic that I didn't see the need to isolate a benchmark. If having a benchmark is a prerequisite, I'll try to take the time to do it. |
I'm original writer of the However I agreed that benchmark is important for this. In case you didn't know yet, RDoc uses |
Fair enough ... how should I provide this benchmark ? Rake doesn't seem to have a benchmark running facility as @aycabta mentions for RDoc. Just in this PR ? |
FileTask#out_of_date? has been changed in 462e403 to call #needed? which calls #out_of_date? recursively. In some cases, where the graph of dependencies is fairly dense, this leads to quadratic performance when it was linear before. Use #all_prerequisite_tasks to avoid the problem. This also saves a File.exist? test as #timestamp already takes it into account.fix quadratic performance in FileTask#out_of_date? FileTask#out_of_date? has been changed in 462e403 to call #needed? which calls #out_of_date? recursively. In some cases, where the graph of dependencies is fairly dense, this leads to quadratic performance when it was linear before. Use #all_prerequisite_tasks to avoid the problem. This also saves a File.exist? test as #timestamp already takes it into account. I made a benchmark that measures the difference in duration for out_of_date? on 12.0, 12.1 and 12.1 with this commit applied. The benchmark used to validate the performance creates 5 layers of FileTask, where all tasks of the parent layer are connected to all tasks of the child. A root task is added at the top and #out_of_date? is called on it. The benchmark varies the number of tasks per layer. **12.0** | Tasks per layers | Duration (s) | Standard deviation | |------------------|--------------|--------------------| | 1 | 1.32e-05 | 0.96e-05 | | 2 | 8.69e-05 | 1.11e-06 | | 5 | 1.84e-05 | 2.43e-06 | | 8 | 2.89e-05 | 1.05e-05 | | 10 | 3.35e-05 | 4.12e-06 | | 15 | 4.97e-05 | 6.74e-06 | | 20 | 6.19e-05 | 6.23e-06 | **12.1** | Tasks per layers | Duration (s) | Standard deviation | |------------------|--------------|--------------------| | 1 | 7.00e-05 | 5.62e-05 | | 2 | 3.98e-04 | 7.38e-05 | | 5 | 2.32e-02 | 1.02e-03 | | 8 | 0.22 | 0.006 | | 10 | 0.65 | 0.006 | | 15 | 4.78 | 0.048 | | 20 | 20 | 0.49 | **PR 224** | Tasks per layers | Duration (s) | Standard deviation | |------------------|--------------|--------------------| | 1 | 4.47e-05 | 2.68e-05 | | 2 | 7.56e-05 | 2.92e-05 | | 5 | 2.42e-03 | 4.16e-05 | | 8 | 0.51e-03 | 7.21e-05 | | 10 | 0.77e-03 | 0.13e-03 | | 15 | 14.2e-03 | 0.11e-03 | | 20 | 24.2e-03 | 0.16e-03 | Benchmarking code: ~~~ ruby require 'rake' LAYER_MAX_SIZE = 20 LAYER_COUNT = 5 def measure(size) app = Rake::Application.new layers = (0...LAYER_COUNT).map do |layer_i| (0...size).map do |i| app.define_task(Rake::FileTask, "#{layer_i}_#{i}") end end layers.each_cons(2) do |parent, child| child_names = child.map(&:name) parent.each { |t| t.enhance(child_names) } end root = app.define_task(Rake::FileTask, "root") root.enhance(layers[0].map(&:name)) tic = Time.now root.send(:out_of_date?, tic) Time.now - tic end FileUtils.touch "root" sleep 0.1 LAYER_COUNT.times do |layer_i| LAYER_MAX_SIZE.times do |i| FileUtils.touch "#{LAYER_COUNT - layer_i - 1}_#{i}" end sleep 0.1 end COUNT = 100 [1, 2, 5, 8, 10, 15, 20].each do |size| mean = 0 sum_squared_deviations = 0 COUNT.times do |count| duration = measure(size) old_mean = mean mean = old_mean + (duration - old_mean) / (count + 1) sum_squared_deviations = sum_squared_deviations + (duration - old_mean) * (duration - mean) end puts "#{size} #{mean} #{Math.sqrt(sum_squared_deviations / (COUNT - 1))}" end ~~~
I've added the benchmark code and the results to the commit message |
OK, RDoc uses the benchmark test at On the other hand, if you want to just take benchmark result as distinct from test, I think that benchmark-ips is best choice for it. It's standard for presentation of Rails performance with patch. |
Oh, I was writing my comment but later than you. |
Ping |
I heard details of this issue from @aycabta on Asakusa.rb meetup. I'm ok to merge this. |
FileTask#out_of_date? has been changed in 462e403 to call #needed?
which calls #out_of_date? recursively. In some cases, where the
graph of dependencies is fairly dense, this leads to quadratic
performance when it was linear before.
Use #all_prerequisite_tasks to avoid the problem. This also saves
a File.exist? test as #timestamp already takes it into account.