Manually defining inception namespaces to skip `autoload?` call #308
base: main
Conversation
In our journey of improving our Rails application bootstrap we have detected that calling the method `Module.autoload?` for each `autoload` registration is very costly. In our case it takes more than 3 seconds (see attached flamegraphs).

As the comment in the implementation points out, this call is there to support an edge case in how certain gems autoload themselves ([more info here](https://github.com/fxn/zeitwerk/blob/main/lib/zeitwerk/registry.rb#L28)). However, this approach produces [a call](https://github.com/fxn/zeitwerk/blob/main/lib/zeitwerk/loader.rb#L540) to `Module.autoload?` for every single constant in our huge project, as can be seen in the attached flamegraph. For our backend this takes more than 3 seconds just to support a few gems with this special case.

In this PR we propose a mechanism by which any Rails application can manually define the namespaces to be incepted, avoiding the costly call for the rest of the constants (an illustrative sketch follows below). This resulted in a boot-time reduction of more than 3 seconds in our case; we only had to define 3 namespaces to be incepted to make our backend run properly.

As a trade-off, I understand this will cause confusion for developers who include gems that need the inception, so this PR keeps backwards compatibility if you don't manually set `manual_incepted_namespaces`.

TODO:

- [ ] Add meaningful tests
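For illustration only, here is a minimal sketch of the idea under assumed names (`LoaderLike` and `autoload_already_set_by_gem?` are hypothetical, not Zeitwerk's real internals, and not the actual diff of this PR): the costly `Module#autoload?` inception check only runs for namespaces the application explicitly lists, and the original behavior is preserved when no list is configured.

```ruby
# Hypothetical sketch of the allow-list guard described above.
class LoaderLike
  # nil means "no manual list configured" => keep the original behavior.
  attr_accessor :manual_incepted_namespaces

  def autoload_already_set_by_gem?(parent, cname)
    if manual_incepted_namespaces && !manual_incepted_namespaces.include?(cname.to_s)
      # Fast path: the application vouches that this namespace is not incepted,
      # so the costly Module#autoload? call is skipped entirely.
      false
    else
      # Backwards-compatible path: run the original check.
      # (The inherit argument of Module#autoload? requires Ruby >= 3.1.)
      !parent.autoload?(cname, false).nil?
    end
  end
end
```

The `manual_incepted_namespaces` configuration shown later in this thread (`Rails.autoloaders.main.manual_incepted_namespaces = ...`) would feed that list.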
Hi! Let me first understand the situation. As you probably know, Zeitwerk is lazy in development mode and only visits the top directories when an application boots. In my machine I can do 8 million calls per second to `Module#autoload?` for a constant that has an autoload set. (When the argument does not have an autoload set, I can do 44 million calls per second.)
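A minimal, self-contained benchmark in the spirit of those figures (an assumed setup, not the actual script used) could look like this:

```ruby
require "benchmark"

module WithAutoload
  # Constant with an autoload registered; the file does not need to exist
  # as long as the constant is never actually referenced.
  autoload :Foo, "/tmp/foo.rb"
end

module WithoutAutoload; end

N = 1_000_000

Benchmark.bm(16) do |x|
  x.report("autoload set")    { N.times { WithAutoload.autoload?(:Foo) } }
  x.report("no autoload set") { N.times { WithoutAutoload.autoload?(:Foo) } }
end
```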
Thanks for replying :) Not sure how to count the number of top-level constants, but I can count the number of times the method is called. I used this process to count them, in my development configuration:

```ruby
Rails.autoloaders.main.logger = ->(msg) {
  File.open('log/autoloader.log', 'a') { |f| f.write("#{msg}\n") }
}

Rails.autoloaders.once.logger = ->(msg) {
  File.open('log/autoloader_once.log', 'a') { |f| f.write("#{msg}\n") }
}
```

Then I counted the number of "autoload set" entries like this:

```sh
$ grep 'autoload set' autoloader.log | wc -l
10779
$ grep 'autoload set' autoloader_once.log | wc -l
5
```

I assume it is not a hardware issue either. I have a decent machine, and the performance issue also shows up in our remote dev environments, which are similar to my machine. For reference, I have an AMD Ryzen 7 5800X 8-Core Processor (16 threads) and 32GB of RAM. All the tests were performed with Ruby 3.3.5. Any clue why?
Thanks. I am a bit skeptical of the premise because, while I have been told Factorial is a big application, Shopify and Gusto are too, probably even bigger, and this is not an issue for them. As I said above, for the common case in which the constant does not have an autoload set, the call is much cheaper. This is just a heads up; I need to sit down and come up with actual numbers.
Could you please confirm the following?
Spring may be relevant for (2), if present, besides other potential changes.
TBH, I was also skeptical about finding this kind of performance issue before bigger companies such as the ones you mention did.
We only eager-load in production. Our Rails setup is a multi-engine one (with about 168 engines of heterogeneous size). The relevant development configuration is:

```ruby
# Needed for Spring
config.enable_reloading = true

# Do not eager load code on boot.
config.eager_load = false

# Use an evented file watcher to asynchronously detect changes in source code,
# routes, locales, etc. This feature depends on the listen gem.
config.file_watcher = ActiveSupport::EventedFileUpdateChecker if ENV.fetch(
  'ENABLE_FILE_WATCHER',
  'true'
) == 'true'

# Then we have these lines... that I will honestly try to get rid of, loading only what is strictly needed
# v v v v v

# Auto/eager load paths
config.autoload_paths += Dir.glob("#{config.root}/lib/generators/*")

autoloader = Rails.autoloaders.main
autoloader.collapse(Dir.glob("#{config.root}/lib/**/public"))

config.autoload_lib(ignore: %w[assets tasks])
```
Yes, it's the only thing I change between runs and it clearly affects the total time.
I used to use it. We were specifically looking at monkey patching around that call, and I will keep investigating. We applied this forked version of Zeitwerk in our app.
We disable Spring for the performance tests; however, we have Bootsnap enabled. I always run the tests more than once to make sure the cache is computed before taking results.
@gtrias awesome! Could you please run these commands, with engines loaded, two times, one with the patch enabled and one with the patch disabled:
That would give me an idea of the magnitude of the project, and I could try to reproduce it in a synthetic setup.
@gtrias I have added a third command in the comment above.
Sure! The env var `OPTIMIZE_AUTOLOAD` toggles it:

```ruby
# Manually defining inception namespaces for improved performance
# More information here:
if ENV.fetch('OPTIMIZE_AUTOLOAD', 'true') == 'true'
  manual_incepted_namespaces = %w[ActionCable FQL Changelog]
  Rails.autoloaders.main.manual_incepted_namespaces = manual_incepted_namespaces
  Rails.autoloaders.once.manual_incepted_namespaces = manual_incepted_namespaces
end
```

Here are the results of the commands you asked for:

```sh
# Without the optimization
$ time OPTIMIZE_AUTOLOAD=false DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.dirs.size'
1300

real    0m30.804s
user    0m22.679s
sys     0m7.934s

$ time OPTIMIZE_AUTOLOAD=false DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.unloadable_cpaths.size'
3656

real    0m30.794s
user    0m22.647s
sys     0m7.959s

$ time OPTIMIZE_AUTOLOAD=false DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.__autoloads.size'
7044

real    0m30.960s
user    0m23.143s
sys     0m7.628s

# With the optimization
$ time OPTIMIZE_AUTOLOAD=true DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.dirs.size'
1300

real    0m25.590s
user    0m20.333s
sys     0m5.065s

$ time OPTIMIZE_AUTOLOAD=true DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.unloadable_cpaths.size'
3656

real    0m25.353s
user    0m19.723s
sys     0m5.419s

$ time OPTIMIZE_AUTOLOAD=true DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.__autoloads.size'
7044

real    0m25.452s
user    0m19.933s
sys     0m5.326s
```
Awesome, I see the app preloads 3656 constants on boot. Probably irrelevant, just noticing. Thanks, I'll try to reproduce.
Oh, it may be relevant to your performance work nonetheless: loading ~3K files may have a measurable impact.
We're currently making sure we don't load anything unless it is strictly necessary :) but it's a slow, manual task, since our whole backend abused initializers that require a bunch of constants to be present. Obviously, most of these definitions should not be needed to boot the application.
I think I have an idea of why calling `Module#autoload?` is so costly for us. From my understanding, having very deeply nested namespaces like ours could make the lookup walk a long ancestor chain. Does the inception check need to run `autoload?` with the inherit flag enabled?
If you look at the call in question, the second argument is already there. The flag you found tells the lookup to follow the ancestor chain of the receiver, but since the library has no use case for that, there is a hard-coded `false`.
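For reference, a small demo of that inherit flag (assuming Ruby 3.1 or newer, where `Module#autoload?` accepts the second argument):

```ruby
module Parent
  autoload :Thing, "/tmp/thing.rb"
end

class Child
  include Parent
end

p Child.autoload?(:Thing)        # => "/tmp/thing.rb" (inherit defaults to true, ancestors are searched)
p Child.autoload?(:Thing, false) # => nil (only autoloads defined directly on Child are reported)
```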
Ouch, you're right. Then my theory is totally discarded 😞
I have measured in a big project; it has 1600 autoload paths. Since this is strange, I did it in a super primitive way to know exactly what I am measuring: I patched the method like this:

```ruby
def autoload?
  value = nil
  $seconds_in_autoload += Benchmark.realtime do
    value = @mod.autoload?(@cname, false)
  end
  value
end
```

In development mode, we get 0.5s, and it preloads 2K constants. If you eager load, this app loads 36K constants. I will investigate if I can reduce those numbers, but it is worth sharing in the thread that the magnitudes are different. Could you confirm the 3s with this simple technique?
I have added a call counter:

```ruby
def autoload?
  $total_autoload_class += 1
  value = nil
  $seconds_in_autoload += Benchmark.realtime do
    value = @mod.autoload?(@cname, false)
  end
  value
end
```

Booting in development mode says the method is called 30K times. How does that compare in your project? Since file contents do not matter, once we share these last experiments I could come up with a self-contained minimal test.