
Manually defining inception namespaces to skip autoload? call #308

Open · gtrias wants to merge 2 commits into main

Conversation

@gtrias commented Dec 16, 2024

In our journey of improving our Rails application bootstrap, we have detected that calling the method `Module.autoload?` for each `autoload` registration is very costly. In our case it takes more than 3 seconds (see the attached flamegraphs).

[flamegraph image]

As the comment in the implementation points out, this call exists to support an edge case in how certain gems autoload themselves ([more info here](https://github.com/fxn/zeitwerk/blob/main/lib/zeitwerk/registry.rb#L28)). However, this approach produces [a call](https://github.com/fxn/zeitwerk/blob/main/lib/zeitwerk/loader.rb#L540) to `Module.autoload?` for every single constant in our huge project, as can be seen in the attached flamegraph. For our backend, this takes more than 3 seconds just to support a few gems with this special case.

In this PR we propose a mechanism by which any Rails application can manually define the namespaces to be incepted, avoiding the costly call for the rest of the constants. This resulted in a boot-time reduction of more than 3 seconds in our case; we only had to define 3 namespaces to be incepted to make our backend run properly.

This is the resulting flamegraph:

[flamegraph image]

As a trade-off, I understand that this may cause confusion for developers including gems that need the inception. So this PR keeps backwards compatibility: the default behavior is preserved if you don't manually set `manual_incepted_namespaces`.

TODO:

  • Add meaningful tests
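The proposed fast path can be sketched roughly as follows. This is an illustrative sketch, not the actual patch: `LoaderSketch`, `incepted_autoload?`, and the surrounding structure are assumed names for explanation only.

```ruby
# Illustrative sketch of the proposed fast path (not the actual patch):
# only pay for Module#autoload? when the constant's name is in a manually
# declared allowlist of namespaces that may need inception.
class LoaderSketch
  attr_accessor :manual_incepted_namespaces # e.g. %w[ActionCable FQL Changelog]

  # Returns the registered autoload path, or nil, like Module#autoload?.
  def incepted_autoload?(parent, cname)
    if manual_incepted_namespaces
      # Fast path: skip the costly Module#autoload? call entirely for
      # namespaces that were not declared as needing inception.
      return nil unless manual_incepted_namespaces.include?(cname.to_s)
    end
    parent.autoload?(cname) # List unset: original, backwards-compatible path.
  end
end
```

Leaving `manual_incepted_namespaces` unset keeps the current behavior, which is the backwards-compatibility property the PR describes.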

@gtrias gtrias mentioned this pull request Dec 16, 2024
@fxn (Owner) commented Dec 16, 2024

Hi!

Let me first understand the situation. As you probably know, Zeitwerk is lazy in development mode and only visits the top directories when an application boots.

On my machine I can do 8 million `autoload?` calls per second when the argument has an autoload set. Do you have 24 million top-level constants?

(When the argument does not have an autoload set, I can do 44 million calls per second.)
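Throughput claims like these can be sanity-checked with a microbenchmark along the following lines (an illustrative sketch; absolute numbers vary by machine and Ruby version):

```ruby
require "benchmark"

# Measure Module#autoload? throughput for a module with an autoload
# registered vs. one without (the file never needs to exist for autoload?).
module WithAutoload
  autoload :Foo, "some/path"
end
module WithoutAutoload; end

N = 1_000_000
set   = Benchmark.realtime { N.times { WithAutoload.autoload?(:Foo) } }
unset = Benchmark.realtime { N.times { WithoutAutoload.autoload?(:Foo) } }
puts format("autoload set: %.1fM calls/s", N / set / 1e6)
puts format("no autoload:  %.1fM calls/s", N / unset / 1e6)
```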

@gtrias (Author) commented Dec 16, 2024

> Hi!
>
> Let me first understand the situation. As you probably know, Zeitwerk is lazy in development mode and only visits the top directories when an application boots.
>
> On my machine I can do 8 million `autoload?` calls per second when the argument has an autoload set. Do you have 24 million top-level constants?
>
> (When the argument does not have an autoload set, I can do 44 million calls per second.)

Thanks for replying :)

I'm not sure how to count the number of top-level constants. But if I count the number of times the method `loader#define_autoload` is called during our bootstrap process, I get 10,779, which is not a lot considering how many `autoload?` calls your machine can run in one second...

I used this process to count them:

In my config/application.rb I set a custom logger for zeitwerk loader like this:

    Rails.autoloaders.main.logger = ->(msg) {
      File.open('log/autoloader.log', 'a') { |f| f.write("#{msg}\n") }
    }
    Rails.autoloaders.once.logger = ->(msg) {
      File.open('log/autoloader_once.log', 'a') { |f| f.write("#{msg}\n") }
    }

Then I counted the number of "autoload set" entries like this:

grep 'autoload set' autoloader.log | wc -l
10779

grep 'autoload set' autoloader_once.log | wc -l
5
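As a cross-check that does not depend on the logger, the calls can also be counted with a `TracePoint` on C method calls. This is generic Ruby instrumentation, not something Zeitwerk provides:

```ruby
# Count invocations of the C-implemented Module#autoload? inside a block.
autoload_calls = 0
tp = TracePoint.new(:c_call) do |event|
  autoload_calls += 1 if event.method_id == :autoload?
end

m = Module.new
tp.enable { 3.times { m.autoload?(:Foo) } }
puts autoload_calls # => 3
```

Enabling the trace around the boot sequence (for example in `config/boot.rb`) could count the calls for the whole bootstrap.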

I assume it is not a hardware issue either. I have a decent machine, and the performance issue also shows up in our remote dev environments, which are similar to my machine.

For reference, I have an AMD Ryzen 7 5800X (8 cores / 16 threads) and 32 GB of RAM.

All the tests were performed with Ruby 3.3.5.

Any clue why `autoload?` might be so slow for us? Spending more than 3 seconds on this process is quite a lot.

@jacobobq

Here's my own trace on an Apple M2 Pro, 16GB of RAM.

Speedscope trace

@fxn (Owner) commented Dec 17, 2024

Thanks.

I am a bit skeptical of the premise, because while I have been told Factorial is a big application, Shopify and Gusto are too (probably even bigger), and this is not an issue for them.

As I said above, for the common case in which `autoload?` returns false, I can do 44 million calls per second.

This is just a heads-up; I need to sit down and come up with actual numbers.

@fxn (Owner) commented Dec 17, 2024

Could you please confirm the following?

  1. When you see 3s saved, is the application eager loading? "Booting" may or may not eager load depending on the environment, you know.
  2. Those saved 3s are absolutely 100% only due to this patch?
  3. Which command do you use for booting?
  4. If you boot the application and inspect the source location of Module#autoload?, does it show anything interesting?

@fxn (Owner) commented Dec 17, 2024

Spring may be relevant for (2), if present, besides other potential changes.

@gtrias (Author) commented Dec 18, 2024

> I am a bit skeptical of the premise, because while I have been told Factorial is a big application, Shopify is too, and Gusto is too, they are probably even bigger, and that is not an issue.

TBH, I was also skeptical that we would hit this kind of performance issue before bigger companies such as the ones you mention did.
Maybe it's something that only affects us because of a weird configuration or some contextual ad-hoc thing, but at this point I'm unable to see what could be causing this behavior. Especially because our process for improving boot performance consists of disabling almost everything and re-enabling things one by one, checking how each affects performance and optimizing as much as possible.
In this specific case, we had our Rails app loading in approximately 1.5 seconds while loading almost nothing. Then we enabled engine loading and went up to ~8 seconds. This is when we detected the `Module#autoload?` issue.

> Could you please confirm the following?
>
> 1. When you see 3s saved, is the application eager loading? "Booting" may or may not eager load depending on the environment, you know.

We only eager-load in production. Our Rails setup is a multi-engine one (about 168 engines of heterogeneous sizes).
In the development environment, the one we're optimizing, we have the following relevant parameters:

  # Needed for Spring
  config.enable_reloading = true

  # Do not eager load code on boot.
  config.eager_load = false

  # Use an evented file watcher to asynchronously detect changes in source code,
  # routes, locales, etc. This feature depends on the listen gem.
  config.file_watcher = ActiveSupport::EventedFileUpdateChecker if ENV.fetch(
    'ENABLE_FILE_WATCHER',
    'true'
  ) == 'true'

  # Then we have these lines... which I will honestly try to get rid of, loading only what is strictly needed
  # v v v v v
  # Auto/eager load paths
  config.autoload_paths += Dir.glob("#{config.root}/lib/generators/*")

  autoloader = Rails.autoloaders.main
  autoloader.collapse(Dir.glob("#{config.root}/lib/**/public"))
  config.autoload_lib(ignore: %w[assets tasks])

> 2. Those saved 3s are absolutely 100% only due to this patch?

Yes, it's the only thing I change between runs, and it clearly affects the total time.
The flamegraph also confirms that more than 3 seconds were spent in `Module#autoload?` calls.

> 3. Which command do you use for booting?

I usually use `rails runner` to test the bootstrap:

time STACKPROF=true DISABLE_SPRING=true bin/rails runner "puts 'foo'"
....
foo

real    0m26.982s
user    0m21.116s
sys     0m5.676s

# This disables the optimization and falls back to the default behavior
time OPTIMIZE_AUTOLOAD=false STACKPROF=true DISABLE_SPRING=true bin/rails runner "puts 'foo'"
....
foo

real    0m32.443s
user    0m23.911s
sys     0m8.390s

> 4. If you boot the application and inspect the source location of `Module#autoload?`, does it show anything interesting?

We were especially looking for monkey patching around `Module#autoload?`, but we were not able to find anything relevant. Flamegraph traces don't show anything special; the slow part seems to happen inside Ruby's internal C implementation of `Module#autoload?`.

I will keep investigating. We applied this forked version of Zeitwerk to our main branch, so it's already running in production and in all dev environments, doing well so far.
I'm also curious to understand why this whole thing is so slow for us.

> Spring may be relevant for (2), if present, besides other potential changes.

We disable Spring for performance tests; however, we have Bootsnap enabled. I always run the tests more than once to make sure the cache has been computed before taking results.

@gtrias (Author) commented Dec 18, 2024

BTW, to complement this information: I've just confirmed with the infra team that the change also improved our production Puma boot times by ~5 seconds.

[boot-time graph image]

@fxn (Owner) commented Dec 18, 2024

@gtrias awesome!

Could you please run these commands, with engines loaded, twice: once with the patch enabled and once with the patch disabled?

bin/rails runner 'p Rails.autoloaders.main.dirs.size'
bin/rails runner 'p Rails.autoloaders.main.unloadable_cpaths.size'
bin/rails runner 'p Rails.autoloaders.main.__autoloads.size' # private API

That would give me an idea of the magnitude of the project and I could try to reproduce in a synthetic setup.

@fxn (Owner) commented Dec 18, 2024

@gtrias I have added a third command in the comment above.

@gtrias (Author) commented Dec 18, 2024

Sure! The env var `OPTIMIZE_AUTOLOAD` enables/disables the `Module#autoload?` optimization:

    # Manually defining inception namespaces for improved performance
    # More information here:
    if ENV.fetch('OPTIMIZE_AUTOLOAD', 'true') == 'true'
      manual_incepted_namespaces = %w[ActionCable FQL Changelog]
      Rails.autoloaders.main.manual_incepted_namespaces = manual_incepted_namespaces
      Rails.autoloaders.once.manual_incepted_namespaces = manual_incepted_namespaces
    end

Here are the results of the commands you asked for:

# Without the optimization

time OPTIMIZE_AUTOLOAD=false DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.dirs.size'

1300

real    0m30.804s
user    0m22.679s
sys     0m7.934s

time OPTIMIZE_AUTOLOAD=false DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.unloadable_cpaths.size'

3656

real    0m30.794s
user    0m22.647s
sys     0m7.959s

time OPTIMIZE_AUTOLOAD=false DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.__autoloads.size'

7044

real    0m30.960s
user    0m23.143s
sys     0m7.628s

# With the optimization

time OPTIMIZE_AUTOLOAD=true DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.dirs.size'

1300

real    0m25.590s
user    0m20.333s
sys     0m5.065s

time OPTIMIZE_AUTOLOAD=true DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.unloadable_cpaths.size'

3656

real    0m25.353s
user    0m19.723s
sys     0m5.419s

time OPTIMIZE_AUTOLOAD=true DISABLE_SPRING=true bin/rails runner 'p Rails.autoloaders.main.__autoloads.size'

7044

real    0m25.452s
user    0m19.933s
sys     0m5.326s

@fxn (Owner) commented Dec 18, 2024

Awesome, I see the app preloads 3656 constants on boot. Probably irrelevant, just noticing.

Thanks, I'll try to reproduce.

@fxn (Owner) commented Dec 18, 2024

Oh, maybe relevant to your performance work nonetheless: loading 3K files may well have a measurable impact.

@gtrias (Author) commented Dec 18, 2024

> Oh, maybe relevant to your performance work nonetheless, loading 3K files may have a measurable impact perhaps.

We're currently making sure we don't load anything unless it is strictly necessary :) but it's a slow manual task, since our whole backend abused initializers that require a bunch of constants to be present. Obviously, most of these definitions should not be needed to boot the application.

@gtrias (Author) commented Dec 18, 2024

I think I have an idea of why calling `Module#autoload?` might perform so poorly for our project.
Looking at the Ruby source code, it seems that in the internal implementation of `Module#autoload?` there's an optional `recur` argument, defaulting to true, that makes the `autoload?` check recurse through the nesting of the constant being checked.

From my understanding, having very deeply nested namespaces such as `A::B::C::D::E::F` could have a negative impact on this check. And it wouldn't surprise me if we were especially affected, since we're kind of addicted to nested namespacing 😅

Does the inception check need to run `autoload?` in recursive mode? I will try disabling the recursion to see whether that does the trick as well.
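For reference, the difference between the recursive and non-recursive lookup can be seen in a toy example (the second argument to `Module#autoload?` needs Ruby 3.1 or later; the class names here are made up):

```ruby
# Toy demonstration of recursive vs. non-recursive Module#autoload?
# (the autoload path never needs to exist for autoload? to report it).
class ParentNamespace
  autoload :Foo, "foo_path"
end
class ChildNamespace < ParentNamespace; end

p ChildNamespace.autoload?(:Foo)        # follows ancestors => "foo_path"
p ChildNamespace.autoload?(:Foo, false) # receiver only     => nil
```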

@fxn (Owner) commented Dec 18, 2024

If you look at the patch, it is replacing `cref.autoload?`. The variable `cref` stores an internal `Zeitwerk::Cref` object whose API abstracts common operations on constants performed by the library.

The flag you found tells the lookup to follow the ancestor chain of the receiver, but since the library has no use case for that, it is hard-coded to `false`.

@gtrias (Author) commented Dec 18, 2024

> If you look at the patch, it is replacing `cref.autoload?`. The variable `cref` stores an internal `Zeitwerk::Cref` object whose API abstracts common operations on constants performed by the library.
>
> The flag you found tells the lookup to follow the ancestor chain of the receiver, but since the library has no use case for that, it is hard-coded to `false`.

Ouch, you're right. Then my theory is totally discarded 😞

@fxn (Owner) commented Dec 19, 2024

I have measured this in a big project; it has 1,600 autoload paths.

Since this is strange, I did it in a super primitive way, so I know exactly what I am measuring: I `bundle open`ed zeitwerk and rewrote `Zeitwerk::Cref#autoload?` like this:

def autoload?
  value = nil
  $seconds_in_autoload += Benchmark.realtime do
    value = @mod.autoload?(@cname, false)
  end
  value
end

with `$seconds_in_autoload` initialized to 0 in `config/boot.rb`.

In development mode, we get 0.5s, and it preloads 2K constants.

If you eager load, this app loads 36K constants, and autoload? takes 1.6s.

I will investigate if I can reduce those numbers, but it is worth sharing in the thread that the magnitudes are different.

Could you confirm the 3s with this simple technique?

@fxn (Owner) commented Dec 20, 2024

I have added a call counter:

def autoload?
  $total_autoload_class += 1
  value = nil
  $seconds_in_autoload += Benchmark.realtime do
    value = @mod.autoload?(@cname, false)
  end
  value
end

and booting in development mode says the method is called 30K times.

How does that compare in your project?

Since file contents do not matter, once we share these last experiments, I could come up with a self-contained minimal test.
