
Implement a simple, "HTTP" server and Rack app to initiate profiling remotely #48

Merged

dalehamel merged 13 commits into main from profile-server on Jun 15, 2022

Conversation

@dalehamel (Member) commented May 18, 2022

TL;DR

This implements a basic, not fully compliant HTTP server, which hosts a minimal Rack app that responds to a single endpoint, /profile.

This is heavily inspired by golang's builtin pprof server, and aims to provide a similar means of passively profiling arbitrary Ruby applications.

Why

This is implemented to add support for passively collecting profiles from a running application. app_profiler already supports profiling individual requests well, but this is intended to profile a whole process for a specified duration.

In particular, this will be valuable for leveraging stackprof to provide on-CPU and object profiling, which we have no other means to collect easily and passively at the moment.

Implementation

Server

A minimal, non-compliant HTTP server is implemented. It does a blocking read on the server socket in a thread, and processes each request in its own thread. It is minimal in order to avoid pulling in any additional dependencies, and is built with just the Ruby standard library. It is "Rack app compliant-ish": for the intended use here of supporting just a GET request to a single path with some params, this should be sufficient.

The port that it listens on can be specified, but is intended not to be. It is meant to pick a free port and bind to it, so that each process has a sort of "debug back door" for profiling.
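Everything described above is available in the Ruby standard library; a minimal sketch of the idea (not the PR's actual implementation) looks roughly like this:

```ruby
require "socket"

# Sketch only: bind to localhost on port 0 so the OS assigns a free
# port, then handle each accepted connection in its own thread while
# the listener thread blocks in accept.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1] # the OS-assigned "debug back door" port

listen_thread = Thread.new do
  loop do
    Thread.new(server.accept) do |session|
      _request_line = session.gets # e.g. "GET /profile?duration=1 HTTP/1.1"
      body = "ok"
      session.write("HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n\r\n#{body}")
      session.close
    end
  end
end
```

Binding to port 0 is how a process can claim a free port without coordination; the real server also supports Unix sockets and records the chosen port for discovery.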

The port is logged, and is also indicated via the name of a tempfile in a specific location, so that it can be easily discovered. This process is documented in the header comment for the file.

This allows an external process to discover the port to connect to, and then pull Ruby profiles of a specified type and duration from the target process by simply making an HTTP request.
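To illustrate how such discovery could work, here is a sketch with a hypothetical file-naming scheme (the real convention lives in the server file's header comment; `advertise_port` and `discover_port` are invented names for illustration):

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical scheme: encode the pid and port in a tempfile name so
# an external process can find which port a given process listens on.
def advertise_port(pid, port, dir = Dir.tmpdir)
  path = File.join(dir, "profileserver-#{pid}-port-#{port}")
  FileUtils.touch(path)
  path
end

def discover_port(pid, dir = Dir.tmpdir)
  match = Dir.glob(File.join(dir, "profileserver-#{pid}-port-*")).first
  match && Integer(match[/port-(\d+)\z/, 1])
end
```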

Note that for safety / security reasons, it must bind to localhost only, meaning it cannot be reached by external clients. Otherwise, this could easily become a DoS vector and an information leak. Additionally, in production it is recommended (and is the default) to use a Unix socket, which only works on the local host.

As it binds on a completely separate socket, this will not be part of any default application router in Rails. This also means it will work in any Ruby process using app_profiler that calls it, including non-web-server processes like jobs, etc.

The overhead of this server should be essentially nil when it is not called, as it simply creates a socket and enters a blocking syscall waiting for a request to process. If the application that includes it is a Rails app, the railtie will create one instance per process, should the app fork, using ActiveSupport's ForkTracker.
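ActiveSupport's ForkTracker provides this hook in Rails; as a dependency-free sketch of the same per-process idea, a supervisor can compare the pid recorded at boot against the current pid and restart the listener when they diverge (`ProfileServerSupervisor` is an invented name, not the PR's class):

```ruby
# Sketch of fork awareness: each process must run its own server,
# since a forked child does not inherit the parent's listener thread.
class ProfileServerSupervisor
  def initialize(&start_server)
    @start_server = start_server
    @pid = Process.pid
    @start_server.call # one server per process, started at boot
  end

  # Re-invoked after a (suspected) fork: restarts the server only if
  # the pid has changed, i.e. we are in a new child process.
  def ensure_running!
    return if Process.pid == @pid
    @pid = Process.pid
    @start_server.call
  end
end
```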

Application

The profiling application here is a Rack app. It implements the /profile endpoint, and does various error handling to ensure it is only receiving requests it knows how to handle.

It prevents trying to profile while a profile is already running by using a semaphore. The semaphore is acquired and checked atomically, so that other threads using the same class instance cannot initiate profiling while a profile is already running.
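A minimal sketch of this acquire-and-check-in-one-step pattern uses a stdlib Mutex: Mutex#try_lock atomically takes the lock if it is free and returns false otherwise (`ProfileGate` is an invented name for illustration):

```ruby
# Sketch: at most one profile session at a time, with concurrent
# requests rejected immediately instead of queueing.
class ProfileGate
  def initialize
    @semaphore = Mutex.new
  end

  # Runs the block only if no other profile is in progress; returns
  # :conflict otherwise (the server would map this to an error response).
  def with_exclusive_profile
    return :conflict unless @semaphore.try_lock
    begin
      yield
    ensure
      @semaphore.unlock
    end
  end
end
```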

The profile mode and duration of the profiling session can both be specified as parameters. If these parameters are invalid, it should return an error to the client.

To profile, it simply calls AppProfiler.start, sleeps for the specified duration, and then ends the profile. This lets it observe what the rest of the VM is doing. The results are then serialized to JSON and returned in the Rack response.
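The flow can be sketched as follows (a hypothetical `run_profile` helper; of the AppProfiler calls, only `start` is named in this PR, so the `stop` method and its return value are assumptions):

```ruby
require "json"

# Sketch of the profiling flow: start a profile, sleep so the rest of
# the VM keeps running and gets sampled, end the session, and return
# the serialized result as a Rack-style response triple.
def run_profile(profiler, duration:, mode: :cpu)
  profiler.start(mode: mode)
  sleep(duration)          # this worker thread sleeps; other threads do the work
  profile = profiler.stop  # assumed API for ending the session
  [200, { "Content-Type" => "application/json" }, [JSON.dump(profile.to_h)]]
end
```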

Usage with Speedscope (locally)

CORS is enabled, so you can call the server directly from speedscope, provided that you URL-encode the address.

For example, with speedscope running locally on :9292 and the target app's profile server on port 57510, the profile URL before encoding is http://127.0.0.1:57510/profile?duration=1. Passing it to #profileURL, speedscope will initiate the profiling session when you enter http://127.0.0.1:9292/#profileURL=http%3A%2F%2F127.0.0.1%3A57510%2Fprofile%3Fduration%3D1 in the URL bar.
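The encoding step can be reproduced with stdlib CGI escaping:

```ruby
require "cgi"

# Percent-encode the profile endpoint and hand it to speedscope's
# #profileURL fragment, reproducing the URL from the example above.
profile_url = "http://127.0.0.1:57510/profile?duration=1"
speedscope_url = "http://127.0.0.1:9292/#profileURL=#{CGI.escape(profile_url)}"
```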

Configuration

This can be configured via a railtie. It defaults to disabled for now.

Adding the following to development.rb / production.rb:

config.app_profiler.server_enabled = true

enables it. The README documents additional config options.
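For instance, in a standard Rails environment file (only server_enabled is named here; the README lists the remaining options):

```ruby
# config/environments/production.rb
Rails.application.configure do
  config.app_profiler.server_enabled = true
end
```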

@dalehamel dalehamel force-pushed the profile-server branch 3 times, most recently from 133e41a to 5094e65 Compare May 19, 2022 00:17
@dalehamel dalehamel force-pushed the profile-server branch 23 times, most recently from 145ab4a to 7860dea Compare May 21, 2022 02:18
@dalehamel dalehamel changed the title Implement a simple, "HTTP" server to initiate profiling remotely Implement a simple, "HTTP" server and Rack app to initiate profiling remotely May 21, 2022
@dalehamel dalehamel marked this pull request as ready for review June 13, 2022 15:03
@dalehamel dalehamel requested a review from gmcgibbon June 13, 2022 19:54
to localhost only. The HTTP server is built using standard library modules only,
in order to keep dependencies minimal. Because it is an entirely separate server,
listening on an entirely separate socket, this should not interfere with any
existing application routes, and should even be usable in non-web apps.
Member

In the future, we could think about extracting this feature into a separate gem and then make it a peer dependency of app_profiler if need be. It seems to solve a separate (but related) problem to what app_profiler set out to do.

Member Author

I have considered this and flip-flopped on it a fair bit, but at least for now I think this gem is the best spot for it.

This functionality provides a new way to tie into the app profiler and ultimately profile apps, it just doesn't focus on individual requests, and is concerned with profiling at the process level. This can be a good starting point for profiling an app, which would lead to profiling specific requests, so I think the functionality is still pretty complementary.

Since the functionality is mostly in one new file, I don't think it would be too hard to separate in the future, but for now I think a separate gem would be overkill for this one feature, and I worry about the overhead it would add in managing dependencies and rolling this out. So for now, I think just putting it behind a configuration flag in the existing gem makes sense.

Any thoughts on what a "tripwire" for deciding to move to a separate gem would be?

Member

I'm mainly concerned with the amount of complexity and documentation we're adding for a feature that's disabled by default. I'm totally in favor of putting it here for now and if there's a need for it to be extracted, we can do it at that point. The solution can mature here for a bit before making a separate gem.

- Passive profile collection in production
- Periodically profiling production apps to analyze them responding to real
workloads
- Providing a statistical, long-term view of the "hot paths" of the workload
Member

Can you explain how a profile that thread sleeps is able to profile "real workloads" and "hot paths" of the application? I don't really understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Real workloads" refers to the fact that this profiles the actual requests an application is receiving organically, i.e., not a specific endpoint that was asked to be profiled, which is somewhat of a "synthetic" request.

"Hot paths" indicates what parts of the code are showing up in the most stack traces, ie, regardless of what request is being performed. This might indicate a specific request is slow, or that a shared path used by different related requests is slow, etc.

Does that help to clarify?

Member

Yes, but I think it would help me to see what kind of profile data this produces in comparison to the traditional approach of profiling a request of a Rails app. Since the thread just sleeps, I assumed it would just be a blank profile.

Member Author

what kind of profile data this produces in comparison to the traditional approach of profiling a request of a Rails app

TL;DR, it will provide for aggregated / averaged profiles (see the examples at the end), displayed in a format similar to the "left heavy" view in speedscope. It will show an aggregate view of everything the app is doing during these profiling sessions.

Since the thread just sleeps, I assumed it would just be a blank profile.

The sleeping thread shouldn't show up much; it should predominantly show threads that are actually doing work. That is, this will profile the other threads in the application, though more generally it will get the stack trace for "whatever is running at the time". My understanding of stackprof is that:

  • For CPU and Wall modes, an itimer is set (based on the configured "interval"), and when it fires stackprof grabs whatever stack is "active" in the Ruby VM at the time (stackprof works via interrupts / tracepoints, so it doesn't have its own thread, which is why it never shows up itself)
    • For Wall mode, the timer is based on real time elapsing
    • For CPU mode, the timer is only "charged" while a thread is actually running
  • For Object mode, sampling is based on a certain number of allocations (set by the sample rate): e.g., "1" samples every object allocation, "1000" every thousandth allocation, etc.
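The Object-mode counting can be modeled in plain Ruby (stackprof does this natively via the VM's allocation hook; this sketch only models the "every Nth allocation" logic, with an invented `AllocationSampler` class):

```ruby
# Sketch: with sample rate N, record a stack for every Nth allocation.
class AllocationSampler
  attr_reader :samples

  def initialize(rate)
    @rate = rate     # e.g. 1000 => sample every thousandth allocation
    @count = 0
    @samples = []
  end

  # Called once per object allocation with the allocating stack.
  def on_allocation(stack)
    @count += 1
    @samples << stack if (@count % @rate).zero?
  end
end
```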

Note that it is still possible to see "sleep" in the CPU profile, because a sleeping thread in ruby will periodically awaken, then return to a native sleep, and this might coincide with the timer expiring given enough samples.

For Object mode, it is based on tracepoints and not a timer, so we will only see the stacks for other threads which are actually allocating objects (we won't see the profile server, since it is sleeping and therefore not allocating anything).

I think the most interesting data here is the CPU and object allocation, which should tell you:

  • CPU: Where is your application actually spending its CPU time when it gets a chance to run?
  • Object: What code paths in your application are responsible for allocating objects (how many?)

Here are some samples of real data collected (albeit, against a relatively idle test app) with this, displayed in cloud profiler. The samples were collected approximately once a minute, for approximately 10 seconds. These should be the equivalent of the "left heavy" view in speedscope.

CPU graph (basically showing what an idle server uses its CPU time for, not terribly exciting, would be more interesting if we had some requests happening, but this is a good "baseline"):

[Screenshot: CPU profile graph, 2022-06-14]

Object profile graph:

[Screenshot: object profile graph, 2022-06-14]

The object profile graph shows the count of objects allocated on average by each stack frame, eg, the selected frame in the UI here had ~ 4 objects on average.

Note that both of these graphs are the aggregate of many profiling sessions, each representing the average of ~ 250 sessions. This thus gives us a pretty good average view of what threads are actually doing work in the application, and where allocations are happening. The larger the number of samples, the better the view.

Again, of course, the data would be more interesting for an active app, as we are seeing the CPU graph dominated by threads periodically waking up from their sleep (including our own profile server), and a relatively small number of object allocations.

Does this help to clarify a bit? This is a relatively novel use case for Ruby profiling, both at Shopify and I think in general, but it exists for other languages (e.g., golang as I have mentioned), so I believe it will be useful for developers once we expose it, as this sort of data has been used for Go apps for a few years now.


@listen_thread = Thread.new do
loop do
Thread.new(@transport.socket.accept) do |session|
Member

This will spin up a bunch of threads in the loop. Is this intended, or should the loop be inside this block?

Member Author

It should start a new thread for each accepted request. If a profiling request is already running, it will return an error. This is to allow the server to respond while a request is running, rather than hang indefinitely.

The call to accept here blocks the listen thread, so even though there is a loop, it only creates a new thread when it actually receives a request (see the test "app responds with conflict if profile already running" for this below).

The majority of the time there should be only one thread, the listen thread, which runs this loop waiting for a request. If a profile is running, a second thread created by this block will last the duration of the profiling session. During that session a third (fourth, etc.) thread might briefly exist if more requests arrive, but they will return a descriptive error that the server is busy, which is the intended behavior rather than having them block indefinitely.

initializer "app_profiler.enable_server" do
if AppProfiler.server.enabled
AppProfiler::Server.start!
ActiveSupport::ForkTracker.after_fork do
Member

Makes sense, but do we need to start the server on the main thread as well as forks?

Also we may need a require for the fork tracker if it isn't always loaded by default.

@dalehamel dalehamel force-pushed the profile-server branch 2 times, most recently from 3348c29 to 7e1d541 Compare June 14, 2022 14:48
Co-authored-by: Gannon McGibbon <gannon.mcgibbon@gmail.com>
@dalehamel dalehamel force-pushed the profile-server branch 3 times, most recently from 611c0f0 to fd6cc32 Compare June 14, 2022 15:09
@dalehamel (Member Author)

Thank you for the review @gmcgibbon, I believe I have addressed your first round of feedback with the two additional commits (one for the readme, one for the rest), and the responses to your questions above. Please take another look when you have the chance ❤️

dalehamel and others added 2 commits June 14, 2022 15:25
Co-authored-by: Gannon McGibbon <gannon.mcgibbon@gmail.com>
@dalehamel (Member Author)

@gmcgibbon I've taken your suggestions to clean up the test suite, thanks for those. I believe everything should be addressed now.

I've also updated the changelog, as I'd like to follow this up with a new release once this merges.

If you'd like to give it one more look before I merge, let me know; otherwise let me know if you think this is good to ship (I see you already approved, thanks), and I'll merge and start the release process.

@gmcgibbon (Member)

:shipit:

@dalehamel dalehamel merged commit 0e2a323 into main Jun 15, 2022
@dalehamel dalehamel deleted the profile-server branch June 15, 2022 13:12
@shopify-shipit shopify-shipit bot temporarily deployed to production June 16, 2022 13:21 Inactive
@@ -37,6 +43,15 @@ class Railtie < Rails::Railtie
end
end

initializer "app_profiler.enable_server" do
if AppProfiler.server.enabled
Member

Should this server start when running tests, running migrations, or starting the console? Or only when the Rails server starts? Right now it starts every time.

Member

A Rails server only. I can open an issue.

Member Author

It should be enabled for any long-running process, so rails server or jobs.

What is the most idiomatic way to achieve this?
