Implement a simple, "HTTP" server and Rack app to initiate profiling remotely #48
Conversation
Co-authored-by: Dale Hamel <dale.hamel@shopify.com>
to localhost only. The HTTP server is built using standard library modules only,
in order to keep dependencies minimal. Because it is an entirely separate server,
listening on an entirely separate socket, this should not interfere with any
existing application routes, and should even be usable in non-web apps.
In the future, we could think about extracting this feature into a separate gem and then make it a peer dependency of app_profiler if need be. It seems to solve a separate (but related) problem to what app_profiler set out to do.
I have considered this and flip-flopped on it a fair bit, but at least for now I think this gem is the best spot for it.
This functionality provides a new way to tie into the app profiler and ultimately profile apps, it just doesn't focus on individual requests, and is concerned with profiling at the process level. This can be a good starting point for profiling an app, which would lead to profiling specific requests, so I think the functionality is still pretty complementary.
Since the functionality is mostly in one new file, I don't think it would be too hard to separate in the future, but for now I think a separate gem would be overkill for this one feature, and I worry about the overhead it would add in managing dependencies and rolling this out. So for now, I think just putting it behind a configuration flag in the existing gem makes sense.
Any thoughts on what a "tripwire" for deciding to move to a separate gem would be?
I'm mainly concerned with the amount of complexity and documentation we're adding for a feature that's disabled by default. I'm totally in favor of putting it here for now and if there's a need for it to be extracted, we can do it at that point. The solution can mature here for a bit before making a separate gem.
- Passive profile collection in production
- Periodically profiling production apps to analyze them responding to real workloads
- Providing a statistical, long-term view of the "hot paths" of the workload
Can you explain how a profile that thread sleeps is able to profile "real workloads" and "hot paths" of the application? I don't really understand.
"Real workloads" refers to the fact that this profiles the actual requests an application is receiving organically, i.e., not a specific endpoint that was asked to be profiled, which is somewhat of a "synthetic" request.
"Hot paths" indicates which parts of the code show up in the most stack traces, regardless of which request is being performed. This might indicate that a specific request is slow, or that a shared path used by different related requests is slow, etc.
Does that help to clarify?
Yes, but I think it would help me to see what kind of profile data this produces in comparison to the traditional approach of profiling a request of a Rails app. Since the thread just sleeps, I assumed it would just be a blank profile.
what kind of profile data this produces in comparison to the traditional approach of profiling a request of a Rails app
TL;DR, it will provide for aggregated / averaged profiles (see the examples at the end), displayed in a format similar to the "left heavy" view in speedscope. It will show an aggregate view of everything the app is doing during these profiling sessions.
Since the thread just sleeps, I assumed it would just be a blank profile.
The sleeping thread shouldn't show up much; the profile should predominantly show threads that are actually doing work. That is, this will profile the other threads in the application, though more generally, it will get the stack trace for "whatever is running at the time". My understanding of stackprof is that:
- For CPU and Wall modes, an itimer is set (based on the configured "interval"), and when it fires, stackprof captures whichever stack is the "active" one in the Ruby VM at that moment. (stackprof itself works via interrupts / tracepoints, so it doesn't have its own "thread", which is why it never shows up in profiles.)
- For Wall mode, the timer is based on real time elapsing.
- For CPU mode, the timer should only be "charged" when a thread is actually running.
- For Object mode, sampling is based on a certain number of allocations (identified by the sample rate). E.g., "1" samples every object allocation, "1000" samples every thousandth allocation, etc.

Note that it is still possible to see "sleep" in the CPU profile, because a sleeping thread in Ruby will periodically awaken and then return to a native sleep, and this might coincide with the timer expiring given enough samples.
For Object mode, sampling is based on tracepoints rather than a timer, so we will only see the stacks of threads that are actually allocating objects (we won't see the profile server, since it is sleeping and therefore not allocating anything).
I think the most interesting data here is the CPU and object allocation, which should tell you:
- CPU: Where is your application actually spending its CPU time when it gets a chance to run?
- Object: Which code paths in your application are responsible for allocating objects (and how many)?
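As a rough illustration of the "capture whatever stack is active" idea above, here is a stdlib-only sketch of wall-clock stack sampling across threads. Note this is not how stackprof actually works (it uses signal-based itimers and tracepoints, not a sampling thread); the method name and structure are purely illustrative:

```ruby
# Illustrative stdlib-only sketch of wall-clock stack sampling across threads.
# NOTE: stackprof does NOT work this way (it uses itimers/tracepoints); this
# only demonstrates the idea of periodically capturing other threads' stacks.
def sample_stacks(duration:, interval: 0.01)
  samples = Hash.new(0) # top-of-stack frame => number of times sampled
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + duration
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    Thread.list.each do |thread|
      next if thread == Thread.current # skip the sampler itself
      frame = thread.backtrace&.first  # whatever the thread is doing right now
      samples[frame] += 1 if frame
    end
    sleep interval
  end
  samples
end
```

A thread blocked in `sleep` shows a sleep frame at the top of its stack in every sample, which mirrors why sleeping threads can still appear in wall-mode profiles.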
Here are some samples of real data collected (albeit, against a relatively idle test app) with this, displayed in cloud profiler. The samples were collected approximately once a minute, for approximately 10 seconds. These should be the equivalent of the "left heavy" view in speedscope.
CPU graph (basically showing what an idle server uses its CPU time for, not terribly exciting, would be more interesting if we had some requests happening, but this is a good "baseline"):
Object profile graph:
The object profile graph shows the count of objects allocated on average by each stack frame, eg, the selected frame in the UI here had ~ 4 objects on average.
Note that both of these graphs are the aggregate of many profiling sessions, each representing the average of ~250 sessions. This gives us a pretty good average view of which threads are actually doing work in the application, and where allocations are happening. The larger the number of samples, the better the view.
Again, of course, the data would be more interesting for an active app, as we are seeing the CPU graph dominated by threads periodically waking up from their sleep (including our own profile server), and a relatively small number of object allocations.
Does this help to clarify a bit? This is a relatively novel use case for ruby profiling both at Shopify and I think in general, but exists for other languages (eg, golang as I have mentioned), so I believe it will be useful for developers once we expose it, as this sort of data has been used for Go apps for a few years now.
@listen_thread = Thread.new do
  loop do
    Thread.new(@transport.socket.accept) do |session|
This will spin up a bunch of threads in the loop. Is this intended, or should the loop be inside this block?
It should start a new thread for each accepted request. If a profiling request is already running, it will return an error. This is to allow the server to respond while a request is running, rather than hang indefinitely.
The call to accept here blocks the listen thread, so even though there is a loop, it should only create a new thread when it actually receives a request (see the test "app responds with conflict if profile already running" below for this).
The majority of the time there should be only one thread, the listen thread, which runs this loop waiting for a request. While profiling, a second thread created by this block exists for the duration of the profiling session. During that session a third (fourth, etc.) thread might briefly exist if more requests arrive, but they should return a descriptive error that the server is busy, which is the intended behavior, rather than blocking indefinitely.
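To make the accept pattern concrete, here is a self-contained sketch of the same shape: a listen thread blocking in accept, one handler thread per connection, and a Mutex standing in for the profiling semaphore. All names here are illustrative, not app_profiler's actual classes:

```ruby
require "socket"

# Sketch of the accept loop discussed above (illustrative names only).
class SketchProfileServer
  def initialize
    @server = TCPServer.new("127.0.0.1", 0) # port 0: pick a free localhost port
    @busy = Mutex.new                       # stands in for the profiling semaphore
  end

  def port
    @server.addr[1]
  end

  def start
    @listen_thread = Thread.new do
      loop do
        # accept blocks the listen thread; a handler thread is created
        # only when a connection actually arrives.
        Thread.new(@server.accept) do |session|
          session.gets # consume the request line (real parsing elided)
          if @busy.try_lock
            begin
              sleep(0.05) # stands in for the profiling session
              session.write("HTTP/1.1 200 OK\r\n\r\nprofile data\n")
            ensure
              @busy.unlock
            end
          else
            # busy profiling: respond immediately instead of blocking
            session.write("HTTP/1.1 409 Conflict\r\n\r\nalready profiling\n")
          end
          session.close
        end
      end
    end
  end
end
```

Because accept is the only blocking point, an idle server holds exactly one extra thread, matching the "essentially nil overhead when not called" claim in the PR description.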
initializer "app_profiler.enable_server" do
  if AppProfiler.server.enabled
    AppProfiler::Server.start!
    ActiveSupport::ForkTracker.after_fork do
Makes sense, but do we need to start the server on the main thread as well as forks?
Also we may need a require for the fork tracker if it isn't always loaded by default.
Co-authored-by: Gannon McGibbon <gannon.mcgibbon@gmail.com>
Thank you for the review @gmcgibbon, I believe I have addressed your first round of feedback with the two additional commits (one for the readme, one for the rest), and the responses to your questions above. Please take another look when you have the chance ❤️
@gmcgibbon I've taken your suggestions to clean up the test suite, thanks for those. I believe everything should be addressed now. I've also updated the changelog, as I'd like to follow this up with a new release once this merges. If you'd like to give it one more look before I merge, let me know; otherwise let me know if you think this is good to ship (I see you already approved, thanks), and I'll merge and start the release process.
@@ -37,6 +43,15 @@ class Railtie < Rails::Railtie
      end
    end

    initializer "app_profiler.enable_server" do
      if AppProfiler.server.enabled
Should this server start when running tests, running migrations, or starting the console? Or only when the Rails server starts? Right now it starts every time.
A Rails server only. I can open an issue.
It should be enabled for any long-running process, so rails server or jobs.
What is the most idiomatic way to achieve this?
TL;DR
This implements a basic, not fully compliant HTTP server, which hosts a minimal Rack app that responds to a single endpoint, /profile.
This is heavily inspired by golang's builtin pprof server, and aims to provide a similar means to passively profile arbitrary Ruby applications.
Why
This is implemented to add support for passively collecting profiles from a running application. App profiler already supports profiling individual requests well, but this is intended to profile a whole process, for a specified duration.
In particular, this will be valuable for leveraging stackprof to provide on-CPU and object profiling, which we have no other means to collect easily and passively at the moment.
Implementation
Server
A minimal, non-compliant HTTP server is implemented, which does a blocking read on the server socket in a thread, and will process each request in a thread. It is minimal in order to avoid pulling in any additional dependencies, and is built with just the ruby standard library. It is "Rack app compliant-ish". For the intended use here of supporting just a GET request to a single path with some params, this should be sufficient.
The port it listens on can be specified, but typically shouldn't be: it is meant to pick a free port and bind to it, so that each process has a sort of "debug back door" for profiling.
The port will be logged as well as indicated via the name of a tempfile in a specific location, so that it can be easily discovered. This process is documented in the header comment for the file.
This will allow an external process to discover the port to connect to, and then pull ruby profiles of a specified type and duration for the target process by simply making an HTTP request.
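As a hypothetical sketch of the discovery handshake (the actual filename convention is documented in the server file's header comment; the naming below is made up for illustration), advertising the bound port might look like:

```ruby
require "tmpdir"

# Hypothetical sketch only: write the bound port to a discoverable tempfile
# so an external process can find which port to connect to. The real
# app_profiler filename convention may differ.
def advertise_port(port, dir: Dir.tmpdir)
  path = File.join(dir, "profileserver-#{Process.pid}-#{port}.txt")
  File.write(path, port.to_s)
  path
end
```

An external tool would then glob the directory for the target PID, read the port, and issue the HTTP request described below.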
Note that for safety / security reasons, it must bind to localhost only, meaning it cannot be reached by external clients. Otherwise, this could easily be a DoS vector and an information leak. Additionally, in production it is recommended (and the default) to use a Unix socket, which only works on the local host.
As it binds a completely separate socket, this will not be part of any default application router in Rails. This also means it will work in any Ruby process that uses app_profiler and enables it, including non-web-server processes like jobs, etc.
The overhead of this server should be essentially nil if it is not called, as it will simply create a socket and enter a blocking syscall waiting for a request to process. If the application that includes it is a Rails app, the railtie will create one instance per process, should the Rails app fork, using activesupport's ForkTracker.
Application
The profiling application here is a Rack app. It implements the /profile endpoint, and does various error handling to ensure it only receives requests it knows how to handle.
It prevents trying to profile while a profile is already running, using a semaphore. The semaphore is acquired and checked in a single step, so that other threads using the same class instance cannot initiate profiling if a profile is already running.
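The acquire-and-check-in-one-step guard can be sketched with Mutex#try_lock (illustrative class and method names, not app_profiler's actual API):

```ruby
# Sketch of an atomic "am I already profiling?" guard using Mutex#try_lock.
# The class and method names here are illustrative only.
class ProfileGuard
  def initialize
    @lock = Mutex.new
  end

  # Returns :conflict without blocking if a session is already running;
  # otherwise holds the lock for the duration of the (simulated) session.
  def try_profile(duration)
    return :conflict unless @lock.try_lock # acquire and check atomically
    begin
      sleep(duration) # stands in for the profiling session
      :ok
    ensure
      @lock.unlock
    end
  end
end
```

Because try_lock both tests and acquires in one operation, there is no window where two threads can each see "not profiling" and both start a session.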
The profile mode and duration of the profiling session can both be specified as parameters. If these parameters are invalid, it should return an error to the client.
To profile, it will simply call AppProfiler.start, then sleep for the specified duration, then end the profile. This should let it observe what the rest of the VM is doing. The results are then serialized to JSON and returned in the Rack response.
Usage with Speedscope (locally)
CORS is enabled, so you can call the server directly from speedscope provided that you URL encode the address.
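The URL-encoding step can be done with Ruby's standard library. For instance, encoding a profile endpoint address for speedscope's #profileURL fragment (the ports here are just examples; each process picks its own):

```ruby
require "cgi"

# Encode the target process's profile URL for use in speedscope's
# #profileURL fragment. Ports 57510 and 9292 are example values.
profile_url = "http://127.0.0.1:57510/profile?duration=1"
encoded = CGI.escape(profile_url)
speedscope_url = "http://127.0.0.1:9292/#profileURL=#{encoded}"
puts speedscope_url
# → http://127.0.0.1:9292/#profileURL=http%3A%2F%2F127.0.0.1%3A57510%2Fprofile%3Fduration%3D1
```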
For example, with speedscope running locally on :9292, we can have it directly load a profile from the target app using port 57510. The host before URL encoding is http://127.0.0.1:57510/profile?duration=1, and this can be passed as #profileURL, so that speedscope will initiate the profiling session when
http://127.0.0.1:9292/#profileURL=http%3A%2F%2F127.0.0.1%3A57510%2Fprofile%3Fduration%3D1
is entered in the URL bar.
Configuration
This can be configured via a railtie. It defaults to disabled for now.
Adding the following to development.rb / production.rb:
results in it being enabled. The README documents additional config options.
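The exact flag isn't shown above, so this is a hypothetical sketch only; the real option name may differ, and the README should be treated as authoritative:

```ruby
# config/environments/production.rb (or development.rb)
# Hypothetical sketch: the actual app_profiler option name may differ.
Rails.application.configure do
  config.app_profiler.server_enabled = true # assumed flag name, see README
end
```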