Implement a simple, "HTTP" server and Rack app to initiate profiling remotely #48
Conversation
Co-authored-by: Dale Hamel <dale.hamel@shopify.com>
to localhost only. The HTTP server is built using standard library modules only,
in order to keep dependencies minimal. Because it is an entirely separate server,
listening on an entirely separate socket, this should not interfere with any
existing application routes, and should even be usable in non-web apps.
In the future, we could think about extracting this feature into a separate gem and then make it a peer dependency of app_profiler if need be. It seems to solve a separate (but related) problem to what app_profiler set out to do.
I have considered this and flip-flopped on it a fair bit, but at least for now I think this gem is the best spot for it.
This functionality provides a new way to tie into the app profiler and ultimately profile apps, it just doesn't focus on individual requests, and is concerned with profiling at the process level. This can be a good starting point for profiling an app, which would lead to profiling specific requests, so I think the functionality is still pretty complementary.
Since the functionality is mostly in one new file, I don't think it would be too hard to separate in the future, but for now I think a separate gem would be overkill for this one feature, and I worry about the overhead it would add in managing dependencies and rolling this out. So for now, I think just putting it behind a configuration flag in the existing gem makes sense.
Any thoughts on what a "tripwire" for deciding to move to a separate gem would be?
I'm mainly concerned with the amount of complexity and documentation we're adding for a feature that's disabled by default. I'm totally in favor of putting it here for now and if there's a need for it to be extracted, we can do it at that point. The solution can mature here for a bit before making a separate gem.
- Passive profile collection in production
- Periodically profiling production apps to analyze them responding to real workloads
- Providing a statistical, long-term view of the "hot paths" of the workload
Can you explain how a profile that thread sleeps is able to profile "real workloads" and "hot paths" of the application? I don't really understand.
"Real workloads" refers to the fact that this profiles the actual requests an application is receiving organically, i.e., not a specific endpoint that was asked to be profiled, which is somewhat of a "synthetic" request.
"Hot paths" indicates which parts of the code show up in the most stack traces, regardless of which request is being performed. This might indicate that a specific request is slow, or that a shared path used by different related requests is slow, etc.
Does that help to clarify?
Yes, but I think it would help me to see what kind of profile data this produces in comparison to the traditional approach of profiling a request of a Rails app. Since the thread just sleeps, I assumed it would just be a blank profile.
what kind of profile data this produces in comparison to the traditional approach of profiling a request of a Rails app
TL;DR, it will provide for aggregated / averaged profiles (see the examples at the end), displayed in a format similar to the "left heavy" view in speedscope. It will show an aggregate view of everything the app is doing during these profiling sessions.
Since the thread just sleeps, I assumed it would just be a blank profile.
The sleeping thread shouldn't show up much; the profile should predominantly show threads that are actually doing work. That is, this will profile the other threads in the application, though more generally, it will get the stack trace for "whatever is running at the time". My understanding of stackprof is that:
- For CPU and Wall modes, an itimer is set (based on the configured "interval"), and when it fires, stackprof captures whichever stack is the "active" one in the Ruby VM at that moment. (stackprof itself works via interrupts / tracepoints, so it doesn't have its own "thread", which is why it never shows up in profiles.)
- For Wall mode, the timer is based on real time elapsing.
- For CPU mode, the timer should only be "charged" when a thread is actually running.
- For Object mode, sampling is based on a certain number of allocations (identified by the sample rate). E.g., "1" samples every object allocation, "1000" samples every thousandth allocation, etc.

Note that it is still possible to see "sleep" in the CPU profile, because a sleeping thread in Ruby will periodically awaken and then return to a native sleep, and this might coincide with the timer expiring given enough samples.
For Object mode, sampling is based on tracepoints rather than a timer, so we will only see the stacks of threads that are actually allocating objects (we won't see the profile server, since it is sleeping and therefore not allocating anything).
I think the most interesting data here is the CPU and object allocation, which should tell you:
- CPU: Where is your application actually spending its CPU time when it gets a chance to run?
- Object: Which code paths in your application are responsible for allocating objects (and how many)?
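As a rough illustration of the "capture whatever stack is active" idea above, here is a stdlib-only sketch of wall-clock stack sampling across threads. Note this is not how stackprof actually works (it uses signal-based itimers and tracepoints, not a sampling thread); the method name and structure are purely illustrative:

```ruby
# Illustrative stdlib-only sketch of wall-clock stack sampling across threads.
# NOTE: stackprof does NOT work this way (it uses itimers/tracepoints); this
# only demonstrates the idea of periodically capturing other threads' stacks.
def sample_stacks(duration:, interval: 0.01)
  samples = Hash.new(0) # top-of-stack frame => number of times sampled
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + duration
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    Thread.list.each do |thread|
      next if thread == Thread.current # skip the sampler itself
      frame = thread.backtrace&.first  # whatever the thread is doing right now
      samples[frame] += 1 if frame
    end
    sleep interval
  end
  samples
end
```

A thread blocked in `sleep` shows a sleep frame at the top of its stack in every sample, which mirrors why sleeping threads can still appear in wall-mode profiles.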
Here are some samples of real data collected (albeit, against a relatively idle test app) with this, displayed in cloud profiler. The samples were collected approximately once a minute, for approximately 10 seconds. These should be the equivalent of the "left heavy" view in speedscope.
CPU graph (basically showing what an idle server uses its CPU time for, not terribly exciting, would be more interesting if we had some requests happening, but this is a good "baseline"):
Object profile graph:
The object profile graph shows the count of objects allocated on average by each stack frame, eg, the selected frame in the UI here had ~ 4 objects on average.
Note that both of these graphs are the aggregate of many profiling sessions, each representing the average of ~250 sessions. This gives us a pretty good average view of which threads are actually doing work in the application, and where allocations are happening. The larger the number of samples, the better the view.
Again, of course, the data would be more interesting for an active app, as we are seeing the CPU graph dominated by threads periodically waking up from their sleep (including our own profile server), and a relatively small number of object allocations.
Does this help to clarify a bit? This is a relatively novel use case for ruby profiling both at Shopify and I think in general, but exists for other languages (eg, golang as I have mentioned), so I believe it will be useful for developers once we expose it, as this sort of data has been used for Go apps for a few years now.
@listen_thread = Thread.new do
  loop do
    Thread.new(@transport.socket.accept) do |session|
This will spin up a bunch of threads in the loop. Is this intended, or should the loop be inside this block?
It should start a new thread for each accepted request. If a profiling request is already running, it will return an error. This is to allow the server to respond while a request is running, rather than hang indefinitely.
The call to accept here blocks the listen thread, so even though there is a loop, it should only create a new thread when it actually receives a request (see the test "app responds with conflict if profile already running" below for this).
The majority of the time there should be only one thread, the listen thread, which runs this loop waiting for a request. While profiling, a second thread created by this block exists for the duration of the profiling session. During that session a third (fourth, etc.) thread might briefly exist if more requests arrive, but they should return a descriptive error that the server is busy, which is the intended behavior, rather than blocking indefinitely.
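To make the accept pattern concrete, here is a self-contained sketch of the same shape: a listen thread blocking in accept, one handler thread per connection, and a Mutex standing in for the profiling semaphore. All names here are illustrative, not app_profiler's actual classes:

```ruby
require "socket"

# Sketch of the accept loop discussed above (illustrative names only).
class SketchProfileServer
  def initialize
    @server = TCPServer.new("127.0.0.1", 0) # port 0: pick a free localhost port
    @busy = Mutex.new                       # stands in for the profiling semaphore
  end

  def port
    @server.addr[1]
  end

  def start
    @listen_thread = Thread.new do
      loop do
        # accept blocks the listen thread; a handler thread is created
        # only when a connection actually arrives.
        Thread.new(@server.accept) do |session|
          session.gets # consume the request line (real parsing elided)
          if @busy.try_lock
            begin
              sleep(0.05) # stands in for the profiling session
              session.write("HTTP/1.1 200 OK\r\n\r\nprofile data\n")
            ensure
              @busy.unlock
            end
          else
            # busy profiling: respond immediately instead of blocking
            session.write("HTTP/1.1 409 Conflict\r\n\r\nalready profiling\n")
          end
          session.close
        end
      end
    end
  end
end
```

Because accept is the only blocking point, an idle server holds exactly one extra thread, matching the "essentially nil overhead when not called" claim in the PR description.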
initializer "app_profiler.enable_server" do
  if AppProfiler.server.enabled
    AppProfiler::Server.start!
    ActiveSupport::ForkTracker.after_fork do
Makes sense, but do we need to start the server on the main thread as well as forks?
Also we may need a require for the fork tracker if it isn't always loaded by default.
Co-authored-by: Gannon McGibbon <gannon.mcgibbon@gmail.com>
Thank you for the review @gmcgibbon, I believe I have addressed your first round of feedback with the two additional commits (one for the readme, one for the rest), and the responses to your questions above. Please take another look when you have the chance ❤️
@gmcgibbon I've taken your suggestions to clean up the test suite, thanks for those. I believe everything should be addressed now. I've also updated the changelog, as I'd like to follow this up with a new release once this merges. If you'd like to give it one more look before I merge, let me know; otherwise let me know if you think this is good to ship (I see you already approved, thanks), and I'll merge and start the release process.
@@ -37,6 +43,15 @@ class Railtie < Rails::Railtie
      end
    end

    initializer "app_profiler.enable_server" do
      if AppProfiler.server.enabled
Should this server start when running tests, running migrations, or starting the console? Or only when the Rails server starts? Right now it starts every time.
A Rails server only. I can open an issue.
It should be enabled for any long-running process, so rails server or jobs.
What is the most idiomatic way to achieve this?
TL;DR
This implements a basic, not fully compliant HTTP server, which hosts a minimal Rack app that responds to a single endpoint, /profile.
This is heavily inspired by golang's builtin pprof server, and aims to provide a similar means to passively profile arbitrary Ruby applications.
Why
This is implemented to add support for passively collecting profiles from a running application. App profiler already supports profiling individual requests well, but this is intended to profile a whole process, for a specified duration.
In particular, this will be valuable for leveraging stackprof to provide on-CPU and object profiling, which we have no other means to collect easily and passively at the moment.
Implementation
Server
A minimal, non-compliant HTTP server is implemented, which does a blocking read on the server socket in a thread, and will process each request in a thread. It is minimal in order to avoid pulling in any additional dependencies, and is built with just the ruby standard library. It is "Rack app compliant-ish". For the intended use here of supporting just a GET request to a single path with some params, this should be sufficient.
The port it listens on can be specified, but typically shouldn't be: it is meant to pick a free port and bind to it, so that each process has a sort of "debug back door" for profiling.
The port will be logged as well as indicated via the name of a tempfile in a specific location, so that it can be easily discovered. This process is documented in the header comment for the file.
This will allow an external process to discover the port to connect to, and then pull ruby profiles of a specified type and duration for the target process by simply making an HTTP request.
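As a hypothetical sketch of the discovery handshake (the actual filename convention is documented in the server file's header comment; the naming below is made up for illustration), advertising the bound port might look like:

```ruby
require "tmpdir"

# Hypothetical sketch only: write the bound port to a discoverable tempfile
# so an external process can find which port to connect to. The real
# app_profiler filename convention may differ.
def advertise_port(port, dir: Dir.tmpdir)
  path = File.join(dir, "profileserver-#{Process.pid}-#{port}.txt")
  File.write(path, port.to_s)
  path
end
```

An external tool would then glob the directory for the target PID, read the port, and issue the HTTP request described below.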
Note that for safety / security reasons, it must bind to localhost only, meaning it cannot be reached by external clients. Otherwise, this could easily be a DoS vector and an information leak. Additionally, in production it is recommended (and the default) to use a Unix socket, which only works on the local host.
As it binds a completely separate socket, this will not be part of any default application router in Rails. This also means it will work in any Ruby process that uses app_profiler and enables it, including non-web-server processes like jobs, etc.
The overhead of this server should be essentially nil if it is not called, as it will simply create a socket and enter a blocking syscall waiting for a request to process. If the application that includes it is a Rails app, the railtie will create one instance per process, should the Rails app fork, using activesupport's ForkTracker.
Application
The profiling application here is a Rack app. It implements the /profile endpoint, and does various error handling to ensure it only receives requests it knows how to handle.
It prevents trying to profile while a profile is already running, using a semaphore. The semaphore is acquired and checked in a single step, so that other threads using the same class instance cannot initiate profiling if a profile is already running.
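The acquire-and-check-in-one-step guard can be sketched with Mutex#try_lock (illustrative class and method names, not app_profiler's actual API):

```ruby
# Sketch of an atomic "am I already profiling?" guard using Mutex#try_lock.
# The class and method names here are illustrative only.
class ProfileGuard
  def initialize
    @lock = Mutex.new
  end

  # Returns :conflict without blocking if a session is already running;
  # otherwise holds the lock for the duration of the (simulated) session.
  def try_profile(duration)
    return :conflict unless @lock.try_lock # acquire and check atomically
    begin
      sleep(duration) # stands in for the profiling session
      :ok
    ensure
      @lock.unlock
    end
  end
end
```

Because try_lock both tests and acquires in one operation, there is no window where two threads can each see "not profiling" and both start a session.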
The profile mode and duration of the profiling session can both be specified as parameters. If these parameters are invalid, it should return an error to the client.
To profile, it will simply call AppProfiler.start, then sleep for the specified duration, then end the profile. This should let it observe what the rest of the VM is doing. The results are then serialized to JSON and returned in the Rack response.
Usage with Speedscope (locally)
CORS is enabled, so you can call the server directly from speedscope provided that you URL encode the address.
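The URL-encoding step can be done with Ruby's standard library. For instance, encoding a profile endpoint address for speedscope's #profileURL fragment (the ports here are just examples; each process picks its own):

```ruby
require "cgi"

# Encode the target process's profile URL for use in speedscope's
# #profileURL fragment. Ports 57510 and 9292 are example values.
profile_url = "http://127.0.0.1:57510/profile?duration=1"
encoded = CGI.escape(profile_url)
speedscope_url = "http://127.0.0.1:9292/#profileURL=#{encoded}"
puts speedscope_url
# → http://127.0.0.1:9292/#profileURL=http%3A%2F%2F127.0.0.1%3A57510%2Fprofile%3Fduration%3D1
```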
For example, with speedscope running locally on :9292, we can have it directly load a profile from the target app using port 57510. The host before URL encoding is http://127.0.0.1:57510/profile?duration=1, and this can be passed as #profileURL, so that speedscope will initiate the profiling session when
http://127.0.0.1:9292/#profileURL=http%3A%2F%2F127.0.0.1%3A57510%2Fprofile%3Fduration%3D1
is entered in the URL bar.
Configuration
This can be configured via a railtie. It defaults to disabled for now.
Adding the following to development.rb / production.rb:
results in it being enabled. The README documents additional config options.
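The exact flag isn't shown above, so this is a hypothetical sketch only; the real option name may differ, and the README should be treated as authoritative:

```ruby
# config/environments/production.rb (or development.rb)
# Hypothetical sketch: the actual app_profiler option name may differ.
Rails.application.configure do
  config.app_profiler.server_enabled = true # assumed flag name, see README
end
```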