Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passenger resets SIGPROF signal handlers causing profilers to crash application #2489

Closed
ivoanjo opened this issue Jul 25, 2023 · 2 comments · Fixed by #2496
Closed

Passenger resets SIGPROF signal handlers causing profilers to crash application #2489

ivoanjo opened this issue Jul 25, 2023 · 2 comments · Fixed by #2496

Comments

@ivoanjo
Copy link
Contributor

ivoanjo commented Jul 25, 2023

Issue report

Question 1: What is the problem?

Hey there 👋! I work at @DataDog on our continuous profiler for Ruby. We've had a couple of customers report issues when our profiler is used together with passenger.

TL;DR when Passenger starts it resets a bunch of signal handlers:

# Reset signal handlers to their default handler, and install some
# special handlers for a few signals. The previous signal handlers
# will be put back by calling revert_signal_handlers.
def reset_signal_handlers
Signal.list_trappable.each_key do |signal|
begin
prev_handler = trap(signal, DEFAULT)
if prev_handler != DEFAULT
@previous_signal_handlers[signal] = prev_handler
end
rescue ArgumentError
# Signal cannot be trapped; ignore it.
end
end
trap('HUP', IGNORE)
PhusionPassenger.call_event(:after_installing_signal_handlers)
end

...including the SIGPROF signal handler (which is returned by Signal.list_trappable).

This breaks profiling Ruby apps because the following happens:

  1. Profiler starts running, installing a SIGPROF signal handler used for data collection.
  2. Passenger resets signal handlers
  3. Profiler sends a SIGPROF signal, causing the Ruby app to crash because it no longer has a signal handler for this signal.

To add a bit more context, using SIGPROF for profiling is a common approach, not just something weird we do at Datadog:

I suspect the solution would be to extend the existing list_trappable with a result.delete("PROF") so that SIGPROF signal handlers remain intact.


  • What is the expected behavior?

Applications that have a SIGPROF signal handler installed and then receive a SIGPROF signal correctly handle it, rather than crashing.

  • What is the actual behavior?

These applications crash (as in, Ruby process goes poof)

  • How can we reproduce it? Please try to provide a sample application (or Virtual Machine) demonstrating the issue. Otherwise, if we can't reproduce it, we might have to ask you a number of followup questions or run certain commands to try and figure out the problem.

Here's an example pure-Ruby reproducer config.ru:

puts "Installing SIGPROF signal handler!"

Signal.trap("PROF") do
  puts " !! Got sigprof!"
end

puts "Testing signal handler!"

Process.kill("PROF", Process.pid)

puts "Running rack app!"

app = ->(env) {
  puts " ** Got request!"

  if env['REQUEST_URI'].start_with?('/signal')
    puts "Sending signal to myself!"
    Process.kill("PROF", Process.pid)
  end

  [200, {}, ['Hello, World!']]
}
run app

Here's how it looks in puma:

$ bundle exec puma
Puma starting in single mode...
* Puma version: 6.1.1 (ruby 3.2.2-p53) ("The Way Up")
*  Min threads: 0
*  Max threads: 5
*  Environment: development
*          PID: 203590
Installing SIGPROF signal handler!
Testing signal handler!
 !! Got sigprof!
Running rack app!
* Listening on http://0.0.0.0:9292
Use Ctrl-C to stop
 ** Got request! # <-- Me hitting /signal
Sending signal to myself!
 !! Got sigprof! # <-- Signal handler still working fine
 ** Got request!
Sending signal to myself!
 !! Got sigprof!
 ** Got request!

and in passenger:

$ bundle exec passenger start
=============== Phusion Passenger(R) Standalone web server started ===============
PID file: passenger.3000.pid
Log file: passenger.3000.log
Environment: development
Accessible via: http://0.0.0.0:3000/

You can stop Phusion Passenger(R) Standalone by pressing Ctrl-C.
Problems? Check https://www.phusionpassenger.com/library/admin/standalone/troubleshooting/
===============================================================================
[ N 2023-07-25 10:13:12.5640 204259/T5 age/Cor/SecurityUpdateChecker.h:519 ]: Security update check: no update found (next check in 24 hours)
App 204333 output: Installing SIGPROF signal handler!
App 204333 output: Testing signal handler!
App 204333 output:  !! Got sigprof!
App 204333 output: Running rack app!
App 204368 output:  ** Got request! # <-- Regular request
App 204368 output:  ** Got request! # <-- I hit /signal
App 204368 output: Sending signal to myself!
[ W 2023-07-25 10:13:27.9117 204259/Te age/Cor/Con/InternalUtils.cpp:96 ]: [Client 4-1] Sending 502 response: application did not send a complete response
[ W 2023-07-25 10:13:30.0219 204259/T3 age/Cor/App/Poo/AnalyticsCollection.cpp:101 ]: Process (pid=204368, group=example (development)) no longer exists! Detaching it from the pool.
[ N 2023-07-25 10:13:30.0220 204259/T3 age/Cor/CoreMain.cpp:1146 ]: Checking whether to disconnect long-running connections for process 204368, application example (development)

Question 2: Passenger version and integration mode:

  • Passenger 6.0.18 (Latest version at time of writing)
  • Standalone mode

Question 3: OS or Linux distro, platform (including version):

  • Ubuntu 22.04.2 LTS, x86_64

Question 4: Passenger installation method:

[x] RubyGems + Gemfile

Question 5: Your app's programming language (including any version managers) and framework (including versions):

  • Ruby 3.2.2
  • rvm 1.29.12-next (master)

Question 6: Are you using a PaaS and/or containerization? If so which one?

Can be reproduced locally.

Question 7: Anything else about your setup that we should know?

Happy to submit a PR :)


We strive for quality and appreciate you taking the time to submit a report! Please note that if you want guaranteed response times and priority issue support we encourage you to join our enterprise customer base. They also provide us with the means to continue our high level of open source support!

@CamJN
Copy link
Member

CamJN commented Jul 31, 2023

This seems fairly reasonable, if you submit a PR, please be sure to sign the contributor agreement.

@ivoanjo
Copy link
Contributor Author

ivoanjo commented Aug 2, 2023

Sounds good, I should have a PR for this within the next days! (Travelling a bit right now!)

ivoanjo added a commit to DataDog/passenger that referenced this issue Aug 31, 2023
**What does this PR do?**

This PR tweaks the list of trappable signals (`Signal.list_trappable`
in `ruby_core_enhancements.rb`) to consider `SIGPROF` a non-trappable
signal.

This fixes the issue reported in phusion#2489 where a profiler (or other
application) installed a `SIGPROF` handler, and then the handler was
reset and thus the application would crash upon receiving it.

**How to test the change?**

The following `config.ru` reproducer is able to show the problem:

```
puts "Installing SIGPROF signal handler!"

Signal.trap("PROF") do
  puts " !! Got sigprof!"
end

puts "Testing signal handler!"

Process.kill("PROF", Process.pid)

puts "Running rack app!"

app = ->(env) {
  puts " ** Got request!"

  if env['REQUEST_URI'].start_with?('/signal')
    puts "Sending signal to myself!"
    Process.kill("PROF", Process.pid)
  end

  [200, {}, ['Hello, World!']]
}
run app
```

Before this change:

```
$ bundle exec passenger start
=============== Phusion Passenger(R) Standalone web server started ===============
PID file: passenger.3000.pid
Log file: passenger.3000.log
Environment: development
Accessible via: http://0.0.0.0:3000/

You can stop Phusion Passenger(R) Standalone by pressing Ctrl-C.
Problems? Check https://www.phusionpassenger.com/library/admin/standalone/troubleshooting/
===============================================================================
[ N 2023-07-25 10:13:12.5640 204259/T5 age/Cor/SecurityUpdateChecker.h:519 ]: Security update check: no update found (next check in 24 hours)
App 204333 output: Installing SIGPROF signal handler!
App 204333 output: Testing signal handler!
App 204333 output:  !! Got sigprof!
App 204333 output: Running rack app!
App 204368 output:  ** Got request! # <-- Regular request
App 204368 output:  ** Got request! # <-- I hit /signal
App 204368 output: Sending signal to myself!
[ W 2023-07-25 10:13:27.9117 204259/Te age/Cor/Con/InternalUtils.cpp:96 ]: [Client 4-1] Sending 502 response: application did not send a complete response
[ W 2023-07-25 10:13:30.0219 204259/T3 age/Cor/App/Poo/AnalyticsCollection.cpp:101 ]: Process (pid=204368, group=example (development)) no longer exists! Detaching it from the pool.
[ N 2023-07-25 10:13:30.0220 204259/T3 age/Cor/CoreMain.cpp:1146 ]: Checking whether to disconnect long-running connections for process 204368, application example (development)
```

(Every request fails because the worker crashes, and then gets replaced
with another one that crashes when it gets the next request, and so on)

After this change:

```
$ bundle exec passenger start
=============== Phusion Passenger(R) Standalone web server started ===============
PID file: passenger.3000.pid
Log file: passenger.3000.log
Environment: development
Accessible via: http://0.0.0.0:3000/

You can stop Phusion Passenger(R) Standalone by pressing Ctrl-C.
Problems? Check https://www.phusionpassenger.com/library/admin/standalone/troubleshooting/
===============================================================================
[ N 2023-08-25 14:39:17.6888 540283/T5 age/Cor/SecurityUpdateChecker.h:519 ]: Security update check: no update found (next check in 24 hours)
App 540356 output: Installing SIGPROF signal handler!
App 540356 output: Testing signal handler!
App 540356 output:  !! Got sigprof!
App 540356 output: Running rack app!
App 540391 output:  ** Got request!
App 540391 output:  ** Got request!
App 540391 output: Sending signal to myself! # <-- No issues in request!
App 540391 output:  !! Got sigprof!
App 540391 output:  ** Got request!
App 540391 output: Sending signal to myself! ! <-- Works! :)
App 540391 output:  !! Got sigprof!
```

I've also checked with the dd-trace-rb, it no longer needs a workaround
when run under passenger (this workaround reduces the accuracy of
profiling data, hence my interest in not having it).

Once this fix gets released, I plan to add newer versions of passenger
to the allow-list, so dd-trace-rb will not apply the workaround for
them.

Fixes phusion#2489
CamJN pushed a commit that referenced this issue Sep 11, 2023
**What does this PR do?**

This PR tweaks the list of trappable signals (`Signal.list_trappable`
in `ruby_core_enhancements.rb`) to consider `SIGPROF` a non-trappable
signal.

This fixes the issue reported in #2489 where a profiler (or other
application) installed a `SIGPROF` handler, and then the handler was
reset and thus the application would crash upon receiving it.

**How to test the change?**

The following `config.ru` reproducer is able to show the problem:

```
puts "Installing SIGPROF signal handler!"

Signal.trap("PROF") do
  puts " !! Got sigprof!"
end

puts "Testing signal handler!"

Process.kill("PROF", Process.pid)

puts "Running rack app!"

app = ->(env) {
  puts " ** Got request!"

  if env['REQUEST_URI'].start_with?('/signal')
    puts "Sending signal to myself!"
    Process.kill("PROF", Process.pid)
  end

  [200, {}, ['Hello, World!']]
}
run app
```

Before this change:

```
$ bundle exec passenger start
=============== Phusion Passenger(R) Standalone web server started ===============
PID file: passenger.3000.pid
Log file: passenger.3000.log
Environment: development
Accessible via: http://0.0.0.0:3000/

You can stop Phusion Passenger(R) Standalone by pressing Ctrl-C.
Problems? Check https://www.phusionpassenger.com/library/admin/standalone/troubleshooting/
===============================================================================
[ N 2023-07-25 10:13:12.5640 204259/T5 age/Cor/SecurityUpdateChecker.h:519 ]: Security update check: no update found (next check in 24 hours)
App 204333 output: Installing SIGPROF signal handler!
App 204333 output: Testing signal handler!
App 204333 output:  !! Got sigprof!
App 204333 output: Running rack app!
App 204368 output:  ** Got request! # <-- Regular request
App 204368 output:  ** Got request! # <-- I hit /signal
App 204368 output: Sending signal to myself!
[ W 2023-07-25 10:13:27.9117 204259/Te age/Cor/Con/InternalUtils.cpp:96 ]: [Client 4-1] Sending 502 response: application did not send a complete response
[ W 2023-07-25 10:13:30.0219 204259/T3 age/Cor/App/Poo/AnalyticsCollection.cpp:101 ]: Process (pid=204368, group=example (development)) no longer exists! Detaching it from the pool.
[ N 2023-07-25 10:13:30.0220 204259/T3 age/Cor/CoreMain.cpp:1146 ]: Checking whether to disconnect long-running connections for process 204368, application example (development)
```

(Every request fails because the worker crashes, and then gets replaced
with another one that crashes when it gets the next request, and so on)

After this change:

```
$ bundle exec passenger start
=============== Phusion Passenger(R) Standalone web server started ===============
PID file: passenger.3000.pid
Log file: passenger.3000.log
Environment: development
Accessible via: http://0.0.0.0:3000/

You can stop Phusion Passenger(R) Standalone by pressing Ctrl-C.
Problems? Check https://www.phusionpassenger.com/library/admin/standalone/troubleshooting/
===============================================================================
[ N 2023-08-25 14:39:17.6888 540283/T5 age/Cor/SecurityUpdateChecker.h:519 ]: Security update check: no update found (next check in 24 hours)
App 540356 output: Installing SIGPROF signal handler!
App 540356 output: Testing signal handler!
App 540356 output:  !! Got sigprof!
App 540356 output: Running rack app!
App 540391 output:  ** Got request!
App 540391 output:  ** Got request!
App 540391 output: Sending signal to myself! # <-- No issues in request!
App 540391 output:  !! Got sigprof!
App 540391 output:  ** Got request!
App 540391 output: Sending signal to myself! ! <-- Works! :)
App 540391 output:  !! Got sigprof!
```

I've also checked with the dd-trace-rb, it no longer needs a workaround
when run under passenger (this workaround reduces the accuracy of
profiling data, hence my interest in not having it).

Once this fix gets released, I plan to add newer versions of passenger
to the allow-list, so dd-trace-rb will not apply the workaround for
them.

Fixes #2489
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants