From b8d024ba78b458945bc71676907928852d75eb6a Mon Sep 17 00:00:00 2001 From: Parker Selbert Date: Fri, 3 Nov 2023 10:35:45 -0500 Subject: [PATCH] Add Preparing for Production guide The new guide covers logging, instrumentation, pruning, and rescuing. Addresses part of #850 --- README.md | 18 ++-- guides/preparing_for_production.md | 140 +++++++++++++++++++++++++++++ mix.exs | 1 + 3 files changed, 152 insertions(+), 7 deletions(-) create mode 100644 guides/preparing_for_production.md diff --git a/README.md b/README.md index 0beb64a1..0cfaf06b 100644 --- a/README.md +++ b/README.md @@ -838,7 +838,16 @@ Another great use of execution data is error reporting. Here is an example of integrating with [Sentry][sentry] to report job failures: ```elixir -defmodule ErrorReporter do +defmodule MyApp.ErrorReporter do + def attach do + :telemetry.attach( + "oban-errors", + [:oban, :job, :exception], + &__MODULE__.handle_event/4, + [] + ) + end + def handle_event([:oban, :job, :exception], measure, meta, _) do extra = meta.job @@ -849,12 +858,7 @@ defmodule ErrorReporter do end end -:telemetry.attach( - "oban-errors", - [:oban, :job, :exception], - &ErrorReporter.handle_event/4, - [] -) +MyApp.ErrorReporter.attach() ``` You can use exception events to send error reports to Honeybadger, Rollbar, diff --git a/guides/preparing_for_production.md b/guides/preparing_for_production.md new file mode 100644 index 00000000..b0ea5f34 --- /dev/null +++ b/guides/preparing_for_production.md @@ -0,0 +1,140 @@ +# Preparing for Production + +There are a few additional bits of configuration to consider before you're ready to run Oban in +production. In development and test environments, job data is short lived and there's no scale to +contend with. Now we'll dig into enabling introspection, external observability, and maintaining +database health. + +## Logging + +Oban heavily utilizes Telemetry for instrumentation at every level. From job execution, plugin +activity, through to every database call there's a telemetry event to hook into. + +The simplest way to leverage Oban's telemetry usage is through the default logger, available with +`Oban.Telemetry.attach_default_logger/1`. Attach the logger in your `application.ex`: + +```elixir +defmodule MyApp.Application do + use Application + + @impl Application + def start(_type, _args) do + Oban.Telemetry.attach_default_logger() + + children = [ + ... + ] + end +end +``` + +By default, the logger emits JSON encoded logs at the `:info` level. You can disable encoding and +fall back to structured logging with `encode: false`, or change the log level with the `:level` +option. + +For example, to log without encoding at the `:debug` level: + +```elixir +Oban.Telemetry.attach_default_logger(encode: false, level: :debug) +``` + +## Pruning Jobs + +Job introspection and uniqueness relies on keeping job rows in the database after they have +executed. To prevent the `oban_jobs` table from growing indefinitely, the `Oban.Plugins.Pruner` +plugin provides out-of-band deletion of `completed`, `cancelled` and `discarded` jobs. + +Retaining jobs for 7 days is a good starting point, but depending on throughput, you may wish to +keep jobs for even longer. Include `Pruner` in the list of plugins and configure it to retain jobs +for 7 days, specified in seconds: + +```elixir +config :my_app, Oban, + plugins: [ + {Oban.Plugins.Pruner, max_age: 60 * 60 * 24 * 7}, + ... +``` + +## Rescuing Jobs + +During deployment or unexpected node restarts jobs may be left in an executing state indefinitely. +We call these jobs "orphans", but orphaning isn't a bad thing. It means that the job wasn't lost +and it may be retried again when the system comes back online. + +There are two mechanisms to mitigate orphans: + +1. Use the `Oban.Plugins.Lifeline` plugin to automatically move those jobs back to available so + they can run again. +2. Increase the `shutdown_grace_period` to allow the system more time to finish executing before + shutdown. + +Even with a higher `shutdown_grace_period` it's possible to have orphans if there is an unexpected +crash or extra long running jobs. + +Consider adding the `Lifeline` plugin and configure it to rescue after a generous period of time, +like 30 minutes: + +```elixir +config :my_app, Oban, + plugins: [ + {Oban.Plugins.Lifeline, rescue_after: :timer.minutes(30)}, + ... +``` + +## Error Handling + +Telemetry events can be used to report issues externally to services like Sentry or AppSignal. +Write a handler that sends error notifications to a third party (use a mock, or something that +sends a message back to the test process). + +You can use exception events to send error reports to Honeybadger, Rollbar, AppSignal or any other +application monitoring platform. + +Here's an example reporter module for [Sentry](https://hex.pm/packages/sentry): + +```elixir +defmodule MyApp.ObanReporter do + def attach do + :telemetry.attach("oban-errors", [:oban, :job, :exception], &__MODULE__.handle_event/4, []) + end + + def handle_event([:oban, :job, :exception], measure, meta, _) do + extra = + meta.job + |> Map.take([:id, :args, :meta, :queue, :worker]) + |> Map.merge(measure) + + Sentry.capture_exception(meta.reason, stacktrace: meta.stacktrace, extra: extra) + end +end +``` + +Attach the handler when your application boots: + +```elixir +# application.ex +@impl Application +def start(_type, _args) do + MyApp.ObanReporter.attach() +end +``` + +## Ship It! + +Now you're ready to ship to production with essential logging, error reporting, and baseline +job maintenance. + +For additional observability and introspection, consider integrating with one of these external +tools built on `Oban.Telemetry`: + +* [Oban Web](https://getoban.pro)—an official Oban package, it's a view of jobs, queues, and + metrics that you host directly within your application. Powered by Phoenix Live View and Oban + Metrics, it is extremely lightweight and continuously updated. + +* [PromEx](https://hex.pm/packages/prom_ex)—Prometheus metrics and Grafana dashboards based on + metrics from job events, producer events, and also from internal polling jobs to monitor queue + sizes. + +* [AppSignal](https://docs.appsignal.com/elixir/integrations/oban.html)—The AppSignal for Elixir + package instruments jobs performed by Oban workers, and collects metrics about your jobs' + performance. diff --git a/mix.exs b/mix.exs index f584f5d3..2cfd5945 100644 --- a/mix.exs +++ b/mix.exs @@ -65,6 +65,7 @@ defmodule Oban.MixProject do [ # Guides "guides/installation.md", + "guides/preparing_for_production.md", "guides/troubleshooting.md", "guides/release_configuration.md", "guides/writing_plugins.md",