Allow to disable collection of general metrics (of the whole Sidekiq setup) #20

Merged
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,17 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

### Added

- New `collect_cluster_metrics` setting for force-enabling or force-disabling collection of global (installation-wide) Sidekiq metrics. See [#20](https://github.com/yabeda-rb/yabeda-sidekiq/pull/20). [@mrexox]

By default, all Sidekiq worker processes (servers) collect global metrics about the whole Sidekiq installation.
Client processes (everything else that is not a Sidekiq worker) by default don't.

With this setting you can override that behavior:
- force-disable it if you don't want multiple Sidekiq workers to report the same numbers (which causes excess load on both Redis and the monitoring system)
- force-enable it if you want a non-Sidekiq process (like a dedicated metrics exporter) to collect them

## 0.7.0 - 2020-07-15

### Changed
@@ -63,3 +74,4 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

[@dsalahutdinov]: https://github.com/dsalahutdinov "Salahutdinov Dmitry"
[@asusikov]: https://github.com/asusikov "Alexander Susikov"
[@mrexox]: https://github.com/mrexox "Valentine Kiselev"
28 changes: 24 additions & 4 deletions README.md
@@ -37,19 +37,30 @@ end

## Metrics

### Local per-process metrics

Metrics representing the state of the current Sidekiq worker process and stats of executed or executing jobs:

- Total number of executed jobs: `sidekiq_jobs_executed_total` (segmented by queue and class name)
- Number of jobs that have finished successfully: `sidekiq_jobs_success_total` (segmented by queue and class name)
- Number of jobs that have failed: `sidekiq_jobs_failed_total` (segmented by queue and class name)
- Time of job run: `sidekiq_job_runtime` (seconds per job execution, segmented by queue and class name)
- Time of the job latency: `sidekiq_job_latency` (the difference in seconds between enqueueing the job and starting to run it)
- Maximum runtime of currently executing jobs: `sidekiq_running_job_runtime` (useful for detection of hung jobs, segmented by queue and class name)

### Global cluster-wide metrics

Metrics representing the state of the whole Sidekiq installation (queues, processes, etc.):

- Number of jobs in queues: `sidekiq_jobs_waiting_count` (segmented by queue)
- Time of the queue latency: `sidekiq_queue_latency` (the difference in seconds since the oldest job in the queue was enqueued)
- Number of scheduled jobs: `sidekiq_jobs_scheduled_count`
- Number of jobs in retry set: `sidekiq_jobs_retry_count`
- Number of jobs in dead set (“morgue”): `sidekiq_jobs_dead_count`
- Active processes count: `sidekiq_active_processes`
- Active servers count: `sidekiq_active_workers_count`
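
The difference between the last two gauges follows Sidekiq's own stats API. A minimal sketch of where these numbers come from (the exact mapping to `::Sidekiq::Stats` is an assumption based on the gem's `collect` block shown further down):

```ruby
require "sidekiq/api"

# Both calls read from Redis, so a live Sidekiq installation is required.
stats = ::Sidekiq::Stats.new
stats.processes_size # count of alive Sidekiq processes => sidekiq_active_processes
stats.workers_size   # count of busy workers            => sidekiq_active_workers_count
```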

By default, all Sidekiq worker processes (servers) collect global metrics about the whole Sidekiq installation. This can be overridden by setting the `collect_cluster_metrics` config key to `true` for non-Sidekiq processes or to `false` for Sidekiq processes (e.g. by setting the `YABEDA_SIDEKIQ_COLLECT_CLUSTER_METRICS` env variable to `no`; see other methods in the [anyway_config] docs).
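
For example, a dedicated metrics exporter process (which is not a Sidekiq server) can opt in before the gem is loaded. A minimal sketch, assuming the standalone metrics server helper from [yabeda-prometheus]; the script itself is hypothetical:

```ruby
# exporter.rb: a hypothetical standalone metrics exporter process.
# Set the variable before yabeda-sidekiq is required, because metric
# registration reads the resolved config value at load time.
ENV["YABEDA_SIDEKIQ_COLLECT_CLUSTER_METRICS"] ||= "true"

require "yabeda/sidekiq"
require "yabeda/prometheus"

# Serves collected metrics (now including the cluster-wide Sidekiq gauges) on /metrics
Yabeda::Prometheus::Exporter.start_metrics_server!

sleep # keep the process alive; scrapes are handled in the server thread
```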

## Custom tags

@@ -74,6 +85,14 @@ class MyWorker
end
```

## Configuration

Configuration is handled by the [anyway_config] gem. With it, you can load settings from environment variables (upcased and prefixed with `YABEDA_SIDEKIQ_`), YAML files, and other sources. See the [anyway_config] docs for details.

Config key | Type | Default | Description |
------------------------- | -------- | ------------------------------------------------------- | ----------- |
`collect_cluster_metrics` | boolean | Enabled in Sidekiq worker processes, disabled otherwise | Defines whether this Ruby process should collect and expose metrics representing the state of the whole Sidekiq installation (queues, processes, etc.). |
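
A minimal sketch of how the resolved value can be inspected (class and attribute names per `lib/yabeda/sidekiq/config.rb` below):

```ruby
require "yabeda/sidekiq"

config = Yabeda::Sidekiq::Config.new
config.collect_cluster_metrics
# => true inside a Sidekiq server process (the default), unless overridden via
#    the YABEDA_SIDEKIQ_COLLECT_CLUSTER_METRICS env variable or another anyway_config source
```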

# Roadmap (TODO or Help wanted)

- Implement optional segmentation of retry/schedule/dead sets
@@ -131,3 +150,4 @@ The gem is available as open source under the terms of the [MIT License](https:/
[Sidekiq]: https://github.com/mperham/sidekiq/ "Simple, efficient background processing for Ruby"
[yabeda]: https://github.com/yabeda-rb/yabeda
[yabeda-prometheus]: https://github.com/yabeda-rb/yabeda-prometheus
[anyway_config]: https://github.com/palkan/anyway_config "Configuration library for Ruby gems and applications"
62 changes: 36 additions & 26 deletions lib/yabeda/sidekiq.rb
@@ -7,6 +7,7 @@
require "yabeda/sidekiq/version"
require "yabeda/sidekiq/client_middleware"
require "yabeda/sidekiq/server_middleware"
require "yabeda/sidekiq/config"

module Yabeda
module Sidekiq
@@ -16,36 +17,47 @@ module Sidekiq
].freeze

Yabeda.configure do
config = Config.new

group :sidekiq

counter :jobs_enqueued_total, tags: %i[queue worker], comment: "A counter of the total number of jobs sidekiq enqueued."

if ::Sidekiq.server?
  counter :jobs_executed_total, tags: %i[queue worker], comment: "A counter of the total number of jobs sidekiq executed."
  counter :jobs_success_total, tags: %i[queue worker], comment: "A counter of the total number of jobs successfully processed by sidekiq."
  counter :jobs_failed_total, tags: %i[queue worker], comment: "A counter of the total number of jobs failed in sidekiq."

  gauge :running_job_runtime, tags: %i[queue worker], aggregation: :max, unit: :seconds,
        comment: "How long currently running jobs are running (useful for detection of hung jobs)"

  histogram :job_latency, comment: "The job latency, the difference in seconds between enqueued and running time",
            unit: :seconds, per: :job,
            tags: %i[queue worker],
            buckets: LONG_RUNNING_JOB_RUNTIME_BUCKETS
  histogram :job_runtime, comment: "A histogram of the job execution time.",
            unit: :seconds, per: :job,
            tags: %i[queue worker],
            buckets: LONG_RUNNING_JOB_RUNTIME_BUCKETS
end

# Metrics that are not specific to the current Sidekiq process, but represent the state of the whole Sidekiq installation (queues, processes, etc.)
# You can opt out of collecting these by setting YABEDA_SIDEKIQ_COLLECT_CLUSTER_METRICS to a falsey value (+no+ or +false+)
if config.collect_cluster_metrics # defaults to +::Sidekiq.server?+
  gauge :jobs_waiting_count, tags: %i[queue], comment: "The number of jobs waiting to process in sidekiq."
  gauge :active_workers_count, tags: [], comment: "The number of currently running machines with sidekiq workers."
  gauge :jobs_scheduled_count, tags: [], comment: "The number of jobs scheduled for later execution."
  gauge :jobs_retry_count, tags: [], comment: "The number of failed jobs waiting to be retried."
  gauge :jobs_dead_count, tags: [], comment: "The number of jobs that exceeded their retry count."
  gauge :active_processes, tags: [], comment: "The number of active Sidekiq worker processes."
  gauge :queue_latency, tags: %i[queue], comment: "The queue latency, the difference in seconds since the oldest job in the queue was enqueued."
end

collect do
Yabeda::Sidekiq.track_max_job_runtime if ::Sidekiq.server?

next unless config.collect_cluster_metrics

stats = ::Sidekiq::Stats.new

stats.queues.each do |k, v|
@@ -61,8 +73,6 @@
sidekiq_queue_latency.set({ queue: queue.name }, queue.latency)
end

# That is quite slow if your retry set is large
# I don't want to enable it by default
# retries_by_queues =
18 changes: 18 additions & 0 deletions lib/yabeda/sidekiq/config.rb
@@ -0,0 +1,18 @@
# frozen_string_literal: true

require "anyway"

module Yabeda
  module Sidekiq
    class Config < ::Anyway::Config
      config_name :yabeda_sidekiq

      # By default, all Sidekiq worker processes (servers) collect global metrics about the whole Sidekiq installation.
      # Client processes (everything else that is not a Sidekiq worker) by default don't.
      # With this config you can override this behavior:
      #  - force-disable it if you don't want multiple Sidekiq workers to report the same numbers (which causes excess load on both Redis and monitoring)
      #  - force-enable it if you want a non-Sidekiq process (like a dedicated metrics exporter) to collect them
      attr_config collect_cluster_metrics: ::Sidekiq.server?
    end
  end
end
1 change: 1 addition & 0 deletions yabeda-sidekiq.gemspec
@@ -22,6 +22,7 @@ Gem::Specification.new do |spec|
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
spec.require_paths = ["lib"]

spec.add_dependency "anyway_config", ">= 1.3", "< 3"
spec.add_dependency "sidekiq"
spec.add_dependency "yabeda", "~> 0.6"
