Metrics and Monitoring

David Gross edited this page Feb 8, 2015 · 15 revisions

Hystrix captures metrics in rolling windows using the HystrixRollingNumber and HystrixRollingPercentile classes. Rolling windows give Hystrix low-latency access to recent metrics, which it uses for circuit breaker health checks and for operations.
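To make the rolling-window idea concrete, here is a minimal sketch of a bucketed rolling counter. It is a simplified illustration, not the HystrixRollingNumber implementation: the class name RollingCounterSketch, its methods, and the bucket layout are assumptions made for this example, the clock is passed in explicitly so the behavior is deterministic, and the class is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative bucketed rolling counter; a simplified stand-in, not HystrixRollingNumber. */
class RollingCounterSketch {
    /** One fixed-width slice of the rolling window. */
    private static final class Bucket {
        final long startMillis;
        long count;
        Bucket(long startMillis) { this.startMillis = startMillis; }
    }

    private final long windowMillis;
    private final long bucketMillis;
    private final Deque<Bucket> buckets = new ArrayDeque<>();

    RollingCounterSketch(long windowMillis, int numberOfBuckets) {
        if (windowMillis <= 0 || numberOfBuckets <= 0 || windowMillis % numberOfBuckets != 0) {
            throw new IllegalArgumentException("window must divide evenly into buckets");
        }
        this.windowMillis = windowMillis;
        this.bucketMillis = windowMillis / numberOfBuckets;
    }

    /** Records one event at the given (non-decreasing, non-negative) time. */
    void increment(long nowMillis) {
        long bucketStart = bucketStartFor(nowMillis);
        dropExpired(bucketStart);
        Bucket newest = buckets.peekLast();
        if (newest == null || newest.startMillis != bucketStart) {
            newest = new Bucket(bucketStart);
            buckets.addLast(newest);
        }
        newest.count++;
    }

    /** Sums the events in the buckets that are still inside the window. */
    long rollingSum(long nowMillis) {
        dropExpired(bucketStartFor(nowMillis));
        long sum = 0;
        for (Bucket bucket : buckets) {
            sum += bucket.count;
        }
        return sum;
    }

    private long bucketStartFor(long nowMillis) {
        return nowMillis - (nowMillis % bucketMillis);
    }

    /** Removes buckets that started a full window or more before the current bucket. */
    private void dropExpired(long currentBucketStart) {
        while (!buckets.isEmpty()
                && buckets.peekFirst().startMillis <= currentBucketStart - windowMillis) {
            buckets.removeFirst();
        }
    }

    public static void main(String[] args) {
        RollingCounterSketch counter = new RollingCounterSketch(10_000, 10);
        counter.increment(0);
        counter.increment(500);
        counter.increment(1_500);
        System.out.println(counter.rollingSum(1_500));  // 3: all events are inside the window
        System.out.println(counter.rollingSum(10_000)); // 1: the bucket starting at 0 ms has expired
        System.out.println(counter.rollingSum(11_000)); // 0: the bucket starting at 1000 ms has expired
    }
}
```

Reading the sum only touches a handful of buckets, which is why a windowed counter of this shape is cheap enough to consult on every health check.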

Direct Access

You can access metrics programmatically with the following calls:

HystrixCommandMetrics.getInstances()
HystrixThreadPoolMetrics.getInstances()

Metrics Event Stream

You can use the hystrix-metrics-event-stream to power the dashboard, real-time alerting, and other such use cases.

Metrics Publisher

You can publish metrics by using an implementation of HystrixMetricsPublisher.

Register your HystrixMetricsPublisher implementation by calling HystrixPlugins.getInstance().registerMetricsPublisher(HystrixMetricsPublisher impl).

Hystrix includes metrics publisher implementations as hystrix-contrib modules. The following sections explain the metrics published with those implementations:

Command Metrics

Each HystrixCommand publishes metrics with the following tags:

  • Servo Tag: "instance", Value: HystrixCommandKey.name()
  • Servo Tag: "type", Value: "HystrixCommand"

Informational and Status

  • Boolean isCircuitBreakerOpen
  • Number errorPercentage
  • Number executionSemaphorePermitsInUse
  • String commandGroup
  • Number currentTime

Cumulative Counts (Counter)

The following represent cumulative counts since the start of the application.

  • Long countCollapsedRequests
  • Long countExceptionsThrown
  • Long countFailure
  • Long countFallbackFailure
  • Long countFallbackRejection
  • Long countFallbackSuccess
  • Long countResponsesFromCache
  • Long countSemaphoreRejected
  • Long countShortCircuited
  • Long countSuccess
  • Long countThreadPoolRejected
  • Long countTimeout
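Because these counters only grow while the application runs, a monitoring system usually derives a rate from two successive samples. The following is a minimal sketch of that calculation; CounterRateSketch and ratePerSecond are hypothetical names for this example, and the reset handling assumes a counter that restarts at zero when the application restarts.

```java
/** Turns two samples of a cumulative counter into a per-second rate (illustrative helper). */
class CounterRateSketch {
    static double ratePerSecond(long previousCount, long previousMillis,
                                long currentCount, long currentMillis) {
        long elapsedMillis = currentMillis - previousMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("samples must be taken at increasing times");
        }
        long delta = currentCount - previousCount;
        // A drop in value is treated as an application restart: count from zero again.
        if (delta < 0) {
            delta = currentCount;
        }
        return delta * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // countTimeout sampled as 100, then as 160 one minute later.
        System.out.println(ratePerSecond(100, 0, 160, 60_000)); // 1.0 timeout per second
    }
}
```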

Rolling Counts (Gauge)

The following are rolling counts as configured by metrics.rollingStats.* properties.

These are “point in time” counts representing the last x seconds (for example 10 seconds).

  • Number rollingCountCollapsedRequests
  • Number rollingCountExceptionsThrown
  • Number rollingCountFailure
  • Number rollingCountFallbackFailure
  • Number rollingCountFallbackRejection
  • Number rollingCountFallbackSuccess
  • Number rollingCountResponsesFromCache
  • Number rollingCountSemaphoreRejected
  • Number rollingCountShortCircuited
  • Number rollingCountSuccess
  • Number rollingCountThreadPoolRejected
  • Number rollingCountTimeout
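Rolling counts like these can be combined into a health figure for the same window. The sketch below shows one way to derive an error percentage from them. Which outcomes count as errors is an assumption of this example (failures, timeouts, both kinds of rejection, and short-circuits); this page does not state the formula behind the published errorPercentage metric, and ErrorPercentageSketch is a hypothetical helper.

```java
/** Derives an error percentage from rolling counts (illustrative; the error set is an assumption). */
class ErrorPercentageSketch {
    static int errorPercentage(long success, long failure, long timeout,
                               long threadPoolRejected, long semaphoreRejected,
                               long shortCircuited) {
        long errors = failure + timeout + threadPoolRejected + semaphoreRejected + shortCircuited;
        long total = success + errors;
        if (total == 0) {
            return 0; // no traffic in the window, so nothing to report
        }
        return (int) (errors * 100 / total); // integer percentage, rounded down
    }

    public static void main(String[] args) {
        // 90 successes and 10 errors of various kinds in the window.
        System.out.println(errorPercentage(90, 5, 3, 1, 0, 1)); // 10
    }
}
```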

Latency Percentiles: HystrixCommand.run() Execution (Gauge)

These metrics represent percentiles of execution times for the HystrixCommand.run() method (on the child thread if using thread isolation).

These are rolling percentiles as configured by metrics.rollingPercentile.* properties.

  • Number latencyExecute_mean
  • Number latencyExecute_percentile_5
  • Number latencyExecute_percentile_25
  • Number latencyExecute_percentile_50
  • Number latencyExecute_percentile_75
  • Number latencyExecute_percentile_90
  • Number latencyExecute_percentile_99
  • Number latencyExecute_percentile_995
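As a reminder of what a percentile over a window of latencies means, the following sketch computes one from a set of samples using the nearest-rank method. This is an illustration only: HystrixRollingPercentile may use a different algorithm, PercentileSketch is a hypothetical class, and reading the _995 suffix as the 99.5th percentile is an assumption of this example.

```java
import java.util.Arrays;

/** Nearest-rank percentile over a set of latency samples (illustrative only). */
class PercentileSketch {
    static long percentile(long[] latenciesMillis, double percentile) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest rank: the smallest value with at least `percentile` percent of samples at or below it.
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(long[] latenciesMillis) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long latency : latenciesMillis) {
            sum += latency;
        }
        return (double) sum / latenciesMillis.length;
    }

    public static void main(String[] args) {
        long[] samples = {40, 10, 30, 20};
        System.out.println(mean(samples));             // 25.0
        System.out.println(percentile(samples, 50));   // 20
        System.out.println(percentile(samples, 99.5)); // 40
    }
}
```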

Latency Percentiles: End-to-End Execution (Gauge)

These metrics represent percentiles of execution times for the end-to-end execution of HystrixCommand.execute() or HystrixCommand.queue() until a response is returned (or, in the case of queue(), is ready to return).

Comparing these with the latencyExecute* percentiles shows the cost of thread queuing, scheduling, and execution, semaphores, circuit breaker logic, and other overhead (including metrics capture itself).

These are rolling percentiles as configured by metrics.rollingPercentile.* properties.

  • Number latencyTotal_mean
  • Number latencyTotal_percentile_5
  • Number latencyTotal_percentile_25
  • Number latencyTotal_percentile_50
  • Number latencyTotal_percentile_75
  • Number latencyTotal_percentile_90
  • Number latencyTotal_percentile_99
  • Number latencyTotal_percentile_995
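The difference between the two latency families can be seen with a plain java.util.concurrent executor. The sketch below times the work itself (analogous to latencyExecute) and the full submit-to-result round trip (analogous to latencyTotal). It does not use Hystrix; LatencySplitSketch and measure are hypothetical names, and the analogy to the published metrics is an assumption of this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Measures work time versus end-to-end time for a task run on another thread (illustrative). */
class LatencySplitSketch {
    /** Returns {executeNanos, totalNanos} for one task run on the given pool. */
    static long[] measure(ExecutorService pool, Runnable work) {
        long submittedAt = System.nanoTime();
        Future<Long> executeNanos = pool.submit(() -> {
            long startedAt = System.nanoTime();
            work.run();
            return System.nanoTime() - startedAt; // time spent in the work itself
        });
        try {
            long execute = executeNanos.get();
            long total = System.nanoTime() - submittedAt; // includes queuing and hand-off
            return new long[] {execute, total};
        } catch (Exception e) {
            throw new IllegalStateException("task did not complete", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            long[] result = measure(pool, () -> {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            System.out.printf("execute=%dus total=%dus overhead=%dus%n",
                    result[0] / 1_000, result[1] / 1_000, (result[1] - result[0]) / 1_000);
        } finally {
            pool.shutdown();
        }
    }
}
```

A widening gap between the two numbers points at queuing or scheduling pressure rather than at the work being slow.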

Property Values (Informational)

These informational metrics report the actual property values being used by the HystrixCommand. This enables you to see when a dynamic property takes effect and to confirm a property is set as expected.

  • Number propertyValue_rollingStatisticalWindowInMilliseconds
  • Number propertyValue_circuitBreakerRequestVolumeThreshold
  • Number propertyValue_circuitBreakerSleepWindowInMilliseconds
  • Number propertyValue_circuitBreakerErrorThresholdPercentage
  • Boolean propertyValue_circuitBreakerForceOpen
  • Boolean propertyValue_circuitBreakerForceClosed
  • Number propertyValue_executionIsolationThreadTimeoutInMilliseconds
  • String propertyValue_executionIsolationStrategy
  • Boolean propertyValue_metricsRollingPercentileEnabled
  • Boolean propertyValue_requestCacheEnabled
  • Boolean propertyValue_requestLogEnabled
  • Number propertyValue_executionIsolationSemaphoreMaxConcurrentRequests
  • Number propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests

ThreadPool Metrics

Each HystrixThreadPool publishes metrics with the following tags:

  • Servo Tag: "instance", Value: HystrixThreadPoolKey.name()
  • Servo Tag: "type", Value: "HystrixThreadPool"

Informational and Status

  • String name
  • Number currentTime

Rolling Counts (Gauge)

  • Number rollingMaxActiveThreads
  • Number rollingCountThreadsExecuted

Cumulative Counts (Counter)

  • Long countThreadsExecuted

ThreadPool State (Gauge)

  • Number threadActiveCount
  • Number completedTaskCount
  • Number largestPoolSize
  • Number totalTaskCount
  • Number queueSize
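These names closely resemble the state that java.util.concurrent.ThreadPoolExecutor exposes. The sketch below reads that state from a plain executor under the metric names listed above. The mapping is an assumption made for illustration (this page does not say how Hystrix obtains the values), and ThreadPoolStateSketch is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Reads ThreadPoolExecutor state under the metric names used on this page (assumed mapping). */
class ThreadPoolStateSketch {
    static Map<String, Number> snapshot(ThreadPoolExecutor executor) {
        Map<String, Number> state = new LinkedHashMap<>();
        state.put("threadActiveCount", executor.getActiveCount());
        state.put("completedTaskCount", executor.getCompletedTaskCount());
        state.put("largestPoolSize", executor.getLargestPoolSize());
        state.put("totalTaskCount", executor.getTaskCount());
        state.put("queueSize", executor.getQueue().size());
        return state;
    }

    /** Runs the given number of no-op tasks on a two-thread pool, then snapshots its state. */
    static Map<String, Number> runTasksAndSnapshot(int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> { });
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return snapshot(executor);
    }

    public static void main(String[] args) {
        System.out.println(runTasksAndSnapshot(3));
    }
}
```

The executor's own documentation describes the task counts as approximations while tasks are still running, so values read from a live pool are best treated as gauges rather than exact figures.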

Property Values (Informational)

  • Number propertyValue_corePoolSize
  • Number propertyValue_keepAliveTimeInMinutes
  • Number propertyValue_queueSizeRejectionThreshold
  • Number propertyValue_maxQueueSize