forked from HdrHistogram/HdrHistogram
-
Notifications
You must be signed in to change notification settings - Fork 0
A High Dynamic Range (HDR) Histogram
License
Unknown, CC0-1.0 licenses found
Licenses found
Unknown
LICENSE.txt
CC0-1.0
COPYING.txt
odeheurles/HdrHistogram
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
HdrHistogram: A High Dynamic Range (HDR) Histogram [excerpt from the Histogram JavaDoc] Histogram supports the recording and analyzing sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level. For example, a Histogram could be configured to track the counts of observed integer values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At it's maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better). Histogram maintains a fixed cost in both space and time. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values. A combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it's calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded data information is kept in high resolution allows for accurate post-recording analysis with low [and ultimately configurable] loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data values samples. Internally, Histogram data is maintained using a concept somewhat similar to that of floating point number representation: Using a an exponent and a (non-normalized) mantissa to support a wide dynamic range at a high but varying (by exponent value) resolution. Histogram uses exponentially increasing bucket value ranges (the parallel of the exponent portion of a floating point number) with each bucket containing a fixed number (per bucket) set of linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point number). Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range, and largestValueWithSingleUnitResolution controlling resolution. An common use example of an HDR Histogram would be to record response times in units of microseconds across a dynamic range stretching from 1 usec to over an hour, with a good enough resolution to support later performing post-recording analysis on the collected data. Analysis can including computing, examining, and reporting of distribution by percentiles, linear or logarithmic value buckets, mean and standard deviation, or by any other means that can can be easily added by using the various iteration techniques supported by the Histogram. In order to facilitate the accuracy needed for various post-recording anaylsis techniques, this example can maintain where a resolution of ~1 usec or better for times ranging to ~2 msec in magnitude, while at the same time maintaining a resolution of ~1 msec or better for times ranging to ~2 sec, and a resolution of ~1 second or better for values up to 2,000 seconds. This sort of example resolution can be thought of as "always accurate to 3 decimal points." Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000, and a largestValueWithSingleUnitResolution of 2,000, and would occupy a fixed, unchanging memory footprint of around 369KB (see "Footprint estimation" below). Synchronization and concurrent access In the interest of keeping value recording cost to a minimum, Histogram is NOT internally synchronized, and does NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries against the the Histogram object should take care to externally synchronize and/or order their access. Iteration Histogram supports multiple convenient forms of iterating through the histogram data set, including linear, logarithmic, and percentile iteration mechanisms, as well as means for iterating through each recorded value or each possible value level. Identical iteration mechanisms are available for the histogram's default (corrected) data set and for it's raw data (see "Corrected and Raw data sets" below) set for the HistogramData available through either getHistogramData() or getRawHistogramData(). The iteration mechanisms all provide HistogramIterationValue data points along the histogram's iterated data set, and are available for the default (corrected) histogram data set via the following HistogramData methods: percentiles : An Iterable<HistogramIterationValue> through the histogram using aPercentileIterator linearBucketValues : An Iterable<HistogramIterationValue> through the histogram using a LinearIterator logarithmicBucketValues : An Iterable<HistogramIterationValue> through the histogram using a LogarithmicIterator recordedValues : An Iterable<HistogramIterationValue> through the histogram using a RecordedValuesIterator allValues : An Iterable<HistogramIterationValue> through the histogram using a AllValuesIterator Iteration is typically done with a for-each loop statement. E.g.: for (HistogramIterationValue v : histogram.getHistogramData().percentiles(percentileTicksPerHalfDistance)) { ... } or for (HistogramIterationValue v : histogram.getRawHistogramData().linearBucketValues(valueUnitsPerBucket)) { ... } The iterators associated with each iteration method are resettable, such that a caller that would like to avoid allocating a new iterator object for each iteration loop can re-use an iterator to repeatedly iterate through the histogram. This iterator re-use usually takes the form of a traditional for loop using the Iterator's hasNext() and next() methods: to avoid allocating a new iterator object for each iteration loop: PercentileIterator iter = histogram.getHistogramData().percentiles().iterator(percentileTicksPerHalfDistance); ... iter.reset(percentileTicksPerHalfDistance); for (iter.hasNext() { HistogramIterationValue v = iter.next(); ... } Equivalent Values and value ranges Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. Histogram provides methods for determining the lowest and highest equivalent values for any given value, as we as determining whether two values are equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting count). Corrected and Raw data sets In order to support a common use case needed when histogram values are used to track response time distribution, Histogram collects both raw and corrected (weighted) histogram results by supporting an optional expectedIntervalBetweenValueSamples parameter to the recordValue method. When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter, the raw histogram data will reflect only the single reported result, while the default (corrected) histogram data will reflect an appropriate number of additional results with linearly decreasing values (down to the last value that would still be higher than expectedIntervalBetweenValueSamples). To illustrate why this corrective behavior is critically needed in order to accurately represent value distribution when large value measurements may lead to missed samples, imagine a system for which response times samples are taken once every 10 msec to characterize response time distribution. The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample showing a 1msec response time value. at each sample for 100 seconds (10,000 logged samples at 1msec each). The hypethetical system then encounters a 100 sec pause during which only a single sample is recorded (with a 100 second value). The raw data histogram collected for such a hypothetical system (over the 200 second scenario above) would show ~99.99% of results at 1msec or below, which is obviously "not right". The same histogram, corrected with the knowledge of an expectedIntervalBetweenValueSamples of 10msec will correctly represent the response time distribution. Only ~50% of results will be at 1msec or below, with the remaining 50% coming from the auto-generated value records covering the missing increments spread between 10msec and 100 sec. The raw and default (corrected) data sets will differ only if at least one value recorded with the recordValue method was greater than it's associated expectedIntervalBetweenValueSamples parameter. The raw and default (corrected) data set will be identical in contents if all values recorded via the recordValue were smaller than their associated (and optional) expectedIntervalBetweenValueSamples parameters. While both the raw and corrected histogram data are tracked and accessible, it is the (default) corrected numbers that would typically be consulted and reported. When used for response time characterization, the default (corrected) data set will tend to much more accurately reflect the response time distribution that a random, uncoordinated request would have experienced. Footprint estimation Due to it's dynamic range representation, Histogram is relatively efficient in memory space requirements given the accuracy and dynamic range it covers. Still, it is useful to be able to estimate the memory footprint involved for a given highestTrackableValue and largestValueWithSingleUnitResolution combination. Beyond a relatively small fixed-size footprint used for internal fields and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up by it's data value recording counts array. The total footprint can be estimated by: largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits); expectedHistogramFootprintInBytes = 1024 + 8 * (log2RoundedUp((1.0 * highestTrackableValue) / largestValueWithSingleUnitResolution) + 2) * roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution) A conservative (high) estimate of a Histogram's footprint in bytes is available via the getEstimatedFootprintInBytes() method.
About
A High Dynamic Range (HDR) Histogram
Resources
License
Unknown, CC0-1.0 licenses found
Licenses found
Unknown
LICENSE.txt
CC0-1.0
COPYING.txt
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published