Exemplars prototype #936

Closed · wants to merge 6 commits
49 changes: 49 additions & 0 deletions docs/examples/exemplars/README.rst
@@ -0,0 +1,49 @@
OpenTelemetry Exemplars Example
===============================

.. _Exemplars:

Exemplars are example measurements for aggregations. While conceptually simple, exemplars can be used to estimate any statistic of the input distribution, to link high-latency requests to sample traces, and much more.
For more information about exemplars and how they work in OpenTelemetry, see the `spec <https://github.com/open-telemetry/oteps/pull/113>`_.

Examples
--------

Installation

.. code-block:: sh

pip install opentelemetry-api
pip install opentelemetry-sdk
pip install matplotlib # may have to install Qt as well
pip install numpy

pip install opentelemetry-exporter-cloud-monitoring # if you want to export exemplars to cloud monitoring

Statistical exemplars
^^^^^^^^^^^^^^^^^^^^^

The OpenTelemetry SDK provides a way to sample exemplars statistically:

- Exemplars will be picked to represent the input distribution, without unquantifiable bias.
- A "sample_count" attribute will be set on each exemplar, quantifying how many measurements it represents: for randomly sampled exemplars this value is N (total measurements) / num_samples, while for histogram exemplars it is specific to each bucket (a short sketch follows this list).

.. literalinclude:: statistical_exemplars.py
:language: python
:lines: 1-

For the output of this example, see the corresponding Jupyter notebook.

Trace exemplars
^^^^^^^^^^^^^^^

Trace exemplars are exemplars that have not been sampled statistically;
instead, they aim to provide value as individual examples.
Each one has the trace ID/span ID of the active trace attached at the time
the exemplar was recorded, and selection may favor measurements with
abnormally high or low values.
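
As a rough illustration, a collected trace exemplar could be correlated back to
its trace after a checkpoint; note that ``span_id`` below is an assumed
attribute name for the linked span context, and the prototype's actual field
may differ:

.. code-block:: python

    for exemplar in aggregator.checkpoint_exemplars:
        # span_id is a hypothetical attribute name for the linked span
        print(exemplar.value, exemplar.span_id)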

.. literalinclude:: trace_exemplars.py
:language: python
:lines: 1-

Currently only the Google Cloud Monitoring exporter supports uploading these exemplars.
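
To upload them, one could swap the console exporter used in
``trace_exemplars.py`` for the Cloud Monitoring exporter. A sketch, assuming
the ``opentelemetry-exporter-cloud-monitoring`` package installed above
exposes a ``CloudMonitoringMetricsExporter`` class:

.. code-block:: python

    # Assumed module and class name from opentelemetry-exporter-cloud-monitoring
    from opentelemetry.exporter.cloud_monitoring import (
        CloudMonitoringMetricsExporter,
    )

    metrics.get_meter_provider().start_pipeline(
        meter, CloudMonitoringMetricsExporter(), 10
    )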
340 changes: 340 additions & 0 deletions docs/examples/exemplars/statistical_exemplars.ipynb


180 changes: 180 additions & 0 deletions docs/examples/exemplars/statistical_exemplars.py
@@ -0,0 +1,180 @@
import random
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
from opentelemetry import metrics
from opentelemetry.sdk.metrics import Counter, MeterProvider
from opentelemetry.sdk.metrics.export.aggregate import SumAggregator
from opentelemetry.sdk.metrics.export.controller import PushController
from opentelemetry.sdk.metrics.export.in_memory_metrics_exporter import (
InMemoryMetricsExporter,
)
from opentelemetry.sdk.metrics.view import View, ViewConfig

# set up opentelemetry

# Sets the global MeterProvider instance
metrics.set_meter_provider(MeterProvider())

meter = metrics.get_meter(__name__)

# Export to a python list so we can do stats with the data
exporter = InMemoryMetricsExporter()

# instead of waiting for the controller to tick over time, we will just tick it ourselves
controller = PushController(meter, exporter, 500)

# Create the metric that we will use
bytes_counter = meter.create_metric(
name="bytes_counter",
description="Number of bytes received by service",
unit="By",
value_type=int,
metric_type=Counter,
)

# Every time interval we will collect 100 exemplars statistically (selected without bias)
aggregator_config = {"num_exemplars": 100, "statistical_exemplars": True}

# Assign a Sum aggregator to `bytes_counter` that collects exemplars
counter_view = View(
bytes_counter,
SumAggregator,
aggregator_config=aggregator_config,
label_keys=["environment"],
view_config=ViewConfig.LABEL_KEYS,
)

meter.register_view(counter_view)

# generate the random metric data


def unknown_customer_calls():
"""Generate customer call data to our application"""

# set a random seed for consistency of data for example purposes
np.random.seed(1)
# Make exemplar selection consistent for example purposes
random.seed(1)

# customer 123 is a big user, and made 1000 requests in this timeframe
requests = np.random.normal(
1000, 100, 1000
) # 1000 requests with average 1000 bytes, standard deviation 100

for request in requests:
bytes_counter.add(
int(request),
{
"environment": "production",
"method": "REST",
"customer_id": 123,
},
)

# customer 247 is another big user, making fewer, but bigger requests
requests = np.random.normal(
5000, 1250, 200
) # 200 requests with average size of 5k bytes

for request in requests:
bytes_counter.add(
int(request),
{
"environment": "production",
"method": "REST",
"customer_id": 247,
},
)

# There are many other smaller customers
for customer_id in range(250):
requests = np.random.normal(1000, 250, np.random.randint(1, 10))
method = "REST" if np.random.randint(2) else "gRPC"
for request in requests:
bytes_counter.add(
int(request),
{
"environment": "production",
"method": method,
"customer_id": customer_id,
},
)


unknown_customer_calls()

# Tick the controller so it sends metrics to the exporter
controller.tick()

# collect metrics from our exporter
metric_data = exporter.get_exported_metrics()

# get the exemplars from the bytes_counter aggregator
aggregator = metric_data[0].aggregator
exemplars = aggregator.checkpoint_exemplars
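# Each exemplar carries the recorded value, the labels the view dropped
# (exposed as "dropped_labels"), and a "sample_count" estimating how many
# measurements it represents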

# Sum up the total bytes in per customer from all of the exemplars collected
customer_bytes_map = defaultdict(int)
for exemplar in exemplars:
customer_bytes_map[exemplar.dropped_labels] += exemplar.value


customer_bytes_list = sorted(
customer_bytes_map.items(), key=lambda t: t[1], reverse=True
)

# Save our top 5 customers and sum all of the rest into "Others".
top_5_customers = [
("Customer {}".format(dict(val[0])["customer_id"]), val[1])
for val in customer_bytes_list[:5]
] + [("Other Customers", sum([val[1] for val in customer_bytes_list[5:]]))]

# unzip the data into X (sizes of each customer's contribution) and labels
labels, X = zip(*top_5_customers)

# create the chart with matplotlib and show it
plt.pie(X, labels=labels)
plt.show()

# Estimate how many bytes customer 123 sent
customer_123_bytes = customer_bytes_map[
(("customer_id", 123), ("method", "REST"))
]

# Since the exemplars were randomly sampled, all sample_counts will be the same
sample_count = exemplars[0].sample_count
full_customer_123_bytes = sample_count * customer_123_bytes

# With seed == 1 we get 1008612, quite close to the expected total of 1000000! (more exemplars would make this estimate even more accurate)
print(
"Customer 123 sent about {} bytes this interval".format(
int(full_customer_123_bytes)
)
)

# Determine the top 25 customers by how many bytes they sent in exemplars
top_25_customers = customer_bytes_list[:25]

# out of those 25 customers, determine how many used gRPC, and compute the ratio
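# (each customer_value is a (dropped_labels, total) pair, where dropped_labels
# looks like (("customer_id", <id>), ("method", <method>)), so [0][1][1]
# picks out the method value)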
percent_grpc = sum(
1
for customer_value in top_25_customers
if customer_value[0][1][1] == "gRPC"
) / len(top_25_customers)

print(
"~{}% of the top 25 customers (by bytes in) used gRPC this interval".format(
int(percent_grpc * 100)
)
)

# Determine the 50th, 90th, and 99th percentile of byte size sent in
quantiles = np.quantile(
[exemplar.value for exemplar in exemplars], [0.5, 0.9, 0.99]
)
print("50th Percentile Bytes In:", int(quantiles[0]))
print("90th Percentile Bytes In:", int(quantiles[1]))
print("99th Percentile Bytes In:", int(quantiles[2]))
Comment on lines +178 to +180

Member: I'm curious how accurate these were

Member (Author): good question, not sure a good way to get that information.
84 changes: 84 additions & 0 deletions docs/examples/exemplars/trace_exemplars.py
@@ -0,0 +1,84 @@
# Copyright The OpenTelemetry Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
"""
This example shows how to generate trace exemplars for a histogram, and how they can be exported to Google Cloud Monitoring (the console exporter is used below for demonstration).
"""

import random
import time

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider, ValueRecorder
from opentelemetry.sdk.metrics.export import ConsoleMetricsExporter
from opentelemetry.sdk.metrics.export.aggregate import HistogramAggregator
from opentelemetry.sdk.metrics.view import View, ViewConfig

# Set up OpenTelemetry metrics
metrics.set_meter_provider(MeterProvider(stateful=False))
meter = metrics.get_meter(__name__)

# Only the Google Cloud Monitoring exporter currently supports uploading
# exemplars; the ConsoleMetricsExporter is used here for demonstration
metrics.get_meter_provider().start_pipeline(
    meter, ConsoleMetricsExporter(), 10
)

# Create our duration metric
request_duration = meter.create_metric(
name="request_duration",
description="duration (ms) of incoming requests",
unit="ms",
value_type=int,
metric_type=ValueRecorder,
)

# Add a Histogram view to our duration metric, and make it generate one exemplar per bucket
duration_view = View(
request_duration,
# Latency in buckets:
# [>=0ms, >=25ms, >=50ms, >=75ms, >=100ms, >=200ms, >=400ms, >=600ms, >=800ms, >=1s, >=2s, >=4s, >=6s]
# We want to generate one exemplar per bucket, where each exemplar has a linked trace that was recorded.
# So we need to set num_exemplars to 1 and leave statistical_exemplars unset (it defaults to False)
HistogramAggregator,
aggregator_config={
"bounds": [
0,
25,
50,
75,
100,
200,
400,
600,
800,
1000,
2000,
4000,
6000,
],
"num_exemplars": 1,
},
label_keys=["environment"],
view_config=ViewConfig.LABEL_KEYS,
)

meter.register_view(duration_view)

for i in range(100):
# Generate some random data for the histogram with a dropped label "customer_id"
request_duration.record(
random.randint(1, 8000),
{"environment": "staging", "customer_id": random.randint(1, 100)},
)
time.sleep(1)
@@ -75,7 +75,7 @@ def update(self, value: metrics_api.ValueT):
with self._view_datas_lock:
# record the value for each view_data belonging to this aggregator
for view_data in self.view_datas:
view_data.record(value)
view_data.record(value, self._labels)

def release(self):
self.decrease_ref_count()
@@ -77,11 +77,12 @@ def export(
) -> "MetricsExportResult":
for record in metric_records:
print(
'{}(data="{}", labels="{}", value={})'.format(
'{}(data="{}", labels="{}", value={}, exemplars={})'.format(
type(self).__name__,
record.instrument,
record.labels,
record.aggregator.checkpoint,
record.aggregator.checkpoint_exemplars,
)
)
return MetricsExportResult.SUCCESS