
Add OpenTelemetry performance benchmark spec #748

Merged
merged 19 commits into from Nov 10, 2020
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,8 @@ release.

New:

- Add performance benchmark specification
([#748](https://github.com/open-telemetry/opentelemetry-specification/pull/748))
- Enforce that the Baggage API must be fully functional, even without an installed SDK.
([#1103](https://github.com/open-telemetry/opentelemetry-specification/pull/1103))
- Rename "Canonical status code" to "Status code"
70 changes: 70 additions & 0 deletions specification/performance-benchmark.md
@@ -0,0 +1,70 @@
# Performance Benchmark of OpenTelemetry API

This document describes common benchmark guidelines for measuring and
reporting the performance of OpenTelemetry SDKs.

The goal of this benchmark is to provide a tool for measuring the basic
performance overhead of the OpenTelemetry SDK for a given event throughput on
the target platform.

## Benchmark Configuration

### Span Configuration

- No parent `Span` or `SpanContext`.
- Default Span [Kind](./trace/api.md#spankind) and
[Status](./trace/api.md#set-status).
- Associated with a [resource](overview.md#resources) that has the attributes
  `service.name` and `service.version`, each with a 10-character string value,
  and the attribute `service.instance.id` with a unique UUID. See
  [Service](./resource/semantic_conventions/README.md#service) for details.
- 1 [attribute](./common/common.md#attributes) with a signed 64-bit integer
value.
- 1 [event](./trace/api.md#add-events) without any attributes.
- The `AlwaysOn` sampler should be enabled.
- Each `Span` is created and immediately ended (see the sketch below this list).
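
A minimal sketch of this span configuration, assuming recent
opentelemetry-java artifacts (the tracer name, endpoint, and concrete
attribute values are illustrative assumptions, not requirements of this
specification):

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.samplers.Sampler;

import java.util.UUID;

public final class BenchmarkSpans {
  public static void main(String[] args) {
    // Resource with service.name and service.version (10-character values)
    // and a unique service.instance.id, as required by this configuration.
    Resource resource = Resource.create(Attributes.of(
        AttributeKey.stringKey("service.name"), "benchsvc01",    // 10 chars
        AttributeKey.stringKey("service.version"), "1.0.0-bm01", // 10 chars
        AttributeKey.stringKey("service.instance.id"), UUID.randomUUID().toString()));

    SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
        .setResource(resource)
        .setSampler(Sampler.alwaysOn()) // AlwaysOn sampler enabled
        .addSpanProcessor(BatchSpanProcessor.builder(
            OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317") // assumed local receiver
                .build()).build())
        .build();

    Tracer tracer = tracerProvider.get("benchmark");

    // No parent context: this is a root span with default Kind and Status.
    Span span = tracer.spanBuilder("benchmark-span").startSpan();
    span.setAttribute("attr", 42L); // 1 signed 64-bit integer attribute
    span.addEvent("event");         // 1 event without attributes
    span.end();                     // created and immediately ended
  }
}
```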

### Measurement Configuration

For languages with bootstrap costs such as JIT compilation, a warm-up phase is
recommended before the measurement, run under the same
`Span` [configuration](#span-configuration).
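
For example, a Java benchmark might express such a warm-up phase with JMH
annotations; the harness choice and the iteration counts here are assumptions,
not part of this specification:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 1)       // JIT warm-up before measuring
@Measurement(iterations = 10, time = 1) // measured iterations
public class SpanCreationBenchmark {

  private Tracer tracer;

  @Setup
  public void setup() {
    // Obtain a tracer from an SDK configured per the Span Configuration above.
    tracer = GlobalOpenTelemetry.getTracer("benchmark");
  }

  @Benchmark
  public void createSpan() {
    Span span = tracer.spanBuilder("benchmark-span").startSpan();
    span.setAttribute("attr", 42L);
    span.addEvent("event");
    span.end();
  }
}
```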

## Throughput Measurement

### Create Spans

The number of spans that can be created and exported via the OTLP exporter in
1 second, measured per logical core and as an average over all logical cores,
with each span containing 10 attributes, and each attribute consisting of two
20-character strings, one as the attribute name and the other as the value.
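
A sketch of how an SDK benchmark might count spans per second on a single
core, assuming a `Tracer` from an SDK configured with the OTLP exporter as
above (the attribute name/value scheme is just an illustrative way to produce
20-character strings):

```java
import java.util.concurrent.TimeUnit;

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

public final class ThroughputMeasurement {

  /** Counts how many spans can be created and ended in one second. */
  static long spansPerSecond(Tracer tracer) {
    long count = 0;
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(1);
    while (System.nanoTime() < deadline) {
      Span span = tracer.spanBuilder("benchmark-span").startSpan();
      for (int i = 0; i < 10; i++) {
        // 10 attributes; name and value are both 20-character strings.
        span.setAttribute("attribute-name-" + String.format("%05d", i),
                          "attribute-value" + String.format("%05d", i));
      }
      span.end();
      count++;
    }
    return count;
  }
}
```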
Contributor:

I had to read this whole section before I understood that it describes the measurement that the experiment should take and report.

Contributor Author:

Added "Measurement" to the related section header to highlight the subject; let me know if further clarification is needed, thanks.

Contributor:

Note: in languages with a JIT (like Java), you'll definitely want to have the benchmark system warm up a bit before actually making measurements. This is notoriously difficult to get right. Also, with essentially identical input data, it's very possible that the Java JIT will optimize away significant portions of the code that might not be optimized during realistic workloads.

I would definitely add a description of what information you're hoping to extract by making this measurement. It will also probably vary greatly depending on what sort of hardware you're running the benchmark on.

Contributor:

Also note: with zero-duration spans, the BatchSpanProcessor is likely to drop a large number of them, due to extremely high volume (at least, this is how it works in the Java implementation).

Contributor:

One more note: the BatchSpanProcessor is highly configurable, and the throughput of this setup will vary a lot depending on how it's configured.

Contributor Author:

The default configuration of the BatchSpanProcessor and the OTLP exporter in the implementing SDK is suggested for running the benchmark. This spec is not intended to show performance numbers for different configurations or to help users choose the best one.

You are right that the spans are very close to "zero duration". As the spec is meant to measure pure SDK performance without any user logic, this is expected.

Thread count is mentioned here, but the spec requires measuring both single-core and all-core performance, so the SDK and benchmark author need to define the thread count that best fits each scenario.

The spec asks for a benchmark implementation for each SDK, so when a user wants to adopt one SDK, like opentelemetry-java, the user can run the benchmark program on the target platform to get an idea of the baseline performance cost of the SDK at a given throughput. For example, if the user specifies 10,000 events/second and this benchmark reports 20% CPU usage over all cores, the user can triage whether this is an affordable SDK overhead.

Contributor Author:

@jkwatson any more questions or further clarification needed? Thanks.

Contributor:

I was asking to have the motivation and purpose added to the document itself, not just in a PR comment that will be lost forever. :)

Contributor Author:

Got it, added a summary of the above comment as a new paragraph at the beginning of this doc.

Contributor Author:

@jkwatson let me know if more motivation and purpose background needs to be added to the doc.


## Instrumentation Cost

### CPU Usage Measurement

With the span throughput specified by the user, or a default of 10,000 spans
per second if the user does not specify a number, measure and report the CPU
usage of the SDK with both the default-configured simple and batching span
processors, together with the OTLP exporter. The benchmark should create an
out-of-process OTLP receiver which listens on the exporting target, or adopt
an existing out-of-process OTLP receiver, which responds with a success status
immediately and drops the data. The receiver should not add significant CPU
overhead to the measurement. Because the benchmark does not include user
processing logic, the total CPU consumption of the benchmark program can be
considered an approximation of the SDK's CPU consumption.
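
For illustration, a minimal out-of-process receiver that acknowledges and
drops all data could be built from the gRPC stubs generated from
opentelemetry-proto; the port and the class name are assumptions:

```java
import io.grpc.Server;
import io.grpc.ServerBuilder;
import io.grpc.stub.StreamObserver;
import io.opentelemetry.proto.collector.trace.v1.ExportTraceServiceRequest;
import io.opentelemetry.proto.collector.trace.v1.ExportTraceServiceResponse;
import io.opentelemetry.proto.collector.trace.v1.TraceServiceGrpc;

public final class DroppingOtlpReceiver extends TraceServiceGrpc.TraceServiceImplBase {

  @Override
  public void export(ExportTraceServiceRequest request,
                     StreamObserver<ExportTraceServiceResponse> responseObserver) {
    // Acknowledge immediately with a success status and drop the payload.
    responseObserver.onNext(ExportTraceServiceResponse.getDefaultInstance());
    responseObserver.onCompleted();
  }

  public static void main(String[] args) throws Exception {
    // Listen on the default OTLP gRPC port (assumed exporting target).
    Server server = ServerBuilder.forPort(4317)
        .addService(new DroppingOtlpReceiver())
        .build()
        .start();
    server.awaitTermination();
  }
}
```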

The total running time for one test iteration is suggested to be at least 15
seconds. The average and peak CPU usage should be reported.

### Memory Usage Measurement

Measure dynamic memory consumption, e.g. heap usage, for the same scenario as
the CPU Usage section above, with a 15-second duration.
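
A sketch of sampling heap usage over the 15-second run in Java, using the
standard JMX memory bean (the 100 ms sampling interval is an assumption):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public final class HeapSampler {
  public static void main(String[] args) throws InterruptedException {
    MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
    long peakHeapBytes = 0;
    long deadline = System.currentTimeMillis() + 15_000; // 15-second run
    while (System.currentTimeMillis() < deadline) {
      long used = memoryBean.getHeapMemoryUsage().getUsed();
      peakHeapBytes = Math.max(peakHeapBytes, used);
      Thread.sleep(100); // sample every 100 ms (assumed interval)
    }
    System.out.println("Peak heap usage: " + peakHeapBytes + " bytes");
  }
}
```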

## Report

### Report Format

All the numbers above should be measured multiple times (at least 10 runs are
suggested) and reported.
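
As a sketch, the repeated runs could be aggregated into the reported average
and peak like this (`runBenchmarkIteration` is a hypothetical placeholder for
one measured iteration):

```java
import java.util.stream.DoubleStream;

public final class ReportAggregation {
  public static void main(String[] args) {
    double[] cpuUsagePercent = new double[10]; // one sample per run
    for (int run = 0; run < cpuUsagePercent.length; run++) {
      cpuUsagePercent[run] = runBenchmarkIteration(); // hypothetical helper
    }
    double average = DoubleStream.of(cpuUsagePercent).average().orElse(0);
    double peak = DoubleStream.of(cpuUsagePercent).max().orElse(0);
    System.out.printf("CPU usage: average %.1f%%, peak %.1f%%%n", average, peak);
  }

  // Hypothetical placeholder for one 15-second benchmark iteration.
  static double runBenchmarkIteration() {
    return 0.0;
  }
}
```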