Documentation for sampling #882

Merged
merged 4 commits into from
Jul 14, 2020
4 changes: 2 additions & 2 deletions docs/api/trace.sampling.rst
@@ -1,5 +1,5 @@
opentelemetry.trace.sampling
============================
Sampling Traces
Member


Is this the right file to make these changes? One con I see is that documentation on traces won't show up in the top-level search tree, so it'll be much harder to find.

Also, if you look at the current API docs page, nothing indicates that there are more meaningful docs here:

https://opentelemetry-python.readthedocs.io/en/stable/api/api.html

It'll just say "opentelemetry.trace package".

I would advocate for something that brings this documentation into the top-level nav for easy discovery.

Member Author


Agreed. Does it make sense to just move it up a level into the OpenTelemetry Python API package, or should it be alone in its own subsection somewhere? Or maybe this just means every package with actual documentation should have a non-default name?

Contributor


IMO it's actually fine to leave this doc as a submodule of trace: it doesn't seem to be a "top-level" concept or a pillar of observability. We might have sampling for logging in the future, but the API surface and behaviour are different enough that I think we should have a separate opentelemetry.logging.sampling package for that when that day comes. Addressing @toumorokoshi's comment about the difficulty of finding the doc: this behaviour is the same for all the documentation that isn't in the top-level search tree, so why should sampling be any different? Unless we have a flat tree and place all the modules at the top level, there's going to be some searching involved to find these topics (I'm also against a flat tree because it is too complicated visually).

Member


I guess for me, common intermediate-to-advanced use cases should be called out. In this case, I feel that configuring a sampler is a common use case.

@cnnradams, feel free to make the final call here; this PR's content is good regardless of whether it's exposed at the top level. If you at least address my existing comments, I'll approve.

Member Author


I think these docs are in a reasonable location as long as the names of the parent/sibling pages don't look auto-generated (specifically, the OpenTelemetry Python API page makes every child look auto-generated). Common use cases like sampling should probably be described through root-level examples that link to the actual documentation.

Either way, I've addressed your other comments.

===============

.. automodule:: opentelemetry.trace.sampling
   :members:
64 changes: 61 additions & 3 deletions opentelemetry-api/src/opentelemetry/trace/sampling.py
@@ -12,6 +12,52 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""
For general information about sampling, see `the specification <https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#sampling>`_.

OpenTelemetry provides two types of samplers:

- `StaticSampler`
- `ProbabilitySampler`

A `StaticSampler` always returns the same sampling decision, regardless of the span being sampled or its parent's sampling decision. Both possible `StaticSampler` instances are predefined:

- Always sample spans: `ALWAYS_ON`
- Never sample spans: `ALWAYS_OFF`

A `ProbabilitySampler` makes a sampling decision based on the given sampling probability, using the span's trace ID as the source of randomness. If the span being sampled has a parent, `ProbabilitySampler` respects the parent span's sampling decision instead.

Currently, sampling decisions are always made when the span is created. However, this might change in the future (see `OTEP #115 <https://github.com/open-telemetry/oteps/pull/115>`_).

Custom samplers can be created by subclassing `Sampler` and implementing `Sampler.should_sample`.

To use a sampler, pass it to the SDK's `TracerProvider` constructor. For example:

.. code:: python

    from opentelemetry import trace
    from opentelemetry.trace.sampling import ProbabilitySampler
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import (
        ConsoleSpanExporter,
        SimpleExportSpanProcessor,
    )

    # sample 1 in every 1000 traces
    sampler = ProbabilitySampler(1/1000)

    # set the sampler onto the global tracer provider
    trace.set_tracer_provider(TracerProvider(sampler=sampler))

    # set up an exporter for sampled spans
    trace.get_tracer_provider().add_span_processor(
        SimpleExportSpanProcessor(ConsoleSpanExporter())
    )

    # created spans will now be sampled by the ProbabilitySampler
    with trace.get_tracer(__name__).start_as_current_span("Test Span"):
        ...
"""
import abc
from typing import Dict, Mapping, Optional, Sequence

@@ -78,6 +124,14 @@ def should_sample(


class ProbabilitySampler(Sampler):
    """
    Sampler that makes sampling decisions probabilistically based on `rate`,
    while also respecting the parent span's sampling decision.

    Args:
        rate: Probability (between 0 and 1) that a span will be sampled
    """

    def __init__(self, rate: float):
        self._rate = rate
        self._bound = self.get_bound_for_rate(self._rate)
@@ -118,11 +172,15 @@ def should_sample(
        return Decision(trace_id & self.TRACE_ID_LIMIT < self.bound)


# Samplers that ignore the parent sampling decision and never/always sample.
ALWAYS_OFF = StaticSampler(Decision(False))
"""Sampler that never samples spans, regardless of the parent span's sampling decision."""

ALWAYS_ON = StaticSampler(Decision(True))
"""Sampler that always samples spans, regardless of the parent span's sampling decision."""


# Samplers that respect the parent sampling decision, but otherwise
# never/always sample.
DEFAULT_OFF = ProbabilitySampler(0.0)
"""Sampler that respects its parent span's sampling decision, but otherwise never samples."""

DEFAULT_ON = ProbabilitySampler(1.0)
"""Sampler that respects its parent span's sampling decision, but otherwise always samples."""