Add MCO data exportation proposal #1639

---
title: multicluster-logs-traces-forwarding
authors:
- "@pavolloffay"
reviewers:
- "@moadz"
- "@periklis"
- "@alanconway"
- "@jcantrill"
- "@berenss"
- "@bjoydeep"
approvers:
- "@moadz"
- "@periklis"
- "@alanconway"
- "@jcantrill"
api-approvers:
- "@moadz"
- "@periklis"
- "@alanconway"
- "@jcantrill"
creation-date: 2024-06-08
last-updated: 2024-06-08
tracking-link:
-
see-also:
- None
replaces:
- None
superseded-by:
- None
---


# Multi-Cluster telemetry data exportation

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

The objective of multi-cluster observability (MCO) is to offer users the capability to collect
metrics, logs and traces from spoke clusters. Currently, the collection layer uses three different
technology stacks and protocols for exporting data (Prometheus remote-write for metrics,
Loki push for logs and OTLP for traces). Loki push and Prometheus remote-write are not
supported as native ingest protocols by a number of observability vendors (for example,
Datadog and Dynatrace), whereas OTLP support is widespread.

> **Reviewer (Member):** cluster observability addon or cluster observability operator?
>
> **Author:** MCO, multi cluster observability, which MCOA is part of.
>
> **Reviewer (Member):** Yes, what I meant is that the sentence sounds incomplete because it is not clear whether you refer to one or the other.

> **Reviewer:** This isn't strictly an accurate statement. 'Most' suggests a vast majority don't accept it, which is not correct. The list is not even comprehensive with respect to 'Managed Prometheus' offerings, all of which natively support it. A more accurate statement would be that Datadog and Dynatrace do not support native Prometheus remote_write.
>
> **Reviewer (Member):** Can attest to this. Prometheus remote write is a hugely popular protocol that the community has adopted, and there are now efforts for a 2.0 of the protocol to include more info. Most common large-scale vendors support it, but some rely on other signals as their main source of data. I would say the Loki conventions are also quite popular and people adhere to them, even if the underlying project is different.
>
> **Author:** I did some digging into various vendors. Some of the largest vendors don't support remote-write (RW), but some of the new/smaller vendors don't support it either. On the other hand, most of them support OTLP. I will rephrase the sentence.
>
> * Datadog: no native RW, no RW ingestion via their agent (https://docs.datadoghq.com/containers/kubernetes/prometheus/?tab=kubernetesadv2). Vector has beta support for RW but not sure if Datadog supports it. They don't support OTLP natively either, only via their agent.
> * Dynatrace: no native RW, no RW ingestion via their agent
> * Honeycomb: no native RW
> * Instana: no native RW, ingestion possible via their agent
> * LogicMonitor: no native RW (https://www.logicmonitor.com/support/monitoring/applications-databases/openmetrics-monitoring)
> * Lumigo: no RW/Prometheus support
> * Lightstep: no RW support (https://docs.lightstep.com/docs/ingest-prometheus)
> * Splunk: native RW
> * New Relic: native RW
> * Elastic: native RW

This enhancement proposal seeks to strengthen the interoperability of MCOA by unifying and
simplifying the export of all MCOA telemetry data (metrics, logs, traces): it exposes a unified
export API and consolidates export protocols. This capability enables users to send data from
MCOA to any observability vendor and to apply fine-grained filtering and routing of exported
data to configurable sinks.

## Motivation

At the moment, exporting all telemetry data from OpenShift is fragmented across three
technology stacks (Prometheus, ClusterLogForwarder, OpenTelemetry collector).
Every tool uses a different configuration API and export protocol, and provides (or does
not provide) filtering/PII capabilities.

> **Reviewer (Member):** Would this really be a negative aspect? All these stacks are tailor-made to handle their representative signals, and all have their own user bases. I'm not clear as to why it is a bad thing if I don't want to configure my metrics the same way I configure my logs or my traces. Each serves a different utility, and as a user, I'd like to choose how to manage them separately.
>
> **Reviewer (Contributor):** @saswatamcode This may be true, but I don't read that as the intent of this proposal; that seems orthogonal. I understand the intent as providing a way to unify the signals to a single protocol, which is OTLP, using OTel semantics.

### Prior art and user requests

* Red Hat OpenShift as OpenTelemetry (OTLP) native platform: https://www.redhat.com/en/blog/red-hat-openshift-opentelemetry-otlp-native-platform
* Export in-cluster metrics to 3rd party vendor https://github.com/openshift/cluster-monitoring-operator/issues/2000
* Exporting metrics to Dynatrace https://issues.redhat.com/browse/OBSDA-433
* Export all metrics to Dynatrace https://issues.redhat.com/browse/OBSDA-450
* Customer asking to export metric to Splunk https://redhat-internal.slack.com/archives/C04TFRRKUA2/p1687853284985279

### User Stories

* As a fleet administrator, I want to export all telemetry signals collected by MCOA to an OTLP compatible endpoint(s).
* As a fleet administrator, I want to filter sensitive data before it is exported to MCO telemetry store or 3rd party OTLP endpoint.
* As a fleet administrator, I want to decide which data is exported to MCO telemetry store or 3rd party OTLP endpoint.

### Goals

* Use OTLP protocol to export all telemetry data to a 3rd party system.
* Provide a single configuration API on MCO CRD for exporting all telemetry data.
* Provide unified filtering and routing capabilities for all exported telemetry data.

### Non-Goals

* Data visualization and querying.

## Proposal

The following section describes how data exportation, routing and filtering are
configured in MCO and MCOA.

![Architecture](./multicluster-observability-addon-interoperability-arch.jpg)

### Workflow Description

1. Configure OTLP endpoint in MCO (`MultiClusterObservability`) CR.
2. The MCOA configures an additional OTLP exporter in the OpenTelemetry collector. The
exporter is in the pipeline that receives all supported telemetry signals.
3. (optional) Filtering (e.g. for PII) can be configured in the `OpenTelemetryCollector` CR
managed by MCOA via the [transformprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/transformprocessor/README.md); see the configuration sketch below.
4. (optional) Routing can be configured in the `OpenTelemetryCollector` CR managed by MCOA via the
[routingprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/routingprocessor).

> **Reviewer:** Is the expectation that users will configure their own PII filtering? With platform metrics it's mostly pod names and IPs, which I guess could be anonymised, but that would render them useless for general platform troubleshooting, would it not? I'm still not clear on how this is supposed to be handled by users. Furthermore, if they are writing metrics onto clusters that they own, PII becomes an authorization and deletion concern, not an ingestion concern. I would say this feature is mostly relevant when offloading your observability data to a vendor or third party (e.g. RHOBS or DataDog).
>
> **Reviewer (Member):** I'm not clear as to how this would be easier/unified. Such filtering can be configured at the Prometheus scrape level or for the ClusterLogForwarder as well, and having them split actually ensures a user can be intentional about what exactly they want to filter. Such PII-filtering configuration for metrics can easily be set on scrape configs if one really wants to filter out certain specific labels. But @moadz raises a great point here: when will I, as a user, need to censor my own metrics?
>
> **Author:** Yes, the expectation is that users configure their own filtering, most likely mostly for user workload metrics. They could filter out platform data as well, but it will be their responsibility if they break the console. Logs and traces are more important for PII than metrics. With MCO we are intentional about making it easy to provision and manage the entire stack and ultimately provide a good integrated product experience. As a user I would prefer to configure processing/filtering capability in a single API rather than in three different APIs/stacks (which could even have different processing/filtering capabilities).
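The following is a minimal sketch of what the optional filtering and routing pieces from steps 3 and 4 could look like inside the collector configuration rendered by MCOA. The `transform` and `routing` processors are the upstream contrib components linked above; the concrete OTTL statements, the attribute used for routing, and the exporter names (`otlp/vendor-a`, `otlp/vendor-b`) are illustrative assumptions, not part of this proposal's API.

```yaml
# Sketch only: the statements, attribute names and exporter names are illustrative assumptions.
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Hypothetical PII rule: mask e-mail-like strings in log bodies.
          - replace_pattern(body, "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+", "<redacted>")
    trace_statements:
      - context: span
        statements:
          # Hypothetical PII rule: drop a sensitive span attribute.
          - delete_key(attributes, "http.request.header.authorization")
  routing:
    # Route by a resource attribute to a per-tenant sink; everything else goes to the default exporter.
    from_attribute: k8s.namespace.name
    attribute_source: resource
    default_exporters: [otlp/vendor-a]
    table:
      - value: payments
        exporters: [otlp/vendor-b]
```

Both processors would then have to be referenced in the corresponding `service.pipelines` entries, with the routing processor placed last in each pipeline it participates in.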


### API Extensions

None - no new APIs for CRDs are introduced.

### Implementation Details/Notes/Constraints [optional]

#### General configuration and fleet-wide stanzas

To support the above workflow, MCOA deploys an additional collector which forwards the collected
data to the 3rd party OTLP endpoint:

- An `OpenTelemetryCollector` resource that enables receivers for the supported telemetry
signals. The individual telemetry stacks forward data to these endpoints.
The collector enables an OTLP exporter for forwarding the data to the 3rd party vendor.
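A minimal sketch, assuming the shape described in the bullet above, of the `OpenTelemetryCollector` resource that MCOA could render: OTLP receivers that the per-signal stacks forward into, and an OTLP exporter pointing at the 3rd party endpoint. The resource name, namespace, vendor endpoint and the environment variable used for authentication are illustrative assumptions.

```yaml
# Sketch only: names, endpoint and credentials are illustrative assumptions.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: mcoa-export
  namespace: open-cluster-management-observability
spec:
  mode: deployment
  config:
    receivers:
      # The per-signal stacks (Prometheus, ClusterLogForwarder, OpenTelemetry
      # collectors) forward their data to these OTLP endpoints.
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      otlp/vendor:
        endpoint: otlp.vendor.example.com:4317
        headers:
          authorization: "Bearer ${env:VENDOR_API_TOKEN}"
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [otlp/vendor]
        logs:
          receivers: [otlp]
          exporters: [otlp/vendor]
        traces:
          receivers: [otlp]
          exporters: [otlp/vendor]
```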

#### Hypershift [optional]

N/A

### Drawbacks

- MCOA configuration through the MultiClusterObservability CR: the MCO CR already has an extensive set of configuration fields. When designing the MCOA configuration, we will need to take extra care not to make this CR more complex and hard to navigate.
- MCOA manifest sync: with MCOA being deployed by MCO, we will need to set up a procedure to keep the MCOA manifests that live in the MCO repo up to date.
- CRD conflicts: MCOA will leverage the CRDs from other operators, so we will have to ensure that we do not run into situations where two operators manage the same CRD.
> **Reviewer:** These are definitely valid drawbacks, but they are broad and generic to MCOA/Open Cluster Management and Kubernetes as a whole, not specific to the approach echoed here. The main drawbacks as I see them are:
>
> * A highly centralised component responsible for all observability signals. By extension, if the OTel collector is down, you would get nothing exported out of your cluster, even mission-critical alerting that might tell you your collector is down, for example. If you're purely using it for vendor-driven alerting, that's probably fine. But a lack of in-cluster alerting, if you opt in for that topology, means you would never know if you stopped producing telemetry.
> * Lack of native HA redundancy in the OTel Collector as a technology: how is its availability impacted by rollouts, for example? How fault tolerant is it going to be?
>
> **Reviewer (Member):** To add to this:
>
> * Native OTLP ingestion is not stable for upstream projects like Prometheus and Thanos, and even if it works, it is not yet performant enough to replace remote write, and there are quite some decisions to be taken on how to handle differences between the protocols.
> * The OTLP remote write exporter can be used, but then this is also a slower path compared to directly remote-writing metrics, and such conversions from one protocol to another can result in broken semantics at times.
>
> **Reviewer:** I think in this proposal @saswatamcode, there is a collector running in the hub that translates back to remote_write to avoid the drawbacks you mentioned.
>
> **Reviewer (Member):** Yup, either way we go, there are drawbacks.
>
> **Author:** No, this proposal does not imply running a collector on the hub that translates to protocols supported by the stores.
>
> **Author:** Thanks for the link. The translator most likely incurs some CPU/mem cost. Are there some benchmarks that show that, or even a change in the ingestion throughput?
>
> **Reviewer (Member):** Between remote write vs native OTLP? I don't think there is anything publicly available as a benchmark, maybe some prombench scenario. But the way this translator works is by translating OTLP to Prometheus remote write requests (https://github.com/prometheus/prometheus/blob/main/storage/remote/otlptranslator/prometheusremotewrite/metrics_to_prw.go#L41), so it is doing additional work on top of regular remote write ingestion. To get around this, vendors like Grafana have products like https://grafana.com/docs/grafana-cloud/send-data/alloy/. Hopefully over some time we see it become more native in Prometheus, and equally performant 🙂
>
> **Author:** That is just a custom build like the OTel collector. We have the Red Hat build of OpenTelemetry (https://docs.openshift.com/container-platform/4.15/observability/otel/otel-configuration-of-otel-collector.html). In the next version we will add a Prometheus Remote Write exporter.
>
> **Reviewer (Contributor):** Here are some benchmark results comparing:
>
> 1. Prometheus -> RW -> RW endpoint
> 2. OTelCol -> OTLP -> OTLP endpoint
> 3. OTelCol -> RW -> RW endpoint
>
> https://github.com/danielm0hr/edge-metrics-measurements/blob/main/talks/DanielMohr_PromAgentVsOtelCol.pdf
>
> I guess in scenario 3 this translation is happening, right?
>
> **Reviewer (Member):** Yes, the 3rd one is translating on the export side. The translation layer I mentioned above is a new one on ingest, i.e., OTelCol -> OTLP -> Prometheus/Thanos (as OTLP endpoint).


## Design Details

### Open Questions [optional]

TBD

### Test Plan

TBD

### Graduation Criteria

TBD

#### Dev Preview

TBD

#### Dev Preview -> Tech Preview

TBD

#### Tech Preview -> GA

TBD

#### Removing a deprecated feature

None

### Upgrade / Downgrade Strategy

None

### Version Skew Strategy

None

### Operational Aspects of API Extensions

TBD

#### Failure Modes

TBD

#### Support Procedures

TBD

## Implementation History

TBD

## Alternatives

### Multiple OTLP exporter/sinks

An OTLP exporter/sink could be implemented in all telemetry collectors (`Prometheus`,
`ClusterLogForwarder`, `OpenTelemetryCollector`); however, providing common filtering
and routing capabilities would then be problematic, if not impossible.

In addition to exporting in OTLP, a single collector will enable MCO to easily support
exporting data to other systems with custom protocols (e.g. AWS CloudWatch,
Google Cloud Monitoring/Logging, Azure Monitor).
> **Reviewer:** Google Cloud Monitoring and Azure Monitor support remote write already. AWS CloudWatch does not, but AWS Managed Prometheus plugs into CloudWatch and accepts remote_write. Likewise for GCM and Google's managed Prometheus.
>
> **Author:** I am not an expert here, but it seems like Azure Monitor requires a sidecar for ingesting remote-write. This applies to metrics; for logs and traces the solution might be different.

The intention is to simplify data exporting by providing a well-supported solution across telemetry signals supported by MCO. A unified approach will eliminate silos and overlapping product features.
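To make the point about exporting to other systems (AWS CloudWatch, Google Cloud Monitoring/Logging, Azure Monitor) concrete, here is a minimal sketch of how the single collector's exporter section could grow beyond OTLP using existing contrib exporters. The exporter names (`awscloudwatchlogs`, `azuremonitor`, `googlecloud`) are real upstream components, but the concrete settings (region, log group, connection string, project) are illustrative assumptions.

```yaml
# Sketch only: the settings below are illustrative assumptions.
exporters:
  otlp/vendor:
    endpoint: otlp.vendor.example.com:4317
  awscloudwatchlogs:
    region: us-east-1
    log_group_name: mcoa-spoke-logs
    log_stream_name: spoke-clusters
  azuremonitor:
    connection_string: "${env:AZURE_MONITOR_CONNECTION_STRING}"
  googlecloud:
    project: my-gcp-project
```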


### Integrate directly into present Multi-Cluster-Observability-Operator

TBD

## Infrastructure Needed [optional]

None


## RANDOM IDEAS

-