
Add MCO data exportation proposal #1639

Open
pavolloffay wants to merge 9 commits into master
Conversation

pavolloffay
Member

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@openshift-ci openshift-ci bot requested review from jan--f and jcantrill June 10, 2024 11:00
Contributor

openshift-ci bot commented Jun 10, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from pavolloffay. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


## Summary

The objective of multi cluster observability is to offer users a capability to collect
Member

cluster observability addon or cluster observability operator?

Member Author

MCO = multi cluster observability, which MCOA is part of.

Member

Yes, what I meant is that the sentence sounds incomplete because it is not clear whether you refer to one or the other.

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>

### Goals

* Use a single protocol (OTLP) for exporting all data to MCO and/or 3rd party system.
Contributor

I don't see how "Use a single protocol (OTLP) for exporting all data to MCO" is related to any of the user stories. I'd consider the protocol being used between the spoke clusters and the hub to be an internal implementation detail.

Member

Not all customers want to send data from spoke clusters to a central location in their infrastructure. There are cases where they want to send directly from a spoke to a third party service. At that point, it is important that the spoke clusters can "talk" to external services, and the protocol used is not just an internal detail.

Member Author
@pavolloffay pavolloffay Jun 11, 2024

User story:

  • As a fleet administrator, I want to export all telemetry signals collected by MCOA to an OTLP compatible endpoint(s).

is related to this goal. The OTLP protocol is crucial here, as it is the most widely supported protocol across observability vendors and OSS tools.

@moadz moadz left a comment

Thanks for the proposal Pavol! I think it's a really interesting idea, and it's quite sexy to hold, but it paints a very complex problem with a reasonably broad brush. I'll also caveat this by saying I'm responding from a purely metrics perspective, as I have little context on logs. I'll try to structure this in the form of a few outrageously leading and contrived questions:

Should we be making it easier to export telemetry via MCO?

Answer: A resounding YES

I agree with the core thesis of this proposal, which is that a user should be able to reason about exports purely in open, unified, vendor-agnostic terms.

This means if I want to export metrics, logs and traces from a cluster, OTLP provides the most bang-for-your-buck in terms of vendor compatibility. This is something that is already achievable by running the OTEL Collector either alongside the existing stack, or completely on its own.

We should:

  • Make it trivial to export metrics and alerts in the OTLP format from the in-cluster stack if deployed. This should be easy and one-CRD/configuration step.
  • Make it so that the in-cluster components speak nicely to OTEL Collector in this case and treat OTLP as close to a first-class citizen in OpenShift as possible (think CMO forwarding via OTEL Collector etc.)
  • Or simply run the collector on its own, as we do with MicroShift and RHEL hosts via flightctl

So far... we 100% agree.
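
For reference, a minimal collector configuration along these lines is sketched below. This is a sketch only, assuming the upstream `prometheus` receiver and `otlp` exporter are included in the collector build; the federation target and vendor endpoint are placeholders, and authentication/TLS are omitted.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: federate                  # scrape the in-cluster Prometheus federation endpoint
          honor_labels: true
          metrics_path: /federate
          params:
            'match[]': ['{__name__=~".+"}']   # forwards everything; narrow this in practice
          static_configs:
            - targets: ['prometheus.example.svc:9090']   # placeholder target

exporters:
  otlp:
    endpoint: otlp.vendor.example.com:4317    # placeholder third-party OTLP/gRPC endpoint

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```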

If we're already using it for third parties, should we use a single protocol (OTLP via OTEL Collector) for exporting all metrics data to MCO?

Answer: No

This is where this proposal confuses me slightly, because you're making an assumption that all MCO and OCP customers would like to export metrics via OTLP, and thus we should also adopt the collector for this purpose.

If the aim is to make it easy to export to a third party, OTLP/OTEL collector makes sense because the user is not always authoritative over compatibility with the third party.

MCO is not a third party; it is a first party. The MCO components run on customer premises, and the customers own the data that they are producing. They can go in and delete it, manage access to it, and audit its contents without additional cost. The benefits of OTLP for a third party don't apply to MCO as a first party, because we have end-to-end authority over ingestion, storage and query. This is the core of why we should be investing in MCO as a product segment: it's unique to us as a platform provider. Our equivalents in CloudWatch/AWS, Azure Monitor/Azure and Google Cloud Monitoring/GCP do not allow the customer to actually retain their data on their own infrastructure and manipulate it at will.

Given we control how the sausage is made, and how it is subsequently consumed, there is actually very little value in adopting OTLP on the critical path here. Given users will be paying for the compute, and would like it to run reliably and cheaply on their infrastructure, that should be our focus.

The factual basis for this is that we have received zero (0) RFEs from ACM monitoring customers asking us to support OTLP as a line format. So if none of our existing customers are asking for it, why build it?

This approach seems to suggest that the overhead of producing and/or storing metrics in Prometheus, translating them to OTLP and then translating them back to OpenMetrics format is worth it in the default case (collecting and storing platform metrics on MCO), which currently encompasses 100% of our users.

Does this mean that there is no future for the OTEL collector as a general purpose Observability data sink and forwarder on the critical path for metrics ingest into MCO storage?

Answer: Categorically NO.

I personally would love it if this were the case. It would slim down our stack, reduce running costs and simplify operation and maintenance. As such, I have set out the minimum criteria we would need from the OTEL Collector so that we do not regress on the existing functionality we provide.

I'm all for simplification and unification as long as it abides by one core and sacrosanct principle: it must be for the functional (features) or non-functional (performance and reliability) benefit of the end user.

Functional benefits

Currently, the most pressing functional benefits we would need from the OTEL Collector to use it on the critical path are:

  • [MUST] Aggregation of metrics (recording rules); infrastructure metrics cardinality is huge, even to third parties, so we would need to be able to aggregate metrics to reduce the storage and query cost in the hub.
  • [MUST] Downsampling-on-the-wire for metrics; bandwidth is a sacred resource, and with high-cardinality metrics, writing more often can cost you 10x the egress with little benefit to show for it. Metrics-collector is currently the only component that does this: it reduces cluster egress costs at the cost of semantic accuracy by scraping only every 5 minutes. If the OTEL Collector could do this we would use it asap.
  • [SHOULD] Dynamic catch-all observability signal collection based on boundary conditions; e.g. if an alert starts firing, collect all container level metrics, traces and logs in the namespace related to the alert. We do this with metrics-collector currently, but the implementation leaves much to be desired.
  • [COULD] Correlation and troubleshooting benefits that are materialised in the ACM UI; if we're not doing this, then there's no benefit in supporting unified collection in the spokes.

Non-functional benefits

  • [MUST] Prometheus/OTEL Collector via OTLP is at least as performant as Prometheus/metrics-collector via remote-write; OTEL Collector metrics performance remains an unknown quantity. The published benchmarks leave a lot to be desired, as they compare OTLP performance to defunct standards like OpenCensus and SignalFx on very small samples (10k DPS); we need broader and more comprehensive tests of how this performs. What we need from the OTEL Collector before we use OTLP as our default spoke-to-hub exposition format is something equivalent to the remote-write 2.0 spec benchmarks.

So CPU/memory profiles, load tests and flamegraphs. The whole shebang!

  • [MUST] Be fault tolerant during rollouts and configuration changes; our customers rely on MCO/hub alert forwarding for troubleshooting infrastructure issues and declaring incidents. The collector should be able to provide this facility, either by running in HA for rollout fault tolerance, or by proxying critical metrics and alerts through a redundant stack. This is actually one of metrics-collector's weak points, so it would be great if we could address this through the OTEL Collector.

This is all assuming that native OTLP write support does not materialise in the short term, but that would likely be a replication format, and would not address the features the collector could unlock with respect to on-the-wire processing.

metrics, logs and traces from spoke clusters. Currently, the collection technology
uses three different technology stacks and protocols for exporting data (Prometheus
remote-write for metrics, Loki push for logs and OTLP for traces).
Loki push and Prometheus remote-write are not commonly supported as ingest protocols by

This isn't strictly an accurate statement. 'Most' suggests a vast majority don't accept it, which is not correct. This list is not even comprehensive with respect to 'Managed Prometheus' offerings, all of which natively support remote-write.

A more accurate statement would be DataDog and Dynatrace do not support native Prometheus remote_write.

Member

Can attest to this. Prometheus remote write is a hugely popular protocol that the community has adopted, and there are now efforts for even 2.0 of this protocol to include more info. Most common large-scale vendors support it, but some rely on other signals as their main source of data.

I would say the Loki conventions are also quite popular and people adhere to them, even if the underlying project is different.

Member Author

Most common large-scale vendors support it, but some rely on other signals as their main source of data.

I did some digging into various vendors. Some of the largest vendors don't support RW but also some of the new/smaller vendors don't support RW either. On the other hand, most of them support OTLP. I will rephrase the sentence.

  • Datadog: no native RW, no RW ingestion via their agent (https://docs.datadoghq.com/containers/kubernetes/prometheus/?tab=kubernetesadv2). Vector has beta support for RW but not sure if DD supports it. They don't support OTLP natively either, only via their agent.
  • Dynatrace: no native RW, no RW ingestion via their agent
  • Honeycomb: no native RW
  • Instana: no native RW, ingestion possible via their agent
  • LogicMonitor: no native RW (https://www.logicmonitor.com/support/monitoring/applications-databases/openmetrics-monitoring)
  • Lumigo: no RW/Prometheus support
  • Lightstep: no RW support (https://docs.lightstep.com/docs/ingest-prometheus)

  • Splunk: native RW
  • New Relic: native RW
  • Elastic: native RW


### Goals

* Use a single protocol (OTLP) for exporting all data to MCO and/or 3rd party system.

This is contradictory to the original stated aim, which is:

This enhancement proposal seeks to strengthen interoperability of MCOA by unifying and
simplifying exporting of all MCOA telemetry data (metrics, logs, traces)

and the User Story provided:

As a fleet administrator, I want to export all telemetry signals collected by MCOA to an OTLP compatible endpoint(s).

What protocol the spokes use to speak to the central store is immaterial to the user, given that isn't something they need to be compatible with. It's compatible by default.

Member Author

The main objective of the proposal is to enable users to export data to 3rd party observability vendors with day two functional requirements (filtering, routing).

From the summary

This capability
enables users to send data from MCOA to any observability vendor and apply
fine-grained filtering and routing on exported data to configurable sinks.

1. Configure OTLP endpoint in MCO (`MultiClusterObservability`) CR.
2. The MCOA configures an additional OTLP exporter in the OpenTelemetry collector. The
exporter is in the pipeline that receives all data.
3. (optional) Filtering (e.g. for PII) can be configured in

Is the expectation that users will configure their own PII filtering? With platform metrics it's mostly pod names and IPs, which I guess could be anonymised, but that would render them useless for general platform troubleshooting, would it not? I'm still not clear on how this is supposed to be handled by users.

Furthermore, if they are writing metrics onto clusters that they own, PII becomes an authorization and deletion concern, not an ingestion concern. I would say this feature is mostly relevant when offloading your observability data to a vendor or third party (e.g. RHOBS or DataDog).

Member

I'm not clear as to how this would be easier/unified. Such filtering can be configured at the Prometheus scrape level or for the ClusterLoggingForwarder as well, and actually having them split ensures a user can be intentional about what exactly they want to filter. Such PII-filtering configuration for metrics can easily be set on scrape configs as needed if one really wants to filter out certain specific labels.

But @moadz raises a great point here, which is: when will I, as a user, need to censor my own metrics?

Member Author

Is the expectation that users will configure their own PII filtering?

Yes, and most likely mostly for user workload metrics. They could filter out platform data as well, but it will be their responsibility if they break the console.

Logs and traces are more important for PII than metrics.

I'm not clear as to how this would be easier/unified, Such filtering can be configured on prometheus scrape-level or for ClusterLoggingForwarder as well and actually having them split ensures a user can be intentional about what exactly they want to filter.

With MCO we are intentional about making it easy to provision and manage the entire stack and ultimately provide a good, integrated product experience. As a user, I would prefer to configure processing/filtering capabilities in a single API rather than on three different APIs/stacks (which could even have different processing/filtering capabilities).
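
To make the single-API filtering idea concrete, here is a minimal collector-side sketch. The attribute keys and the endpoint below are made up for the example, and it assumes the upstream `otlp` receiver, `attributes` processor and `otlp` exporter are available in the collector build.

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  attributes/pii:
    actions:
      - key: user.email        # hypothetical PII attribute
        action: delete
      - key: client.address    # hypothetical PII attribute
        action: delete

exporters:
  otlp/thirdparty:
    endpoint: otlp.vendor.example.com:4317   # placeholder vendor endpoint

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [attributes/pii]
      exporters: [otlp/thirdparty]
    traces:
      receivers: [otlp]
      processors: [attributes/pii]
      exporters: [otlp/thirdparty]
```

The same kind of processor list could be attached to the metrics pipeline as well, which is what "one API for all signals" would buy compared to configuring Prometheus relabeling and the ClusterLoggingForwarder separately.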

Comment on lines 120 to 125
To support the above workflow, MCOA deploys an additional collector which forwards all data to
the MCO telemetry store and/or a 3rd party OTLP endpoint.

- An `OpenTelemetryCollector` resource that enables receivers for all telemetry
signals and an OTLP exporter for forwarding to the MCO store and 3rd party
vendor.
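
A rough sketch of what such an `OpenTelemetryCollector` resource could look like is shown below. The resource name and endpoints are placeholders, not taken from the proposal, and the v1alpha1 API with an inline config string is assumed.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: mcoa-forwarder                             # placeholder name
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      otlp/hub:
        endpoint: mco-gateway.example.com:4317     # placeholder MCO store endpoint
      otlp/vendor:
        endpoint: otlp.vendor.example.com:4317     # placeholder 3rd party endpoint
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [otlp/hub, otlp/vendor]
        logs:
          receivers: [otlp]
          exporters: [otlp/hub, otlp/vendor]
        traces:
          receivers: [otlp]
          exporters: [otlp/hub, otlp/vendor]
```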

How is this solving for exporting to third parties? MCO runs on customer prem, on APIs that are versioned (on both sides) alongside the release of ACM that they are running on; it isn't a third party. So does the OTEL Collector have to encompass this ingress to MCO as well? What is the material benefit to the user if that is the case?

This also makes the assumption that most users would like to ingest their metrics in OTLP, which is again not the case. There are zero (0) ACM Observability RFEs that include facilitating OTLP between the spokes and the hub.

Member

I would also be curious as to whether this would be an efficient way of exporting data to a vendor.

There are usually limits to how many requests a vendor will ingest concurrently, and there might be rate limits as well.

For customers with a large number of spoke clusters, wouldn't first ingesting centrally and then exporting be a much more efficient way of doing this?

Member Author

There are usually limits to how many requests a vendor will ingest concurrently, and there might be rate limits as well.

@saswatamcode do you have any pointers for this? It seems counterproductive for a vendor to rate-limit their customers, who are usually billed per ingested data volume.

Member
@saswatamcode saswatamcode Jun 11, 2024

Yes, it's usually more profitable for a vendor to ingest as much as possible, but vendors would also need to protect their own infra or have user-set billing limits. I know Datadog had something like https://docs.datadoghq.com/api/latest/rate-limits/. Not sure about others.

Comment on lines 133 to 135
- MCOA configuration through the MultiClusterObservability: the MCO CR already has an extensive set of configuration fields; when designing the MCOA configuration, we will need to take extra care not to make this CR more complex and hard to navigate;
- MCOA manifest sync: with MCOA being deployed by MCO, we will need to set up a procedure to keep the MCOA manifests that live in the MCO repo up to date.
- CRD conflicts: MCOA will leverage the CRDs from other operators, so we will have to ensure that we do not run into situations where two operators are managing the same CRD

These are definitely valid drawbacks, but are broad and generic to MCOA/Open Cluster Management and Kubernetes as a whole, not specifically to the approach echoed here.

The main drawbacks as I see them are:

  • Highly centralised component responsible for all observability signals. By extension, if the OTEL Collector is down, you would get nothing exported out of your cluster, not even the mission-critical alerting that might tell you your collector is down, for example. If you're purely using it for vendor-driven alerting, that's probably fine. But a lack of in-cluster alerting, if you opt in for that topology, means you would never know if you stopped producing telemetry.
  • Lack of native HA redundancy in the OTEL Collector as a technology: how is its availability impacted by rollouts, for example? How fault tolerant is it going to be?

Member

To add to this,

  • Native OTLP ingestion is not stable for upstream projects like Prometheus and Thanos, and even where it works, it is not yet performant enough to replace remote write, and there are quite a few decisions to be taken on how to handle differences between the protocols.
  • The OTLP remote write exporter can be used, but this is also a slower path compared to directly remote-writing metrics, and such conversions from one protocol to another can result in broken semantics at times.


I think in this proposal @saswatamcode, there is a collector running in the hub that translates back to remote_write to avoid the drawbacks you mentioned.

Member

Yup either way we go, there are drawbacks.

Member Author

I think in this proposal @saswatamcode, there is a collector running in the hub that translates back to remote_write to avoid the drawbacks you mentioned.

No, this proposal does not imply running a collector on the hub that translates to protocols supported by the stores.

Member Author

Thanks for the link. The translator most likely incurs some CPU/mem cost. Are there some benchmarks that show that or even a change in the ingestion throughput?

Member

Between remote write vs native otlp? I don't think there is anything publicly available as a benchmark, maybe some prombench scenario. But the way this translator works is by translating otlp to prometheus remote write requests https://github.com/prometheus/prometheus/blob/main/storage/remote/otlptranslator/prometheusremotewrite/metrics_to_prw.go#L41 so it is doing additional work on top of regular remote write ingestion.

To get around this, vendors like Grafana have products like https://grafana.com/docs/grafana-cloud/send-data/alloy/

Hopefully over some time we see it become more native in Prometheus, and equally performant 🙂

Member Author

To get around this vendors like grafana have products like https://grafana.com/docs/grafana-cloud/send-data/alloy/

That is just a custom build of the OTEL Collector. We have the Red Hat build of OpenTelemetry: https://docs.openshift.com/container-platform/4.15/observability/otel/otel-configuration-of-otel-collector.html. In the next version we will add the Prometheus Remote Write exporter.
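
For illustration, a minimal collector pipeline that translates OTLP back to remote-write using the contrib `prometheusremotewrite` exporter might look like the sketch below; the endpoint is a placeholder, and exporter availability depends on the collector distribution in use.

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

exporters:
  prometheusremotewrite:
    endpoint: https://metrics-store.example.com/api/v1/receive   # placeholder remote-write endpoint

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]   # OTLP-to-remote-write translation happens in this exporter
```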

Contributor

Here are some benchmark results, comparing

  1. Prometheus -> RW -> RW endpoint
  2. OTelCol -> OTLP -> OTLP endpoint
  3. OtelCol -> RW -> RW endpoint

https://github.com/danielm0hr/edge-metrics-measurements/blob/main/talks/DanielMohr_PromAgentVsOtelCol.pdf

I guess in scenario 3 this translation is happening, right?

Member

Yes, the 3rd one is translating on the export side.
The translation layer I mentioned above is a new one on ingest, i.e. OTelCol -> OTLP -> Prometheus/Thanos (as OTLP endpoint).

Comment on lines +200 to +201
exporting data to other systems with custom protocols (e.g. AWS CloudWatch,
Google Cloud Monitoring/Logging, Azure Monitor).

Google Cloud Monitoring and Azure Monitor support remote write already.

AWS CloudWatch does not, but AWS Managed Prometheus plugs into CloudWatch and accepts remote_write. Likewise for GCM and Google's managed Prometheus.

Member Author

I am not an expert here, but it seems the Azure Monitor requires a sidecar for ingesting remote-write. This applies to metrics; for logs and traces the solution might be different.

The intention is to simplify data exporting by providing a well-supported solution across telemetry signals supported by MCO. A unified approach will eliminate silos and overlapping product features.




Comment on lines +63 to +64
technology stacks (Prometheus, ClusterLoggingForwarder, OpenTelemetry collector).
Every tool uses a different configuration API, export protocol and provides (or does
Member

Would this really be a negative aspect? All these stacks are tailor-made to handle their representative signals, and all have their own user bases. I'm not clear as to why it is a bad thing if I don't want to configure my metrics the same way I configure my logs or my traces. Each serves a different utility, and as a user, I'd like to choose how to manage them separately.

Contributor

@saswatamcode This may be true, but I don't read that as the intent of this proposal; that seems orthogonal. I understand the intent as providing a way to unify the signals on a single protocol, which is OTLP, using OTEL semantics.


Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@pavolloffay
Member Author

@moadz regarding

If we're already using it for third parties, should we use a single protocol (OTLP via OTEL Collector) for exporting all metrics data to MCO?

I agree that it is an implementation detail how data is sent to the MCO telemetry store. We will need to support remote-write for existing deployments anyway. I see the value of unifying on OTLP for the hub store as well, if we can guarantee the non-functional requirements.

I have altered the proposal to use OTLP only for 3rd party stores.

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@jan--f
Contributor

jan--f commented Jun 19, 2024

Moad makes a lot of good points from the technical perspective; I agree with all of them, especially the argument against moving the ACM-internal data streams to OTLP. I would like to raise another, perhaps slightly less technical point.
The title of this had me quite excited.
I think discussing APIs, especially when it comes to data export, instead of technology choices is a great idea. However, this proposal would be more aptly named Add MCO OTLP export. I thought this was possible already too.

Arguing about a technology-agnostic API would improve the scope of the discussion. Furthermore, we could hopefully avoid discussing this again in a while when the next hot new tech comes around. With an API in place we can then add, switch and deprecate technologies as needed.
I'm sure in practice it won't be as simple as I make it out to be here, but I do think separating what API would solve a problem from "we should use technology X" would scope arguments more effectively.

@pavolloffay
Member Author

Being explicit about the protocol is necessary here; our choice should take into account which systems we want to integrate with. Every system supports an explicit list of specific protocols. An important aspect is that providing support for another protocol depends on our internal architecture (e.g. it's easier to implement on a single component than on 3 different stacks) and on how we structure the high-level MCO CRD.

@jan--f
Contributor

jan--f commented Jun 26, 2024

Sure, I agree this proposal should include a protocol that is getting implemented. What I'm arguing for is to abstract the API layer such that other implementations are also possible. Alternatively, let's at least rename this to Add MCO OTLP export or similar.

I would strongly prefer a data export API that is technology agnostic as much as it can be.
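
To make the shape of such a technology-agnostic export API concrete, a hypothetical sketch is shown below. The `export` block and its fields are invented for illustration and are not part of the existing `MultiClusterObservability` API; the point is only that the protocol is named per sink rather than baked into the API.

```yaml
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  # Hypothetical export section -- NOT part of the current CRD.
  export:
    - name: third-party-vendor
      protocol: otlp                          # first implementation; other protocols could be added later
      endpoint: otlp.vendor.example.com:4317  # placeholder
      signals: [metrics, logs, traces]
```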

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 31, 2024
@iblancasa
Member

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 31, 2024
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Contributor

openshift-ci bot commented Aug 21, 2024

@pavolloffay: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 9af53af | link | true | /test markdownlint |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
