Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-1522 Open Telemetry exporter API #671

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

NETOBSERV-1522 Open Telemetry exporter API #671

wants to merge 10 commits into from

Conversation

jpinsonneau
Copy link
Contributor

@jpinsonneau jpinsonneau commented Jun 6, 2024

Description

Add Open Telemetry to FlowCollector exporters:

apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
...
  exporters:
    - type: OpenTelemetry
      openTelemetry:
        targetHost: my-otelcol-collector-headless.otlp.svc
        targetPort: 4317
        type: grpc
        logs:
          enable: true
        metrics:
          enable: true
          prefix: netobserv
          pushTimeInterval: 20s
          expiryTime: 2m

Custom transform rules can be applied using customRules field in FlowCollectorOpenTelemetry exporter. Else default rules from https://github.com/rhobs/observability-data-model/blob/main/network-observability.md#format-proposal are applied.
See OpenTelemetryDefaultTransformRules in constants.

Rely on https://github.com/netobserv/flowlogs-pipeline/blob/main/contrib/opentelemetry/demo.md for Otel installation on your cluster

TODO: dig into ways to implement traces. Current traces configuration is disabled to avoid confusion.

Dependencies

netobserv/flowlogs-pipeline#672
netobserv/flowlogs-pipeline#684

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Copy link

openshift-ci bot commented Jun 6, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

TLS ClientTLS `json:"tls"`

// Rules to transform flow logs to Opentelemetry-conformant format.
Rules []api.GenericTransformRule `json:"rules,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we expose rules or hardcode them ?

See netobserv/flowlogs-pipeline#531 for FLP implementation

@jpinsonneau jpinsonneau changed the title NETOBSERV-1522 wip otel exporter API NETOBSERV-1522 Open Telemetry exporter API Jun 10, 2024
@jpinsonneau jpinsonneau marked this pull request as ready for review June 10, 2024 14:20
fromStage.EncodeOtelMetrics(fmt.Sprintf("%s-metrics", name), api.EncodeOtlpMetrics{
OtlpConnectionInfo: &conn,
Prefix: constants.OperatorName,
Metrics: flpMetrics,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️ we should probably adapt the metrics as same as logs to match Open Telemetry field namings

For now I just got rid of the transform stage and kept the same metrics as prometheus. Let me know your thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, that means to rewrite flpMetrics to map all the labels and the filters with the otel semantic.
That can be done in a follow-up I guess, if you prefer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can add a todo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 6b556fb


// Type of Open Telemetry connection. The available options are `http` and `grpc`.
// +optional
Type string `json:"type"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename protocol ?
That's the term used in otel: https://opentelemetry.io/docs/collector/configuration/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes !


// Time duration of no-flow to wait before deleting data item
// +kubebuilder:default:="2m"
ExpiryTime *metav1.Duration `json:"expiryTime,omitempty"`
Copy link
Member

@jotak jotak Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same setting exists for regular prom metrics but we chose not to expose it as this is more of an implementation detail. I think we should do the same here ;

or if we really want to expose it, then it might be better to set as an advanced setting, and even share the same between prom & otel configs (because at the end of the day this is really the same thing: how long do we keep in memory metrics that reference elements that may disappear, such as those having a pod name label and that pod doesn't exist anymore)

But for simplicity I wouldn't even bother about that, and just remove this setting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll remove it then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the pushTimeInterval ?


// Prefix added to each metric name
// +optional
Prefix string `json:"prefix,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like "expiryTime", there's also a "prefix" config that we force as "netobserv" for regular metrics, and I wonder if we really should expose it, or keep being more opinionated. I tend to prefer being more opinionated, and expose less config, at least as long as nobody has a need for it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can remove it and force netobserv here then

@@ -16,6 +16,7 @@ limitations under the License.
package v1beta2

import (
"github.com/netobserv/flowlogs-pipeline/pkg/api"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not have coupling with the FLP api here, the config definitions should be duplicated (or if we force the semantic convention then we can just remove that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I duplicated the GenericTransformRule

@jpinsonneau
Copy link
Contributor Author

Addressed feedback & updated go vendor

@jpinsonneau
Copy link
Contributor Author

As this PR is still waiting for review, I took the chance to do the metrics transformation and added testing
6b556fb

TargetPort int `json:"targetPort"`

// Protocol of Open Telemetry connection. The available options are `http` and `grpc`.
// +optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be an enum I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, efdb0d4

Comment on lines 430 to 434
// Custom rules to transform flow logs to Opentelemetry-conformant format.
// By default the following format will be used: https://github.com/rhobs/observability-data-model/blob/main/network-observability.md#format-proposal
// Setting `customRules` will fully override the default rules allowing you to export each field to your naming convention
// +optional
Rules *[]GenericTransformRule `json:"customRules,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we try to make the CRD more opinionated than what FLP exposes. Here, I think we should present that more clearly as a mapping of fields, and tell why we expose it (because there's no standard)

Here's a proposal:

Suggested change
// Custom rules to transform flow logs to Opentelemetry-conformant format.
// By default the following format will be used: https://github.com/rhobs/observability-data-model/blob/main/network-observability.md#format-proposal
// Setting `customRules` will fully override the default rules allowing you to export each field to your naming convention
// +optional
Rules *[]GenericTransformRule `json:"customRules,omitempty"`
// Custom fields mapping to an OpenTelemetry conformant format.
// By default, NetObserv format proposal is used: https://github.com/rhobs/observability-data-model/blob/main/network-observability.md#format-proposal .
// As there is currently no accepted otlp standard for L3/4 network logs, you can freely override it with your own.
// +optional
FieldsMapping *[]GenericTransformRule `json:"fieldsMapping,omitempty"`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also done in efdb0d4

Copy link
Member

@jotak jotak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, lgtm!

@openshift-ci openshift-ci bot added the lgtm label Jul 2, 2024
@memodi
Copy link
Contributor

memodi commented Jul 9, 2024

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 9, 2024
Copy link

github-actions bot commented Jul 9, 2024

New images:

  • quay.io/netobserv/network-observability-operator:cd31dfd
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-cd31dfd
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-cd31dfd

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cd31dfd make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-cd31dfd

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-cd31dfd
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@openshift-ci openshift-ci bot removed the lgtm label Jul 10, 2024
Copy link

openshift-ci bot commented Jul 10, 2024

New changes are detected. LGTM label has been removed.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 10, 2024
Copy link

openshift-ci bot commented Jul 10, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jotak. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpinsonneau
Copy link
Contributor Author

Fixed empty metrics in 9c9c293

@memodi
Copy link
Contributor

memodi commented Jul 12, 2024

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 12, 2024
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:809a682
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-809a682
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-809a682

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:809a682 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-809a682

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-809a682
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 16, 2024
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 16, 2024
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:eb79fe5
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-eb79fe5
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-eb79fe5

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:eb79fe5 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-eb79fe5

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-eb79fe5
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@jpinsonneau
Copy link
Contributor Author

Fix the value key mentionned in otel metrics config
7ba86fe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants