Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opentelemetry exporter #531

Merged
merged 17 commits into from
Jan 9, 2024
Merged

Opentelemetry exporter #531

merged 17 commits into from
Jan 9, 2024

Conversation

KalmanMeth
Copy link
Collaborator

@KalmanMeth KalmanMeth commented Oct 24, 2023

Description

Export logs, traces, and metrics to an Opentelemetry collector.
Configuration enables to specify using either grpc or http protocol.
TLS details may be specified. If not specified, then the tls option is set to insecure.
Sample configuration:


  - name: otlp_collector
    encode:
      type: otlptraces
      otlptraces:
        address: localhost
        port: 4317
        connectionType: grpc
  - name: otlp_instana
    encode:
      type: otlptrace
      otlptrace:
        httpaddress: "1.2.3.4:443"
        tls:
          insecureSkipVerify: true
        headers:
          x-instana-key: *****
          x-instana-host: host
          x-instana-time: "0"

Headers to be sent to collector may be specified.
Field names should be transformed to Opentelemetry-conformant format using a transform_generic stage.


  - name: otlp_transform
    transform:
      type: generic
      generic:
        policy: replace_keys
        rules:
          - input: Bytes
            output: bytes
            multiplier: 1
          - input: Packets
            output: packets
          - input: DstAddr
            output: destination.address
          - input: DstMac
            output: destination.mac
          - input: DstHostIP
            output: destination.host.address
          - input: DstK8S_HostName
            output: destination.k8s.node.name
          - input: DstPort
            output: destination.port
          - input: DstPod
            output: destination.k8s.pod.name
          - input: DstK8S_Name
            output: destination.k8s.name
          - input: DstK8S_Type
            output: destination.k8s.type
          - input: DstK8S_OwnerName
            output: destination.k8s.owner.name
          - input: DstK8S_Type
            output: destination.k8s.type
          - input: DstK8S_HostIP
            output: destination.k8s.host.address
          - input: DstK8S_HostName
            output: destination.k8s.host.name
          - input: DstK8S_Namespace
            output: destination.k8s.namespace.name
          - input: SrcAddr
            output: source.address
          - input: SrcMac
            output: source.mac
          - input: SrcHostIP
            output: source.host.address
          - input: SrcK8S_HostName
            output: source.k8s.node.name
          - input: SrcPort
            output: source.port
          - input: SrcPod
            output: source.k8s.pod.name
          - input: SrcK8S_Name
            output: source.k8s.name
          - input: SrcK8S_Type
            output: source.k8s.type
          - input: SrcK8S_OwnerName
            output: source.k8s.owner.name
          - input: SrcK8S_OwnerType
            output: source.k8s.owner.type
          - input: SrcK8S_Namespace
            output: source.k8s.namespace.name
          - input: SrcK8S_HostIP
            output: source.k8s.host.address
          - input: SrcK8S_HostName
            output: source.k8s.host.name
          - input: TimeReceived
            output: timereceived
          - input: Proto
            output: protocol
          - input: FlowDirection
            output: flowdirection

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci
Copy link

openshift-ci bot commented Oct 24, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from kalmanmeth. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link

openshift-ci bot commented Oct 24, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from kalmanmeth. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov
Copy link

codecov bot commented Oct 24, 2023

Codecov Report

Attention: 283 lines in your changes are missing coverage. Please review.

Comparison is base (7c78d63) 66.84% compared to head (4a97ae5) 66.09%.
Report is 19 commits behind head on main.

Files Patch % Lines
pkg/pipeline/encode/opentelemetry/opentelemetry.go 54.39% 99 Missing and 10 partials ⚠️
...ipeline/encode/opentelemetry/encode_otlpmetrics.go 46.10% 80 Missing and 3 partials ⚠️
pkg/pipeline/transform/transform_network.go 25.00% 39 Missing and 3 partials ⚠️
.../pipeline/encode/opentelemetry/encode_otlptrace.go 32.20% 39 Missing and 1 partial ⚠️
pkg/pipeline/encode/encode_prom_metric.go 76.92% 3 Missing ⚠️
pkg/pipeline/transform/kubernetes/kubernetes.go 0.00% 3 Missing ⚠️
pkg/api/encode_prom.go 66.66% 2 Missing ⚠️
pkg/pipeline/pipeline_builder.go 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #531      +/-   ##
==========================================
- Coverage   66.84%   66.09%   -0.76%     
==========================================
  Files          95      102       +7     
  Lines        6802     7409     +607     
==========================================
+ Hits         4547     4897     +350     
- Misses       1988     2221     +233     
- Partials      267      291      +24     
Flag Coverage Δ
unittests 66.09% <50.35%> (-0.76%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jotak
Copy link
Member

jotak commented Oct 31, 2023

Thanks @KalmanMeth !
On semantic convention, I wonder if we should have something more "baked-in" rather than asking the user to configure the translations. Especially, some stuff might be hard to do only by config, e.g: our SrcK8S_Name should actually be translated into source.k8s.pod.name or source.k8s.service.name or source.k8s.node.name depending on what is SrcK8S_Type.

So perhaps, since the knowledge currently lies in the add_kubernetes transform, we could have a flag otel on/off in this block that, if turned on, would automatically use the otel semantics instead of our current names ?

@KalmanMeth KalmanMeth changed the title Opentelemetry exporter WIP: Opentelemetry exporter Nov 2, 2023
@KalmanMeth KalmanMeth changed the title WIP: Opentelemetry exporter Opentelemetry exporter Nov 20, 2023
@KalmanMeth
Copy link
Collaborator Author

Thanks @KalmanMeth ! On semantic convention, I wonder if we should have something more "baked-in" rather than asking the user to configure the translations. Especially, some stuff might be hard to do only by config, e.g: our SrcK8S_Name should actually be translated into source.k8s.pod.name or source.k8s.service.name or source.k8s.node.name depending on what is SrcK8S_Type.

So perhaps, since the knowledge currently lies in the add_kubernetes transform, we could have a flag otel on/off in this block that, if turned on, would automatically use the otel semantics instead of our current names ?

I added the otel option to add_kubernetes by setting the assignee field to otel. I defined the fields like k8s.node.name, k8s.pod.name, etc. We should have a discussion on which other fields we think we should provide in this manner as opposed to using the transform-generic capabilities.

@KalmanMeth
Copy link
Collaborator Author

I have working counters and gauges for the opentelemetry metrics. I removed the code for opentelemetry historgrams. This may have to wait for a future PR.

Comment on lines 48 to 54
func (e *EncodeOtlpTrace) Encode(metricRecord config.GenericMap) {
log.Tracef("entering EncodeOtlpTrace. entry = %v", metricRecord)
_, span := e.tracer.Start(e.ctx, flpEncodeSpanName)
defer span.End()
attributes := obtainAttributesFromEntry(metricRecord)
span.SetAttributes(*attributes...)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what is the goal here with traces, but I don't think we're doing something useful here.
Normally, traces are provided by the workloads that are being traced, for instance when a workload A sends a request to service S, A would generate a span saying "I am sending a request to S".

I believe what we could possibly do, as an external observer, is to generate spans on behalf of the workloads & services as long as we see their IPs flowing. I'm not 100% sure how useful it is, but that's the best I can see. But it's not really what this is doing here: it creates spans for FLP without providing any real information about the network connections being observed in otel format.

Comment on lines 138 to 146
case "nil":
if found {
return nil
}
case "!nil":
if !found {
return nil
}
default:
Copy link
Member

@jotak jotak Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically: these special keywords "nil" / "!nil" have been replaced with a more explicit API where you can have more advanced filters, e.g. using regexp

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@jotak
Copy link
Member

jotak commented Nov 21, 2023

Focusing on code reviewing strictly speaking rather than if it meets expectations because I must say I am not quite sure about the use cases but you @KalmanMeth have apparently one using instana so you're you own judge on that :-) I struggle a bit to see where netobserv can bring value on tracing (unless we captured http headers with our agents, which we don't)

So apart from this comment this looks good to me.

I guess there can be follow-ups to improve more the semantic conventions compliance.

@KalmanMeth
Copy link
Collaborator Author

@jotak The trace capability now breaks down a single log entry into sub-entries - one for source and one for destination. I added instructions to test the setup using the opentelemetry collector and jaeger tracing UI that is fed with information from flp. I would like to progress with this PR to merge.

Copy link
Member

@jotak jotak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm
thanks @KalmanMeth !

@jotak
Copy link
Member

jotak commented Jan 9, 2024

/lgtm

@jotak jotak added no-qe This PR doesn't necessitate QE approval no-doc This PR doesn't require documentation change on the NetObserv operator labels Jan 9, 2024
@KalmanMeth KalmanMeth merged commit 31e4180 into netobserv:main Jan 9, 2024
7 of 8 checks passed
@OlivierCazade OlivierCazade added the breaking-change This pull request has breaking changes. They should be described in PR description. label Jan 17, 2024
@OlivierCazade
Copy link
Contributor

Adding breaking change label due to change in name type in the API

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change This pull request has breaking changes. They should be described in PR description. no-doc This PR doesn't require documentation change on the NetObserv operator no-qe This PR doesn't necessitate QE approval
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants