NETOBSERV-555 NETOBSERV-552 CRD updates - many breaking changes #169

Merged (5 commits, Sep 28, 2022)

Conversation

@jotak (Member) commented Sep 21, 2022:

Main items (a sketch of the resulting spec follows this list):

  • NETOBSERV-555: "spec.flowlogsPipeline.kind" is removed, assuming
    DaemonSet
  • processor.replicas/hpa are now
    kafka.consumerReplicas/consumerAutoscaler (hpa renamed autoscaler,
    also for console)
  • NETOBSERV-552: "ovn-kubernetes" and "cno" options moved into IPFIX section
  • IPFIX documented as a legacy option
  • NETOBSERV-544: remove kafka.enable option. Instead, use a top-level
    enum for deployment mode: DIRECT or KAFKA
  • FLP reconciler refactoring (see first commit if you want to review that separately)
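Taken together, the items above reshape the top of the FlowCollector CRD. Here is a minimal Go sketch of what the reworked API surface could look like, reconstructed from these bullets and the diff fragments quoted in the review; exact field sets, defaults and validation markers are assumptions, not the merged code:

package v1alpha1

// Sketch only: reconstructed from the PR description, not the merged types.
type FlowCollectorSpec struct {
	// NETOBSERV-555: spec.flowlogsPipeline.kind is removed; FLP is always
	// deployed as a DaemonSet, so no field is needed for it anymore.

	// NETOBSERV-544: replaces kafka.enable. DIRECT wires the agents straight
	// into flowlogs-pipeline; KAFKA puts a broker between them.
	//+kubebuilder:validation:Enum=DIRECT;KAFKA
	//+kubebuilder:default:=DIRECT
	DeploymentType string `json:"deploymentType,omitempty"`

	Agent FlowCollectorAgent `json:"agent,omitempty"`
	Kafka FlowCollectorKafka `json:"kafka,omitempty"`
}

type FlowCollectorAgent struct {
	//+kubebuilder:validation:Enum=EBPF;IPFIX
	//+kubebuilder:default:=EBPF
	Type string `json:"type,omitempty"`

	// NETOBSERV-552: the "ovn-kubernetes" and "cno" options now live under
	// the IPFIX section, which is documented as legacy.
	IPFIX FlowCollectorIPFIX `json:"ipfix,omitempty"`
	EBPF  FlowCollectorEBPF  `json:"ebpf,omitempty"`
}

// Sub-structs elided here; see the diff fragments quoted in the review below.
type (
	FlowCollectorKafka struct{}
	FlowCollectorIPFIX struct{}
	FlowCollectorEBPF  struct{}
)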

Smaller changes:

  • "flp.prometheus" option renamed to "flp.metricsServer", trying to be
    more agnositc (it's actually openmetrics/opentelemetry standard) and
    not make user thinks it actually communicates with prometheus
  • Refactored the main controller tests to adapt to new deployment
    options
  • Make "FetchAll" logs less verbose

@jotak added the breaking-change label ("This pull request has breaking changes. They should be described in the PR description.") Sep 21, 2022
@jotak (Member, Author) commented Sep 21, 2022:

You can check just the first commit (9599fd4) to review the FLP reconciler refactoring separately.

@mariomac (Contributor) left a comment:

Overall, great job! I have a few comments.

@@ -4,12 +4,19 @@ metadata:
name: cluster
spec:
namespace: netobserv
deploymentType: DIRECT
Contributor:
The name deploymentType sounds a bit ambiguous to me. Maybe communication?

Member Author:

I'm not a fan of deploymentType, but maybe not communication either... let's get more ideas :)

Contributor:

What about optionalComponents, where we could list Kafka, but also Prometheus and even Loki in the future if it becomes optional? It would be very flexible and would allow any new options.

If we really want to be explicit, we can simply use components with values like Agent-Processor and Agent-Kafka-Processor.

Contributor:

Some ideas: wiring, connect, messaging, linking...

Contributor:

transfer...

@@ -4,12 +4,19 @@ metadata:
name: cluster
spec:
namespace: netobserv
deploymentType: DIRECT
agent:
type: EBPF
Contributor:

Not 100% related to this task, but I was thinking about the case where a customer would like to override some properties of the eBPF agent, which is the only supported agent downstream.

They would provide a YAML fragment like:

agent:
  ebpf:
    sampling: 1

With ebpf being the only section they should use, wouldn't it be cleaner to:

  • remove agent.type
  • move all the ebpf settings up as properties of the agent section
  • find a "hidden" place for the ipfix section that would still allow overriding the eBPF agent, without making our users "pay" the price of a discriminated union.

Another option could be to move into agent the properties that are common to all agents (sampling, cacheMaxLength, cacheMaxTimeout...) and keep implementation details as subsections of ebpf (the image, the image pull policy, etc.), as in the sketch below.
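A hedged Go sketch of that second option, for illustration only (none of this is in the PR; field names and types are assumptions):

// Alternative floated in review: hoist agent-agnostic settings to the agent
// level, keep implementation details nested. Not the shape that was merged.
type FlowCollectorAgent struct {
	// Common to all agents, defined once.
	Sampling        int32  `json:"sampling,omitempty"`
	CacheMaxLength  int32  `json:"cacheMaxLength,omitempty"`
	CacheMaxTimeout string `json:"cacheMaxTimeout,omitempty"`

	// Implementation details (image, image pull policy, ...) stay per-agent.
	EBPF  FlowCollectorEBPF  `json:"ebpf,omitempty"`
	IPFIX FlowCollectorIPFIX `json:"ipfix,omitempty"`
}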

@OlivierCazade (Contributor) commented Sep 21, 2022:

(Commenting here for the third time because I am reading this too fast...)

My understanding is that we are finally keeping both agents downstream, but we are going to document clearly that only the eBPF agent is officially supported.

But maybe we could move to the agent section the fields that are relevant to both agents:

agent:
   sampling: 1
   ebpf:
      ...
   ipfix:
      ...

Member Author:

They are not shared fields, and I don't think we want to make them shared, for instance because we want different default values (eBPF supports a higher sampling rate than IPFIX, for instance).

@OlivierCazade (Contributor) left a comment:

The feedback we had during the CRD API review was also not to use booleans in the API; are we going to change this part?

I only reviewed the CRD change, not the impacted code. I will continue the review tomorrow.

//+kubebuilder:default:=1
// consumerReplicas defines the number of replicas (pods) to start for flowlogs-pipeline-transformer, which consumes Kafka messages.
// This setting is ignored when Kafka is disabled.
ConsumerReplicas int32 `json:"consumerReplicas,omitempty"`
Contributor:

I understand the logic of moving the field here, but I think it would be better to leave it in the FLP section:

  • while this field only applies when Kafka is enabled, it applies to FLP; maybe we could give it a better name there, like "KafkaConsumerReplicas"
  • we would be more consistent with the console plugin section, which also has this kind of field

Member Author:

done

// consumerAutoscaler spec of a horizontal pod autoscaler to set up for flowlogs-pipeline-transformer, which consumes Kafka messages.
// This setting is ignored when Kafka is disabled.
// +optional
ConsumerAutoscaler *FlowCollectorHPA `json:"consumerAutoscaler,omitempty"`
Contributor:

Same comment here

AgentIPFIX = "IPFIX"
AgentEBPF = "EBPF"
DeploymentTypeDirect = "DIRECT"
DeploymentTypeKafka = "KAFKA"
Contributor:

Maybe call them DeploymentModelSimple and DeploymentModelKafka?

Member Author:

I think we will need to organize a vote, because everyone has different ideas :)
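For reference, the rename that eventually landed (see "s/DeploymentType/DeploymentModel" in the commit list below) would make these constants look roughly as follows; the exact identifiers are an assumption:

AgentIPFIX            = "IPFIX"
AgentEBPF             = "EBPF"
DeploymentModelDirect = "DIRECT"
DeploymentModelKafka  = "KAFKA"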

//+kubebuilder:default:=1
// consumerReplicas defines the number of replicas (pods) to start for flowlogs-pipeline-transformer, which consumes Kafka messages.
// This setting is ignored when Kafka is disabled.
ConsumerReplicas int32 `json:"consumerReplicas,omitempty"`
Contributor:

Suggest FLPConsumerReplicas

// consumerAutoscaler spec of a horizontal pod autoscaler to set up for flowlogs-pipeline-transformer, which consumes Kafka messages.
// This setting is ignored when Kafka is disabled.
// +optional
ConsumerAutoscaler *FlowCollectorHPA `json:"consumerAutoscaler,omitempty"`
Contributor:

Are you sure you want to change HPA to Autoscaler everywhere? There is also the VPA (Vertical Pod Autoscaler), so HPA makes it clearer.

Member Author:

I can roll it back, but it was to avoid acronyms that not everybody may know. And the full name "horizontalPodAutoscaler" is... very long :/

cases as it offers better performance and should work regardless
of the CNI installed on the cluster. "IPFIX" works with OVN-Kubernetes
CNI (other CNIs could work if they support exporting IPFIX,
but they would require manual configuration).
enum:
- IPFIX
- EBPF
Contributor:

If this is the order in the UI dropdown, then put EBPF first.

to start for flowlogs-pipeline-transformer, which consumes Kafka
messages. This setting is ignored when Kafka is disabled.
format: int32
minimum: 0
Contributor:

Shouldn't this be 1?

enable:
default: false
description: enable TLS
type: boolean
Contributor:

Looks like TLS has a similar situation with "enable" as Kafka had. Might be okay though.

Member Author:

I'll check the remaining booleans in a follow-up.

@jotak (Member, Author) commented Sep 22, 2022:

I'll review the boolean fields, as @OlivierCazade and @stleerh pointed out.
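For context, Kubernetes API conventions favor string enums over booleans, since an enum can grow a third value later without a breaking change. A hedged sketch of what replacing the TLS "enable" boolean could look like (type and field names are illustrative, not from this PR):

// Hypothetical replacement for `enable: true/false` on the TLS config;
// DISABLED/ENABLED could later be joined by e.g. a mutual-TLS mode.
type ClientTLS struct {
	//+kubebuilder:validation:Enum=DISABLED;ENABLED
	//+kubebuilder:default:=DISABLED
	Mode string `json:"mode,omitempty"`
}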

Decoupling the inner "single reconcilers" into different files and
structs:
  - Main entry point is still flp_reconciler.go, which now only
    initialises and calls inner reconcilers
  - ingest/transfo/monolith inner reconcilers do their own stuff in a
    decoupled way, with their own "*_objects" files
  - "flp_common_objects" is like a common lib for the specialized
    objects files
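An illustrative Go sketch of the delegation pattern this commit message describes; every type and method name below is an assumption, not the actual code (imports of context and the flows API are elided):

// flp_reconciler.go (sketch): the entry point only initialises the inner
// reconcilers and fans out to them. Each one builds its own objects in its
// "*_objects" file; flp_common_objects is the shared helper layer.
type FLPReconciler struct {
	// e.g. the ingester, transformer and monolith reconcilers
	subReconcilers []subReconciler
}

type subReconciler interface {
	Reconcile(ctx context.Context, desired *flowsv1alpha1.FlowCollector) error
}

func (r *FLPReconciler) Reconcile(ctx context.Context, desired *flowsv1alpha1.FlowCollector) error {
	for _, sub := range r.subReconcilers {
		if err := sub.Reconcile(ctx, desired); err != nil {
			return err
		}
	}
	return nil
}
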
Main items:
- NETOBSERV-555: "spec.flowlogsPipeline.kind" is removed, assuming
  DaemonSet
- processor.replicas/hpa are now
  kafka.consumerReplicas/consumerAutoscaler (hpa renamed autoscaler,
also for console)
- NETOBSERV-552: "ovn-kubernetes" and "cno" options moved into IPFIX section
- IPFIX documented as a legacy option
- NETOBSERV-544: remove kafka.enable option. Instead, use a top-level
  enum for deployment mode: DIRECT or KAFKA

Smaller changes:
- "flp.prometheus" option renamed to "flp.metricsServer", trying to be
  more agnositc (it's actually openmetrics/opentelemetry standard) and
not make user thinks it actually communicates with prometheus
- Refactored the main controller tests to adapt to new deployment
  options
- Make "FetchAll" logs less verbose
- move consumer replicas/hpa to FLP section rather than kafka
- s/DeploymentType/DeploymentModel
@mariomac (Contributor) left a comment:

LGTM. A minor change suggestion

api/v1alpha1/flowcollector_types.go (outdated; resolved)
Co-authored-by: Mario Macias <mmaciasl@redhat.com>
@openshift-ci (bot) removed the lgtm label Sep 27, 2022
@OlivierCazade (Contributor) left a comment:

LGTM, thanks!

@Amoghrd (Contributor) commented Sep 27, 2022:

/ok-to-test

@openshift-ci (bot) added the ok-to-test label ("To set manually when a PR is safe to test. Triggers image build on PR.") Sep 27, 2022
@github-actions (bot) commented:
New image: ["quay.io/netobserv/network-observability-operator:eeb83b9"]. It will expire after two weeks.

@openshift-ci (bot) removed the lgtm label Sep 28, 2022
@openshift-ci (bot) commented Sep 28, 2022:

New changes are detected. LGTM label has been removed.

@github-actions (bot) removed the ok-to-test label Sep 28, 2022
@jotak (Member, Author) commented Sep 28, 2022:

/approve

@openshift-ci (bot) commented Sep 28, 2022:

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: jotak

Approvers can indicate their approval by writing /approve in a comment, and cancel it by writing /approve cancel.

@jotak merged commit 919f6a0 into netobserv:main on Sep 28, 2022.
KalmanMeth pushed a commit to KalmanMeth/network-observability-operator that referenced this pull request on Feb 13, 2023.
Labels: approved, breaking-change