diff --git a/README.md b/README.md
index 0f0c9c367..bac9b798b 100644
--- a/README.md
+++ b/README.md
@@ -20,8 +20,6 @@ After the operator is installed, create a `FlowCollector` resource:
 ![OpenShift OperatorHub FlowCollector](./docs/assets/operatorhub-flowcollector.png)
-> Note: by default, NetObserv configures [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) for IPFIX exports. If you are not using OVN-Kubernetes as your CNI, then configure `FlowCollector` to use the eBPF agent instead, unless you have a device such as an OVS in your network that you want to export IPFIX flows. To use the eBPF agent, set `Agent` to `ebpf`.
-
 Refer to the [Configuration section](#configuration) of this document.
 ### Install from repository
@@ -45,8 +43,6 @@ make deploy-sample-cr
 Alternatively, you can [grab and edit](./config/samples/flows_v1alpha1_flowcollector.yaml) this config before installing it.
-> Note: by default, NetObserv configures [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) for IPFIX exports. If you are not using OVN-Kubernetes as your CNI, then configure `FlowCollector` to use the eBPF agent instead, unless you have a device such as an OVS in your network that you want to export IPFIX flows. To use the eBPF agent, set `spec.agent` to `ebpf`.
-
 You can still edit the `FlowCollector` after it's installed: the operator will take care about reconciling everything with the updated configuration:
 ```bash
@@ -111,9 +107,9 @@ As it operates cluster-wide, only a single `FlowCollector` is allowed, and it ha
 A couple of settings deserve special attention:
-- Agent (`spec.agent`) can be `ipfix` or `ebpf`. As mentioned above, the IPFIX option is fully functional when using OVN-Kubernetes CNI. Other CNIs are not supported, but you may still be able to configure them manually if they allow IPFIX exports, whereas eBPF is expected to work regardless of the running CNI.
+- Agent (`spec.agent`) can be `ebpf` (default) or `ipfix`. eBPF is recommended, as it should work in more situations and offers better performance. If you can't or don't want to use eBPF, note that the IPFIX option is fully functional only when using OVN-Kubernetes CNI. Other CNIs are not officially supported, but you may still be able to configure them manually if they allow IPFIX exports.
-- Sampling (`spec.ipfix.sampling` and `spec.ebpf.sampling`): 24/7 unsampled flow collection may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still often necessary to mitigate by setting a sampling ratio. A value of `100` means: one flow every 100 is sampled. `1` means no sampling. The lower it is, the more accurate are flows and derived metrics. By default, sampling is set to 400 for IPFIX, and is disabled for eBPF.
+- Sampling (`spec.ebpf.sampling` and `spec.ipfix.sampling`): 24/7, 1:1 sampled flow collection may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still sometimes necessary to mitigate by setting a sampling ratio. A value of `100` means that one flow out of every 100 is sampled. `1` means all flows are sampled. The lower it is, the more flows you get, and the more accurate the derived metrics are. By default, sampling is set to 50 (i.e. 1:50) for eBPF and 400 (1:400) for IPFIX. Note that more sampled flows also mean more storage is needed. We recommend starting with the default values and refining empirically, to figure out which setting your cluster can manage.
 - Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned in the _Getting Started_ section, but you may have to configure differently if you used another installation method.
@@ -141,23 +137,11 @@ It depends on which `agent` you want to use: `ebpf` or `ipfix`, and whether you
 What matters is the version of the Linux kernel: 4.18 or more is supported. Earlier versions are not tested.
-Other than that, there are no known restrictions yet on the Kubernetes version.
+Other than that, there are no known restrictions on the Kubernetes version.
 #### To use IPFIX exports
-OpenShift 4.10 or above, or upstream OVN-Kubernetes, are recommended, as the operator will configure OVS for you. Otherwise, you need to configure it manually.
-
-For OpenShift 4.8 or 4.9:
-
-* Configure `spec.flowlogsPipeline.kind` to be `Deployment`
-* Run the following:
-
-```bash
-FLP_IP=`kubectl get svc flowlogs-pipeline -n network-observability -ojsonpath='{.spec.clusterIP}'` && echo $FLP_IP
-kubectl patch networks.operator.openshift.io cluster --type='json' -p "[{'op': 'add', 'path': '/spec', 'value': {'exportNetworkFlows': {'ipfix': { 'collectors': ['$FLP_IP:2055']}}}}]"
-```
-
-OpenShift versions older than 4.8 don't support IPFIX exports.
+OpenShift 4.10 or above, or upstream OVN-Kubernetes, are recommended, as the operator will configure OVS for you.
 For other CNIs, you need to find out if they can export IPFIX, and configure them accordingly.
@@ -190,7 +174,7 @@ network-observability-plugin-7fb8c5477b-drg2z 1/1 Running 0 43m
 Results may slightly differ depending on the installation method and the `FlowCollector` configuration. At least you should see `flowlogs-pipeline` pods in a `Running` state.
-If you use the eBPF agent in privileged mode (`spec.ebpf.privileged=true`), check also for pods in privileged namespace:
+If you use the eBPF agent, also check for pods in the privileged namespace:
 ```bash
 # Assuming configured namespace is network-observability (default)
@@ -206,11 +190,25 @@ netobserv-ebpf-agent-ldj66 1/1 Running 0 7s
 ```
-Finally, make sure Loki is correctly deployed, and reachable from pods via the URL defined in `spec.loki.url`.
+Finally, make sure Loki is correctly deployed, and reachable from pods via the URL defined in `spec.loki.url`. You can check, for instance, using this command:
+
+```bash
+kubectl exec $(kubectl get pod -l "app=flowlogs-pipeline" -o name) -- curl -G -s "`kubectl get flowcollector cluster -o=jsonpath={.spec.loki.url}`loki/api/v1/query" --data-urlencode 'query={app="netobserv-flowcollector"}' --data-urlencode 'limit=1'
+```
+
+It should return some JSON in this form:
+
+```
+{"status":"success","data":{"resultType":"streams","result":[...],"stats":{...}}}
+```
 ### Everything seems correctly deployed but there isn't any flow showing up
-Wait 10 minutes and check again. When `spec.agent` is `ipfix`, there is sometimes a delay, up to 10 minutes, before the flows appear. This is due to the IPFIX protocol requiring exporter and collector to exchange record template definitions as a preliminary step. The eBPF agent doesn't have such a delay.
+If using IPFIX (i.e. `spec.agent` is `ipfix` in FlowCollector), wait 10 minutes and check again. There is sometimes a delay, up to 10 minutes, before the flows appear. This is due to the IPFIX protocol requiring exporter and collector to exchange record template definitions as a preliminary step. The eBPF agent doesn't have such a delay.
+
+Otherwise, check the logs for any suspicious errors, especially in the `flowlogs-pipeline` pods and the eBPF agent pods. You may also take a look at the Prometheus metrics prefixed with `netobserv_`: they can tell you whether flows are processed, whether errors are reported, etc.
+
+Finally, don't hesitate to [open an issue](https://github.com/netobserv/network-observability-operator/issues).
 ### There is no Network Traffic menu entry in OpenShift Console
diff --git a/api/v1alpha1/flowcollector_types.go b/api/v1alpha1/flowcollector_types.go
index 0798f7e91..9f542cf09 100644
--- a/api/v1alpha1/flowcollector_types.go
+++ b/api/v1alpha1/flowcollector_types.go
@@ -45,9 +45,9 @@ type FlowCollectorSpec struct {
 	Namespace string `json:"namespace,omitempty"`
 	//+kubebuilder:validation:Enum=ipfix;ebpf
-	//+kubebuilder:default:=ipfix
-	// Select the flows tracing agent. Possible values are "ipfix" (default) to use
-	// the IPFIX collector, or "ebpf" to use NetObserv eBPF agent. When using IPFIX with OVN-Kubernetes
+	//+kubebuilder:default:=ebpf
+	// Select the flows tracing agent. Possible values are "ipfix" to use
+	// the IPFIX collector, or "ebpf" (default) to use NetObserv eBPF agent. When using IPFIX with OVN-Kubernetes
 	// CNI, NetObserv will configure OVN's IPFIX exporter. Other CNIs are not supported, they could
 	// work but necessitate manual configuration.
 	Agent string `json:"agent"`
@@ -133,6 +133,8 @@ type FlowCollectorEBPF struct {
 	Resources corev1.ResourceRequirements `json:"resources,omitempty" protobuf:"bytes,8,opt,name=resources"`
 	// Sampling is the sampling rate on the reporter. 100 means one flow on 100 is sent. 0 or 1 means disabled.
+	//+kubebuilder:validation:Minimum=0
+	//+kubebuilder:default:=50
 	//+optional
 	Sampling int32 `json:"sampling,omitempty"`
diff --git a/config/crd/bases/flows.netobserv.io_flowcollectors.yaml b/config/crd/bases/flows.netobserv.io_flowcollectors.yaml
index ed5d6c3c8..28c21b0b8 100644
--- a/config/crd/bases/flows.netobserv.io_flowcollectors.yaml
+++ b/config/crd/bases/flows.netobserv.io_flowcollectors.yaml
@@ -48,9 +48,9 @@ spec:
             description: FlowCollectorSpec defines the desired state of FlowCollector
             properties:
               agent:
-                default: ipfix
+                default: ebpf
                 description: Select the flows tracing agent. Possible values are "ipfix"
-                  (default) to use the IPFIX collector, or "ebpf" to use NetObserv
+                  to use the IPFIX collector, or "ebpf" (default) to use NetObserv
                   eBPF agent. When using IPFIX with OVN-Kubernetes CNI, NetObserv
                   will configure OVN's IPFIX exporter. Other CNIs are not supported,
                   they could work but necessitate manual configuration.
@@ -799,9 +799,11 @@ spec:
                         type: object
                     type: object
                   sampling:
+                    default: 50
                     description: Sampling is the sampling rate on the reporter. 100
                       means one flow on 100 is sent. 0 or 1 means disabled.
                     format: int32
+                    minimum: 0
                     type: integer
                 type: object
               flowlogsPipeline:
diff --git a/config/manifests/bases/netobserv-operator.clusterserviceversion.yaml b/config/manifests/bases/netobserv-operator.clusterserviceversion.yaml
index 467bbbcea..ccbc7d538 100644
--- a/config/manifests/bases/netobserv-operator.clusterserviceversion.yaml
+++ b/config/manifests/bases/netobserv-operator.clusterserviceversion.yaml
@@ -27,8 +27,6 @@ spec:
     The operator provides dashboards, metrics, and keeps flows accessible in a queryable log store, Grafana Loki. When used in OpenShift, new dashboards are available in the Console.
-    This is an early release, we would be grateful if you could inform us of any issues.
-
     ## Dependencies
     - [Loki](https://grafana.com/oss/loki/) is required, it is used as a store for all collected flows.
@@ -51,9 +49,9 @@ spec:
     A couple of settings deserve special attention:
-    - Agent (`spec.agent`) can be `ipfix` or `ebpf`. The IPFIX option is fully functional when using [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) CNI. Other CNIs are not supported, but you may still be able to configure them manually if they allow IPFIX exports, whereas eBPF is expected to work regardless of the running CNI.
+    - Agent (`spec.agent`) can be `ebpf` or `ipfix`. eBPF is recommended, as it should work in more situations and offers better performance. If you can't or don't want to use eBPF, note that the IPFIX option is fully functional only when using [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) CNI. Other CNIs are not officially supported, but you may still be able to configure them manually if they allow IPFIX exports.
-    - Sampling (`spec.ipfix.sampling` and `spec.ebpf.sampling`): 24/7 unsampled flow collection may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still often necessary to mitigate by setting a sampling ratio. A value of `100` means: one flow every 100 is sampled. `1` means no sampling. The lower it is, the more accurate are flows and derived metrics. By default, sampling is set to 400 for IPFIX, and is disabled for eBPF.
+    - Sampling (`spec.ebpf.sampling` and `spec.ipfix.sampling`): 24/7, 1:1 sampled flow collection may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still sometimes necessary to mitigate by setting a sampling ratio. A value of `100` means that one flow out of every 100 is sampled. `1` means all flows are sampled. The lower it is, the more flows you get, and the more accurate the derived metrics are. By default, sampling is set to 50 (i.e. 1:50) for eBPF and 400 (1:400) for IPFIX. Note that more sampled flows also mean more storage is needed. We recommend starting with the default values and refining empirically, to figure out which setting your cluster can manage.
     - Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned above, but you may have to configure differently if you used another installation method.
diff --git a/config/samples/flows_v1alpha1_flowcollector.yaml b/config/samples/flows_v1alpha1_flowcollector.yaml
index 5a7823d45..3c48c4d8c 100644
--- a/config/samples/flows_v1alpha1_flowcollector.yaml
+++ b/config/samples/flows_v1alpha1_flowcollector.yaml
@@ -4,7 +4,7 @@ metadata:
   name: cluster
 spec:
   namespace: "network-observability"
-  agent: ipfix
+  agent: ebpf
   ipfix:
     cacheActiveTimeout: 20s
     cacheMaxFlows: 400
@@ -12,7 +12,7 @@ spec:
   ebpf:
     image: 'quay.io/netobserv/netobserv-ebpf-agent:main'
     imagePullPolicy: IfNotPresent
-    sampling: 0
+    sampling: 50
     cacheActiveTimeout: 5s
     cacheMaxFlows: 1000
     interfaces: []
diff --git a/docs/FlowCollector.md b/docs/FlowCollector.md
index 8cee0f5c8..1c45a7595 100644
--- a/docs/FlowCollector.md
+++ b/docs/FlowCollector.md
@@ -87,10 +87,10 @@ FlowCollectorSpec defines the desired state of FlowCollector
 agent
 enum
- Select the flows tracing agent. Possible values are "ipfix" (default) to use the IPFIX collector, or "ebpf" to use NetObserv eBPF agent. When using IPFIX with OVN-Kubernetes CNI, NetObserv will configure OVN's IPFIX exporter. Other CNIs are not supported, they could work but necessitate manual configuration.
+ Select the flows tracing agent. Possible values are "ipfix" to use the IPFIX collector, or "ebpf" (default) to use NetObserv eBPF agent. When using IPFIX with OVN-Kubernetes CNI, NetObserv will configure OVN's IPFIX exporter. Other CNIs are not supported, they could work but necessitate manual configuration.

Enum: ipfix, ebpf
- Default: ipfix
+ Default: ebpf
 true
@@ -1389,6 +1389,8 @@ Settings related to eBPF-based flow reporter when the "agent" property is set to
 Sampling is the sampling rate on the reporter. 100 means one flow on 100 is sent. 0 or 1 means disabled.

Format: int32
+ Default: 50
+ Minimum: 0
false
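
Review note: the sampling semantics documented in this change ("100 means one flow on 100 is sent. 0 or 1 means disabled", with a new default of 50 for eBPF and 400 for IPFIX) can be illustrated with a minimal Go sketch. The helper names `effectiveSampling` and `expectedSampled` are hypothetical and are not part of the operator or the agent; the sketch only restates the documented ratio arithmetic and assumes that a disabled sampler keeps every flow.

```go
package main

import "fmt"

// effectiveSampling is a hypothetical helper (not operator code). It maps a
// FlowCollector sampling value to the N of a 1:N ratio. Per the field
// description, 0 or 1 means sampling is disabled, which this sketch treats
// as 1:1 (every flow is kept).
func effectiveSampling(sampling int32) int32 {
	if sampling <= 1 {
		return 1
	}
	return sampling
}

// expectedSampled is a hypothetical helper that estimates how many flows out
// of total would be reported under a 1:N sampling ratio (integer division).
func expectedSampled(total int64, sampling int32) int64 {
	return total / int64(effectiveSampling(sampling))
}

func main() {
	// 50 is the new eBPF default and 400 the IPFIX default in this change.
	for _, s := range []int32{0, 1, 50, 400} {
		fmt.Printf("sampling=%d: about %d of 100000 flows reported\n",
			s, expectedSampled(100000, s))
	}
}
```

With these assumptions, moving eBPF from "disabled" to the 1:50 default cuts the reported flows, and hence the Loki storage they need, by roughly a factor of 50.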