Commit

NETOBSERV-467 Updated defaults (eBPF / sampling), update doc
- Default agent is eBPF
- Default eBPF sampling is 50
- Update doc to align with these changes
- Add more hints in doc about sampling
- In doc's troubleshooting, stop mentioning the workarounds for
  OpenShift 4.8 / 4.9: they'll be obsolete when we remove the Deployment
  kind option for FLP
- Add a couple of troubleshooting info
jotak committed Sep 2, 2022
1 parent d30306f commit d4638c3
Showing 6 changed files with 38 additions and 36 deletions.
44 changes: 21 additions & 23 deletions README.md
Expand Up @@ -20,8 +20,6 @@ After the operator is installed, create a `FlowCollector` resource:

![OpenShift OperatorHub FlowCollector](./docs/assets/operatorhub-flowcollector.png)

> Note: by default, NetObserv configures [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) for IPFIX exports. If you are not using OVN-Kubernetes as your CNI, then configure `FlowCollector` to use the eBPF agent instead, unless you have a device such as an OVS in your network that you want to export IPFIX flows. To use the eBPF agent, set `Agent` to `ebpf`.
Refer to the [Configuration section](#configuration) of this document.

### Install from repository
Expand All @@ -45,8 +43,6 @@ make deploy-sample-cr

Alternatively, you can [grab and edit](./config/samples/flows_v1alpha1_flowcollector.yaml) this config before installing it.

> Note: by default, NetObserv configures [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) for IPFIX exports. If you are not using OVN-Kubernetes as your CNI, then configure `FlowCollector` to use the eBPF agent instead, unless you have a device such as an OVS in your network that you want to export IPFIX flows. To use the eBPF agent, set `spec.agent` to `ebpf`.
You can still edit the `FlowCollector` after it's installed: the operator will take care about reconciling everything with the updated configuration:

```bash
Expand Down Expand Up @@ -111,9 +107,9 @@ As it operates cluster-wide, only a single `FlowCollector` is allowed, and it ha

A couple of settings deserve special attention:

- Agent (`spec.agent`) can be `ipfix` or `ebpf`. As mentioned above, the IPFIX option is fully functional when using OVN-Kubernetes CNI. Other CNIs are not supported, but you may still be able to configure them manually if they allow IPFIX exports, whereas eBPF is expected to work regardless of the running CNI.
- Agent (`spec.agent`) can be `ebpf` (default) or `ipfix`. eBPF is recommended, as it should work in more situations and offers better performance. If you can't or don't want to use eBPF, note that the IPFIX option is fully functional only when using the OVN-Kubernetes CNI. Other CNIs are not officially supported, but you may still be able to configure them manually if they allow IPFIX exports.

- Sampling (`spec.ipfix.sampling` and `spec.ebpf.sampling`): 24/7 unsampled flow collection may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still often necessary to mitigate by setting a sampling ratio. A value of `100` means: one flow every 100 is sampled. `1` means no sampling. The lower it is, the more accurate are flows and derived metrics. By default, sampling is set to 400 for IPFIX, and is disabled for eBPF.
- Sampling (`spec.ebpf.sampling` and `spec.ipfix.sampling`): collecting flows 24/7 with 1:1 sampling may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still sometimes necessary to mitigate this by setting a sampling ratio. A value of `100` means that one flow out of every 100 is sampled; `1` means all flows are sampled. The lower the value, the more flows you get and the more accurate the derived metrics are. By default, sampling is set to 50 (i.e. 1:50) for eBPF and 400 (1:400) for IPFIX. Note that more sampled flows also mean more storage needed. We recommend starting with the default values and refining empirically to figure out which setting your cluster can manage.

- Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned in the _Getting Started_ section, but you may have to configure differently if you used another installation method.

Expand Down Expand Up @@ -141,23 +137,11 @@ It depends on which `agent` you want to use: `ebpf` or `ipfix`, and whether you

What matters is the version of the Linux kernel: 4.18 or more is supported. Earlier versions are not tested.

Other than that, there are no known restrictions yet on the Kubernetes version.
Other than that, there are no known restrictions on the Kubernetes version.

#### To use IPFIX exports

OpenShift 4.10 or above, or upstream OVN-Kubernetes, are recommended, as the operator will configure OVS for you. Otherwise, you need to configure it manually.

For OpenShift 4.8 or 4.9:

* Configure `spec.flowlogsPipeline.kind` to be `Deployment`
* Run the following:

```bash
FLP_IP=`kubectl get svc flowlogs-pipeline -n network-observability -ojsonpath='{.spec.clusterIP}'` && echo $FLP_IP
kubectl patch networks.operator.openshift.io cluster --type='json' -p "[{'op': 'add', 'path': '/spec', 'value': {'exportNetworkFlows': {'ipfix': { 'collectors': ['$FLP_IP:2055']}}}}]"
```

OpenShift versions older than 4.8 don't support IPFIX exports.
OpenShift 4.10 or above, or upstream OVN-Kubernetes, are recommended, as the operator will configure OVS for you.

For other CNIs, you need to find out if they can export IPFIX, and configure them accordingly.

Expand Down Expand Up @@ -190,7 +174,7 @@ network-observability-plugin-7fb8c5477b-drg2z 1/1 Running 0 43m

Results may slightly differ depending on the installation method and the `FlowCollector` configuration. At least you should see `flowlogs-pipeline` pods in a `Running` state.

If you use the eBPF agent in privileged mode (`spec.ebpf.privileged=true`), check also for pods in privileged namespace:
If you use the eBPF agent, check also for pods in privileged namespace:

```bash
# Assuming configured namespace is network-observability (default)
Expand All @@ -206,11 +190,25 @@ netobserv-ebpf-agent-ldj66 1/1 Running 0 7s
```

Finally, make sure Loki is correctly deployed, and reachable from pods via the URL defined in `spec.loki.url`.
Finally, make sure Loki is correctly deployed, and reachable from pods via the URL defined in `spec.loki.url`. You can for instance check using this command:

```bash
kubectl exec $(kubectl get pod -l "app=flowlogs-pipeline" -o name) -- curl -G -s "`kubectl get flowcollector cluster -o=jsonpath={.spec.loki.url}`loki/api/v1/query" --data-urlencode 'query={app="netobserv-flowcollector"}' --data-urlencode 'limit=1'
```

It should return some json in this form:

```
{"status":"success","data":{"resultType":"streams","result":[...],"stats":{...}}}
```
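
If you want to script this check, the response shape shown above can be validated with a few lines of Go. This is only a sketch: `lokiResponse` and `checkLokiResponse` are hypothetical names that are not part of the operator, and the sample body is a hand-written stand-in for a real Loki reply with an empty `result` list.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lokiResponse mirrors only the fields visible in the README example.
// Hypothetical type for illustration; other fields (e.g. "stats") are ignored.
type lokiResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string            `json:"resultType"`
		Result     []json.RawMessage `json:"result"`
	} `json:"data"`
}

// checkLokiResponse returns the number of result entries when the body is
// valid JSON with status "success", and an error otherwise.
func checkLokiResponse(body []byte) (int, error) {
	var r lokiResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, fmt.Errorf("invalid JSON: %w", err)
	}
	if r.Status != "success" {
		return 0, fmt.Errorf("unexpected status %q", r.Status)
	}
	return len(r.Data.Result), nil
}

func main() {
	// Hand-written sample, assumed to resemble a reply with no stored flows yet.
	sample := []byte(`{"status":"success","data":{"resultType":"streams","result":[],"stats":{}}}`)
	n, err := checkLokiResponse(sample)
	fmt.Println(n, err)
}
```

A successful status with zero results suggests that Loki is reachable but has not stored any flow yet, which points to the agent or `flowlogs-pipeline` rather than to Loki itself.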

### Everything seems correctly deployed but there isn't any flow showing up

Wait 10 minutes and check again. When `spec.agent` is `ipfix`, there is sometimes a delay, up to 10 minutes, before the flows appear. This is due to the IPFIX protocol requiring exporter and collector to exchange record template definitions as a preliminary step. The eBPF agent doesn't have such a delay.
If using IPFIX (i.e. `spec.agent` is `ipfix` in `FlowCollector`), wait 10 minutes and check again. There is sometimes a delay of up to 10 minutes before the flows appear. This is because the IPFIX protocol requires the exporter and collector to exchange record template definitions as a preliminary step. The eBPF agent doesn't have such a delay.

Otherwise, check for any suspicious errors in the logs, especially in the `flowlogs-pipeline` pods and the eBPF agent pods. You may also take a look at the Prometheus metrics prefixed with `netobserv_`: they can give you clues about whether flows are processed, whether errors are reported, etc.

Finally, don't hesitate to [open an issue](https://github.com/netobserv/network-observability-operator/issues).

### There is no Network Traffic menu entry in OpenShift Console

Expand Down
8 changes: 5 additions & 3 deletions api/v1alpha1/flowcollector_types.go
Expand Up @@ -45,9 +45,9 @@ type FlowCollectorSpec struct {
Namespace string `json:"namespace,omitempty"`

//+kubebuilder:validation:Enum=ipfix;ebpf
//+kubebuilder:default:=ipfix
// Select the flows tracing agent. Possible values are "ipfix" (default) to use
// the IPFIX collector, or "ebpf" to use NetObserv eBPF agent. When using IPFIX with OVN-Kubernetes
//+kubebuilder:default:=ebpf
// Select the flows tracing agent. Possible values are "ipfix" to use
// the IPFIX collector, or "ebpf" (default) to use NetObserv eBPF agent. When using IPFIX with OVN-Kubernetes
// CNI, NetObserv will configure OVN's IPFIX exporter. Other CNIs are not supported, they could
// work but necessitate manual configuration.
Agent string `json:"agent"`
Expand Down Expand Up @@ -133,6 +133,8 @@ type FlowCollectorEBPF struct {
Resources corev1.ResourceRequirements `json:"resources,omitempty" protobuf:"bytes,8,opt,name=resources"`

// Sampling is the sampling rate on the reporter. 100 means one flow on 100 is sent. 0 or 1 means disabled.
//+kubebuilder:validation:Minimum=0
//+kubebuilder:default:=50
//+optional
Sampling int32 `json:"sampling,omitempty"`
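
To make the sampling semantics concrete, here is a minimal sketch of how a `Sampling` value maps to the expected share of flows that are kept. It assumes the behavior stated in the field comment (1:N sampling, with 0 or 1 meaning that sampling is disabled, so every flow is reported); `keptFraction` is a hypothetical helper and not a function of the operator or the agent.

```go
package main

import "fmt"

// keptFraction returns the expected share of flows reported for a given
// sampling value. Hypothetical helper: it assumes 0 or 1 disables sampling
// (every flow is kept) and N > 1 keeps roughly one flow out of N.
func keptFraction(sampling int32) float64 {
	if sampling <= 1 {
		return 1.0
	}
	return 1.0 / float64(sampling)
}

func main() {
	// 50 and 400 are the eBPF and IPFIX defaults mentioned in this commit.
	for _, s := range []int32{0, 1, 50, 400} {
		fmt.Printf("sampling=%d keeps about %.2f%% of flows\n", s, 100*keptFraction(s))
	}
}
```

Under these assumptions, moving from the IPFIX default of 400 to the eBPF default of 50 keeps roughly eight times more flows, with storage needs growing accordingly.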

Expand Down
6 changes: 4 additions & 2 deletions config/crd/bases/flows.netobserv.io_flowcollectors.yaml
Expand Up @@ -48,9 +48,9 @@ spec:
description: FlowCollectorSpec defines the desired state of FlowCollector
properties:
agent:
default: ipfix
default: ebpf
description: Select the flows tracing agent. Possible values are "ipfix"
(default) to use the IPFIX collector, or "ebpf" to use NetObserv
to use the IPFIX collector, or "ebpf" (default) to use NetObserv
eBPF agent. When using IPFIX with OVN-Kubernetes CNI, NetObserv
will configure OVN's IPFIX exporter. Other CNIs are not supported,
they could work but necessitate manual configuration.
Expand Down Expand Up @@ -799,9 +799,11 @@ spec:
type: object
type: object
sampling:
default: 50
description: Sampling is the sampling rate on the reporter. 100
means one flow on 100 is sent. 0 or 1 means disabled.
format: int32
minimum: 0
type: integer
type: object
flowlogsPipeline:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,6 @@ spec:
The operator provides dashboards, metrics, and keeps flows accessible in a queryable log store, Grafana Loki. When used in OpenShift, new dashboards are available in the Console.
This is an early release, we would be grateful if you could inform us of any issues.
## Dependencies
- [Loki](https://grafana.com/oss/loki/) is required, it is used as a store for all collected flows.
Expand All @@ -51,9 +49,9 @@ spec:
A couple of settings deserve special attention:
- Agent (`spec.agent`) can be `ipfix` or `ebpf`. The IPFIX option is fully functional when using [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) CNI. Other CNIs are not supported, but you may still be able to configure them manually if they allow IPFIX exports, whereas eBPF is expected to work regardless of the running CNI.
- Agent (`spec.agent`) can be `ebpf` or `ipfix`. eBPF is recommended, as it should work in more situations and offers better performance. If you can't or don't want to use eBPF, note that the IPFIX option is fully functional only when using the [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) CNI. Other CNIs are not officially supported, but you may still be able to configure them manually if they allow IPFIX exports.
- Sampling (`spec.ipfix.sampling` and `spec.ebpf.sampling`): 24/7 unsampled flow collection may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still often necessary to mitigate by setting a sampling ratio. A value of `100` means: one flow every 100 is sampled. `1` means no sampling. The lower it is, the more accurate are flows and derived metrics. By default, sampling is set to 400 for IPFIX, and is disabled for eBPF.
- Sampling (`spec.ebpf.sampling` and `spec.ipfix.sampling`): collecting flows 24/7 with 1:1 sampling may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still sometimes necessary to mitigate this by setting a sampling ratio. A value of `100` means that one flow out of every 100 is sampled; `1` means all flows are sampled. The lower the value, the more flows you get and the more accurate the derived metrics are. By default, sampling is set to 50 (i.e. 1:50) for eBPF and 400 (1:400) for IPFIX. Note that more sampled flows also mean more storage needed. We recommend starting with the default values and refining empirically to figure out which setting your cluster can manage.
- Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned above, but you may have to configure differently if you used another installation method.
Expand Down
4 changes: 2 additions & 2 deletions config/samples/flows_v1alpha1_flowcollector.yaml
Expand Up @@ -4,15 +4,15 @@ metadata:
name: cluster
spec:
namespace: "network-observability"
agent: ipfix
agent: ebpf
ipfix:
cacheActiveTimeout: 20s
cacheMaxFlows: 400
sampling: 400
ebpf:
image: 'quay.io/netobserv/netobserv-ebpf-agent:main'
imagePullPolicy: IfNotPresent
sampling: 0
sampling: 50
cacheActiveTimeout: 5s
cacheMaxFlows: 1000
interfaces: []
Expand Down
6 changes: 4 additions & 2 deletions docs/FlowCollector.md
Expand Up @@ -87,10 +87,10 @@ FlowCollectorSpec defines the desired state of FlowCollector
<td><b>agent</b></td>
<td>enum</td>
<td>
Select the flows tracing agent. Possible values are "ipfix" (default) to use the IPFIX collector, or "ebpf" to use NetObserv eBPF agent. When using IPFIX with OVN-Kubernetes CNI, NetObserv will configure OVN's IPFIX exporter. Other CNIs are not supported, they could work but necessitate manual configuration.<br/>
Select the flows tracing agent. Possible values are "ipfix" to use the IPFIX collector, or "ebpf" (default) to use NetObserv eBPF agent. When using IPFIX with OVN-Kubernetes CNI, NetObserv will configure OVN's IPFIX exporter. Other CNIs are not supported, they could work but necessitate manual configuration.<br/>
<br/>
<i>Enum</i>: ipfix, ebpf<br/>
<i>Default</i>: ipfix<br/>
<i>Default</i>: ebpf<br/>
</td>
<td>true</td>
</tr><tr>
Expand Down Expand Up @@ -1389,6 +1389,8 @@ Settings related to eBPF-based flow reporter when the "agent" property is set to
Sampling is the sampling rate on the reporter. 100 means one flow on 100 is sent. 0 or 1 means disabled.<br/>
<br/>
<i>Format</i>: int32<br/>
<i>Default</i>: 50<br/>
<i>Minimum</i>: 0<br/>
</td>
<td>false</td>
</tr></tbody>
Expand Down
