NETOBSERV-1075 Dashboard feedback: split infra / applications #388

jpinsonneau · 2023-07-03T13:17:11Z

moved to embedded json approach instead of FLP for more flexibility
splitted application / infrastructures namespace & workload graphs
grouped by sections

controllers/flowcollector_controller.go

github-actions · 2023-07-04T12:09:39Z

New images:

quay.io/netobserv/network-observability-operator:b1a5d4e
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-b1a5d4e
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-b1a5d4e

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-b1a5d4e
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

jotak · 2023-07-04T12:11:01Z

controllers/flowcollector_controller_test.go

@@ -326,14 +323,6 @@ func flowCollectorControllerSpecs() {
 				Expect(ds.Spec.Template.Spec.Tolerations).
 					To(ContainElement(v1.Toleration{Operator: v1.TolerationOpExists}))
 			})
-
-			By("Expecting the monitoring dashboards configmap to be deleted")


I didn't realize that, but I guess that's the drawback of not using FLP: we loose the ability to tie dashboards with metrics presence. Hence this test not passing, I guess. Can we think about a simple alternative for that? Maybe it requires some discussion?

done in 71da325 with other enhancements described below

jpinsonneau · 2023-07-11T15:54:29Z

Updated this PRs to address all feedback gathered:

tie metrics to dashboards using tags. Check FilterDashboardRows helper
added missing metrics definitions
added / updated missing graphs in dashboards
- moved namespace flows to health dashboard
added a new health flag (default) to ignore extra flows graphs per node / namespace / workload

Here are examples of dashboards with no filters:

jotak · 2023-07-13T08:48:47Z

api/v1beta1/flowcollector_types.go

-	// Available tags are: `egress`, `ingress`, `flows`, `bytes`, `packets`, `namespaces`, `nodes`, `workloads`.
-	//+kubebuilder:default:={"egress","packets"}
+	// Available tags are: `egress`, `ingress`, `flows`, `bytes`, `packets`, `namespaces`, `nodes`, `workloads`, `health`.
+	//+kubebuilder:default:={"egress","packets", "health"}


this would ignore health by default, this isn't what we want?

Maybe a better naming can be found ?

The metrics flagged with health are only flows metrics definitions per nodes / namespaces / workloads

The health dashboard will look like as before by default

(or health is not what I thought it is )

It seems like, at least for now, "health" fully overlaps with "flows" so we could keep just "flows"

github-actions · 2023-07-13T08:51:17Z

New images:

quay.io/netobserv/network-observability-operator:3e2639c
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-3e2639c
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-3e2639c

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:3e2639c make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-3e2639c

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-3e2639c
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

jotak · 2023-07-13T08:59:20Z

controllers/flowlogspipeline/metrics_definitions/namespace_egress_bytes_total.yaml

+      - name: namespace_egress_bytes_total
+        type: counter
+        valuekey: Bytes
+        filter:


After rebasing, you'll get the existing metrics changed like:

filters: - key: FlowDirection value: "1" - key: Duplicate value: "false"

You'll need to make the same change on newly added metrics as well (for all metrics counting bytes or packets)

jotak · 2023-07-13T09:53:37Z

pkg/helper/dashboard.go

+
+import "encoding/json"
+
+func FilterDashboardRows(dashboard string, ignoreFlags []string) (string, error) {


So, if I'm correct, tags are duplicated: in every dashboard row (for dashboards filtering) and in metrics definitions (for FLP confgen). I guess it's ok for now but I think it could be error prone, especially as the dashboard jsons are very noisy and hard to review. So maybe we can think about how would could "secure" this, either for right now or for a follow-up. I can imagine:

modifying FLP confgen so that it gives us the list of excluded and included metrics; then based on that, we would do the dashboards filtering based on a grep (Contains) on the promql expression

or do the same thing without confgen, but it means the operator parsing the metrics definition files

or, since those dashboards are statically defined, we could also just write a unit test that validates that the dashboard tags correctly match the metric defs tags

wdyt?

btw speaking of unit test I think it would be good to have one here that ensures the dashboard files are correctly parsed without errors

yes the first option would be the best. We can dig in that, maybe in a followup ?

The parsing is already ensured by this test
https://github.com/jpinsonneau/network-observability-operator/blob/71da325c421290cf515e8ef6aed34b6b60db0577/controllers/flowcollector_controller_test.go#L336-L348

jotak · 2023-07-13T11:39:18Z

controllers/flowlogspipeline/metrics_definitions/namespace_flows_total.yaml

 tags:
  - flows
  - namespaces
+  - health


I'm struggling with the tag system to be able to get exactly what we want; I think we would need this kind of thing, to allow more fine-grained selection:

Suggested change

tags:

- flows

- namespaces

- health

tags:

- flows

- namespaces

- namespaces-flows

wdyt?
(btw currently I removed health because it fully overlaps with flows .. I don't know if it may become useful again later)

so we will end up with:

nodes-flows

namespaces-flows

workloads-flows

that could be disabled separately for health dashboard. That makes sense to me

controllers/infra_health_dashboard.json

codecov · 2023-07-13T15:24:33Z

Codecov Report

Merging #388 (69b1105) into main (0d6ebe8) will increase coverage by 1.66%.
Report is 1 commits behind head on main.
The diff coverage is 85.42%.

@@            Coverage Diff             @@
##             main     #388      +/-   ##
==========================================
+ Coverage   53.67%   55.33%   +1.66%     
==========================================
  Files          44       45       +1     
  Lines        5559     5824     +265     
==========================================
+ Hits         2984     3223     +239     
- Misses       2359     2381      +22     
- Partials      216      220       +4

Flag	Coverage Δ
unittests	`55.33% <85.42%> (+1.66%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed	Coverage Δ
api/v1alpha1/zz_generated.conversion.go	`0.25% <0.00%> (-0.01%)`	⬇️
api/v1beta1/flowcollector_types.go	`100.00% <ø> (ø)`
controllers/flowlogspipeline/flp_reconciler.go	`67.85% <ø> (-5.32%)`	⬇️
controllers/flowcollector_controller.go	`53.71% <20.00%> (-1.41%)`	⬇️
controllers/flowlogspipeline/flp_common_objects.go	`81.21% <60.00%> (-0.56%)`	⬇️
...ntrollers/flowlogspipeline/flp_monolith_objects.go	`88.46% <75.00%> (ø)`
...ontrollers/flowlogspipeline/flp_transfo_objects.go	`93.68% <75.00%> (ø)`
controllers/flowcollector_objects.go	`46.91% <77.77%> (+9.62%)`	⬆️
pkg/helper/dashboard.go	`96.49% <96.49%> (ø)`
api/v1beta1/zz_generated.deepcopy.go	`55.36% <100.00%> (+1.58%)`	⬆️
... and 5 more

... and 1 file with indirect coverage changes

github-actions · 2023-07-17T07:40:03Z

New images:

quay.io/netobserv/network-observability-operator:e2f53eb
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-e2f53eb
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-e2f53eb

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:e2f53eb make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-e2f53eb

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-e2f53eb
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

github-actions · 2023-07-17T10:27:29Z

New images:

quay.io/netobserv/network-observability-operator:eb2201e
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-eb2201e
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-eb2201e

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:eb2201e make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-eb2201e

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-eb2201e
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

github-actions · 2023-07-21T16:22:51Z

New images:

quay.io/netobserv/network-observability-operator:5493d75
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-5493d75
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-5493d75

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:5493d75 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-5493d75

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-5493d75
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

memodi · 2023-07-21T18:19:12Z

@jpinsonneau @jotak - I am not sure what's the intention of "Application" panel here, based on looking at panels, I find difficult to spot differences between Application and Infrastructure.

I was thinking Application would be something that's NOT netobserv|openshift.* and to have view something like this:

wdyt?

jotak · 2023-07-24T06:59:56Z

Yes, we should exclude netobserv from applications, I agree

Co-authored-by: Joel Takvorian <joel.takvorian@qaraywa.net>

Namespace-based metrics are included in workload's ones, so this would result in unnecesary metrics creation when both of them are enabled So namespaces tags is now disabled by default; to still show namespace-based metrics in dashboard, we need to use the workload metrics aggregated by namespace. So We need to duplicate the related rows in the dashboard.

jotak · 2023-07-24T16:23:11Z

(FYI I just rebased the branch)

Changed for a programmatic approach, to handle more easily json manipulation and avoid the very large (and very redundant) dashboard json Also change from using topk5 to topk10 as it renders better charts (less chunks in timeseries)

Inject namespace for infra filters

jpinsonneau · 2023-07-26T08:40:37Z

Tested back with @jotak changes and works fine:

@memodi note that @jotak excluded netobserv from Applications panels

I only see it appearing for kube api queries (default -> netobserv)

github-actions · 2023-07-26T08:45:50Z

New images:

quay.io/netobserv/network-observability-operator:9f7f8f3
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-9f7f8f3
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-9f7f8f3

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:9f7f8f3 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-9f7f8f3

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-9f7f8f3
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

jotak · 2023-07-26T09:23:38Z

@memodi also another change that I did is to run a top10 instead of top5, because I found top5 has a tendency to often show different metrics for different timestamps, which results in "chunking" more the timeseries. With top10 I feel like the visualization is better / more consistent; sounds good?

(of course it entirely depends on the number and the nature of the workloads that people will be running on their clusters)

memodi · 2023-07-31T16:00:43Z

/ok-to-test

jotak · 2023-07-31T16:02:38Z

/approve
/lgtm

openshift-ci · 2023-07-31T16:02:46Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jotak]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the do-not-merge/work-in-progress label Jul 3, 2023

jpinsonneau force-pushed the 1075 branch from ee52b76 to 60bb1b6 Compare July 4, 2023 10:59

jpinsonneau requested review from jotak and OlivierCazade July 4, 2023 11:07

jpinsonneau marked this pull request as ready for review July 4, 2023 11:07

openshift-ci bot removed the do-not-merge/work-in-progress label Jul 4, 2023

jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 4, 2023

jotak reviewed Jul 4, 2023

View reviewed changes

controllers/flowcollector_controller.go Outdated Show resolved Hide resolved

jotak reviewed Jul 4, 2023

View reviewed changes

github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 4, 2023

jpinsonneau mentioned this pull request Jul 6, 2023

NETOBSERV-1017 Metrics: pod or service talkers #391

Closed

jpinsonneau force-pushed the 1075 branch from b33d21e to 24cc2df Compare July 6, 2023 15:55

jpinsonneau mentioned this pull request Jul 7, 2023

NETOBSERV-140 & NETOBSERV-141 Top pod / service talkers #392

Closed

jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 13, 2023

jotak reviewed Jul 13, 2023

View reviewed changes

controllers/infra_health_dashboard.json Outdated Show resolved Hide resolved

github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 13, 2023

jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 17, 2023

github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 17, 2023

jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 17, 2023

github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 17, 2023

jpinsonneau and others added 9 commits July 24, 2023 18:22

flow metrics dashboard

6d07566

svc monitor check

eb39ed0

rebase fix

36b3613

tied metrics / dashboards

734467b

infra_health_dashboard flows overhead

ef707f2

Co-authored-by: Joel Takvorian <joel.takvorian@qaraywa.net>

fine grained tags

778ebd7

Update bundle

0ca6782

Fix linter issues

67fb951

jotak force-pushed the 1075 branch from 5aa29c1 to 5036d4f Compare July 24, 2023 16:22

openshift-ci bot removed the lgtm label Jul 24, 2023

github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 24, 2023

jotak and others added 3 commits July 24, 2023 18:24

Update metrics definitions

d550e5d

Inject namespace for infra filters

614de49

Changed for a programmatic approach, to handle more easily json manipulation and avoid the very large (and very redundant) dashboard json Also change from using topk5 to topk10 as it renders better charts (less chunks in timeseries)

Merge pull request #1 from jotak/split-infra

69b1105

Inject namespace for infra filters

jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 26, 2023

openshift-ci bot added the lgtm label Jul 31, 2023

openshift-ci bot added the approved label Jul 31, 2023

openshift-merge-robot merged commit fe0ace8 into netobserv:main Jul 31, 2023
9 checks passed

memodi mentioned this pull request Aug 23, 2023

NETOBSERV-1232 "netobserv" NS is not excluded from Applications for Flow dashboards #404

Merged

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NETOBSERV-1075 Dashboard feedback: split infra / applications #388

NETOBSERV-1075 Dashboard feedback: split infra / applications #388

jpinsonneau commented Jul 3, 2023 •

edited

Loading

github-actions bot commented Jul 4, 2023

jotak Jul 4, 2023

jpinsonneau Jul 11, 2023

jpinsonneau commented Jul 11, 2023

jotak Jul 13, 2023

jpinsonneau Jul 13, 2023

jotak Jul 13, 2023

jotak Jul 13, 2023

github-actions bot commented Jul 13, 2023

jotak Jul 13, 2023

jotak Jul 13, 2023

jotak Jul 13, 2023

jpinsonneau Jul 13, 2023

jotak Jul 13, 2023 •

edited

Loading

jpinsonneau Jul 13, 2023

jpinsonneau Jul 13, 2023

codecov bot commented Jul 13, 2023 •

edited

Loading

github-actions bot commented Jul 17, 2023

github-actions bot commented Jul 17, 2023

github-actions bot commented Jul 21, 2023

memodi commented Jul 21, 2023

jotak commented Jul 24, 2023

jotak commented Jul 24, 2023

jpinsonneau commented Jul 26, 2023

github-actions bot commented Jul 26, 2023

jotak commented Jul 26, 2023 •

edited

Loading

memodi commented Jul 31, 2023

jotak commented Jul 31, 2023

openshift-ci bot commented Jul 31, 2023


		import "encoding/json"

		func FilterDashboardRows(dashboard string, ignoreFlags []string) (string, error) {

NETOBSERV-1075 Dashboard feedback: split infra / applications #388

NETOBSERV-1075 Dashboard feedback: split infra / applications #388

Conversation

jpinsonneau commented Jul 3, 2023 • edited Loading

github-actions bot commented Jul 4, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jpinsonneau commented Jul 11, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Jul 13, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jotak Jul 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Jul 13, 2023 • edited Loading

Codecov Report

github-actions bot commented Jul 17, 2023

github-actions bot commented Jul 17, 2023

github-actions bot commented Jul 21, 2023

memodi commented Jul 21, 2023

jotak commented Jul 24, 2023

jotak commented Jul 24, 2023

jpinsonneau commented Jul 26, 2023

github-actions bot commented Jul 26, 2023

jotak commented Jul 26, 2023 • edited Loading

memodi commented Jul 31, 2023

jotak commented Jul 31, 2023

openshift-ci bot commented Jul 31, 2023

jpinsonneau commented Jul 3, 2023 •

edited

Loading

jotak Jul 13, 2023 •

edited

Loading

codecov bot commented Jul 13, 2023 •

edited

Loading

jotak commented Jul 26, 2023 •

edited

Loading