metrics: Add SELinux related audit metrics to the enricher #916

jhrozek · 2022-05-03T09:14:53Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

When the enricher dispatches an AVC line via gRPC, also emit a
metric with the details of the AVC, similar to what we already do in the
seccomp case.

Which issue(s) this PR fixes:

Fixes #609

Does this PR have test?

No, I'm not sure if our way of getting metrics through the service
is stable enough for tests.

Special notes for your reviewer:

N/A

Does this PR introduce a user-facing change?

The security_profiles_operator_selinux_profile_audit_total metric is now actually
enabled and uses the appropriate labels scraped from the audit.log file.
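For illustration, here is a minimal sketch of pulling the SELinux fields discussed in this PR (perms, scontext, tcontext, tclass) out of a single `type=AVC` audit line. It assumes the usual `avc: denied { ... } for ... scontext=... tcontext=... tclass=...` layout. The `avcFields` type and `parseAVC` helper are hypothetical and are not the enricher's actual parsing code. The sketch extracts all four fields, even though only scontext and tcontext ended up as metric labels.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// avcFields holds the SELinux fields discussed in this PR.
// Hypothetical type: not the enricher's actual data structure.
type avcFields struct {
	Perms    []string // e.g. [read open] from "{ read open }"
	Scontext string   // source context (the subject)
	Tcontext string   // target context (the object)
	Tclass   string   // object class, e.g. file or dir
}

var (
	// Captures the permission list between the braces.
	permsRe = regexp.MustCompile(`\{\s*([^}]*?)\s*\}`)
	// Captures the key=value pairs we care about.
	kvRe = regexp.MustCompile(`\b(scontext|tcontext|tclass)=(\S+)`)
)

// parseAVC (hypothetical helper) extracts the fields from one "type=AVC" audit line.
func parseAVC(line string) (avcFields, error) {
	var f avcFields
	if !strings.Contains(line, "type=AVC") {
		return f, errors.New("not an AVC line")
	}
	if m := permsRe.FindStringSubmatch(line); m != nil {
		f.Perms = strings.Fields(m[1])
	}
	for _, m := range kvRe.FindAllStringSubmatch(line, -1) {
		switch m[1] {
		case "scontext":
			f.Scontext = m[2]
		case "tcontext":
			f.Tcontext = m[2]
		case "tclass":
			f.Tclass = m[2]
		}
	}
	if f.Scontext == "" || f.Tcontext == "" {
		return f, errors.New("missing scontext or tcontext")
	}
	return f, nil
}

func main() {
	line := `type=AVC msg=audit(1651570000.123:456): avc:  denied  { read open } for  pid=1234 comm="cat" name="data" scontext=system_u:system_r:container_t:s0:c1,c2 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=0`
	f, err := parseAVC(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Perms, f.Scontext, f.Tcontext, f.Tclass)
}
```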

codecov-commenter · 2022-05-03T09:18:51Z

Codecov Report

Merging #916 (6dd8132) into main (72a0786) will increase coverage by 0.10%.
The diff coverage is 61.53%.

@@            Coverage Diff             @@
##             main     #916      +/-   ##
==========================================
+ Coverage   50.64%   50.75%   +0.10%     
==========================================
  Files          42       42              
  Lines        4626     4660      +34     
==========================================
+ Hits         2343     2365      +22     
- Misses       2198     2209      +11     
- Partials       85       86       +1

jhrozek · 2022-05-03T11:36:36Z

Thank you for the review @saschagrunert

jhrozek · 2022-05-03T12:01:41Z

/test pull-security-profiles-operator-test-e2e

JAORMX

I'm concerned about this dramatically increasing the cardinality of the SELinux metrics...

From an article on the topic

As a general rule of thumb I'd avoid having any metric whose cardinality on a /metrics could potentially go over 10 due to growth in label values. The way I think of it is that if it has already grown to be 10 today, it might be 15 in a year's time. It can be additionally okay to have no more than a literal handful within a given Prometheus that go to 100. These are only rules of thumb, if you know for example that the number of replicas of your service will always be tiny then you can exceed these a bit, conversely if you're expecting thousands of replicas I'd carefully consider every time series on the /metrics.

What are you trying to get with these metrics? Maybe there can be another solution instead of adding these new labels.

JAORMX · 2022-05-04T05:49:11Z

internal/pkg/daemon/metrics/metrics.go

+				metricsLabelPerms,
+				metricsLabelScontext,
+				metricsLabelTcontext,
+				metricsLabelTclass,


wouldn't doing this dramatically increase the cardinality of this metric? I really think we shouldn't do this...

jhrozek

What real-world implication does increasing the cardinality have?

JAORMX

The main real-world implication is that increasing the cardinality of metrics can overload Prometheus. If we add more than what's minimally needed to debug problems at scale, we make the SPO a noisy neighbour in terms of metrics.
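To make the cardinality concern concrete: a metric's worst-case series count is the product of the number of distinct values each of its labels can take, so every added label multiplies the total rather than adding to it. The label names and value counts below are made up for illustration (they are not measurements from SPO, nor the actual label set of the existing metrics). The sketch only shows the arithmetic.

```go
package main

import "fmt"

// worstCaseSeries returns the upper bound on the number of time series one
// metric can expose: the product of the distinct-value counts of its labels.
// Prometheus only creates series for combinations actually observed, so real
// counts are usually lower, but growth is still multiplicative per label.
func worstCaseSeries(distinctValues map[string]int) int {
	total := 1
	for _, n := range distinctValues {
		total *= n
	}
	return total
}

func main() {
	// Hypothetical label names and value counts, not measurements from SPO.
	base := map[string]int{"node": 1, "namespace": 5, "pod": 20, "container": 2}
	withAVC := map[string]int{
		"node": 1, "namespace": 5, "pod": 20, "container": 2,
		"scontext": 10, "tcontext": 30, "tclass": 8, "perms": 15,
	}
	fmt.Println(worstCaseSeries(base))    // prints 200
	fmt.Println(worstCaseSeries(withAVC)) // prints 7200000
}
```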

jhrozek · 2022-05-04T07:36:37Z

(Replying to "What are you trying to get with these metrics?")

To have a way of observing what is being recorded. There are two reasons:

- for the recording itself
- I'm still thinking about "watching" workloads with a policy when introducing the policy into production, and I was thinking about using the log enricher so that you could see which AVCs the workload triggers and adjust either the workload or the policy.

If the number of labels is a concern, I guess we could remove /some/, but at least the source and target contexts should stay, so that you could see what is touching what and then go look at the full log enricher logs.

jhrozek · 2022-05-04T07:37:23Z

btw the problem with "watching log enricher logs" is that each spod instance has its own. The metrics allow you to watch all of them from a central place.

saschagrunert

/hold for @JAORMX

jhrozek · 2022-05-04T11:42:43Z

OK, I removed the perm and the class. This still retains scontext and tcontext, which should give an idea of who is touching what, and brings the number of labels to just one more than the already existing seccomp metric.
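Here is a minimal sketch of why dropping perm and tclass helps. It counts one series per distinct label tuple over the same AVC events, once keyed on all four fields and once on scontext/tcontext only. The event values and the `avcEvent`/`countSeries` names are hypothetical. In a real exporter, the equivalent choice is the list of label names passed to client_golang's `prometheus.NewCounterVec`.

```go
package main

import "fmt"

// avcEvent is a hypothetical, already-parsed AVC record carrying the four
// candidate labels from the original version of this PR.
type avcEvent struct {
	Scontext, Tcontext, Tclass, Perm string
}

// countSeries keeps one counter per distinct label tuple; len(result) is the
// number of time series the metric would expose. With full=false, tclass and
// perm are blanked out so only scontext and tcontext distinguish series.
func countSeries(events []avcEvent, full bool) map[avcEvent]int {
	counters := make(map[avcEvent]int)
	for _, e := range events { // e is a copy, so blanking fields is safe
		if !full {
			e.Tclass, e.Perm = "", ""
		}
		counters[e]++
	}
	return counters
}

func main() {
	const src1 = "system_u:system_r:container_t:s0:c1,c2"
	const src2 = "system_u:system_r:container_t:s0:c3,c4"
	events := []avcEvent{
		{src1, "system_u:object_r:var_t:s0", "file", "read"},
		{src1, "system_u:object_r:var_t:s0", "file", "open"},
		{src1, "system_u:object_r:var_t:s0", "dir", "search"},
		{src1, "system_u:object_r:var_t:s0", "file", "read"},
		{src2, "system_u:object_r:etc_t:s0", "file", "read"},
	}
	fmt.Println(len(countSeries(events, true)), len(countSeries(events, false))) // prints "4 2"
}
```

Dropping labels merges series without losing events. The per-key counts still sum to the total number of AVCs, and the full details remain available in the log enricher output.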

JAORMX · 2022-05-04T11:47:59Z

I think this is a good compromise. Thanks!

jhrozek · 2022-05-05T14:46:11Z

ugh another CI failure, let me reopen the PR..

jhrozek · 2022-05-05T14:46:55Z

/hold cancel

saschagrunert · 2022-05-09T07:09:52Z

/retest

k8s-ci-robot · 2022-05-09T10:48:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JAORMX, jhrozek, saschagrunert

JAORMX · 2022-05-09T11:35:12Z

/retest

k8s-ci-robot requested review from cmurphy and JAORMX May 3, 2022 09:15

saschagrunert reviewed May 3, 2022

internal/pkg/daemon/metrics/grpc.go (outdated, resolved)

internal/pkg/daemon/enricher/enricher.go (outdated, resolved)

internal/pkg/daemon/metrics/grpc.go (outdated, resolved)

saschagrunert reviewed May 3, 2022

api/grpc/metrics/api.proto (outdated, resolved)

jhrozek force-pushed the selinux_metrics branch from 6dd8132 to 115631a May 3, 2022 11:36

JAORMX requested changes May 4, 2022

k8s-ci-robot assigned JAORMX May 4, 2022

saschagrunert approved these changes May 4, 2022

k8s-ci-robot added the do-not-merge/hold label May 4, 2022

k8s-ci-robot assigned saschagrunert May 4, 2022

k8s-ci-robot added the lgtm label May 4, 2022

jhrozek force-pushed the selinux_metrics branch from 115631a to c22ad97 May 4, 2022 11:41

k8s-ci-robot removed the lgtm label May 4, 2022

jhrozek force-pushed the selinux_metrics branch from c22ad97 to 61194c9 May 4, 2022 11:43

JAORMX approved these changes May 4, 2022

k8s-ci-robot added the lgtm label May 4, 2022

jhrozek closed this May 5, 2022

jhrozek reopened this May 5, 2022

k8s-ci-robot removed the do-not-merge/hold label May 5, 2022

metrics: Add SELinux related audit metrics to the enricher

9670bb2

On dispatching an AVC line from the enricher via GRPC, also issue a metric with the details of the AVC, similarly to what we do in the seccomp case.

jhrozek force-pushed the selinux_metrics branch from 61194c9 to 9670bb2 May 6, 2022 20:00

k8s-ci-robot removed the lgtm label May 6, 2022

saschagrunert approved these changes May 9, 2022

k8s-ci-robot added the lgtm label May 9, 2022

k8s-ci-robot merged commit fe57597 into kubernetes-sigs:main May 9, 2022
