Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial support for labels #447

Merged
merged 2 commits into from
Sep 13, 2023
Merged

Conversation

KalmanMeth
Copy link
Collaborator

No description provided.

@openshift-ci
Copy link

openshift-ci bot commented Jun 25, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from kalmanmeth. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov
Copy link

codecov bot commented Jun 25, 2023

Codecov Report

Patch coverage: 88.88% and project coverage change: +0.08% 🎉

Comparison is base (e958fe7) 66.03% compared to head (9f27b00) 66.11%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #447      +/-   ##
==========================================
+ Coverage   66.03%   66.11%   +0.08%     
==========================================
  Files          94       94              
  Lines        6921     6948      +27     
==========================================
+ Hits         4570     4594      +24     
- Misses       2092     2094       +2     
- Partials      259      260       +1     
Flag Coverage Δ
unittests 66.11% <88.88%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
pkg/api/transform_filter.go 100.00% <ø> (ø)
pkg/pipeline/transform/transform_filter.go 89.13% <88.88%> (-0.11%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

docs/api.md Outdated
@@ -148,6 +148,7 @@ Following is the supported API format for network transformations:
type: (enum) one of the following:
add_regex_if: add output field if input field satisfies regex pattern from parameters field
add_if: add output field if input field satisfies criteria from parameters field
add_label_if: add output field to list of labels if input field satisfies criteria from parameters field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something we should have raised earlier as it also concerns "add_regex_if" and "add_if", but this doesn't look like a network-specific transformation, shouldn't that rather go in generic_transform?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jotak Should we do this in a separate PR? We would move add_if, add_regex_if, add_label_if to generic transform.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah no problem with me to do that later

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KalmanMeth if you can open an issue so we will not forget? This will be a "Breaking changes" one

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KalmanMeth I see that the way you set the value in the code is labels[rule.Output] = rule.c --- can you extend the text to explain that output =key and Assignee=value

@jotak
Copy link
Member

jotak commented Jun 28, 2023

The code looks good, but is that part of the changes needed for multi-cloud? I don't really see how that will be used, can you elaborate on that?

Comment on lines 188 to 192
labelsString := ""
for key, value := range labels {
tmpString := fmt.Sprintf("%s=%s,", key, value)
labelsString += tmpString
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be more efficient to use strings.Builder for string concatenation.
See here for example:
https://stackoverflow.com/a/47798475

@jotak
Copy link
Member

jotak commented Jun 28, 2023

I think I get the intent finally but I'm not sure this is an optimal approach: basically (and correct if I'm wrong), what we'll need to do is to label every flow with some static labels (e.g: "siteid=foo") - I wasn't sure at first why it is "AddLabelIf, but I guess it must be conditional and set only if the source/destination is known to be in cluster - which, after k8s enrichment, can boil down to checking if a field such as SrcK8S_Name is set or unset.

How would we configure this feature for that?
Maybe something like:

	{Type: api.OpAddLabelIf, Input: "SrcK8S_Name", Parameters: `!=""`, Output: "SrcK8S_Labels", Assignee: staticLabels},

First thing - I haven't dig a lot in the evaluation library - the !="" thing doesn't seem to work for unset fields? Is there a way to check for absence / nullity?

Second thing that is more a concern to me, is the performance impact implied by this evaluation library compared to a more specific implementation. Running some microbenchmarks calling transformer.Transform(entry) gives me x12 allocs/op and x12 ns/op with this generic solution using the evaluation library, compared to a specific implementation (basically, a if _, ok := in[rule.Input]; ok { out[rule.Output] = rule.Labels}) - which would correspond to a "AddIfPresent" rule.

To be honest I don't know how big of a difference that would be at scale, but given this rule would be applied on every flow I think we must try to optimize anyway.

@KalmanMeth
Copy link
Collaborator Author

Second thing that is more a concern to me, is the performance impact implied by this evaluation library compared to a more specific implementation.

Do you think we should add a cache so we only have to perform each evaluation once, and then future identical comparisons will be obtained from the cache?

@jotak
Copy link
Member

jotak commented Jul 25, 2023

@KalmanMeth at the moment I still don't see how that's going to be used in the context of k8s metadata labels, so it's hard for me to review/ answer. Especially I don't get what's going to be set in the "If" part of that filter. Can you provide an example?

What I said in my previous comment was invalidated by this PR since it shows that we're adding cluster info unconditionally, hence without using this filter

Copy link
Collaborator

@eranra eranra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpinsonneau @KalmanMeth The main question and value proposition for this "labels" approach is if we can search a subset of the key=value labels from the list using some Loki query capabilities. @KalmanMeth did you check that manually somehow? @jpinsonneau @jotak any inputs from your POV on that ( one thing I thought about was the wrap the labels with json curly brachets, but I am not sure if this is needed or will help )

docs/api.md Outdated
@@ -148,6 +148,7 @@ Following is the supported API format for network transformations:
type: (enum) one of the following:
add_regex_if: add output field if input field satisfies regex pattern from parameters field
add_if: add output field if input field satisfies criteria from parameters field
add_label_if: add output field to list of labels if input field satisfies criteria from parameters field
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KalmanMeth if you can open an issue so we will not forget? This will be a "Breaking changes" one

@@ -81,6 +83,17 @@ func (n *Network) Transform(inputEntry config.GenericMap) (config.GenericMap, bo
}
outputEntry[rule.Output+"_Evaluate"] = true
}
case api.OpAddLabelIf:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KalmanMeth this is an "expensive" operation and I envision that we will also want to add "static" labels so can you add another operation that is not with "if" but just a simple AddLabel operation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

docs/api.md Outdated
@@ -148,6 +148,7 @@ Following is the supported API format for network transformations:
type: (enum) one of the following:
add_regex_if: add output field if input field satisfies regex pattern from parameters field
add_if: add output field if input field satisfies criteria from parameters field
add_label_if: add output field to list of labels if input field satisfies criteria from parameters field
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KalmanMeth I see that the way you set the value in the code is labels[rule.Output] = rule.c --- can you extend the text to explain that output =key and Assignee=value

@jotak
Copy link
Member

jotak commented Jul 26, 2023

@eranra @KalmanMeth : Initially, here's how I would have seen a label mapper:

Something that is part of kube enrichment, configured like that:

- name: enrich
  transform:
    type: network
    network:
      rules:
      - input: SrcAddr
        output: SrcK8S
        type: add_kubernetes
        labelMap:
          fromNode:
          - match: "topology\.kubernetes\.io/.*"
            output: NodeLabels
          fromPod:
          - match: ".*"
            output: PodLabels

and for instance when applied to a pod that is running on a node that has the following labels:

kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-0-139-105.eu-west-3.compute.internal
kubernetes.io/os=linux
node-role.kubernetes.io/worker=
node.kubernetes.io/instance-type=m6i.xlarge
node.openshift.io/os_id=rhcos
topology.ebs.csi.aws.com/zone=eu-west-3a
topology.kubernetes.io/region=eu-west-3
topology.kubernetes.io/zone=eu-west-3a

... it would result in json:

{
SrcK8S_Name: "my-app",
...
SrcK8S_NodeLabels: "topology.kubernetes.io/region=eu-west-3,topology.kubernetes.io/zone=eu-west-3a",
SrcK8S_PodLabels: "app=my-app,controller-revision-hash=5ccbdbc54,pod-template-generation=1,version=main",
}

But tbh, while I think it would be nice to have that, I also see that it would come at a higher cost than labels statically added from the operator when they are supposed to be constant in a given cluster, such as the topology.kubernetes.io/region / zone listed above. So maybe for these ones, which are the ones interesting for our multi-cluster work, simply adding static labels as text in the same was as for the cluster ID (without "If") would not only be good enough, but also more performant

@jotak
Copy link
Member

jotak commented Jul 26, 2023

ie using the same "add_field_if_doesnt_exist" like here: https://github.com/netobserv/network-observability-operator/pull/386/files#diff-c761638f65b982f9ecf3717616920a5a3c0bb1670fd8ee7e9ab51afc4f166581R578-R580

... but containing a string expression of some chosen labels instead of the cluster id

@KalmanMeth KalmanMeth changed the title initial support for labels in network_transform initial support for labels Sep 6, 2023
@KalmanMeth
Copy link
Collaborator Author

I moved the add_labels to transform_filter, now that we have also moved the add_if etc to transform_filter.
I also added a generic add_label (in addition to add_label_if) and a generic add_field which add the label/field directly without performing tests on existing fields.

Comment on lines 27 to +35
RemoveEntryIfDoesntExist string `yaml:"remove_entry_if_doesnt_exist" json:"remove_entry_if_doesnt_exist" doc:"removes the entry if the field does not exist"`
RemoveEntryIfEqual string `yaml:"remove_entry_if_equal" json:"remove_entry_if_equal" doc:"removes the entry if the field value equals specified value"`
RemoveEntryIfNotEqual string `yaml:"remove_entry_if_not_equal" json:"remove_entry_if_not_equal" doc:"removes the entry if the field value does not equal specified value"`
AddField string `yaml:"add_field" json:"add_field" doc:"adds (input) field to the entry; overrides previous value if present (key=input, value=value)"`
AddFieldIfDoesntExist string `yaml:"add_field_if_doesnt_exist" json:"add_field_if_doesnt_exist" doc:"adds a field to the entry if the field does not exist"`
AddFieldIf string `yaml:"add_field_if" json:"add_field_if" doc:"add output field set to assignee if input field satisfies criteria from parameters field"`
AddRegExIf string `yaml:"add_regex_if" json:"add_regex_if" doc:"add output field if input field satisfies regex pattern from parameters field"`
AddLabel string `yaml:"add_label" json:"add_label" doc:"add (input) field to list of labels with value taken from Value field (key=input, value=value)"`
AddLabelIf string `yaml:"add_label_if" json:"add_label_if" doc:"add output field to list of labels with value taken from assignee field if input field satisfies criteria from parameters field"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: some doc strings are written in imperative mood "add (input) field...", and some in present simple "adds (input) field...". Which is preferred?

@KalmanMeth KalmanMeth merged commit f2f3346 into netobserv:main Sep 13, 2023
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants