
Expose new Kafka parameters + tweak defaults #170

Merged
1 commit merged from new-kafka-settings into netobserv:main on Sep 29, 2022

Conversation

jotak
Member

@jotak jotak commented Sep 23, 2022

Follow-up on netobserv/flowlogs-pipeline#309

  • PullQueueCapacity: capacity of the Kafka consumer client's internal queue
  • PullMaxBytes: indicates to the broker the maximum batch size that the consumer will accept
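For illustration, a minimal sketch of how these two settings map onto the segmentio/kafka-go consumer client; the broker address, topic and group below are placeholders, not the operator's actual configuration:

package main

import (
	"context"
	"fmt"

	kafkago "github.com/segmentio/kafka-go"
)

func main() {
	// Sketch only: PullQueueCapacity and PullMaxBytes correspond to the
	// reader's QueueCapacity and MaxBytes settings.
	r := kafkago.NewReader(kafkago.ReaderConfig{
		Brokers:       []string{"kafka:9092"}, // placeholder broker
		Topic:         "network-flows",        // placeholder topic
		GroupID:       "flp",                  // placeholder consumer group
		QueueCapacity: 100,                    // PullQueueCapacity: internal message queue capacity
		MaxBytes:      1000000,                // PullMaxBytes: max batch size accepted from the broker (1MB)
	})
	defer r.Close()

	msg, err := r.ReadMessage(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d bytes\n", len(msg.Value))
}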

Update
Add producerBatchSize, add metric, tweak defaults
3fc9186

The new metric "bandwidth_per_namespace" allows getting flow sizes (i.e. the
extracted Bytes sum) per source and destination namespace.
That proves useful for self-monitoring, e.g. to measure the overhead of
netobserv's traffic compared to the rest of the traffic.

Also, setting a much higher cacheMaxSize for the eBPF agent, to reduce
traffic overhead.

Also raising memory limits for the agent and FLP to 800MB.

And tweaking the Loki client defaults to decrease the number of retries and increase the batch size.
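For illustration only, a minimal sketch of what a per-namespace bytes counter could look like with prometheus/client_golang; the metric and label names are assumptions, and FLP actually defines its metrics through the pipeline configuration rather than code like this:

package metrics

import "github.com/prometheus/client_golang/prometheus"

// bandwidthPerNamespace is a hypothetical counter mirroring the idea of the
// "bandwidth_per_namespace" metric: flow bytes summed per source and
// destination namespace.
var bandwidthPerNamespace = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "bandwidth_per_namespace",
		Help: "Sum of flow bytes per source and destination namespace",
	},
	[]string{"source_namespace", "destination_namespace"},
)

func init() {
	prometheus.MustRegister(bandwidthPerNamespace)
}

// RecordFlow would be called for each processed flow record (sketch only).
func RecordFlow(srcNs, dstNs string, bytes float64) {
	bandwidthPerNamespace.WithLabelValues(srcNs, dstNs).Add(bytes)
}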

@jotak
Member Author

jotak commented Sep 23, 2022

@mariomac @OlivierCazade @stleerh here I just expose the new Kafka client parameters that Olivier added to FLP. Maybe we should expose more, what do you think?

There's for instance BatchMaxLen, which impacts batching within the FLP pipeline. I'm not sure it's worth tweaking, as these batches are not sent over the network (bigger batches there won't decrease network usage, as far as I can tell). Also, if we decide to expose it, I wonder if we could then tie Kafka's PullMaxBytes to it (the former counts flows, the latter counts bytes; flows have an approximately constant size, so we could correlate the two).
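Purely to illustrate that correlation, a tiny sketch assuming a constant average encoded flow size (the 250-byte figure is a made-up value, not a measurement):

// Sketch only: derive an equivalent BatchMaxLen (a flow count) from a byte
// budget such as PullMaxBytes, assuming an average encoded flow size.
const assumedAvgFlowBytes = 250 // hypothetical average size of one flow

func equivalentMaxLen(pullMaxBytes int) int {
	return pullMaxBytes / assumedAvgFlowBytes
}

// e.g. equivalentMaxLen(1000000) == 4000 flows for the default 1MB PullMaxBytes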

@jotak jotak requested a review from stleerh September 23, 2022 07:10
@jotak
Member Author

jotak commented Sep 23, 2022

Also, this is only for the consumer. What about the producer? Any tweaks to expose?


//+kubebuilder:default:=100
// pullQueueCapacity defines the capacity of the internal message queue used in the Kafka consumer client.
PullQueueCapacity int `json:"pullQueueCapacity"`
Member Author

maybe we should rather name it ConsumerQueueCapacity? (that would be more consistent with the other CRD update PR)

Contributor

I think we need to specify QueueCapacity, BatchSize and LingerMS as a subsection of each component: one for the agent and one for FLP.

BTW I think LingerMS is only for producers (the agent, in this case)
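As a rough illustration only, such a per-component subsection could look like the struct below; the names and defaults are assumptions rather than an agreed API:

// Hypothetical sketch of a per-component Kafka tuning subsection, one
// instance for the agent (producer) and one for FLP (consumer).
type KafkaTuning struct {
	//+kubebuilder:default:=100
	// queueCapacity defines the capacity of the client's internal message queue.
	//+optional
	QueueCapacity int `json:"queueCapacity,omitempty"`

	//+kubebuilder:default:=1000000
	// batchSize defines the maximum batch size, in bytes.
	//+optional
	BatchSize int `json:"batchSize,omitempty"`

	//+kubebuilder:default:=100
	// lingerMs defines how long a batch may wait before being sent (producer side only).
	//+optional
	LingerMs int `json:"lingerMs,omitempty"`
}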


//+kubebuilder:default:=1000000
// pullMaxBytes indicates to the broker the maximum batch size that the consumer will accept. Default: 1MB.
PullMaxBytes int `json:"pullMaxBytes"`
Member Author

Same here: ConsumerMaxBytes?

@mariomac
Contributor

BatchMaxBytes is certainly confusing. It could currently have an effect on the Loki load (as it acts as a retainer in the pipeline). In the PR I'm working on, this argument won't have any effect when you combine Kafka+Protobuf, as messages already come batched from the agent.

As they mention, it's important to be able to set Kafka's batch.size (I initially thought it was set by BatchMaxBytes 😅) and linger.ms. However, I think linger has more impact on the producer side. I double-checked that the eBPF agent allows setting the batch size but not the linger, so I will add this property as part of my current Kafka optimizations there.
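For illustration, a minimal sketch of the producer-side batching knobs in segmentio/kafka-go; the address, topic and values are placeholders:

package main

import (
	"context"

	kafkago "github.com/segmentio/kafka-go"
)

func main() {
	// Sketch only: BatchBytes roughly plays the role of Kafka's batch.size
	// on the producer; BatchSize caps the number of messages per batch.
	w := &kafkago.Writer{
		Addr:       kafkago.TCP("kafka:9092"), // placeholder broker
		Topic:      "network-flows",           // placeholder topic
		Balancer:   &kafkago.Hash{},
		BatchSize:  100,              // max messages per batch
		BatchBytes: 10 * 1024 * 1024, // max bytes per batch
	}
	defer w.Close()

	if err := w.WriteMessages(context.Background(),
		kafkago.Message{Value: []byte("flow-record")},
	); err != nil {
		panic(err)
	}
}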

@mariomac
Contributor

Apart from these values being configured on the broker side, as currently specified, we'd need to allow specifying them per producer and per consumer, as different production/consumption patterns would require a different batchSize/linger to optimize CPU and consumption.

@mariomac
Contributor

UPDATE: I remembered that our Golang library, Segmentio's kafka-go, doesn't allow setting the linger value. So we will have to play only with the max batch size.

@jotak
Member Author

jotak commented Sep 26, 2022

About linger: it does indeed seem to be lacking in segmentio's API. However, is that really an issue, since we already have our own batching/timeout implemented in the agent for grouping flows?

@mariomac
Contributor

@jotak I think we can just avoid linger configuration.

@jotak jotak force-pushed the new-kafka-settings branch 2 times, most recently from 47ce607 to 3fc9186 on September 26, 2022 12:10
@@ -215,6 +215,21 @@ type FlowCollectorKafka struct {
// tls client configuration.
// +optional
TLS ClientTLS `json:"tls"`

Contributor

I'd move Consumer... vars to FLP configuration and ProducerBatchSize to the agent configuration.

If we use Kafka to connect future functionalities, the queue capacities and batch sizes should probably be different for each component, according to their load amount and pattern.
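A rough sketch of what that split could look like, written in the same fragment style as the diff above; the field names and defaults are assumptions, not the final API:

// Hypothetical: consumer-side knobs move to the FLP (processor) section...
type FlowCollectorFLP struct {
	//+kubebuilder:default:=100
	// kafkaConsumerQueueCapacity defines the capacity of the internal message queue used in the Kafka consumer client.
	//+optional
	KafkaConsumerQueueCapacity int `json:"kafkaConsumerQueueCapacity,omitempty"`

	//+kubebuilder:default:=1000000
	// kafkaConsumerBatchSize indicates to the broker the maximum batch size, in bytes, that the consumer will accept.
	//+optional
	KafkaConsumerBatchSize int `json:"kafkaConsumerBatchSize,omitempty"`
}

// ...while the producer-side batch size moves to the agent section.
type FlowCollectorEBPF struct {
	//+kubebuilder:default:=1048576
	// kafkaBatchSize limits the maximum size, in bytes, of a request sent to Kafka.
	//+optional
	KafkaBatchSize int `json:"kafkaBatchSize,omitempty"`
}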

Member Author

ok, sounds like a good argument, especially as we plan to allow Kafka exports

Member Author

@mariomac done

@jotak jotak force-pushed the new-kafka-settings branch 3 times, most recently from ae5a216 to 4274c62 on September 26, 2022 13:30
@stleerh
Contributor

stleerh commented Sep 26, 2022

Also, this is only for the consumer. What about the producer? Any tweaks to expose?

See https://issues.redhat.com/browse/NETOBSERV-593.

@jotak jotak changed the title Expose new Kafka parameters Expose new Kafka parameters + tweak defaults Sep 26, 2022
@@ -53,6 +53,10 @@ func NewConnTrackParams(name string, ct api.ConnTrack) StageParam {
return StageParam{Name: name, Extract: &Extract{Type: api.ConnTrackType, ConnTrack: &ct}}
}

func NewTimbasedParams(name string, ct api.ExtractTimebased) StageParam {
Contributor

Misspelling: NewTimebasedParams

@@ -78,6 +83,8 @@ func (cg *ConfGen) GenerateTruncatedConfig() []config.StageParam {
parameters[i] = config.NewTransformNetworkParams("transform_network", *cg.config.Transform.Network)
case "extract_aggregate":
parameters[i] = config.NewAggregateParams("extract_aggregate", cg.aggregateDefinitions)
case "extract_timebased":
parameters[i] = config.NewTimbasedParams("extract_timebased", cg.timebasedTopKs)
Contributor

Misspelling: NewTimebasedParams

Member Author

this is in vendored code, so that's actually in the FLP repo

@stleerh
Contributor

stleerh commented Sep 26, 2022

It would be good if there were a generic way to pass any config settings to Kafka as an advanced feature. Then it could expose only the more common ones.

@jotak
Member Author

jotak commented Sep 27, 2022

It would be good if there were a generic way to pass any config settings to Kafka as an advanced feature. Then it could expose only the more common ones.

yeah, I don't know exactly what the best way to do that is; at some point I was wondering whether we should have a second CRD for advanced configuration... the point being to try to keep things simple for the average user. IIRC @OlivierCazade also suggested something like that previously

@mariomac
Contributor

With respect to your conversation @jotak, @stleerh: would it then be possible to just have an "extraConfig" field, backed by a map, that passes all its entries as properties to Kafka? Something similar to the env property of the eBPF agent.
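For illustration, a sketch of what such a field could look like inside the Kafka section of the CRD, in the spirit of the agent's env field; the exact name and placement here are assumptions:

// Hypothetical sketch, not part of this PR: a free-form map whose entries
// would be passed through as properties to the Kafka client.
//+optional
ExtraConfig map[string]string `json:"extraConfig,omitempty"`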

@jotak
Member Author

jotak commented Sep 29, 2022

PR rebased

@mariomac are you thinking of exposing that instead of having these new fields in the CR? (hence we would reject this PR?)
That's probably a topic we should discuss in a meeting.

We can also take a look at how the loki-operator works; it seems it also allows overriding some settings. I'm curious whether their approach differs.

@mariomac
Contributor

@jotak I'm not against exposing the values via the CRD, but I think that in the mid term it's better to just allow overriding some internals, so we don't need to create a patch and release a new version every time we want to tweak some internals for a given customer.

However, allowing the FLP configuration to be overridden might be complex, as it is not just a properties map but a structured and connected graph.

@jotak
Member Author

jotak commented Sep 29, 2022

/approve

@openshift-ci

openshift-ci bot commented Sep 29, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 9e69ef2 into netobserv:main Sep 29, 2022
KalmanMeth pushed a commit to KalmanMeth/network-observability-operator that referenced this pull request Feb 13, 2023