
Expose new Kafka parameters + tweak defaults #170

Merged
1 commit merged from new-kafka-settings into netobserv:main on Sep 29, 2022

Conversation

jotak
Member

@jotak jotak commented Sep 23, 2022

Follow-up on netobserv/flowlogs-pipeline#309

  • PullQueueCapacity: capacity of the Kafka consumer client's internal queue
  • PullMaxBytes: indicates to the broker the maximum batch size that the consumer will accept
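For illustration, a minimal sketch of how these two settings map onto the segmentio/kafka-go consumer client; the broker address, topic and group below are placeholders, not the operator's actual configuration:

package main

import (
	"context"
	"fmt"

	kafkago "github.com/segmentio/kafka-go"
)

func main() {
	// Sketch only: PullQueueCapacity and PullMaxBytes correspond to the
	// reader's QueueCapacity and MaxBytes settings.
	r := kafkago.NewReader(kafkago.ReaderConfig{
		Brokers:       []string{"kafka:9092"}, // placeholder broker
		Topic:         "network-flows",        // placeholder topic
		GroupID:       "flp",                  // placeholder consumer group
		QueueCapacity: 100,                    // PullQueueCapacity: internal message queue capacity
		MaxBytes:      1000000,                // PullMaxBytes: max batch size accepted from the broker (1MB)
	})
	defer r.Close()

	msg, err := r.ReadMessage(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d bytes\n", len(msg.Value))
}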

Update
Add producerBatchSize, add metric, tweak defaults
3fc9186

The new metric "bandwidth_per_namespace" allows getting flow sizes (i.e. the
extracted Bytes sum) per source and destination namespace.
That proves useful for self-monitoring, e.g. to measure the overhead of
netobserv's traffic compared to the rest of the traffic.

Also, setting a much higher cacheMaxSize for the eBPF agent, to reduce
traffic overhead.

Also raising memory limits for the agent and FLP to 800MB.

And tweaking the Loki client defaults to decrease the number of retries and increase the batch size.
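For illustration only, a minimal sketch of what a per-namespace bytes counter could look like with prometheus/client_golang; the metric and label names are assumptions, and FLP actually defines its metrics through the pipeline configuration rather than code like this:

package metrics

import "github.com/prometheus/client_golang/prometheus"

// bandwidthPerNamespace is a hypothetical counter mirroring the idea of the
// "bandwidth_per_namespace" metric: flow bytes summed per source and
// destination namespace.
var bandwidthPerNamespace = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "bandwidth_per_namespace",
		Help: "Sum of flow bytes per source and destination namespace",
	},
	[]string{"source_namespace", "destination_namespace"},
)

func init() {
	prometheus.MustRegister(bandwidthPerNamespace)
}

// RecordFlow would be called for each processed flow record (sketch only).
func RecordFlow(srcNs, dstNs string, bytes float64) {
	bandwidthPerNamespace.WithLabelValues(srcNs, dstNs).Add(bytes)
}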

@jotak
Member Author

jotak commented Sep 23, 2022

@mariomac @OlivierCazade @stleerh here I just expose the new Kafka client parameters that Olivier added to FLP. Maybe we should expose more, what do you think?

There's for instance BatchMaxLen, which impacts batching within the FLP pipeline. I'm not sure it's worth tweaking, as these batches are not sent over the network (bigger batches there won't decrease network usage, as far as I can tell). Also, if we decide to expose it, I wonder if we could then tie Kafka's PullMaxBytes to it (the former counts flows, the latter counts bytes; flows have an approximately constant size, so we could correlate the two).
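Purely to illustrate that correlation, a tiny sketch assuming a constant average encoded flow size (the 250-byte figure is a made-up value, not a measurement):

// Sketch only: derive an equivalent BatchMaxLen (a flow count) from a byte
// budget such as PullMaxBytes, assuming an average encoded flow size.
const assumedAvgFlowBytes = 250 // hypothetical average size of one flow

func equivalentMaxLen(pullMaxBytes int) int {
	return pullMaxBytes / assumedAvgFlowBytes
}

// e.g. equivalentMaxLen(1000000) == 4000 flows for the default 1MB PullMaxBytes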

@jotak jotak requested a review from stleerh September 23, 2022 07:10
@jotak
Member Author

jotak commented Sep 23, 2022

Also, this is only for the consumer. What about the producer? Any tweaks to expose?


//+kubebuilder:default:=100
// pullQueueCapacity defines the capacity of the internal message queue used in the Kafka consumer client.
PullQueueCapacity int `json:"pullQueueCapacity"`
Member Author

maybe we should rather name it ConsumerQueueCapacity? (that would be more consistent with the other CRD update PR)

Contributor

I think we need to specify QueueCapacity, BatchSize and LingerMS as a subsection of each component: one for the agent and one for FLP.

BTW I think LingerMS is only for producers (the agent, in this case)
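As a rough illustration only, such a per-component subsection could look like the struct below; the names and defaults are assumptions rather than an agreed API:

// Hypothetical sketch of a per-component Kafka tuning subsection, one
// instance for the agent (producer) and one for FLP (consumer).
type KafkaTuning struct {
	//+kubebuilder:default:=100
	// queueCapacity defines the capacity of the client's internal message queue.
	//+optional
	QueueCapacity int `json:"queueCapacity,omitempty"`

	//+kubebuilder:default:=1000000
	// batchSize defines the maximum batch size, in bytes.
	//+optional
	BatchSize int `json:"batchSize,omitempty"`

	//+kubebuilder:default:=100
	// lingerMs defines how long a batch may wait before being sent (producer side only).
	//+optional
	LingerMs int `json:"lingerMs,omitempty"`
}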


//+kubebuilder:default:=1000000
// pullMaxBytes indicates to the broker the maximum batch size that the consumer will accept. Default: 1MB.
PullMaxBytes int `json:"pullMaxBytes"`
Member Author

Same here: ConsumerMaxBytes?

@mariomac
Contributor

BatchMaxBytes is certainly confusing. It could currently have an effect on the Loki load (as it acts as a retainer in the pipeline). In the PR I'm working on, this argument won't have any effect when you combine Kafka+Protobuf, as messages already come batched from the agent.

As they mention, it's important to be able to set Kafka's batch.size (I initially thought it was set by BatchMaxBytes 😅) and linger.ms. However, I think linger has more impact on the producer side. I double-checked that the eBPF agent allows setting the batch size but not the linger, so I will add this property as part of my current Kafka optimizations there.
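For illustration, a minimal sketch of the producer-side batching knobs in segmentio/kafka-go; the address, topic and values are placeholders:

package main

import (
	"context"

	kafkago "github.com/segmentio/kafka-go"
)

func main() {
	// Sketch only: BatchBytes roughly plays the role of Kafka's batch.size
	// on the producer; BatchSize caps the number of messages per batch.
	w := &kafkago.Writer{
		Addr:       kafkago.TCP("kafka:9092"), // placeholder broker
		Topic:      "network-flows",           // placeholder topic
		Balancer:   &kafkago.Hash{},
		BatchSize:  100,              // max messages per batch
		BatchBytes: 10 * 1024 * 1024, // max bytes per batch
	}
	defer w.Close()

	if err := w.WriteMessages(context.Background(),
		kafkago.Message{Value: []byte("flow-record")},
	); err != nil {
		panic(err)
	}
}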

@mariomac
Contributor

Apart from these values being configured on the broker side, as currently specified, we'd need to allow specifying them per producer and per consumer, as different production/consumption patterns would require a different batchSize/linger to optimize CPU and consumption.

@mariomac
Contributor

UPDATE: I remembered that our Golang library, Segmentio's kafka-go, doesn't allow setting the linger value. So we will have to play only with the max batch size.

@jotak
Member Author

jotak commented Sep 26, 2022

About linger: it does indeed seem to be lacking in segmentio's API. However, is that really an issue, since we already have our own batching/timeout implemented in the agent for grouping flows?

@mariomac
Contributor

@jotak I think we can just avoid linger configuration.

@jotak jotak force-pushed the new-kafka-settings branch 2 times, most recently from 47ce607 to 3fc9186 on September 26, 2022 12:10
@@ -215,6 +215,21 @@ type FlowCollectorKafka struct {
// tls client configuration.
// +optional
TLS ClientTLS `json:"tls"`

Contributor

I'd move Consumer... vars to FLP configuration and ProducerBatchSize to the agent configuration.

If we use Kafka to connect future functionalities, the queue capacities and batch sizes should probably be different for each component, according to their load amount and pattern.
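A rough sketch of what that split could look like, written in the same fragment style as the diff above; the field names and defaults are assumptions, not the final API:

// Hypothetical: consumer-side knobs move to the FLP (processor) section...
type FlowCollectorFLP struct {
	//+kubebuilder:default:=100
	// kafkaConsumerQueueCapacity defines the capacity of the internal message queue used in the Kafka consumer client.
	//+optional
	KafkaConsumerQueueCapacity int `json:"kafkaConsumerQueueCapacity,omitempty"`

	//+kubebuilder:default:=1000000
	// kafkaConsumerBatchSize indicates to the broker the maximum batch size, in bytes, that the consumer will accept.
	//+optional
	KafkaConsumerBatchSize int `json:"kafkaConsumerBatchSize,omitempty"`
}

// ...while the producer-side batch size moves to the agent section.
type FlowCollectorEBPF struct {
	//+kubebuilder:default:=1048576
	// kafkaBatchSize limits the maximum size, in bytes, of a request sent to Kafka.
	//+optional
	KafkaBatchSize int `json:"kafkaBatchSize,omitempty"`
}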

Member Author

ok, sounds like a good argument, especially as we plan to allow Kafka exports

Member Author

@mariomac done

@jotak jotak force-pushed the new-kafka-settings branch 3 times, most recently from ae5a216 to 4274c62 on September 26, 2022 13:30
@stleerh
Contributor

stleerh commented Sep 26, 2022

Also, this is only for the consumer. What about the producer? Any tweaks to expose?

See https://issues.redhat.com/browse/NETOBSERV-593.

@jotak jotak changed the title Expose new Kafka parameters Expose new Kafka parameters + tweak defaults Sep 26, 2022
@@ -53,6 +53,10 @@ func NewConnTrackParams(name string, ct api.ConnTrack) StageParam {
return StageParam{Name: name, Extract: &Extract{Type: api.ConnTrackType, ConnTrack: &ct}}
}

func NewTimbasedParams(name string, ct api.ExtractTimebased) StageParam {
Contributor

Misspelling: NewTimebasedParams

@@ -78,6 +83,8 @@ func (cg *ConfGen) GenerateTruncatedConfig() []config.StageParam {
parameters[i] = config.NewTransformNetworkParams("transform_network", *cg.config.Transform.Network)
case "extract_aggregate":
parameters[i] = config.NewAggregateParams("extract_aggregate", cg.aggregateDefinitions)
case "extract_timebased":
parameters[i] = config.NewTimbasedParams("extract_timebased", cg.timebasedTopKs)
Contributor

Misspelling: NewTimebasedParams

Member Author

this is in vendored code, so that's actually in the FLP repo

@stleerh
Contributor

stleerh commented Sep 26, 2022

It would be good if there were a generic way to pass any config settings to Kafka as an advanced feature. Then it could expose only the more common ones.

@jotak
Member Author

jotak commented Sep 27, 2022

It would be good if there were a generic way to pass any config settings to Kafka as an advanced feature. Then it could expose only the more common ones.

yeah, I don't know exactly what the best way to do that is; at some point I was wondering whether we should have a second CRD for advanced configuration... the point being to try to keep things simple for the average user. IIRC @OlivierCazade also suggested something like that previously

@mariomac
Contributor

With respect to your conversation @jotak, @stleerh: would it then be possible to just have an "extraConfig" field, backed by a map, that passes all its entries as properties to Kafka? Something similar to the env property of the eBPF agent.
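For illustration, a sketch of what such a field could look like inside the Kafka section of the CRD, in the spirit of the agent's env field; the exact name and placement here are assumptions:

// Hypothetical sketch, not part of this PR: a free-form map whose entries
// would be passed through as properties to the Kafka client.
//+optional
ExtraConfig map[string]string `json:"extraConfig,omitempty"`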

@jotak
Member Author

jotak commented Sep 29, 2022

PR rebased

@mariomac are you thinking of exposing that instead of having these new fields in the CR? (hence we would reject this PR?)
That's probably a topic we should discuss in a meeting.

We can also take a look at how the loki-operator works; it seems it also allows overriding some settings. I'm curious whether their approach differs.

@mariomac
Contributor

@jotak I'm not against exposing the values via the CRD, but I think that in the mid term it's better to just allow overriding some internals, so we don't need to create a patch and release a new version every time we want to tweak some internals for a given customer.

However, allowing the FLP configuration to be overridden might be complex, as it is not just a properties map but a structured and connected graph.

@jotak
Member Author

jotak commented Sep 29, 2022

/approve

@openshift-ci

openshift-ci bot commented Sep 29, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 9e69ef2 into netobserv:main Sep 29, 2022
KalmanMeth pushed a commit to KalmanMeth/network-observability-operator that referenced this pull request Feb 13, 2023