limit memory used by FLP #376

KalmanMeth · 2023-02-05T09:34:01Z

The memory usage of FLP appears to grow linearly with the number of concurrent connections being handled.
For encode_prom, state is saved (both in FLP and in the Prometheus Go client) for each metric/labels combination that is reported, so memory grows linearly with the number of connections currently identified as active. In a configuration with limited memory resources, it is desirable to be able to instruct FLP to limit its memory usage.
The tradeoff is that new connections are not reported once the limit is hit.

The memory usage of the conntrack stage also grows with the number of connections, so it should be possible to limit the number of connections tracked.
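To illustrate the tradeoff described above, here is a minimal sketch of a cache that stops accepting new keys once a configured maximum is reached while still updating keys it already holds. The names `boundedCache`, `maxEntries`, and `add` are hypothetical and only stand in for the per-metric state; this is not the code from this PR.

```go
package main

import "fmt"

// boundedCache is a hypothetical stand-in for per-metric/labels state.
// A maxEntries value of 0 or less means "no limit".
type boundedCache struct {
	maxEntries int
	entries    map[string]float64
	dropped    int // samples rejected because the cache was full
}

func newBoundedCache(maxEntries int) *boundedCache {
	return &boundedCache{maxEntries: maxEntries, entries: make(map[string]float64)}
}

// add accumulates value under key and reports whether the sample was accepted.
// Existing keys are always updated; a new key is rejected once the cache is full.
func (c *boundedCache) add(key string, value float64) bool {
	if _, exists := c.entries[key]; !exists {
		if c.maxEntries > 0 && len(c.entries) >= c.maxEntries {
			c.dropped++
			return false
		}
	}
	c.entries[key] += value
	return true
}

func main() {
	cache := newBoundedCache(2)
	fmt.Println(cache.add("conn-a", 10)) // true: first entry
	fmt.Println(cache.add("conn-b", 5))  // true: second entry
	fmt.Println(cache.add("conn-c", 1))  // false: limit reached, new key rejected
	fmt.Println(cache.add("conn-a", 3))  // true: existing key is still updated
	fmt.Println(len(cache.entries), cache.dropped)
}
```

With this shape, memory is bounded by the number of distinct keys, at the cost of silently missing anything that first appears after the limit is hit, which is why counting the rejected samples is useful.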

openshift-ci · 2023-02-05T09:34:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from kalmanmeth by writing /assign @kalmanmeth in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov-commenter · 2023-02-05T09:43:00Z

Codecov Report

Merging #376 (23d38f3) into main (bb058e1) will increase coverage by 0.19%.
The diff coverage is 88.05%.

@@            Coverage Diff             @@
##             main     #376      +/-   ##
==========================================
+ Coverage   61.71%   61.91%   +0.19%     
==========================================
  Files          91       91              
  Lines        5835     5873      +38     
==========================================
+ Hits         3601     3636      +35     
- Misses       2014     2016       +2     
- Partials      220      221       +1

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `61.91% <88.05%> (+0.19%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/api/conntrack.go | `87.00% <ø> (ø)` | |
| pkg/api/encode_prom.go | `100.00% <ø> (ø)` | |
| pkg/pipeline/encode/encode_prom.go | `77.77% <66.66%> (+0.41%)` | ⬆️ |
| pkg/test/utils.go | `73.91% <81.81%> (+1.25%)` | ⬆️ |
| pkg/pipeline/extract/aggregate/aggregates.go | `90.19% <100.00%> (ø)` | |
| pkg/pipeline/extract/conntrack/conntrack.go | `92.94% <100.00%> (+0.17%)` | ⬆️ |
| pkg/pipeline/utils/timed_cache.go | `95.74% <100.00%> (+0.14%)` | ⬆️ |

... and 1 more

ronensc

LGTM

pkg/pipeline/encode/prom_cache_test.go

pkg/pipeline/utils/timed_cache.go

ronensc · 2023-02-08T09:51:55Z

pkg/pipeline/extract/conntrack/conntrack.go

+			if (ct.config.MaxConnectionsTracked > 0) && (ct.config.MaxConnectionsTracked <= ct.connStore.mom.Len()) {
+				log.Warningf("too many connections; skipping flow log %v: ", fl)
+				ct.metrics.inputRecords.WithLabelValues("discarded").Inc()
+				continue


This `continue` will also skip the `if ct.shouldOutputFlowLogs` block, which I think should still be executed:

flowlogs-pipeline/pkg/pipeline/extract/conntrack/conntrack.go

Lines 99 to 105 in 8753a3f

		if ct.shouldOutputFlowLogs {
			record := fl.Copy()
			addHashField(record, computedHash.hashTotal)
			addTypeField(record, api.ConnTrackOutputRecordTypeName("FlowLog"))
			outputRecords = append(outputRecords, record)
			ct.metrics.outputRecords.WithLabelValues("flowLog").Inc()
		}

One option would be to move that `if` before the `if !exists` check. Maybe you could find a better solution.

updated code to address suggestion.
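As a rough illustration of the option suggested above (emitting the flow log record before the limit check so that `continue` cannot skip it), here is a simplified, self-contained sketch. The `tracker` and `flowLog` types and their fields are hypothetical simplifications of the conntrack stage, and this is not necessarily how the PR was ultimately updated.

```go
package main

import "fmt"

// flowLog is a hypothetical, simplified flow log identified only by its connection hash.
type flowLog struct{ hash string }

// tracker is a hypothetical, simplified stand-in for the conntrack stage.
type tracker struct {
	maxConnectionsTracked int            // 0 or less means "no limit"
	shouldOutputFlowLogs  bool           // emit a record for every incoming flow log
	conns                 map[string]int // connection hash -> flow logs aggregated
	discarded             int            // flow logs not aggregated because of the limit
}

func (t *tracker) extract(flowLogs []flowLog) []string {
	var out []string
	for _, fl := range flowLogs {
		// Emit the flow log record first, so the limit check below cannot skip it.
		if t.shouldOutputFlowLogs {
			out = append(out, "flowLog:"+fl.hash)
		}
		if _, exists := t.conns[fl.hash]; !exists {
			if t.maxConnectionsTracked > 0 && t.maxConnectionsTracked <= len(t.conns) {
				t.discarded++ // count the flow log that is not tracked
				continue
			}
			out = append(out, "newConnection:"+fl.hash)
		}
		t.conns[fl.hash]++ // creates the entry for a new connection, updates an existing one
	}
	return out
}

func main() {
	tr := &tracker{maxConnectionsTracked: 1, shouldOutputFlowLogs: true, conns: map[string]int{}}
	for _, rec := range tr.extract([]flowLog{{hash: "a"}, {hash: "b"}, {hash: "a"}}) {
		fmt.Println(rec)
	}
	fmt.Println("discarded:", tr.discarded)
}
```

In this sketch, connection "b" arrives after the limit of one tracked connection is reached: its flow log record is still emitted, but no connection state is created for it and the discard counter is incremented.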

ronensc · 2023-02-08T09:53:15Z

pkg/pipeline/extract/conntrack/conntrack.go

@@ -67,6 +67,11 @@ func (ct *conntrackImpl) Extract(flowLogs []config.GenericMap) []config.GenericM
 		}
 		conn, exists := ct.connStore.getConnection(computedHash.hashTotal)
 		if !exists {
+			if (ct.config.MaxConnectionsTracked > 0) && (ct.config.MaxConnectionsTracked <= ct.connStore.mom.Len()) {
+				log.Warningf("too many connections; skipping flow log %v: ", fl)
+				ct.metrics.inputRecords.WithLabelValues("discarded").Inc()


+1 for tracking the discarded flowlogs

pkg/api/conntrack.go

ronensc · 2023-02-08T10:02:07Z

pkg/api/extract_aggregate.go

@@ -23,6 +23,6 @@ type AggregateOperation string
 type AggregateDefinition struct {
 	Name          string             `yaml:"name,omitempty" json:"name,omitempty" doc:"description of aggregation result"`
 	GroupByKeys   AggregateBy        `yaml:"groupByKeys,omitempty" json:"groupByKeys,omitempty" doc:"list of fields on which to aggregate"`
-	OperationType AggregateOperation `yaml:"operationType,omitempty" json:"operationType,omitempty" doc:"sum, min, max, avg or raw_values"`
+	OperationType AggregateOperation `yaml:"operationType,omitempty" json:"operationType,omitempty" doc:"sum, min, max, count, avg or raw_values"`


Good catch!

ronensc

LGTM

add maxMetrics to encode_prom

d18d039

KalmanMeth linked an issue Feb 5, 2023 that may be closed by this pull request

Allow to limit the memory used by FLP #374

Closed

added test for maxMetrics

a68131d

KalmanMeth requested a review from ronensc February 5, 2023 13:29

ronensc approved these changes Feb 6, 2023

pkg/pipeline/encode/prom_cache_test.go (outdated, resolved)

pkg/pipeline/utils/timed_cache.go (resolved)

openshift-ci bot assigned ronensc Feb 6, 2023

openshift-ci bot added the lgtm label Feb 6, 2023

addressed reviewer comments

e4a6f62

openshift-ci bot removed the lgtm label Feb 6, 2023

limited number of connections tracked in conntrack

0228ed4

KalmanMeth changed the title ~~add maxMetrics to encode_prom~~ limit memory used by FLP Feb 7, 2023

KalmanMeth added 2 commits February 7, 2023 12:39

added MaxConnectionsTracked to test result

2b7cb4b

added to metric of discarded flow logs

8753a3f

KalmanMeth requested review from jotak, eranra and ronensc February 8, 2023 08:53

ronensc reviewed Feb 8, 2023

addressed review comments

23d38f3

ronensc approved these changes Feb 8, 2023

openshift-ci bot added the lgtm label Feb 8, 2023

KalmanMeth merged commit b6979fe into netobserv:main Feb 8, 2023
