Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with inconsistent GenericMap output #297

Merged
merged 2 commits into from
Mar 18, 2024

Conversation

jotak
Copy link
Member

@jotak jotak commented Mar 14, 2024

Description

The FLP GenericMap provided by the agent differs in types depending on the exporter being used, especially with DirectFLP each records size is smaller than when used with protobuf, due to protobuf using uint32 almost everywhere. This is not a problem in most cases as FLP generally converts that in JSON, but it breaks the FLP IPFIX export which expects strict types.

This commit brings more consistency between the FLP-direct mode and the protobuf mode, by sharing the same GenericMap encoding function. Which means the protobuf decode is now done in two steps: first proto-to-flow.Record, then flow.Record-to-GenericMap.

Dependencies

FLP update of the IPFIX exporter: netobserv/flowlogs-pipeline#634

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

The FLP GenericMap provided by the agent differs in types depending on
the exporter being used, especially with DirectFLP each records size is
smaller than when used with protobuf, due to protobuf using uint32
almost everywhere. This is not a problem in most cases as FLP generally
converts that in JSON, but it breaks the FLP IPFIX export which expects
strict types.

This commit brings more consistency between the FLP-direct mode and the
protobuf mode, by sharing the same GenericMap encoding function. Which
means the protobuf decode is now done in two steps: first
proto-to-flow.Record, then flow.Record-to-GenericMap.
@jotak jotak added the breaking-change This pull request has breaking changes. They should be described in PR description. label Mar 14, 2024
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 14, 2024
Copy link

New image:
quay.io/netobserv/netobserv-ebpf-agent:e1397a0

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=e1397a0 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 14, 2024
@jotak jotak marked this pull request as ready for review March 14, 2024 17:59
Copy link

codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 95.38462% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 33.17%. Comparing base (58d01d9) to head (dffe40a).
Report is 1 commits behind head on main.

Files Patch % Lines
pkg/decode/decode_protobuf.go 94.82% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #297      +/-   ##
==========================================
- Coverage   36.74%   33.17%   -3.57%     
==========================================
  Files          42       40       -2     
  Lines        3813     3587     -226     
==========================================
- Hits         1401     1190     -211     
+ Misses       2334     2322      -12     
+ Partials       78       75       -3     
Flag Coverage Δ
unittests 33.17% <95.38%> (-3.57%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


// FlowToPB is an auxiliary function to convert a single flow record, as returned by the eBPF agent,
// into a protobuf-encoded message ready to be sent to the collector via kafka
func FlowToPB(fr *flow.Record) *Record {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why did u move this file from export to here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also called from decode package now, and leaving it in export would create a cycle. The other option would be to move it to decode, I can do that if you prefer.

return net.IP(ip.GetIpv6())
}
n := ip.GetIpv4()
return net.IPv4(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it any cleaner like this

byteArray := make([]byte, 4)
binary.NativeEndian.PutUint32(byteArray, n)

or is it possible to maye it byte type in pb like ipv6 so we can avoid all of this ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do your first suggestion (I'm not sure about the other, if the way it is was initially done for good reasons or not..)

But I guess I should either use LittleEndian or BigEndian but not Native, as that would make it arch-dependent (and thus fail on BE archs)

@msherif1234
Copy link
Contributor

any cpu impact for the additional processing with ur scale test ?

@msherif1234
Copy link
Contributor

why this is marked as breaking change ?

@jotak
Copy link
Member Author

jotak commented Mar 15, 2024

why this is marked as breaking change ?

That's a breaking change because the types in the GenericMap output change, as you can see here: https://github.com/netobserv/netobserv-ebpf-agent/pull/297/files#diff-5b98b198dc9b743f2c1c2d470d443d9ec2ed38b4ee15777cb857584ddacce657
For instance "Proto" was u8 in the inner model, cast as uint32 in protobuf, previously with the grpc exporter it was still uint32 in the GenericMap but now it's back to uint8 in that Map.

So, consumers who would cast explicitly to uint32 would now have to cast to uint8

@msherif1234
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Mar 15, 2024
jotak added a commit to jotak/flowlogs-pipeline that referenced this pull request Mar 15, 2024
Fix inconsistent maps received from the agent depending on the export
mode (grpc / kafka / direct-flp)

Depends on netobserv/netobserv-ebpf-agent#297
@jotak
Copy link
Member Author

jotak commented Mar 15, 2024

any cpu impact for the additional processing with ur scale test ?

Slight increase of CPU in one of the tests (0.167 -> 0.172) and memory as well (671MB -> 679 MB) (i picked the one run that seems more relevant as it showed a similar number of flows per seconds). I'd say it's in reasonable bounds...

@jotak jotak added no-doc This PR doesn't require documentation change on the NetObserv operator no-qe This PR doesn't necessitate QE approval labels Mar 18, 2024
@jotak
Copy link
Member Author

jotak commented Mar 18, 2024

/approve
marking no-qe as there's no visible output

Copy link

openshift-ci bot commented Mar 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jotak
Copy link
Member Author

jotak commented Mar 18, 2024

/retest-required

@jotak
Copy link
Member Author

jotak commented Mar 18, 2024

/retest

@jotak jotak merged commit 8a88717 into netobserv:main Mar 18, 2024
11 of 13 checks passed
@jotak jotak deleted the consistent-generic-map branch March 18, 2024 09:41
jotak added a commit to jotak/flowlogs-pipeline that referenced this pull request Mar 18, 2024
Fix inconsistent maps received from the agent depending on the export
mode (grpc / kafka / direct-flp)

Depends on netobserv/netobserv-ebpf-agent#297
jotak added a commit to netobserv/flowlogs-pipeline that referenced this pull request Mar 18, 2024
* Fix inconsistent maps

Fix inconsistent maps received from the agent depending on the export
mode (grpc / kafka / direct-flp)

Depends on netobserv/netobserv-ebpf-agent#297

* bump agent protobuf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved breaking-change This pull request has breaking changes. They should be described in PR description. lgtm no-doc This PR doesn't require documentation change on the NetObserv operator no-qe This PR doesn't necessitate QE approval
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants