Add support for non-unique keys in Kafka output headers #30369
Conversation
According to the Kafka documentation, header keys are not supposed to be unique, therefore we must support the headers in the same way. Also, documented the headers configuration so customers can start using it.
This pull request does not have a backport label. Could you fix it @rdner? 🙏
NOTE:
/test
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
The code LGTM. However, I am on the fence about this change. It would be nice if we had a similar configuration for all outputs. In the Elasticsearch output, we use the previous format you just changed: https://github.com/elastic/beats/blob/main/libbeat/_meta/config/output-elasticsearch.reference.yml.tmpl#L44-L46
Can't we provide multiple values in one header by separating them with a comma? Example: `My-Header: first-value, second-value`.
@kvch I don't think we can separate values with a comma: how would one then include a value that itself contains a comma?
My idea is based on the RFC of the HTTP protocol:
According to the RFC values that contain commas are quoted:
Ref: https://datatracker.ietf.org/doc/html/rfc7231 If Kafka treats headers the same way as HTTP does, there is no need for your PR. Users can already configure multiple values for the same keys. But I haven't really found any information about how Kafka treats such headers. That is the bedrock of my question in my previous comment.
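As an illustration of the quoting rule being referenced (my own sketch, not from the thread): in HTTP list-valued headers, an element that contains a literal comma has to be double-quoted, e.g. `My-Header: "1,2,3,4", value1, value2`, so a naive split on commas would break it apart. A minimal Python sketch of quote-aware splitting:

```python
import csv
import io

def split_http_list(value: str) -> list[str]:
    """Split a comma-separated HTTP list header value.

    Double-quoted elements may contain literal commas, so a naive
    value.split(",") would break them apart; csv handles the quoting.
    """
    reader = csv.reader(io.StringIO(value), skipinitialspace=True)
    return next(reader)

# An element containing commas must be quoted per the RFC:
print(split_http_list('"1,2,3,4", value1, value2'))
```

The helper name `split_http_list` is mine; it only illustrates the quoting convention under discussion, not a real Kafka or Beats API.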
Right. Our configuration should not reflect how Kafka itself is configured. We should be consistent throughout our own configuration. For example, letting people configure headers the same way in all outputs regardless of how the outputs themselves implement headers. That should not concern our users.
Also, even if Kafka does not support it, we can still implement the comma-separated list approach.
@kvch I'm not sure I understand the comparison with HTTP; Kafka headers are not related to HTTP headers in any manner. Let's say I, as a user, want to set the following headers for each message sent to Kafka:

```json
[
  {
    "key": "first-key",
    "value": "1,2,3,4"
  },
  {
    "key": "second-key",
    "value": "value1"
  },
  {
    "key": "second-key",
    "value": "value2"
  }
]
```

These headers should be attached to the message exactly as represented above, in the same order, without any transformations. If we don't support this config structure, it's not possible to satisfy the following requirements:
We cannot just change how this feature works on our side if we say this parameter is Kafka headers. It's going to be very confusing for our customers that they cannot do the same thing as any Kafka client does. For me personally, the argument of keeping the Kafka headers configuration consistent with HTTP headers in Elasticsearch is quite weak; I don't see any reason why they should be related, they are different things.
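For reference, the header list above maps naturally onto the Kafka output's `headers` setting; a hedged configuration sketch (the broker address and topic name are placeholders of mine):

```yaml
output.kafka:
  hosts: ["localhost:9092"]   # placeholder broker
  topic: "my-topic"           # placeholder topic
  # Each list entry becomes one record header, in order;
  # repeating a key is allowed, matching Kafka's own semantics.
  headers:
    - key: "first-key"
      value: "1,2,3,4"
    - key: "second-key"
      value: "value1"
    - key: "second-key"
      value: "value2"
```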
The requirements I listed were taken from the initial design document for Kafka Headers:
I understand that Kafka headers are not related to HTTP headers. I was just curious whether Kafka allowed setting values in headers similarly to HTTP headers. The documentation you referred to was quite short and did not provide much info about what a value can be, about ordering, etc. But thanks for explaining it in detail. The headers can also have a schema. Is that something we should support?
Well, there has been no request from the community yet. The headers feature was a community contribution (linked PR). UPD
LGTM, good catch in making this consistent with the spec.
What does this PR do?
According to the Kafka documentation, header keys are not supposed to
be unique, therefore we must support the headers in the same way.
Also, documented the headers configuration, so customers can start using it.
This is a follow up to #29940
Why is it important?
Initially, this was a community PR, so the functionality is important to someone. This PR brings the implementation closer to the original Kafka feature, as the documentation states:
source https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/connect/header/Header.html
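The semantics the linked Javadoc describes (an ordered sequence of headers in which keys may repeat) are why a plain map is the wrong model for this feature. A minimal Python sketch of mine contrasting the two representations; `all_with_name` is a hypothetical helper, loosely analogous to iterating all values for one key:

```python
# Kafka record headers are an ordered sequence of (key, value) pairs;
# modelling them as a dict silently drops duplicate keys.
headers = [
    ("first-key", b"1,2,3,4"),
    ("second-key", b"value1"),
    ("second-key", b"value2"),
]

as_dict = dict(headers)  # last value wins: the duplicate key is lost
print(len(headers))      # all 3 pairs survive in the list model
print(len(as_dict))      # only 2 keys survive in the dict model

def all_with_name(hdrs, name):
    # Return every value for one key, preserving insertion order.
    return [v for k, v in hdrs if k == name]

print(all_with_name(headers, "second-key"))
```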
Checklist
- [ ] I have made corresponding changes to the default configuration files
- [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc

How to test this PR locally
Run Kafka locally in its standard configuration
Run filebeat with the following configuration:
(replace `/path/to/your/file.log` with your real path)
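A minimal filebeat configuration along these lines (a sketch of mine; the broker address, topic name, and header values are placeholders, and the log path is the placeholder from above):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /path/to/your/file.log   # replace with your real path

output.kafka:
  hosts: ["localhost:9092"]      # Kafka in its standard local configuration
  topic: "test-topic"            # placeholder topic
  headers:                        # attached to every produced message
    - key: "some-key"
      value: "some value"
    - key: "some-key"
      value: "another value"
```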
For example, when I appended the line "test message" to my file, I got this output:
Note the headers at the beginning of the line.
Related issues