
Add support for Splunk HTTP Event Collector #85

Merged (16 commits) on Apr 12, 2024

Conversation

MaxGroot (Contributor)

This pull request adds support for the Splunk HTTP Event Collector (HEC) to the Splunk adapter. This is especially useful in scenarios where the already supported TCP Input is not available, for example in Splunk Cloud environments.

Sending data over the HTTP Event Collector supports both the key=value format (which the TCP Data Input already uses) and a JSON format.

Something I'm still unsure of is whether to include the 'builtin' record fields such as _generated, _source, _classification and _version. Currently, the key=value format does not send these fields, but the JSON format does. This difference is caused by the use of the JsonRecordPacker, which includes these fields. I don't think the two formats should be inconsistent, but which is desired: including or omitting these builtin fields?
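For illustration, a minimal sketch of the two wire formats under discussion. The record contents, the "records" sourcetype, and the quoting of every value are assumptions for the example, not the adapter's actual output:

```python
import json

# Hypothetical record, flattened to a dict; "rdtag" follows the existing
# TCP adapter's key=value output quoted later in this thread.
record = {"type": "my/record", "rdtag": None, "hostname": "workstation01"}

# key=value format, as the TCP Data Input already sends it (all values
# quoted here for simplicity; the real adapter may differ).
kv_line = " ".join(f"{key}={json.dumps(str(value))}" for key, value in record.items())

# JSON format wrapped in Splunk's HEC event envelope.
hec_payload = json.dumps({"event": record, "sourcetype": "records"})

print(kv_line)
print(hec_payload)
```

Note that the JSON route naturally carries typed values (the None survives as null), while the key=value route stringifies everything, which is part of why the two formats drift apart on the builtin fields.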

codecov bot commented Sep 15, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparing base (09ed812, 80.20%) to head (a53bccc, 81.25%).
Report is 1 commit behind head on main.

Files                           Patch %   Lines
flow/record/adapter/splunk.py   92.68%    9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   80.20%   81.25%   +1.05%     
==========================================
  Files          33       33              
  Lines        3132     3239     +107     
==========================================
+ Hits         2512     2632     +120     
+ Misses        620      607      -13     
Flag        Coverage Δ
unittests   81.25% <92.68%> (+1.05%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

* Implement code review suggestions
* Add some comments and whitespaces for readability
* Move buffering into separate method of SplunkWriter
* Write the record descriptor `_name` as a type for json sourcetype
* Update tests to account for the above
MaxGroot (Contributor, Author)

While working on the HTTP adapter, I noticed what seems to be undesired behavior in the current Splunk adapter whenever a record has a field named type, as type is currently used by the Splunk adapter to denote the name property of the record's RecordDescriptor. For example:

In [1]: from flow.record import RecordDescriptor, RecordWriter
In [2]: descriptor = RecordDescriptor("my/record", [("string", "type")])
In [3]: record = descriptor(type="some_value")
In [4]: record
Out[4]: <my/record type='some_value'>
In [6]: writer = RecordWriter("splunk://localhost:4444")
In [7]: writer.write(record)

This will currently yield the following input for Splunk:

netcat -lvp 4444
Listening on 0.0.0.0 4444
Connection received on localhost 1337
type="my/record" rdtag=None type="some_value"

As you can see, the type field is set twice, for two very different things.
I think the Splunk adapter should do the same thing for type as it currently does for tag: prefix the 'meta' field with rd, which would yield the following input in Splunk:

rdtype="my/record" rdtag=None type="some_value"

This, however, might break existing Splunk dashboards that build on top of dissect. In my experience, records from dissect rarely have a 'type' field, which might explain why things haven't broken in the past. Having said that, I would still consider the above (current) behavior to be a bug, and the HTTP collector support introduced in this PR requires me to either replicate that behavior here or fix it now.

Do you think the record descriptor name should be sent as rdtype rather than type?

And if so, should it be implemented in a separate PR first, or should I implement it within this PR?
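To make the proposal concrete, a minimal sketch of the suggested rd-prefixing; the serializer name is hypothetical, and unlike the adapter's actual output (which emits rdtag=None unquoted) this sketch quotes every value:

```python
def serialize_keyvalue(descriptor_name, tag, fields):
    # Prefix descriptor metadata with "rd" so it cannot collide with
    # user-defined record fields such as "type".
    meta = {"rdtype": descriptor_name, "rdtag": tag}
    return " ".join(f'{key}="{value}"' for key, value in {**meta, **fields}.items())

line = serialize_keyvalue("my/record", None, {"type": "some_value"})
print(line)  # rdtype="my/record" rdtag="None" type="some_value"
```

Because the metadata dict is merged first, a user-defined rdtype field would still collide, so a real implementation would also want to guard against rd-prefixed user fields.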

Schamper (Member)

> Do you think the record descriptor name should be sent as rdtype rather than type?
>
> And if so, should it be implemented in a separate PR first, or should I implement it within this PR?

Yes.

yunzheng (Member) commented Nov 14, 2023

> Something I'm still unsure of is whether to include the 'builtin' record fields such as _generated, _source, _classification and _version. Currently, the key=value format does not send these fields, but the JSON format does. This difference is caused by the use of the JsonRecordPacker, which includes these fields. I don't think the two formats should be inconsistent, but which is desired: including or omitting these builtin fields?

I would say _version is useless as the Splunk adapter has no reader support for deserialising it back using its original RecordDescriptor.

_generated could be useful to have imo.

_source can also be useful if people are using it. This is meant for extra metadata, for example you could put the disk image filename or path here which can be useful to determine where the record originated from.

_classification can also be useful if people are using it :) - eg: TLP:AMBER, TLP:CLEAR etc.
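One way to make the two formats consistent along these lines is to whitelist the builtin fields explicitly before packing. The helper below is a sketch, not the adapter's code; the set of included fields just mirrors the suggestions above (keep _generated, _source and _classification, drop _version):

```python
# Builtin record fields discussed above; _version is dropped, the rest kept.
INCLUDED_BUILTINS = {"_generated", "_source", "_classification"}

def to_event(record_dict):
    # Keep regular fields, plus only the whitelisted builtin (underscore) fields.
    return {
        key: value for key, value in record_dict.items()
        if not key.startswith("_") or key in INCLUDED_BUILTINS
    }

event = to_event({"_version": 1, "_source": "disk.img", "hostname": "ws01"})
print(event)  # {'_source': 'disk.img', 'hostname': 'ws01'}
```

Running the same filter before both the key=value serializer and the JsonRecordPacker would remove the inconsistency regardless of which set of builtins is ultimately chosen.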

codecov-commenter commented Mar 31, 2024

Codecov Report

Attention: Patch coverage is 94.57364%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 81.64%. Comparing base (9eb1557) to head (ce00f30).

Files                           Patch %   Lines
flow/record/adapter/splunk.py   94.57%    7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   80.54%   81.64%   +1.10%     
==========================================
  Files          34       34              
  Lines        3197     3307     +110     
==========================================
+ Hits         2575     2700     +125     
+ Misses        622      607      -15     
Flag        Coverage Δ
unittests   81.64% <94.57%> (+1.10%) ⬆️

Flags with carried forward coverage won't be shown.


Shows how `_source` and `source` are handled differently
MaxGroot force-pushed the improvement/splunk-http-adapter branch from ff33136 to b5724c3 on March 31, 2024 12:34
yunzheng merged commit 4a47670 into fox-it:main on Apr 12, 2024
17 checks passed
MaxGroot deleted the improvement/splunk-http-adapter branch on April 13, 2024 09:20