
Add support for Splunk HTTP Event Collector #85

Merged (16 commits) on Apr 12, 2024

Conversation

MaxGroot (Contributor)

This pull request adds support for the Splunk HTTP Event Collector (HEC) to the Splunk adapter. This is especially useful in scenarios where the already supported TCP Input is not available, for example in Splunk Cloud environments.

Sending data over the HTTP Event Collector supports both the key=value format (which the TCP Data Input already uses) and a JSON format.

Something I'm still unsure of is whether to include the 'builtin' record fields such as _generated, _source, _classification and _version. Currently, the key=value format does not send these fields, but the JSON format does. This difference is caused by the use of the JsonRecordPacker, which includes these fields. I don't think the two formats should be inconsistent, but which is desired: including or omitting these builtin fields?
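For illustration, a minimal sketch of the two wire formats under discussion. The record contents, the "records" sourcetype, and the quoting of every value are assumptions for the example, not the adapter's actual output:

```python
import json

# Hypothetical record, flattened to a dict; "rdtag" follows the existing
# TCP adapter's key=value output quoted later in this thread.
record = {"type": "my/record", "rdtag": None, "hostname": "workstation01"}

# key=value format, as the TCP Data Input already sends it (all values
# quoted here for simplicity; the real adapter may differ).
kv_line = " ".join(f"{key}={json.dumps(str(value))}" for key, value in record.items())

# JSON format wrapped in Splunk's HEC event envelope.
hec_payload = json.dumps({"event": record, "sourcetype": "records"})

print(kv_line)
print(hec_payload)
```

Note that the JSON route naturally carries typed values (the None survives as null), while the key=value route stringifies everything, which is part of why the two formats drift apart on the builtin fields.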

codecov bot commented Sep 15, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparing base (09ed812, 80.20%) to head (a53bccc, 81.25%).
Report is 1 commit behind head on main.

Files                           Patch %   Lines
flow/record/adapter/splunk.py   92.68%    9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   80.20%   81.25%   +1.05%     
==========================================
  Files          33       33              
  Lines        3132     3239     +107     
==========================================
+ Hits         2512     2632     +120     
+ Misses        620      607      -13     
Flag        Coverage Δ
unittests   81.25% <92.68%> (+1.05%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

* Implement code review suggestions
* Add some comments and whitespaces for readability
* Move buffering into separate method of SplunkWriter
* Write the record descriptor `_name` as a type for json sourcetype
* Update tests to account for the above
MaxGroot (Contributor, Author)

While working on the HTTP adapter, I noticed what seems to be undesired behavior in the current Splunk adapter whenever a record has a field named type, as type is currently used by the Splunk adapter to denote the name property of the record's RecordDescriptor. For example:

In [1]: from flow.record import RecordDescriptor, RecordWriter
In [2]: descriptor = RecordDescriptor("my/record", [("string", "type")])
In [3]: record = descriptor(type="some_value")
In [4]: record
Out[4]: <my/record type='some_value'>
In [6]: writer = RecordWriter("splunk://localhost:4444")
In [7]: writer.write(record)

This will currently yield the following input for Splunk:

netcat -lvp 4444
Listening on 0.0.0.0 4444
Connection received on localhost 1337
type="my/record" rdtag=None type="some_value"

As you can see, the type field is set twice, for two very different things.
I think the Splunk adapter should do the same thing for type as it currently does for tag: prefix the 'meta' field with rd, which would yield the following input in Splunk:

rdtype="my/record" rdtag=None type="some_value"

This, however, might break existing Splunk dashboards that build on top of dissect. In my experience, records from dissect rarely have a 'type' field, which might explain why things haven't broken in the past. Having said that, I would still consider the above (current) behavior to be a bug, and the HTTP collector support introduced in this PR requires me to either replicate that behavior here or fix it now.

Do you think the record descriptor name should be sent as rdtype rather than type?

And if so, should it be implemented in a separate PR first, or should I implement it within this PR?
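To make the proposal concrete, a minimal sketch of the suggested rd-prefixing; the serializer name is hypothetical, and unlike the adapter's actual output (which emits rdtag=None unquoted) this sketch quotes every value:

```python
def serialize_keyvalue(descriptor_name, tag, fields):
    # Prefix descriptor metadata with "rd" so it cannot collide with
    # user-defined record fields such as "type".
    meta = {"rdtype": descriptor_name, "rdtag": tag}
    return " ".join(f'{key}="{value}"' for key, value in {**meta, **fields}.items())

line = serialize_keyvalue("my/record", None, {"type": "some_value"})
print(line)  # rdtype="my/record" rdtag="None" type="some_value"
```

Because the metadata dict is merged first, a user-defined rdtype field would still collide, so a real implementation would also want to guard against rd-prefixed user fields.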

Schamper (Member)

> Do you think the record descriptor name should be sent as rdtype rather than type?
>
> And if so, should it be implemented in a separate PR first, or should I implement it within this PR?

Yes.

yunzheng (Member) commented Nov 14, 2023

> Something I'm still unsure of is whether to include the 'builtin' record fields such as _generated, _source, _classification and _version. Currently, the key=value format does not send these fields, but the JSON format does. This difference is caused by the use of the JsonRecordPacker, which includes these fields. I don't think the two formats should be inconsistent, but which is desired: including or omitting these builtin fields?

I would say _version is useless as the Splunk adapter has no reader support for deserialising it back using its original RecordDescriptor.

_generated could be useful to have imo.

_source can also be useful if people are using it. This is meant for extra metadata, for example you could put the disk image filename or path here which can be useful to determine where the record originated from.

_classification can also be useful if people are using it :) - eg: TLP:AMBER, TLP:CLEAR etc.
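One way to make the two formats consistent along these lines is to whitelist the builtin fields explicitly before packing. The helper below is a sketch, not the adapter's code; the set of included fields just mirrors the suggestions above (keep _generated, _source and _classification, drop _version):

```python
# Builtin record fields discussed above; _version is dropped, the rest kept.
INCLUDED_BUILTINS = {"_generated", "_source", "_classification"}

def to_event(record_dict):
    # Keep regular fields, plus only the whitelisted builtin (underscore) fields.
    return {
        key: value for key, value in record_dict.items()
        if not key.startswith("_") or key in INCLUDED_BUILTINS
    }

event = to_event({"_version": 1, "_source": "disk.img", "hostname": "ws01"})
print(event)  # {'_source': 'disk.img', 'hostname': 'ws01'}
```

Running the same filter before both the key=value serializer and the JsonRecordPacker would remove the inconsistency regardless of which set of builtins is ultimately chosen.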

codecov-commenter commented Mar 31, 2024

Codecov Report

Attention: Patch coverage is 94.57364%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 81.64%. Comparing base (9eb1557) to head (ce00f30).

Files                           Patch %   Lines
flow/record/adapter/splunk.py   94.57%    7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   80.54%   81.64%   +1.10%     
==========================================
  Files          34       34              
  Lines        3197     3307     +110     
==========================================
+ Hits         2575     2700     +125     
+ Misses        622      607      -15     
Flag        Coverage Δ
unittests   81.64% <94.57%> (+1.10%) ⬆️

Flags with carried forward coverage won't be shown.


Shows how `_source` and `source` are handled differently
MaxGroot force-pushed the improvement/splunk-http-adapter branch from ff33136 to b5724c3 on March 31, 2024 12:34
yunzheng merged commit 4a47670 into fox-it:main on Apr 12, 2024
17 checks passed
MaxGroot deleted the improvement/splunk-http-adapter branch on April 13, 2024 09:20