-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Splunk Metrics serializer #4339
Splunk Metrics serializer #4339
Conversation
in a format that can be consumed by a Splunk metrics index. Can be used with any output including file or HTTP. Please see the docs in README.md
obj["_value"] = v | ||
|
||
dataGroup.Event = "metric" | ||
dataGroup.Time = float64(metric.Time().UnixNano() / int64(s.TimestampUnits)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see what you're intending to do here, but this won't result in the correct behavior. https://play.golang.org/p/VEDBzbF0j0e
If you want to do math between 2 ints, and get a float result, you need to convert the int to float before the arithmetic.
return "", err | ||
} | ||
|
||
metricString = string(metricJson) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Picking on the string vs byte slice thing some more, you're converting a byte slice to a string, and then later on converting that string back into a byte slice (up on line 50). It would use less memory to keep it as a byte slice.
} | ||
|
||
for _, m := range objects { | ||
serialized = serialized + m + "\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if serialized
were a byte slice. Every time you append to a string, a new string has to be allocated. If your batches are large (in terms of count or size), this can suck up a lot of memory. With byte slice, an allocation is performed only if the length of the slice exceeds the capacity.
Ditto for other places where this is stored in a string (e.g. line 23).
Also I'm not seeing much point of the objects
slice. You should be able to append to serialized
directly.
return []byte(serialized), nil | ||
} | ||
|
||
func (s *serializer) SerializeBatch(metrics []telegraf.Metric) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SerializeBatch
seems like it's just a reinvention of json.Encoder
. It might be better to use the existing standard lib functionality. It should also simplify your code a lot.
|
||
for _, metric := range metrics { | ||
m, err := s.createObject(metric) | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error should be logged somewhere when not nil. Otherwise metrics are going to be getting dropped and the user won't have any idea why.
plugins/serializers/registry.go
Outdated
@@ -73,6 +74,8 @@ func NewSerializer(config *Config) (Serializer, error) { | |||
serializer, err = NewGraphiteSerializer(config.Prefix, config.Template, config.GraphiteTagSupport) | |||
case "json": | |||
serializer, err = NewJsonSerializer(config.TimestampUnits) | |||
case "splunkmetric": | |||
serializer, err = NewSplunkmetricSerializer(config.TimestampUnits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splunk only supports seconds. The serializer should not be using config.TimestampUnits
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splunk absolutely support sub second timestamps. From the default datetime.xml:
<define name="_utcepoch" extract="utcepoch, subsecond">
<!-- update regex before '2017' -->
<text><![CDATA[((?<=^|[\s#,"=\(\[\|\{])(?:1[012345]|9)\d{8}|^@[\da-fA-F]{16,24})(?:\.?(\d{1,6}))?(?![\d\(])]]></text>
</define>```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't say it doesn't. It only supports seconds as the unit. It supports more than that as the precision. Unit != precision.
|
||
func (s *serializer) createObject(metric telegraf.Metric) (metricString string, err error) { | ||
|
||
/* Splunk supports one metric per line and has the following required names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not entirely accurate. Splunk's http event collector is not line-oriented. It's object oriented. You can shove multiple JSON objects on a single line and it's happy to consume them.
See this example: https://docs.splunk.com/Documentation/Splunk/7.1.1/Data/HTTPEventCollectortokenmanagement#Send_multiple_events_to_HEC_in_one_request
{"event": "Pony 1 has left the barn"}{"event": "Pony 2 has left the barn"}{"event": "Pony 3 has left the barn", "nested": {"key1": "value1"}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it's not line oriented, but it doesn't support reading an array of JSON objects as metrics. This is OK:
{"event": "Pony 1 has left the barn"}{"event": "Pony 2 has left the barn"}{"event": "Pony 3 has left the barn", "nested": {"key1": "value1"}}
This is not OK:
[{"event": "Pony 1 has left the barn"}{"event": "Pony 2 has left the barn"}{"event": "Pony 3 has left the barn", "nested": {"key1": "value1"}}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that developer documentation should be accurate. We shouldn't tell the developers it's "one metric per line" if it's not.
** metric_name: The name of the metric | ||
** _value: The value for the metric | ||
** _time: The timestamp for the metric | ||
** All other index fields become deminsions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't think this is a good structure to follow. This will prevent using this serializer with pretty much every single input plugin, as none of them follow this format. I think the implementation over on #4185 has a much more flexible implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is wrong, it should be time
, not _time
(I'll fix), but metric_name
and _value
are required fields for the metrics store. Naming the fields this way means you don't need custom props.conf or transforms.conf. In fact this is the same naming convention that is used in the PR you reference. The point of this serializer is to format the metrics into this format so that it can be used with the generic, well tested, HTTP output. Or the file output if you're running telegraf on a machine that is running a Splunk forwarder.
These updates cleanup a lot of the handling of the data and allows to configure if the output should be in a HEC compatible format or a file/stdout format. Additional documentation files also updated.
@phemmer Thank you for the comprehensive review, I appreciate all of the feedback and believe I have addressed all/most of the issues with this latest commit. I know that there were some PRs for Splunk HEC outputs, but this was specifically written as a serializer so that you could have Splunk compatible metrics generated for any reasonable output (e.g. file or HTTP.) Furthermore, it will allow you to easily redirect different types of metrics to different metrics indexes, which the output based PRs do not allow for. The output based PRs also require that you have/deploy a HEC to be used to get metrics into the Splunk infrastructure vs. being able to use existing infrastructure (forwarders) to get the data into a metrics index. |
Looks like it doesn't work properly with the
my config:
|
… numeric. Some inputs return non-numeric values (e.g. zookeeper returns a version string) Splunk requires that metrics be numeric, but previously the serializer would drop the entire input vs. just the offending metric. Now just drop the offencing key/value and improve error logging: 2018-07-20T02:37:00Z E! Can not parse value: 3.4.12-e5259e437540f349646870ea94dc2658c4e44b3b for key: version
This looks great, and 👍 for leveraging the existing http plugin. Can we get this merged and into an official release? |
docs/DATA_FORMATS_OUTPUT.md
Outdated
|
||
```toml | ||
[[outputs.http]] | ||
# ## URL is the address to send metrics to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the initial #
through line 252
|
||
```toml | ||
[[outputs.http]] | ||
# ## URL is the address to send metrics to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, leading #
isn't necessary
An example configuration of a file based output is: | ||
|
||
```toml | ||
# # Send telegraf metrics to file(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same #
|
||
import ( | ||
"encoding/json" | ||
// "errors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this commented import
m, err := s.createObject(metric) | ||
if err != nil { | ||
log.Printf("E! [serializer.splunkmetric] Dropping invalid metric: %v [%v]", metric, m) | ||
return []byte(""), err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pedantic, but maybe return nil, err
here
Fixup README files as requested in review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should unify the documentation into just a single place. Let's link from the DATA_FORMATS_OUTPUT documents to the README and sometime before 1.8 is final we can move the other serializers documentation to follow suite.
@danielnelson For the sake of clarity, you want me to remove the text in DATA_FORMATS_OUTPUT and replace it with a link to the README. Possibly merging some of the text from that I put in the DATA_FORMAT_OUTPUT file into the README if additional clarity is required. |
That's right, thank you |
…plunkmetric README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on the docs, thanks for taking care of that for me. I took another look over the pull request and noticed a few additional things we should discuss before merging:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md | ||
data_format = "splunkmetric" | ||
## Provides time, index, source overrides for the HEC | ||
hec_routing = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this variable splunk_hec_routing
, due to how the parser variables are injected directly into the plugin we have started prefixing the variables. Long term, these will probably become tables and the parsers will become more independent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to keep things consistent, lets call it splunkmetric_hec_routing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do.
|
||
m, err := s.createObject(metric) | ||
if err != nil { | ||
log.Printf("D! [serializer.splunkmetric] Dropping invalid metric: %v [%v]", metric, m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should return this as an error: return nil, fmt.Errorf("Dropping invalid metric: %s", metric.Name())
for _, metric := range metrics { | ||
m, err := s.createObject(metric) | ||
if err != nil { | ||
log.Printf("D! [serializer.splunkmetric] Dropping invalid metric: %v [%v]", metric, m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should return error. If there are situations where a metric cannot be serialized but it is not an error, have createObject return nil, nil
and check if m is nil before appending.
for k, v := range metric.Fields() { | ||
|
||
if !verifyValue(v) { | ||
log.Printf("D! Can not parse value: %v for key: %v", v, k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we should just continue on the next field. Otherwise we will never be able to serialize a metric with a string field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will include in the other changes being made.
|
||
dataGroup := HECTimeSeries{} | ||
|
||
for k, v := range metric.Fields() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use metric.FieldList() so that no allocation is performed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the refactoring mentioned below (thank for the suggestion)
|
||
dataGroup.Event = "metric" | ||
// Convert ns to float seconds since epoch. | ||
dataGroup.Time = float64(metric.Time().UnixNano()) / float64(1000000000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metric.Time().Unix()
will give you unix time in seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want seconds, I want ms but as a float (this is Splunk's spec...) e.g. 1536295485.123
} | ||
|
||
func (s *serializer) createObject(metric telegraf.Metric) (metricJson []byte, err error) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do a bad job of explaining what is guaranteed in a telegraf.Metric, and actually as I write this I notice some additional checks I need to add.
Here is a brief rundown of things you may or may not need to check:
- metric name may be an empty string
- zero or more tags, tag keys are any non-empty string, tag values may be empty strings
- zero, yes zero, or more fields, field keys are any non-empty string, field values may be any float64 (including NaN, +/-Inf),int64,uint64,string,bool
- time is any time.Time.
The part about tag/field keys not being empty strings is not true right now, but after writing this I am going to ensure this is the case in 1.8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this actually caused me to re-examine the the serializer with several different inputs, and I found a case in which metrics were lost (dropped), so I'm refactoring some of the code to deal with that. (Also, it's been a crazy week...so hope to get this done over the next few days.)
@@ -0,0 +1,139 @@ | |||
# Splunk Metrics serialzier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling
@ronnocol We are hoping to release 1.8.0-rc1 on Wednesday afternoon, let me know if you think that could be a problem for this pr. |
Tested http input to Splunk 7.1.2 (w/ hec routing) Verified output of file output.
@danielnelson I believe I have resolved all of your requested changes as well as resolved an issue where metrics might have been dropped. I know that you want to cut an RC1 on Wednesday, I believe that this serializer is ready to be included in that. |
This serializer properly formats the metrics data according to the Splunk metrics JSON specification, it can be used with any output that supports serializers, including file or HTTP. I decided that just formatting the data according to the Splunk spec was a better investment than managing the whole output pipeline. Now you can use telegraf without requiring a HEC (by using a file output), or you can use the well tested and maintained HTTP output and just set the extra headers required by the HEC.
Please see the docs in README.md
Required for all PRs: