Parser to pass through data to processors #15694
Comments
Hi, I believe our xpath parser can parse CBOR documents already. Have you looked at that? We added support after #13464 and resolved this in
In these types of cases, what we have suggested is using the execd processor. You can already post-process any additional fields with an external tool. Let me know what you think about the xpath parser!
Thanks, I had not noticed this. I looked into it, and I'm a bit confused by the documentation. It looks to me like I have to know the structure of the incoming data, and I cannot just pass all the data as fields / binary on to processors for post-processing (which is what I want). Also, if I used this xpath_cbor parser, I could no longer receive anything but CBOR from that particular input (if I have understood correctly). If the parser simply wrapped the input bytes in, e.g., base64, the data could be passed to an external processor, which could then do the proper processing based on, e.g., the input's tags.
In this case, please use the execd processor. This is exactly what it is for. After an input, you can pass your data to an external processor to do whatever you want with it, including parsing out a field.
I'm using the execd processor, but as far as I know I cannot use it as an input parser (at least that's how I read the documentation), and the input (in my case CBOR coming from MQTT) has to be parsed into a metric before I can use the execd processor.
Correct: you can continue to use your mqtt input to produce the example metric that you provided in your original post. Then the processors run, and Telegraf sends that data to your processor to do whatever processing you want on that field. That processor returns the new/updated metric.
Do you want me to create a PR for a parser that enables this functionality, or do we continue to use a fork of Telegraf that has it?
I'm not following what your proposal is. In your original message:
We already have both a parser for CBOR and the execd processor to let you transform this data however you want.
If I understood the documentation correctly, I can't parse arbitrarily formatted CBOR into a metric for later post-processing, and if I used the parser, I would no longer be able to receive JSON from the same input.
Correct, a parser only handles a single data format. Do you want to set up an mqtt input that parses two types of data, both JSON and CBOR messages?
It might help if you could describe the entire scenario: what data is coming in, what you use to read it, and what processing you plan to do with it.
Yeah, I might have been a bit unclear about the entire scenario. I have an mqtt input that subscribes to multiple topics. Part of the topic identifies the data format of the payload (e.g. Foo/json1, bar/json2, foo/cbor). I then need to parse and process the content of these messages (with execd plugins) and ultimately output to InfluxDB (and maybe others). I have created a parser plugin that creates a metric with a single string field. This field contains the base64-encoded bytes from the input (mqtt) and is passed to the processors, which process the payload based on the mqtt topic (tag). As far as I know, I cannot use an execd plugin as a parser, and there is no parser that accepts arbitrary binary data and just puts it into a metric for later processing. I hope this clarifies my scenario.
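The pass-through parser described above can be sketched in Go as follows. This does not use Telegraf's real parser interface: the `Metric` struct, the `wrapPayload` function, the `mqtt_consumer` metric name, and the `payload` field name are hypothetical stand-ins. What it illustrates is the core idea: the raw input bytes are base64-encoded as-is into a single string field, with the topic kept as a tag so downstream processors can decide how to decode it.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// Metric is a hypothetical, simplified stand-in for a Telegraf metric.
type Metric struct {
	Name   string
	Tags   map[string]string
	Fields map[string]interface{}
}

// wrapPayload stores the raw input bytes, unmodified, as a base64 string
// field, so any binary format (CBOR, JSON, ...) survives intact.
func wrapPayload(topic string, payload []byte) Metric {
	return Metric{
		Name:   "mqtt_consumer",
		Tags:   map[string]string{"topic": topic},
		Fields: map[string]interface{}{"payload": base64.StdEncoding.EncodeToString(payload)},
	}
}

func main() {
	// CBOR encoding of the array [0, 0]: 0x82 (array of 2), 0x00, 0x00.
	m := wrapPayload("foo/cbor", []byte{0x82, 0x00, 0x00})
	fmt.Println(m.Tags["topic"], m.Fields["payload"])
}
```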
Thanks for the flow; I am following along a bit better now.
When you say the data is passed to processors, are these Telegraf processor plugins acting on it, or additional processors that your custom parser calls? I want to make sure I understand the flow of the metrics and the order of calls in your custom build. What if we added a base64 option to our value parser (#15697), which would take the message and encode it in base64 as the field? This sounds like what you are doing in your custom parser.
To be a bit more specific: we have a starlark processor that extracts the data format from the mqtt topic (tag) and stores it as a tag, and then we have multiple execd processors with filters for the data formats (one parses json1, one parses json2, and one parses cbor). Basically, I also just took your value parser, removed some things, and put the input bytes into the field after base64-encoding them to a string.
Thanks for confirming the flow! It really does help me understand what data you are handling and how.
Ah, OK! Would you be able to try out the artifacts in #15697 with the value parser and the base64 data type?
I can try it tomorrow when I get back to work.
The parser is almost exactly what I made and almost works, but I think that for binary formats, using the stripped string and then re-encoding it might lose data or not work at all (as with CBOR):

```go
value = base64.StdEncoding.EncodeToString([]byte(vStr))
```

I would suggest doing this instead:

```go
value = base64.StdEncoding.EncodeToString(buf)
```

I changed this one line, recompiled, and tested, and it now works like my parser. This can be reproduced with, e.g., this data:

```python
>>> import cbor2, base64
>>> cbor2.dumps({'foo': {'id': 217056256, 'data': [0, 0, 0, 40, 40, 0, 0, 0]}, 'timestamp': 1722494850.30815})
b'\xa2cfoo\xa2bid\x1a\x0c\xf0\x04\x00ddata\x88\x00\x00\x00\x18(\x18(\x00\x00\x00itimestamp\xfbA\xd9\xaa\xcb\xe0\x93\xb8\xbb'
>>> stripped = b'8AQAZGRhdGGIAAAAGCgYKAAAAGl0aW1lc3RhbXD7Qdmqy+CTuLs='
>>> buf = b'omNmb2+iYmlkGgzwBABkZGF0YYgAAAAYKBgoAAAAaXRpbWVzdGFtcPtB2arL4JO4uw=='
>>> cbor2.loads(base64.b64decode(stripped))
CBORSimpleValue(value=16)
>>> cbor2.loads(base64.b64decode(buf))
{'foo': {'id': 217056256, 'data': [0, 0, 0, 40, 40, 0, 0, 0]}, 'timestamp': 1722494850.30815}
```
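A small Go sketch of why encoding a trimmed string can corrupt binary payloads. `encodeTrimmed` is a hypothetical helper that assumes the parser strips NUL bytes and surrounding whitespace before encoding; it is not necessarily Telegraf's exact code. Valid CBOR can begin or end with such bytes (the array `[0, 0]` encodes as `0x82 0x00 0x00`), so trimming silently truncates it, while encoding the raw buffer round-trips exactly.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

// encodeTrimmed mimics a text-oriented parser (assumed behavior): it trims
// NUL bytes and whitespace, then base64-encodes what is left.
func encodeTrimmed(buf []byte) string {
	vStr := string(bytes.TrimSpace(bytes.Trim(buf, "\x00")))
	return base64.StdEncoding.EncodeToString([]byte(vStr))
}

// encodeRaw base64-encodes the payload exactly as received.
func encodeRaw(buf []byte) string {
	return base64.StdEncoding.EncodeToString(buf)
}

func main() {
	// CBOR encoding of the array [0, 0]: 0x82 (array of 2), 0x00, 0x00.
	payload := []byte{0x82, 0x00, 0x00}
	for _, s := range []string{encodeTrimmed(payload), encodeRaw(payload)} {
		decoded, err := base64.StdEncoding.DecodeString(s)
		// Report whether decoding gives back the original bytes.
		fmt.Println(s, err == nil && bytes.Equal(decoded, payload))
	}
}
```

Running it prints `gg== false` for the trimmed variant (only `0x82` survives) and `ggAA true` for the raw one.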
@juha-ylikoski would you please try to push your change to @powersj's branch? I'm not sure whether you have the permissions. Please make sure you have signed the CLA before pushing!
Only maintainers can push to PRs, so I'll take a look at this shortly.
I've pushed an update, and new artifacts will be up in 20-30 minutes. It uses the raw string as you suggested. Let me know!
@powersj this seems to work as I would expect and does not trim the binary data.
Awesome, thank you for confirming! I'll get this landed.
Use Case
I think it would be beneficial to be able to offload parsing of input data to external processor plugins. I have a use case where I would like to parse CBOR and transform it in processors before outputting it.
However, this currently requires me to write a parser plugin in Go and recompile Telegraf with it, instead of letting me write an external plugin as is possible for processors.
Expected behavior
A parser that could read arbitrary binary data into, e.g., a base64-encoded value, which is passed into a metric like:
Actual behavior
I need to write a custom CBOR parser and recompile Telegraf.
Additional info
I have written a parser like this, and I'm willing to create a PR for it if the project is willing to accept it.