-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(rfcs): Remap support for template strings #4905
Conversation
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments, showing that it's actually possible (barely, currently, but with some modifications, more ergonomically) for remap-based templates to delete fields while they are being processed.
The one thing this relies on, is for Remap to have mutable access to the event. That's doable, but perhaps surprising in the context of templating? With the work in #4744, we could define which configuration fields support mutable access to the event and which don't, but it's definitely something we need to consider carefully.
I'm not a fan of the proposed alternatives, given the added (cognitive) complexity.
From reading this RFC, it also isn't entirely clear to me what this RFC is currently proposing.
It seems to me that the first step towards a solution for this is to support Remap for templating in general, regardless of the specific issue with deleting fields as is the case in the loki
sink.
I don't see anything written about the syntax used to use the Remap language in templates without breaking backward compatibility, or a discussion on which Remap features are available (e.g. mutating events, which functions do we expose, etc).
It feels as if the RFC skips several steps to end up discussing how to handle deleting fields, whereas the steps that come before this aren't as clear to me yet.
Is it worth discussing that in this RFC, or should we have a separate RFC first to discuss how to add Remap as an option for string templates?
This script, `if .foo { .bar } else { .baz }`, would result in all three | ||
fields being returned - `.foo`, `.bar` and `.baz`, and subsequently removed | ||
from the message sent to Loki. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems way too surprising, and to me, disqualifies this alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Let's leave Loki's special case out of this RFC and handle it separately. I am not a fan of this, and it's likely we'll handle field deletion separately in an explicit manner. In the interim, users can continue to use the old templating until we solve this.
There are likely to be a number of edge cases that would need to be thought | ||
through if we took this approach. For example, if a field is used in a function | ||
but it's value is only used to influence the result - should it be included? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this seems like a no-go, again because there are way too many edge-cases and the added cognitive complexity it adds to the language.
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC itself in terms of details is going in the right direction, but I'm not convinced of the proposed solution yet (but also don't have one myself that isn't flawed in some way).
Curious what other people think.
Also, it seems we should just name this RFC "Support remap in string templates", as that's what's being discussed here (field lists and/or deleting those lazily is part of the proposal, but does not encapsulate everything in it).
So, we can assume that if (after trimming) the template within the `{{..}}` | ||
starts with a `.` or it contains a single bracket `{` or parentheses (`(..)`) | ||
the syntax will be Remap syntax and should be parsed as such. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this statement is accurate.
The following are all valid remap scripts (each line representing a new script):
if true { true } else { false }
.foo
$foo
"foo"
5 + 5
true
Since parsing happens at boot-time, and the existing templating logic is so simple, I would say the most straightforward solution is to try to parse the script as remap, and if it errors, try to parse it as a regular template.
If we want to be 100% sure we're backward compatible, we could instead parse as regular template first, and then as remap. To allow people to force remap, we could have a special syntax such as three {}
, e.g.:
{{ true }} // regular template, returning the `true` _field_
{{ .true }} // remap script, returning the `true` _field_
{{{ true }}} // remap script, returning the `true` _boolean_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be an idea to just forgo supporting both templates and remap using {{ }}
and just use templates for {{ }}
and remap for {{{ }}}
.
We can deprecate {{ }}
at some point, and after a few versions remove regular templates and have Remap work with both {{ }}
and {{{ }}}
(and potentially drop {{{ }}}
a few versions later)
fields are only removed from the event at the end. This differs from the `del` | ||
function in that `del` will remove the field immediately and it would not be | ||
available for use after that point. The order in which the template fields are | ||
calculated would have an impact on the final result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could update del
to take a lazy
boolean argument:
.foo = del(.bar, lazy = true)
Or add a del_lazy
function.
We'd then update the Object
trait as such:
/// Any object you want to map through the remap language has to implement this
/// trait.
pub trait Object {
/// Remove the given path from the object.
///
/// If `compact` is true, after deletion, if an empty object or array is
/// left behind, it should be removed as well.
+ ///
+ /// If `lazy` is true, the path should be deleted "lazily", by whatever
+ /// definition of "lazy" the callee determines.
- fn remove(&mut self, path: &str, compact: bool);
+ fn remove(&mut self, path: &str, compact: bool, lazy: bool);
// ...
}
For the loki
sink, it would first run all the templated labels.*
fields, and then afterward remove all fields marked with del(..., lazy = true)
(any calls to del
without the lazy flag would delete as usual, whenever del
is called).
I can't say I'm super excited about adding a lazy
flag, but I'm equally unenthusiastic about any mark
function.
Alternatively, we could just make all dell
calls lazy by default in the loki
context (or any context that configures the program to "lazily delete"), but that can also be confusing to users who didn't expect that to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice write-up!
I think I lean towards {{{ }}}
to indicate remap style templates to let people opt into it and avoid any ambiguity. As Jean notes, we can deprecate the simple {{ }}
templates and eventually let {{ }}
indicate a remap template too.
|
||
## Outstanding Questions | ||
|
||
- How important is it for the Loki sink to remove fields used in the template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd propose deprecating remove_label_fields
; allowing its use only with non-remap templates.
I think, going forward, a better way to handle options like remove_label_fields
is to let users use remap
in the component itself. That is, they'd do something like:
[sinks.loki]
type = "loki"
inputs = ["json"]
endpoint = "http://localhost:3100"
remove_label_fields = true
labels.key = "{{{ .foo }}}-{{{ .buzz }}}"
remap = """del(.foo, .buzz)"""
Additionally, I think this approach is more flexible by allowing users to keep some of the fields that are also in the label if they choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. This is a clearly superior solution 👍
I'd probably vote to have two config fields available in all components:
remap.before
remap.after
Both take a Remap script, and run either before or after the other logic of the component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably vote to have two config fields available in all components:
Curious what @lukesteensen thinks about this. We've had similar discussions for control flow:
[transforms.sampler]
type = "sampler"
[[if]]
condition = ".status == 200"
# regular component options here.
[[if]]
condition = ".status == 500"
# regular component options here.
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Ok, I have now updated this to specify:
I'm feeling good about this now. I think we have a pretty clean way forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done @FungusHumungus.
The remap.before
and remap.after
global config fields has a fairly significant impact on Vector itself.
I would almost advocate we split that part out of this RFC into a new one, that describes all the intrinsic details and then ask more people to weigh in on that proposal (especially because it's also closely tied to using conditional logic in the Vector config itself, as @binarylogic mentioned).
Then, once that RFC is approved and implemented, we can just PR an update to the loki
sink to remove remove_label_fields
favouring remap.after
, and then finally we can create a separate (small) RFC to propose the new remap-inspired template fields using {{{ ... }}}
(probably just updating this one, and removing all references to loki
and the remap.before
stuff).
It's maybe a bit more involved than just getting this RFC in, but I'm not comfortable enough that we've covered our bases with the impact that remap.before/after
can have on Vector as a whole.
Having said that, if @binarylogic considers this unneeded, then I'm happy to see this RFC merged, as there's nothing blocking any more AFAICT.
type = "loki" | ||
inputs = ["json"] | ||
endpoint = "http://localhost:3100" | ||
remove_label_fields = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just drop this config straight away?
Even if people still use the original {{ ... }}
templating, they still can replace remove_label_fields
with remap.after
. This change would be a breaking change, but one we're going to make at some point regardless, so might as well get it out of the way.
|
||
### Add `remap.before` and `remap.after` fields | ||
|
||
To replicate this functionality we will add `remap.before` and `remap.after` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I've thought about remap.before
and remap.after
a bit more, and looking at this example, it's not so clear what remap.after
means in this context, and how it would relate to the same field in other components.
For the loki
sink, to have remap.after
be useful, it would have to fire before the data is sent to Loki, but after the labels.key
is applied.
However, for transforms or sources, remap.after
would just run "right before the transform/source is done with the event".
So, I guess remap.after
is described as
Runs before the end of the component. For sources and transforms this is before the event is passed to the next component, for sinks this is before the event is delivered by Vector to the configured sink.
For remap.before
, it would be similar, except that sources/sinks are swapped:
Runs right after the start of the component. For transforms and sinks this is before the event is processed by the given component, for sources this is after the event is received by Vector, but before the source has processed the event.
It requires a bit of explaining, but it does provide the most amount of flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I think this is the most easily understandable way to model it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not entirely clear to me why we need two variants. For sources, it really only makes sense to modify the event after we've received it, and likewise, for sinks, we can only change it before we send it.
Transforms do have a before and after, but it seems confusing to wrap one transform with what's effectively just another transform. Do we have a good example of a transform that'd benefit from being able to have pre- or post-processing with remap without just chaining a remap transform? It seems to me that a single remap
option on sources and sinks should cover the vast majority of use cases.
Coming back to templating specifically, I'm not sure it makes sense to encourage users to use multiple different remap executions (one generic pre-sink and one per templated field) and try to explain their ordering. Why not just have one remap execution that happens first and leave the templating as simple field access?
Initially, we can just leave the options to remove templated fields as-is, but eventually, with the introduction of better event metadata, we could give remap access to that metadata and allow users to remove fields from the normal event data while still having access via metadata for templated fields.
I understand the desire to use remap for literally everything, but there is a reason that most templating solutions use a dedicated language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, there are considerations to be made here. We agreed that this RFC is not the place to introduce remap.before/after
or any other variant.
I'm sure whoever picks up that RFC work can expand on the use-case for before/after (or lack thereof).
(note, I talked with Stephen on Discord about this) Thanks for putting this together @FungusHumungus! Given the various issues (deleting
Given these 2 points, let's close this and see where that takes us. It's possible this is enough. |
Ref #4904
Readable version
Note, this RFC does suggest the simplest solution - just remove support for field lists in template fields. I'm not necessarily recommending that, I just figure it's best to start simple and work up from there!
Signed-off-by: Stephen Wakely fungus.humungus@gmail.com