Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(rfcs): Remap support for template strings #4905

Closed
wants to merge 5 commits into from

Conversation

StephenWakely
Copy link
Contributor

@StephenWakely StephenWakely commented Nov 6, 2020

Ref #4904

Readable version

Note, this RFC does suggest the simplest solution - just remove support for field lists in template fields. I'm not necessarily recommending that, I just figure it's best to start simple and work up from there!

Signed-off-by: Stephen Wakely fungus.humungus@gmail.com

Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments, showing that it's actually possible (barely, currently, but with some modifications, more ergonomically) for remap-based templates to delete fields while they are being processed.

The one thing this relies on, is for Remap to have mutable access to the event. That's doable, but perhaps surprising in the context of templating? With the work in #4744, we could define which configuration fields support mutable access to the event and which don't, but it's definitely something we need to consider carefully.

I'm not a fan of the proposed alternatives, given the added (cognitive) complexity.

From reading this RFC, it also isn't entirely clear to me what this RFC is currently proposing.

It seems to me that the first step towards a solution for this is to support Remap for templating in general, regardless of the specific issue with deleting fields as is the case in the loki sink.

I don't see anything written about the syntax used to use the Remap language in templates without breaking backward compatibility, or a discussion on which Remap features are available (e.g. mutating events, which functions do we expose, etc).

It feels as if the RFC skips several steps to end up discussing how to handle deleting fields, whereas the steps that come before this aren't as clear to me yet.

Is it worth discussing that in this RFC, or should we have a separate RFC first to discuss how to add Remap as an option for string templates?

rfcs/2020-11-06-4909-remap-template-fields.md Outdated Show resolved Hide resolved
Comment on lines +100 to +102
This script, `if .foo { .bar } else { .baz }`, would result in all three
fields being returned - `.foo`, `.bar` and `.baz`, and subsequently removed
from the message sent to Loki.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems way too surprising, and to me, disqualifies this alternative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Let's leave Loki's special case out of this RFC and handle it separately. I am not a fan of this, and it's likely we'll handle field deletion separately in an explicit manner. In the interim, users can continue to use the old templating until we solve this.

Comment on lines +115 to +117
There are likely to be a number of edge cases that would need to be thought
through if we took this approach. For example, if a field is used in a function
but it's value is only used to influence the result - should it be included?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this seems like a no-go, again because there are way too many edge-cases and the added cognitive complexity it adds to the language.

Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC itself in terms of details is going in the right direction, but I'm not convinced of the proposed solution yet (but also don't have one myself that isn't flawed in some way).

Curious what other people think.

Also, it seems we should just name this RFC "Support remap in string templates", as that's what's being discussed here (field lists and/or deleting those lazily is part of the proposal, but does not encapsulate everything in it).

Comment on lines 44 to 46
So, we can assume that if (after trimming) the template within the `{{..}}`
starts with a `.` or it contains a single bracket `{` or parentheses (`(..)`)
the syntax will be Remap syntax and should be parsed as such.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this statement is accurate.

The following are all valid remap scripts (each line representing a new script):

if true { true } else { false }
.foo
$foo
"foo"
5 + 5
true

Since parsing happens at boot-time, and the existing templating logic is so simple, I would say the most straightforward solution is to try to parse the script as remap, and if it errors, try to parse it as a regular template.

If we want to be 100% sure we're backward compatible, we could instead parse as regular template first, and then as remap. To allow people to force remap, we could have a special syntax such as three {}, e.g.:

{{ true }} 		// regular template, returning the `true` _field_
{{ .true }}  	// remap script, returning the `true` _field_
{{{ true }}} 	// remap script, returning the `true` _boolean_

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be an idea to just forgo supporting both templates and remap using {{ }} and just use templates for {{ }} and remap for {{{ }}}.

We can deprecate {{ }} at some point, and after a few versions remove regular templates and have Remap work with both {{ }} and {{{ }}} (and potentially drop {{{ }}} a few versions later)

Comment on lines 101 to 104
fields are only removed from the event at the end. This differs from the `del`
function in that `del` will remove the field immediately and it would not be
available for use after that point. The order in which the template fields are
calculated would have an impact on the final result.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could update del to take a lazy boolean argument:

.foo = del(.bar, lazy = true)

Or add a del_lazy function.

We'd then update the Object trait as such:

/// Any object you want to map through the remap language has to implement this
/// trait.
pub trait Object {
    /// Remove the given path from the object.
    ///
    /// If `compact` is true, after deletion, if an empty object or array is
    /// left behind, it should be removed as well.
+   ///
+   /// If `lazy` is true, the path should be deleted "lazily", by whatever
+   /// definition of "lazy" the callee determines.
-   fn remove(&mut self, path: &str, compact: bool);
+   fn remove(&mut self, path: &str, compact: bool, lazy: bool);

	// ...
}

For the loki sink, it would first run all the templated labels.* fields, and then afterward remove all fields marked with del(..., lazy = true) (any calls to del without the lazy flag would delete as usual, whenever del is called).


I can't say I'm super excited about adding a lazy flag, but I'm equally unenthusiastic about any mark function.

Alternatively, we could just make all dell calls lazy by default in the loki context (or any context that configures the program to "lazily delete"), but that can also be confusing to users who didn't expect that to happen.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice write-up!

I think I lean towards {{{ }}} to indicate remap style templates to let people opt into it and avoid any ambiguity. As Jean notes, we can deprecate the simple {{ }} templates and eventually let {{ }} indicate a remap template too.


## Outstanding Questions

- How important is it for the Loki sink to remove fields used in the template
Copy link
Member

@jszwedko jszwedko Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose deprecating remove_label_fields; allowing its use only with non-remap templates.

I think, going forward, a better way to handle options like remove_label_fields is to let users use remap in the component itself. That is, they'd do something like:

[sinks.loki]
  type = "loki"
  inputs = ["json"]
  endpoint = "http://localhost:3100"
  remove_label_fields = true
  labels.key = "{{{ .foo }}}-{{{ .buzz }}}"
  remap = """del(.foo, .buzz)"""

Additionally, I think this approach is more flexible by allowing users to keep some of the fields that are also in the label if they choose.

Copy link
Contributor

@JeanMertz JeanMertz Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. This is a clearly superior solution 👍

I'd probably vote to have two config fields available in all components:

  • remap.before
  • remap.after

Both take a Remap script, and run either before or after the other logic of the component.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably vote to have two config fields available in all components:

Curious what @lukesteensen thinks about this. We've had similar discussions for control flow:

[transforms.sampler]
  type = "sampler"
  
  [[if]]
    condition = ".status == 200"
    # regular component options here.

  [[if]]
    condition = ".status == 500"
    # regular component options here.

@jamtur01 jamtur01 added domain: vrl Anything related to the Vector Remap Language transform: remap Anything `remap` transform related labels Dec 3, 2020
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
@StephenWakely
Copy link
Contributor Author

Ok, I have now updated this to specify:

  • {{{..}}} syntax for Remap template fields, {{.}} remains for old style template fields. We can unambiguously distinguish between the two.
  • Added remap.before and remap.after fields to the config for any processing that may need doing, including removing any fields from the event. This allows us to deprecate the remove_label_fields.

I'm feeling good about this now. I think we have a pretty clean way forward.

Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done @FungusHumungus.

The remap.before and remap.after global config fields has a fairly significant impact on Vector itself.

I would almost advocate we split that part out of this RFC into a new one, that describes all the intrinsic details and then ask more people to weigh in on that proposal (especially because it's also closely tied to using conditional logic in the Vector config itself, as @binarylogic mentioned).

Then, once that RFC is approved and implemented, we can just PR an update to the loki sink to remove remove_label_fields favouring remap.after, and then finally we can create a separate (small) RFC to propose the new remap-inspired template fields using {{{ ... }}} (probably just updating this one, and removing all references to loki and the remap.before stuff).

It's maybe a bit more involved than just getting this RFC in, but I'm not comfortable enough that we've covered our bases with the impact that remap.before/after can have on Vector as a whole.

Having said that, if @binarylogic considers this unneeded, then I'm happy to see this RFC merged, as there's nothing blocking any more AFAICT.

type = "loki"
inputs = ["json"]
endpoint = "http://localhost:3100"
remove_label_fields = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just drop this config straight away?

Even if people still use the original {{ ... }} templating, they still can replace remove_label_fields with remap.after. This change would be a breaking change, but one we're going to make at some point regardless, so might as well get it out of the way.


### Add `remap.before` and `remap.after` fields

To replicate this functionality we will add `remap.before` and `remap.after`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I've thought about remap.before and remap.after a bit more, and looking at this example, it's not so clear what remap.after means in this context, and how it would relate to the same field in other components.

For the loki sink, to have remap.after be useful, it would have to fire before the data is sent to Loki, but after the labels.key is applied.

However, for transforms or sources, remap.after would just run "right before the transform/source is done with the event".

So, I guess remap.after is described as

Runs before the end of the component. For sources and transforms this is before the event is passed to the next component, for sinks this is before the event is delivered by Vector to the configured sink.

For remap.before, it would be similar, except that sources/sinks are swapped:

Runs right after the start of the component. For transforms and sinks this is before the event is processed by the given component, for sources this is after the event is received by Vector, but before the source has processed the event.

It requires a bit of explaining, but it does provide the most amount of flexibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think this is the most easily understandable way to model it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not entirely clear to me why we need two variants. For sources, it really only makes sense to modify the event after we've received it, and likewise, for sinks, we can only change it before we send it.

Transforms do have a before and after, but it seems confusing to wrap one transform with what's effectively just another transform. Do we have a good example of a transform that'd benefit from being able to have pre- or post-processing with remap without just chaining a remap transform? It seems to me that a single remap option on sources and sinks should cover the vast majority of use cases.

Coming back to templating specifically, I'm not sure it makes sense to encourage users to use multiple different remap executions (one generic pre-sink and one per templated field) and try to explain their ordering. Why not just have one remap execution that happens first and leave the templating as simple field access?

Initially, we can just leave the options to remove templated fields as-is, but eventually, with the introduction of better event metadata, we could give remap access to that metadata and allow users to remove fields from the normal event data while still having access via metadata for templated fields.

I understand the desire to use remap for literally everything, but there is a reason that most templating solutions use a dedicated language.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, there are considerations to be made here. We agreed that this RFC is not the place to introduce remap.before/after or any other variant.

I'm sure whoever picks up that RFC work can expand on the use-case for before/after (or lack thereof).

@binarylogic binarylogic changed the title chore(rfcs): Supporting field lists with remap as template fields chore(rfcs): Remap support for template strings Jan 3, 2021
@binarylogic
Copy link
Contributor

(note, I talked with Stephen on Discord about this)

Thanks for putting this together @FungusHumungus! Given the various issues (deleting loki sink labels, using private data in templates), we are leaning towards doing nothing and reapproaching this later. There are a few reasons for this:

  1. We are working on adding event metadata (Event metadata RFC #5802) as part of the various e2e work. We will allow the reading of metadata in templates which solves the issue of using event data in templates that should not be encoded.
  2. If a user needs to use complex Remap functions to generate specific configuration values then they should do this in a remap transform before the sink. Not only does this simplify observability of Vector and readability of Vector configs, but it also simplifies templating. The user can just call the field ({{ special_field_value }}) directly. It is possible that we will allow users to set custom metadata values (see point 1) which would resolve the need to exclude these values when encoding the event.

Given these 2 points, let's close this and see where that takes us. It's possible this is enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: vrl Anything related to the Vector Remap Language transform: remap Anything `remap` transform related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants