Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(rfcs): Remap support for template strings #4905

Closed
wants to merge 5 commits into from
Closed
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
191 changes: 191 additions & 0 deletions rfcs/2020-11-06-4909-remap-template-fields.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,191 @@
# RFC 4909 - 2020-11-06 - Remap support for template strings

We would like to unify the templating syntax within configuration fields with
the Remap language.


## Scope

This RFC will look into ways we can use Remap whilst still supporting the
current method of templating fields (known throughout this RFC as the template
syntax).

## Motivation

Using the Remap language rather than the template syntax provides much greater
power in how these fields are defined.

The advantages of using Remap rather than the existing template syntax are:

- One familiar syntax and function reference for Vector.
- Access to all of remap's functions for templating.

However, we do still need to support the template syntax in order to maintain
backward compatability.


## Internal Proposal

There are two issues that need resolving to allow Remap to be used.

### Determine which syntax is being used

We need to determine if a template field is using Remap syntax or the
template syntax.

- With the template syntax each templated field will contain a single path - for
example `{{ message.field[2] }}`.

- With the Remap syntax it is possible to contain a single path, however this will
be prefixed by a `.` - eg. `{{ .message.field[2] }}`. Any more complex syntax
will be at minimum an `if` statement - eg. `{{ if .bar { .baz } else { .boo } }}`
or a function call - eg. `{{ replace(.bar, "foo", "bar) }}`

So, we can assume that if (after trimming) the template within the `{{..}}`
starts with a `.` or it contains a single bracket `{` or parentheses (`(..)`)
the syntax will be Remap syntax and should be parsed as such.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this statement is accurate.

The following are all valid remap scripts (each line representing a new script):

if true { true } else { false }
.foo
$foo
"foo"
5 + 5
true

Since parsing happens at boot-time, and the existing templating logic is so simple, I would say the most straightforward solution is to try to parse the script as remap, and if it errors, try to parse it as a regular template.

If we want to be 100% sure we're backward compatible, we could instead parse as regular template first, and then as remap. To allow people to force remap, we could have a special syntax such as three {}, e.g.:

{{ true }} 		// regular template, returning the `true` _field_
{{ .true }}  	// remap script, returning the `true` _field_
{{{ true }}} 	// remap script, returning the `true` _boolean_

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be an idea to just forgo supporting both templates and remap using {{ }} and just use templates for {{ }} and remap for {{{ }}}.

We can deprecate {{ }} at some point, and after a few versions remove regular templates and have Remap work with both {{ }} and {{{ }}} (and potentially drop {{{ }}} a few versions later)



### Removing fields used in the template

A feature available with the template syntax is to be able to list all the fields
used within the template.

The Loki sink wants a list of the fields that were used in the
template. It uses this list to remove these fields from the event sent to Loki.

If you had this sink:

```toml
[sinks.loki]
type = "loki"
inputs = ["json"]
endpoint = "http://localhost:3100"
remove_label_fields = true
labels.key = "{{foo}}-{{buzz}}"
```

And this event was sent to it:

```json
{"foo": "bar", "buzz": "zab", "message1": "thing", "message2": "thong"}
```

The actual message sent to Loki would be:

```
{"message1": "thing", "message2": "thong"}
```

With the label `key = bar-zab` attached to the message. The fields `foo` and
`buzz` that were used in defining the label have been removed from the message.

It is not going to be possible to do this with the Remap syntax as it is much
more complicated and ambiguous to determine which fields are used to generate
the templated string. To replicate this functionality a remap function will be
needed to mark fields for deletion, call it `mark(field)` for now. The above
script could be written in the remap syntax as:

```
labels.key = """
{{
mark(.foo)
mark(.buzz)
.foo + "-" + .buzz
}}
"""
```

The function `mark` is technically not a mutable function, so it is safe to use
in template fields. The event is kept intact throughout the process and the
fields are only removed from the event at the end. This differs from the `del`
function in that `del` will remove the field immediately and it would not be
available for use after that point. The order in which the template fields are
calculated would have an impact on the final result.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could update del to take a lazy boolean argument:

.foo = del(.bar, lazy = true)

Or add a del_lazy function.

We'd then update the Object trait as such:

/// Any object you want to map through the remap language has to implement this
/// trait.
pub trait Object {
    /// Remove the given path from the object.
    ///
    /// If `compact` is true, after deletion, if an empty object or array is
    /// left behind, it should be removed as well.
+   ///
+   /// If `lazy` is true, the path should be deleted "lazily", by whatever
+   /// definition of "lazy" the callee determines.
-   fn remove(&mut self, path: &str, compact: bool);
+   fn remove(&mut self, path: &str, compact: bool, lazy: bool);

	// ...
}

For the loki sink, it would first run all the templated labels.* fields, and then afterward remove all fields marked with del(..., lazy = true) (any calls to del without the lazy flag would delete as usual, whenever del is called).


I can't say I'm super excited about adding a lazy flag, but I'm equally unenthusiastic about any mark function.

Alternatively, we could just make all dell calls lazy by default in the loki context (or any context that configures the program to "lazily delete"), but that can also be confusing to users who didn't expect that to happen.



## Rationale

The benefits of using Remap in the template fields are:

- One familiar syntax and function reference for Vector.
- Access to all of remap's functions for templating.
- Less code to manage (once the old template fields are deprecated).


## Drawbacks

There could be an additional maintenance burden. The Remap syntax is more
complex which does mean there is more of a learning curve for the user and more
likelyhood that they make mistakes.


## Alternatives

### Do nothing

We do already have the existing template syntax. Perhaps we can stick with this.

The advantage of using Remap for these fields are that it allows more
flexibility in defining how the event is used. However, given that remap can be
used as a transform, should the user really need this they could put a Remap
transform in the process to process these fields so they can be easily used in
the template for the next phase.

### Use distinct syntax to distinguish between template and Remap syntax.

We could specify that double parentheses (`{{..}}`) are used for template
syntax (the current technique) and triple parentheses (`{{{...}}}`) are used
for Remap syntax.

This would provide a clear and unambiguous way to distinguish between the
syntaxes rather than rely on a set of heuristics.

### Returning fields at load time

Instead of requiring the user to mark the fields that are removed, Remap could
keep track of the fields used whilst running the script.

There are several options:

- *Keep track of all fields used in the script.*
This script, `if .foo { .bar } else { .baz }`, would result in all three
fields being returned - `.foo`, `.bar` and `.baz`, and subsequently removed
from the message sent to Loki.
Comment on lines +154 to +156
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems way too surprising, and to me, disqualifies this alternative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Let's leave Loki's special case out of this RFC and handle it separately. I am not a fan of this, and it's likely we'll handle field deletion separately in an explicit manner. In the interim, users can continue to use the old templating until we solve this.


- *Keep track of the fields used in the run path.*
The script, `if .foo { .bar } else { .baz }`,
would result in `.foo` and either `.bar` or `.baz` being returned.

If necessary, Remap could distinguish between fields that are used in the
condition and those used it the result, so only `.bar` or `.baz` could be
returned.

There are likely to be a number of edge cases that would need to be thought
through if we took this approach. For example, if a field is used in a function
but it's value is only used to influence the result - should it be included?
Comment on lines +166 to +168
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this seems like a no-go, again because there are way too many edge-cases and the added cognitive complexity it adds to the language.


```
replace(.field1, .field2, .field3)
```

The result is the value of `field1`, but with any occurrence of the value of
`field2` being replaced by `field3`. Which fields would be correct to include
in this list?


## Outstanding Questions

- How important is it for the Loki sink to remove fields used in the template
Copy link
Member

@jszwedko jszwedko Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose deprecating remove_label_fields; allowing its use only with non-remap templates.

I think, going forward, a better way to handle options like remove_label_fields is to let users use remap in the component itself. That is, they'd do something like:

[sinks.loki]
  type = "loki"
  inputs = ["json"]
  endpoint = "http://localhost:3100"
  remove_label_fields = true
  labels.key = "{{{ .foo }}}-{{{ .buzz }}}"
  remap = """del(.foo, .buzz)"""

Additionally, I think this approach is more flexible by allowing users to keep some of the fields that are also in the label if they choose.

Copy link
Contributor

@JeanMertz JeanMertz Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. This is a clearly superior solution 👍

I'd probably vote to have two config fields available in all components:

  • remap.before
  • remap.after

Both take a Remap script, and run either before or after the other logic of the component.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably vote to have two config fields available in all components:

Curious what @lukesteensen thinks about this. We've had similar discussions for control flow:

[transforms.sampler]
  type = "sampler"
  
  [[if]]
    condition = ".status == 200"
    # regular component options here.

  [[if]]
    condition = ".status == 500"
    # regular component options here.

from the event?

## Plan Of Attack

Incremental steps that execute this change. Generally this is in the form of:

- [ ] Submit a PR with spike-level code _roughly_ demonstrating the change.
- [ ] Incremental change #1
- [ ] Incremental change #2
- [ ] ...

Note: This can be filled out during the review process.