chore: add config composition RFC #4427

Merged — 4 commits merged into master from config-rfc on Oct 23, 2020
Conversation

@lukesteensen (Member) commented Oct 7, 2020

Closes #3791

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
@Hoverbear self-requested a review October 8, 2020 21:29
> - [ ] Add `Vec<dyn TransformFn>` field to `Pipeline`
> - [ ] Implement composed sources as facade that prepends relevant `TransformFn` to `Pipeline` passed to `SourceConfig::build`
> - [ ] Move `event_processed` internal events to topology wrappers instead of
Contributor:
Yay!

@Hoverbear (Contributor) left a comment:

So, part of me wonders if there is an opportunity to have defined "primitive" and "meta" components, where meta components would just be two primitive components behind a config facade as described here.

We already have examples:

  - Sources like humio, which are a splunk hec facade
  - Sinks that use the BatchedHttpSink, which is itself a wrapper over HttpSink
  - Swimlanes sorta fits this?

As you can see, this implies "meta" components can themselves contain "meta" components. 🤔

Greenfieldy stuff below

This could factor into how we do things like config parsing even. So folks could later define their own "meta" pipelines like json_parser -> regex -> geoip and have that macro'd, or even configurable. (Basically your step 4!)

Say like:

```toml
# /etc/vector/config.toml
[sources.my_meta_source]
type = "meta"
meta.definition = "/mnt/network/shared/etc/vector/meta/yell_at_canadians.toml"
inputs = ["k8s"]
ip_field = "address"
```

And then

```toml
# /mnt/network/shared/etc/vector/meta/yell_at_canadians.toml
[meta]
variables.ip_field = "string"

[transforms.geoip]
# ...
source = "{{ meta.variables.ip_field }}"

[transforms.lua]
# ... uppercase anything from input that has a Canadian IP
```
This would allow users to build small, reproducible, configurable units. 🤔

> …these two reasons, I doubt that level (4) is worthwhile right now (this could
> change when/if we have more data-driven config definition in general).
>
> My proposal is that we initially focus on level (2) while collecting data on use…
Contributor:

I feel like we have cases of level (3) already, in that some sinks are a codec-ish -> batchedhttp -> http.

> …allow users to directly configure how to parse the incoming data as part of the
> source config itself. With that feature implemented, it would be relatively
> straightforward to do something similar to level (1) but expanding to both
> a source and an included codec.
Contributor:

We should also consider sinks when discussing codecs. :) Seems like that work is left for a pt 2 though?

@lukesteensen (Member Author):

Yes, codecs are basically a way for users to do something similar manually. Definitely related, but achieves a different goal.

Signed-off-by: Ana Hobden <operator@hoverbear.org>
Signed-off-by: Ana Hobden <operator@hoverbear.org>
@Hoverbear Hoverbear marked this pull request as ready for review October 14, 2020 18:39
@Hoverbear (Contributor) left a comment:

I'm happy to see this work get started. We can expand on this in pt 2.

@Hoverbear assigned bruceg and unassigned bruceg Oct 14, 2020
@Hoverbear requested a review from bruceg October 14, 2020 18:53
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Comment on lines +53 to +56:

> The main problem is that this does not mesh well with the config traits as they
> currently exist and the API can be confusing. To do this properly would likely
> involve deeper changes to the config traits to better support this kind of
> staged building.
@bruceg (Member) commented Oct 20, 2020:

I think this may entail a layer between configuration and components. I believe this is similar to what has been discussed with a config "compiler".

@lukesteensen (Member Author):

Yep, exactly.

@jszwedko (Member) left a comment:

This seems like a good start to me! I agree that we'll get a lot of value just from level 2.

Just as another example, consuming CloudWatch Logs via the aws_kinesis_firehose source requires configuration like:

```toml
[sources.firehose]
  # General
  type = "aws_kinesis_firehose"
  address = "127.0.0.1:9000"
  access_key = "secret"

[transforms.cloudwatch]
  type = "aws_cloudwatch_logs_subscription_parser"
  inputs = ["firehose"]

# only if the user was logging JSON to CloudWatch Logs
[transforms.json]
  type = "json_parser"
  inputs = ["cloudwatch"]
```

It seems like level 2 could simplify the first bit of this at least (the firehose source + cloudwatch logs parser).

@lukesteensen (Member Author):

> It seems like level 2 could simplify the first bit of this at least (the firehose source + cloudwatch logs parser).

We should actually be able to handle this whole use case since we can stack multiple transforms in the Pipeline we pass to the source.
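With stacked transforms, the three-component example above might collapse into something like the following. The `decoding` option is purely hypothetical — invented here to illustrate the shape of a composed source, not an implemented setting:

```toml
[sources.firehose]
  type = "aws_kinesis_firehose"
  address = "127.0.0.1:9000"
  access_key = "secret"
  # hypothetical option: transforms stacked onto the Pipeline inside the source
  decoding = ["aws_cloudwatch_logs_subscription_parser", "json_parser"]
```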

@lukesteensen merged commit e8781b7 into master Oct 23, 2020
@lukesteensen deleted the config-rfc branch October 23, 2020 20:54
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Co-authored-by: Ana Hobden <operator@hoverbear.org>
Signed-off-by: Brian Menges <brian.menges@anaplan.com>
Successfully merging this pull request may close these issues:

Configuration macros v1 RFC

4 participants