-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: add config composition RFC #4427
Conversation
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
- [ ] Add `Vec<dyn TransformFn>` field to `Pipeline` | ||
- [ ] Implement composed sources as facade that prepends relevant `TransformFn` | ||
to `Pipeline` passed to `SourceConfig::build` | ||
- [ ] Move `event_processed` internal events to topology wrappers instead of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, part of me wonders if there is an opportunity to have defined "primitive" and "meta" components, where meta components would just be two primitive components behind a config facade as described here.
We already have examples:
- Sources like humio which are a splunk hec facade
- Sinks that use the
BatchedHttpSink
which is itself a wrapper overHttpSink
. - Swimlanes sorta fits this?
As you can see, this implies "meta" components can themselves contain "meta" components. 🤔
Greenfieldy stuff below
This could factor into how we do things like config parsing even. So folks could later define their own "meta" pipelines like json_parser
-> regex
-> geoip
and have that macro'd, or even configurable. (Basically your step 4!)
Say like:
# /etc/vector/config.toml
[sources.my_meta_source]
type = "meta"
meta.definition = "/mnt/network/shared/etc/vector/meta/yell_at_canadians.toml"
inputs = ["k8s"]
ip_field = "address"
And then
# /mnt/network/shared/etc/vector/meta/yell_at_canadians.toml
[meta]
variables.ip_field = "string"
[transforms.geoip]
# ...
source = "{{ meta.variables.ip_field }}"
[transforms.lua]
# ... uppercase anything from input that has a Canadian IP
This would allow users to build small reproducable, configurable units. 🤔
these two reasons, I doubt that level (4) is worthwhile right now (this could | ||
change when/if we have more data-driven config definition in general). | ||
|
||
My proposal is that we initally focus on level (2) while collecting data on use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we have cases of 3
already, in that some sinks are a codec-ish -> batchedhttp -> http
.
allow users to directly configure how to parse the incoming data as part of the | ||
source config itself. With that feature implemented, it would be relatively | ||
straightforward to do something similar to level (1) but expanding to both | ||
a source and an included codec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also consider sinks when discussing codecs. :) Seems like that work is left for a pt 2 though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, codecs are basically a way for users to do something similar manually. Definitely related, but achieves a different goal.
Signed-off-by: Ana Hobden <operator@hoverbear.org>
Signed-off-by: Ana Hobden <operator@hoverbear.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to see this work get started. We can expand on this in pt 2.
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
The main problem is that this does not mesh well with the config traits as they | ||
currently exist and the API can be confusing. To do this properly would likely | ||
involve deeper changes to the config traits to better support this kind of | ||
staged building. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may entail a layer between configuration and components. I believe this is similar to what has been discussed with a config "compiler".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good start to me! I agree that we'll get a lot of value just from level 2.
Just as another example, consuming CloudWatch Logs via the aws_kinesis_firehose
source requires configuration like:
[sources.firehose]
# General
type = "aws_kinesis_firehose"
address = "127.0.0.1:9000"
access_key = "secret"
[transforms.cloudwatch]
type = "aws_cloudwatch_logs_subscription_parser"
inputs = ["firehose"]
# only if the user was logging JSON to CloudWatch Logs
[transforms.json]
type = "json_parser"
inputs = ["cloudwatch"]
It seems like level 2 could encompass simplify the first bit of this at least (the firehose source + cloudwatch logs parser).
We should actually be able to handle this whole use case since we can stack multiple transforms in the |
Signed-off-by: Luke Steensen <luke.steensen@gmail.com> Co-authored-by: Ana Hobden <operator@hoverbear.org> Signed-off-by: Brian Menges <brian.menges@anaplan.com>
Closes #3791