chore: add config composition RFC #4427
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
# RFC 3791 - 2020-10-06 - Composing Components: Part 1 | ||
|
||
Vector is designed to be very modular, and the current tool for composing those | ||
modules is the TOML config file. This gives users a great deal of flexibility, | ||
but the resulting configurations can be verbose and demand more of users than | ||
pre-built, purpose-specific solutions do. | ||
|
||
One way that Vector could get some of the best of both worlds would be to make | ||
it easy to create pre-built "chunks" of config that users could configure as | ||
normal components. These would be bundles of lower-level components wired | ||
together with adjusted default values for the specific use case. | ||
|
||
## Scope | ||
|
||
This RFC focuses on enabling rapid development of "composed" sources (e.g. NGINX | ||
logs) within our existing architecture. A more complete solution for composing | ||
arbitrary components is deferred to a later RFC. | ||
|
||
## Motivation | ||
|
||
We need a way to quickly assemble Vector components that address specific use | ||
cases. This will allow us to improve ease of use without spending significant | ||
development time on each individual use case. It will allow us to focus | ||
development time on reusable components without forcing users to do the work of | ||
assembling them from scratch. | ||
|
||
## Internal Proposal | ||
|
||
There are multiple levels at which we could implement this type of | ||
functionality: | ||
|
||
1. Manually implement new component as config facade over one existing component | ||
2. Manually implement new component as config facade over one source and one | ||
codec transform | ||
3. Manually implement new component as config expanding to arbitrary pipeline of | ||
components | ||
4. Automatically derive new component from data describing arbitrary pipeline of | ||
components | ||
|
||
We are currently at level (1), where we can do things like implement the Humio | ||
sink as a wrapper around the existing Splunk HEC sink. | ||
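
A minimal sketch of what a level (1) facade could look like. The struct names, fields, and default endpoint below are hypothetical stand-ins chosen for illustration; they are not Vector's actual Humio or Splunk HEC config types.

```rust
// Hypothetical, simplified stand-ins; not Vector's real config types.

/// Config for the existing lower-level sink being wrapped.
#[derive(Debug, Clone, PartialEq)]
pub struct HecSinkConfig {
    pub endpoint: String,
    pub token: String,
    pub compression: bool,
}

/// User-facing facade: exposes only the options relevant to the specific
/// use case and fills in the rest with adjusted defaults.
#[derive(Debug, Clone, PartialEq)]
pub struct HumioSinkConfig {
    pub token: String,
    /// `None` means "use the facade's default endpoint".
    pub endpoint: Option<String>,
}

impl HumioSinkConfig {
    // Placeholder default, chosen for illustration only.
    const DEFAULT_ENDPOINT: &'static str = "https://humio.example.invalid";

    /// Translate the facade config into the wrapped component's config.
    pub fn to_inner(&self) -> HecSinkConfig {
        HecSinkConfig {
            endpoint: self
                .endpoint
                .clone()
                .unwrap_or_else(|| Self::DEFAULT_ENDPOINT.to_string()),
            token: self.token.clone(),
            // The facade pins this option rather than exposing it.
            compression: true,
        }
    }
}
```

The facade owns no behavior of its own: building it would mean calling `to_inner()` and then building the wrapped sink as usual.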
|
||
The next simplest is level (2). While it's not implemented yet, we do have | ||
existing plans to introduce the idea of a codec attached to sources. This would | ||
allow users to directly configure how to parse the incoming data as part of the | ||
source config itself. With that feature implemented, it would be relatively | ||
straightforward to do something similar to level (1) but expanding to both | ||
a source and an included codec. | ||
|
||
Level (3) becomes more complicated. We currently have a limited ability for | ||
transforms to expand to multiple transforms via `TransformConfig::expand`, and | ||
this could theoretically be generalized to include sources and sinks as well. | ||
The main problem is that this does not mesh well with the config traits as they | ||
currently exist and the API can be confusing. To do this properly would likely | ||
involve deeper changes to the config traits to better support this kind of | ||
staged building. | ||
Comment on lines +53 to +56: I think this may entail a layer between configuration and components. I believe this is similar to what has been discussed with a config "compiler".
Reply: Yep, exactly. |
||
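The staged building that level (3) calls for can be sketched as a separate expansion pass that runs before anything is built. The `Component` enum and `expand_all` function here are hypothetical and deliberately simplified; they do not mirror `TransformConfig::expand` or Vector's config traits.

```rust
// Hypothetical model of a generalized expansion stage; names are
// illustrative only.

#[derive(Debug, Clone, PartialEq)]
pub enum Component {
    /// Built directly into a running source, transform, or sink.
    Primitive(&'static str),
    /// Stands in for a pipeline of other components.
    Composite(Vec<Component>),
}

impl Component {
    /// Expansion stage: a composite returns the components it stands for,
    /// a primitive returns `None`.
    pub fn expand(&self) -> Option<Vec<Component>> {
        match self {
            Component::Composite(children) => Some(children.clone()),
            Component::Primitive(_) => None,
        }
    }
}

/// Expand recursively until only primitives remain, so that the later
/// build stage never sees a composite.
pub fn expand_all(components: Vec<Component>) -> Vec<Component> {
    let mut out = Vec::new();
    for component in components {
        match component.expand() {
            Some(children) => out.extend(expand_all(children)),
            None => out.push(component),
        }
    }
    out
}
```

Keeping expansion as its own pass is one way to avoid mixing "what does this config stand for" with "how is this component built", which is the confusion the paragraph above attributes to the current API.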
|
||
Finally, level (4) would allow defining these compositions via TOML instead of | ||
Rust code. This is somewhat similar to the idea of snippets that has been | ||
floated previously, but with a few key differences. The main one is that they | ||
would be built directly into Vector at compile time instead of loaded at | ||
runtime. This means they would need to be integrated into our build process and | ||
changing them would require recompiling Vector. They would also require | ||
a sufficiently general composition API to be exposed via TOML, which would be | ||
difficult to come up with for such a wide variety of potential pipelines. For | ||
these two reasons, I doubt that level (4) is worthwhile right now (this could | ||
change when/if we have more data-driven config definition in general). | ||
|
||
My proposal is that we initially focus on level (2) while collecting data on use | ||
Comment: I feel like we have cases of |
||
cases that require level (3). It is my assumption that the largest number of | ||
these types of composed components will be similar to the example of the NGINX | ||
source. We will want to combine an existing source (file) with an existing | ||
transform (regex or grok parser) and provide NGINX-specific default values for | ||
each. Focusing on these simpler cases will dramatically decrease how much | ||
complexity we need to add before being able to reap the value. | ||
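
As a rough illustration of the NGINX case, a composed source could be a config struct whose fields are all optional and whose only job is to supply use-case-specific defaults to a source and a parser. Everything below is assumed for the sake of the example: the struct names, the default log path, and the placeholder pattern are not taken from Vector.

```rust
// Hypothetical, simplified config structs for illustration only.

#[derive(Debug, Clone, PartialEq)]
pub struct FileSourceConfig {
    pub include: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct RegexParserConfig {
    pub pattern: String,
}

/// Composed source: every field is optional because the component exists
/// to provide NGINX-specific defaults.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct NginxLogsConfig {
    pub include: Option<Vec<String>>,
    pub pattern: Option<String>,
}

impl NginxLogsConfig {
    // Assumed defaults; a real implementation would pick these carefully.
    const DEFAULT_INCLUDE: &'static str = "/var/log/nginx/access.log";
    const DEFAULT_PATTERN: &'static str = r"^(?P<remote_addr>\S+) .*$";

    /// Produce the configs for the underlying source and parser.
    pub fn expand(&self) -> (FileSourceConfig, RegexParserConfig) {
        let source = FileSourceConfig {
            include: self
                .include
                .clone()
                .unwrap_or_else(|| vec![Self::DEFAULT_INCLUDE.to_string()]),
        };
        let parser = RegexParserConfig {
            pattern: self
                .pattern
                .clone()
                .unwrap_or_else(|| Self::DEFAULT_PATTERN.to_string()),
        };
        (source, parser)
    }
}
```

With this shape, a user who writes no options at all still gets a working source and parser pair, and overriding one field leaves the other default untouched.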
|
||
## Rationale | ||
|
||
This set of changes unblocks the most user-facing value with the least required | ||
investment, and it does so without compromising future plans for deeper | ||
architectural changes. | ||
|
||
## Plan of Attack | ||
|
||
- [ ] Implement `TransformFn` from the [Architecture | ||
RFC](https://github.com/timberio/vector/blob/master/rfcs/2020-06-18-2625-architecture-revisit.md), | ||
switch non-task transforms to it | ||
- [ ] Add `Vec<Box<dyn TransformFn>>` field to `Pipeline` | ||
- [ ] Implement composed sources as facades that prepend the relevant `TransformFn` | ||
to the `Pipeline` passed to `SourceConfig::build` | ||
- [ ] Move `event_processed` internal events to topology wrappers instead of | ||
Comment: Yay! |
||
components themselves to avoid double counting or incorrect tagging (likely | ||
within `impl Transform for TransformFn` for now) | ||
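
The first three steps above can be sketched roughly as follows. The `Event`, `TransformFn`, and `Pipeline` definitions are simplified assumptions made for this example (the real `TransformFn` is specified in the Architecture RFC, and the real `Pipeline` forwards to a channel rather than a `Vec`); `Suffix` is a toy transform standing in for a parser.

```rust
// Assumed shapes: simplified stand-ins, not Vector's definitions.

pub type Event = String;

/// A synchronous transform: takes one event, pushes zero or more events
/// onto `output`.
pub trait TransformFn {
    fn transform(&self, output: &mut Vec<Event>, event: Event);
}

/// What a source writes into. Inline transforms run before events are
/// handed downstream.
pub struct Pipeline {
    inline: Vec<Box<dyn TransformFn>>,
    downstream: Vec<Event>,
}

impl Pipeline {
    pub fn new() -> Self {
        Pipeline { inline: Vec::new(), downstream: Vec::new() }
    }

    /// Used by a composed source to place its transform ahead of any
    /// transforms already attached to the pipeline.
    pub fn prepend(&mut self, transform: Box<dyn TransformFn>) {
        self.inline.insert(0, transform);
    }

    /// Run the event through every inline transform in order, then forward
    /// whatever comes out.
    pub fn send(&mut self, event: Event) {
        let mut current = vec![event];
        for transform in &self.inline {
            let mut next = Vec::new();
            for e in current {
                transform.transform(&mut next, e);
            }
            current = next;
        }
        self.downstream.extend(current);
    }

    pub fn received(&self) -> &[Event] {
        &self.downstream
    }
}

/// Toy transform that appends a fixed suffix to each event.
pub struct Suffix(pub &'static str);

impl TransformFn for Suffix {
    fn transform(&self, output: &mut Vec<Event>, event: Event) {
        output.push(format!("{}{}", event, self.0));
    }
}
```

A composed source's `build` would call `prepend` with its parser before delegating to the wrapped source's `build`, so the wrapped source needs no knowledge of the extra step.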
|
||
Then later we can choose to push towards level (3) as needed: | ||
|
||
- [ ] Make `TransformConfig::expand` into a first-class stage, splitting the | ||
existing config `build` methods | ||
- [ ] Allow the new expansion stage to work for all components, not just transforms | ||
- [ ] Consider introducing more fine-grained internal component types designed | ||
to be composed into user-facing sources, transforms, and sinks |
Comment: We should also consider sinks when discussing codecs. :) Seems like that work is left for a pt 2 though?
Reply: Yes, codecs are basically a way for users to do something similar manually. Definitely related, but achieves a different goal.