chore(vrl): remote store/enrichment table RFC #20495
@jszwedko Can you take a look at this?
Thanks @lsampras! I was OOO last week, but we'll give this a review this week.
Thank you for writing this. It is definitely something that could be quite useful. The biggest show stopper to getting this implemented is the lack of
Existing enrichment tables work by loading the data into memory at startup. This would not be doable for these proposals given the need to also write to the database. I think there are several possible ways to move forward with this.
The main problem with any kind of async VRL function is unpredictable and likely extremely poor performance. Normal programming languages can just say that this is the responsibility of the user to manage, but the purpose of VRL is to be reliably fast, deterministic, etc. Strong constraints on what you're able to do are how we achieve that (see the lack of even things like basic loops).

I would actually suggest a different pattern for this kind of read/write use case. Instead of a single instance of VRL that is trying to do both, I would split the reads and writes into different pipelines. The write path, rather than needing to write to an enrichment table, would do the appropriate transformations and then write to the datastore via a normal sink. The read path could then use an enrichment table to read from that same datastore. This would require support for enrichment tables backed by remote datastores, but that's a pretty natural extension of the existing feature. One important caveat is that we'd still require some level of local caching of queries to the remote datastore, because we do not want to block event processing in VRL on network calls.

If you need data that is perfectly up to date and not subject to stale caches like above, the other option would be to keep everything local to the Vector instance. We could do this by introducing something very similar to the

Finally, the escape hatch option is always Lua. You can do pretty much anything you want there and simply deal with the performance consequences as they come up.
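To make the split read/write pattern with local caching concrete, here is a minimal Python sketch. It is not Vector code and does not reflect any existing or planned Vector API: `CachedLookup`, `write_sink`, `demo`, and the in-memory `remote` dict standing in for the remote datastore are all hypothetical names chosen for illustration. The read path only ever consults the local cache, and misses or expired entries are queued for a refresh that would run outside event processing.

```python
import time
from typing import Callable, Dict, List, Optional, Set, Tuple


class CachedLookup:
    """Hypothetical read path: lookups are served from a local cache only."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # remote query; only called from refresh()
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[Optional[str], float]] = {}
        self._pending: Set[str] = set()

    def get(self, key: str) -> Optional[str]:
        """Return the cached value (possibly stale) or None. Never fetches."""
        entry = self._cache.get(key)
        if entry is None or self._clock() - entry[1] >= self._ttl:
            # Missing or expired: queue the key for a later refresh.
            self._pending.add(key)
        return entry[0] if entry is not None else None

    def refresh(self) -> int:
        """Fetch all queued keys; meant to run outside the hot path."""
        keys, self._pending = self._pending, set()
        for key in keys:
            self._cache[key] = (self._fetch(key), self._clock())
        return len(keys)


def demo() -> List[Optional[str]]:
    remote: Dict[str, str] = {}  # stand-in for the remote datastore
    now = [0.0]                  # fake clock so the demo is deterministic

    def write_sink(key: str, value: str) -> None:
        # Write path: a separate pipeline writes through a normal sink.
        remote[key] = value

    table = CachedLookup(remote.get, ttl_seconds=30.0, clock=lambda: now[0])
    seen: List[Optional[str]] = []

    write_sink("jdoe", "John Doe")
    seen.append(table.get("jdoe"))  # miss: nothing cached yet, key is queued
    table.refresh()
    seen.append(table.get("jdoe"))  # served from the cache
    write_sink("jdoe", "Jane Doe")
    seen.append(table.get("jdoe"))  # stale value, still within the TTL
    now[0] = 31.0
    seen.append(table.get("jdoe"))  # expired: stale value returned, key queued
    table.refresh()
    seen.append(table.get("jdoe"))  # refreshed value
    return seen
```

The sketch makes the trade-off visible: the read path never waits on the network, at the cost of returning a missing or stale value until the next refresh. That is the staleness caveat described above.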
@lukesteensen @StephenWakely thanks for your reviews,
For now I can think of 2 ways to solve this
Would we be fine with any of these? While Lua serves as an escape hatch, having some sort of native support would add more safety & performance here.
I'm looking into Vector and was missing support for this. We are using memcached with logstash to normalise user names and to enrich records from our CMDB. It's simple, but it works like a charm and enables us to provide more context in the logs that are written to the SIEM. An additional transform sounds like a good idea and makes the user aware of potential performance pitfalls.
This would also be very useful for me.
External stores support for enrichment discussed here: #17195