Prune expressions for the meta index lookup #1433
I'll review this one in practice; I still have a 1TB database with Zeek streaming JSON on my hard drive.
I verified the performance bump: I've seen a reduction from 2s to 0.4s on a similar query. There seems to be no visible performance penalty for regular queries.
I can reproduce the ASan failures in `vast-test` locally. Please fix and add a changelog entry; other than that, LGTM.
750f8aa to 890c6a7
I'm not sure whether I should add a changelog entry for this.
@tobim if the performance gains are substantial, it's a nice change item. As a user, I always enjoy reading about gains if they are specific and to the point (while broad claims are rather a turnoff).
890c6a7 to 4c52e02
Fine by me, but please make the changelog entry either a Feature or a Change instead of introducing a new category.
We can take advantage of the fact that all string fields are covered by the same type-level synopsis. To be more specific: a query like `suricata.smb.host == "foo" || suricata.ssh.host == "foo"` would check twice whether the string "foo" is included in the same type-level string synopsis. This commit adds a preprocessing step for meta index lookups that removes predicates with duplicate strings from the expression.
cf66ce4 to a9e7364
📔 Description
We can take advantage of the fact that all string fields are covered by the same type-level synopsis. To be more specific: a query like `suricata.smb.host == "foo" || suricata.ssh.host == "foo"` would check twice whether the string "foo" is included in the same type-level string synopsis. This PR adds a preprocessing step for meta index lookups that removes predicates with duplicate strings from the expression.
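To illustrate the idea, here is a minimal Python sketch of such a pruning pass. It is not the code from this PR: the `Predicate`, `Disjunction`, and `Conjunction` classes and the `prune` function are hypothetical stand-ins for the real expression types. It assumes that two predicates with the same operator and the same string value hit the same type-level synopsis, so the second one adds nothing to the lookup. The pruned expression is only meant for the meta index lookup, not for evaluating the query against the actual data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    field: str      # e.g. "suricata.smb.host" (hypothetical representation)
    op: str         # e.g. "=="
    value: object   # right-hand side literal


@dataclass(frozen=True)
class Disjunction:
    operands: tuple


@dataclass(frozen=True)
class Conjunction:
    operands: tuple


def prune(expr):
    """Drop predicates that repeat an (op, string) pair within one connective."""
    if isinstance(expr, Predicate):
        return expr
    seen = set()
    kept = []
    for operand in expr.operands:
        operand = prune(operand)
        # Only string literals are deduplicated; the field name is ignored
        # because all string fields are assumed to share one synopsis.
        if isinstance(operand, Predicate) and isinstance(operand.value, str):
            key = (operand.op, operand.value)
            if key in seen:
                continue
            seen.add(key)
        kept.append(operand)
    # A connective with a single remaining operand collapses to that operand.
    if len(kept) == 1:
        return kept[0]
    return type(expr)(tuple(kept))


query = Disjunction((
    Predicate("suricata.smb.host", "==", "foo"),
    Predicate("suricata.ssh.host", "==", "foo"),
))
pruned = prune(query)
```

With the example query from the description, `pruned` is the single predicate on `suricata.smb.host`, so the synopsis is consulted once instead of twice.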
In my local comparison with 12723 partitions of `eve.log` data, I get a roughly 4x speedup for the query `vast export null 'net.domain == "alhgeoafh" || net.hostname == "alhgeoafh"'`. A release build on master finishes the command in ~930 ms; this branch is done after ~240 ms.
📝 Checklist
🎯 Review Instructions
Try to reproduce my results.