Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add
doc_values
for fields that need to be sorted or aggregated in ElasticSearch, and disable all others. #12782Add
doc_values
for fields that need to be sorted or aggregated in ElasticSearch, and disable all others. #12782Changes from all commits
e5ce5f9
947544f
dc0ce3a
a2ecc10
349525a
d1aa343
6847818
c0bc965
554244f
511b1fb
9700b10
43b51cb
91e48fb
e0e27fa
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does alarm start time need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All fields used for sorting and aggregation need this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me one potential issue here, if we restrict to only add fields used in this repo, it might break third parties’s custom plugin if they add more features to their own plugin by aggregating/sorting some fields that we didn’t enable doc_values. The extensibility will be restricted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will. Generally, it may be worth to see how much benefit we will get from this change. Could you try a benchmark about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But to be honest, we never guarantee users could read data from elasticsearch on their own, we only guarantee from our GraphQL/PromQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanahmily can you share where you find disabling
doc_values
can speed up the query performance?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, he was talking about BanyanDB, and the concept was from this config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key use case is when we talk about
_traffic
indices and trace/log indices. Conditions for that, is not used for sorting and aggregation. So, we could reduce the payload.In the original issue, I only proposed three use cases. Nothing more in my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I’m wondering is this, what “payload” are we trying to reduce, as mentioned, disabling
doc_values
is mainly for reducing disk space, I don’t see how it would speed up in terms of query performance like you said here #12782 (comment). If reducing disk space is our goal, disabling all possible fields will maximize the outcome, that’s why I tend to disable doc_values by default and opt in those fields that need this feature.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reducing disk space is a benefit, and
speeding up
is just from smaller files/indices perspective, which is BanyanDB side asking about.Sorry for misleading.