Synthetic source: load text from stored fields #87480
Conversation
Adds support for loading `text` fields that have `store: true`. We could likely load *any* stored fields, but I wanted to blaze the trail using something fairly useful.
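For readers who want to try the feature described here, a mapping that exercises the new path might look roughly like the sketch below. The `message` field name is made up for the example, and it assumes synthetic source is enabled through the `_source` mapping's `mode` setting.

```java
import java.io.IOException;

import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xcontent.XContentFactory;

public class SyntheticSourceMappingExample {
    // Builds a mapping with synthetic _source plus a text field that is
    // stored - the combination this change is meant to support. The
    // "message" field name is invented for the example.
    static XContentBuilder storedTextMapping() throws IOException {
        return XContentFactory.jsonBuilder()
            .startObject()
            .startObject("_source")
            .field("mode", "synthetic")
            .endObject()
            .startObject("properties")
            .startObject("message")
            .field("type", "text")
            .field("store", true)
            .endObject()
            .endObject()
            .endObject();
    }
}
```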
Force push incoming to resolve merge conflicts.
cloud deploy robot, please build me an image
run elasticsearch-ci/part-2
```java
/**
 * Write values for this document.
 */
void write(XContentBuilder b) throws IOException;
void write(FieldsVisitor fieldsVisitor, XContentBuilder b) throws IOException;
```
I don't think I actually need `fieldsVisitor` here - I think `advanceToDoc` can grab it.
Yep. I wonder if we can avoid having it as a parameter in any of these methods and instead pass it to the `StoredFieldSourceLoader` implementations directly? Having a method param that is only used by a specific subset of implementations feels off to me.
Like in the ctor?
I could move it to the `leaf` method pretty easily. But it's kind of tricky because you have to advance the state in a specific way. And holding on to a reference to the thing for a while feels like it is more "at a distance". Like, we take a `docId` as a parameter, but we only use it if we're using doc values.
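To make the trade-off in this thread concrete, here is an illustrative sketch of the two shapes being debated. The names (`VisitorInWrite`, `VisitorInLeaf`, `Writer`) are invented for the example and are not the PR's actual interfaces; only `advanceToDoc` and `FieldsVisitor` come from the discussion above.

```java
import java.io.IOException;

import org.elasticsearch.index.fieldvisitor.FieldsVisitor;
import org.elasticsearch.xcontent.XContentBuilder;

// Illustrative only - not the PR's actual API.
interface SyntheticLoaderShapes {
    // Shape 1: every implementation sees the visitor, even loaders that only
    // read doc values and never touch stored fields.
    interface VisitorInWrite {
        void write(FieldsVisitor fieldsVisitor, XContentBuilder b) throws IOException;
    }

    // Shape 2: the visitor is handed only to stored-field-backed loaders when
    // the per-leaf writer is created, so doc-values-only loaders never see it.
    interface VisitorInLeaf {
        Writer leaf(FieldsVisitor fieldsVisitor) throws IOException;

        interface Writer {
            void advanceToDoc(int docId) throws IOException;

            void write(XContentBuilder b) throws IOException;
        }
    }
}
```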
@elasticmachine retest this please
@romseygeek I think this is ready for another round when you are ready for it!
I like the API! I left a few questions.
```java
    throw new IllegalArgumentException(
        "field [" + name() + "] of type [" + typeName() + "] doesn't support synthetic source because it doesn't have doc values"
    );
}
if (fieldType().ignoreAbove() != Defaults.IGNORE_ABOVE) {
```
Does `ignore_above` not work if `stored=true`?
It doesn't store the field if it is above `ignore_above`.
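In other words, over-length values are dropped before they reach the index at all, so there is nothing for synthetic `_source` to rebuild from. A minimal sketch of that behavior under that assumption (not the actual `KeywordFieldMapper` code):

```java
import java.util.List;

final class IgnoreAboveSketch {
    private IgnoreAboveSketch() {}

    // Values longer than ignore_above are skipped entirely: they are not
    // indexed, not stored, and get no doc values, so synthetic _source
    // cannot recover them later.
    static void addValue(String value, int ignoreAbove, List<String> indexedAndStored) {
        if (value.length() > ignoreAbove) {
            return;
        }
        indexedAndStored.add(value);
    }
}
```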
```java
public abstract class SortedNumericDocValuesSyntheticFieldLoader implements SourceLoader.SyntheticFieldLoader {
    private final String name;
    private final String simpleName;
    private CheckedConsumer<XContentBuilder, IOException> writer = b -> {};
```
This reads a bit weirdly to me; does it make more sense to leave `write` as abstract and just overload it in the two implementations?
Those are per-segment writers. I'll see if I can make it less janky.
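For anyone following along, the per-segment writer pattern being discussed looks roughly like the simplified sketch below. The names and details do not match the PR, and it re-fetches the doc values iterator for every document purely for brevity.

```java
import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedNumericDocValues;
import org.elasticsearch.core.CheckedConsumer;
import org.elasticsearch.xcontent.XContentBuilder;

// Simplified sketch of the "swap in a writer per segment/document" pattern.
class PerSegmentWriterSketch {
    private final String name;

    // No-op until we land on a document that actually has values.
    private CheckedConsumer<XContentBuilder, IOException> writer = b -> {};

    PerSegmentWriterSketch(String name) {
        this.name = name;
    }

    // Positions on a document and captures its value (if any) in the writer.
    // A real implementation would hold the iterator per leaf instead of
    // fetching it for every document.
    void advanceToDoc(LeafReader reader, int docId) throws IOException {
        SortedNumericDocValues dv = DocValues.getSortedNumeric(reader, name);
        if (dv.advanceExact(docId)) {
            long value = dv.nextValue();
            writer = b -> b.field(name, value);
        } else {
            writer = b -> {};
        }
    }

    void write(XContentBuilder b) throws IOException {
        writer.accept(b);
    }
}
```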
```java
    private final String name;
    private final String simpleName;
    private CheckedConsumer<XContentBuilder, IOException> writer = b -> {};
```
Same here
server/src/main/java/org/elasticsearch/index/mapper/SourceLoader.java (outdated; conversation resolved)
```java
    && kwd.hasNormalizer() == false
    && kwd.fieldType().ignoreAbove() == KeywordFieldMapper.Defaults.IGNORE_ABOVE) {
if (kwd.hasNormalizer() == false
    && kwd.fieldType().ignoreAbove() == KeywordFieldMapper.Defaults.IGNORE_ABOVE
```
Should this work with `ignore_above` and `stored=true` set on the keyword subfield?
Same deal. We don't store the field if it is above `ignore_above`.
@romseygeek, I pushed a patch to remove the weird thing. I think it's more like what we want when we want to support
run elasticsearch-ci/bwc
I think there are more cleanups to do around stored field loading, but this is a great start. Thanks for all the iterations!
Woooh! Thanks for all the iterations too. I think we got something much nicer through them.
I'll work on adding some docs for this after I cover
When I added support for stored fields to synthetic _source (#87480) I accidentally caused a performance regression. Our friends working on building the nightly charts for tsdb caught it. It looked like:

```
| 50th percentile latency   | default_1k | 20.1228 | 41.289  | 21.1662 | ms | +105.18% |
| 90th percentile latency   | default_1k | 26.7402 | 42.5878 | 15.8476 | ms | +59.27%  |
| 99th percentile latency   | default_1k | 37.0881 | 45.586  | 8.49786 | ms | +22.91%  |
| 99.9th percentile latency | default_1k | 43.7346 | 48.222  | 4.48742 | ms | +10.26%  |
| 100th percentile latency  | default_1k | 46.057  | 56.8676 | 10.8106 | ms | +23.47%  |
```

This fixes the regression and puts us in line with how we were:

```
| 50th percentile latency   | default_1k | 20.1228 | 24.023  | 3.90022  | ms | +19.38% |
| 90th percentile latency   | default_1k | 26.7402 | 29.7841 | 3.04392  | ms | +11.38% |
| 99th percentile latency   | default_1k | 37.0881 | 36.8038 | -0.28428 | ms | -0.77%  |
| 99.9th percentile latency | default_1k | 43.7346 | 39.0192 | -4.71531 | ms | -10.78% |
| 100th percentile latency  | default_1k | 46.057  | 42.9181 | -3.13889 | ms | -6.82%  |
```

A 20% bump in the 50th percentile latency isn't great, but it is about four microseconds per document, which is acceptable.
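The fix itself isn't shown in this thread. As a hedged sketch of the general shape such a fix often takes (not the actual change), the stored-field machinery can be skipped entirely when no synthetic field loader reads stored fields, so doc-values-only indices pay nothing extra:

```java
import java.util.List;

// Hedged sketch only - not the actual change. The idea: decide up front
// whether any loader needs stored fields and skip the visitor otherwise.
final class StoredFieldSkipSketch {
    interface Loader {
        boolean needsStoredFields();
    }

    private StoredFieldSkipSketch() {}

    static boolean shouldVisitStoredFields(List<Loader> loaders) {
        return loaders.stream().anyMatch(Loader::needsStoredFields);
    }
}
```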
Adds support for loading `text` and `keyword` fields that have `store: true`. We could likely load *any* stored fields, but I wanted to blaze the trail using something fairly useful.