Synthetic source: load text from stored fields #87480

Merged: nik9000 merged 36 commits into elastic:main on Aug 17, 2022

Member

@nik9000 nik9000 commented Jun 7, 2022

Adds support for loading `text` and `keyword` fields that have
`store: true`. We could likely load any stored fields, but I
wanted to blaze the trail using something fairly useful.

Adds support for loading `text` fields that have `store: true`. We could
likely load *any* stored fields, but I wanted to blaze the trail using
something fairly useful.
@nik9000 nik9000 marked this pull request as draft June 7, 2022 19:28
@nik9000
Member Author

nik9000 commented Jun 7, 2022

Force push incoming to resolve merge conflicts.

This was referenced Jun 7, 2022
@nik9000 nik9000 added the >non-issue, :Search Foundations/Mapping, :StorageEngine/TSDB, and cloud-deploy labels Jun 7, 2022
@nik9000
Member Author

nik9000 commented Jun 9, 2022

cloud deploy robot, please build me an image

@nik9000
Member Author

nik9000 commented Jun 9, 2022

run elasticsearch-ci/part-2

@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022

/**
 * Write values for this document.
 */
- void write(XContentBuilder b) throws IOException;
+ void write(FieldsVisitor fieldsVisitor, XContentBuilder b) throws IOException;
Member Author

I don't think I actually need fieldsVisitor here - I think advanceToDoc can grab it.

Contributor

Yep. I wonder if we can avoid having it as a parameter in any of these methods and instead pass it to StoredFieldSourceLoader implementations directly? Having a method param that is only used by a specific subset of implementations feels off to me.

Member Author

Like in the ctor?

Member Author

I could move it to the leaf method pretty easily. But it's kind of tricky because you have to advance the state in a specific way. And holding on to a reference to the thing for a while feels like it is more "at a distance". Like, we take a docId as a parameter, but we only use it if we're using doc values.
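
The trade-off in this thread can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `StoredValues`, `FieldWriter`, `StoredFieldWriter`, `DocValuesFieldWriter`, and `WriterDemo` are hypothetical names, with `StoredValues` standing in for `FieldsVisitor`. The stored-field writer receives its dependency in the constructor, so the shared `write` method carries no parameter that only some implementations use.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FieldsVisitor: holds the stored values of the current document.
class StoredValues {
    private final Map<String, List<String>> current = new HashMap<>();

    // Replace the held values with those of the next document.
    void load(Map<String, List<String>> docFields) {
        current.clear();
        current.putAll(docFields);
    }

    List<String> get(String field) {
        return current.getOrDefault(field, List.of());
    }
}

// Shared interface: no parameter that only a subset of implementations would use.
interface FieldWriter {
    void write(StringBuilder out);
}

// Needs stored values, so it takes the holder in its constructor.
class StoredFieldWriter implements FieldWriter {
    private final String name;
    private final StoredValues stored;

    StoredFieldWriter(String name, StoredValues stored) {
        this.name = name;
        this.stored = stored;
    }

    @Override
    public void write(StringBuilder out) {
        for (String v : stored.get(name)) {
            out.append(name).append('=').append(v).append(';');
        }
    }
}

// Doc-values style writer: never touches StoredValues.
class DocValuesFieldWriter implements FieldWriter {
    private final String name;
    private final long value;

    DocValuesFieldWriter(String name, long value) {
        this.name = name;
        this.value = value;
    }

    @Override
    public void write(StringBuilder out) {
        out.append(name).append('=').append(value).append(';');
    }
}

class WriterDemo {
    public static void main(String[] args) {
        StoredValues stored = new StoredValues();
        List<FieldWriter> writers = List.of(
            new StoredFieldWriter("msg", stored),
            new DocValuesFieldWriter("count", 3)
        );
        // The caller must load the document's stored values before writing.
        stored.load(Map.of("msg", List.of("hello")));
        StringBuilder out = new StringBuilder();
        for (FieldWriter w : writers) {
            w.write(out);
        }
        System.out.println(out); // prints msg=hello;count=3;
    }
}
```

The cost, as the comment above notes, is that the writer holds a long-lived reference and depends on the caller having advanced `StoredValues` to the right document first.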

@csoulios csoulios self-requested a review August 1, 2022 14:35
@nik9000 nik9000 requested a review from romseygeek August 1, 2022 15:15
@wandergeek wandergeek added cloud-deploy and removed cloud-deploy labels Aug 10, 2022
@wandergeek

@elasticmachine retest this please

@nik9000
Member Author

nik9000 commented Aug 15, 2022

@romseygeek I think this is ready for another round when you are ready for it!

Contributor

@romseygeek romseygeek left a comment

I like the API! I left a few questions.

    throw new IllegalArgumentException(
        "field [" + name() + "] of type [" + typeName() + "] doesn't support synthetic source because it doesn't have doc values"
    );
}
if (fieldType().ignoreAbove() != Defaults.IGNORE_ABOVE) {
Contributor

Does ignore_above not work if stored=true?

Member Author

It doesn't store the field if it is above ignore_above.
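
A small sketch of why `ignore_above` matters here, assuming the documented `keyword` behaviour that strings longer than `ignore_above` are neither indexed nor stored. `IgnoreAboveSketch` and its methods are hypothetical, not Elasticsearch classes. Once a value is skipped at index time, nothing remains from which synthetic `_source` could rebuild it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a stored keyword field with ignore_above; not an Elasticsearch class.
class IgnoreAboveSketch {
    private final int ignoreAbove;
    private final List<String> storedValues = new ArrayList<>();

    IgnoreAboveSketch(int ignoreAbove) {
        this.ignoreAbove = ignoreAbove;
    }

    // Index time: values longer than ignoreAbove are skipped, so they are never stored.
    void index(String value) {
        if (value.length() > ignoreAbove) {
            return;
        }
        storedValues.add(value);
    }

    // Read time: a synthetic source can only return what was actually stored.
    List<String> syntheticSource() {
        return List.copyOf(storedValues);
    }

    public static void main(String[] args) {
        IgnoreAboveSketch field = new IgnoreAboveSketch(5);
        field.index("short");         // length 5, kept
        field.index("much too long"); // length 13, skipped
        System.out.println(field.syntheticSource()); // prints [short]
    }
}
```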

public abstract class SortedNumericDocValuesSyntheticFieldLoader implements SourceLoader.SyntheticFieldLoader {
    private final String name;
    private final String simpleName;
    private CheckedConsumer<XContentBuilder, IOException> writer = b -> {};
Contributor

This reads a bit weirdly to me, does it make more sense to leave write as abstract and just override it in the two implementations?

Member Author

Those are per-segment writers. I'll see if I can make it less janky.
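
One way to picture "per-segment writers" without a mutable `writer` field that starts as a no-op is to hand each segment its own leaf object. The sketch below is illustrative only and is not the patch discussed in this PR: `NumericLoaderSketch`, `Leaf`, and the map-based segment are hypothetical stand-ins.

```java
import java.util.List;
import java.util.Map;

// Hypothetical loader: per-segment state lives in a Leaf, not in a reassigned field.
class NumericLoaderSketch {
    private final String name;

    NumericLoaderSketch(String name) {
        this.name = name;
    }

    // Called once per segment; the map (docId -> values) stands in for doc values.
    Leaf leaf(Map<Integer, List<Long>> segmentValues) {
        return new Leaf(segmentValues);
    }

    class Leaf {
        private final Map<Integer, List<Long>> segmentValues;
        private List<Long> current = List.of();

        Leaf(Map<Integer, List<Long>> segmentValues) {
            this.segmentValues = segmentValues;
        }

        // Position on a document; returns true if it has values for this field.
        boolean advanceToDoc(int docId) {
            current = segmentValues.getOrDefault(docId, List.of());
            return current.isEmpty() == false;
        }

        // Write the values of the document this leaf is positioned on.
        void write(StringBuilder out) {
            if (current.isEmpty()) {
                return;
            }
            out.append(name).append('=').append(current);
        }
    }

    public static void main(String[] args) {
        NumericLoaderSketch loader = new NumericLoaderSketch("n");
        NumericLoaderSketch.Leaf leaf = loader.leaf(Map.of(0, List.of(1L, 2L)));
        StringBuilder out = new StringBuilder();
        if (leaf.advanceToDoc(0)) {
            leaf.write(out);
        }
        System.out.println(out); // prints n=[1, 2]
    }
}
```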


private final String name;
private final String simpleName;
private CheckedConsumer<XContentBuilder, IOException> writer = b -> {};
Contributor

Same here

-     && kwd.hasNormalizer() == false
-     && kwd.fieldType().ignoreAbove() == KeywordFieldMapper.Defaults.IGNORE_ABOVE) {
+ if (kwd.hasNormalizer() == false
+     && kwd.fieldType().ignoreAbove() == KeywordFieldMapper.Defaults.IGNORE_ABOVE
Contributor

Should this work with ignore_above set and store=true on the keyword subfield?

Member Author

Same deal. We don't store the field if it is above ignore_above.

@nik9000
Member Author

nik9000 commented Aug 16, 2022

@romseygeek, I pushed a patch to remove the weird:

private CheckedConsumer<XContentBuilder, IOException> writer = b -> {};

thing. I think it's more like what we want when we want to support ignore_above as well. And I think it's more readable. Have a look!

@nik9000
Member Author

nik9000 commented Aug 16, 2022

run elasticsearch-ci/bwc

Contributor

@romseygeek romseygeek left a comment

I think there are more cleanups to do around stored field loading, but this is a great start. Thanks for all the iterations!

@nik9000
Member Author

nik9000 commented Aug 17, 2022

> I think there are more cleanups to do around stored field loading, but this is a great start. Thanks for all the iterations!

Woooh! Thanks for all the iterations too. I think we got something much nicer through them.

@nik9000 nik9000 merged commit 79a8979 into elastic:main Aug 17, 2022
@nik9000
Member Author

nik9000 commented Aug 17, 2022

I'll work on adding some docs for this after I cover ignore_above. The words will merge conflict otherwise.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Aug 24, 2022
When I added support for stored fields to synthetic _source (elastic#87480) I
accidentally caused a performance regression. Our friends working on
building the nightly charts for tsdb caught it. It looked like:
```
|   50th percentile latency | default_1k | 20.1228 | 41.289  | 21.1662  | ms | +105.18% |
|   90th percentile latency | default_1k | 26.7402 | 42.5878 | 15.8476  | ms |  +59.27% |
|   99th percentile latency | default_1k | 37.0881 | 45.586  |  8.49786 | ms |  +22.91% |
| 99.9th percentile latency | default_1k | 43.7346 | 48.222  |  4.48742 | ms |  +10.26% |
|  100th percentile latency | default_1k | 46.057  | 56.8676 | 10.8106  | ms |  +23.47% |
```

This fixes the regression and puts us in line with how we were:
```
|   50th percentile latency | default_1k | 20.1228 | 24.023  |  3.90022 | ms |  +19.38% |
|   90th percentile latency | default_1k | 26.7402 | 29.7841 |  3.04392 | ms |  +11.38% |
|   99th percentile latency | default_1k | 37.0881 | 36.8038 | -0.28428 | ms |   -0.77% |
| 99.9th percentile latency | default_1k | 43.7346 | 39.0192 | -4.71531 | ms |  -10.78% |
|  100th percentile latency | default_1k | 46.057  | 42.9181 | -3.13889 | ms |   -6.82% |
```

A 20% bump in the 50th percentile latency isn't great, but it's four
microseconds per document, which is acceptable.
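
The per-document figure follows from the 50th percentile row of the table above, assuming the `default_1k` task fetches 1,000 documents per request. That count is inferred from the task name and is not stated in the commit message. `PerDocOverhead` is a hypothetical helper written only to show the arithmetic.

```java
class PerDocOverhead {
    // Extra latency per document, in microseconds, given request latencies in milliseconds.
    static double microsPerDoc(double baselineMs, double contenderMs, int docsPerRequest) {
        return (contenderMs - baselineMs) * 1000.0 / docsPerRequest;
    }

    public static void main(String[] args) {
        // 50th percentile latencies after the fix, from the table above (milliseconds).
        double micros = microsPerDoc(20.1228, 24.023, 1000);
        System.out.println(micros + " microseconds per document");
    }
}
```

Under that assumption the difference of 3.90022 ms per request works out to roughly 3.9 microseconds per document, which matches the "four microseconds" in the commit message.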
nik9000 added a commit that referenced this pull request Aug 26, 2022
Labels
cloud-deploy, >non-issue, :Search Foundations/Mapping, :StorageEngine/TSDB, Team:Analytics, Team:Search, v8.5.0
5 participants