Group field caps response by index mapping hash #83494
Pinging @elastic/es-search (Team:Search)
Hi @dnhatn, I've created a changelog YAML for you.
Thank you, no worries! It's not urgent.
That's correct. I think this is good enough now. We compact the responses within each cluster, and across clusters on the coordinating node.
I had one high-level question: instead of what we do in this PR, would it be possible to detect duplicate mappings on the field caps coordinator, before sending all the node/shard requests? Could this let us avoid sending duplicate requests in the first place, and keep the response merging logic simple?
Yes, we can apply your suggestion in RequestDispatcher for requests without …
@jtibshirani I can implement your suggestion in a follow-up.
I see, I forgot that this wouldn't work with …
Yes, that's correct, but we still have to deduplicate the response when sending it back to the local cluster.
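The coordinator-side deduplication suggested in this thread can be sketched as grouping the target indices by mapping hash and sending a request only for one representative index per group. This is a minimal sketch under assumptions: the index-to-hash map, the class `RequestDedupSketch`, and its methods are hypothetical and are not the `RequestDispatcher` API. As the replies note, such a shortcut would only apply to some requests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: choose one representative index per mapping hash so only
// representatives need a node/shard request. Not Elasticsearch's RequestDispatcher.
final class RequestDedupSketch {

    /** Groups index names by mapping hash, preserving first-seen order. */
    static Map<String, List<String>> groupByMappingHash(Map<String, String> indexToHash) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : indexToHash.entrySet()) {
            groups.computeIfAbsent(e.getValue(), h -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** Returns the first index of each group: the only indices that would be queried. */
    static List<String> representatives(Map<String, List<String>> groups) {
        List<String> reps = new ArrayList<>();
        for (List<String> indices : groups.values()) {
            reps.add(indices.get(0));
        }
        return reps;
    }

    public static void main(String[] args) {
        Map<String, String> indexToHash = new LinkedHashMap<>();
        indexToHash.put("logs-2024.01", "h1");
        indexToHash.put("logs-2024.02", "h1");
        indexToHash.put("metrics", "h2");
        System.out.println(representatives(groupByMappingHash(indexToHash)));
    }
}
```

Here `main` prints `[logs-2024.01, metrics]`: the two `logs-*` indices share hash `h1`, so only the first is queried and its result would be reused for the second.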
This is looking good to me; I left a few small comments. I also wondered whether we could use the mapping hashes in TransportFieldCapabilitiesAction#merge to avoid merging the same information twice.
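One way to read the `TransportFieldCapabilitiesAction#merge` idea: once a mapping hash has been merged, later responses with the same hash carry identical fields and can be skipped. The sketch below is hypothetical (`MergeSketch`, `IndexCaps`, and the simplified field-to-type model are assumptions, not the real merge code), and it ignores the per-type index lists that a real field caps response also carries, which would still need every index name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of hash-aware merging; not the real TransportFieldCapabilitiesAction#merge.
final class MergeSketch {

    /** One index's response: mapping hash (null if unknown) and field name -> type. */
    record IndexCaps(String index, String mappingHash, Map<String, String> fieldTypes) {}

    /** Merged types per field, plus how many responses had their fields walked. */
    record Merged(Map<String, Set<String>> typesByField, int responsesWalked) {}

    static Merged merge(List<IndexCaps> responses) {
        Map<String, Set<String>> typesByField = new TreeMap<>();
        Set<String> seenHashes = new HashSet<>();
        int walked = 0;
        for (IndexCaps r : responses) {
            // Same hash => same mapping => same fields, so a repeat adds nothing here.
            // A null hash (e.g. from an older node) can never be skipped.
            if (r.mappingHash() != null && !seenHashes.add(r.mappingHash())) {
                continue;
            }
            walked++;
            for (Map.Entry<String, String> e : r.fieldTypes().entrySet()) {
                typesByField.computeIfAbsent(e.getKey(), f -> new TreeSet<>()).add(e.getValue());
            }
        }
        return new Merged(typesByField, walked);
    }
}
```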
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilitiesIndexResponse.java
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Sounds good; let's look into the merge optimization in a follow-up. This looks good to me. I know @javanna was hoping to take a look too when he had time.
server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilitiesIndexResponse.java
@jtibshirani Thanks for your review.
* upstream/master: (167 commits)
  Mute FrozenSearchableSnapshotsIntegTests#testCreateAndRestorePartialSearchableSnapshot
  Mute LdapSessionFactoryTests#testSslTrustIsReloaded
  Fix spotless violation from last commit
  Mute GeoGridTilerTestCase#testGeoGridSetValuesBoundingBoxes_UnboundedGeoShapeCellValues
  Small formatting clean up (elastic#84144)
  Always re-run Feature migrations which have encountered errors (elastic#83918)
  [DOCS] Clarify `orientation` usage for WKT and GeoJSON polygons (elastic#84025)
  Group field caps response by index mapping hash (elastic#83494)
  Shrink join queries in slow log (elastic#83914)
  TSDB: Reject the nested object fields that are configured time_series_dimension (elastic#83920)
  [DOCS] Remove note about partial response from Bulk API docs (elastic#84053)
  Allow regular data streams to be migrated to tsdb data streams. (elastic#83843)
  [DOCS] Fix `ignore_unavailable` parameter definition (elastic#84071)
  Make Metadata extend AbstractCollection (elastic#83791)
  Add API specs for OpenID Connect APIs
  Revert "Clean up for superuser role name references (elastic#83627)" (elastic#84096)
  Update Lucene analysis base url (elastic#84094)
  Avoid null threadContext in ResultDeduplicator (elastic#84093)
  Use static empty store files metadata (elastic#84034)
  Preserve context in snapshotDeletionListeners (elastic#84089)
  ...

# Conflicts:
#  x-pack/plugin/rollup/build.gradle
Took me a while, but I ended up diving into this :) I think it's a great improvement. One question I have is whether we have enough test coverage around the serialization changes made here. I see a different code path depending on whether the mapping hash is available, starting from the version that supports it, and I wonder whether we should have more tests covering mixed-version clusters, where some nodes may not provide the mapping hash. The logic seems more complex than a simple "serialize the field or not depending on the stream version", hence my anxiety.
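The mixed-version concern is easiest to see in a round trip. This sketch assumes a hypothetical integer wire version (`MAPPING_HASH_VERSION`) and a made-up layout built on `java.io.DataOutputStream`; it is loosely modeled on version-gated serialization in general, not on Elasticsearch's stream classes. The property a BWC test would check is that an older peer yields a null hash, which the receiver must never use for grouping.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of version-gated serialization; not Elasticsearch's wire format.
final class VersionedWireSketch {

    /** Assumed first wire version that carries the mapping hash. */
    static final int MAPPING_HASH_VERSION = 2;

    record IndexResponse(String index, String mappingHash) {}

    static byte[] write(IndexResponse r, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(r.index());
            if (peerVersion >= MAPPING_HASH_VERSION) {
                // Optional string: a presence flag, then the value if present.
                out.writeBoolean(r.mappingHash() != null);
                if (r.mappingHash() != null) {
                    out.writeUTF(r.mappingHash());
                }
            }
            // An older peer gets no hash at all.
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams are not expected to fail
        }
        return bytes.toByteArray();
    }

    static IndexResponse read(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String index = in.readUTF();
            String hash = null;
            // Must mirror write() exactly, or the stream falls out of sync.
            if (peerVersion >= MAPPING_HASH_VERSION && in.readBoolean()) {
                hash = in.readUTF();
            }
            return new IndexResponse(index, hash);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A randomized BWC test along these lines would pick the peer version at random and assert that every round trip either preserves the hash or drops it to null, never anything in between.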
@javanna I am working on some BWC tests for field caps. I will open a PR when they are ready. |
Thanks @dnhatn for pointing me to the randomized test; that's definitely a good test. What confuses me is that I don't see where we call …
We reduced the memory usage of field-caps requests targeting many indices in 8.2+ (see #83494). Unfortunately, we still receive OOM reports in 7.17. I think we should push some contained improvements to reduce the memory usage of those requests in 7.17. I have looked into several options. This PR reduces the memory usage of field-caps responses by replacing the HashMap with an ArrayList for the field responses, eliminating duplicated field-name strings and the Map's internal nodes.
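The shape of the HashMap-to-ArrayList change can be sketched as follows. `IndexFieldCap` and the helper methods are hypothetical; the sketch assumes each element already carries its field name, so a Map key would only duplicate it. The actual memory saving (no per-entry Map nodes, no separately deserialized key strings) is not something this small example measures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: field caps kept in a List instead of a Map keyed by field name.
final class FieldListSketch {

    /** Each element already carries its own name, so a Map key would only duplicate it. */
    record IndexFieldCap(String name, String type, boolean searchable, boolean aggregatable) {}

    /** Copies the values into a list backed by a single, exactly sized array. */
    static List<IndexFieldCap> toList(Map<String, IndexFieldCap> byName) {
        return new ArrayList<>(byName.values());
    }

    /** Iteration works directly on the list, with no Map entries involved. */
    static int countAggregatable(List<IndexFieldCap> caps) {
        int n = 0;
        for (IndexFieldCap c : caps) {
            if (c.aggregatable()) {
                n++;
            }
        }
        return n;
    }

    /** Occasional lookups by name fall back to a linear scan. */
    static Optional<IndexFieldCap> find(List<IndexFieldCap> caps, String name) {
        return caps.stream().filter(c -> c.name().equals(name)).findFirst();
    }
}
```

The trade-off is that lookups by name become linear, which is acceptable only if the response is mostly iterated rather than probed by key.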
This commit uses the index mapping hash to share field-caps responses among indices with the same index mapping, reducing memory usage and the size of transport messages.
Closes #78665
Closes #82879
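The sharing described in this commit message can be sketched as storing each distinct field list once per mapping hash and giving every index only a hash reference. This is a minimal sketch under assumptions: `SharedCapsSketch`, its record types, and the representation of fields as plain names are hypothetical, and responses without a hash (for example from older nodes) are kept in full.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of sharing field caps by mapping hash; not the real
// FieldCapabilitiesIndexResponse or its transport encoding.
final class SharedCapsSketch {

    record IndexCaps(String index, String mappingHash, List<String> fields) {}

    /** Field lists stored once per hash; each index keeps only a hash reference. */
    record Compact(Map<String, List<String>> fieldsByHash,
                   Map<String, String> hashByIndex,
                   List<IndexCaps> unhashed) {}

    static Compact compact(List<IndexCaps> responses) {
        Map<String, List<String>> fieldsByHash = new LinkedHashMap<>();
        Map<String, String> hashByIndex = new LinkedHashMap<>();
        List<IndexCaps> unhashed = new ArrayList<>();
        for (IndexCaps r : responses) {
            if (r.mappingHash() == null) {
                unhashed.add(r); // no hash: must be kept (and sent) in full
                continue;
            }
            // Only the first index with a given hash contributes its field list.
            fieldsByHash.putIfAbsent(r.mappingHash(), r.fields());
            hashByIndex.put(r.index(), r.mappingHash());
        }
        return new Compact(fieldsByHash, hashByIndex, unhashed);
    }

    /** Rebuilds per-index responses: hashed indices first, then unhashed ones. */
    static List<IndexCaps> expand(Compact c) {
        List<IndexCaps> out = new ArrayList<>();
        c.hashByIndex().forEach((index, hash) ->
            out.add(new IndexCaps(index, hash, c.fieldsByHash().get(hash))));
        out.addAll(c.unhashed());
        return out;
    }
}
```

With many indices created from the same template, `fieldsByHash` stays small while `hashByIndex` grows by one short entry per index, which is where the memory and message-size savings would come from.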