Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to lucene-9.0.0-snapshot-32a0a16aff0 #73324

Merged
merged 93 commits into from
Sep 21, 2021

Conversation

romseygeek
Copy link
Contributor

This commit upgrades elasticsearch 8.0 to use a snapshot of
Lucene 9.0.0.

@romseygeek romseygeek added >enhancement :Analytics/Aggregations Aggregations :Search/Search Search-related issues that do not fall into other categories :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 labels May 24, 2021
@romseygeek romseygeek self-assigned this May 24, 2021
@elasticmachine elasticmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels May 24, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@romseygeek
Copy link
Contributor Author

There are a few nocommits, but surprisingly few. Everything compiles.

Still to do:

  • I have silently remapped the SIMPLEFS store type to just use NIOFS. I'm not sure if this is correct and we will just deprecate the option, or if it should throw an error.
  • The Version 7.14 constant has the wrong lucene version on it, because we haven't released 8.9 yet
  • There are a bunch of classes around segment memory accounting that I have left in but are now essentially redundant, eg RamAccountingRefreshListener and SegmentStats. Segment itself also now has unused memoryInBytes and ramTree fields, which I am guessing should be removed but will need some careful BWC handling.
  • CuckooFilter uses PackedInts.getReader() and save(), both of which are now gone. Again this was being used as part of the Streamable impl and so will need BWC to be addressed when this is reimplemented.

@dnhatn
Copy link
Member

dnhatn commented May 25, 2021

I have silently remapped the SIMPLEFS store type to just use NIOFS. I'm not sure if this is correct and we will just deprecate the option, or if it should throw an error.

I think we should deprecate the simplefs store option in 7.x and remove it in 8.x.

@iverase
Copy link
Contributor

iverase commented Jun 1, 2021

Hey @romseygeek,

I have been looking into the CuckooFilter issue. I think we should have our or version of PackedInts.Mutable, but before anything we need to improve our bwc testing for that part of the code. It seems it is never exercise so we can easily introduce a bug: #73585

@dnhatn
Copy link
Member

dnhatn commented Jul 16, 2021

@romseygeek We need to build a new Lucene 9.0 snapshot as the current snapshot doesn't have the version 8.9.0, which is being used in 7.14 and 7.15.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have only reviewed the bits affected for the change on endianness of Lucene directory API. I think we should revisit the strategy on KeyStoreWrapper and MetadataStateFormat but it can be done as a follow up.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through with a focus on the new vector format. I left minor comments and also noticed we need to update LuceneFilesExtensions to contain the vector index file .vex.

It'd also be good to extend the disk usage analyzer to cover vectors, but this seems best as a follow-up.

@@ -187,6 +189,16 @@ public NumericDocValues getNormValues(String field) throws IOException {
throw new UnsupportedOperationException();
}

@Override
public VectorValues getVectorValues(String field) throws IOException {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems best to throw UnsupportedOperationException here.

@@ -258,6 +247,48 @@ public StoredFieldsReader clone() {

}

private static class FilterStoredFieldVisitor extends StoredFieldVisitor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this new private class when we already have one in org.elasticsearch.search.internal?

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed all files and I have a question, but changes look great.

@@ -22,6 +22,8 @@ grant codeBase "${codebase.elasticsearch-secure-sm}" {
grant codeBase "${codebase.elasticsearch}" {
// needed for loading plugins which may expect the context class loader to be set
permission java.lang.RuntimePermission "setContextClassLoader";
// needed for SPI class loading
permission java.lang.RuntimePermission "accessDeclaredMembers";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reduce the scope of this permission?

Copy link
Contributor Author

@romseygeek romseygeek Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not easily, because it's needed by the PluginsService (which used to farm this requirement out to lucene but can't any longer because lucene has removed the class). @rjernst and I have talked very briefly about it, and I see that you've added it to the follow-up list on #74057

@romseygeek
Copy link
Contributor Author

I looked through with a focus on the new vector format. I left minor comments and also noticed we need to update LuceneFilesExtensions to contain the vector index file .vex.

It'd also be good to extend the disk usage analyzer to cover vectors, but this seems best as a follow-up.

I think both of these come under the disk usage and field usage actions in #74057

@romseygeek romseygeek changed the title Upgrade to lucene-9.0.0-snapshot-a12260eb950 Upgrade to lucene-9.0.0-snapshot-32a0a16aff0 Sep 21, 2021
@romseygeek romseygeek merged commit 524d1ea into elastic:master Sep 21, 2021
@romseygeek romseygeek deleted the lucene-9-upgrade branch September 21, 2021 09:48
@iverase iverase removed the :Analytics/Aggregations Aggregations label Sep 27, 2021
@elasticmachine elasticmachine removed the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement :Search/Search Search-related issues that do not fall into other categories Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. Team:Search Meta label for search team v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants