
Upgrade to Lucene to 9.x #179

Closed
lfcnassif opened this issue Jul 3, 2020 · 6 comments · Fixed by #888

@lfcnassif
Member

lfcnassif commented Jul 3, 2020

We are using a very old version, 4.9. A lot of Lucene APIs have changed since then, so this will need some work. It is a sensitive upgrade and will break backward compatibility. Lucene query performance and memory usage, at least, improved a lot in 2018 and 2019: https://home.apache.org/~mikemccand/lucenebench/

TermQuery throughput went from 40/s to 1k/s!
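To give an idea of the API changes involved, here is a minimal sketch of indexing a few documents and running a TermQuery with the Lucene 9.x API. It is only an illustration, not IPED code: the `ext` field, its values and the `countMatches` helper are hypothetical. Compared with 4.9, `IndexWriterConfig` no longer takes a `Version` argument, and `ByteBuffersDirectory` takes the place of the old in-memory `RAMDirectory`.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class TermQuerySketch {

    /** Indexes each value in the hypothetical "ext" field and counts docs matching {@code wanted}. */
    static int countMatches(String[] values, String wanted) throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            // No Version argument anymore; the no-arg config defaults to StandardAnalyzer.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
                for (String v : values) {
                    Document doc = new Document();
                    // StringField is indexed as one untokenized term, which suits TermQuery.
                    doc.add(new StringField("ext", v, Field.Store.YES));
                    writer.addDocument(doc);
                }
            } // closing the writer commits the documents
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                return searcher.count(new TermQuery(new Term("ext", wanted)));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countMatches(new String[] {"pdf", "docx", "pdf"}, "pdf"));
    }
}
```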

@lfcnassif
Member Author

lfcnassif commented Jul 6, 2020

Migrated to lucene-5.5.5 as a first step in 17babcd.

@lfcnassif
Member Author

lfcnassif commented Jul 15, 2020

Migration to lucene-6.6.6 (c5a44db) merged into master.

Now we can use multidimensional points.
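Points (introduced in Lucene 6) index one or more numeric dimensions per value in a BKD tree, so a single query can filter on all dimensions at once. A minimal sketch written against the Lucene 9.x API (`ByteBuffersDirectory` does not exist in 6.6), assuming a hypothetical `dims` field that stores an image's width and height as one 2-dimensional point:

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PointsSketch {

    /**
     * Indexes each {width, height} pair as one 2-dimensional point in the hypothetical
     * "dims" field and counts the points inside the box [lower, upper] (bounds inclusive).
     */
    static int countInBox(long[][] points, long[] lower, long[] upper) throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
                for (long[] p : points) {
                    Document doc = new Document();
                    // All dimensions of one point go into a single LongPoint (indexed, not stored).
                    doc.add(new LongPoint("dims", p));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                // One [min, max] range per dimension, all evaluated together on the BKD tree.
                Query box = LongPoint.newRangeQuery("dims", lower, upper);
                return new IndexSearcher(reader).count(box);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[][] sizes = {{800, 600}, {1920, 1080}, {100, 100}};
        // Images 500 to 2000 px wide and 500 to 700 px tall.
        System.out.println(countInBox(sizes, new long[] {500, 500}, new long[] {2000, 700}));
    }
}
```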

@lfcnassif lfcnassif removed their assignment Jun 30, 2021
@lfcnassif lfcnassif self-assigned this Nov 24, 2021
@lfcnassif
Member Author

I'm resuming this work. It's harder than expected because DocValues, which are used in lots of places, are no longer randomly accessible as of Lucene 7.0:
https://issues.apache.org/jira/browse/LUCENE-7407
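Since 7.0, `NumericDocValues` is an iterator: a value is read by positioning it with `nextDoc()` or `advanceExact(docID)` and then calling `longValue()`, and docIDs can only move forward. The sketch below shows a single lookup and one common workaround for code that relied on the old random access: copying a field into a `long[]` indexed by global docID, at a cost of 8 bytes per document. It is an illustration under stated assumptions, not necessarily how IPED handles it; the `size` field and helper names are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DocValuesSketch {

    /** Single lookup: a fresh iterator is positioned on a segment-local docId. */
    static long valueOf(LeafReader leaf, String field, int docId, long missing) throws IOException {
        NumericDocValues dv = leaf.getNumericDocValues(field);
        // advanceExact returns false when this doc has no value for the field.
        return dv != null && dv.advanceExact(docId) ? dv.longValue() : missing;
    }

    /** Copies a numeric field into a long[] indexed by global docID, restoring random access. */
    static long[] loadAll(DirectoryReader reader, String field, long missing) throws IOException {
        long[] values = new long[reader.maxDoc()];
        Arrays.fill(values, missing);
        for (LeafReaderContext ctx : reader.leaves()) {
            NumericDocValues dv = ctx.reader().getNumericDocValues(field);
            if (dv == null) {
                continue; // no document in this segment has the field
            }
            // One forward pass over the docs of this segment that have a value.
            for (int doc = dv.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = dv.nextDoc()) {
                values[ctx.docBase + doc] = dv.longValue();
            }
        }
        return values;
    }

    /** Builds a tiny index where the second document has no "size" value. */
    static long[] demo() throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
                for (Long size : new Long[] {10L, null, 30L}) {
                    Document doc = new Document();
                    if (size != null) {
                        doc.add(new NumericDocValuesField("size", size));
                    }
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return loadAll(reader, "size", -1L);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(demo()));
    }
}
```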

lfcnassif added a commit that referenced this issue Dec 3, 2021
- try to optimize memory usage of int[][] and long[][] arrays
@lfcnassif
Member Author

Lucene 9.0.0 was recently released: https://lucene.apache.org/core/corenews.html#apache-lucenetm-900-available
I'd highlight the support for indexing high-dimensionality numeric vectors to perform nearest-neighbor search using the Hierarchical Navigable Small World (HNSW) graph algorithm.
It is a state-of-the-art algorithm that could be used for photoDNA, similar-image and similar-face lookup, and future vector-based features.
But Lucene 9.0 requires JDK 11 or newer.
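For reference, a minimal sketch of the HNSW-based vector search as exposed in Lucene 9.0: `KnnVectorField` indexes one `float[]` per document and `KnnVectorQuery` returns the k (approximate) nearest neighbors. The `embedding` and `id` fields and the 2-dimensional vectors are hypothetical stand-ins for real image or face descriptors. Later 9.x releases deprecate these two classes in favor of `KnnFloatVectorField` and `KnnFloatVectorQuery`.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnVectorField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnVectorQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class KnnSketch {

    /** Returns the id of the indexed vector nearest to {@code query}, or null if none. */
    static String nearest(float[][] vectors, String[] ids, float[] query) throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
                for (int i = 0; i < vectors.length; i++) {
                    Document doc = new Document();
                    // Vectors are added to an HNSW graph per segment; all must share one dimension.
                    doc.add(new KnnVectorField("embedding", vectors[i], VectorSimilarityFunction.EUCLIDEAN));
                    doc.add(new StoredField("id", ids[i]));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // k = 1 nearest neighbor by Euclidean distance.
                TopDocs top = searcher.search(new KnnVectorQuery("embedding", query, 1), 1);
                if (top.scoreDocs.length == 0) {
                    return null;
                }
                return searcher.doc(top.scoreDocs[0].doc).get("id");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        float[][] vectors = {{0f, 0f}, {10f, 10f}, {5f, 5f}};
        String[] ids = {"a", "b", "c"};
        System.out.println(nearest(vectors, ids, new float[] {9f, 9f}));
    }
}
```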

What do other devs think about setting JDK 11 as the minimum version for IPED, given that it uses the embedded JDK 11 by default at processing and analysis time? Users who run the *.exe files won't have issues; those who execute the *.jar files with an older JDK will receive a warning before the program exits.
I'm +1.
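The "warning before exit" for older JDKs could look roughly like the hypothetical launcher check below (not IPED's actual code). It reads `java.specification.version`, which is `"1.8"` on JDK 8 and `"11"`, `"17"`, ... from JDK 9 on, because `Runtime.version()` does not exist on JDK 8. One caveat: the checking class itself must be compiled for Java 8 (e.g. `javac --release 8`), otherwise an older JVM fails with `UnsupportedClassVersionError` before the check can run.

```java
public class JavaVersionCheck {

    /** Minimum Java feature version, assuming the JDK 11 floor proposed above. */
    static final int MIN_FEATURE_VERSION = 11;

    /** Maps "1.8" to 8 and "11" or "17.0" to 11 or 17. */
    static int featureVersion(String spec) {
        String s = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot < 0 ? s : s.substring(0, dot));
    }

    public static void main(String[] args) {
        int current = featureVersion(System.getProperty("java.specification.version"));
        if (current < MIN_FEATURE_VERSION) {
            System.err.println("Java " + MIN_FEATURE_VERSION + " or newer is required, but Java "
                    + current + " was found. Please use the embedded JDK or a newer Java.");
            System.exit(1);
        }
        // Hypothetical: continue with the normal application startup here.
        System.out.println("Java " + current + " OK");
    }
}
```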

@fmpfeifer
Member

I think JDK 11 as the minimum requirement is almost mandatory now.
Supporting JDK 8 is already holding us back.
My 2p.

@lfcnassif
Member Author

> Supporting JDK 8 is already holding us back.

What JDK 11 feature/library would you like to use?

lfcnassif added a commit that referenced this issue Dec 21, 2021
@lfcnassif lfcnassif changed the title from "Upgrade to Lucene to 8.x" to "Upgrade to Lucene to 9.x" Dec 23, 2021
lfcnassif added a commit that referenced this issue Jan 28, 2022

2 participants