
Optimize binary search call #13595

Open · wants to merge 3 commits into main
Conversation

dungba88 (Contributor)

Description

I think advance is usually not called backward, so we can run the binary search from the current position + 1 instead of from 0. This also caps the fromIndex at scoreDocs.length to avoid running past the end of the array.

From the Javadoc of advance

   * <p>The behavior of this method is <b>undefined</b> when called with <code> target &le; current
   * </code>, or after the iterator has exhausted. Both cases may result in unpredicted behavior.
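
A minimal sketch of the idea, with hypothetical class and field names (the actual patch touches the scoreDocs-backed iterator), assuming the iterator wraps doc IDs sorted in ascending order:

```java
import java.util.Arrays;

// Illustrative sketch, not the actual patch: since advance() never goes
// backward, the binary search can start at the current position + 1 instead
// of 0, and fromIndex is capped at the array length.
class SortedDocIdCursor {
  private final int[] docs; // doc IDs from scoreDocs, sorted ascending
  private int idx = -1;     // current position, -1 before the first doc

  SortedDocIdCursor(int[] sortedDocs) {
    this.docs = sortedDocs;
  }

  int advance(int target) {
    // Search only the unvisited tail [idx + 1, docs.length) instead of [0, docs.length).
    int from = Math.min(idx + 1, docs.length); // cap to avoid running past the end
    int found = Arrays.binarySearch(docs, from, docs.length, target);
    idx = found >= 0 ? found : -1 - found; // decode insertion point when not found
    return idx < docs.length ? docs[idx] : Integer.MAX_VALUE; // i.e. NO_MORE_DOCS
  }
}
```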

kaivalnp (Contributor) left a comment

Nice catch! Do we have suitable benchmarks to measure this (ideally a combination of lexical + vector search)?

jpountz (Contributor) commented Jul 31, 2024

This makes sense to me. Maybe we should go one step further and perform an exponential search instead, e.g. by reusing IntArrayDocIdSetIterator.
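
For illustration, exponential (galloping) search over a sorted int[] could look roughly like the sketch below; this is not the actual IntArrayDocIdSetIterator code, just the general shape of the technique:

```java
import java.util.Arrays;

// Illustrative sketch of exponential (a.k.a. galloping) search over a sorted
// int[] of doc IDs; not the code in IntArrayDocIdSetIterator.
final class ExponentialSearch {

  /**
   * Returns the index of the first element in docs[from, to) that is >= target,
   * or `to` if there is none. Requires 0 <= from <= to <= docs.length and docs
   * sorted ascending. Cost is O(log d) where d is the distance from `from` to
   * the result, instead of O(log (to - from)) for a plain binary search.
   */
  static int firstGreaterOrEqual(int[] docs, int from, int to, int target) {
    int bound = 1;
    // Gallop: double the step until we find an element >= target or run off the end.
    while (from + bound < to && docs[from + bound] < target) {
      bound <<= 1;
    }
    // The answer now lies in [from + bound/2, min(from + bound, to)].
    int lo = from + (bound >> 1);
    int hi = Math.min(from + bound, to);
    int found = Arrays.binarySearch(docs, lo, hi, target);
    return found >= 0 ? found : -1 - found; // decode insertion point when not found
  }
}
```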

dungba88 (Contributor, Author) commented Aug 1, 2024

The advance will keep reducing the array size, and we will generally advance by small steps ahead, right? Then I think exponential search makes sense. I'll try to use IntArrayDocIdSetIterator in the next revision.

gsmiller (Contributor) commented Aug 1, 2024

I think exponential search will only outperform binary search in this case if we expect the next target to be relatively close to the "min" we're constantly "pushing up" (thanks to your change). Is that the case? (Specifically, I think the math works out that exponential search is only better if the target is in the next sqrt(N) elements where N is the size of the remaining list being searched... but I could be wrong).
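
For reference, a rough sketch of the math behind that estimate (a standard comparison-count argument, not taken from the patch): a plain binary search over the remaining $N$ entries costs about $\log_2 N$ comparisons, while exponential search on a target $d$ positions ahead costs about $\log_2 d$ for the galloping phase plus another $\log_2 d$ for the bounded binary search, i.e. roughly $2 \log_2 d$ in total. Exponential search wins when

$$2 \log_2 d < \log_2 N \iff d < \sqrt{N},$$

which is where the $\sqrt{N}$ threshold comes from.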

jpountz (Contributor) commented Aug 1, 2024

If DocIdSetIterator#advance gets called on large increments, then there are only so many calls that can be done because the doc ID space is quickly exhausted. However, if you only advance by small intervals, then you could end up with millions of calls to DocIdSetIterator#advance, so this tends to be the case worth optimizing. E.g. the recent change to skip data removed the higher levels of skip data that help advance by large increments, yet queries observed a speedup rather than a slowdown.

gsmiller (Contributor) commented Aug 1, 2024

Ah yeah, OK thanks @jpountz. Makes sense.

dungba88 (Contributor, Author) commented Aug 2, 2024

@jpountz I was reading IntArrayDocIdSetIterator; it is a private class only exposed through IntArrayDocIdSet. I think we would need to extend its capability here (storing the score, i.e. both score and doc ID instead of just the doc ID). I think we can just re-implement the exponential search here. I can cut a follow-up to move the implementation to a common place, maybe ArrayUtil. WDYT?

github-actions bot commented

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

github-actions bot added the Stale label on Aug 17, 2024

dungba88 (Contributor, Author) commented

@jpountz I have changed the code to exponential search and moved the functionality to ArrayUtil. We still need two different methods for int[] and generic arrays, as Java doesn't allow generics over primitive types. Can you give some feedback on the new revision?
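
For context, the kind of split that implies (hypothetical ArrayUtil-style signatures, not necessarily the ones in the revision): because Java generics cannot range over primitives like int, the int[] variant needs its own overload next to a Comparator-based one for object arrays such as ScoreDoc[].

```java
import java.util.Comparator;

// Hypothetical signatures only (bodies omitted): Java generics cannot be
// instantiated with primitives, so int[] gets a dedicated overload beside
// the generic, Comparator-based variant for object arrays.
public final class ExponentialSearchSketch {

  /** First index in a[fromIndex, toIndex) whose value is >= key, for a sorted int[]. */
  public static int exponentialSearch(int[] a, int fromIndex, int toIndex, int key) {
    throw new UnsupportedOperationException("sketch only");
  }

  /** Same contract for a sorted T[], ordering defined by the comparator. */
  public static <T> int exponentialSearch(
      T[] a, int fromIndex, int toIndex, T key, Comparator<? super T> comparator) {
    throw new UnsupportedOperationException("sketch only");
  }
}
```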

github-actions bot removed the Stale label on Dec 16, 2024