Let DocIdSetIterator optimize loading into a FixedBitSet. #14069

Open

@jpountz jpountz commented Dec 16, 2024

This is an iteration on #14064. Compared with that approach, the API is a bit nicer, and the optimization is no longer limited to the case where doc IDs are stored in an int[]. The downside is that it only helps non-scoring disjunctions for now, but we can look into scoring disjunctions later on.
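
To make the idea concrete, here is a minimal, self-contained sketch of what "letting the iterator load itself into a bit set" could look like. It is not the code from this PR: `DocIterator`, `ArrayIterator`, `RangeIterator`, `countDisjunction` and the `intoBitSet(upTo, bits, offset)` signature are hypothetical names chosen for illustration, and `java.util.BitSet` stands in for Lucene's `FixedBitSet`. The base class provides a doc-by-doc default, and an implementation that knows its own layout (here, a contiguous range) overrides it with a bulk operation.

```java
import java.util.BitSet;

// Illustrative sketch only: all names here are assumptions, not Lucene's API.
final class IntoBitSetSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical, simplified doc ID iterator. */
  abstract static class DocIterator {
    abstract int docID();   // -1 before the first call to nextDoc()
    abstract int nextDoc(); // returns NO_MORE_DOCS when exhausted

    /**
     * Sets bit (doc - offset) for every doc in [docID(), upTo) and leaves the
     * iterator on the first doc >= upTo. The iterator must already be
     * positioned on a doc >= offset. Default: one doc at a time.
     */
    void intoBitSet(int upTo, BitSet bits, int offset) {
      for (int doc = docID(); doc < upTo; doc = nextDoc()) {
        bits.set(doc - offset);
      }
    }
  }

  /** Doc IDs stored in a sorted int[]; uses the default intoBitSet. */
  static final class ArrayIterator extends DocIterator {
    private final int[] docs;
    private int idx = -1;
    ArrayIterator(int[] sortedDocs) { this.docs = sortedDocs; }
    @Override int docID() {
      return idx < 0 ? -1 : (idx < docs.length ? docs[idx] : NO_MORE_DOCS);
    }
    @Override int nextDoc() {
      if (idx < docs.length) idx++;
      return docID();
    }
  }

  /** Matches every doc in [min, max); overrides intoBitSet with a bulk set. */
  static final class RangeIterator extends DocIterator {
    private final int min, max;
    private int doc = -1;
    RangeIterator(int min, int max) { this.min = min; this.max = max; }
    @Override int docID() { return doc; }
    @Override int nextDoc() {
      if (doc == -1) doc = min; else if (doc != NO_MORE_DOCS) doc++;
      if (doc >= max) doc = NO_MORE_DOCS;
      return doc;
    }
    @Override void intoBitSet(int upTo, BitSet bits, int offset) {
      int end = Math.min(upTo, max);
      if (doc < end) {
        bits.set(doc - offset, end - offset); // one bulk call instead of a loop
        doc = end;
      }
      if (doc >= max) doc = NO_MORE_DOCS;
    }
  }

  /** Counts docs matching any clause, one fixed-size window at a time. */
  static int countDisjunction(DocIterator[] clauses, int maxDoc, int windowSize) {
    BitSet window = new BitSet(windowSize);
    int count = 0;
    for (DocIterator clause : clauses) clause.nextDoc(); // position iterators
    for (int base = 0; base < maxDoc; base += windowSize) {
      int upTo = Math.min(base + windowSize, maxDoc);
      window.clear();
      for (DocIterator clause : clauses) clause.intoBitSet(upTo, window, base);
      count += window.cardinality();
    }
    return count;
  }

  public static void main(String[] args) {
    DocIterator[] clauses = {
      new ArrayIterator(new int[] {1, 5, 70, 130}),
      new RangeIterator(60, 75)
    };
    System.out.println(countDisjunction(clauses, 200, 64)); // prints 18
  }
}
```

In this sketch the caller never needs to know how a clause stores its doc IDs; each iterator picks the cheapest way to fill the window, which is the property that makes the approach apply beyond int[]-backed postings.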

@jpountz jpountz added this to the 10.2.0 milestone Dec 16, 2024
jpountz commented Dec 16, 2024

luceneutil still gives a big speedup on wikibigall, concentrated in the counting disjunction tasks (CountOrHighMed, CountOrHighHigh, CountOrMany):

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                      DismaxTerm      615.71      (6.0%)      601.37      (5.7%)   -2.3% ( -13% -    9%) 0.528
                    CombinedTerm       32.69      (2.3%)       32.05      (1.6%)   -2.0% (  -5% -    1%) 0.114
             CountFilteredOrMany        3.68      (6.2%)        3.64      (5.0%)   -1.2% ( -11% -   10%) 0.736
                            Term      486.74      (5.4%)      481.06      (3.6%)   -1.2% (  -9% -    8%) 0.688
             CountFilteredPhrase       25.50      (2.0%)       25.23      (1.6%)   -1.1% (  -4% -    2%) 0.365
                     OrStopWords       36.92      (3.0%)       36.60      (4.6%)   -0.9% (  -8% -    6%) 0.720
                      OrHighHigh       57.11      (3.0%)       56.62      (5.0%)   -0.8% (  -8% -    7%) 0.747
          CountFilteredOrHighMed       67.39      (0.9%)       66.92      (1.2%)   -0.7% (  -2% -    1%) 0.285
         CountFilteredOrHighHigh       56.61      (1.1%)       56.24      (1.6%)   -0.7% (  -3% -    2%) 0.459
                      AndHighMed      131.57      (2.5%)      130.73      (2.7%)   -0.6% (  -5% -    4%) 0.695
                  FilteredPhrase       30.22      (0.9%)       30.06      (0.3%)   -0.5% (  -1% -    0%) 0.194
                    FilteredTerm      156.35      (2.1%)      155.49      (3.2%)   -0.5% (  -5% -    4%) 0.751
                          Fuzzy1       83.79      (1.7%)       83.42      (3.0%)   -0.4% (  -4% -    4%) 0.768
                     CountPhrase        4.28      (2.9%)        4.26      (1.2%)   -0.4% (  -4% -    3%) 0.782
                        Or3Terms      181.12      (1.7%)      180.43      (2.9%)   -0.4% (  -4% -    4%) 0.801
                     AndHighHigh       45.52      (3.1%)       45.34      (3.0%)   -0.4% (  -6% -    5%) 0.845
                       And3Terms      180.31      (1.9%)      179.72      (1.7%)   -0.3% (  -3% -    3%) 0.771
                          Fuzzy2       78.96      (1.5%)       78.72      (2.8%)   -0.3% (  -4% -    4%) 0.836
             FilteredAndHighHigh       63.19      (1.9%)       63.05      (1.2%)   -0.2% (  -3% -    2%) 0.820
                        PKLookup      284.36      (1.1%)      283.79      (2.8%)   -0.2% (  -4% -    3%) 0.884
                       OrHighMed      205.60      (3.0%)      205.45      (4.6%)   -0.1% (  -7% -    7%) 0.976
              FilteredAndHighMed      130.63      (2.6%)      130.71      (1.4%)    0.1% (  -3% -    4%) 0.963
                        Wildcard       79.74      (3.7%)       79.85      (2.2%)    0.1% (  -5% -    6%) 0.943
            FilteredAndStopWords       48.16      (1.9%)       48.22      (1.6%)    0.1% (  -3% -    3%) 0.897
                 AndHighOrMedMed       45.82      (1.4%)       45.90      (1.1%)    0.2% (  -2% -    2%) 0.808
              FilteredOrHighHigh       64.29      (2.4%)       64.44      (2.3%)    0.2% (  -4% -    5%) 0.880
                FilteredOr3Terms      167.12      (1.3%)      167.64      (0.8%)    0.3% (  -1% -    2%) 0.637
                AndMedOrHighHigh       58.62      (2.2%)       58.82      (2.3%)    0.3% (  -4% -    4%) 0.812
                   TermTitleSort      158.66      (1.6%)      159.26      (1.0%)    0.4% (  -2% -    3%) 0.652
              Or2Terms2StopWords      171.04      (2.5%)      171.80      (2.9%)    0.4% (  -4% -    5%) 0.795
     FilteredAnd2Terms2StopWords      198.44      (1.3%)      199.35      (0.6%)    0.5% (  -1% -    2%) 0.463
               FilteredOrHighMed      154.95      (1.5%)      155.67      (1.3%)    0.5% (  -2% -    3%) 0.604
                    AndStopWords       32.77      (2.3%)       32.93      (2.4%)    0.5% (  -4% -    5%) 0.748
             And2Terms2StopWords      167.37      (2.1%)      168.18      (1.7%)    0.5% (  -3% -    4%) 0.689
                  FilteredOrMany       17.15      (1.1%)       17.24      (1.7%)    0.5% (  -2% -    3%) 0.571
                CountAndHighHigh       55.25      (3.0%)       55.54      (1.1%)    0.5% (  -3% -    4%) 0.711
              CombinedOrHighHigh       19.45      (0.2%)       19.59      (0.9%)    0.7% (   0% -    1%) 0.095
                 DismaxOrHighMed      174.72      (2.7%)      175.99      (3.4%)    0.7% (  -5% -    6%) 0.704
             FilteredOrStopWords       43.21      (2.9%)       43.54      (2.8%)    0.8% (  -4% -    6%) 0.674
               CombinedOrHighMed       73.76      (0.4%)       74.38      (1.0%)    0.8% (   0% -    2%) 0.070
      FilteredOr2Terms2StopWords      148.83      (1.8%)      150.09      (1.4%)    0.8% (  -2% -    4%) 0.402
                  FilteredIntNRQ      113.50     (13.5%)      114.47     (14.5%)    0.9% ( -23% -   33%) 0.922
                          IntNRQ      114.55     (13.6%)      115.58     (14.5%)    0.9% ( -23% -   33%) 0.919
              CombinedAndHighMed       55.84      (1.7%)       56.34      (2.0%)    0.9% (  -2% -    4%) 0.445
             CombinedAndHighHigh       15.36      (1.6%)       15.51      (2.0%)    1.0% (  -2% -    4%) 0.402
               FilteredAnd3Terms      193.72      (2.0%)      195.75      (0.9%)    1.0% (  -1% -    4%) 0.283
                 CountAndHighMed      159.57      (3.9%)      161.37      (2.0%)    1.1% (  -4% -    7%) 0.562
                   TermMonthSort     3912.20      (2.7%)     3964.82      (2.4%)    1.3% (  -3% -    6%) 0.405
                       CountTerm    10986.48      (3.0%)    11153.37      (2.4%)    1.5% (  -3% -    7%) 0.373
                DismaxOrHighHigh      119.28      (4.0%)      121.39      (3.8%)    1.8% (  -5% -    9%) 0.475
                      OrHighRare      263.69      (9.0%)      269.30      (9.7%)    2.1% ( -15% -   22%) 0.718
                          OrMany       19.84      (2.1%)       20.31      (2.9%)    2.3% (  -2% -    7%) 0.149
               TermDayOfYearSort      650.76      (3.4%)      666.38      (3.1%)    2.4% (  -3% -    9%) 0.239
                 FilteredPrefix3      131.68      (5.7%)      135.96      (2.8%)    3.2% (  -4% -   12%) 0.256
                         Prefix3      139.59      (6.2%)      144.69      (2.8%)    3.7% (  -5% -   13%) 0.230
                          Phrase       14.52      (8.1%)       15.08      (5.4%)    3.8% (  -8% -   18%) 0.380
                      TermDTSort      285.66      (7.7%)      301.26      (8.6%)    5.5% ( -10% -   23%) 0.289
                  CountOrHighMed      135.38      (2.3%)      189.34      (3.5%)   39.9% (  33% -   46%) 0.000
                 CountOrHighHigh       72.89      (1.6%)      121.23      (5.0%)   66.3% (  58% -   74%) 0.000
                     CountOrMany        6.78     (15.4%)       11.58      (4.8%)   70.7% (  43% -  107%) 0.000
