Add a custom vector_script engine. #6

Closed
wants to merge 3 commits into from
@jtibshirani jtibshirani commented Sep 28, 2019

This helps quantify the performance difference due to scripting. Results on random-s-100-angular:

Algorithm                             Recall      QPS
EsBruteforce(baseline)                1.000       63.304
EsBruteforce(vector_query)            1.000       74.759
EsBruteforce(vector_script)           1.000       66.950
EsBruteforce(vector_script_direct)    1.000       66.245
EsBruteforce(direct_docvalues)        1.000       68.860
  • baseline is the current approach to vector scoring (in Elasticsearch 7.5)
  • vector_query runs the dedicated query which avoids scripting altogether (Add a dedicated vector_score query. #4)
  • vector_script uses the custom script engine from this PR (commit 2212966)
  • vector_script_direct uses the custom script engine from this PR, but computes the similarity directly instead of using CosineSimilarity (commit 65d73f1)
  • direct_docvalues builds on vector_script_direct and accesses docvalues directly as opposed to using LeafDocLookup (commit ef8c261)

This calculation omits some safety checks around vector length for simplicity
(these weren't found to have a significant effect on performance).
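As a rough illustration of what "computes the similarity directly" could mean, here is a minimal Python sketch of an inline cosine similarity. It is not the code from the commits above: the function name `cosine_similarity_direct` is hypothetical, and the length check shown is only an example of the kind of safety check the note above refers to.

```python
import math


def cosine_similarity_direct(query, doc):
    """Cosine similarity computed inline: dot(q, d) / (|q| * |d|).

    Hypothetical helper for illustration; not the PR's implementation.
    """
    # Example of a vector-length safety check (an assumption about what
    # such a check might look like).
    if len(query) != len(doc):
        raise ValueError("query and document vectors differ in length")

    dot = 0.0
    query_sq = 0.0
    doc_sq = 0.0
    # Single pass: accumulate the dot product and both squared norms.
    for q, d in zip(query, doc):
        dot += q * d
        query_sq += q * q
        doc_sq += d * d

    # Assumes neither vector is all zeros; otherwise this divides by zero.
    return dot / math.sqrt(query_sq * doc_sq)
```

The single loop avoids building intermediate objects, which is the general idea behind skipping a helper class in a hot scoring path.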
@jtibshirani jtibshirani deleted the vector-script-query branch April 16, 2020 17:53
jtibshirani pushed a commit that referenced this pull request Mar 11, 2021
…astic#69765)

Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation.

For example the query

```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```

failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an optimization:

```
OrderBy[[Order[j{r}#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
```

By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:

```
OrderBy[[Order[test.int{f}#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
 ```

The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.
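The difference between one-level and recursive resolution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dict stands in for the `AttributeMap`, and `resolve_one_level` and `resolve` are hypothetical names, not Elasticsearch functions.

```python
def resolve_one_level(name, aliases):
    """Old behaviour (illustrative): follow at most one alias link."""
    return aliases.get(name, name)


def resolve(name, aliases):
    """Follow alias links until reaching a name that is not itself an alias."""
    seen = set()
    while name in aliases:
        # Guard against cyclic alias definitions (an added assumption).
        if name in seen:
            raise ValueError(f"alias cycle at {name!r}")
        seen.add(name)
        name = aliases[name]
    return name


# Alias chain from the example query: j -> i -> test.int
aliases = {"j": "i", "i": "test.int"}

one_level = resolve_one_level("j", aliases)  # stops at the intermediate alias "i"
recursive = resolve("j", aliases)            # reaches the underlying field
```

With one-level resolution, `j` still points at `i`, which no longer exists once the inner projection is removed; recursive resolution reaches the field itself. Only aliases present in the map are followed, which mirrors the point that the scope depends on how the map is populated.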

Fixes elastic#67237