Add BytesRefIterator to TermInSetQuery #13806
Addresses apache#13804. TermInSetQuery used to have an accessor to its terms that was removed in apache#12173 to avoid leaking internal encoding details. This introduces an accessor to the term data in the query that doesn't expose internals but merely allows iterating over the decoded BytesRefs, making inspection of the query's content possible again.
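To illustrate the idea of exposing decoded terms without exposing the storage format, here is a minimal, stdlib-only sketch. It is not Lucene's implementation: `PackedTermSet`, `TermIterator`, `getTermIterator` and `decodeAll` are hypothetical names invented for this example, and the packed layout (concatenated bytes plus an offsets array) is an assumption. The only detail borrowed from Lucene is the iterator contract, where `next()` returns null once the terms are exhausted, as `BytesRefIterator.next()` does.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a query that keeps its terms in a private packed form.
final class PackedTermSet {
  /** Hypothetical iterator: next() returns null when exhausted. */
  interface TermIterator {
    byte[] next();
  }

  private final byte[] packed; // all terms concatenated (internal detail, never exposed)
  private final int[] offsets; // offsets[i]..offsets[i + 1] delimits term i

  PackedTermSet(List<String> terms) {
    TreeSet<String> sorted = new TreeSet<>(terms); // sort and de-duplicate
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    offsets = new int[sorted.size() + 1];
    int i = 0;
    for (String term : sorted) {
      byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
      out.write(bytes, 0, bytes.length);
      offsets[++i] = out.size();
    }
    packed = out.toByteArray();
  }

  long getTermsCount() {
    return offsets.length - 1;
  }

  // Callers see decoded copies of each term, not the packed buffer itself.
  TermIterator getTermIterator() {
    return new TermIterator() {
      private int index = 0;

      @Override
      public byte[] next() {
        if (index >= offsets.length - 1) {
          return null;
        }
        byte[] term = Arrays.copyOfRange(packed, offsets[index], offsets[index + 1]);
        index++;
        return term;
      }
    };
  }

  // Drains the iterator into strings, the way a caller would inspect the query.
  static List<String> decodeAll(PackedTermSet set) {
    List<String> result = new ArrayList<>();
    TermIterator it = set.getTermIterator();
    for (byte[] term = it.next(); term != null; term = it.next()) {
      result.add(new String(term, StandardCharsets.UTF_8));
    }
    return result;
  }

  public static void main(String[] args) {
    // prints [apache, lucene]
    System.out.println(decodeAll(new PackedTermSet(List.of("lucene", "apache", "lucene"))));
  }
}
```

Because the iterator hands out copies, the internal layout can change later without breaking callers, which is the property the description above is after.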
good solution. could we consider also fixing the visitor to use this approach (vs passing a RunAutomaton or something awful?)
and the question is not for this PR, just a general one. It seems the only "real user" of |
@@ -141,6 +135,11 @@ public long getTermsCount() {
    return termData.size();
  }

  public BytesRefIterator getBytesRefIterator() {
maybe add javadocs for the method and @experimental for now?
it's possible we could "fix visitor" api in the future where we wouldn't need this public method specific to this query as well (see general thoughts on PR)
Sure, thanks.
also, it would be good to get an idea of the use-case. The problem is, this query can hold many terms:
Thanks for taking a look @rmuir! I have been digging a bit through history, and it seems like it used to be possible to get all the terms via
The main use case we have is monitor-like: you have queries stored in the index and want to run them against incoming documents. To pre-filter the queries and reduce the number we need to run, we extract terms from them at index time and put them in a separate field that we later use for pre-filtering. I checked the monitor code and it looks like its query visitor (from
With that, I am going to go ahead and merge this PR to main for now. That unblocks us, and we can continue the discussion about the long-term approach.
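The pre-filtering use case described above can be sketched roughly as follows. This is a stdlib-only illustration, not the Lucene monitor's code or API: `QueryPrefilter`, `register` and `candidates` are hypothetical names, and the sketch assumes each stored query is a simple disjunction of terms (as a terms-in-set query is), so a single shared term is enough to make it a candidate.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-filter: maps each extracted term to the stored queries that contain it.
final class QueryPrefilter {
  private final Map<String, Set<String>> queriesByTerm = new HashMap<>();

  // Called when a query is stored: index every term extracted from it.
  void register(String queryId, Iterable<String> terms) {
    for (String term : terms) {
      queriesByTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(queryId);
    }
  }

  // Called per incoming document: only queries sharing a term with it need to be run.
  Set<String> candidates(Iterable<String> documentTerms) {
    Set<String> result = new TreeSet<>();
    for (String term : documentTerms) {
      Set<String> ids = queriesByTerm.get(term);
      if (ids != null) {
        result.addAll(ids);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    QueryPrefilter prefilter = new QueryPrefilter();
    prefilter.register("q1", List.of("red", "green"));
    prefilter.register("q2", List.of("blue"));
    // prints [q1]
    System.out.println(prefilter.candidates(List.of("green", "yellow")));
  }
}
```

Extracting the terms at registration time is the step that needs an accessor on the query; without one, every stored query would have to be run against every document.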
@cbuescher could you add an entry to CHANGES.txt, under 9.12 please? I am thinking that this should be backported so it provides a replacement for the deprecated method before it gets removed. I can take care of adding a link to it in the javadocs of the deprecated term in 9.x when I do the backporting.
@javanna done, thanks.
LGTM
TermInSetQuery used to have an accessor to its terms that was removed in #12173 to avoid leaking internal encoding details. This introduces an accessor to the term data in the query that doesn't expose internals but merely allows iterating over the decoded BytesRefs, making inspection of the query's content possible again. Closes #13804