Add conditional token filter to elasticsearch #31958
Merged
111 commits
e5b20de
WIP
romseygeek da0fd1e
WIP
romseygeek df9bffc
WIP
romseygeek 402ed36
WIP
romseygeek fb7c21d
WIP
romseygeek d8f0170
docs
romseygeek bcee3f0
tests
romseygeek bba5939
d'oh
romseygeek 21cd02f
class name change in SPI
romseygeek 4315682
docs
romseygeek 52955df
Broekn
romseygeek dd139c7
nuke unit test
romseygeek 1d5deff
Merge branch 'master' into scripted-analysis
romseygeek 57a73f2
feedback
romseygeek 609951b
Term -> Token; move ScriptContext into module
romseygeek 801a704
Re-instate link in StringFunctionUtils javadocs
f923d9c
Docs: Change formatting of Cloud options
clintongormley a21fb82
Docs: Restyled cloud link in getting started
clintongormley 62fea58
[Rollup] Use composite's missing_bucket (#31402)
polyfractal 35a6774
Test: Fix a second case of bad watch creation
hub-cap 9203e50
Remove deprecated AnalysisPlugin#requriesAnalysisSettings method (#32…
romseygeek 849e690
Add second level of field collapsing (#31808)
mayya-sharipova de213a8
Mute ML AutodetectMemoryLimitIT#testTooManyPartitions on Windows (#32…
3bbc8c6
Watcher: cleanup ensureWatchExists use (#31926)
hub-cap 4440df5
Add secure setting for watcher email password (#31620)
hub-cap 2ab7db3
HLRC: Add xpack usage api (#31975)
rjernst 2183fff
Adds a new auto-interval date histogram (#28993)
colings86 0e7a6b4
lazy snapshot repository initialization (#31606)
vladimirdolzhenko 315999d
Watcher: Make settings reloadable (#31746)
hub-cap b959534
fix typo
94d3311
Clean Up Snapshot Create Rest API (#31779)
jdconrad 2945c3a
[Rollup] Histo group config should support scaled_floats (#32048)
polyfractal e48de6a
Mute failing tests
ec470f2
Replace Ingest ScriptContext with Custom Interface (#32003)
original-brownbear f446f91
Add nio http transport to security plugin (#32018)
Tim-Brooks bd20e99
Fix compile issues introduced by merge (#32058)
Tim-Brooks 71cd43b
SCRIPTING: Remove unused MultiSearchTemplateRequestBuilder (#32049)
original-brownbear e045ad6
Cleanup Duplication in `PainlessScriptEngine` (#31991)
original-brownbear dabbba1
Fix broken OpenLDAP Vagrant QA test
tvernum 82e8fce
Turn off real-mem breaker in single node tests
danielmitterdorfer 9d48815
Turn off real-mem breaker in REST tests
danielmitterdorfer 040bc9d
[Test] Mute MlJobIT#testDeleteJobAfterMissingAliases
jimczi 142d24a
Remove unused params from SSource and Walker (#31935)
ce8b3e3
[Tests] Fix failure due to changes exception message (#32036)
a1ad7a1
Fix BWC check after backport
jimczi bbe1b7c
Unmute field collapsing rest tests
jimczi 391641c
Ensure only parent breaker trips in unit test
danielmitterdorfer 6ec52fe
[Rollup] Fix duplicate field names in test (#32075)
jimczi ced669b
[TEST] Consistent algorithm usage (#32077)
jkakavas e38e69c
[Rollup] Replace RollupIT with a ESRestTestCase version (#31977)
polyfractal 4a9fbe7
Scripting: Remove dead code from painless module (#32064)
original-brownbear 5f130a2
Painless: Separate PainlessLookup into PainlessLookup and PainlessLoo…
jdconrad 1dd0279
Use correct formatting for links (#29460)
dedemorton 53f029b
Watcher: Store username on watch execution (#31873)
hub-cap e325526
Tweaked Elasticsearch Service links for SEO
debadair e514ad0
Tweaked Elasticsearch Service links for SEO
debadair 50d8fa8
[test] turn on host io cache for opensuse (#32053)
andyb-elastic c0ffec7
DOCS: put LIMIT 10 to the SQL query (#32065)
ahmedakef ee4ef86
SQL: allow LEFT and RIGHT as function names (#32066)
costin 9371e77
Revert "[test] disable packaging tests for suse boxes"
andyb-elastic 780697f
[Rollup] Add new capabilities endpoint for concrete rollup indices (#…
polyfractal 1106355
Switch non-x-pack to new style requests (#32106)
nik9000 7ce9926
Bypass highlight query terms extraction on empty fields (#32090)
jimczi 8ff5735
Painless: Move and Rename Several Methods in the lookup package (#32105)
jdconrad 97fbe49
Add Index UUID to `/_stats` Response (#31871)
original-brownbear ca2844f
[Test] Modify assert statement for ssl handshake (#32072)
bizybot 8bad2c6
Add exclusion option to `keep_types` token filter (#32012)
89bce93
Fix put mappings java API documentation (#31955)
c2ee07b
Enable testing in FIPS140 JVM (#31666)
jkakavas 24547e8
Check that client methods match API defined in the REST spec (#31825)
javanna 5a383c2
Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
davidkyle 5bad3a8
Updates the build to gradle 4.9 (#32087)
alpar-t 94330d8
Relax TermVectors API to work with textual fields other than TextFiel…
markharwood 3d0854d
Handle TokenizerFactory TODOs (#32063)
original-brownbear 70d2db3
Ensure to release translog snapshot in primary-replica resync (#32045)
dnhatn 5afea06
[ML] Move analyzer dependencies out of categorization config (#32123)
droberts195 bb9fae0
[ML] Wait for aliases in multi-node tests (#32086)
davidkyle b31dc36
Docs: Fix missing example script quote (#32010)
aptxx a7c8e07
Re-disable packaging tests on suse boxes
andyb-elastic 7490ec6
Remove empty @param from Javadoc
jkakavas a481ef6
Painless: Fix Bug with Duplicate PainlessClasses (#32110)
jdconrad 346edfa
Build: Move shadow customizations into common code (#32014)
nik9000 cdf5c8a
Disable C2 from using AVX-512 on JDK 10 (#32138)
jasontedor a835503
Build: Make additional test deps of check (#32015)
rjernst ff7ff36
Painless: Add PainlessClassBuilder (#32141)
jdconrad 101458b
Build: Skip jar tests if jar disabled
nik9000 88c4f6c
Switch distribution to new style Requests (#30595)
nik9000 8486f24
Remove versionType from translog (#31945)
dnhatn 413d211
ESIndexLevelReplicationTestCase doesn't support replicated failures b…
bleskes fa16875
[DOCS] Update TLS on Docker for 6.3 (#32114)
ninaspitfire 688deeb
Fix `range` queries on `_type` field for singe type indices (#31756)
ace3771
Term -> Token; deps
romseygeek 84ee20e
Fix CP for namingConventions when gradle home has spaces (#31914)
alpar-t 947fa2e
Fix Java 11 javadoc compile problem
e67cede
A replica can be promoted and started in one cluster state update (#3…
bleskes 8924ac3
Add EC2 credential test for repository-s3 (#31918)
vladimirdolzhenko b79dc6a
Add more contexts to painless execute api (#30511)
martijnvg d225048
use before instead of onOrBefore
martijnvg 997ebe8
Improve docs for search preferences (#32159)
DaveCTurner f591095
Fix BwC Tests looking for UUID Pre 6.4 (#32158)
original-brownbear 67a4dcb
Call setReferences() on custom referring tokenfilters in _analyze (#3…
romseygeek 93ecf1d
Merge conflicts
romseygeek 945fadf
more docs
romseygeek 303de4f
tests for all script variables
romseygeek 33b9da4
Merge branch 'master' into scripted-analysis
romseygeek b775f71
Merge branch 'master' into scripted-analysis
romseygeek 1dff0f6
merge error
romseygeek 546aa11
checkstyle
romseygeek 701fbf2
headers
romseygeek 396843f
Use actual painless syntax, not my own made-up syntax
romseygeek 8bc3d65
Merge branch 'master' into scripted-analysis
romseygeek
43 changes: 43 additions & 0 deletions
docs/painless/painless-contexts/painless-analysis-predicate-context.asciidoc
@@ -0,0 +1,43 @@
[[painless-analysis-predicate-context]]
=== Analysis Predicate Context

Use a Painless script to determine whether or not the current token in an
analysis chain matches a predicate.

*Variables*

`params` (`Map`, read-only)::
        User-defined parameters passed in as part of the query.

`token.term` (`CharSequence`, read-only)::
        The characters of the current token.

`token.position` (`int`, read-only)::
        The position of the current token.

`token.positionIncrement` (`int`, read-only)::
        The position increment of the current token.

`token.positionLength` (`int`, read-only)::
        The position length of the current token.

`token.startOffset` (`int`, read-only)::
        The start offset of the current token.

`token.endOffset` (`int`, read-only)::
        The end offset of the current token.

`token.type` (`String`, read-only)::
        The type of the current token.

`token.keyword` (`boolean`, read-only)::
        Whether or not the current token is marked as a keyword.

*Return*

`boolean`::
        Whether or not the current token matches the predicate.

*API*

The standard <<painless-api-reference, Painless API>> is available.
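To make the variable list above more concrete, here is a minimal plain-Java sketch of a predicate over two of the documented values, `token.position` and `token.keyword`. The `TokenPredicateSketch` class, its nested `Token`, and the `matches` helper are hypothetical stand-ins written for this illustration; they are not part of the change set and do not show real Painless syntax.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for illustration only; not the Elasticsearch implementation.
class TokenPredicateSketch {

    // Simplified token state: a subset of the variables documented above.
    static final class Token {
        final CharSequence term;
        final int position;
        final boolean keyword;

        Token(CharSequence term, int position, boolean keyword) {
            this.term = term;
            this.position = position;
            this.keyword = keyword;
        }
    }

    // Plain-Java analogue of a predicate over token.position and token.keyword:
    // match every token after the first that is not marked as a keyword.
    static final Predicate<Token> NOT_FIRST_AND_NOT_KEYWORD =
        token -> token.position > 0 && !token.keyword;

    static boolean matches(String term, int position, boolean keyword) {
        return NOT_FIRST_AND_NOT_KEYWORD.test(new Token(term, position, keyword));
    }

    public static void main(String[] args) {
        System.out.println(matches("The", 0, false)); // first token: false
        System.out.println(matches("fox", 2, false)); // later, non-keyword token: true
        System.out.println(matches("fox", 2, true));  // keyword token: false
    }
}
```

The predicate only reads token state and returns a `boolean`, which matches the read-only variables and the return type described above.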
90 changes: 90 additions & 0 deletions
docs/reference/analysis/tokenfilters/condition-tokenfilter.asciidoc
@@ -0,0 +1,90 @@
[[analysis-condition-tokenfilter]]
=== Conditional Token Filter

The conditional token filter takes a predicate script and a list of subfilters, and
only applies the subfilters to the current token if it matches the predicate.

[float]
=== Options
[horizontal]
filter:: a chain of token filters to apply to the current token if the predicate
  matches. These can be any token filters defined elsewhere in the index settings.

script:: a predicate script that determines whether or not the filters will be applied
  to the current token. Note that only inline scripts are supported.

[float]
=== Settings example

You can set it up like:

[source,js]
--------------------------------------------------
PUT /condition_example
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : [ "my_condition" ]
                }
            },
            "filter" : {
                "my_condition" : {
                    "type" : "condition",
                    "filter" : [ "lowercase" ],
                    "script" : {
                        "source" : "token.getTerm().length() < 5" <1>
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

<1> This will only apply the lowercase filter to terms that are shorter than 5
characters

And test it like:

[source,js]
--------------------------------------------------
POST /condition_example/_analyze
{
  "analyzer" : "my_analyzer",
  "text" : "What Flapdoodle"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

And it'd respond:

[source,js]
--------------------------------------------------
{
  "tokens": [
    {
      "token": "what", <1>
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "Flapdoodle", <2>
      "start_offset": 5,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
--------------------------------------------------
// TESTRESPONSE
<1> The term `What` has been lowercased, because it is only 4 characters long
<2> The term `Flapdoodle` has been left in its original case, because it doesn't pass
the predicate
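The behaviour documented above (apply the subfilters only to tokens that pass the predicate) can be approximated in a few lines of plain Java. This is a sketch under stated assumptions: `ConditionalFilterSketch`, `apply`, and `analyze` are hypothetical names, whitespace splitting stands in for the `standard` tokenizer, and `String#toLowerCase` stands in for the `lowercase` filter. It does not use or reproduce the Lucene classes behind the real filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for illustration; not the Elasticsearch or Lucene implementation.
class ConditionalFilterSketch {

    // Applies `subfilter` only to the terms that satisfy `predicate`;
    // every other term passes through unchanged.
    static List<String> apply(List<String> terms, Predicate<String> predicate, UnaryOperator<String> subfilter) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add(predicate.test(term) ? subfilter.apply(term) : term);
        }
        return out;
    }

    // Mirrors the documented example: the predicate is "term length < 5"
    // and the single subfilter lowercases the term.
    static List<String> analyze(String text) {
        List<String> terms = Arrays.asList(text.trim().split("\\s+"));
        return apply(terms, term -> term.length() < 5, term -> term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(analyze("What Flapdoodle")); // prints [what, Flapdoodle]
    }
}
```

As in the documented response, `What` is lowercased because it is 4 characters long, while `Flapdoodle` fails the predicate and keeps its original case.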
40 changes: 40 additions & 0 deletions
...sis-common/src/main/java/org/elasticsearch/analysis/common/AnalysisPainlessExtension.java
@@ -0,0 +1,40 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.analysis.common;

import org.elasticsearch.painless.spi.PainlessExtension;
import org.elasticsearch.painless.spi.Whitelist;
import org.elasticsearch.painless.spi.WhitelistLoader;
import org.elasticsearch.script.ScriptContext;

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class AnalysisPainlessExtension implements PainlessExtension {

    private static final Whitelist WHITELIST =
        WhitelistLoader.loadFromResourceFiles(AnalysisPainlessExtension.class, "painless_whitelist.txt");

    @Override
    public Map<ScriptContext<?>, List<Whitelist>> getContextWhitelists() {
        return Collections.singletonMap(AnalysisPredicateScript.CONTEXT, Collections.singletonList(WHITELIST));
    }
}
87 changes: 87 additions & 0 deletions
...lysis-common/src/main/java/org/elasticsearch/analysis/common/AnalysisPredicateScript.java
@@ -0,0 +1,87 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.analysis.common;

import org.elasticsearch.script.ScriptContext;

/**
 * A predicate based on the current token in a TokenStream
 */
public abstract class AnalysisPredicateScript {

    /**
     * Encapsulation of the state of the current token
     */
    public static class Token {
        public CharSequence term;
        public int pos;
        public int posInc;
        public int posLen;
        public int startOffset;
        public int endOffset;
        public String type;
        public boolean isKeyword;

        public CharSequence getTerm() {
            return term;
        }

        public int getPositionIncrement() {
            return posInc;
        }

        public int getPosition() {
            return pos;
        }

        public int getPositionLength() {
            return posLen;
        }

        public int getStartOffset() {
            return startOffset;
        }

        public int getEndOffset() {
            return endOffset;
        }

        public String getType() {
            return type;
        }

        public boolean isKeyword() {
            return isKeyword;
        }
    }

    /**
     * Returns {@code true} if the current term matches the predicate
     */
    public abstract boolean execute(Token token);

    public interface Factory {
        AnalysisPredicateScript newInstance();
    }

    public static final String[] PARAMETERS = new String[]{ "token" };
    public static final ScriptContext<Factory> CONTEXT = new ScriptContext<>("analysis", Factory.class);

}
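As a rough illustration of the `Factory` / `newInstance()` / `execute(Token)` shape in the class above, the sketch below hand-writes a factory equivalent to the documented script `token.getTerm().length() < 5`. It assumes trimmed, package-private copies of `Token` and `AnalysisPredicateScript` nested inside a hypothetical `PredicateScriptSketch` class so that it compiles without Elasticsearch on the classpath; `SHORT_TERM` and `test` are likewise invented for this example. In the real module the script engine would presumably supply the factory from compiled script source.

```java
// Hypothetical, self-contained sketch; the nested types are trimmed stand-ins
// for the classes in the diff, not the real Elasticsearch classes.
class PredicateScriptSketch {

    // Only the term is kept here; the real Token carries more state.
    static class Token {
        CharSequence term;

        CharSequence getTerm() {
            return term;
        }
    }

    abstract static class AnalysisPredicateScript {
        // Returns true if the current token matches the predicate.
        abstract boolean execute(Token token);

        interface Factory {
            AnalysisPredicateScript newInstance();
        }
    }

    // Hand-written factory equivalent to the script "token.getTerm().length() < 5".
    static final AnalysisPredicateScript.Factory SHORT_TERM = () -> new AnalysisPredicateScript() {
        @Override
        boolean execute(Token token) {
            return token.getTerm().length() < 5;
        }
    };

    // Builds a token for the given term and runs a fresh script instance against it.
    static boolean test(String term) {
        Token token = new Token();
        token.term = term;
        return SHORT_TERM.newInstance().execute(token);
    }

    public static void main(String[] args) {
        System.out.println(test("What"));       // true
        System.out.println(test("Flapdoodle")); // false
    }
}
```

A factory written this way can stand in for a compiled script in a unit test, because callers only depend on `newInstance()` and `execute(Token)`.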
Did you mean to leave this TODO?
I think it's a backwards breaking change? ScriptPlugin#getContexts() needs to change return value from List<ScriptContext> to List<ScriptContext<?>>. Which ought to be done, but it's a separate change. I'll open an issue.
Opening an issue makes sense. Yeah, it is a separate thing.
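
The return-type change discussed in the comments above can be illustrated with a small generics sketch. `Context<F>` below is a hypothetical stand-in for a generic context type such as `ScriptContext`, and `FactoryA`, `FactoryB`, `getContexts`, and `names` are invented for this example. It shows only the general Java rule that `List<Context>` and `List<Context<?>>` are different parameterized types, which is why changing a method's return type from one to the other affects existing callers and overriders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of raw vs. wildcard element types; not Elasticsearch code.
class WildcardContextSketch {

    // Stand-in for a generic context type parameterized by its factory class.
    static final class Context<F> {
        final String name;
        final Class<F> factoryClass;

        Context(String name, Class<F> factoryClass) {
            this.name = name;
            this.factoryClass = factoryClass;
        }
    }

    interface FactoryA {}
    interface FactoryB {}

    // Wildcard element type: each element keeps its own (unknown) factory type,
    // and no raw types are involved.
    static List<Context<?>> getContexts() {
        List<Context<?>> contexts = new ArrayList<>();
        contexts.add(new Context<>("analysis", FactoryA.class));
        contexts.add(new Context<>("other", FactoryB.class));
        return contexts;
    }

    // Code written against the raw form would declare `List<Context>`.
    // `List<Context> raw = getContexts();` does not compile, because
    // List<Context<?>> and List<Context> are unrelated parameterized types.
    static String names() {
        StringBuilder sb = new StringBuilder();
        for (Context<?> context : getContexts()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(context.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(names()); // prints analysis,other
    }
}
```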