Scripted analysis components #26100
We agreed to do it, but there are some challenges along the way, in particular:
This is what concerns me the most. Perhaps this is a case where the scripting capability needs an API shim over it, so that these scripts cannot be modified?
Yeah. We talked about, for example, forcing you to use inline scripts in the index settings. Then they can't change, but you are stuck using inline scripts.
cc @elastic/es-search-aggs
We have condition filters now (#31958), and we should have scriptable stop filters soon (#33431). We also have the multiplexing filter, which allows you to emit different variations of a token at the same position. I think the only thing remaining is a token-mutating filter that lets you change the bytes of a token using a script. The tricky part here will be the API. We could just expose CharTermAttribute, but that's not a trivial class to use, particularly if you want to prepend characters to a token. I'm also slightly worried that it would be easy to write a script that inadvertently creates a lot of objects, which you really don't want in the hot loop of an analysis chain. One option could be to create a new type that wraps CharTermAttribute, with append(), prepend() and substring() methods, which would cover most use cases. More complicated substitutions can already be done using PatternReplaceFilter.
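To make the proposed wrapper concrete, here is a minimal sketch of a type with append(), prepend() and substring() that edits a term in place. It is hypothetical: the class name TermBuffer and its method signatures are illustrative only, and a plain char[] plus a length stands in for Lucene's CharTermAttribute (which is also backed by a resizable buffer and a length) so the sketch compiles without Lucene. The point it shows is that steady-state edits reuse one buffer and only allocate when the term outgrows it.

```java
import java.util.Arrays;

// Hypothetical token-mutation wrapper; a char[] + length stands in for CharTermAttribute.
class TermBuffer {
    private char[] buffer;
    private int length;

    TermBuffer(String initial) {
        buffer = initial.toCharArray();
        length = buffer.length;
    }

    // Grow the backing array only when needed (doubling), so most edits allocate nothing.
    private void ensureCapacity(int needed) {
        if (needed > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
        }
    }

    public TermBuffer append(CharSequence s) {
        int n = s.length();
        ensureCapacity(length + n);
        for (int i = 0; i < n; i++) {
            buffer[length + i] = s.charAt(i);
        }
        length += n;
        return this;
    }

    public TermBuffer prepend(CharSequence s) {
        int n = s.length();
        ensureCapacity(length + n);
        // Shift the existing chars right in place (arraycopy handles overlap), then write the prefix.
        System.arraycopy(buffer, 0, buffer, n, length);
        for (int i = 0; i < n; i++) {
            buffer[i] = s.charAt(i);
        }
        length += n;
        return this;
    }

    // Keep chars [start, end) of the current term, moving them to the front in place.
    public TermBuffer substring(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("bad range " + start + ".." + end);
        }
        System.arraycopy(buffer, start, buffer, 0, end - start);
        length = end - start;
        return this;
    }

    @Override
    public String toString() {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        TermBuffer t = new TermBuffer("walk");
        t.append("ing").prepend("re").substring(2, 6);
        System.out.println(t); // prints "walk"
    }
}
```

Returning `this` from each method lets a script chain edits; a real implementation would write into the attribute's own buffer rather than a private array.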
+1 to only allow mutating bytes
Maybe we can look at existing stemmers and see which building blocks would be useful for implementing them. For instance, I think reimplementing EnglishMinimalStemFilter would make a great example in our docs.
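As a rough idea of what such a docs example would need to express, here is a sketch of the "S-stemmer" plural rules (Harman) that, to our reading of the Lucene source, EnglishMinimalStemFilter applies: "-ies" becomes "-y" unless preceded by "a" or "e"; "-es" becomes "-e" unless preceded by "a", "e", "i" or "o"; a final "s" is dropped unless preceded by "u" or "s". The class name MinimalPluralStemmer is hypothetical, and the rules should be checked against Lucene before relying on them. It works on a char buffer and returns a new length, in the same allocation-free style discussed for CharTermAttribute.

```java
// Hypothetical re-statement of the minimal English plural stemming rules.
class MinimalPluralStemmer {

    // Stems s[0..len) in place and returns the new length.
    static int stem(char[] s, int len) {
        // Only words of 3+ chars ending in 's' are candidates.
        if (len < 3 || s[len - 1] != 's') {
            return len;
        }
        switch (s[len - 2]) {
            case 'u': // "-us" (bus): leave alone
            case 's': // "-ss" (glass): leave alone
                return len;
            case 'e':
                // "-ies" -> "-y", unless the word ends in "-aies" or "-eies".
                if (len > 3 && s[len - 3] == 'i' && s[len - 4] != 'a' && s[len - 4] != 'e') {
                    s[len - 3] = 'y';
                    return len - 2;
                }
                // Remaining "-ies", "-aes", "-oes", "-ees": leave alone.
                if (s[len - 3] == 'i' || s[len - 3] == 'a' || s[len - 3] == 'o' || s[len - 3] == 'e') {
                    return len;
                }
                // Other "-es" -> "-e" by dropping the final 's'.
                return len - 1;
            default:
                // Plain "-s": drop it.
                return len - 1;
        }
    }

    // Convenience wrapper for trying the rules on strings.
    static String stem(String word) {
        char[] buf = word.toCharArray();
        return new String(buf, 0, stem(buf, buf.length));
    }

    public static void main(String[] args) {
        for (String w : new String[] {"cities", "horses", "goes", "cats", "glass", "bus", "is"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

The building blocks it needs are suffix checks, replacing one char, and truncating the term, which map onto the substring-style operations proposed for a token-mutating filter.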
❤️ We have a tool in the docs that calls the analyze API on a bunch of strings with two different analyzers and compares the results. It might help here.
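The comparison idea can be sketched independently of the docs tool, whose implementation is not shown here. In this hypothetical harness, each analyzer is modelled as a function from text to a token list; in practice each function would wrap a call to the analyze API with a different analyzer. The class name AnalyzerDiff and the two toy analyzers are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical harness: runs the same inputs through two "analyzers" and reports differences.
class AnalyzerDiff {

    static List<String> diff(List<String> inputs,
                             Function<String, List<String>> a,
                             Function<String, List<String>> b) {
        List<String> mismatches = new ArrayList<>();
        for (String text : inputs) {
            List<String> ta = a.apply(text);
            List<String> tb = b.apply(text);
            // List.equals compares element-by-element, regardless of List implementation.
            if (!ta.equals(tb)) {
                mismatches.add(text + ": " + ta + " vs " + tb);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Toy analyzer 1: lowercase, split on whitespace.
        Function<String, List<String>> plain =
            t -> Arrays.asList(t.toLowerCase().split("\\s+"));
        // Toy analyzer 2: same, but naively drops a trailing 's'.
        Function<String, List<String>> naiveStem = t -> {
            List<String> out = new ArrayList<>();
            for (String tok : t.toLowerCase().split("\\s+")) {
                out.add(tok.endsWith("s") ? tok.substring(0, tok.length() - 1) : tok);
            }
            return out;
        };
        diff(Arrays.asList("quick fox", "Lazy Dogs"), plain, naiveStem)
            .forEach(System.out::println);
        // prints "Lazy Dogs: [lazy, dogs] vs [lazy, dog]"
    }
}
```

Used this way, a scripted reimplementation of a built-in filter could be checked against the original over a corpus of sample strings, with any mismatch listed for inspection.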
As an additional datapoint where a scripted filter might be interesting: #34402. |
This issue has been open for 5 years and has had no activity in the last 2. A couple of scripted components have been added, and we are not currently planning to add more, hence I am closing it. Let's reopen it if the need comes up again in the future.
If you have specific analysis needs, the only way to address them today is to write a plugin. This is quite a lot of work, since the plugin needs to be rebuilt for every release, which can be frustrating when your needs are simple.
We could give users the ability to write custom analysis components using scripts, so that such simple needs could be addressed with a vanilla installation of Elasticsearch.