NullPointerException in ElisionFilter on _analyze #43002

Closed
telendt opened this issue Jun 7, 2019 · 2 comments · Fixed by #43083
Labels: >bug, :Search Relevance/Analysis, Team:Search Relevance

Comments

@telendt
Contributor

telendt commented Jun 7, 2019

Elasticsearch version (bin/elasticsearch --version):
Version: 7.1.1, Build: default/docker/7a013de/2019-05-23T14:04:00.380842Z, JVM: 12.0.1
(but happens in older versions too)

Steps to reproduce:

curl -sH 'Content-Type: application/json' 'localhost:9200/_analyze' -d '
{
  "text": "l’avion",
  "tokenizer": "standard",
  "filter": ["elision"]
}' | jq .

output:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[12241dcbb809][172.20.0.2:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "null_pointer_exception",
    "reason": null
  },
  "status": 500
}

Provide logs (if relevant):

{
   "type":"server",
   "timestamp":"2019-06-07T19:15:38,854+0000",
   "level":"WARN",
   "component":"r.suppressed",
   "cluster.name":"docker-search-cluster",
   "node.name":"12241dcbb809",
   "cluster.uuid":"R-Zf6up6TXiv9aUhvgK03A",
   "node.id":"Fc9JysDfQm6Rp9FqaCu_Cg",
   "message":"path: /_analyze, params: {}",
   "stacktrace":[
      "org.elasticsearch.transport.RemoteTransportException: [12241dcbb809][172.20.0.2:9300][indices:admin/analyze[s]]",
      "Caused by: java.lang.NullPointerException",
      "at org.apache.lucene.analysis.util.ElisionFilter.incrementToken(ElisionFilter.java:66) ~[lucene-analyzers-common-8.0.0.jar:8.0.0 2ae4746365c1ee72a0047ced7610b2096e438979 - jimczi - 2019-03-08 11:59:47]",
      "at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.simpleAnalyze(TransportAnalyzeAction.java:276) ~[elasticsearch-7.1.1.jar:7.1.1]",
      "at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:251) ~[elasticsearch-7.1.1.jar:7.1.1]",
      "at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:170) ~[elasticsearch-7.1.1.jar:7.1.1]",
      "at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:81) ~[elasticsearch-7.1.1.jar:7.1.1]",
      "at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$1.doRun(TransportSingleShardAction.java:117) [elasticsearch-7.1.1.jar:7.1.1]",
      "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-7.1.1.jar:7.1.1]",
      "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.1.1.jar:7.1.1]",
      "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
      "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
      "at java.lang.Thread.run(Thread.java:835) [?:?]"
   ]
}

A few words of context:
The elision filter seems to work fine for me in a regular analyzer (one created at index creation time), so the problem is probably related to the way the analyze API uses filters.

jaymode added the :Search Relevance/Analysis label Jun 7, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

jimczi added the >bug label Jun 10, 2019
romseygeek self-assigned this Jun 10, 2019
@romseygeek
Contributor

The elision filter needs an 'articles' setting to work properly, and that setting is missing from the analyze HTTP call you're making. It definitely shouldn't be producing an NPE, though; it should give a more informative error at construction time.
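
To illustrate why a missing article list only surfaces once a token is processed, here is a minimal sketch. `ElisionSketch` and `strip` are hypothetical names for illustration, not Lucene's `ElisionFilter`, and the article list in `main` is just an example set of French articles.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for elision logic; NOT Lucene's actual ElisionFilter.
final class ElisionSketch {

    // Removes a leading article plus apostrophe, e.g. "l'avion" -> "avion".
    // `articles` is allowed to be null here to mimic a filter built without settings.
    static String strip(String token, Set<String> articles) {
        int apostrophe = -1;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '\'' || c == '\u2019') {
                apostrophe = i;
                break;
            }
        }
        if (apostrophe < 0) {
            return token; // no apostrophe, so the article set is never consulted
        }
        String prefix = token.substring(0, apostrophe).toLowerCase(Locale.ROOT);
        // With a null set, this call is where the NullPointerException is thrown.
        if (articles.contains(prefix)) {
            return token.substring(apostrophe + 1);
        }
        return token;
    }

    public static void main(String[] args) {
        Set<String> french = new HashSet<>(Arrays.asList("l", "m", "t", "qu", "n", "s", "j"));
        System.out.println(strip("l\u2019avion", french));
        try {
            strip("l\u2019avion", null);
        } catch (NullPointerException e) {
            System.out.println("NPE only when a token with an apostrophe is seen");
        }
    }
}
```

In this sketch, tokens without an apostrophe pass through even when the set is null, so the failure depends on the input text.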

romseygeek added a commit that referenced this issue Jun 27, 2019
We should throw an exception at construction time if a list of
articles is not provided, otherwise we can get random NPEs during
indexing.

Relates to #43002
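
A rough sketch of the fail-fast idea described in this commit message follows. The class name, the exception type, and the error message wording are assumptions for illustration; the actual change in the Elasticsearch code base may look different.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical factory; the name and message are illustrative, not the Elasticsearch classes.
final class ElisionFactorySketch {
    private final Set<String> articles;

    ElisionFactorySketch(Set<String> articles) {
        // Reject a missing article list up front instead of failing later per token.
        if (articles == null) {
            throw new IllegalArgumentException("elision filter requires [articles] to be set");
        }
        this.articles = Collections.unmodifiableSet(new HashSet<>(articles));
    }

    Set<String> articles() {
        return articles;
    }

    public static void main(String[] args) {
        try {
            new ElisionFactorySketch(null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Validating in the constructor means a misconfigured filter is reported when it is built, rather than as an NPE during indexing or analysis.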
romseygeek added a commit that referenced this issue Jun 27, 2019
When a named token filter or char filter is passed as part of an Analyze API
request with no index, we currently try and build the relevant filter using no
index settings. However, this can miss cases where there is a pre-configured
filter defined in the analysis registry. One example here is the elision filter, which
has a pre-configured version built with the french elision set; when used as part
of normal analysis, this preconfigured set is used, but when used as part of the
Analyze API we end up with NPEs because it tries to instantiate the filter with
no index settings.

This commit changes the Analyze API to check for pre-configured filters in the case
that the request has no index defined, and is using a name rather than a custom
definition for a filter.

It also changes the pre-configured `word_delimiter_graph` filter and `edge_ngram`
tokenizer to make their settings consistent with the defaults used when creating
them with no settings.

Closes #43002
Closes #43621 
Closes #43582
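
The lookup order described in this commit message could be sketched roughly as below. Everything here is hypothetical: `FilterLookupSketch`, `resolve`, and the string placeholders stand in for real filter factories and are not the actual Analyze API code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical resolution order for a named filter in a request with no index.
final class FilterLookupSketch {
    // Pre-configured filters keyed by name; the values are descriptive strings only.
    private final Map<String, Supplier<String>> preConfigured = new HashMap<>();

    FilterLookupSketch() {
        preConfigured.put("elision", () -> "elision[french articles]");
    }

    // Prefer a pre-configured filter; otherwise build one from empty settings.
    String resolve(String name) {
        Supplier<String> pre = preConfigured.get(name);
        if (pre != null) {
            return pre.get();
        }
        return buildWithEmptySettings(name);
    }

    private String buildWithEmptySettings(String name) {
        return name + "[no settings]";
    }

    public static void main(String[] args) {
        FilterLookupSketch lookup = new FilterLookupSketch();
        System.out.println(lookup.resolve("elision"));
        System.out.println(lookup.resolve("lowercase"));
    }
}
```

Under this ordering, a bare `"elision"` in the request would pick up the pre-configured article set instead of being built with no settings.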
javanna added the Team:Search Relevance label Jul 16, 2024
6 participants