Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require [articles] setting in elision filter #43083

Merged
merged 8 commits into from
Jun 27, 2019

Conversation

romseygeek
Copy link
Contributor

We should throw an exception at construction time if a list of
articles is not provided, otherwise we can get random NPEs during
indexing.

Fixes #43002

@romseygeek romseygeek self-assigned this Jun 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@telendt
Copy link
Contributor

telendt commented Jun 11, 2019

If it requires setting, shouldn't elision filter provider be wrapped in requiresAnalysisSettings at the time of registering?

Also - why are those settings required for _analyze but not in a "regular" analyzer?

(I do not have any elision specific settings in my index and everything works fine there)

@romseygeek
Copy link
Contributor Author

romseygeek commented Jun 24, 2019

If it requires setting, shouldn't elision filter provider be wrapped in requiresAnalysisSettings at the time of registering?

It should, thanks for catching!

(I do not have any elision specific settings in my index and everything works fine there)

Are you sure you're actually using the elision filter during indexing? If I set things up without any settings, then I get NPEs during indexing:

PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "elision" : {
          "type" : "elision"
        }
      },
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [
            "elision"
          ]
        }
      }
    }
  }
}

POST test/_doc/1 
{
  "field1" : "l'avion"
}
{
  "error": {
    "root_cause": [
      {
        "type": "null_pointer_exception",
        "reason": null
      }
    ],
    "type": "null_pointer_exception",
    "reason": null
  },
  "status": 500
}

@romseygeek
Copy link
Contributor Author

Aha, but there's a prebuilt elision filter that by default uses French articles, so if you don't define any filters at all but still add elision to your list of filters, then you get that prebuilt filter; and indeed, the _analyze API doesn't appear to pick this up. So we do need to add this PR, but there is more work to do afterwards.

@telendt
Copy link
Contributor

telendt commented Jun 24, 2019

Aha, but there's a prebuilt elision filter that by default uses French articles, so if you don't define any filters at all but still add elision to your list of filters, then you get that prebuilt filter

Yes, that's how it's used in my index. Sorry for imprecise description.

@romseygeek
Copy link
Contributor Author

I opened #43568 as a follow-up to deal with pre-configured filters in the Analyze API

@romseygeek romseygeek requested a review from cbuescher June 26, 2019 08:23
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romseygeek change LGTM, but can you briefly clarify how this will be handled when backporting to 7.3? The way I currently understand your comments here is that there an elision filter without an explicite definition in the settings will still use the built-in "french" one? Just want to make sure we're not breaking existing mappings, even if they might be silently behave buggy, on a minor when backporting.

@romseygeek
Copy link
Contributor Author

an elision filter without an explicit definition in the settings will still use the built-in "french" one

That's correct. The only change we're making here is to capture a mis-configured filter early - currently, if you set up an elision filter in an explicit mapping but don't give it an articles set, then it will throw a NullPointerException when it is first used. Now, you'll get an error when you create the mapping.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romseygeek , I left a comment

@@ -35,6 +35,9 @@

ElisionTokenFilterFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
super(indexSettings, name, settings);
if (settings.get("articles") == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hidden but it's also possible to set articles_path to reference a file so we should also check this alternative.
I am not sure that a lot of users are aware of this and I doubt that it is very useful but we should handle it for bwc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

articles_path isn't documented, I'll add something to the tokenfilter docs about it as well.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating, I left a minor comment but feel free to ignore.

@@ -35,6 +35,9 @@

ElisionTokenFilterFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
super(indexSettings, name, settings);
if (settings.get("articles") == null && settings.get("articles_path") == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check that parseArticles doesn't return a null value instead ?

@romseygeek romseygeek merged commit d2c696d into elastic:master Jun 27, 2019
romseygeek added a commit that referenced this pull request Jun 27, 2019
We should throw an exception at construction time if a list of
articles is not provided, otherwise we can get random NPEs during
indexing.

Relates to #43002
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NullPointerException in ElisionFilter on _analyze
6 participants