Deprecate use of htmlStrip as name for HtmlStripCharFilter #27429

Merged: 2 commits, Apr 19, 2018

Conversation

cbuescher
Member

@cbuescher cbuescher commented Nov 17, 2017

The camel case name htmlStrip should be removed in favour of html_strip, but
we need to deprecate it first. This change logs deprecation warnings for indices
created on or after 6.3.0 that use the deprecated name.

@cbuescher
Member Author

I'm not 100% sure this is the best place to actually issue a deprecation warning for a deprecated analysis component name. For example, I was surprised that this triggers a warning for each index request to a new index that uses an analyzer with the deprecated name (see REST test). Happy to discuss any other options.

@lcawl lcawl added v6.2.0 and removed v6.1.0 labels Dec 12, 2017
@mayya-sharipova
Contributor

mayya-sharipova commented Jan 17, 2018

@cbuescher What do you think of the approach to issue a deprecation warning when a request with htmlStrip is processed, something like: https://gist.github.com/mayya-sharipova/27cdcdc427327235d126c11280ee01ba?

I am also wondering what happens in the next version that will not have these deprecations, and an index still contains htmlStrip as a char filter?

@colings86 colings86 added v6.3.0 and removed v6.2.0 labels Jan 22, 2018
@romseygeek
Contributor

cc @elastic/es-search-aggs

@cbuescher cbuescher removed the v6.3.0 label Apr 13, 2018
The camel case name `htmlStrip` should be removed in favour of `html_strip`, but
we need to deprecate it first. This change adds deprecation warnings for Lucene
indices with a Lucene version larger than 7.0.0 and logs deprecation warnings for
those cases.
@cbuescher
Member Author

@mayya-sharipova sorry for the long silence on this, I took a look at your suggestion. What I like about wrapping the deprecation logging inside the PreConfiguredCharFilter#create function is that we can log deprecation warnings based on index versions, so using the old filter name in old indices shouldn't have any effect. What I still don't like about my current solution is that for new indices we add warning headers to each request. The deprecation log itself should be fine because DeprecationLogger#deprecatedAndMaybeLog should suppress most duplication, but it seems like we create a new HTMLStripCharFilter for each new token stream.
I spent some time looking for a good place to emit the warning only once on new index creation, but couldn't find any so far. Maybe @jpountz has a suggestion?
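
To illustrate the version-gating idea discussed in this comment, here is a minimal, self-contained sketch. It does not use the real Elasticsearch classes: `VersionGatedCharFilter`, `IndexVersion`, the warning wording, and the regex-based tag removal are all hypothetical stand-ins chosen for illustration. The point is only that the warning is tied to the index-creation version, so older indices using the old name stay silent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical factory wrapper: warns only when a new enough index uses the old name. */
final class VersionGatedCharFilter {
    /** Collected warnings; a real system would route these to a deprecation logger. */
    final List<String> warnings = new ArrayList<>();

    /** Filters one input, recording a warning first if the index is new enough. */
    String create(IndexVersion indexCreated, String input) {
        if (indexCreated.onOrAfter(IndexVersion.V_6_3_0)) {
            // Illustrative wording, not the message used by Elasticsearch.
            warnings.add("The [htmlStrip] char filter name is deprecated; use [html_strip] instead.");
        }
        // Naive tag removal as a placeholder for real HTML stripping.
        return input.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        VersionGatedCharFilter filter = new VersionGatedCharFilter();
        System.out.println(filter.create(new IndexVersion(6, 2, 0), "<b>old index</b>"));
        System.out.println(filter.create(new IndexVersion(6, 3, 0), "<b>new index</b>"));
        System.out.println(filter.warnings.size() + " warning(s) recorded");
    }
}

/** Hypothetical stand-in for an index-creation version; not Elasticsearch's Version class. */
final class IndexVersion implements Comparable<IndexVersion> {
    static final IndexVersion V_6_3_0 = new IndexVersion(6, 3, 0);

    final int major;
    final int minor;
    final int patch;

    IndexVersion(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    @Override
    public int compareTo(IndexVersion other) {
        if (major != other.major) return Integer.compare(major, other.major);
        if (minor != other.minor) return Integer.compare(minor, other.minor);
        return Integer.compare(patch, other.patch);
    }

    boolean onOrAfter(IndexVersion other) {
        return compareTo(other) >= 0;
    }
}
```

In this sketch the warning fires on every call for a new index, which mirrors the concern raised above about repeated warnings.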

Contributor

@jpountz jpountz left a comment

I spent some time looking for a good place to emit the warning only once on new index creation, but couldn't find any so far. Maybe @jpountz has a suggestion?

Analysis components should be cached per thread, so these warnings should stop after the analyzer has been used in every thread of the index/search threadpools. I think your approach is fine.
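
The per-thread caching model described in this comment can be sketched in isolation as follows. This is a generic illustration using `ThreadLocal`, not Elasticsearch or Lucene code; `PerThreadCacheSketch` and its methods are hypothetical names. Whether a particular kind of analysis component is actually cached this way is a separate question that the sketch does not answer.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a component that is built once per thread and then reused. */
final class PerThreadCacheSketch {
    /** Counts how many times the component factory ran. */
    private final AtomicInteger created = new AtomicInteger();

    /** Each thread lazily builds its own instance on first use. */
    private final ThreadLocal<String> cached = ThreadLocal.withInitial(() -> {
        // A warning emitted at creation time would fire here, once per thread.
        return "component-" + created.incrementAndGet();
    });

    String component() {
        return cached.get();
    }

    int creations() {
        return created.get();
    }

    /** Uses the component repeatedly on several threads and returns the factory call count. */
    static int creationsAfter(int threadCount, int usesPerThread) {
        PerThreadCacheSketch sketch = new PerThreadCacheSketch();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < usesPerThread; i++) {
                    sketch.component();
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.creations();
    }

    public static void main(String[] args) {
        // 4 threads, 100 uses each: the factory runs once per thread, so this prints 4.
        System.out.println(creationsAfter(4, 100));
    }
}
```

Under this model, the number of warnings is bounded by the number of threads rather than by the number of requests.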

@mayya-sharipova mayya-sharipova removed their request for review April 19, 2018 10:48
@cbuescher
Member Author

I think your approach is fine

@jpountz thanks. I was also thinking about the client side, which I think will get warning headers back with every request, since those aren't de-duplicated the way the deprecation logs are. But I guess that is something the client can usually ignore easily. I will keep this open for a bit and merge later if there are no other objections.
Another thing: I haven't labeled it with 6.x yet but already use the 6_3_0 version in the PR, so I was planning to backport this. Is that okay too, or does this require any additional notes in the migration docs etc.? I checked, and we haven't advertised the htmlStrip name in any docs since at least 5.6 (maybe even longer), so most people should be using html_strip anyway. WDYT?

@jpountz
Contributor

jpountz commented Apr 19, 2018

I was also thinking about the client side which will get warning headers back with every request I think

Is it true? Token filters are supposed to be cached per thread, so after some requests, there should not be any warnings in the headers anymore?

I think we should only add it to the migration notes of 7.0, when the camelcase alias gets removed.

@cbuescher
Member Author

Is it true? Token filters are supposed to be cached per thread, so after some requests, there should not be any warnings in the headers anymore?

I think so, that's at least how I interpret the YAML REST test that I added in this PR. Adding two documents to an index that uses the old deprecated name issues warning headers twice. I suppose the reason is that the response headers are added here regardless of how the boolean "log" flag that is used for de-duplication is set. In a way it makes sense: while one warning in the logs should usually be enough to warn admin-type users, clients could be different, and if we only warn once, the second user who should get that warning doesn't see it anymore.
Maybe I'm also misinterpreting this, will try using curl quickly.
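
The behaviour described in this comment (log lines de-duplicated, response headers not) can be modelled with a small sketch. This is not the DeprecationLogger implementation: `DeprecationWarningSketch`, `handleRequest`, and the key and message strings are hypothetical, and the only thing taken from the discussion is the idea that a boolean "log" flag controls the log line but not the header.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model: log output is de-duplicated by key, response headers are not. */
final class DeprecationWarningSketch {
    private final Set<String> loggedKeys = new HashSet<>();

    /** Lines that would end up in the deprecation log. */
    final List<String> logLines = new ArrayList<>();

    /** Handles one request and returns the warning headers attached to its response. */
    List<String> handleRequest(String key, String message) {
        List<String> responseHeaders = new ArrayList<>();
        // Set.add returns true only the first time a key is seen.
        boolean log = loggedKeys.add(key);
        // The header is added regardless of the de-duplication flag.
        responseHeaders.add(message);
        if (log) {
            logLines.add(message);
        }
        return responseHeaders;
    }

    public static void main(String[] args) {
        DeprecationWarningSketch sketch = new DeprecationWarningSketch();
        String message = "[htmlStrip] is deprecated, use [html_strip]"; // illustrative wording
        int headers = 0;
        headers += sketch.handleRequest("htmlStrip_deprecation", message).size();
        headers += sketch.handleRequest("htmlStrip_deprecation", message).size();
        System.out.println("headers sent: " + headers);
        System.out.println("log lines written: " + sketch.logLines.size());
    }
}
```

Two requests produce two headers but a single log line, which matches the observation from the REST test with two indexed documents.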

@cbuescher
Member Author

Token filters are supposed to be cached per thread

Turns out the repeated warning headers in the responses are expected, since this is a char filter as opposed to a tokenizer or token filter, which are cached differently.
We cannot really avoid the warning headers in the REST responses here, but the deprecation warning will be logged only once (or at least infrequently) and the response headers can easily be ignored.

@cbuescher cbuescher merged commit 24763d8 into elastic:master Apr 19, 2018
cbuescher pushed a commit that referenced this pull request Apr 19, 2018
The camel case name `htmlStrip` should be removed in favour of `html_strip`, but
we need to deprecate it first. This change adds deprecation warnings for indices
with version starting with 6.3.0 and logs deprecation warnings in these cases.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 19, 2018
* master:
  Remove extra spaces from changelog
  Add support to match_phrase query for zero_terms_query. (elastic#29598)
  Fix incorrect references to 'zero_terms_docs' in query parsing error messages. (elastic#29599)
  Build: Move java home checks to pre-execution phase (elastic#29548)
  Avoid side-effect in VersionMap when assertion enabled (elastic#29585)
  [Tests] Remove accidental logger usage
  Add tests for ranking evaluation with aliases (elastic#29452)
  Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (elastic#27429)
  Update plan for the removal of mapping types. (elastic#29586)
  [Docs] Add rankEval method for Jva HL client
  Make ranking evaluation details accessible for client
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 19, 2018
* master: (21 commits)
  Remove bulk fallback for write thread pool (elastic#29609)
  Fix an incorrect reference to 'zero_terms_docs' in match_phrase queries.
  Update the version compatibility for zero_terms_query in match_phrase.
  Account translog location to ram usage in version map
  Remove extra spaces from changelog
  Add support to match_phrase query for zero_terms_query. (elastic#29598)
  Fix incorrect references to 'zero_terms_docs' in query parsing error messages. (elastic#29599)
  Build: Move java home checks to pre-execution phase (elastic#29548)
  Avoid side-effect in VersionMap when assertion enabled (elastic#29585)
  [Tests] Remove accidental logger usage
  Add tests for ranking evaluation with aliases (elastic#29452)
  Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (elastic#27429)
  Update plan for the removal of mapping types. (elastic#29586)
  [Docs] Add rankEval method for Jva HL client
  Make ranking evaluation details accessible for client
  Rename the bulk thread pool to write thread pool (elastic#29593)
  [Test] Minor changes to rank_eval tests (elastic#29577)
  Fix missing node id prefix in startup logs (elastic#29534)
  Added painless execute api. (elastic#29164)
  test: also assert deprecation warning after clusters have been closed.
  ...
martijnvg added a commit that referenced this pull request Apr 20, 2018
* es/master: (32 commits)
  TEST: Unmute testPrimaryRelocationWhileIndexing
  Remove remaining tribe node references (#29574)
  Never leave stale delete tombstones in version map (#29619)
  Do not serialize common stats flags using ordinal (#29600)
  Remove stale comment from JVM stats (#29625)
  TEST: Mute testPrimaryRelocationWhileIndexing
  Remove bulk fallback for write thread pool (#29609)
  Fix an incorrect reference to 'zero_terms_docs' in match_phrase queries.
  Update the version compatibility for zero_terms_query in match_phrase.
  Account translog location to ram usage in version map
  Remove extra spaces from changelog
  Add support to match_phrase query for zero_terms_query. (#29598)
  Fix incorrect references to 'zero_terms_docs' in query parsing error messages. (#29599)
  Build: Move java home checks to pre-execution phase (#29548)
  Avoid side-effect in VersionMap when assertion enabled (#29585)
  [Tests] Remove accidental logger usage
  Add tests for ranking evaluation with aliases (#29452)
  Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (#27429)
  Update plan for the removal of mapping types. (#29586)
  [Docs] Add rankEval method for Jva HL client
  ...
martijnvg added a commit that referenced this pull request Apr 20, 2018
* es/6.x: (28 commits)
  TEST: Unmute testPrimaryRelocationWhileIndexing
  Never leave stale delete tombstones in version map (#29619)
  Do not serialize common stats flags using ordinal (#29600)
  Remove stale comment from JVM stats (#29625)
  TEST: Mute testPrimaryRelocationWhileIndexing
  Remove 7.0.0 from 6.x changelog (#29621)
  Add support to match_phrase query for zero_terms_query. (#29598)
  Account translog location to ram usage in version map
  Avoid side-effect in VersionMap when assertion enabled (#29585)
  Build: Move java home checks to pre-execution phase (#29548)
  Add tests for ranking evaluation with aliases (#29452)
  [Test] Fix assertion in SearchDocumentationIT
  Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (#27429)
  test: Assert deprecated http.enebled setting warning
  Update plan for the removal of mapping types. (#29586)
  [Docs] Add rankEval method for Jva HL client
  Make ranking evaluation details accessible for client
  Rename the bulk thread pool to write thread pool (#29593)
  [Test] Minor changes to rank_eval tests (#29577)
  test: Assert deprecated http.enebled setting warning
  ...
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request May 3, 2018
)

The camel case name `nGram` should be removed in favour of `ngram` and
similar for `edgeNGram` and `edge_ngram`. Before removal, we need to
deprecate the camel case names first. This change adds deprecation
warnings for indices with versions 6.4.0 and higher and logs deprecation
warnings.
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
7 participants