Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Document how to rebuild analyzers #30498

Merged
merged 5 commits into from
May 14, 2018

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented May 9, 2018

Adds documentation for how to rebuild all the built in analyzers and
tests for that documentation using the mechanism added in #29535.

Closes #29499

Adds documentation for how to rebuild all the built in analyzers and
tests for that documentation using the mechanism added in elastic#29535.

Closes elastic#29499
@nik9000 nik9000 added >docs General docs changes review :Search Relevance/Analysis How text is split into tokens v7.0.0 v6.4.0 labels May 9, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except some small minor changes

recreate it as a `custom` analyzer and modify it, usually by adding
token filters. Usually, you should prefer the
<<keyword, Keyword type>> when you want strings that are not split
into tokens, but just in case you need it, this his would recreate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"this his" -> "this" ?

"tokenizer": {
"split_on_non_word": {
"type": "pattern",
"stopwords": "\\W+" <1>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be pattern instead of stopwords?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Now I have to figure out how the tests passed when it was wrong....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is because \\W+ is the default. And because we don't complain if you pass extra stuff here.

[float]
=== Definition

The `simple` anlzyer consists of:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"anlzyer" -> "analyzer"


If you need to customize the `pattern` analyzer beyond the configuration
parameters then you need to recreate it as a `custom` analyzer and modify
it, usually by adding token filters. This would recreate the built in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"built in" -> "built-in"
in this place and in all other places

@nik9000 nik9000 merged commit 9881bfa into elastic:master May 14, 2018
nik9000 added a commit that referenced this pull request May 14, 2018
Adds documentation for how to rebuild all the built in analyzers and
tests for that documentation using the mechanism added in #29535.

Closes #29499
@nik9000
Copy link
Member Author

nik9000 commented May 14, 2018

Thanks @mayya-sharipova! I've fixed it up as you requested and merged and backported.

dnhatn added a commit that referenced this pull request May 15, 2018
* 6.x:
  Revert "Silence IndexUpgradeIT test failures. (#30430)"
  [DOCS] Remove references to changelog and to highlights
  Revert "Mute ML upgrade test (#30458)"
  [ML] Fix BWC version for backport of #30125
  [Docs] Improve section detailing translog usage (#30573)
  [Tests] Relax allowed delta in extended_stats aggregation (#30569)
  Fail if reading from closed KeyStoreWrapper (#30394)
  [ML] Reverse engineer Grok patterns from categorization results (#30125)
  Derive max composite buffers from max content len
  Update build file due to doc file rename
  SQL: Extract SQL request and response classes (#30457)
  Remove the changelog (#30593)
  Revert "Add deprecation warning for default shards (#30587)"
  Silence IndexUpgradeIT test failures. (#30430)
  Add deprecation warning for default shards (#30587)
  [DOCS] Adds 6.4.0 release highlight pages
  [DOCS] Adds release highlight pages (#30590)
  Docs: Document how to rebuild analyzers (#30498)
  [DOCS] Fixes title capitalization in security content
  LLRest: Add equals and hashcode tests for Request (#30584)
  [DOCS] Fix realm setting names (#30499)
  [DOCS] Fix path info for various security files (#30502)
  Docs: document precision limitations of geo_bounding_box (#30540)
  Fix non existing javadocs link in RestClientTests
  Auto-expand replicas only after failing nodes (#30553)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>docs General docs changes :Search Relevance/Analysis How text is split into tokens v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants