Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Fixing categorization token highlighting for multi-line messages #103007

Conversation

jgowdyelastic
Copy link
Member

@jgowdyelastic jgowdyelastic commented Jun 22, 2021

Fixes issue introduced in elastic/elasticsearch#73828

Multi-lined messages are now no longer lost after the end of the last matched token.

Issue as described by @droberts195
The problem arises when there’s a token that ends at the end of the first line of the message.
Because the first_non_blank_line char filter deletes everything after it, that token is reported as ending at the very end of the original message, even though it’s short.
Then, in the highlighting, the UI replaces the last token on the first line plus all the other lines with the single short token.
Thus making it look like the second and subsequent lines never existed

Solution is to base our end_offset on the token length, rather than the supplied end_offset from the analyze endpoint

@jgowdyelastic jgowdyelastic added bug Fixes for quality problems that affect the customer experience review :ml Feature:Anomaly Detection ML anomaly detection v8.0.0 v7.14.0 labels Jun 22, 2021
@jgowdyelastic jgowdyelastic self-assigned this Jun 22, 2021
@jgowdyelastic jgowdyelastic requested a review from a team as a code owner June 22, 2021 20:32
@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@droberts195 droberts195 changed the title [ML] Fixing categorization tokens for multi-line messages [ML] Fixing categorization token highlighting for multi-line messages Jun 23, 2021
@droberts195
Copy link
Contributor

Fixes issue introduced in elastic/elasticsearch#73828

Just to clarify, elastic/elasticsearch#73828 means people using the default categorization analyzer would have seen the problem from 7.14, but it's always been a bug that if you had a char_filter that deleted the end of the original message and had a token that was right at the end of the truncated message then this problem will have occurred. The difference is that in 7.13 and earlier the default categorization analyzer didn't contain a char_filter, so only advanced users who defined their own categorization analyzer would see this.

I am not suggesting that we backport this to 7.13, but I do think it should be release noted as a bug fix, just in case someone ever observes the effect on an older version.

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jgowdyelastic
Copy link
Member Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @jgowdyelastic

@jgowdyelastic jgowdyelastic added the auto-backport Deprecated - use backport:version if exact versions are needed label Jun 29, 2021
@jgowdyelastic jgowdyelastic merged commit 824463a into elastic:master Jun 29, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jun 29, 2021
…3007)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Jun 29, 2021
…103627)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: James Gowdy <jgowdy@elastic.co>
@jgowdyelastic jgowdyelastic deleted the fixing-categorization-tokens-for-multi-line-messages branch July 19, 2021 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed bug Fixes for quality problems that affect the customer experience Feature:Anomaly Detection ML anomaly detection :ml release_note:fix review v7.14.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants