[ML] Fixing categorization token highlighting for multi-line messages #103007
Pinging @elastic/ml-ui (:ml)
LGTM
Just to clarify, elastic/elasticsearch#73828 means people using the default categorization analyzer would have seen the problem from 7.14, but it's always been a bug that if you had a I am not suggesting that we backport this to 7.13, but I do think it should be release noted as a bug fix, just in case someone ever observes the effect on an older version.
LGTM
@elasticmachine merge upstream
…3007) Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
💚 Backport successful
This backport PR will be merged automatically after passing CI.
Fixes issue introduced in elastic/elasticsearch#73828
The rest of a multi-line message is no longer lost after the end of the last matched token.
Issue as described by @droberts195
The problem arises when there’s a token that ends at the end of the first line of the message.
Because the first_non_blank_line char filter deletes everything after it, that token is reported as ending at the very end of the original message, even though it’s short.
Then, in the highlighting, the UI replaces the last token on the first line plus all the other lines with the single short token, making it look as though the second and subsequent lines never existed.
The solution is to base our `end_offset` on the token length, rather than on the `end_offset` supplied by the analyze endpoint.
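
As a rough illustration of the approach described above (not the actual Kibana code), the sketch below derives each highlight range from `start_offset` plus the token's own length. The helper names `toHighlightRanges` and `highlight`, the bracket markers, and the sample message are hypothetical; only the `token`, `start_offset` and `end_offset` fields come from the analyze response. It assumes the analyzer does not change the length of the token text.

```typescript
// Subset of the fields returned for each token by the _analyze API.
interface AnalyzeToken {
  token: string;
  start_offset: number;
  end_offset: number;
}

interface HighlightRange {
  start: number;
  end: number;
}

// Hypothetical helper: ignore the reported end_offset and compute the end
// of each range from the token length instead. This assumes the token text
// has the same length as the matched text in the original message.
function toHighlightRanges(tokens: AnalyzeToken[]): HighlightRange[] {
  return tokens.map((t) => ({
    start: t.start_offset,
    end: t.start_offset + t.token.length,
  }));
}

// Hypothetical helper: wrap each range in [ ] and keep everything else,
// including the lines after the first one, untouched.
function highlight(message: string, ranges: HighlightRange[]): string {
  let out = '';
  let cursor = 0;
  for (const { start, end } of ranges) {
    out += message.slice(cursor, start) + '[' + message.slice(start, end) + ']';
    cursor = end;
  }
  return out + message.slice(cursor);
}

// Made-up two-line message. The last token on the first line is reported
// as ending at the end of the whole message (offset 24), as in the bug.
const message = 'disk full\nstack line two';
const tokens: AnalyzeToken[] = [
  { token: 'disk', start_offset: 0, end_offset: 4 },
  { token: 'full', start_offset: 5, end_offset: 24 },
];

const highlighted = highlight(message, toHighlightRanges(tokens));
console.log(highlighted);
```

With the reported `end_offset` of 24, the second range would swallow the whole second line; with the length-based end of 9, the second line is preserved after the highlighted token.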