
Backport #16482 to 7.17: Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger than sizeLimit #16577

Merged

2 commits merged into elastic:7.17 on Oct 22, 2024

Conversation

@andsel (Contributor) commented Oct 17, 2024

Non-clean backport of #16482

The differences are:

  • usage of `data.convertToString().split(context, delimiter, MINUS_ONE);` instead of `data.convertToString().split(delimiter, -1);`
  • the BufferedTokenizer test cases do not extend `org.logstash.RubyTestBase`, which was introduced in "Refactor: drop redundant (jruby-complete.jar) dependency" #13159
  • JDK 8 compatibility:
    • `Arrays.asList` instead of `List.of`
    • the `assertThrows` method from JUnit 5 is not available in JUnit 4, so it is reimplemented in the test (see the sketch after this list)
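
Older JUnit 4 releases do not ship `assertThrows`, so the test needs a local equivalent. The following is a minimal sketch of such a helper; the class and method names are hypothetical, not the ones used in the PR.

```java
import static org.junit.Assert.fail;

// Hypothetical JUnit 4 stand-in for JUnit 5's assertThrows: runs the action,
// fails if nothing is thrown, and returns the caught exception so the caller
// can make further assertions on its message.
final class AssertHelpers {
    static <T extends Throwable> T assertThrown(Class<T> expected, Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Expected " + expected.getName()
                    + " but got " + t.getClass().getName(), t);
        }
        fail("Expected " + expected.getName() + " to be thrown");
        return null; // unreachable: fail() always throws
    }
}
```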

Fixes the behaviour of the tokenizer so that it works properly when buffer-full conditions are met.

Updates BufferedTokenizerExt so that it can accumulate token fragments coming from different data segments. When a "buffer full" condition is met, it records this state in a local field so that on the next data segment it can consume all the token fragments up to the next token delimiter. The accumulation variable is changed from a RubyArray of strings to a StringBuilder that holds the head token, while the remaining token fragments are kept in the input array. Furthermore, it translates the `buftok_spec` tests into JUnit tests.
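
For illustration only, here is a minimal plain-Java sketch of that accumulation strategy. It is not the actual BufferedTokenizerExt change: the class and field names are hypothetical, and the real code raises a "buffer full" error where this sketch simply drops the oversized line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LineTokenizerSketch {
    private final StringBuilder head = new StringBuilder(); // head token being accumulated
    private final String delimiter;
    private final int sizeLimit;
    private boolean bufferFull = false; // set when an unterminated token exceeded sizeLimit

    LineTokenizerSketch(String delimiter, int sizeLimit) {
        this.delimiter = delimiter;
        this.sizeLimit = sizeLimit;
    }

    List<String> extract(String data) {
        // limit -1 keeps trailing empty strings, mirroring split(context, delimiter, MINUS_ONE)
        String[] fragments = data.split(Pattern.quote(delimiter), -1);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < fragments.length; i++) {
            boolean terminated = i < fragments.length - 1; // a delimiter followed this fragment
            if (bufferFull) {
                // still inside an oversized line: discard fragments until a delimiter is seen
                if (terminated) {
                    bufferFull = false;
                }
                continue;
            }
            head.append(fragments[i]);
            if (head.length() > sizeLimit) {
                // oversized line: drop what was accumulated; if the line is not yet
                // terminated, remember the state so the following segments are discarded too
                head.setLength(0);
                bufferFull = !terminated;
                continue;
            }
            if (terminated) {
                tokens.add(head.toString());
                head.setLength(0);
            }
        }
        return tokens;
    }
}
```

With delimiter `"\n"` and `sizeLimit` 3, feeding `"aaaa"` and then `"aa\nbb\n"` yields only `"bb"`: the remainder of the oversized line is consumed and discarded up to the first delimiter, matching the behaviour described above.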

Release notes

What does this PR do?

Why is it important/What is the impact to the user?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@andsel changed the title from "Backport #16482 Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger than sizeLimit" to "Backport #16482 to 7.17: Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger than sizeLimit" on Oct 17, 2024
@elasticmachine (Collaborator) commented:

💚 Build Succeeded

cc @andsel

@andsel requested a review from jsvd on October 18, 2024 at 15:08
@jsvd (Member) left a comment:


LGTM

@edmocosta merged commit b4ca550 into elastic:7.17 on Oct 22, 2024
3 checks passed
donoghuc added a commit to donoghuc/logstash that referenced this pull request Nov 21, 2024
…to completely consume lines in case of lines bigger than sizeLimit (elastic#16577)"

This reverts commit b4ca550.
donoghuc added a commit that referenced this pull request Nov 21, 2024
…letely consume lines in case of lines bigger than sizeLimit (#16577)" (#16713)

This reverts commit b4ca550.