fix: switch to bufio.Reader for KubeTask output parsing #2641

hairyhum · 2024-01-31T23:27:29Z

Change Overview

Use bufio.Reader and ReadLine instead of bufio.Scanner
bufio.Scanner has buffer limit and can fail the function if output is too big

Pull request type

Please check the type of change your PR introduces:

Issues

fixes [BUG] "Split lines failed: bufio.Scanner: token too long" for phase with a lot of output #2612

Test Plan

💪 Manual
⚡ Unit test
💚 E2E

bufio.Scanner has buffer limit and can fail the function if output is too big fixes #2612

e-sumin · 2024-02-02T08:45:07Z

@hairyhum The bufio.Scanner had a problem - its buffer was limited and could be overwhelmed by some output. Your solution does fix this problem, but it introduces another issue - there is no protection against complete memory exhaustion. This is what I meant during the Community meeting when I said that I couldn't see a simple solution.

e-sumin · 2024-02-02T09:11:49Z

Guys, please correct me if I'm mistaken:

When can we expect an issue?

When we stream logs from the pod while running kubeTask:

kanister/pkg/function/kube_task.go

Line 84 in ef67ce4

out, err := output.LogAndParse(ctx, r)

What does this function do?
https://github.com/kanisterio/kanister/blob/master/pkg/output/stream.go#L53
1 - It reads the output line by line and
1.1 - Logs it
1.2 - Performs some output parsing

What does the parsing function do?
https://github.com/kanisterio/kanister/blob/master/pkg/output/output.go#L93
1 - It waits for the occurrence of ###Phase-output###
2 - It captures all text from this marker until the end of the line
3 - It assumes that the captured text is a JSON object (key, value)

What do we know about the value in (3)?
1 - It could be kopia snapshot info (a compact structure)
2 - It could be the result of a custom invocation of kando output (examples are taken from our blueprints):

    kando output backupPrefixLocation $BACKUP_PREFIX_LOCATION
    kando output localSnapshotPrefixLocation $LOCAL_SNAPSHOT_PREFIX_LOCATION
    kando output replicaCount {{ len .StatefulSet.Pods }}
    kando output s3path ${s3_path}

We can assume that they are also compact.

So, here's the (slightly more complex) solution I see:
In the LogAndParse function https://github.com/kanisterio/kanister/blob/master/pkg/output/stream.go#L53
1 - Read the stream in small chunks (let's say 65kb)
1.1 - If a line fits into this buffer, handle the output line by line as we currently do
1.2 - If a line does not fit into the buffer, log it in chunks of 65kb
2 - Perform output parsing as we are currently doing (expecting the lines with ###Phase-output### to be small and fit into 65kb)

hairyhum · 2024-02-02T17:02:00Z

@e-sumin I agree that there's a risk of higher memory usage

1.2 - If a line does not fit into the buffer, log it in chunks of 65kb
2 - Perform output parsing as we are currently doing (expecting the lines with ###Phase-output### to be small and fit into 65kb)

So the assumption there would be that the output line bigger than 65kb would not have ###Phase-output### in there?
Currently (and it can be not a good thing) the phase output assumptions are:

Line shorter than 65kb
Line contains ###Phase-output###
Everything after ###Phase-output### until the end of the line is the output

If line is > 65kb right now - there's an error.

As I understand, you propose to keep the assumptions with a difference that if the line is >65kb - it does not cause an error and just getting logged.

The issue I see with it is that we introduce a hidden assumption which may silently ignore some lines with ###Phase-output###

Alternative things we can do:

Bump supported line length. Then the question is now much considering potential memory issues
Require ###Phase-output### to be in the beginning of the line (e.g. first 100 bytes), this is sort of a breaking change because we currently don't require it. And then we know that the line is an output and we may or may not restrict the size of output value. It would make sense to do that.
Scan long lines until we reach ###Phase-output### and then scan until the end of the line. Again we could add a limit for the output itself. This preserves support for current format while removing the error.

WDYT?

Also for the limit of phase output value 65kb is rather small, we can bump it up a bit.

hairyhum · 2024-02-09T17:43:37Z

New change follows the Scan long lines until we reach ###Phase-output### and then scan until the end of the line. approach.
This allows to support existing behaviour and long lines both for non-output and output.
Bytes are read in chunks which should have lower memory usage.

TODO: add unit tests.

e-sumin · 2024-02-10T01:03:26Z

@hairyhum I've added some draft for unit test, which is able to generate random input.

pkg/output/stream_test.go

…rter lines Fix some out of bounds errors Don't run the tests twice

pkg/output/stream.go

pkg/output/output_test.go

pkg/output/stream.go

e-sumin

LGTM

fix: switch to bufio.Reader for KubeTask output parsing

2ddb227

bufio.Scanner has buffer limit and can fail the function if output is too big fixes #2612

hairyhum requested review from pavannd1 and viveksinghggits January 31, 2024 23:27

infraq added this to In Progress in Kanister Jan 31, 2024

viveksinghggits approved these changes Feb 1, 2024

View reviewed changes

Kanister automation moved this from In Progress to Reviewer approved Feb 1, 2024

mlavi requested a review from e-sumin February 1, 2024 17:50

WIP: scan kube task output in chunks

331e367

e-sumin added 2 commits February 10, 2024 02:01

Bugfix - empty line dumped to log redundantly

dde88dd

Add unit tests draft

8c15e43

e-sumin reviewed Feb 10, 2024

View reviewed changes

pkg/output/stream_test.go Outdated Show resolved Hide resolved

hairyhum added 3 commits February 12, 2024 14:21

fix: set read buffer size to 64kb to keep existing behaviour with sho…

dbdb985

…rter lines Fix some out of bounds errors Don't run the tests twice

test: support filters in make test command

76c51eb

test: test and lint issues

e24c134

hairyhum force-pushed the bug-bufio-buffer branch from ad836c6 to e24c134 Compare February 12, 2024 23:22

hairyhum requested review from e-sumin and viveksinghggits February 12, 2024 23:23

Couple additional tests

961bc55