
Properly update offset in case of unparsable line #22685

Merged · 8 commits · Apr 1, 2021

Conversation

@ChrsMark (Member) commented Nov 20, 2020

What does this PR do?

This PR fixes the case where the docker reader meets an unparsable line and skips it (introduced in #12268). In such cases we should properly update the offset by adding the skipped bytes, so that it points at the right byte.

Why is it important?

Having a wrong offset in the registry will make the harvester start from the wrong offset when Filebeat restarts or files are reopened, which will lead to another ErrLineUnparsable. The offset will never heal from then on.

Testing notes

  1. make python-env
  2. source ./build/python-env/bin/activate
  3. make filebeat.test
  4. pytest tests/system/test_container.py

@ChrsMark added the bug, Team:Platforms (Integrations - Platforms team), and v7.11.0 labels Nov 20, 2020
@ChrsMark ChrsMark self-assigned this Nov 20, 2020
@elasticmachine (Collaborator) commented:

Pinging @elastic/integrations-platforms (Team:Platforms)

@elasticmachine (Collaborator) commented Nov 20, 2020

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #22685 updated

  • Start Time: 2021-03-31T21:40:08.143+0000

  • Duration: 56 min 16 sec

  • Commit: e59afad

Test stats 🧪

  • Failed: 0

  • Passed: 46514

  • Skipped: 5132

  • Total: 51646

Trends 🧪: charts of build times and test counts (images).

💚 Flaky test report

Tests succeeded.


@jsoriano (Member) left a comment:

Good catch. This will need to be backported to 7.10 and 6.8 too.

@@ -333,7 +333,8 @@ func (h *Harvester) Run() error {
 			logp.Info("File is inactive: %s. Closing because close_inactive of %v reached.", h.state.Source, h.config.CloseInactive)
 		case reader.ErrLineUnparsable:
 			logp.Info("Skipping unparsable line in file: %v", h.state.Source)
-			//line unparsable, go to next line
+			//line unparsable, update offset and go to next line
+			h.state.Offset += int64(message.Bytes)
Review comment (Member):

Should we also update the read-offset metric, as done some lines below?

// Update metrics of harvester as event was sent
h.metrics.readOffset.Set(state.Offset)
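
A hedged sketch of what keeping the two in sync could look like in the ErrLineUnparsable branch (a fragment building on the diff above, not the merged code; whether readOffset should reflect skipped bytes is exactly the question being asked):

case reader.ErrLineUnparsable:
	logp.Info("Skipping unparsable line in file: %v", h.state.Source)
	// line unparsable, update offset and go to next line
	h.state.Offset += int64(message.Bytes)
	// Keep the read-offset metric in sync with the registry offset,
	// mirroring what the code does after publishing an event.
	h.metrics.readOffset.Set(h.state.Offset)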

Reply (Contributor):

@urso it would be good to have your input here

Reply:

Yes, these two should always be kept in sync.

@exekias (Contributor) commented Nov 20, 2020

Great work @ChrsMark! Would it be possible to add some unit tests in the harvester code?

@ChrsMark (Member, Author) commented Nov 23, 2020

> Great work @ChrsMark! Would it be possible to add some unit tests in the harvester code?

Harvester's tests so far are not really detailed (https://github.com/exekias/beats/blob/7007d97c6aadc4621d58d9d3122dd8e0c5115bb5/filebeat/input/log/harvester_test.go#L127), so it would take some effort to add unit tests from scratch to cover this case. I think we can rely on system tests. WDYT?

@urso commented Nov 23, 2020

> Having a wrong offset in the registry will make the harvester start from the wrong offset when Filebeat restarts or files are reopened, which will lead to another ErrLineUnparsable. The offset will never heal from then on.

Why doesn't Filebeat recover here? Do you have a link to an issue?

(Not asking to clean this up now, but wondering if we have some debt to open an issue for:)
The harvester's purpose is to set up the reader on restart, and to receive and publish events from the reader. Would the processing be more stable if the reader handled the "unparsable" case instead (drop the event and continue parsing)? In that case the offset would still be old and the reader would have to skip the unparsable contents on restart again, but the registry might be in a more predictable state, right?

@ChrsMark (Member, Author) commented:

@urso Filebeat will recover and will be able to parse lines after the first failure; however, the offset will remain inconsistent, so every time harvesting restarts, the first parse will fail, since it starts from the wrong offset and what it reads is neither a valid CRI nor a valid docker-json line.

We don't have an issue for this; I can open one if you want. We just found it with @exekias while investigating an SDH (see link above).

@exekias (Contributor) commented Nov 24, 2020

The issue happens like this:

For some reason we end up at the wrong offset; we still need to investigate why that happens (maybe a truncate on a symlink?).

Problem is that once you reach this state, the current harvester code won't heal from it. When the input reads the next line, it cannot parse it because it is at the wrong offset, but message.Bytes is still updated. When we handle this error we don't update the offset with the new Bytes value, so every further offset update is wrong, as it missed the update from the broken line.

The problem happens again as soon as you restart the input.
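
To make the failure mode concrete, here is an illustrative walkthrough of the pre-fix bookkeeping with made-up line sizes (not taken from the actual case):

package main

import "fmt"

func main() {
	// Each entry is one line in the log file: its size on disk and
	// whether the reader can parse it.
	lines := []struct {
		bytes    int64
		parsable bool
	}{
		{100, true}, // valid line
		{50, false}, // unparsable line (e.g. after a truncate)
		{80, true},  // valid line again
	}
	var registryOffset, filePosition int64
	for _, l := range lines {
		filePosition += l.bytes
		if !l.parsable {
			continue // the bug: skipped bytes never reach the registry offset
		}
		registryOffset += l.bytes
	}
	fmt.Printf("registry offset=%d, file position=%d\n", registryOffset, filePosition)
	// Prints: registry offset=180, file position=230. On restart the
	// harvester seeks to 180, the middle of the last line, so the first
	// read is unparsable again and the drift never heals.
}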

@urso commented Nov 26, 2020

> Problem is that once you reach this state, the current harvester code won't heal from it. When the input reads the next line, it cannot parse it because it is at the wrong offset, but message.Bytes is still updated. When we handle this error we don't update the offset with the new Bytes value, so every further offset update is wrong, as it missed the update from the broken line.

I see. Actually, the reader is required to only return valid messages. If we need to update the offset based on skipped contents, we may have to reconsider the return type of the Reader interface so that it also reports the number of bytes consumed to produce the message. Actually, message.Bytes was supposed to be this: the number of bytes consumed from the file. For example, after converting from JIS to UTF-8, the length of the contents and the value of message.Bytes differ (there are other cases where they differ as well).
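
For context, the reader's message type looks roughly like this (paraphrased from libbeat's reader package; the exact field set and comments may differ between versions):

import (
	"time"

	"github.com/elastic/beats/v7/libbeat/common"
)

// Message is a paraphrase of libbeat/reader.Message, not a verbatim copy.
type Message struct {
	Ts      time.Time     // timestamp of the message
	Content []byte        // decoded content, e.g. after conversion to UTF-8
	Bytes   int           // raw bytes consumed from the file for this message
	Fields  common.MapStr // optional fields attached by readers
}

The harvester must advance its offset by Bytes rather than len(Content), because decoding can change the content length, which is exactly the distinction urso describes.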

The harvester cannot tell which offset is correct or not, as the harvester does not care about the contents. The issue does look like truncation. Here the reader should make an attempt to check whether that is the case (very likely if this comes from a container) and send an ErrTruncate, which would force the harvester to consider the file to be new. If the reader thinks the file is just broken, the reader should make an attempt to find a safe point in the file to continue reading from, without having to notify the harvester at all.
All errors but ErrLineUnparsable are about detecting underlying changes that invalidate the reader, basically making us consider the file to be 'done' as is. It rather makes sense to remove ErrLineUnparsable and ensure that readers always report state that is continuable upon restart.

@exekias (Contributor) commented Mar 26, 2021

I think we should get back to this issue; we have seen a few cases related to it. In general I agree this looks like a truncate, but the fact that we stop sending logs afterwards creates a really bad experience.

I would rather get this in with the proposed changes while we keep investigating issues related to the different log rotation mechanisms.

@ChrsMark (Member, Author) commented:

So if I understand correctly, the preferred way to fix this is to change DockerJSONReader so that it finds the next valid line by skipping the error at

return message, reader.ErrLineUnparsable

(continuing the for loop instead of returning the error)?

@exekias (Contributor) commented Mar 29, 2021

That could be an option to improve things. I liked what @urso said about ErrTruncate; I didn't know about it, do we have it? Perhaps this would be a better way to handle the situation. It would make the input start over from the beginning, so we would miss fewer logs.

The only problem I see with trying to detect truncate situations is that permanent errors in the format (or bugs on our side) would end up in an infinite loop of input restarts, which could be a bad idea. I guess we can counter this with backoffs.

@urso commented Mar 29, 2021

Truncation is the most likely cause, but not necessarily the only one. If the input is configured with tail_files: true, we have another potential cause: whenever the input is restarted, the next offset might be in the middle of some event, as the current file size is taken as the next start offset. That case should be handled by the container input.

In general I would prefer to fix the docker JSON parser by 'ignoring' the error. The parser reports how many bytes it has consumed in order to generate the current message. The parser can ignore the failure by resetting its internal buffer while keeping the current byte count. The new filestream input can detect file truncation asynchronously and restart the harvester in time, even if the harvester is blocked in the output and can't detect the truncation for that reason. This reduces the chance of missing logs when the file is truncated. Fixing the parser will also allow us to reuse it in the filestream input in the future, and to handle errors due to truncation, or due to having the wrong offset for other unknown causes.
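
A minimal sketch of that direction, assuming a reader whose per-line parse step can fail; the receiver and method names approximate DockerJSONReader, but the body and the parseLine signature are illustrative, not the actual beats code:

// Sketch: swallow the parse error inside the reader, but keep counting
// the consumed bytes so the returned message accounts for them.
func (p *DockerJSONReader) Next() (reader.Message, error) {
	var skippedBytes int
	for {
		message, err := p.reader.Next()
		if err != nil {
			return message, err
		}
		if parseErr := p.parseLine(&message); parseErr != nil {
			// Drop the broken content, remember how much of the file it
			// consumed, and try the next line.
			skippedBytes += message.Bytes
			continue
		}
		// Let the harvester advance the registry offset past the junk
		// that was skipped before this valid message.
		message.Bytes += skippedBytes
		return message, nil
	}
}

This keeps the harvester oblivious: it only ever sees valid messages whose Bytes cover everything consumed, so the registry offset stays continuable across restarts.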


@ChrsMark (Member, Author) commented:

@urso @exekias I tried to move this in the direction of 'ignoring' the error in the reader for now, based on @urso's latest comment. Let me know what you think.

Content: []byte{},
Bytes: 0,
}, io.EOF
}
Review comment (Contributor):

Can you add a test where the first line is broken (from a truncate) but the next one is valid? We should get the content from the valid line only, but Bytes should account for both.
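
A rough sketch of such a test; New and newTestReader stand in for the package's real constructor and test helper, whose names and signatures are assumptions:

import (
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestSkippedBrokenLineCountsBytes(t *testing.T) {
	broken := "this is not docker json\n" // e.g. the tail left by a truncate
	valid := `{"log":"hello\n","stream":"stdout","time":"2021-04-01T00:00:00Z"}` + "\n"

	// Hypothetical constructor call; the real one may take different arguments.
	r := New(newTestReader(broken+valid), "all", false, false)

	msg, err := r.Next()
	require.NoError(t, err)

	// Content should come from the valid line only...
	assert.Equal(t, []byte("hello\n"), msg.Content)
	// ...while Bytes accounts for the broken line plus the valid one.
	assert.Equal(t, len(broken)+len(valid), msg.Bytes)
}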

@exekias (Contributor) left a comment:

This came out really simple in the end! I left a comment about testing.

@ChrsMark (Member, Author) commented:

> This came out really simple in the end! I left a comment about testing.

Added

@ChrsMark ChrsMark merged commit 655984e into elastic:master Apr 1, 2021
ChrsMark added a commit to ChrsMark/beats that referenced this pull request Apr 1, 2021
ChrsMark added a commit to ChrsMark/beats that referenced this pull request Apr 1, 2021
-		case reader.ErrLineUnparsable:
-			logp.Info("Skipping unparsable line in file: %v", h.state.Source)
-			//line unparsable, go to next line
-			continue
Review comment (Member):

I love when issues get solved by removing code 🙂

ChrsMark added a commit that referenced this pull request Apr 1, 2021
ChrsMark added a commit that referenced this pull request Apr 1, 2021
ChrsMark added a commit that referenced this pull request Apr 2, 2021
v1v added a commit to v1v/beats that referenced this pull request Apr 7, 2021
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Labels: bug · Team:Platforms (Integrations - Platforms team) · v6.8.13 · v7.12.1 · v7.13.0

5 participants