Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Newlines in logfmt sometimes work, sometimes don't #3927

Closed
Timbus opened this issue Jul 1, 2021 · 5 comments · Fixed by #3932
Closed

[Bug] Newlines in logfmt sometimes work, sometimes don't #3927

Timbus opened this issue Jul 1, 2021 · 5 comments · Fixed by #3932

Comments

@Timbus
Copy link
Contributor

Timbus commented Jul 1, 2021

If you have a literal "control char" in your log, encased in logfmt quotes like so:

test="hello
world"

It works fine when put through logfmt. And I can't really see a problem with that.

However the parser does see a problem, but this logic only applies if it first detects an escape sequence:

case c == '"', c < ' ':

So this fails to parse because of the newline:

test="hello\\again
world"

Unfortunately the only control char I escape in my logs is a literal backslash or quote. I didn't see a need for any other form of escapes because each line is already encapsulated in json when I send it to loki.

Simply removing the check on c < ' ' fixes the issue, still passes all tests, allows literal newlines, looks better in loki when scrolling through unformatted lines, has consistent behavior, uhhh and saves a few picoseconds of CPU.

Here's some tests you could put in parser_test.go:

		{
			"newline logfmt",
			[]byte("foobar=\"foo\nbar baz\""),
			labels.Labels{
				{Name: "a", Value: "b"},
			},
			labels.Labels{
				{Name: "a", Value: "b"},
				{Name: "foobar", Value: "foo\nbar baz"},
			},
		},
		{
			"escaped logfmt",
			[]byte(`foobar="foo ba\\r baz"`),
			labels.Labels{
				{Name: "a", Value: "b"},
			},
			labels.Labels{
				{Name: "a", Value: "b"},
				{Name: "foobar", Value: `foo ba\r baz`},
			},
		},
		{
			"newline and escaped logfmt",
			[]byte("foobar=\"foo bar\nb\\\\az\""),
			labels.Labels{
				{Name: "a", Value: "b"},
			},
			labels.Labels{
				{Name: "a", Value: "b"},
				{Name: "foobar", Value: "foo bar\nb\\az"},
			},
		},
@cyriltovena
Copy link
Contributor

Thanks for the report. If you can send a PR I'll take a look, I'd like to make sure this has no side effects.

@cyriltovena
Copy link
Contributor

I'm wondering if

"hello
world"

is a valid quoted string ? I would have expected

"hello\\nworld"

@Timbus
Copy link
Contributor Author

Timbus commented Jul 5, 2021

Why isn't it valid though. Loki uses json to encapsulate a log line. It is impossible for it to -not- be valid as the delimiters have already been found.
It is bizarre to me that I have to double-encode the format when it benefits no one to do so.

However if you would prefer to think of this as an X Y problem, the bigger issue is actually that loki doesn't take non-label attributes, so we are forced to add them as part of the log line and I used logfmt instead of double-encoding json, which is more readable at a glance. It would be trivial for loki to work with non-indexed labels by appending them to the log line and transparently parsing them on load, but that's a much bigger discussion and for now I'd rather just ship some logs without needing to write a bunch of encoders thanks.

@cyriltovena
Copy link
Contributor

Why isn't it valid though. Loki uses json to encapsulate a log line. It is impossible for it to -not- be valid as the delimiters have already been found.

Not sure what you mean about json here ? I'm not against the change I'm trying to fully understand it.

@Timbus
Copy link
Contributor Author

Timbus commented Jul 7, 2021

What I mean is loki stores logs by 'message', not by 'line'. When you send logs to loki it does not care if there are newlines in the log, because it is already encoded in json when sent. The original log aggregator using logfmt was some heroku tool that parsed individual lines only, so it needed escapes.
Loki is not tailing a log stream and doesn't evaluate control chars, so it doesn't need to care about control characters being in a string. It's just wasted CPU cycles.

A token containing hello\nworld is certainly also valid and will still be decoded as expected even after this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants