Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/tracing: simplify log messages in trace recordings #70502

Merged
merged 3 commits into from
Sep 30, 2021

Conversation

andreimatei
Copy link
Contributor

Before this patch, the RecordedSpan proto stored log messages in a very
awkward way: each message was stored as a collection of key/values, with
only one such pair present (using a well-known key). This was confusing,
unnecessary, hard to work with and hard to track for figuring out what
keys and values are in there (with the answer being only one key). This
patch simplifies the log messages, making them be represented by a
single string as nature intended. A bunch of code gets simplified in
consequence.

Release note: None

@andreimatei andreimatei requested a review from a team as a code owner September 21, 2021 16:48
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg
Copy link
Member

tbg commented Sep 22, 2021 via email

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Overall :lgtm:, just a couple minor nits and questions.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @andreimatei)


pkg/util/tracing/recording.go, line 93 at r1 (raw file):

""

is empty string needed here?


pkg/util/tracing/recording.go, line 203 at r1 (raw file):

	for _, k := range tags {
		if !first {
			sb.WriteRune(' ')

nit: do we need this if we already wrote whitespace above on line 190?

	if len(sp.Tags) > 0 {
		sb.WriteRune(' ')
	}

If so, instead of tracking first here, can we just write the whitespace prior to the loop if len(tags) > 0? Could further simplify the loop.


pkg/util/tracing/recording.go, line 363 at r3 (raw file):

Quoted 5 lines of code…
				Fields: []jaegerjson.KeyValue{{
					Value: l.Msg(),

					Type:  "STRING",
				}},

I'm not experienced with Jaeger, so wanted to confirm - is it okay to omit a key here? Will it still display nicely in a jaeger UI without a key?

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't merge until #70562 is either discarded or merged, thanks! That PR will need backported if it does get merged to master, and there will be time pressure.

Well, is #70562 discarded? I don't think this patch would get in your way.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier)


pkg/util/tracing/recording.go, line 93 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…
""

is empty string needed here?

removed


pkg/util/tracing/recording.go, line 203 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: do we need this if we already wrote whitespace above on line 190?

	if len(sp.Tags) > 0 {
		sb.WriteRune(' ')
	}

If so, instead of tracking first here, can we just write the whitespace prior to the loop if len(tags) > 0? Could further simplify the loop.

This was writing the spaces before every tag except the first. But you're right that this code was needlessly complicated. Simplified.


pkg/util/tracing/recording.go, line 363 at r3 (raw file):

Previously, abarganier (Alex Barganier) wrote…
				Fields: []jaegerjson.KeyValue{{
					Value: l.Msg(),

					Type:  "STRING",
				}},

I'm not experienced with Jaeger, so wanted to confirm - is it okay to omit a key here? Will it still display nicely in a jaeger UI without a key?

It rendered fine. But on second thought, you're right; the log messages do show up with an "event" key when the spans are created and collected by Jaeger through OpenTelemetry. So I've given them the same key on this code path (this is the code path used by explain analyze (debug) and friends), for consistency.

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about #70562, but everything else :lgtm:!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier)

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 21 of 21 files at r7, 12 of 12 files at r8, 12 of 12 files at r9, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier and @andreimatei)


pkg/server/node_tenant.go, line 33 at r9 (raw file):

		for j := range sp.Logs {
			record := &sp.Logs[j]
			record.Message = record.Message.Redact()

what's going on here?

  1. did you intend to edit Message in-place inside the span? Don't you want to take a copy?
  2. why does this commit need to add a Redact call? I thought Tobias' PR was already taking care of doing the right thing? Did we forget this call in the previous PR? (If so, we'll need to create a separate PR with just this fix and issue an emergency backport)

pkg/util/tracing/recording.go, line 81 at r7 (raw file):

// TODO(andrei): this should be unified with
// SessionTracing.generateSessionTraceVTable().
func (r Recording) String() string {

You could have Recording implement SafeFormatter
and propagate the markers in the individual log entries.


pkg/util/tracing/recording.go, line 139 at r7 (raw file):

}

// FindLogMessage returns the first log message in the recording that matches

Can you explain this function is used only in testing
(maybe even rename it to TestingFindLogMessage)

and mention that it's relatively slow and heavy on allocations.


pkg/util/tracing/recording.go, line 212 at r7 (raw file):

	for _, l := range sp.Logs {
		lastLog := ownLogs[len(ownLogs)-1]
		ownLogs = append(ownLogs, conv("event:"+l.Msg(), l.Time, lastLog.Timestamp))

ugh, is there a way to prepend that "event:" sometime later, e.g. when logRecord instances are formatted?


pkg/util/tracing/recording.go, line 226 at r7 (raw file):

			lastLog := ownLogs[len(ownLogs)-1]
			var sb redact.StringBuilder
			sb.SafeString("structured:")

that looks like a regression. Prior to your change, in Tobias's PR we were assuming that structured payloads were safe. Now you're making this code consider the entire JSON thing as unsafe. Was that intnetional?

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @knz, and @tbg)


pkg/server/node_tenant.go, line 33 at r9 (raw file):

did you intend to edit Message in-place inside the span? Don't you want to take a copy?

I did. I'm doing what the function was already doing - modifying the recording (this function doesn't return anything). I've added a comment to the function.

why does this commit need to add a Redact call?

The record.Message field is new; that's what this commit does. It's a replacement for the old Fields collection.


pkg/util/tracing/recording.go, line 81 at r7 (raw file):

Previously, knz (kena) wrote…

You could have Recording implement SafeFormatter
and propagate the markers in the individual log entries.

ack. I'll leave for next time though.


pkg/util/tracing/recording.go, line 139 at r7 (raw file):

Previously, knz (kena) wrote…

Can you explain this function is used only in testing
(maybe even rename it to TestingFindLogMessage)

and mention that it's relatively slow and heavy on allocations.

Well the function is not particularly test specific... It's fairly natural for only tests to be interested in doing any sort of comprehension (or rather, sloppy reg-ex based comprehension) of trace recordings, but as far as the function itself is concerned, I don't see any big reason to keep it away from production code.
As to the performance of the thing, well, it is what it is. I don't really thinks it warrants a comment...

FWIW, this patch isn't really doing much to this function. To the extent that it does anything, it just makes the fact that there's a StripMarkers in there more visible by lifting it out of the LogRecord.Msg() method.


pkg/util/tracing/recording.go, line 212 at r7 (raw file):

Previously, knz (kena) wrote…

ugh, is there a way to prepend that "event:" sometime later, e.g. when logRecord instances are formatted?

Not without making things more complicated. All of the span start message, the log message, and the structured messages are represented as the same logRecord type, which doesn't encode the difference between them. It's not worth doing too much for this function, it's only used by Recording.String().

Perhaps you were taking objection to using + between a literal and a RedactableString, since it wasn't obvious what will happen to the literal? See now, I've turned it into a sb.Printf.


pkg/util/tracing/recording.go, line 226 at r7 (raw file):

Previously, knz (kena) wrote…

that looks like a regression. Prior to your change, in Tobias's PR we were assuming that structured payloads were safe. Now you're making this code consider the entire JSON thing as unsafe. Was that intnetional?

Well before my change, safe versus unsafe considerations did not apply to this function. This function is only used by Recording.String(), and it returned opentracing.LogRecord, which doesn't understand such things. All markers were removed from all messages; structured on unstructured. Now, this function returns redactable messages. It doesn't matter too much, since the only caller, Recording.String(), strips all the markers anyway.

As far as the safety of structured messages, I'm not sure it has been determined that they're indeed safe. We can't strip them at the boundary between KV and a tenant, because SQL needs them. I'm not sure they're all safe since, for example, a ContentionEvent has a key in it, and I think that key can come from a meta range. I think I raised this on Slack a few days ago and I don't think I heard back. Anyway as far as I can tell, I don't think we "show" these events to users (e.g. I don't think they show up for show trace for session). Perhaps @tbg can chime on whether there's been thinking around them.
Anyway, for the purposes of this code here, I think there's no harm in considering them unsafe, and also it doesn't matter because again the caller strips all the markers anyway.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 12 files at r11, 13 of 14 files at r12, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, and @knz)


pkg/util/tracing/recording.go, line 139 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Well the function is not particularly test specific... It's fairly natural for only tests to be interested in doing any sort of comprehension (or rather, sloppy reg-ex based comprehension) of trace recordings, but as far as the function itself is concerned, I don't see any big reason to keep it away from production code.
As to the performance of the thing, well, it is what it is. I don't really thinks it warrants a comment...

FWIW, this patch isn't really doing much to this function. To the extent that it does anything, it just makes the fact that there's a StripMarkers in there more visible by lifting it out of the LogRecord.Msg() method.

The reason why I pinged you on this is that I find it sad to re-call StripMarker on this every time the logs are being traversed. That re-processing (and the allocs that go with it) is ok for test code, but not for production code.


pkg/util/tracing/recording.go, line 212 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Not without making things more complicated. All of the span start message, the log message, and the structured messages are represented as the same logRecord type, which doesn't encode the difference between them. It's not worth doing too much for this function, it's only used by Recording.String().

Perhaps you were taking objection to using + between a literal and a RedactableString, since it wasn't obvious what will happen to the literal? See now, I've turned it into a sb.Printf.

your previous code was fine. I was just annoyed by the extra heap alloc and string copy implied here.


pkg/util/tracing/recording.go, line 226 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Well before my change, safe versus unsafe considerations did not apply to this function. This function is only used by Recording.String(), and it returned opentracing.LogRecord, which doesn't understand such things. All markers were removed from all messages; structured on unstructured. Now, this function returns redactable messages. It doesn't matter too much, since the only caller, Recording.String(), strips all the markers anyway.

As far as the safety of structured messages, I'm not sure it has been determined that they're indeed safe. We can't strip them at the boundary between KV and a tenant, because SQL needs them. I'm not sure they're all safe since, for example, a ContentionEvent has a key in it, and I think that key can come from a meta range. I think I raised this on Slack a few days ago and I don't think I heard back. Anyway as far as I can tell, I don't think we "show" these events to users (e.g. I don't think they show up for show trace for session). Perhaps @tbg can chime on whether there's been thinking around them.
Anyway, for the purposes of this code here, I think there's no harm in considering them unsafe, and also it doesn't matter because again the caller strips all the markers anyway.

Ok I'm 100% on board with the idea that these structured events are generally unsafe until proven safe. As such, your current implementation is OK.

You can add a TODO here (and in the docstring of MessageToJSONString) that we should update the json serializer to do the same as the structured events in eventpb, where the JSON serializer ((EventPayload).AppendJSONFields) does redactability separately per-field using field tags.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier and @knz)


pkg/util/tracing/recording.go, line 139 at r7 (raw file):

Previously, knz (kena) wrote…

The reason why I pinged you on this is that I find it sad to re-call StripMarker on this every time the logs are being traversed. That re-processing (and the allocs that go with it) is ok for test code, but not for production code.

ok, added an efficiency note


pkg/util/tracing/recording.go, line 212 at r7 (raw file):

Previously, knz (kena) wrote…

your previous code was fine. I was just annoyed by the extra heap alloc and string copy implied here.

well, once I looked at it, I didn't like the old code because to me it wasn't clear at all if the "event" part was going to be considered safe or not. I think it was going to be considered safe, but I think that's more a bug/limitation of the type system rather than a good thing, no? What I have now is more explicit and better, isn't it?


pkg/util/tracing/recording.go, line 226 at r7 (raw file):

Previously, knz (kena) wrote…

Ok I'm 100% on board with the idea that these structured events are generally unsafe until proven safe. As such, your current implementation is OK.

You can add a TODO here (and in the docstring of MessageToJSONString) that we should update the json serializer to do the same as the structured events in eventpb, where the JSON serializer ((EventPayload).AppendJSONFields) does redactability separately per-field using field tags.

Added the TODO to MessageToJSONString.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 15 files at r15.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, and @knz)


pkg/util/tracing/recording.go, line 212 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

well, once I looked at it, I didn't like the old code because to me it wasn't clear at all if the "event" part was going to be considered safe or not. I think it was going to be considered safe, but I think that's more a bug/limitation of the type system rather than a good thing, no? What I have now is more explicit and better, isn't it?

this is good too.

Some utilities were using the clunky opentracing.LogRecord for no
reason.

Release note: None
Before this patch, the RecordedSpan proto stored log messages in a very
awkward way: each message was stored as a collection of key/values, with
only one such pair present (using a well-known key). This was confusing,
unnecessary, hard to work with and hard to track for figuring out what
keys and values are in there (with the answer being only one key). This
patch simplifies the log messages, making them be represented by a
single string as nature intended. A bunch of code gets simplified in
consequence.

Release note: None
@andreimatei
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 29, 2021

🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@craig
Copy link
Contributor

craig bot commented Sep 29, 2021

GitHub status checks took too long to complete, so bors is giving up. You can adjust bors configuration to have it wait longer if you like.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, and @knz)

@craig
Copy link
Contributor

craig bot commented Sep 29, 2021

Build failed:

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#70896 for the failure

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, and @knz)

@craig
Copy link
Contributor

craig bot commented Sep 30, 2021

Build failed:

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failure is #70659

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, and @knz)

@craig
Copy link
Contributor

craig bot commented Sep 30, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Sep 30, 2021

Build succeeded:

@craig craig bot merged commit 2e11a18 into cockroachdb:master Sep 30, 2021
@andreimatei andreimatei deleted the tracing.log-fields branch October 1, 2021 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants