fix: sanatize structured metadata at query time #13983
Will this break anything for keys coming from OTel attributes, since (IIRC) periods are common there?
Makes sense to me.
I have some concerns about the behaviour at query time if two different keys sanitize to the same value (e.g., hello.world and hello!world both sanitize to hello_world), but I don't know enough about the LogQL engine yet to know if this would be a real issue.
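To make the collision concern concrete, here is a minimal sketch in Go. Both `sanitizeKey` and `collisions` are hypothetical helpers written for illustration, not the code in this PR; the sketch assumes every character outside `[a-zA-Z0-9_]` is replaced with an underscore.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeKey is a hypothetical helper (not Loki's implementation).
// It maps every rune outside [a-zA-Z0-9_] to '_'.
func sanitizeKey(key string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			return r
		}
		return '_'
	}, key)
}

// collisions groups the original keys by their sanitized form and keeps
// only the groups where more than one original key maps to the same name.
func collisions(keys []string) map[string][]string {
	grouped := make(map[string][]string)
	for _, k := range keys {
		s := sanitizeKey(k)
		grouped[s] = append(grouped[s], k)
	}
	for s, originals := range grouped {
		if len(originals) < 2 {
			delete(grouped, s)
		}
	}
	return grouped
}

func main() {
	keys := []string{"hello.world", "hello!world", "trace_id"}
	fmt.Println(collisions(keys)) // prints map[hello_world:[hello.world hello!world]]
}
```

Whether the engine keeps the first value, the last value, or reports an error when two keys collapse to the same name is not something this sketch answers; it only shows that the mapping is not one-to-one.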
pkg/logql/log/pipeline.go (Outdated)
func replaceChars(str string, offsets []int) string {
	offsets = offsets[:0]
	for i, r := range str {
Looking at Prometheus's NormalizeLabel, we also need to handle labels that start with a number.
Great find. I didn't realize NormalizeLabel was a thing!
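For reference, Prometheus label names must match `[a-zA-Z_][a-zA-Z0-9_]*`, so a name that begins with a digit is invalid even when every character in it is otherwise allowed. A rough sketch of the extra step follows; `prefixLeadingDigit` is a hypothetical helper, and the `key_` prefix is an assumption for illustration rather than a statement of what NormalizeLabel or this PR does.

```go
package main

import "fmt"

// prefixLeadingDigit is a hypothetical helper. It assumes the name has
// already had invalid characters replaced, and only fixes the case where
// the first byte is an ASCII digit. The "key_" prefix is an assumption.
func prefixLeadingDigit(name string) string {
	if name == "" {
		return name
	}
	if c := name[0]; c >= '0' && c <= '9' {
		return "key_" + name
	}
	return name
}

func main() {
	fmt.Println(prefixLeadingDigit("2xx_count")) // prints key_2xx_count
	fmt.Println(prefixLeadingDigit("status"))    // prints status
}
```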
I don't think we should validate structured metadata in the future at ingest time. I would vote instead to disable validation for returned labels and structured metadata. This seems trickier to implement but more future-proof.
Only labels should be validated at ingest time.
EDIT: we should avoid query-time validation/normalization.
(cherry picked from commit 3bf7fa9)
There's a bug in structured metadata where Loki can accept characters that are invalid in Prometheus label names. A subsequent PR will reject such inputs, but data that has already been ingested is not queryable.
As a workaround, this PR sanitizes structured metadata label names at query time. When no bad inputs exist, there isn't a meaningful performance difference. Otherwise, there is 1 alloc per bad input. We could do this with byte slices to avoid incurring allocs, but iterating over runes is the most straightforward way to ensure we aren't missing multi-byte characters that may exist in label names.
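The trade-off described above can be sketched roughly as follows. This is not the code in the PR: `sanitizeIfNeeded` and `isValidRune` are hypothetical names, and the sketch assumes each invalid rune (including a multi-byte one) is replaced by a single underscore. A valid name is returned as-is without building a new string; an invalid one is rebuilt through a pre-sized `strings.Builder`.

```go
package main

import (
	"fmt"
	"strings"
)

// isValidRune reports whether r is allowed in a label name ([a-zA-Z0-9_]).
func isValidRune(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
		(r >= '0' && r <= '9') || r == '_'
}

// sanitizeIfNeeded is a hypothetical helper. It returns name unchanged when
// every rune is valid; otherwise it rebuilds the string, replacing each
// invalid rune with a single '_'.
func sanitizeIfNeeded(name string) string {
	first := -1
	for i, r := range name { // ranging over a string yields whole runes
		if !isValidRune(r) {
			first = i
			break
		}
	}
	if first < 0 {
		return name // fast path: nothing to replace
	}

	var b strings.Builder
	b.Grow(len(name)) // the result is never longer than the input
	b.WriteString(name[:first])
	for _, r := range name[first:] {
		if isValidRune(r) {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	return b.String()
}

func main() {
	fmt.Println(sanitizeIfNeeded("trace_id"))    // prints trace_id
	fmt.Println(sanitizeIfNeeded("hello.world")) // prints hello_world
	fmt.Println(sanitizeIfNeeded("héllo.wörld")) // prints h_llo_w_rld
}
```

Ranging over the string rather than its bytes is what keeps a two-byte character such as `é` from turning into two underscores.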
benchmark before:
benchmark after