-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loki: Add multi-tenancy support based off labels in stream #2587
Conversation
- Added new logic to get org/user ID from stream labels
- Add tests
Thanks for the PR @Champ-Goblem! we are a bit backed up at the moment and there is quite a bit to unpack here so it may take us a little bit to get a good response to you on this! |
746b875
to
1341fa1
Compare
Codecov Report
@@ Coverage Diff @@
## master #2587 +/- ##
==========================================
- Coverage 62.87% 61.37% -1.51%
==========================================
Files 170 172 +2
Lines 15051 13239 -1812
==========================================
- Hits 9464 8126 -1338
+ Misses 4827 4369 -458
+ Partials 760 744 -16
|
I have a question @Champ-Goblem ... My use case is mostly around limiting ingestion (having the capability to throttle customers at the distributor layer), while allowing for a pretty open query (all tenants can query all tenants logs). This is not possible today, as the auth header can only include one tenant per request. Would these changes enable something like that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I was finally able to take a look at this in depth -- thanks for the PR! I think what you're trying to do seems reasonable (collect logs for multiple tenants from a single collector).
After I thought about it for a bit, I'm not sure this is the way that we should do this. The agents will already need to be tenant-aware in order to assign some tenant label. Wouldn't it make more sense to reduce the coordination burdens between agent<->loki by adding a promtail stage to choose which tenant a log line is assigned to? That would allow the same functionality without the need to complicate the distributor code or expose these extra configs server side (which would require coordination with the agent anyway).
What do you think?
// InjectLabelForID injects a field into the context that specifies using a label rather than orgID from header | ||
func InjectLabelForID(ctx context.Context, label string, undefined string) context.Context { | ||
if label == "" || undefined == "" { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't return nil or the subsequent WithContext
will panic.
if label == "" || undefined == "" { | ||
return nil | ||
} | ||
newCtx := context.WithValue(ctx, interface{}("useLabelAsOrgID"), label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WithValue
calls should use an unexported key type (see https://golang.org/pkg/context/#WithValue) as a precaution.
} | ||
// Try get the associated label from the stream, otherwise we can use undefined | ||
if label != "" { | ||
re := regexp.MustCompile(label + "=\"([^ ]+)\"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be stored so that it doesn't recompile on every request.
This should use the existing label parsing functions instead (https://github.com/grafana/loki/blob/master/pkg/logql/parser.go#L50).
f.BoolVar(&c.Enabled, "multitenancy.enabled", false, "Enable multi-tenancy mode") | ||
f.StringVar(&c.Type, "multitenancy.type", "auth", "Where to get the Org ID for multi-tenancy ") | ||
f.StringVar(&c.Label, "multitenancy.label", "", "Specifies the label to use for Org ID (in label mode)") | ||
f.StringVar(&c.Undefined, "multitenancy.undefined", "unlabeled", "Sepcifies the name to use when log doesnt contain the label (in label mode)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather default this to fake
for consistency with our defaults when auth is not enabled.
// designed as a drop in upgrade to the original user.ExtractOrgID | ||
func GetUserIDFromContextAndStringLabels(ctx context.Context, labels string) (string, error) { | ||
userID, err := user.ExtractOrgID(ctx) | ||
label, undefined := GetLabelFromContext(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more readable to encode the validation into the GetLabelFromContext
code such that it returns label, undefined, ok
where ok
is true when both label
and undefined
are non-empty.
Hi @midnightconman, unfortunately this change currently doesn't support something like that, although it was another idea that I would like to see implemented as well if this PR or a similar PR gets accepted. Currently the label processing is only done on the api/push endpoint and defaults to the original implementation of setting the org ID via a header for the query endpoints. |
@owen-d Thanks for the review, I see where you are coming from and it was also an alternative idea when I thought about viable solutions. I ended up choosing to implement it server side in order to cut down on the number of network requests that would potentially be made. Splitting the logs by tenant on the collector side we could only group logs of same tenant together each time we pushed, that means at each push interval there could be at most a request per tenant and if we had a large multi-tenant system, this could start to amount to a lot of requests. Wheres by implementing the logic server side we can bundle all of the log lines together and then separate them out once they reach loki, theoretically only needing one network request to send logs for a set of tenants at each push interval. Other than that, I am happy to make those suggest code changes once we deside on the best path moving forward. |
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
whatever came of this? The need still exists for multi-tenancy within a single cluster |
Is there any plan to reopen this PR? |
What this PR does / why we need it:
Allows a user to specify a label in order to base org ID off its value, continues to support the two original methods (auth over http header and fake-auth).
I propose this design change so that a cluster running multiple user workloads for different tenants with a single set of collectors can classify the logs better than current implementation where the tenant ID (org ID) can only be statically set in the config for promtail (similar to a suggestion from a previous issue).
I'm happy to make adjustments based on your verdicts.
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
I require some advice on altering the documentation in order to add the new schema for multi-tenancy:
This schema is designed to replace the original
auth_enabled: <bool>
Checklist