Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki: Add multi-tenancy support based off labels in stream #2587

Closed
wants to merge 13 commits into from

Conversation

Champ-Goblem
Copy link
Contributor

@Champ-Goblem Champ-Goblem commented Sep 1, 2020

What this PR does / why we need it:
Allows a user to specify a label in order to base org ID off its value, continues to support the two original methods (auth over http header and fake-auth).

I propose this design change so that a cluster running multiple user workloads for different tenants with a single set of collectors can classify the logs better than current implementation where the tenant ID (org ID) can only be statically set in the config for promtail (similar to a suggestion from a previous issue).

I'm happy to make adjustments based on your verdicts.

Which issue(s) this PR fixes:
None

Special notes for your reviewer:
I require some advice on altering the documentation in order to add the new schema for multi-tenancy:

multi_tenancy:
  enabled: <bool>
  type: <auth|label>
  label: <label name> (can be nil for type auth)
  undefined: <fallback org ID for streams without label> (can be nil for type auth)

This schema is designed to replace the original auth_enabled: <bool>
Checklist

  • Tests updated - New tests added for multi-tenancy package as well
  • Documentation added

@CLAassistant
Copy link

CLAassistant commented Sep 1, 2020

CLA assistant check
All committers have signed the CLA.

@slim-bean
Copy link
Collaborator

Thanks for the PR @Champ-Goblem! we are a bit backed up at the moment and there is quite a bit to unpack here so it may take us a little bit to get a good response to you on this!

@codecov-commenter
Copy link

codecov-commenter commented Sep 14, 2020

Codecov Report

Merging #2587 into master will decrease coverage by 1.50%.
The diff coverage is 57.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2587      +/-   ##
==========================================
- Coverage   62.87%   61.37%   -1.51%     
==========================================
  Files         170      172       +2     
  Lines       15051    13239    -1812     
==========================================
- Hits         9464     8126    -1338     
+ Misses       4827     4369     -458     
+ Partials      760      744      -16     
Impacted Files Coverage Δ
pkg/loki/loki.go 0.00% <0.00%> (ø)
pkg/loki/modules.go 3.81% <0.00%> (-0.34%) ⬇️
pkg/multitenancy/multitenancy_config.go 53.84% <53.84%> (ø)
pkg/multitenancy/multitenancy_middleware.go 73.33% <73.33%> (ø)
pkg/distributor/distributor.go 77.94% <86.04%> (-0.87%) ⬇️
pkg/logql/step_evaluator.go 57.14% <0.00%> (-9.53%) ⬇️
pkg/logql/marshal/labels.go 66.66% <0.00%> (-8.34%) ⬇️
pkg/logproto/timestamp.go 40.00% <0.00%> (-6.81%) ⬇️
pkg/logentry/stages/labeldrop.go 53.33% <0.00%> (-6.67%) ⬇️
pkg/chunkenc/encoding_helpers.go 59.25% <0.00%> (-6.37%) ⬇️
... and 167 more

@midnightconman
Copy link

I have a question @Champ-Goblem ...

My use case is mostly around limiting ingestion (having the capability to throttle customers at the distributor layer), while allowing for a pretty open query (all tenants can query all tenants logs). This is not possible today, as the auth header can only include one tenant per request.

Would these changes enable something like that?

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I was finally able to take a look at this in depth -- thanks for the PR! I think what you're trying to do seems reasonable (collect logs for multiple tenants from a single collector).

After I thought about it for a bit, I'm not sure this is the way that we should do this. The agents will already need to be tenant-aware in order to assign some tenant label. Wouldn't it make more sense to reduce the coordination burdens between agent<->loki by adding a promtail stage to choose which tenant a log line is assigned to? That would allow the same functionality without the need to complicate the distributor code or expose these extra configs server side (which would require coordination with the agent anyway).

What do you think?

// InjectLabelForID injects a field into the context that specifies using a label rather than orgID from header
func InjectLabelForID(ctx context.Context, label string, undefined string) context.Context {
if label == "" || undefined == "" {
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't return nil or the subsequent WithContext will panic.

if label == "" || undefined == "" {
return nil
}
newCtx := context.WithValue(ctx, interface{}("useLabelAsOrgID"), label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WithValue calls should use an unexported key type (see https://golang.org/pkg/context/#WithValue) as a precaution.

}
// Try get the associated label from the stream, otherwise we can use undefined
if label != "" {
re := regexp.MustCompile(label + "=\"([^ ]+)\"")
Copy link
Member

@owen-d owen-d Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be stored so that it doesn't recompile on every request.

This should use the existing label parsing functions instead (https://github.com/grafana/loki/blob/master/pkg/logql/parser.go#L50).

f.BoolVar(&c.Enabled, "multitenancy.enabled", false, "Enable multi-tenancy mode")
f.StringVar(&c.Type, "multitenancy.type", "auth", "Where to get the Org ID for multi-tenancy ")
f.StringVar(&c.Label, "multitenancy.label", "", "Specifies the label to use for Org ID (in label mode)")
f.StringVar(&c.Undefined, "multitenancy.undefined", "unlabeled", "Sepcifies the name to use when log doesnt contain the label (in label mode)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather default this to fake for consistency with our defaults when auth is not enabled.

// designed as a drop in upgrade to the original user.ExtractOrgID
func GetUserIDFromContextAndStringLabels(ctx context.Context, labels string) (string, error) {
userID, err := user.ExtractOrgID(ctx)
label, undefined := GetLabelFromContext(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more readable to encode the validation into the GetLabelFromContext code such that it returns label, undefined, ok where ok is true when both label and undefined are non-empty.

@Champ-Goblem
Copy link
Contributor Author

Hi @midnightconman, unfortunately this change currently doesn't support something like that, although it was another idea that I would like to see implemented as well if this PR or a similar PR gets accepted. Currently the label processing is only done on the api/push endpoint and defaults to the original implementation of setting the org ID via a header for the query endpoints.

@Champ-Goblem
Copy link
Contributor Author

@owen-d Thanks for the review, I see where you are coming from and it was also an alternative idea when I thought about viable solutions. I ended up choosing to implement it server side in order to cut down on the number of network requests that would potentially be made. Splitting the logs by tenant on the collector side we could only group logs of same tenant together each time we pushed, that means at each push interval there could be at most a request per tenant and if we had a large multi-tenant system, this could start to amount to a lot of requests. Wheres by implementing the logic server side we can bundle all of the log lines together and then separate them out once they reach loki, theoretically only needing one network request to send logs for a set of tenants at each push interval.
My other reason for implementing it server side is that the functionality then isnt collector dependent, if someone wanted to use a different collector to promtail, but still wanted the multi-tenancy feature, all they would need to do is enable it in the loki config.

Other than that, I am happy to make those suggest code changes once we deside on the best path moving forward.

@stale
Copy link

stale bot commented Oct 31, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Oct 31, 2020
@stale stale bot closed this Nov 7, 2020
@travis-sobeck
Copy link

whatever came of this? The need still exists for multi-tenancy within a single cluster

@kavin-kr
Copy link

Is there any plan to reopen this PR?

ankeshh pushed a commit to ankeshh/loki that referenced this pull request May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L stale A stale issue or PR that will automatically be closed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants