-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loki: Allow configuring query_store_max_look_back_period when running a filesystem store and boltdb-shipper #2073
Conversation
…esystem store, which allows for boltdb-shipper to horizontally scale filesystem store. Signed-off-by: Ed Welch <edward.welch@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the code looks good to me. Just some minor nits and questions.
|
||
Scaling up is as easy as adding more loki instances and letting them talk to the same ring. | ||
|
||
Scaling down is possible but manual, you would need to shutdown the loki instance and then physically copy the chunks directory and its index files in their entirety to another Loki instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the other instance would also have index files with same name so it would not be just copying of files. I guess we would have to provide a tool to merge boltdb files. We will have to make it clear in the docs until we have that tool.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each ingester should be writing files with a unique name such that you should be able to copy ones file to another without conflicts
uploader := fmt.Sprintf("%s-%d", s.cfg.IngesterName, time.Now().Unix())
However, I think I may have found a bug here, this is what I see for my current test ingesters, it looks like the lifecycler ID was empty when they created their files
ed@ed-VirtualBox:/tmp$ ls loki1/chunks/index/index_2628
-1589422114
ed@ed-VirtualBox:/tmp$ ls loki2/chunks/index/index_2628
-1589422113
These still wouldn't collide because the timestamps were different but they should also have the unique lifecycler ID in front
My suspicion here is they created the files before joining the ring? Can you take a look at this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at how this works, there is a real chicken and egg problem here when initializing the shipper and trying to get the ingester ID to use in the name file.
I'm not sure it's possible to do this, but maybe you have some ideas?
I'm wondering instead if we should try a different strategy and use the hostname or allow for the name to be configured in the yaml directly?
@@ -84,6 +84,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) { | |||
f.Float64Var(&cfg.SyncMinUtilization, "ingester.sync-min-utilization", 0, "Minimum utilization of chunk when doing synchronization.") | |||
f.IntVar(&cfg.MaxReturnedErrors, "ingester.max-ignored-stream-errors", 10, "Maximum number of ignored stream errors to return. 0 to return all errors.") | |||
f.DurationVar(&cfg.MaxChunkAge, "ingester.max-chunk-age", time.Hour, "Maximum chunk age before flushing.") | |||
f.DurationVar(&cfg.QueryStoreMaxLookBackPeriod, "ingester.query-store-max-look-back-period", 0, "How far back should an ingester be allowed to query the store for data, for use only with boltdb-shipper index and filesystem object store. -1 for infinite.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the requirement should be boltdb
(not boltdb_shipper
) as index store and filesystem as object store because there is no use of having boltdb_shipper
without shared object store otherwise you would be copying files somewhere on the same filesystem which is essentially same as not using boltdb_shipper
at all. I might be wrong here or might be missing your point. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my above assumption is true, that the file names are unique, I specified boltdb_shipper
so that you could copy the files between ingesters to allow scaling down, otherwise what you said is true that there isn't much reason to use boltdb_shipper
for this rather than boltdb
.
With the exception of one other reason I could think of, it would be easier to take backups or even move the Loki to another host because the index files can be directly copied with the chunks directory while Loki is still running, I don't think this is true for boltdb
I don't think you can safely copy an open boltdb file.
@@ -64,7 +64,7 @@ type Config struct { | |||
ingesterClientFactory func(cfg client.Config, addr string) (client.HealthAndIngesterClient, error) | |||
|
|||
QueryStore bool `yaml:"-"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can maybe get rid of this because now we mostly rely on QueryStoreMaxLookBackPeriod
to enable/disable querying the store from ingesters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems correct what @sandeepsukhani is saying ?
Co-authored-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com>
… just to make sure there are not collisions if for some reason the ID is empty (which can happen if the hostname doesn't resolve for some reason) Signed-off-by: Ed Welch <edward.welch@grafana.com>
…abled. Signed-off-by: Ed Welch <edward.welch@grafana.com>
…ore so they don't duplicate query efforts Signed-off-by: Ed Welch <edward.welch@grafana.com>
Signed-off-by: Ed Welch <edward.welch@grafana.com>
return cfg.Configs[i] | ||
} | ||
|
||
func calculateMaxLookBack(pc chunk.PeriodConfig, maxLookBackConfig, maxChunkAge time.Duration) (time.Duration, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That deserve a test and it seems simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, yeah, i thought i put in the description or somewhere I wanted to test this but hadn't done it yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think you should remove QueryStore
and add more tests.
But otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't finished review yet (dont wait for me), but thanks for the doc.
directory: /tmp/loki/ | ||
``` | ||
|
||
A folder is created for every tenant all the chunks for one tenant are stored in that directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A folder is created for every tenant all the chunks for one tenant are stored in that directory. | |
A folder is created for every tenant and all the chunks for one tenant are stored in that directory. |
|
||
A folder is created for every tenant all the chunks for one tenant are stored in that directory. | ||
|
||
If loki is run in single-tenant mode, all the chunks are put in a folder named `fake` which is the synthesized tenant name used for single tenant mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If loki is run in single-tenant mode, all the chunks are put in a folder named `fake` which is the synthesized tenant name used for single tenant mode. | |
If loki is run in single-tenant mode, all the chunks are put in a folder named `fake` which is the placeholder tenant name used for single tenant mode. |
Signed-off-by: Ed Welch <edward.welch@grafana.com>
I will do a follow up to remove |
This should allow horizontal scaling of a filesystem store on separate nodes as the ingesters can query the store
Signed-off-by: Ed Welch edward.welch@grafana.com