-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(blooms): disk-backed queue for the bloom-planner #14874
Conversation
de63e40
to
1e1d05b
Compare
1e1d05b
to
015e123
Compare
} | ||
|
||
// RegisterFlagsWithPrefix registers flags for the bloom-planner configuration. | ||
func (cfg *Config) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) { | ||
f.IntVar(&cfg.MaxQueuedTasksPerTenant, prefix+".max-tasks-per-tenant", 30000, "Maximum number of tasks to queue per tenant.") | ||
f.BoolVar(&cfg.StoreTasksOnDisk, prefix+".store-tasks-on-disk", false, "Whether to store tasks on disk.") | ||
f.StringVar(&cfg.TasksDiskDirectory, prefix+".tasks-disk-directory", "/tmp/bloom-planner-queue", "Directory to store tasks on disk. All files will be removed on startup.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f.StringVar(&cfg.TasksDiskDirectory, prefix+".tasks-disk-directory", "/tmp/bloom-planner-queue", "Directory to store tasks on disk. All files will be removed on startup.") | |
f.StringVar(&cfg.TasksDiskDirectory, prefix+".tasks-disk-directory", "/tmp/bloom-planner-queue", "Directory to persist tasks on disk across restarts.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not yet doing this.
All files will be removed on startup
Removing this bit.
// TODO(salvacorts): We can recover the queue from the disk here. | ||
// Until then, we just remove all the files in the directory so the disk | ||
// doesn't grow indefinitely. | ||
if q.cfg.StoreTasksOnDisk && q.cfg.CleanTasksDirectory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we cannot recover tasks from disk yet, we should not expose any settings for it.
CleanTaskDirectory
should be false
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not expose any settings for it.
I don't think we are. Which one do you mean?
CleanTasksDirectory
will be gone when we are able to recover tasks from the disk.
Until then, if we do not recover from the disk, we will keep leaking files into the disk across restarts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, ofc it should be true
. No need for a setting then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return filepath.Join("tasks", task.Tenant(), task.Table(), task.ID()) | ||
func getTaskPath(task *protos.ProtoTask) string { | ||
table := protos.FromProtoDayTableToDayTable(task.Table) | ||
taskFile := task.Id + ".task" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use .protobuf
extension to indicate that the content is a binary encoded protobuf object
useDisk bool | ||
}{ | ||
{ | ||
name: "in-memory", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could be more explicit:
name: "in-memory", | |
name: "in-memory", | |
useDisk: false, |
Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
Co-authored-by: Christian Haudum <christian.haudum@gmail.com> (cherry picked from commit b646861)
What this PR does / why we need it:
We store chunkrefs for each task. As a result, the queue can grow and not fit into memory. This PR adds a new set of options to configure the queue to store the tasks in the filesystem. As a result, the queue only keeps the task metadata (e.g. results channel, time in queue, retires...) and the path to the task in the disk.
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR