compactor - "write postings: exceeding max size of 64GiB" #1424
Comments
@jojohappy I'm not sure.
@claytono Sorry for the delay. You are right here. I wonder how to check for that error? Maybe it is hard to work around. I'm sorry that I'm not very familiar with the TSDB code.
@jojohappy I think this is a tricky one unless the issue is fixed in the TSDB code. I suspect this doesn't crop up with Prometheus itself since it's less aggressive about compaction. Right now my best guess on how Thanos might handle this would be to have a maximum block size: if a block is already that big, don't compact it any further. This might also help with some scenarios using time-based partitioning.
It looks like this is being addressed upstream: prometheus/prometheus#5884
Sorry for the delay! @claytono IMO, I agree with you, but I also have some questions:
How can we set the maximum block size? In your case, the size of each index file is less than 30GB, but they still exceeded the max size of 64GB during compaction. Before compaction finishes, we don't know whether the new index will be more than 64GB or not, right? I think it is difficult to find the right size to check against. Maybe I'm wrong.
Sorry, I didn't catch your point. Could you explain it?
Yes, if it is fixed there, that will be great.
I don't think we really can. I think the best we could do would be to have a flag that allows us to specify a maximum size an index is allowed to be, and past that we don't try to compact it any further.
We're planning to use the new time-based partitioning. As part of this we expect we'll want to rebalance the partitions occasionally as compaction and downsampling occur, to keep memory usage roughly equivalent between thanos store instances. The limiting factor for thanos-store capacity for us is memory usage. That's roughly proportional to index size, so we'd like to ensure we don't end up with thanos stores serving blocks so big that an instance holds just a few blocks. If a thanos-store process is serving 2 blocks and that consumes all the memory on the host, then our options for rebalancing aren't as good as if we limit index size and the thanos-store process is serving a dozen blocks. Right now I think the best option we have is to limit max compaction, but that's a fairly coarse tool. Ideally we'd like to just tell the compactor something like "Don't compact index files that are already x GB, they're big enough". That said, this isn't really related to the issue I opened; it would just be a nice side effect if this needed to be fixed or worked around in Thanos.
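A minimal sketch of the kind of filter being proposed here, assuming a hypothetical maximum-index-size flag; the types, names, and threshold are illustrative only and not Thanos's actual planner code:

```go
package main

import "fmt"

// blockMeta is a simplified stand-in for a block's metadata; the real
// compactor works on meta.json structs, not this hypothetical type.
type blockMeta struct {
	ULID      string
	IndexSize int64 // size of the block's index file in bytes
}

// filterCompactable drops blocks whose index already exceeds maxIndexSize,
// i.e. "don't compact index files that are already big enough". The
// threshold would come from a hypothetical flag such as
// --compact.max-index-size.
func filterCompactable(blocks []blockMeta, maxIndexSize int64) []blockMeta {
	var out []blockMeta
	for _, b := range blocks {
		if b.IndexSize >= maxIndexSize {
			continue // leave the block as-is, it is already large enough
		}
		out = append(out, b)
	}
	return out
}

func main() {
	blocks := []blockMeta{
		{ULID: "block-A", IndexSize: 12 << 30}, // 12 GiB index
		{ULID: "block-B", IndexSize: 45 << 30}, // 45 GiB index
	}
	const maxIndexSize = 30 << 30 // 30 GiB, purely illustrative
	for _, b := range filterCompactable(blocks, maxIndexSize) {
		fmt.Println("still eligible for compaction:", b.ULID)
	}
}
```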
Awesome! Thank you for your detailed case.
How about downsampling? If we skip compaction, maybe we cannot do downsampling, because the blocks would not have enough series. Further information is here.
What's your idea for the
I think that is not the goal of compaction. Did you compare the time cost of long-time-range queries between those two approaches? @bwplotka is now following issue #1471 about reducing store memory usage; you can also follow that if you would like to.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm currently running into this and am curious if there is something I can do to help mitigate the issue. Right now the compactor errors out with this and restarts every few hours.
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.
We just hit this issue as well. This limit is due to the use of 32-bit integers within Prometheus' TSDB code. See here for a good explanation. There is an open bug report as well as a PR that will hopefully resolve the issue.
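For context, a rough back-of-the-envelope illustration of where the 64GiB figure comes from, assuming the commonly cited explanation that index entries are addressed by 32-bit references to 16-byte-aligned offsets; the constants below are assumptions for illustration, not taken from the TSDB source:

```go
package main

import "fmt"

func main() {
	// Assumption: a 32-bit reference counts 16-byte-aligned positions,
	// so the addressable range is 2^32 * 16 bytes.
	const refValues = uint64(1) << 32 // distinct values of a 32-bit reference
	const alignment = uint64(16)      // assumed bytes per reference step

	maxBytes := refValues * alignment
	fmt.Printf("max addressable size: %d bytes = %d GiB\n", maxBytes, maxBytes>>30)
	// Prints: max addressable size: 68719476736 bytes = 64 GiB
}
```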
Could this be re-opened until this is resolved?
Yea, let's solve this. I have another suggestion: I would suggest not waiting for a fix here and instead setting an upper limit on block size and starting to split blocks based on size. This is tracked by #2340. Especially on large setups, and with some deduplication (vertical or offline) enabled and planned, we will have a problem with huge TB-sized blocks pretty soon. At some point indexing does not scale well with the number of series, so we might want to split it.
That sounds good to me and was another fear of mine as well. Sure, we could increase the index limit, but we will reach that again some day.
Hello 👋 Looks like there was no activity on this issue for the last 30 days.
This is still a big issue for us. We now have several compactor instances turned off due to this limit.
We're also now running into this (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11757)
On it this sprint.
Kind Regards,
Bartek Płotka (@bwplotka)
As a workaround, would it be possible to add a JSON tag to the meta.json so that the compactor can be told to skip it? Something like this:

{
  "thanos": {
    "no_compaction": true
  }
}
The main problem with this approach is that this block might be one portion of a bigger block that has to be created in order to get to the 2w compaction level. I need to think more about how to make this work, but something like this could be introduced ASAP, yes. I am working on some block-size-capped splitting, so compaction will result in two blocks, not one, etc. Having blocks capped at XXX GB is not only nice for this issue but also caps how much you need to download for manual interactions, potential deletions, downsampling, and other operations. Just curious: @SuperQ, can you tell what size the block is right now? (I am mostly interested in the index sizes.) Probably more than one block is needed in order to get 64GB+ postings.
I should have some workaround today. |
Here's some metadata on what's failing:
And an example meta.json
Due to the interaction of the operator and Helm, we have an excess of overly long labels.
The planner algo was adapted to avoid unnecessary changes to blocks caused by excluded blocks, so we can quickly switch to different planning logic in next iteration. Fixes: #1424 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
compact: Added index size limiting planner detecting output index size over 64GB. Fixes: #1424 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Update. The code is done and in review to fix this issue.
* compact: Added support for no-compact markers in planner. The planner algo was adapted to avoid unnecessary changes to blocks caused by excluded blocks, so we can quickly switch to different planning logic in next iteration. Fixes: #1424 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
You can manually exclude blocks from compaction, but the PR for an automatic flow is still in review: #3410 will close this issue once merged.
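For anyone landing here, the manual exclusion mentioned above works by placing a no-compact marker next to the block in object storage. A sketch of the kind of invocation involved, assuming a recent Thanos release; the block ID and bucket config file are placeholders, and the exact subcommand and flag names may differ between versions, so verify with `thanos tools bucket mark --help`:

```sh
# Mark block 01EXAMPLEULID so the compactor's planner skips it.
thanos tools bucket mark \
  --objstore.config-file=bucket.yml \
  --id=01EXAMPLEULID \
  --marker=no-compact-mark.json \
  --details="index would exceed the 64GiB postings limit"
```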
…e over 64GB. (#3410) * compact: Added index size limiting planner detecting output index size over 64GB. Fixes: #1424 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments; added changelog. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Skipped flaky test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
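A simplified sketch of the idea in that change: estimate the output index size of a planned compaction by summing the source blocks' index sizes, and exclude the largest block(s) when the estimate crosses the 64GiB limit. This illustrates the approach rather than the actual Thanos planner code; the types and the greedy exclusion heuristic are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

const maxIndexSize = int64(64) << 30 // 64 GiB postings/index limit

// plannedBlock is a hypothetical stand-in for block metadata.
type plannedBlock struct {
	ULID      string
	IndexSize int64
}

// limitIndexSize pessimistically estimates the compacted index size as the
// sum of the source index sizes and drops the largest blocks until the
// estimate fits under the limit. Dropped blocks would be marked no-compact.
func limitIndexSize(group []plannedBlock) (keep, excluded []plannedBlock) {
	sort.Slice(group, func(i, j int) bool { return group[i].IndexSize < group[j].IndexSize })
	var total int64
	for _, b := range group {
		total += b.IndexSize
	}
	for len(group) > 0 && total > maxIndexSize {
		largest := group[len(group)-1]
		excluded = append(excluded, largest)
		total -= largest.IndexSize
		group = group[:len(group)-1]
	}
	return group, excluded
}

func main() {
	group := []plannedBlock{
		{ULID: "block-A", IndexSize: 28 << 30},
		{ULID: "block-B", IndexSize: 27 << 30},
		{ULID: "block-C", IndexSize: 25 << 30},
	}
	keep, excluded := limitIndexSize(group)
	fmt.Println("compact:", keep)
	fmt.Println("mark no-compact:", excluded)
}
```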
Thanos, Prometheus and Golang version used
Prometheus version: 2.11.1
What happened
When running the compactor we get the following output (sanitized to remove internal hostnames):
What you expected to happen
Compaction would succeed, or skip these blocks without error, since they cannot be compacted.
How to reproduce it (as minimally and precisely as possible):
Try to compact blocks that result in the postings part of the resulting index exceeding 64GiB.
Full logs to relevant components
See log lines above. Let me know if anything else would be useful, but I think that's all we have that is relevant
Anything else we need to know
These blocks are from a pair of Prometheus hosts we have scraping all targets in one of our data centers. Here is more information on the blocks themselves:
Each of these blocks represents 8 hours of data.
This appears to be related to prometheus/prometheus#5868. I'm not sure if the limit is going to change in the TSDB library, but it would be nice in the meantime if this were treated as a warning, or if we could somehow specify a threshold to prevent it from trying to compact these blocks, since they're already fairly large.