compactor - "write postings: exceeding max size of 64GiB" #1424
Comments
@jojohappy I'm not sure.
@claytono Sorry for the delay. You are right here. I wonder how to check for that error? Maybe it is hard to work around. I'm sorry that I'm not very familiar with the TSDB code.
@jojohappy I think this is a tricky one unless the issue is fixed in the TSDB code. I suspect this doesn't crop up with Prometheus itself since it's less aggressive about compaction. Right now my best guess on how Thanos might handle this would be to have a maximum block size: if a block is already that big, don't compact it any further. This might also help with some scenarios using time-based partitioning.
It looks like this is being addressed upstream: prometheus/prometheus#5884
Sorry for the delay! @claytono IMO, I agree with you, but I also have some questions:
How can we set the maximum block size? In your case, the size of each index file is less than 30GB, but they still exceeded the max size of 64GB during compaction. Before compaction finishes, we don't know whether the new index will be more than 64GB or not, right? I think it is difficult to find the right size to check against. Maybe I'm wrong.
Sorry, I didn't catch your point. Could you explain it?
Yes, if it is fixed there, that will be great.
I don't think we really can. I think the best we could do would be to have a flag that allows us to specify a maximum size an index is allowed to be, and past that we don't try to compact it any further.
We're planning to use the new time-based partitioning. As part of this we expect we'll want to rebalance the partitions occasionally as compaction and downsampling occur, to keep memory usage roughly equivalent between thanos store instances. The limiting factor for thanos-store capacity for us is memory usage. That's roughly proportional to index size, so we'd like to ensure we don't end up with thanos stores serving blocks so big that an instance holds just a few blocks. If a thanos-store process is serving 2 blocks and that consumes all the memory on the host, then our options for rebalancing aren't as good as if we limit index size and the thanos-store process is serving a dozen blocks. Right now I think the best option we have is to limit max compaction, but that's a fairly coarse tool. Ideally we'd like to just tell the compactor something like "Don't compact index files that are already x GB, they're big enough". That said, this isn't really related to the issue I opened; it would just be a nice side effect if this needed to be fixed or worked around in Thanos.
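A minimal sketch of the kind of filter being proposed here, assuming a hypothetical maximum-index-size flag; the types, names, and threshold are illustrative only and not Thanos's actual planner code:

```go
package main

import "fmt"

// blockMeta is a simplified stand-in for a block's metadata; the real
// compactor works on meta.json structs, not this hypothetical type.
type blockMeta struct {
	ULID      string
	IndexSize int64 // size of the block's index file in bytes
}

// filterCompactable drops blocks whose index already exceeds maxIndexSize,
// i.e. "don't compact index files that are already big enough". The
// threshold would come from a hypothetical flag such as
// --compact.max-index-size.
func filterCompactable(blocks []blockMeta, maxIndexSize int64) []blockMeta {
	var out []blockMeta
	for _, b := range blocks {
		if b.IndexSize >= maxIndexSize {
			continue // leave the block as-is, it is already large enough
		}
		out = append(out, b)
	}
	return out
}

func main() {
	blocks := []blockMeta{
		{ULID: "block-A", IndexSize: 12 << 30}, // 12 GiB index
		{ULID: "block-B", IndexSize: 45 << 30}, // 45 GiB index
	}
	const maxIndexSize = 30 << 30 // 30 GiB, purely illustrative
	for _, b := range filterCompactable(blocks, maxIndexSize) {
		fmt.Println("still eligible for compaction:", b.ULID)
	}
}
```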
Awesome! Thank you for your detailed case.
How about downsampling? If we skip compaction, maybe we cannot do downsampling, because the blocks would not have enough series. Further information is here.
What's your idea for the
I think that is not the goal of compaction. Did you compare the time cost of long-time-range queries between those two approaches? @bwplotka is now following issue #1471 about reducing store memory usage; you can also follow that if you would like to.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm currently running into this and am curious if there is something I can do to help mitigate the issue. Right now the compactor errors out with this and restarts every few hours.
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.
We just hit this issue as well. This limit is due to the use of 32-bit integers within Prometheus' TSDB code. See here for a good explanation. There is an open bug report as well as a PR that will hopefully resolve the issue.
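For context, a rough back-of-the-envelope illustration of where the 64GiB figure comes from, assuming the commonly cited explanation that index entries are addressed by 32-bit references to 16-byte-aligned offsets; the constants below are assumptions for illustration, not taken from the TSDB source:

```go
package main

import "fmt"

func main() {
	// Assumption: a 32-bit reference counts 16-byte-aligned positions,
	// so the addressable range is 2^32 * 16 bytes.
	const refValues = uint64(1) << 32 // distinct values of a 32-bit reference
	const alignment = uint64(16)      // assumed bytes per reference step

	maxBytes := refValues * alignment
	fmt.Printf("max addressable size: %d bytes = %d GiB\n", maxBytes, maxBytes>>30)
	// Prints: max addressable size: 68719476736 bytes = 64 GiB
}
```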
Could this be re-opened until this is resolved?
Yea, let's solve this. I have another suggestion: I would suggest not waiting for a fix here and instead setting an upper limit on block size and starting to split blocks based on size. This is tracked by #2340. Especially on large setups, and with some deduplication (vertical or offline) enabled and planned, we will have a problem with huge TB-sized blocks pretty soon. At some point indexing does not scale well with the number of series, so we might want to split it.
That sounds good to me and was another fear of mine as well. Sure, we could increase the index limit, but we will reach that again some day.
Hello 👋 Looks like there was no activity on this issue for the last 30 days.
This is still a big issue for us. We now have several compactor instances turned off due to this limit.
We're also now running into this (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11757)
On it this sprint.
Kind Regards,
Bartek Płotka (@bwplotka)
As a workaround, would it be possible to add a JSON tag to the meta.json so that the compactor can be told to skip it? Something like this:

{
  "thanos": {
    "no_compaction": true
  }
}
The main problem with this approach is that this block might be one portion of a bigger block that has to be created in order to get to the 2w compaction level. I need to think more about how to make this work, but something like this could be introduced ASAP, yes. I am working on some block-size-capped splitting, so compaction will result in two blocks, not one, etc. Having blocks capped at XXX GB is not only nice for this issue but also caps how much you need to download for manual interactions, potential deletions, downsampling, and other operations. Just curious: @SuperQ, can you tell what size the block is right now? (I am mostly interested in the index sizes.) Probably more than one block is needed in order to get 64GB+ postings.
I should have some workaround today. |
Here's some metadata on what's failing:
And an example meta.json
Due to the interaction of the operator and Helm, we have an excess of overly long labels.
The planner algo was adapted to avoid unnecessary changes to blocks caused by excluded blocks, so we can quickly switch to different planning logic in next iteration. Fixes: #1424 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
compact: Added index size limiting planner detecting output index size over 64GB. Fixes: #1424 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Update. The code is done and in review to fix this issue.
* compact: Added support for no-compact markers in planner. The planner algo was adapted to avoid unnecessary changes to blocks caused by excluded blocks, so we can quickly switch to different planning logic in next iteration. Fixes: #1424 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
You can manually exclude blocks from compaction, but the PR for an automatic flow is still in review: #3410 will close this issue once merged.
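For anyone landing here, the manual exclusion mentioned above works by placing a no-compact marker next to the block in object storage. A sketch of the kind of invocation involved, assuming a recent Thanos release; the block ID and bucket config file are placeholders, and the exact subcommand and flag names may differ between versions, so verify with `thanos tools bucket mark --help`:

```sh
# Mark block 01EXAMPLEULID so the compactor's planner skips it.
thanos tools bucket mark \
  --objstore.config-file=bucket.yml \
  --id=01EXAMPLEULID \
  --marker=no-compact-mark.json \
  --details="index would exceed the 64GiB postings limit"
```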
…e over 64GB. (#3410) * compact: Added index size limiting planner detecting output index size over 64GB. Fixes: #1424 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments; added changelog. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Skipped flaky test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
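A simplified sketch of the idea in that change: estimate the output index size of a planned compaction by summing the source blocks' index sizes, and exclude the largest block(s) when the estimate crosses the 64GiB limit. This illustrates the approach rather than the actual Thanos planner code; the types and the greedy exclusion heuristic are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

const maxIndexSize = int64(64) << 30 // 64 GiB postings/index limit

// plannedBlock is a hypothetical stand-in for block metadata.
type plannedBlock struct {
	ULID      string
	IndexSize int64
}

// limitIndexSize pessimistically estimates the compacted index size as the
// sum of the source index sizes and drops the largest blocks until the
// estimate fits under the limit. Dropped blocks would be marked no-compact.
func limitIndexSize(group []plannedBlock) (keep, excluded []plannedBlock) {
	sort.Slice(group, func(i, j int) bool { return group[i].IndexSize < group[j].IndexSize })
	var total int64
	for _, b := range group {
		total += b.IndexSize
	}
	for len(group) > 0 && total > maxIndexSize {
		largest := group[len(group)-1]
		excluded = append(excluded, largest)
		total -= largest.IndexSize
		group = group[:len(group)-1]
	}
	return group, excluded
}

func main() {
	group := []plannedBlock{
		{ULID: "block-A", IndexSize: 28 << 30},
		{ULID: "block-B", IndexSize: 27 << 30},
		{ULID: "block-C", IndexSize: 25 << 30},
	}
	keep, excluded := limitIndexSize(group)
	fmt.Println("compact:", keep)
	fmt.Println("mark no-compact:", excluded)
}
```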
Thanos, Prometheus and Golang version used
Prometheus version: 2.11.1
What happened
When running the compactor we get the following output (sanitized to remove internal hostnames):
What you expected to happen
Compaction would succeed, or skip these blocks without error, since they cannot be compacted.
How to reproduce it (as minimally and precisely as possible):
Try to compact blocks that result in the postings part of the resulting index exceeding 64GiB.
Full logs to relevant components
See log lines above. Let me know if anything else would be useful, but I think that's all we have that is relevant
Anything else we need to know
These blocks are from a pair of Prometheus hosts we have scraping all targets in one of our data centers. Here is more information on the blocks themselves:
Each of these blocks represents 8 hours of data.
This appears to be related to prometheus/prometheus#5868. I'm not sure if the limit is going to change in the TSDB library, but it would be nice in the meantime if this were treated as a warning, or if we could somehow specify a threshold to prevent it from trying to compact these blocks, since they're already fairly large.