Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making ingester head compaction jitter Zone aware. #5928

Merged
merged 6 commits into from
May 7, 2024

Conversation

alanprot
Copy link
Member

@alanprot alanprot commented May 6, 2024

What this PR does:
#5919 We introduced a jitter on the head compaction. This PR is proposing to change that jitter to take in consideration the AZ of the ingester.

Lets say we configure to have a head compaction interval set at 15 minutes across 3 Availability Zones (AZs). Prior to this change, the ingesters would determine a random jitter and perform the head compaction at any minute within each 15-minute interval. This change ensure that within every 15-minute interval, Ingesters in AZ1 execute head compaction from minute 0 to 5, those in AZ2 from minute 5 to 10, and those in AZ3 from minute 10 to 15.

I added a log on every tick to show the time of the tick and the AZ of the ingester on a 15 minutes period:

aws.az tick_time count(*)
us-east-1d 2024-05-06 18:59 1
us-east-1d 2024-05-06 18:58 9
us-east-1d 2024-05-06 18:57 7
us-east-1d 2024-05-06 18:56 11
us-east-1d 2024-05-06 18:55 6
us-east-1b 2024-05-06 18:54 11
us-east-1b 2024-05-06 18:53 5
us-east-1b 2024-05-06 18:52 7
us-east-1b 2024-05-06 18:51 8
us-east-1b 2024-05-06 18:50 11
us-east-1a 2024-05-06 18:49 5
us-east-1a 2024-05-06 18:48 10
us-east-1a 2024-05-06 18:47 8
us-east-1a 2024-05-06 18:46 8
us-east-1a 2024-05-06 18:45 8

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • [NA] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@alanprot alanprot force-pushed the slotted-ticker branch 4 times, most recently from bdd5410 to 26c9789 Compare May 6, 2024 23:36
@alanprot
Copy link
Member Author

alanprot commented May 7, 2024

@yeya24 Can you take a look? :D

I will update the changelog after if we go with it.

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change. It looks great and I think it makes sense. Just some nits.

pkg/util/time.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Show resolved Hide resolved
pkg/ingester/ingester.go Show resolved Hide resolved
alanprot added 6 commits May 7, 2024 09:47
Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: alanprot <alanprot@gmail.com>
@alanprot alanprot marked this pull request as ready for review May 7, 2024 16:48
@yeya24 yeya24 merged commit c11dc24 into cortexproject:master May 7, 2024
16 checks passed
@CharlieTLe
Copy link
Member

These tests seem to fail when it's experiencing cpu starvation from other tests running concurrently because of the usage of a wall clock and the timing could be off depending on how busy the CPU is. Could we use a mock clock or increase the duration in the tests to decrease the chance of race conditions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants