-
Notifications
You must be signed in to change notification settings - Fork 804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TSDB creation conflicting with transfer #1818
Conversation
3494f29
to
6f4253e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't have a chance to look at the tests, but other changes LGTM
pkg/ingester/ingester_v2.go
Outdated
// a transfer in occurs. | ||
if !force { | ||
ingesterState := i.lifecycler.GetState() | ||
if ingesterState != ring.ACTIVE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you can combine these 2, also combine it with the above if !force
check to be single if
statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
bb44c41
to
d8a317e
Compare
pkg/ring/model.go
Outdated
@@ -184,7 +184,7 @@ func (d *Desc) TokensFor(id string) (tokens, other []uint32) { | |||
func (i *IngesterDesc) IsHealthy(op Operation, heartbeatTimeout time.Duration) bool { | |||
if op == Write && i.State != ACTIVE { | |||
return false | |||
} else if op == Read && i.State == JOINING { | |||
} else if op == Read && i.State != ACTIVE && i.State != LEAVING { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC pending ingesters shouldn't have any tokens; how did you trigger this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should occur in the paths where we use forAllIngesters()
. I think one path is:
- Querier receive a
LabelNames
request - Querier forwards the request to the distributor
- The distributor runs
client.LabelNames()
indistributor.forAllIngesters()
distributor.forAllIngesters()
callsring.GetAll()
ring.GetAll()
returns all ingesters which satisfy theIsHealthy()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bboreham What's your take on this? If this change is a problem, I could rollback it in order to move one with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for enumerating these.
pkg/ingester/ingester_v2_test.go
Outdated
testData := testData | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, removed.
748dcbb
to
7a81a93
Compare
Nice work! I appreciate the refactoring on the read path ( |
7a81a93
to
6e4635a
Compare
You got some merge conflicts now |
1495357
to
fadbbdc
Compare
@gouthamve I've fixed the conflicts after rebasing. May you help me moving forward with this PR, please? |
Active development going on here :) you got more conflicts |
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…n JOINING state Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…ll require a dedicated discussion Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
fadbbdc
to
772ed25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does:
In this PR:
Suggest to enable "Hide whitespace changes" when reviewing for an easier diff.
Why?
I found an issue leading to data loss and/or TSDB wal corruption, which should be fixed by this PR. What happens:
PENDING
state, so a query request can hit an ingester even if theACTIVE
state has not been reached yet (not desired, fixed)If a query hit a new ingester in the
PENDING
state before the transfer starts (it switches toJOINING
then), the query creates a TSDB opening the wal segment00000000
. The subsequent transfer will overwrite the wal segments, while the00000000
file reference is still kept by the opened TSDB.Moreover, if it get transferred a wal where a checkpoint already occurred, the
00000000
segment is not part of the transfer but the segment 0 is created by the opened TSDB which leads to an inconsistent segments numbering on disk (ie. 0, checkpoint.6, 7, 8, 9) which cause TSDB to fail a subsequent re-opening.Which issue(s) this PR fixes:
No reported issue.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]