-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor replicate execution method to not iterate the origin bucket #2906
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments but overall LGTM!
metas, partials, err := rs.fetcher.Fetch(ctx) | ||
if err != nil && metas == nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should log the error if metas != nil && err != nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is good to log the error here, but kind of duplicate.
I just check the code, only here returns metas and error at the same time. In this case, partials != nil, so we will log the partial meta errors later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm the if err != nil && metas == nil {
feels like code smell to me though. It just looks wrong from outside which can lead to bugs in future:
- Inside implementation can change, and metas can be produced without partial for some reason - then we will miss an error.
- Even if we say it makes sense, maybe another person will come here and refactor it anyway as it looks like error can be missed.
On other hand, shouldn't replicate FAIL on any error spotted? Even if we failed to read one block? Sometimes we heavily rely on replication status so this has to be heavily tested and controlled, ideally with failed metric being somewhere (:
CI failure is unrelated. Please take a look at this pr again |
eb9fe91
to
2d6d058
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT the CI failure seems related to the changes
af3a2a8
to
7eafd1c
Compare
Thanks, the previous problem was caused by the |
7eafd1c
to
e5f0737
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
e5f0737
to
4b15ae5
Compare
Can I get a review for this? I think it is close. |
4b15ae5
to
f89065a
Compare
f89065a
to
8d2e154
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
8d2e154
to
a75bef5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, some comments though, before merging (:
metas, partials, err := rs.fetcher.Fetch(ctx) | ||
if err != nil && metas == nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm the if err != nil && metas == nil {
feels like code smell to me though. It just looks wrong from outside which can lead to bugs in future:
- Inside implementation can change, and metas can be produced without partial for some reason - then we will miss an error.
- Even if we say it makes sense, maybe another person will come here and refactor it anyway as it looks like error can be missed.
On other hand, shouldn't replicate FAIL on any error spotted? Even if we failed to read one block? Sometimes we heavily rely on replication status so this has to be heavily tested and controlled, ideally with failed metric being somewhere (:
pkg/replicate/scheme.go
Outdated
return nil | ||
} | ||
for id, partialError := range partials { | ||
level.Info(rs.logger).Log("msg", "failed to fetch block meta. Skipping.", "block_uuid", id.String(), "err", partialError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this failure? This is actually valid state of blocks being potentially uploaded, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like level.Info(rs.logger).Log("msg", "block meta not uploaded yet. Skipping.", "block_uuid", id.String()) r
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PartialError has 2 cases, one is meta.json not found
, another one is meta.json corrupted
. What about the corrupted case, is it an error or not?
For different obj store implementation, is it possible that one uploaded file is corrupted? If it is this case, how can we differentiate the case that one file is totally corrupted or still being uploaded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrupted case, is it an error or not?
For current flow it's not error. We consider corrupted meta same as no meta.json as the reason is usually same - partial upload (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Updated.
ddc233c
to
c8140be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responded to comment, that still needs to be addressed, otherwise LGTM!
Thanks!
pkg/replicate/scheme.go
Outdated
return nil | ||
} | ||
for id, partialError := range partials { | ||
level.Info(rs.logger).Log("msg", "failed to fetch block meta. Skipping.", "block_uuid", id.String(), "err", partialError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrupted case, is it an error or not?
For current flow it's not error. We consider corrupted meta same as no meta.json as the reason is usually same - partial upload (:
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
c8140be
to
f1079be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Changelog entry could be better, but we can fix it later.
@@ -28,6 +28,12 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re | |||
- [#3331](https://github.com/thanos-io/thanos/pull/3331) Disable Azure blob exception logging | |||
- [#3341](https://github.com/thanos-io/thanos/pull/3341) Disable Azure blob syslog exception logging | |||
|
|||
### Changed | |||
|
|||
- [#2906](https://github.com/thanos-io/thanos/pull/2906) Tools: Refactor Bucket replicate execution. Removed all `thanos_replicate_origin_.*` metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically not need to mention refactor, just metric changes (:
Whoop 🥳 🎊 I'll work on #2979. |
Nice work @yeya24 |
💪 |
…hanos-io#2906) * Refactor replicate execution Signed-off-by: Ben Ye <yb532204897@gmail.com> * return error when fetcher gets an error Signed-off-by: Ben Ye <yb532204897@gmail.com> * address feedback Signed-off-by: Ben Ye <yb532204897@gmail.com> Signed-off-by: Oghenebrume50 <raphlbrume@gmail.com>
Signed-off-by: Ben Ye yb532204897@gmail.com
Changes
In the previous implementation, we are iterating the origin bucket, and each time, we use metaFetcher to fetch all meta files. But this seems not necessary, we can just fetch meta files once at the beginning.
Verification