-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
promclient: drop version check #1163
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall good - as we agreed on issue, but we some suggestions to make it work.
cmd/thanos/sidecar.go
Outdated
"Prometheus doesn't support flags endpoint, skip validation", "version", m.version.Original()) | ||
return nil | ||
} | ||
|
||
flags, err := promclient.ConfiguredFlags(ctx, logger, m.promURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add flow for distincting 404 from that will show that either it's not Prometheus (this will be caught later on, so fine). Maybe via simple sentinel erorr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid this has to retry ): As we removed first retry which helped us to wait until Prometheus is rdy (often the case). Thanks of first retry you removed we wouldn't need to retry here, as we know Prometheus must be rdy. This is not the case now without retry you removed ):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if my response is clear (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. The 404 is a bit fiddly, because we see this. But PTAL
pkg/promclient/promclient.go
Outdated
} | ||
|
||
if resp.StatusCode == 200 || resp.StatusCode == 404 { // 404, just give up | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say something like
var NotFoundFlags = errors.New("No flag endpoint found")
has to be returned here. And let caller decide what to do -> e.g we should log on 404 and, grab flags on 200 and error on anything else IMO (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being strict ):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'm not sure how you want to do that with the retry logic? If I understand that code correctly it retries upon seeing an error, meaning it will retry NotFoundFlags
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we need to pass it somehow. This question is just implementation detail. We either move retry logic to caller (I prefer that - more explicit) and just retry on certain errors or just pass error in separate var here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, moved and added a few things back to not make too many changes.
I think maybe we should keep the check but make the check to check for |
@povilasv we cannot. As users can build prom on their own so they can put whatever in a version. If they use promu and build from master it will have some commit + date + something version. Let's drop version check and try flags instead |
[ Quoting <notifications@github.com> in "Re: [improbable-eng/thanos] promcli..." ]
bwplotka commented on this pull request.
> - return Flags{}, errors.Errorf("got non-200 response code: %v, response: %v", resp.StatusCode, string(b))
+ if err := runutil.Retry(2*time.Second, ctx.Done(), func() error {
+
+ resp, err := http.DefaultClient.Do(req.WithContext(ctx))
+ if err != nil {
+ return errors.Wrapf(err, "request config against %s", u.String())
+ }
+ defer runutil.CloseWithLogOnErr(logger, resp.Body, "query body")
+
+ body, err = ioutil.ReadAll(resp.Body)
+ if err != nil {
+ return errors.New("failed to read body")
+ }
+
+ if resp.StatusCode == 200 || resp.StatusCode == 404 { // 404, just give up
+ return nil
Well we need to pass it somehow. This question is just implementation detail. We either move retry logic to caller (I prefer that - more explicit) and just retry on certain errors or just pass error in separate var here.
sgtm. To be clear, basically move the runutil.Retry to whatever is calling this
right now (don't have the code open atm)
|
The version check would check the prometheus version and if > 2.2.0 would query the flags endpoint to do some extra validation. Remove this for just checking the flags endpoints and if it responds do the validation; otherwise log the failure, but continue the startup. Signed-off-by: Miek Gieben <miek@miek.nl>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small knits, other than that LGTM 🥇
Signed-off-by: Miek Gieben <miek@miek.nl>
ok, done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 @bwplotka take another look when you have time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but still something we could improve. WDYT? @miekg
if m.version == nil { | ||
level.Warn(logger).Log("msg", "fetched version is nil or invalid. Unable to know whether Prometheus supports /version endpoint, skip validation") | ||
return nil | ||
flags := promclient.Flags{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, but it might be bit confusing - it's not really following the logical path, which can surprise reader. If there is no flags the whole function should just return. We workaround simple noFlagEndpoint bool
with assuming some values and unneccesary validating them below - prone to bug in future. But not a blocker, we can merge this and fix it later.
Happy to provide further fixes l, after potential merge.
I'm still wondering about the whole mechanism, as you can reload prom
configs on the fly...
…On Wed, 22 May 2019, 13:56 Bartek Płotka, ***@***.***> wrote:
***@***.**** approved this pull request.
LGTM, but still something we could improve. WDYT? @miekg
<https://github.com/miekg>
------------------------------
In cmd/thanos/sidecar.go
<#1163 (comment)>
:
> @@ -321,20 +305,27 @@ func runSidecar(
}
func validatePrometheus(ctx context.Context, logger log.Logger, m *promMetadata) error {
- if m.version == nil {
- level.Warn(logger).Log("msg", "fetched version is nil or invalid. Unable to know whether Prometheus supports /version endpoint, skip validation")
- return nil
+ flags := promclient.Flags{
This works, but it might be bit confusing - it's not really following the
logical path, which can surprise reader. If there is no flags the whole
function should just return. We workaround simple noFlagEndpoint bool
with assuming some values and unneccesary validating them below - prone to
bug in future. But not a blocker, we can merge this and fix it later.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1163>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACWIW2NQTGGNVQ5BMZ66I3PWU7I7ANCNFSM4HNVMDOA>
.
|
Is there really a problem with it? Min/max block duration in Prometheus are configured via command line options and AFAIK you can't change them without restarting the whole process, and if you are going to restart the whole process then Sidecar will get "disconnected" from Prometheus and it will recheck those again after getting "reconnected". Have I missed something? 🤔 |
ah, I wasn't aware it's a command line flag; that makes more sense then 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, but it might be bit confusing - it's not really following the logical path, which can surprise reader. If there is no flags the whole function should just return. We workaround simple noFlagEndpoint bool with assuming some values and unneccesary validating them below - prone to bug in future. But not a blocker, we can merge this and fix it later.
This needs to be fixed at some point, but let's merge this to unblock custom builds 👍
Thanks @miekg !
The version check would check the prometheus version and if > 2.2.0
would query the flags endpoint to do some extra validation.
Remove this for just checking the flags endpoints and if it responds do
the validation; otherwise log the failure, but continue the startup.
Signed-off-by: Miek Gieben miek@miek.nl
Changes
Verification