Drop error log for "job scaling blocked due to active deployment" #515

Closed
evandam opened this issue Jul 19, 2021 · 3 comments · Fixed by #542

Comments


evandam commented Jul 19, 2021

Just a slight quality-of-life request - can logs for this be dropped to WARN or maybe even INFO?

To me this seems like expected behavior rather than an error. It would be nice to lower the severity so that alerts for real errors don't trip on it and it doesn't have to be filtered out separately.

2021-07-19T16:37:34.048Z [ERROR] policy_eval.worker: failed to evaluate policy: eval_id=70c8ea00-1c40-9ffb-f732-29cddb212153 eval_token=8e13fa3c-8c68-2aa8-da50-be561332f67e id=3fe0dfca-376c-6a37-deba-dd84d95e9a2b policy_id=0b03eba3-153e-4c7f-dab9-325d3cff760e queue=horizontal error="failed to scale target: failed to scale group /: Unexpected response code: 400 (job scaling blocked due to active deployment)"

Low priority - just a thought. Thanks folks!
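
Until the level changes, the filtering mentioned above could be done on the alerting side. This is a minimal sketch, assuming log lines follow the `<timestamp> [LEVEL] <logger>: <message>` layout of the sample line; `should_alert` is a hypothetical helper, not part of the Autoscaler or any alerting tool.

```python
import re

# Assumed layout, inferred from the sample log line in this issue:
# "<timestamp> [LEVEL] <logger>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<level>[A-Z]+)\]\s+(?P<logger>[\w.]+): (?P<msg>.*)$"
)

# Message fragment quoted in the issue title.
EXPECTED_FRAGMENT = "job scaling blocked due to active deployment"


def should_alert(line: str) -> bool:
    """Return True only for ERROR lines that are not the expected
    'blocked due to active deployment' case (hypothetical helper)."""
    match = LOG_LINE.match(line)
    if match is None:
        # Lines that don't fit the assumed layout are ignored here.
        return False
    if match.group("level") != "ERROR":
        return False
    return EXPECTED_FRAGMENT not in match.group("msg")
```

Matching on the message text is brittle if the wording ever changes, which is part of why lowering the level at the source would be nicer.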

Contributor

lgfa29 commented Jul 19, 2021

Thanks @evandam. I think dropping the log level in this case makes sense: it's an error from Nomad, but kind of expected at the Autoscaler level.

Author

evandam commented Aug 20, 2021

Hey @lgfa29

I'm not sure how this is hooked up, but I was curious if dropping the log level would also prevent showing error events here? To me it seems like the same deal where it's not really an error, but maybe good to know still.

[Screenshot: Screen Shot 2021-08-20 at 1 12 54 PM]

Contributor

lgfa29 commented Aug 20, 2021

Yes, they are kind of related. If you try to scale a job during a deployment, the Nomad endpoint returns an error because the scaling action can't proceed.

This error is then logged in the Autoscaler since the evaluation didn't complete successfully.

We could drop that scaling event in this case as well if it makes sense 👍
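
For illustration, the kind of check being discussed might look roughly like the sketch below. It is written in Python purely as an illustration and is not the Autoscaler's actual code; `level_for_scale_error` is a hypothetical name, and matching on the message text is an assumption based on the log line quoted in this issue.

```python
import logging

# Message fragment taken from the log line quoted in this issue.
BLOCKED_BY_DEPLOYMENT = "job scaling blocked due to active deployment"


def level_for_scale_error(err: Exception) -> int:
    """Pick a log level for a failed scaling attempt (hypothetical helper).

    A scale request rejected because a deployment is in progress is
    expected from the Autoscaler's point of view, so it is logged below
    ERROR; anything else keeps the ERROR level.
    """
    if BLOCKED_BY_DEPLOYMENT in str(err):
        return logging.WARNING
    return logging.ERROR


# Usage: log the failure at the chosen level.
logger = logging.getLogger("policy_eval.worker")
err = RuntimeError(
    "failed to scale target: Unexpected response code: 400 "
    "(job scaling blocked due to active deployment)"
)
logger.log(level_for_scale_error(err), "failed to evaluate policy: %s", err)
```

The same branch would be a natural place to decide whether to skip the scaling event as well.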
