Hello,
We encountered an issue with the scenario of two checks in the same policy: ScaleIn and ScaleNone => ScaleNone (https://www.nomadproject.io/docs/autoscaling/internals/checks#scalein-and-scalenone-scalenone).
My policy:

```hcl
scaling "cluster_node_app_LOW" {
  enabled = true
  min     = 6
  max     = 40

  policy {
    cooldown            = "5m"
    evaluation_interval = "2m"

    check "LOW_memory_allocated_percentage" {
      source = "nomad-apm"
      query  = "percentage-allocated_memory"

      strategy "threshold" {
        within_bounds_trigger = 1
        upper_bound           = 75
        lower_bound           = 0
        delta                 = -1
      }
    }

    check "LOW_cpu_allocated_percentage" {
      source = "nomad-apm"
      query  = "percentage-allocated_cpu"

      strategy "threshold" {
        within_bounds_trigger = 1
        upper_bound           = 75
        lower_bound           = 0
        delta                 = -1
      }
    }

    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "node_app"
      node_class          = "app"
      node_drain_deadline = "2m"
      node_purge          = "true"
    }
  }
}
```
Here is an extract of the logs:

```
2021-09-28T13:43:16.197Z [DEBUG] internal_plugin.nomad-apm: collected node pool resource data: allocated_cpu=307666 allocated_memory=536466 allocatable_cpu=332500 allocatable_memory=1076880
2021-09-28T13:43:16.198Z [TRACE] policy_eval.worker.check_handler: metric result: check=LOW_cpu_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg ts="2021-09-28 13:43:16.198019281 +0000 UTC m=+622.419012236" value=92.53112781954887
2021-09-28T13:43:16.198Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=LOW_cpu_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg count=36
2021-09-28T13:43:16.198Z [TRACE] internal_plugin.threshold: checking how many data points are within bounds: actionType=delta check_name=LOW_cpu_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:16.198Z [TRACE] internal_plugin.threshold: found 0 data points within bounds: actionType=delta check_name=LOW_cpu_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:16.198Z [TRACE] internal_plugin.threshold: not enough data points within bounds: actionType=delta check_name=LOW_cpu_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:16.198Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=LOW_memory_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg
2021-09-28T13:43:16.198Z [DEBUG] policy_eval.worker.check_handler: querying source: check=LOW_memory_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg query=node_percentage-allocated_memory/app/class source=nomad-apm
2021-09-28T13:43:16.198Z [DEBUG] internal_plugin.nomad-apm: performing node pool APM query: query=node_percentage-allocated_memory/app/class
2021-09-28T13:43:17.535Z [DEBUG] internal_plugin.nomad-apm: collected node pool resource data: allocated_cpu=307666 allocated_memory=536466 allocatable_cpu=332500 allocatable_memory=1076880
2021-09-28T13:43:17.535Z [TRACE] policy_eval.worker.check_handler: metric result: check=LOW_memory_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg ts="2021-09-28 13:43:17.53552563 +0000 UTC m=+623.756518575" value=49.81669266770671
2021-09-28T13:43:17.535Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=LOW_memory_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg count=36
2021-09-28T13:43:17.535Z [TRACE] internal_plugin.threshold: checking how many data points are within bounds: actionType=delta check_name=LOW_memory_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:17.535Z [TRACE] internal_plugin.threshold: found 1 data points within bounds: actionType=delta check_name=LOW_memory_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:17.535Z [TRACE] internal_plugin.threshold: calculating new count: actionType=delta check_name=LOW_memory_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:17.535Z [TRACE] internal_plugin.threshold: calculated scaling strategy results: actionType=delta check_name=LOW_memory_allocated_percentage current_count=36 lower_bound=0 upper_bound=75 new_count=35 direction=down
2021-09-28T13:43:17.535Z [TRACE] policy_eval.worker: check LOW_memory_allocated_percentage selected: id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster target=aws-asg direction=down count=35
2021-09-28T13:43:17.535Z [INFO] policy_eval.worker: scaling target: id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster target=aws-asg from=36 to=35 reason="scaling down because metric is within bounds" meta=map[nomad_policy_id:badcf8cd-1373-efc7-7444-a6aa0d1aa134]
```
We are not expecting a scale in here, because the CPU check has no data points within bounds.
We think the problem could be here:

nomad-autoscaler/plugins/builtin/strategy/threshold/plugin/plugin.go (line 105 in 8daf57d)

Something like this would help to have an eval with a direction instead of a null value:

```go
eval.Action.Direction = sdk.ScaleDirectionNone
return eval, nil
```
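To make the behaviour we expect easier to follow, here is a rough, self-contained sketch of the resolution rule from the docs linked above. It is not the autoscaler's actual code; the types and the `resolve` function are made up for illustration, and a `nil` direction stands in for a check that returns no evaluation at all:

```go
package main

import "fmt"

// ScaleDirection is a stand-in for the SDK's scale direction values.
type ScaleDirection int

const (
	ScaleDirectionNone ScaleDirection = iota
	ScaleDirectionDown
	ScaleDirectionUp
)

func (d ScaleDirection) String() string {
	return [...]string{"ScaleNone", "ScaleIn", "ScaleOut"}[d]
}

// checkResult models what a single check hands back to the policy worker.
// A nil direction stands in for a check that returned no evaluation.
type checkResult struct {
	name      string
	direction *ScaleDirection
}

// resolve applies the documented resolution rules: ScaleOut always wins,
// and ScaleIn + ScaleNone resolves to ScaleNone. A check that returned no
// evaluation is simply skipped, so it cannot veto a scale in.
func resolve(results []checkResult) ScaleDirection {
	decided := false
	decision := ScaleDirectionNone
	for _, r := range results {
		if r.direction == nil {
			continue // nothing to compare against
		}
		switch {
		case !decided:
			decision, decided = *r.direction, true
		case *r.direction == ScaleDirectionUp:
			decision = ScaleDirectionUp
		case *r.direction == ScaleDirectionNone && decision == ScaleDirectionDown:
			decision = ScaleDirectionNone
		}
	}
	return decision
}

func main() {
	down, none := ScaleDirectionDown, ScaleDirectionNone

	// What the logs show: the CPU check returns nothing, so the memory
	// check's scale in is selected.
	fmt.Println(resolve([]checkResult{
		{name: "LOW_cpu_allocated_percentage", direction: nil},
		{name: "LOW_memory_allocated_percentage", direction: &down},
	})) // prints: ScaleIn

	// With the suggested change the CPU check returns an explicit
	// ScaleDirectionNone, and ScaleIn + ScaleNone => ScaleNone.
	fmt.Println(resolve([]checkResult{
		{name: "LOW_cpu_allocated_percentage", direction: &none},
		{name: "LOW_memory_allocated_percentage", direction: &down},
	})) // prints: ScaleNone
}
```

In other words, when the CPU check returns an explicit ScaleNone, the documented ScaleIn and ScaleNone => ScaleNone rule applies and no scale in happens; when it returns nothing at all, the memory check's scale in wins, which matches the logs above.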
Let me know if you need more info.
Thank you for your support.
Hi @jonathanlambert-iadvize, and thanks for the amazing detail in this issue. I'll label it up and hopefully we can look into this soon.
Thank you @jonathanlambert-iadvize, your analysis was spot on 🙂