ScaleNone expected but scaleIn occurs #529

Closed · jonathanlambert-iadvize opened this issue Sep 28, 2021 · 2 comments · Fixed by #537

Comments

jonathanlambert-iadvize commented Sep 28, 2021

Hello,

We encountered an issue with the scenario where a single policy has two checks: ScaleIn and ScaleNone should resolve to ScaleNone (https://www.nomadproject.io/docs/autoscaling/internals/checks#scalein-and-scalenone-scalenone).

My policy:

scaling "cluster_node_app_LOW" {
  enabled = true
  min     = 6
  max     = 40
  policy {
    cooldown            = "5m"
    evaluation_interval = "2m"

    check "LOW_memory_allocated_percentage" {
      source = "nomad-apm"
      query  = "percentage-allocated_memory"
      strategy "threshold" {
        within_bounds_trigger = 1
        upper_bound           = 75
        lower_bound           = 0
        delta                 = -1
      }
    }

    check "LOW_cpu_allocated_percentage" {
      source = "nomad-apm"
      query  = "percentage-allocated_cpu"
      strategy "threshold" {
        within_bounds_trigger = 1
        upper_bound           = 75
        lower_bound           = 0
        delta                 = -1
      }
    }

      target "aws-asg" {
        dry-run             = "false"
        aws_asg_name        = "node_app"
        node_class          = "app"
        node_drain_deadline = "2m"
        node_purge          = "true"
    }
  }
}
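
For context, each threshold check fires its delta only when enough of the queried data points fall inside the configured bounds; otherwise it should result in no action. Roughly, the semantics look like this sketch (illustrative names, and the bound inclusivity here is an assumption, not the actual plugin source):

// thresholdDelta sketches the threshold strategy semantics: apply the
// configured delta only when at least withinBoundsTrigger data points
// fall inside the bounds; otherwise report "no action".
// Illustrative only; the bound inclusivity is an assumption.
func thresholdDelta(points []float64, lower, upper float64, withinBoundsTrigger int, delta int64) (int64, bool) {
	within := 0
	for _, p := range points {
		if p >= lower && p < upper {
			within++
		}
	}
	if within < withinBoundsTrigger {
		return 0, false // not enough data points within bounds => no action
	}
	return delta, true
}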

Here is an extract of the logs:

2021-09-28T13:43:16.197Z [DEBUG] internal_plugin.nomad-apm: collected node pool resource data: allocated_cpu=307666 allocated_memory=536466 allocatable_cpu=332500 allocatable_memory=1076880
2021-09-28T13:43:16.198Z [TRACE] policy_eval.worker.check_handler: metric result: check=LOW_cpu_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg ts="2021-09-28 13:43:16.198019281 +0000 UTC m=+622.419012236" value=92.53112781954887
2021-09-28T13:43:16.198Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=LOW_cpu_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg count=36
2021-09-28T13:43:16.198Z [TRACE] internal_plugin.threshold: checking how many data points are within bounds: actionType=delta check_name=LOW_cpu_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:16.198Z [TRACE] internal_plugin.threshold: found 0 data points within bounds: actionType=delta check_name=LOW_cpu_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:16.198Z [TRACE] internal_plugin.threshold: not enough data points within bounds: actionType=delta check_name=LOW_cpu_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:16.198Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=LOW_memory_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg
2021-09-28T13:43:16.198Z [DEBUG] policy_eval.worker.check_handler: querying source: check=LOW_memory_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg query=node_percentage-allocated_memory/app/class source=nomad-apm
2021-09-28T13:43:16.198Z [DEBUG] internal_plugin.nomad-apm: performing node pool APM query: query=node_percentage-allocated_memory/app/class
2021-09-28T13:43:17.535Z [DEBUG] internal_plugin.nomad-apm: collected node pool resource data: allocated_cpu=307666 allocated_memory=536466 allocatable_cpu=332500 allocatable_memory=1076880
2021-09-28T13:43:17.535Z [TRACE] policy_eval.worker.check_handler: metric result: check=LOW_memory_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg ts="2021-09-28 13:43:17.53552563 +0000 UTC m=+623.756518575" value=49.81669266770671
2021-09-28T13:43:17.535Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=LOW_memory_allocated_percentage id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster source=nomad-apm strategy=threshold target=aws-asg count=36
2021-09-28T13:43:17.535Z [TRACE] internal_plugin.threshold: checking how many data points are within bounds: actionType=delta check_name=LOW_memory_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:17.535Z [TRACE] internal_plugin.threshold: found 1 data points within bounds: actionType=delta check_name=LOW_memory_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:17.535Z [TRACE] internal_plugin.threshold: calculating new count: actionType=delta check_name=LOW_memory_allocated_percentage current_count=36 lower_bound=0 upper_bound=75
2021-09-28T13:43:17.535Z [TRACE] internal_plugin.threshold: calculated scaling strategy results: actionType=delta check_name=LOW_memory_allocated_percentage current_count=36 lower_bound=0 upper_bound=75 new_count=35 direction=down
2021-09-28T13:43:17.535Z [TRACE] policy_eval.worker: check LOW_memory_allocated_percentage selected: id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster target=aws-asg direction=down count=35
2021-09-28T13:43:17.535Z [INFO]  policy_eval.worker: scaling target: id=1a4b47f2-026e-6cf8-aa0a-6117abbbd741 policy_id=badcf8cd-1373-efc7-7444-a6aa0d1aa134 queue=cluster target=aws-asg from=36 to=35 reason="scaling down because metric is within bounds" meta=map[nomad_policy_id:badcf8cd-1373-efc7-7444-a6aa0d1aa134]

We are not expecting a ScaleIn here because the CPU check has no data points within bounds.
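
Per the check resolution rule linked above, an explicit ScaleNone from one check should win over a ScaleIn from another. As a minimal sketch with illustrative names (not the autoscaler's real API):

// Sketch of the documented resolution between two check directions;
// the type and constant names are illustrative, not the real SDK.
type Direction int

const (
	None Direction = iota // ScaleNone
	Down                  // ScaleIn
	Up                    // ScaleOut
)

func combine(a, b Direction) Direction {
	if a == Up || b == Up {
		return Up // ScaleOut wins over everything else
	}
	if a == None || b == None {
		return None // ScaleIn and ScaleNone => ScaleNone
	}
	return Down
}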

We think the problem could be in the threshold plugin, on the early-return path that is taken when there are not enough data points within bounds.

Something like this would help to return an eval with an explicit direction instead of a null value:

eval.Action.Direction = sdk.ScaleDirectionNone
return eval, nil
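
In context, that early return would then look roughly like this (the surrounding shape is assumed; only the direction assignment and the return are from the suggestion above):

// Assumed shape of the plugin's "not enough data points" path; only
// the direction assignment and return are from the suggestion above.
if within < withinBoundsTrigger {
	// Without an explicit direction, the worker cannot distinguish
	// "take no action" (ScaleNone) from "no result", so the other
	// check's ScaleIn wins the comparison.
	eval.Action.Direction = sdk.ScaleDirectionNone
	return eval, nil
}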

Let me know if you need more info.

Thank you for your support.

jrasell (Member) commented Oct 28, 2021

Hi @jonathanlambert-iadvize and thanks for the amazing detail in this issue. I'll label it up and hopefully we can take a look into this soon.

lgfa29 (Contributor) commented Nov 12, 2021

Thank you @jonathanlambert-iadvize, your analysis was spot on 🙂
