E2E test for cpu scaler #2441
Conversation
Hi @tomkerkhove @JorTurFer, PTAL.
LGTM!
Thanks for doing it ❤️
Please update the changelog (adding this PR under the Improvements section) and let me trigger it before merging :)
/run-e2e cpu.test.*
Hi @JorTurFer, @zroubalik, I've changed the image and updated the changelog.
/run-e2e cpu.test.*
@Ritikaa96 seems like the test is failing ^
Hi @zroubalik, I tried the same file and ran …
I don't think that …
/run-e2e cpu.test.*
@zroubalik, it seems the job is now deployed successfully, whereas it wasn't in the previous run. But now the next step is failing. Any suggestions on where it could be going wrong?
I can reproduce the error locally; let me check whether I can figure out a tip or a solution.
On my local setup, the generated load expires too early, and that's why it never reaches 5 instances.
Another tip to avoid waiting for 8 minutes is to use the advanced section of the ScaledObject; I reduced it to 3 minutes just by setting scaleDown.stabilizationWindowSeconds to 0. I have also been troubleshooting the test, and my solution is this:

import * as fs from 'fs'
import * as sh from 'shelljs'
import * as tmp from 'tmp'
import test from 'ava'
import { waitForDeploymentReplicaCount } from './helpers'

const testNamespace = 'cpu-test'
const deploymentFile = tmp.fileSync()
const triggerFile = tmp.fileSync()

test.before(t => {
  sh.config.silent = true
  sh.exec(`kubectl create namespace ${testNamespace}`)
  fs.writeFileSync(deploymentFile.name, deploymentYaml)
  t.is(
    0,
    sh.exec(`kubectl apply -f ${deploymentFile.name} --namespace ${testNamespace}`).code,
    'Deploying php deployment should work.'
  )
  t.is(0, sh.exec(`kubectl rollout status deploy/php-apache -n ${testNamespace}`).code, 'Deployment php rolled out successfully')
})
test.serial('Deployment should have 1 replica on start', t => {
  const replicaCount = sh.exec(
    `kubectl get deployment.apps/php-apache --namespace ${testNamespace} -o jsonpath="{.spec.replicas}"`
  ).stdout
  t.is(replicaCount, '1', 'replica count should start out as 1')
})
test.serial(`Creating Job should work`, async t => {
  fs.writeFileSync(triggerFile.name, triggerJob)
  t.is(
    0,
    sh.exec(`kubectl apply -f ${triggerFile.name} --namespace ${testNamespace}`).code,
    'creating job should work.'
  )
})
test.serial(`Deployment should scale to 4 after 3 minutes`, async t => {
  // check replica count under constant load:
  t.true(await waitForDeploymentReplicaCount(4, 'php-apache', testNamespace, 18, 10000), 'Replica count should be 4 after 3 minutes')
})
test.serial(`Deleting Job should work`, async t => {
  fs.writeFileSync(triggerFile.name, triggerJob)
  t.is(
    0,
    sh.exec(`kubectl delete -f ${triggerFile.name} --namespace ${testNamespace}`).code,
    'Deleting job should work.'
  )
})
test.serial(`Deployment should scale back to 1 after 3 minutes`, async t => {
  // check for the scale down:
  t.true(await waitForDeploymentReplicaCount(1, 'php-apache', testNamespace, 18, 10000), 'Replica count should be 1 after 3 minutes')
})
test.after.always.cb('clean up workload test related deployments', t => {
  const resources = [
    'deployment.apps/php-apache',
    'jobs.batch/trigger-job',
  ]
  for (const resource of resources) {
    sh.exec(`kubectl delete ${resource} --namespace ${testNamespace}`)
  }
  sh.exec(`kubectl delete namespace ${testNamespace}`)
  t.end()
})
const deploymentYaml = `apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
        imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaledobject
  labels:
    run: php-apache
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 0
  maxReplicaCount: 4
  scaleTargetRef:
    name: php-apache
  triggers:
  - type: cpu
    metadata:
      type: Utilization
      value: "50"`
const triggerJob = `apiVersion: batch/v1
kind: Job
metadata:
  name: trigger-job
  namespace: cpu-test
spec:
  template:
    spec:
      containers:
      - image: busybox
        name: test
        command: ["/bin/sh"]
        args: ["-c", "for i in $(seq 1 600);do wget -q -O- http://php-apache.cpu-test.svc/;sleep 0.1;done"]
      restartPolicy: Never
  activeDeadlineSeconds: 600
  backoffLimit: 3`

Basically, I have limited the max instances to 4, increased the load duration, deleted the load job after scaling out, and modified the HPA behavior to scale in faster, reducing the timeout from 8 minutes to 3. Feel free to use it, ignore it, or take some parts. BTW, thanks for helping with e2e tests ❤️❤️❤️❤️
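For context, waitForDeploymentReplicaCount is imported from the suite's ./helpers module; the calls above suggest an (expectedReplicas, name, namespace, iterations, intervalMs) signature. A minimal sketch of what such a polling helper could look like, assuming that signature (the repo's actual implementation may differ):

import * as sh from 'shelljs'

// Hypothetical sketch: poll the deployment's replica count until it equals
// `expected`, checking up to `iterations` times with `intervalMs` between tries.
export async function waitForDeploymentReplicaCount(
  expected: number, name: string, namespace: string,
  iterations: number, intervalMs: number): Promise<boolean> {
  for (let i = 0; i < iterations; i++) {
    const out = sh.exec(
      `kubectl get deployment.apps/${name} --namespace ${namespace} -o jsonpath="{.spec.replicas}"`,
      { silent: true }
    ).stdout
    if (parseInt(out, 10) === expected) return true
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  return false
}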
Hi @JorTurFer, thanks for taking the time to test it locally. Deleting the job midway and modifying the HPA behavior to scale in faster helped a lot in reducing the time; after the corrections, the whole test takes merely a few minutes. Thanks a lot for the suggestions, I wonder why I didn't think of this before. As the more logical bet, I've changed the constant expected-final-replica-count check to one that focuses on whether the deployment is scaling out or not, with: …
Also, on setting scaleDown.stabilizationWindowSeconds to 0 in the ScaledObject, the deployment immediately scales down to 0, consequently creating a problem with its scaling up on the trigger, though adding … PTAL! I hope this crosses out the apparent reason the test failed last time.
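The scale-out-focused check referenced above is elided from the thread; the review below calls it waitForDeploymentScaleUp. A plausible sketch, assuming it passes as soon as the replica count exceeds its starting value rather than matching an exact target (hypothetical, not necessarily the PR's actual code):

import * as sh from 'shelljs'

// Hypothetical variant of the polling helper sketched earlier: succeed as soon
// as the replica count grows past `startingCount`, instead of waiting for an
// exact final value.
export async function waitForDeploymentScaleUp(
  startingCount: number, name: string, namespace: string,
  iterations: number, intervalMs: number): Promise<boolean> {
  for (let i = 0; i < iterations; i++) {
    const out = sh.exec(
      `kubectl get deployment.apps/${name} --namespace ${namespace} -o jsonpath="{.spec.replicas}"`,
      { silent: true }
    ).stdout
    if (parseInt(out, 10) > startingCount) return true
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  return false
}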
/run-e2e cpu.test.*
LGTM, ❤️
Let's wait for the test execution 🤞
Only 2 small things:
- Please add the missing resources that I mentioned in the comment.
- Is waitForDeploymentScaleUp really needed? I mean, you could set maxReplicaCount to 2, and you would not need it anymore.
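For illustration, if maxReplicaCount were lowered to 2 as suggested, the scale-out assertion could presumably reuse the exact-count helper already used elsewhere in the test file above, along these lines (a sketch, assuming maxReplicaCount: 2 in the ScaledObject):

// With maxReplicaCount set to 2, the exact-count helper suffices;
// no separate scale-up check is needed.
test.serial(`Deployment should scale to 2 under load`, async t => {
  t.true(await waitForDeploymentReplicaCount(2, 'php-apache', testNamespace, 18, 10000),
    'Replica count should reach maxReplicaCount (2)')
})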
updated cpu.test.ts, Resolved conflicts from changelog.md
Hi @JorTurFer, PTAL.
/run-e2e cpu.test.*
LGTM!
Thanks for your contribution ❤️ ❤️ ❤️ ❤️
This PR adds E2E tests for the CPU scaler.
Checklist
Fixes #2219