Add Eager mode for ScaledJobs #5114

Open
junekhan opened this issue Oct 23, 2023 · 14 comments · May be fixed by #5872
Labels
feature-request, needs-discussion

Comments

@junekhan
Contributor

Proposal

KEDA doesn't scale the number of jobs up to the maximum, according to this piece of the documentation and the information around it. In our case, however, we want to launch as many jobs as possible and keep the queue as empty as possible. So, could we have an additional Eager mode so that users like me can shorten the overall waiting time?
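
To illustrate the idea, this is roughly how I would imagine opting in on a ScaledJob. The eager value for scalingStrategy.strategy is hypothetical and is the proposal itself; today the field only accepts default, custom, and accurate:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: worker
spec:
  maxReplicaCount: 10
  scalingStrategy:
    # Proposed: always scale out to the maximum allowed by maxReplicaCount
    # while there are pending messages in the queue.
    strategy: "eager"    # hypothetical value, not implemented today
  triggers:
    - type: rabbitmq
      metadata:
        queueName: worker_queue
        hostFromEnv: RABBITMQ_URL
        mode: QueueLength
        value: "1"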

Use-Case

We want to launch as many jobs as possible and keep the queue as empty as possible.

Is this a feature you are interested in implementing yourself?

Maybe

Anything else?

No response

@junekhan junekhan added the feature-request and needs-discussion labels Oct 23, 2023
@SpiritZhou
Contributor

Which scaler do you use? I think setting the queueLength parameter on the queue scaler can shorten the time.

@junekhan
Contributor Author

  triggers:
    - type: rabbitmq
      metadata:
        queueName: queue
        hostFromEnv: RABBITMQ_URL
        mode: QueueLength
        value: "1"
---

@SpiritZhou Thanks for your reply. We configured the scaler as above.
Increasing queueLength seems unacceptable to us, since it keeps pods running as long as they haven't consumed as many messages as the value assigned to queueLength. That means the resources allocated to these pods won't be reclaimed, even though we want the cluster to scale down as soon as there are no incoming messages.

@junekhan junekhan changed the title Add Eager model for ScaledJobs Add Eager mode for ScaledJobs Oct 26, 2023
@zroubalik
Member

What is your ScaledJob config? What do you set as a maxReplicaCount?

@junekhan
Contributor Author

---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: worker
spec:
  jobTargetRef:
    template:
      metadata:
        labels:
          app: worker
      spec:
        containers:
          - name: worker
            image: worker-image
            imagePullPolicy: Always
            resources:
              requests:
                memory: "4Gi"
                cpu: "1"
            env:
              - name: WORKER_TYPE
                value: pilot
              - name: RB_USERNAME
                valueFrom:
                  secretKeyRef:
                    name: rabbitmq-instance-default-user
                    key: username
              - name: RB_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: rabbitmq-instance-default-user
                    key: password
              - name: RABBITMQ_URL
                value: amqp://$(RB_USERNAME):$(RB_PASSWORD)@rabbitmq-instance.arc:5672
        restartPolicy: Never
    backoffLimit: 1
  pollingInterval: 10 # Optional. Default: 30 seconds
  maxReplicaCount: 10 # Optional. Default: 100
  successfulJobsHistoryLimit: 0 # Optional. Default: 100. How many completed jobs should be kept.
  failedJobsHistoryLimit: 5 # Optional. Default: 100. How many failed jobs should be kept.
  triggers:
    - type: rabbitmq
      metadata:
        queueName: worker_queue
        hostFromEnv: RABBITMQ_URL
        mode: QueueLength
        value: "1"

@zroubalik Basically like this.

@JorTurFer
Member

I have a question: if you want to process as much as possible and then finish, doesn't ScaledObject fit your use case better? You can set queueLength: 1, and all the instances will be removed at the end if you set minReplicaCount: 0.
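
Something like this minimal sketch, for illustration (the Deployment name worker and the queue name are assumptions, not taken from your setup):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker            # assumed Deployment running the consumers
  minReplicaCount: 0        # all instances are removed once the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: worker_queue
        hostFromEnv: RABBITMQ_URL
        mode: QueueLength
        value: "1"          # one replica per pending message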

@junekhan
Contributor Author

junekhan commented Oct 27, 2023

@JorTurFer Because these workers are inherently long-running, I naturally preferred ScaledJob to ScaledObject. This section concerned me, as pods can be terminated unexpectedly with a ScaledObject.

Getting back to ScaledJob, let's imagine a case with 3 running pods and another 3 messages waiting in line, where each one takes 3 hours or even longer to process. Wouldn't it be better to drain the queue and run 6 pods in parallel, within our affordable limit of 10 replicas?

@JorTurFer
Member

You are right, ScaledJob is the best fit for your use case.
In any case, if you set one message per pod, a job will be scheduled per message up to maxReplicaCount, so it's already as fast as possible, isn't that enough?
Based on your previous comments, you want to scale to max when there is any message, and scale to 0 when there isn't any, am I right?
If yes, you could tweak this behaviour using v2.12 and the experimental scaling modifiers feature (based on formulas). The best part is that you can use a ScaledObject, allowing scaling to 0 when there isn't any message in the queue.
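
A rough sketch of what I mean, assuming the trigger is given the name queue_depth so the formula can reference it (the names and the 10 in the formula are only illustrative and should match your maxReplicaCount):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker               # assumed consumer Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  advanced:
    scalingModifiers:
      # If there is any message at all, report enough "work" to reach
      # maxReplicaCount; otherwise report 0 so the workload scales to zero.
      formula: "queue_depth > 0 ? 10 : 0"
      target: "1"
      activationTarget: "0"
      metricType: AverageValue
  triggers:
    - type: rabbitmq
      name: queue_depth        # referenced by the formula above
      metadata:
        queueName: worker_queue
        hostFromEnv: RABBITMQ_URL
        mode: QueueLength
        value: "1"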

@junekhan
Contributor Author

junekhan commented Oct 30, 2023

@JorTurFer You are correct. Thanks for the solution. I will try it out.

@zroubalik
Member

Getting back to ScaledJob, let's imagine a case with 3 running pods and another 3 messages waiting in line, where each one takes 3 hours or even longer to process. Wouldn't it be better to drain the queue and run 6 pods in parallel, within our affordable limit of 10 replicas?

We should revisit the scaling behavior for ScaledJob as that ^ is something that should be doable.
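
For context, ScaledJob already exposes a scalingStrategy section where such a behaviour would live; today it accepts the built-in default, custom, and accurate strategies, for example:

spec:
  scalingStrategy:
    strategy: "custom"                       # built-in options: "default", "custom", "accurate"
    customScalingQueueLengthDeduction: 1     # used by the "custom" strategy only
    customScalingRunningJobPercentage: "0.5" # used by the "custom" strategy only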


stale bot commented Jan 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 1, 2024

stale bot commented Jan 9, 2024

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Jan 9, 2024
@zroubalik zroubalik reopened this Jan 10, 2024
@stale stale bot removed the stale label Jan 10, 2024

stale bot commented Mar 10, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 10, 2024
@zroubalik zroubalik removed the stale label Mar 11, 2024

stale bot commented May 10, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 10, 2024

stale bot commented May 18, 2024

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed May 18, 2024
@zroubalik zroubalik reopened this May 20, 2024
@zroubalik zroubalik removed the stale label May 20, 2024
@junekhan junekhan linked a pull request Jun 8, 2024 that will close this issue