Volume missing from job spec after updating scaled job manifest #1894

Closed
choppedpork opened this issue Jun 16, 2021 · 9 comments
Labels: bug (Something isn't working)

@choppedpork

choppedpork commented Jun 16, 2021

Report

Following a seemingly unrelated change to a ScaledJob configuration, the volumes entry disappeared from the manifest within the cluster, resulting in jobs failing to launch. This is the abbreviated change I was deploying (obtained via helm diff revision):

  # Source: a-b-c/templates/scaledjob.yaml
  apiVersion: keda.sh/v1alpha1
  kind: ScaledJob
  metadata:
(...)
  spec:
    jobTargetRef:
      parallelism: 1
      activeDeadlineSeconds: 86400
      backoffLimit: 5
      template:
        metadata:
          annotations:
            "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
            prometheus.io/path: /management/prometheus
            prometheus.io/scrape: "true"
-           latest-timestamp: "20210615092819"
+           latest-timestamp: "20210616120657"
(...)
          containers:
            - name: a-b-c
              securityContext:
                capabilities:
                  drop:
                  - ALL
                readOnlyRootFilesystem: true
-               runAsNonRoot: true
-               runAsUser: 1000
              image: "a-b-c:latest"
              imagePullPolicy: Always
(...)
              volumeMounts:
              - mountPath: /x/storage/media
                name: media
                subPath: media
(...)
          volumes:
          - name: media
            nfs:
              server: 1.2.3.4
              path: /storage
          restartPolicy: Never
    pollingInterval: 30
    successfulJobsHistoryLimit: 5
    failedJobsHistoryLimit: 5
    maxReplicaCount: 40
    scalingStrategy:
      strategy: accurate
    triggers:
      - type: prometheus
        metadata:
          serverAddress: http://monitoring-prometheus1:9090
          metricName: job_queue_size
          query: sum(job_queue_count{environment="qa3", service="a-b-c"}) by (service)
          threshold: "1"
(...)

Expected Behavior

Change deployed successfully

Actual Behavior

Even though the ScaledJob manifest was rendered as expected by helm (and persisted as such in the helm release data), the actual scaled job object was missing the volumes section of the job spec, causing the following error:

{
  "level": "error",
  "ts": 1623840367.1724842,
  "logger": "scaleexecutor",
  "msg": "Failed to create a new Job",
  "scaledJob.Name": "a-b-job",
  "scaledJob.Namespace": "a",
  "error": "Job.batch \"a-b-job-cs42p\" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"media\"",
  "stacktrace": "github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale\n\t/workspace/pkg/scaling/executor/scale_jobs.go:48\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers\n\t/workspace/pkg/scaling/scale_handler.go:209\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop\n\t/workspace/pkg/scaling/scale_handler.go:146"
}
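
The error above is Kubernetes validation rejecting a Job whose container mounts a volume name that has no matching entry under volumes. A minimal Python sketch of the same consistency check, which could be run against the pod spec of the stored ScaledJob before KEDA tries to create a Job from it. The function name `find_dangling_volume_mounts` and the sample dicts are hypothetical; the dicts only mirror the abbreviated manifest above, once with and once without the volumes key:

```python
def find_dangling_volume_mounts(pod_spec):
    """Return (field_path, name) pairs for volumeMounts with no matching volume."""
    # Names declared under spec.template.spec.volumes (empty if the key is absent).
    declared = {volume["name"] for volume in pod_spec.get("volumes") or []}
    dangling = []
    for kind in ("initContainers", "containers"):
        for ci, container in enumerate(pod_spec.get(kind) or []):
            for mi, mount in enumerate(container.get("volumeMounts") or []):
                if mount["name"] not in declared:
                    path = f"spec.template.spec.{kind}[{ci}].volumeMounts[{mi}].name"
                    dangling.append((path, mount["name"]))
    return dangling


# Hypothetical pod spec as it appeared in the cluster: volumeMounts present, volumes gone.
broken_pod_spec = {
    "containers": [
        {
            "name": "a-b-c",
            "image": "a-b-c:latest",
            "volumeMounts": [
                {"mountPath": "/x/storage/media", "name": "media", "subPath": "media"}
            ],
        }
    ],
    "restartPolicy": "Never",
}

# The same pod spec as rendered by helm, with the volumes section intact.
intact_pod_spec = dict(
    broken_pod_spec,
    volumes=[{"name": "media", "nfs": {"server": "1.2.3.4", "path": "/storage"}}],
)

print(find_dangling_volume_mounts(broken_pod_spec))
print(find_dangling_volume_mounts(intact_pod_spec))
```

The reported field path is built to resemble the one in the error message, so a non-empty result for the stored object would point at the same mount that the Job creation fails on.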

After spending some time analysing the differences between the two environments the change was deployed to (the successful one and the failed one), as well as the changes between the revisions, I felt I had hit a dead end in terms of investigating. So I manually removed the scaled job spec using kubectl and redeployed with helm. Apart from the constantly changing latest-timestamp annotation and the re-creation of the scaled job there were no changes... and everything is back to normal.

Steps to Reproduce the Problem

At the moment I do not have one - this issue manifested only in one of the two clusters I've deployed the change to (FWIW the other cluster is GKE whereas the issue occurred in EKS). I do not currently have the capacity to test this elsewhere, unfortunately - but do let me know if that's going to be required and I should be able to arrange something in the next few days.

Logs from KEDA operator

No response

KEDA Version

2.2.0

Kubernetes Version

1.19

Platform

Amazon Web Services

Scaler Details

prometheus

Anything else?

I do not have a deep understanding of the CRD lifecycle - I'm running on an assumption here that helm submits the manifest which is then processed by the operator and an object is created, rather than this being possibly a bug in helm where the volumes section was somehow skipped (even though it's present in the manifest for this release). Hope I got this right and apologies for polluting your issues if not!

choppedpork added the bug label Jun 16, 2021
@TsuyoshiUshio
Contributor

Hi @choppedpork

Thank you for the detailed report!

I had a look at the code. We just deep copy the jobTargetRef and then create a new job from it. Both the definition (https://github.com/kedacore/keda/blob/main/api/v1alpha1/scaledjob_types.go#L29) and the logic (https://github.com/kedacore/keda/blame/main/pkg/scaling/executor/scale_jobs.go#L106) haven't changed for a while, and I can't find a particular point where this could go wrong. Did the volume exist at the time, and was there no outage?

Can we wait until you or someone else can reproduce the issue? Then we can investigate more.
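
To illustrate the point about the deep copy: a deep copy reproduces whatever is in the stored template, so a Job built this way can only lack volumes if the stored ScaledJob already lacked them. The following is a rough Python analogue, not KEDA's actual code (KEDA is written in Go); `build_job_from_scaled_job` and the sample objects are hypothetical stand-ins:

```python
import copy


def build_job_from_scaled_job(scaled_job):
    """Hypothetical stand-in: copy jobTargetRef verbatim into a new Job dict."""
    job_spec = copy.deepcopy(scaled_job["spec"]["jobTargetRef"])
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"generateName": scaled_job["metadata"]["name"] + "-"},
        "spec": job_spec,
    }


# Hypothetical stored ScaledJob with the volumes section present.
stored_with_volumes = {
    "metadata": {"name": "a-b-job"},
    "spec": {
        "jobTargetRef": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "a-b-c", "volumeMounts": [{"name": "media"}]}
                    ],
                    "volumes": [{"name": "media", "nfs": {"server": "1.2.3.4"}}],
                }
            }
        }
    },
}

# The same object with volumes already missing before the copy happens.
stored_without_volumes = copy.deepcopy(stored_with_volumes)
del stored_without_volumes["spec"]["jobTargetRef"]["template"]["spec"]["volumes"]

job_ok = build_job_from_scaled_job(stored_with_volumes)
job_broken = build_job_from_scaled_job(stored_without_volumes)

print("volumes" in job_ok["spec"]["template"]["spec"])
print("volumes" in job_broken["spec"]["template"]["spec"])
```

Under this assumption, the copy step cannot drop a single key on its own, which suggests looking at what the stored ScaledJob object contains rather than at the Job creation path.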

@choppedpork
Author

@TsuyoshiUshio thanks for having a look. The volume did indeed exist at the time: after removing and redeploying the scaled job definition everything was back to normal. I was quite surprised by this, especially as the volumes key was the only key missing (i.e. the spec wasn't just abruptly truncated). My assumption was that the jobspec part of the manifest would just be copied verbatim, but I wondered if there could be a templating engine (or similar) somewhere in the pipeline that hit an odd edge case. I'm even more surprised this happened now that I know that is not the case!

More than happy to park this until another report surfaces - I have not seen the issue again myself.

@choppedpork
Author

Some more (though I appreciate almost certainly not enough) info, as I've just hit this issue again. This is the same cluster but a different application to which I've added scaledJob support, so some variables are now removed from the equation:

  • in this case the job uses two volumes, one being an emptyDir, so I think we can discount network storage unavailability
  • the only change between the subsequent deployments of this scaledJob (after introducing a Keda object into the chart) was the tag/version of the container in the job spec
  • this scaledJob is almost identical to the one I hit the issue with previously, the only differences being volumes, port and image

After trawling through the application logs I can confirm that a few jobs were successfully launched since adding this scaledJob. The volumes "disappeared" as part of a deployment where the only change was the docker image tag for the job container. Considering this, it should, at least theoretically, be relatively straightforward for me to try and reproduce this by changing the tag. Are there any settings (e.g. logging) I could tweak, or any other debugging I could do, to help triage this further?

@tomkerkhove
Member

@zroubalik should be able to help with that

@zroubalik
Member

zroubalik commented Jun 24, 2021

Log levels: https://github.com/kedacore/keda/blob/main/BUILD.md#setting-log-levels

Just to confirm, this is happening on one cluster with each ScaledJob that you apply there? Is this a fresh installation of KEDA or was it updated from some earlier version? I recommend that you fully uninstall KEDA and install the new 2.3.0 version. Also make sure that you have actually removed the KEDA CRDs as part of the uninstall (kubectl get crd | grep keda).

I am just guessing, but the only thing that might be somehow related is this change: #1686, so let's be sure that you have up-to-date CRDs. But honestly I don't believe that's related, it's just a guess 🤷‍♂️

The problem is indeed very strange.

@choppedpork
Author

Thanks @zroubalik!

This has so far only happened on one cluster and I've had an issue with both of the two ScaledJobs we currently have there. We've started our Keda adventure on 2.2 so no upgrades yet - will try to get it up to 2.3 asap and bump the operator log level to debug.

I've had a look at both #1686 and the kubernetes-sigs/controller-tools bug and I'm increasingly thinking the problem is more likely to be happening within k8s itself or even helm (even though I've discarded that possibility early on looking at generated release manifests). One variable I haven't properly considered yet is that the GKE cluster I've not seen this issue in is on 1.18 whereas the one where I've seen this happen is EKS 1.19... which just adds to the possibility that the issue might be occurring outside of Keda entirely. I'll try to dedicate some time to coming up with a reliable repro scenario.

@choppedpork
Author

choppedpork commented Jul 15, 2021

edit: crossed out information subsequently found to be completely irrelevant and caused by a templating bug

The plot, unfortunately, thickens. My colleague just hit this issue on three separate clusters. Since my last update we've updated to k8s 1.20 and are using KEDA 2.3 (via the 2.3.2 helm chart). These clusters are all in EKS. Interestingly (and rather annoyingly) the issue is 100% reproducible in all of them, i.e. we can't actually get it to work: my previous workaround of removing the scaled job definition did not work in any of these cases, and neither did a full undeploy of KEDA (incl. removing the CRDs) followed by a redeploy. The good news is that I've now got some debug logs captured; the bad news is that there's nothing out of the ordinary in them (see below). We've ruled out this being a helm issue this time by attempting to create a new scaledjob (using the same spec) via kubectl apply. After applying, the spec is exactly how we'd expect it to be... except for the volumes key, which is missing :/
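
The comparison described above (applied spec versus what ends up in the cluster) can be made mechanical. A minimal Python sketch, assuming both sides are available as parsed dicts, for example the rendered manifest on one side and the output of kubectl get ... -o json on the other. The function name `missing_paths` and the sample data are hypothetical:

```python
def missing_paths(desired, live, prefix=""):
    """List paths that exist in `desired` but are absent from `live`.

    Extra fields that only exist in `live` (status, managed metadata, defaults)
    are ignored on purpose, since the API server adds those itself.
    """
    missing = []
    if isinstance(desired, dict):
        if not isinstance(live, dict):
            return [prefix or "."]
        for key, value in desired.items():
            path = f"{prefix}.{key}" if prefix else key
            if key not in live:
                missing.append(path)
            else:
                missing.extend(missing_paths(value, live[key], path))
    elif isinstance(desired, list):
        if not isinstance(live, list):
            return [prefix or "."]
        for index, item in enumerate(desired):
            path = f"{prefix}[{index}]"
            if index >= len(live):
                missing.append(path)
            else:
                missing.extend(missing_paths(item, live[index], path))
    return missing


# Hypothetical, heavily abbreviated version of the manifest that was applied.
desired = {
    "spec": {
        "jobTargetRef": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "a-b-c", "volumeMounts": [{"name": "media"}]}
                    ],
                    "volumes": [{"name": "media", "nfs": {"server": "1.2.3.4"}}],
                    "restartPolicy": "Never",
                }
            }
        }
    }
}

# Hypothetical live object: same content minus volumes, plus a server-side field.
live = {
    "spec": {
        "jobTargetRef": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "a-b-c", "volumeMounts": [{"name": "media"}]}
                    ],
                    "restartPolicy": "Never",
                }
            }
        }
    },
    "status": {},
}

print(missing_paths(desired, live))
```

Only the desired-to-live direction is checked, so the result stays readable even though a live object carries many fields that the manifest never set.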

In the below logs a-b-job is the one having issues (as it depends on a volume mount) whereas x-z-job works fine - it's almost identical in terms of the spec (differences are image and port) but does not use volumes.

I0715 12:21:20.548367       1 request.go:655] Throttling request took 1.04769519s, request: GET:https://172.20.0.1:443/apis/node.k8s.io/v1?timeout=32s
{"level":"info","ts":1626351681.6014469,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1626351681.6026566,"logger":"controllers.ScaledObject","msg":"Running on Kubernetes 1.20+","version":"v1.20.4-eks-6b7464"}
{"level":"info","ts":1626351681.6028056,"logger":"setup","msg":"Starting manager"}
{"level":"info","ts":1626351681.6028216,"logger":"setup","msg":"KEDA Version: 2.3.0"}
{"level":"info","ts":1626351681.6028247,"logger":"setup","msg":"Git Commit: 29febffc1ec32f41836082c587ba860b7c59d3cd"}
{"level":"info","ts":1626351681.6028273,"logger":"setup","msg":"Go Version: go1.15.6"}
{"level":"info","ts":1626351681.6028368,"logger":"setup","msg":"Go OS/Arch: linux/amd64"}
I0715 12:21:21.603024       1 leaderelection.go:243] attempting to acquire leader lease foo/operator.keda.sh...
{"level":"info","ts":1626351681.6030254,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
I0715 12:21:39.040124       1 leaderelection.go:253] successfully acquired lease foo/operator.keda.sh
{"level":"debug","ts":1626351699.0401707,"logger":"controller-runtime.manager.events","msg":"Normal","object":{"kind":"ConfigMap","namespace":"foo","name":"operator.keda.sh","uid":"68b2b651-cad0-42ad-89c0-e1bb60a2e523","apiVersion":"v1","resourceVersion":"257082"},"reason":"LeaderElection","message":"keda-operator-74b6c8d7c8-q6zpn_c8aa6d97-2626-408a-9089-6c3c9a30b47c became leader"}
{"level":"info","ts":1626351699.0403023,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"keda.sh","reconcilerKind":"TriggerAuthentication","controller":"triggerauthentication","source":"kind source: /, Kind="}
{"level":"info","ts":1626351699.0403106,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledJob","controller":"scaledjob","source":"kind source: /, Kind="}
{"level":"info","ts":1626351699.04029,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"keda.sh","reconcilerKind":"ClusterTriggerAuthentication","controller":"clustertriggerauthentication","source":"kind source: /, Kind="}
{"level":"info","ts":1626351699.0403981,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledObject","controller":"scaledobject","source":"kind source: /, Kind="}
{"level":"info","ts":1626351699.1405826,"logger":"controller","msg":"Starting Controller","reconcilerGroup":"keda.sh","reconcilerKind":"TriggerAuthentication","controller":"triggerauthentication"}
{"level":"info","ts":1626351699.140736,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledObject","controller":"scaledobject","source":"kind source: /, Kind="}
{"level":"info","ts":1626351699.240714,"logger":"controller","msg":"Starting workers","reconcilerGroup":"keda.sh","reconcilerKind":"TriggerAuthentication","controller":"triggerauthentication","worker count":1}
{"level":"info","ts":1626351699.2408056,"logger":"controller","msg":"Starting Controller","reconcilerGroup":"keda.sh","reconcilerKind":"ClusterTriggerAuthentication","controller":"clustertriggerauthentication"}
{"level":"info","ts":1626351699.2408144,"logger":"controller","msg":"Starting workers","reconcilerGroup":"keda.sh","reconcilerKind":"ClusterTriggerAuthentication","controller":"clustertriggerauthentication","worker count":1}
{"level":"info","ts":1626351699.2409205,"logger":"controller","msg":"Starting Controller","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledJob","controller":"scaledjob"}
{"level":"info","ts":1626351699.2409337,"logger":"controller","msg":"Starting workers","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledJob","controller":"scaledjob","worker count":1}
{"level":"info","ts":1626351699.2409644,"logger":"controller","msg":"Starting Controller","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledObject","controller":"scaledobject"}
{"level":"info","ts":1626351699.240987,"logger":"controller","msg":"Starting workers","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledObject","controller":"scaledobject","worker count":1}
{"level":"info","ts":1626351704.5787332,"logger":"controllers.ScaledJob","msg":"Reconciling ScaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"info","ts":1626351704.5787542,"logger":"controllers.ScaledJob","msg":"Adding Finalizer for the ScaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"info","ts":1626351704.5958037,"logger":"controllers.ScaledJob","msg":"Detected ScaleType = Job","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"info","ts":1626351704.6960938,"logger":"controllers.ScaledJob","msg":"Deleting jobs owned by the previous version of the scaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job","Number of jobs to delete":8}
{"level":"debug","ts":1626351704.696145,"logger":"controllers.ScaledJob","msg":"Starting a new ScaleLoop","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"info","ts":1626351704.696223,"logger":"controllers.ScaledJob","msg":"Initializing Scaling logic according to ScaledObject Specification","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"debug","ts":1626351704.6962407,"logger":"controllers.ScaledJob","msg":"ScaledJob is defined correctly and is ready to scaling","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"debug","ts":1626351704.6962612,"logger":"controller","msg":"Successfully Reconciled","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledJob","controller":"scaledjob","name":"a-b-job","namespace":"foo"}
{"level":"info","ts":1626351704.6963358,"logger":"controllers.ScaledJob","msg":"Reconciling ScaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"info","ts":1626351704.6963482,"logger":"controllers.ScaledJob","msg":"Detected ScaleType = Job","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"debug","ts":1626351704.6963305,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"a-b-job","env-var-name":"CONSUL_HOST"}
{"level":"info","ts":1626351704.6963599,"logger":"controllers.ScaledJob","msg":"Deleting jobs owned by the previous version of the scaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job","Number of jobs to delete":8}
{"level":"debug","ts":1626351704.696366,"logger":"controllers.ScaledJob","msg":"Starting a new ScaleLoop","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"debug","ts":1626351704.6963463,"logger":"controller-runtime.manager.events","msg":"Normal","object":{"kind":"ScaledJob","namespace":"foo","name":"a-b-job","uid":"74a865e7-1717-43ad-b81e-e7732bb0cdf8","apiVersion":"keda.sh/v1alpha1","resourceVersion":"257117"},"reason":"KEDAScalersStarted","message":"Started scalers watch"}
{"level":"info","ts":1626351704.6963909,"logger":"controllers.ScaledJob","msg":"Initializing Scaling logic according to ScaledObject Specification","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"debug","ts":1626351704.6963944,"logger":"controller-runtime.manager.events","msg":"Normal","object":{"kind":"ScaledJob","namespace":"foo","name":"a-b-job","uid":"74a865e7-1717-43ad-b81e-e7732bb0cdf8","apiVersion":"keda.sh/v1alpha1","resourceVersion":"257117"},"reason":"ScaledJobReady","message":"ScaledJob is ready for scaling"}
{"level":"debug","ts":1626351704.6964102,"logger":"controller-runtime.manager.events","msg":"Normal","object":{"kind":"ScaledJob","namespace":"foo","name":"a-b-job","uid":"74a865e7-1717-43ad-b81e-e7732bb0cdf8","apiVersion":"keda.sh/v1alpha1","resourceVersion":"257117"},"reason":"ScaledJobReady","message":"ScaledJob is ready for scaling"}
{"level":"debug","ts":1626351704.6963038,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"a-b-job","env-var-name":"CONSUL_HOST"}
{"level":"debug","ts":1626351704.6963978,"logger":"controllers.ScaledJob","msg":"ScaledJob is defined correctly and is ready to scaling","ScaledJob.Namespace":"foo","ScaledJob.Name":"a-b-job"}
{"level":"debug","ts":1626351704.696422,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"a-b-job","env-var-name":"CONSUL_HOST"}
{"level":"debug","ts":1626351704.696428,"logger":"controller","msg":"Successfully Reconciled","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledJob","controller":"scaledjob","name":"a-b-job","namespace":"foo"}
{"level":"debug","ts":1626351704.6964705,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"a-b-job","env-var-name":"CONSUL_HOST"}
{"level":"info","ts":1626351704.7994597,"logger":"scalehandler","msg":"Active trigger","Scaler":{},"isTriggerActive":false}
{"level":"info","ts":1626351704.7994986,"logger":"scalehandler","msg":"Scaler targetAverageValue","Scaler":{},"targetAverageValue":1}
{"level":"info","ts":1626351704.7994792,"logger":"scalehandler","msg":"Active trigger","Scaler":{},"isTriggerActive":false}
{"level":"info","ts":1626351704.7995937,"logger":"scalehandler","msg":"Scaler targetAverageValue","Scaler":{},"targetAverageValue":1}
{"level":"info","ts":1626351704.800564,"logger":"scalehandler","msg":"QueueLength Metric value","Scaler":{},"queueLength":0}
{"level":"info","ts":1626351704.800583,"logger":"scalehandler","msg":"Scaler maxValue","maxValue":0}
{"level":"info","ts":1626351704.8006084,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of running Jobs":0}
{"level":"info","ts":1626351704.8006134,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of pending Jobs ":0}
{"level":"debug","ts":1626351704.8006165,"logger":"scaleexecutor","msg":"Selecting Scale Strategy","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","specified":"accurate","selected":"accurate"}
{"level":"debug","ts":1626351704.8006203,"logger":"scaleexecutor","msg":"No change in activity","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo"}
{"level":"debug","ts":1626351704.8006291,"logger":"scalehandler","msg":"Watching with pollingInterval","type":"ScaledJob","namespace":"foo","name":"a-b-job","PollingInterval":30}
{"level":"info","ts":1626351704.8006551,"logger":"scalehandler","msg":"QueueLength Metric value","Scaler":{},"queueLength":0}
{"level":"info","ts":1626351704.8006656,"logger":"scalehandler","msg":"Scaler maxValue","maxValue":0}
{"level":"info","ts":1626351704.8006947,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of running Jobs":0}
{"level":"info","ts":1626351704.8006997,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of pending Jobs ":0}
{"level":"debug","ts":1626351704.8007033,"logger":"scaleexecutor","msg":"Selecting Scale Strategy","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","specified":"accurate","selected":"accurate"}
{"level":"debug","ts":1626351704.800707,"logger":"scaleexecutor","msg":"No change in activity","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo"}
{"level":"debug","ts":1626351704.8007164,"logger":"scalehandler","msg":"Watching with pollingInterval","type":"ScaledJob","namespace":"foo","name":"a-b-job","PollingInterval":30}
{"level":"debug","ts":1626351704.8007264,"logger":"scalehandler","msg":"Context canceled","type":"ScaledJob","namespace":"foo","name":"a-b-job"}
{"level":"debug","ts":1626351734.800766,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"a-b-job","env-var-name":"CONSUL_HOST"}
{"level":"info","ts":1626351734.8022485,"logger":"scalehandler","msg":"Active trigger","Scaler":{},"isTriggerActive":false}
{"level":"info","ts":1626351734.8022628,"logger":"scalehandler","msg":"Scaler targetAverageValue","Scaler":{},"targetAverageValue":1}
{"level":"info","ts":1626351734.8034718,"logger":"scalehandler","msg":"QueueLength Metric value","Scaler":{},"queueLength":0}
{"level":"info","ts":1626351734.803482,"logger":"scalehandler","msg":"Scaler maxValue","maxValue":0}
{"level":"info","ts":1626351734.803531,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of running Jobs":0}
{"level":"info","ts":1626351734.803539,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of pending Jobs ":0}
{"level":"debug","ts":1626351734.8035438,"logger":"scaleexecutor","msg":"Selecting Scale Strategy","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","specified":"accurate","selected":"accurate"}
{"level":"debug","ts":1626351734.8035486,"logger":"scaleexecutor","msg":"No change in activity","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo"}
{"level":"info","ts":1626351749.289601,"logger":"controllers.ScaledJob","msg":"Reconciling ScaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"info","ts":1626351749.2896214,"logger":"controllers.ScaledJob","msg":"Adding Finalizer for the ScaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"info","ts":1626351749.3047543,"logger":"controllers.ScaledJob","msg":"Detected ScaleType = Job","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"info","ts":1626351749.3047903,"logger":"controllers.ScaledJob","msg":"Deleting jobs owned by the previous version of the scaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job","Number of jobs to delete":8}
{"level":"debug","ts":1626351749.304799,"logger":"controllers.ScaledJob","msg":"Starting a new ScaleLoop","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"info","ts":1626351749.3048432,"logger":"controllers.ScaledJob","msg":"Initializing Scaling logic according to ScaledObject Specification","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"debug","ts":1626351749.3048532,"logger":"controllers.ScaledJob","msg":"ScaledJob is defined correctly and is ready to scaling","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"debug","ts":1626351749.3048594,"logger":"controller","msg":"Successfully Reconciled","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledJob","controller":"scaledjob","name":"x-z-job","namespace":"foo"}
{"level":"info","ts":1626351749.3048997,"logger":"controllers.ScaledJob","msg":"Reconciling ScaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"info","ts":1626351749.3049097,"logger":"controllers.ScaledJob","msg":"Detected ScaleType = Job","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"info","ts":1626351749.3049197,"logger":"controllers.ScaledJob","msg":"Deleting jobs owned by the previous version of the scaledJob","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job","Number of jobs to delete":8}
{"level":"debug","ts":1626351749.3049247,"logger":"controllers.ScaledJob","msg":"Starting a new ScaleLoop","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"debug","ts":1626351749.30493,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"x-z-job","env-var-name":"CONSUL_HOST"}
{"level":"debug","ts":1626351749.30493,"logger":"controller-runtime.manager.events","msg":"Normal","object":{"kind":"ScaledJob","namespace":"foo","name":"x-z-job","uid":"47b4773c-09b2-48ed-acb5-2a4b5d9d7bf5","apiVersion":"keda.sh/v1alpha1","resourceVersion":"257287"},"reason":"KEDAScalersStarted","message":"Started scalers watch"}
{"level":"info","ts":1626351749.3049626,"logger":"controllers.ScaledJob","msg":"Initializing Scaling logic according to ScaledObject Specification","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"debug","ts":1626351749.304988,"logger":"controllers.ScaledJob","msg":"ScaledJob is defined correctly and is ready to scaling","ScaledJob.Namespace":"foo","ScaledJob.Name":"x-z-job"}
{"level":"debug","ts":1626351749.3049986,"logger":"controller","msg":"Successfully Reconciled","reconcilerGroup":"keda.sh","reconcilerKind":"ScaledJob","controller":"scaledjob","name":"x-z-job","namespace":"foo"}
{"level":"debug","ts":1626351749.3049963,"logger":"controller-runtime.manager.events","msg":"Normal","object":{"kind":"ScaledJob","namespace":"foo","name":"x-z-job","uid":"47b4773c-09b2-48ed-acb5-2a4b5d9d7bf5","apiVersion":"keda.sh/v1alpha1","resourceVersion":"257287"},"reason":"ScaledJobReady","message":"ScaledJob is ready for scaling"}
{"level":"debug","ts":1626351749.3049874,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"x-z-job","env-var-name":"CONSUL_HOST"}
{"level":"debug","ts":1626351749.3050182,"logger":"controller-runtime.manager.events","msg":"Normal","object":{"kind":"ScaledJob","namespace":"foo","name":"x-z-job","uid":"47b4773c-09b2-48ed-acb5-2a4b5d9d7bf5","apiVersion":"keda.sh/v1alpha1","resourceVersion":"257287"},"reason":"ScaledJobReady","message":"ScaledJob is ready for scaling"}
{"level":"debug","ts":1626351749.3049371,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"x-z-job","env-var-name":"CONSUL_HOST"}
{"level":"debug","ts":1626351749.3050444,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"x-z-job","env-var-name":"CONSUL_HOST"}
{"level":"info","ts":1626351749.30665,"logger":"scalehandler","msg":"Active trigger","Scaler":{},"isTriggerActive":false}
{"level":"info","ts":1626351749.306664,"logger":"scalehandler","msg":"Scaler targetAverageValue","Scaler":{},"targetAverageValue":1}
{"level":"info","ts":1626351749.3067267,"logger":"scalehandler","msg":"Active trigger","Scaler":{},"isTriggerActive":false}
{"level":"info","ts":1626351749.3067398,"logger":"scalehandler","msg":"Scaler targetAverageValue","Scaler":{},"targetAverageValue":1}
{"level":"info","ts":1626351749.3079839,"logger":"scalehandler","msg":"QueueLength Metric value","Scaler":{},"queueLength":0}
{"level":"info","ts":1626351749.3079946,"logger":"scalehandler","msg":"Scaler maxValue","maxValue":0}
{"level":"info","ts":1626351749.3080027,"logger":"scalehandler","msg":"QueueLength Metric value","Scaler":{},"queueLength":0}
{"level":"info","ts":1626351749.3080113,"logger":"scalehandler","msg":"Scaler maxValue","maxValue":0}
{"level":"info","ts":1626351749.3080184,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","Number of running Jobs":0}
{"level":"info","ts":1626351749.3080292,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","Number of pending Jobs ":0}
{"level":"info","ts":1626351749.3080328,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","Number of running Jobs":0}
{"level":"info","ts":1626351749.3080442,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","Number of pending Jobs ":0}
{"level":"debug","ts":1626351749.308053,"logger":"scaleexecutor","msg":"Selecting Scale Strategy","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","specified":"accurate","selected":"accurate"}
{"level":"debug","ts":1626351749.308058,"logger":"scaleexecutor","msg":"No change in activity","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo"}
{"level":"debug","ts":1626351749.308067,"logger":"scalehandler","msg":"Watching with pollingInterval","type":"ScaledJob","namespace":"foo","name":"x-z-job","PollingInterval":30}
{"level":"debug","ts":1626351749.308034,"logger":"scaleexecutor","msg":"Selecting Scale Strategy","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","specified":"accurate","selected":"accurate"}
{"level":"debug","ts":1626351749.3080792,"logger":"scaleexecutor","msg":"No change in activity","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo"}
{"level":"debug","ts":1626351749.308088,"logger":"scalehandler","msg":"Watching with pollingInterval","type":"ScaledJob","namespace":"foo","name":"x-z-job","PollingInterval":30}
{"level":"debug","ts":1626351749.3080955,"logger":"scalehandler","msg":"Context canceled","type":"ScaledJob","namespace":"foo","name":"x-z-job"}
{"level":"debug","ts":1626351764.8036873,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"a-b-job","env-var-name":"CONSUL_HOST"}
{"level":"info","ts":1626351764.8051345,"logger":"scalehandler","msg":"Active trigger","Scaler":{},"isTriggerActive":false}
{"level":"info","ts":1626351764.8051524,"logger":"scalehandler","msg":"Scaler targetAverageValue","Scaler":{},"targetAverageValue":1}
{"level":"info","ts":1626351764.806322,"logger":"scalehandler","msg":"QueueLength Metric value","Scaler":{},"queueLength":0}
{"level":"info","ts":1626351764.8063383,"logger":"scalehandler","msg":"Scaler maxValue","maxValue":0}
{"level":"info","ts":1626351764.8063688,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of running Jobs":0}
{"level":"info","ts":1626351764.806375,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of pending Jobs ":0}
{"level":"debug","ts":1626351764.8063786,"logger":"scaleexecutor","msg":"Selecting Scale Strategy","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","specified":"accurate","selected":"accurate"}
{"level":"debug","ts":1626351764.8063827,"logger":"scaleexecutor","msg":"No change in activity","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo"}
{"level":"debug","ts":1626351779.3081942,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"x-z-job","env-var-name":"CONSUL_HOST"}
{"level":"info","ts":1626351779.3098607,"logger":"scalehandler","msg":"Active trigger","Scaler":{},"isTriggerActive":false}
{"level":"info","ts":1626351779.309878,"logger":"scalehandler","msg":"Scaler targetAverageValue","Scaler":{},"targetAverageValue":1}
{"level":"info","ts":1626351779.311202,"logger":"scalehandler","msg":"QueueLength Metric value","Scaler":{},"queueLength":0}
{"level":"info","ts":1626351779.311218,"logger":"scalehandler","msg":"Scaler maxValue","maxValue":0}
{"level":"info","ts":1626351779.3112597,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","Number of running Jobs":0}
{"level":"info","ts":1626351779.3112724,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","Number of pending Jobs ":0}
{"level":"debug","ts":1626351779.311278,"logger":"scaleexecutor","msg":"Selecting Scale Strategy","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo","specified":"accurate","selected":"accurate"}
{"level":"debug","ts":1626351779.3112833,"logger":"scaleexecutor","msg":"No change in activity","scaledJob.Name":"x-z-job","scaledJob.Namespace":"foo"}
{"level":"debug","ts":1626351794.8065164,"logger":"scalehandler","msg":"cannot resolve env to a value. fieldRef and resourceFieldRef env are skipped","type":"ScaledJob","namespace":"foo","name":"a-b-job","env-var-name":"CONSUL_HOST"}
{"level":"info","ts":1626351794.8080416,"logger":"scalehandler","msg":"Active trigger","Scaler":{},"isTriggerActive":true}
{"level":"info","ts":1626351794.8080592,"logger":"scalehandler","msg":"Scaler targetAverageValue","Scaler":{},"targetAverageValue":1}
{"level":"info","ts":1626351794.8093889,"logger":"scalehandler","msg":"QueueLength Metric value","Scaler":{},"queueLength":4}
{"level":"info","ts":1626351794.8094027,"logger":"scalehandler","msg":"Scaler is active","Scaler":{}}
{"level":"info","ts":1626351794.8094068,"logger":"scalehandler","msg":"Scaler maxValue","maxValue":4}
{"level":"info","ts":1626351794.809444,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of running Jobs":0}
{"level":"info","ts":1626351794.8094573,"logger":"scaleexecutor","msg":"Scaling Jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of pending Jobs ":0}
{"level":"debug","ts":1626351794.8094616,"logger":"scaleexecutor","msg":"Selecting Scale Strategy","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","specified":"accurate","selected":"accurate"}
{"level":"debug","ts":1626351794.8094664,"logger":"scaleexecutor","msg":"At least one scaler is active","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo"}
{"level":"info","ts":1626351794.8561623,"logger":"scaleexecutor","msg":"Creating jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Effective number of max jobs":4}
{"level":"info","ts":1626351794.856187,"logger":"scaleexecutor","msg":"Creating jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of jobs":4}
{"level":"error","ts":1626351794.882787,"logger":"scaleexecutor","msg":"Failed to create a new Job","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","error":"Job.batch \"a-b-job-nhbrk\" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"tmp\"","stacktrace":"github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale\n\t/workspace/pkg/scaling/executor/scale_jobs.go:48\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers\n\t/workspace/pkg/scaling/scale_handler.go:221\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop\n\t/workspace/pkg/scaling/scale_handler.go:156"}
{"level":"error","ts":1626351794.8875687,"logger":"scaleexecutor","msg":"Failed to create a new Job","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","error":"Job.batch \"a-b-job-s8c8g\" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"tmp\"","stacktrace":"github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale\n\t/workspace/pkg/scaling/executor/scale_jobs.go:48\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers\n\t/workspace/pkg/scaling/scale_handler.go:221\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop\n\t/workspace/pkg/scaling/scale_handler.go:156"}
{"level":"error","ts":1626351794.8923187,"logger":"scaleexecutor","msg":"Failed to create a new Job","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","error":"Job.batch \"a-b-job-6m947\" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"tmp\"","stacktrace":"github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale\n\t/workspace/pkg/scaling/executor/scale_jobs.go:48\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers\n\t/workspace/pkg/scaling/scale_handler.go:221\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop\n\t/workspace/pkg/scaling/scale_handler.go:156"}
{"level":"error","ts":1626351794.904223,"logger":"scaleexecutor","msg":"Failed to create a new Job","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","error":"Job.batch \"a-b-job-c5bl9\" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"tmp\"","stacktrace":"github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale\n\t/workspace/pkg/scaling/executor/scale_jobs.go:48\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers\n\t/workspace/pkg/scaling/scale_handler.go:221\ngithub.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop\n\t/workspace/pkg/scaling/scale_handler.go:156"}
{"level":"info","ts":1626351794.9042645,"logger":"scaleexecutor","msg":"Created jobs","scaledJob.Name":"a-b-job","scaledJob.Namespace":"foo","Number of jobs":4}
{"level":"debug","ts":1626351794.9043229,"logger":"controller-runtime.manager.events","msg":"Normal","object":{"kind":"ScaledJob","namespace":"foo","name":"a-b-job","uid":"74a865e7-1717-43ad-b81e-e7732bb0cdf8","apiVersion":"keda.sh/v1alpha1","resourceVersion":"257117"},"reason":"KEDAJobsCreated","message":"Created 4 jobs"}

@choppedpork
Author

Hello,

I think I've got some good news for you: the last issue turned out to be a templating problem on our side. It looked so similar to the previous occurrence that I was distracted and started digging deeper instead of re-evaluating the basics...

I am currently reviewing whether the two originally reported instances of this issue could have been caused by the same problem. We have two templating engines in play here, so this is not unlikely. Given this, and the fact that there's still not much for you to go on, I will close this issue now.
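
For anyone who lands here with the same `volumeMounts[0].name: Not found` validation error: below is a minimal sketch of a pre-deploy check that would flag a rendered pod spec whose `volumeMounts` reference a volume that is not declared. It is not part of KEDA or Helm. The function name `find_missing_volumes`, the example pod spec, and the `/tmp` mount path are all hypothetical, and the check assumes the manifest has already been rendered and parsed into a plain dictionary (for a ScaledJob, the pod spec sits under `spec.jobTargetRef.template.spec`).

```python
def find_missing_volumes(pod_spec):
    """Return (container_name, mount_name) pairs whose volume is not declared."""
    # Names of all volumes declared on the pod spec (a missing or null list counts as empty).
    declared = {volume["name"] for volume in pod_spec.get("volumes") or []}

    missing = []
    # Init containers and regular containers can both mount volumes.
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key) or []:
            for mount in container.get("volumeMounts") or []:
                if mount["name"] not in declared:
                    missing.append((container.get("name", "?"), mount["name"]))
    return missing


# Hypothetical pod spec mirroring the failure in the log above:
# a mount named "tmp" with no "volumes" entry at all.
pod_spec = {
    "containers": [
        {"name": "a-b-c", "volumeMounts": [{"name": "tmp", "mountPath": "/tmp"}]}
    ]
}

print(find_missing_volumes(pod_spec))  # [('a-b-c', 'tmp')]
```

Running something like this against the output of each templating stage should show which stage drops the `volumes` entry, before the manifest reaches the cluster.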

@tomkerkhove
Member

Good to hear and sorry for not replying sooner!
