[infra.ci.jenkins.io] Cron triggered build are not started since 2022-02-26 #2803

Closed
dduportal opened this issue Mar 1, 2022 · 34 comments · Fixed by jenkins-infra/pipeline-library#315

Comments

@dduportal
Contributor

Service

infra.ci.jenkins.io

Summary

Since 24 February 2022, we have not seen any build started by a cron trigger (examples: jenkins-infra/kubernetes-management every 30 min, or the daily Terraform jobs).

It seems correlated with the weekly 2.336 update, but we cannot be sure because other elements also changed on the 23rd/24th: it is a hunch, not a proof.

Reproduction steps

No response

@MarkEWaite

I confirmed that cron jobs worked as expected for me in Jenkins 2.336, at least for a job that was configured to run every two minutes.

@dduportal
Contributor Author

[Screenshot 2022-03-01 at 18:01:29]

The error seems to come from the ternary operator. We are digging in this direction.

@lemeurherve
Member

lemeurherve commented Mar 1, 2022

What we had before, which worked:

(1)

pipeline {
  agent none

  options {
    buildDiscarder(logRotator(numToKeepStr: '10'))
    timeout(time: 30, unit: 'MINUTES')
    disableConcurrentBuilds()
  }

  triggers {
    cron (env.BRANCH_NAME == 'main' ? 'H/30 * * * *' : '')
  }

  stages {

What we've tried:

(2)

pipeline {
  agent none

  options {
    buildDiscarder(logRotator(numToKeepStr: '10'))
    timeout(time: 30, unit: 'MINUTES')
    disableConcurrentBuilds()
  }

  triggers {
    cron (env.BRANCH_IS_PRIMARY ? 'H/30 * * * *' : '')
  }

  stages {

(3)

String cronPeriod = env.BRANCH_IS_PRIMARY ? 'H/30 * * * *' : ''

pipeline {
  agent none

  options {
    buildDiscarder(logRotator(numToKeepStr: '10'))
    timeout(time: 30, unit: 'MINUTES')
    disableConcurrentBuilds()
  }

  triggers {
    cron (cronPeriod)
  }

  stages {

Note: for this last attempt, an echo "cronPeriod: ${cronPeriod}" in one of the first steps prints the expected value:

cronPeriod: H/30 * * * *

A plain cron ('H/30 * * * *'), without any conditional, does trigger cron jobs.

@dduportal
Contributor Author

Cc @MarkEWaite can you check on your own setup if you see the same behavior with a declarative pipeline using a ternary form?

@jglick

jglick commented Mar 1, 2022

Try

  triggers {
    cron("${BRANCH_NAME == 'main' ? 'H/30 * * * *' : ''}")
  }

perhaps.

If you find yourself trying tricks like this, just move to Scripted.
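
A minimal sketch of what the Scripted form could look like for this case, assuming a multibranch job where env.BRANCH_NAME is populated. The stage name and the echo are placeholders, not taken from the jenkins-infra pipelines:

```groovy
// Scripted Pipeline sketch: register the timer trigger through the properties step.
// Assumption: multibranch job, so env.BRANCH_NAME is set by the branch source.
String cronSpec = (env.BRANCH_NAME == 'main') ? 'H/30 * * * *' : ''

properties([
  buildDiscarder(logRotator(numToKeepStr: '10')),
  disableConcurrentBuilds(),
  // Only the main branch gets a timer trigger; other branches get none at all.
  pipelineTriggers(cronSpec ? [cron(cronSpec)] : []),
])

timeout(time: 30, unit: 'MINUTES') {
  stage('Example') { // placeholder stage
    echo "cron spec for ${env.BRANCH_NAME}: '${cronSpec}'"
  }
}
```

The conditional is plain Groovy here, so no string-interpolation trick is needed to get the branch check into the trigger definition.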

@dduportal
Contributor Author

If you find yourself trying tricks like this, just move to Scripted.

That sounds like a lot of pain for only 1 feature to be honest. I would use the crontab on all the branches instead.

Scripted is powerful and your tip makes sense, but honestly, my brain is not wired at all for "coding my pipeline", even after 7 years using scripted.

@dduportal
Contributor Author

try

triggers {
cron("${BRANCH_NAME == 'main' ? 'H/30 * * * *' : ''}")
}

Thanks, we're going to try this one.

Weird thing: I wanted to check whether this issue also happens with scripted, and it appears that it does.

The pipeline https://github.com/jenkins-infra/aws/blob/main/Jenkinsfile_k8s uses the shared library https://github.com/jenkins-infra/pipeline-library/blob/master/vars/terraform.groovy#L33 which is fully scripted, and there have been no builds on this one since the 24th.

We are now trying the recommendation that @jglick and @jnord (thanks folks!) gave us to use scripted, on a simpler pipeline not involving a shared library (to remove as many moving pieces as possible), while checking the Jenkins log carefully.

@dduportal
Contributor Author

OK, so it does not seem related to any pipeline syntax: even the plain classic syntax cron ('H/30 * * * *') has not worked since our "experiments" yesterday.

Currently capturing the logs to see what is happening.

@dduportal
Contributor Author

New checks:

  • The config.xml file is up to date with the pipeline configuration: the main branch has the
  <triggers>
    <string>hudson.triggers.TimerTrigger</string>
  </triggers>

set up as expected, but the UI does not show the option as selected (in "view configuration").

  • Reloading the configuration from disk did not change anything, nor did "casc reload + reload config"
  • The logs (not published here because sensitive data) are not showing any warning or error, as far as I can tell, regarding trigger or cron
  • There is no alert in the UI (the error/warning bells)

=> Next step: try deleting the pod to force a full startup phase, and also a full "backup + rollback to the version from 3 weeks ago"

@dduportal dduportal changed the title [infra.ci.jenkins.io] Cron triggered build are not started since 2022-02-24 [infra.ci.jenkins.io] Cron triggered build are not started since 2022-02-26 Mar 2, 2022
@dduportal
Contributor Author

dduportal commented Mar 2, 2022

  • Backed up the jenkins_home
  • Rolled back manually to 2.335 to see what happens => nothing changed
  • Deleted the k8s management jobs and reloaded CasC => cron is now taken into account

@dduportal
Contributor Author

  • Setting the instance back to 2.337 + restoring the job kubernetes from backup to see if it changes anything

dduportal added a commit to jenkins-infra/kubernetes-management that referenced this issue Mar 2, 2022
@dduportal
Contributor Author

  • Tried (on the job kubernetes-management) to delete the config.xml file of the main branch + scanning the repo: no change

@jglick

jglick commented Mar 2, 2022

That config.xml does not look right. It should contain e.g.

  <properties>
    <org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty>
      <triggers>
        <hudson.triggers.TimerTrigger>
          <spec>H/30 * * * *</spec>
        </hudson.triggers.TimerTrigger>
      </triggers>
    </org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty>
  </properties>

That is what I get from running 2.319.3, installing pipeline-model-definition, and running a (standalone) Pipeline defined as

pipeline {
    agent none
    triggers {
        cron 'H/30 * * * *'
    }
    stages {
        stage('x') {
            steps {
                echo 'ok'
            }
        }
    }
}

Similarly when I create a multibranch Pipeline with a Git branch source, after waiting for branch indexing and the automatic initial build of the master branch project, though in that case there is also an org.jenkinsci.plugins.workflow.multibranch.BranchJobProperty entry as expected.

@jglick

jglick commented Mar 2, 2022

Oh and

diff --git Jenkinsfile Jenkinsfile
index e20b16b..d03e940 100644
--- Jenkinsfile
+++ Jenkinsfile
@@ -1,7 +1,7 @@
 pipeline {
     agent none
     triggers {
-        cron 'H/30 * * * *'
+        cron "${BRANCH_NAME == 'master' ? 'H/20 * * * *' : ''}"
     }
     stages {
         stage('x') {

did indeed work as expected for me after pushing to master and also creating a branch based on that. The master branch project is using the H/20 schedule, and the other branch project is using an empty schedule.

Also tried * * * * * on master and confirmed that builds are kicked off every minute on the master branch project but not on the other. “works on my machine”

@jglick

jglick commented Mar 2, 2022

Upgraded to 2.337 and updated Pipeline plugins accordingly. All still seems to be working.

Oh I think I see what you were confused by.

    <org.jenkinsci.plugins.pipeline.modeldefinition.actions.DeclarativeJobPropertyTrackerAction plugin="pipeline-model-definition@…">
      <jobProperties/>
      <triggers>
        <string>hudson.triggers.TimerTrigger</string>
      </triggers>
      <parameters/>
      <options/>
    </org.jenkinsci.plugins.pipeline.modeldefinition.actions.DeclarativeJobPropertyTrackerAction>

is correct. This is not the definition of the trigger, though; this is merely recording the fact that Declarative syntax did at some point specify a trigger, rather than it being via GUI configuration (which is of course impossible for a branch project anyway, but never mind that). The actual trigger definition is in <properties>, not <actions>.
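
To check which trigger a branch job really has, without reading config.xml by hand, something like the following could be run in the Jenkins script console. This is only a sketch: the job full name is a hypothetical placeholder, and it assumes the item is a Pipeline branch job.

```groovy
// Jenkins script console sketch (Manage Jenkins > Script Console).
import hudson.triggers.TimerTrigger
import jenkins.model.Jenkins

// Hypothetical full name of a branch job: <folder>/<multibranch job>/<branch>.
String jobName = 'my-folder/my-repo/main'

def job = Jenkins.get().getItemByFullName(jobName)
if (job == null) {
  println "Job not found: ${jobName}"
} else {
  // getTriggers() returns a map of trigger descriptor -> configured trigger.
  def timer = job.triggers.values().find { it instanceof TimerTrigger }
  println(timer ? "Timer trigger spec: '${timer.spec}'" : 'No timer trigger is configured')
}
```

If the job reports no timer trigger while the Jenkinsfile declares one, the <properties> section of its config.xml is the place to look next.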

@dduportal
Contributor Author

@jglick oh interesting thanks! We're going to dive in that direction.

So far, the only thing that worked was "rollback to 2.335 & the plugins defined in https://github.com/jenkins-infra/docker-jenkins-weekly/releases/tag/0.42.3-2.335" + "Delete the whole multibranch job from UI & reload JCasc to recreate it".

@dduportal
Contributor Author

  • The "faulty" pipeline has the following properties section in the config.xml for the main branch:
    <org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty>
      <triggers/>
    </org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty>

which means that the pipeline does NOT write the correct config (and the config <-> UI <-> behavior is coherent)

  • If I delete the job directory and trigger an organization scan, it checks all the PR builds and triggers a build on main, but there is no change in the config (or UI)

@dduportal
Contributor Author

OK, we were able to find a way to reproduce the behavior, at least on this job on this instance:

  • Run a first build with the cron trigger defined in the pipeline. Make sure that this build executes a long-running task (let's say 4-5 min)
  • The build config is updated as soon as the Jenkinsfile is parsed => check the UI / config.xml: the cron trigger is enabled with the rule specified in the Jenkinsfile
  • While this build is running, update the Jenkinsfile in GitHub
  • A 2nd build starts and we see that the cron is disabled immediately by the new build, even while it is pending
@jglick

jglick commented Mar 2, 2022

Offhand sounds like a bug for pipeline-model-definition-plugin (Declarative). Did you check behavior of the corresponding Scripted syntax using the properties step?

@dduportal
Contributor Author

Offhand sounds like a bug for pipeline-model-definition-plugin (Declarative). Did you check behavior of the corresponding Scripted syntax using the properties step?

Currently trying 2 scripted cases:

  • On a pipeline
  • On a pipeline through shared library

@dduportal
Contributor Author

I confirm that the same problem happens with a fully scripted pipeline:

There is definitely something fishy

@dduportal
Contributor Author

I need help on this one from a Jenkins expert contributor.

At the same time, I'm trying to "bisect" which elements (core, plugins, or a combination) could help me pin down when the issue happens.

Working on the following angles:

@dduportal
Contributor Author

dduportal commented Mar 3, 2022

@dduportal
Contributor Author

Pinning to 2.335: the bug is there. It means that it's either a plugin, or the setup.

dduportal added a commit to jenkins-infra/kubernetes-management that referenced this issue Mar 4, 2022
dduportal added a commit to jenkins-infra/kubernetes-management that referenced this issue Mar 4, 2022
@dduportal
Contributor Author

  • Since we can reproduce the issue by triggering a 2nd build, we tried removing the disableConcurrentBuilds() option of the pipeline: the issue still happens (but takes a bit longer to appear).
  • We are trying to pin to 2.337 with a set of plugins from 10 days ago (i.e. before the issue)

@dduportal
Contributor Author

dduportal commented Mar 4, 2022

  • 2.337 with old plugins does not have the issue
  • Bumped the blueocean plugin suite (outside config as code) to 1.25.3 => no issue
  • Bumped the plugins "Amazon EC2", "Dark Theme", "Azure VM Agents", "Datadog" and OpenTelemetry (outside config as code) to latest 25.3 => no issue

@timja
Member

timja commented Mar 4, 2022

Good to know it wasn't dark theme 😂

@lemeurherve
Member

Good to know it wasn't dark theme 😂

"The cron was lost in the dark"

@dduportal
Contributor Author

OK, it seems that the culprit was the pipeline basic steps plugin. Currently trying to confirm this.

@lemeurherve lemeurherve added this to the infra-team-sync-2022-03-08 milestone Mar 4, 2022
@jglick

jglick commented Mar 4, 2022

You mean some update to workflow-basic-steps? Seems unlikely on the face of it, since this behavior of defining triggers is in workflow-job + pipeline-model-definition.

@dduportal
Contributor Author

@jglick thanks for the pointers! I was (again) too quick to draw conclusions: I might have found something but it will wait for next week.

@dduportal
Contributor Author

Damn, the bug appears randomly whatever plugin combination I try. It's a mess, not sure how to handle this: we need help (we can delegate admin access to the instance, do whatever is needed).

Let's see after the weekend.

@dduportal
Contributor Author

Thanks a lot @daniel-beck for triple-checking!

@dduportal
Contributor Author

Closing this issue as we were able to identify a short term fix + the PR jenkins-infra/pipeline-library#315 was opened for long term.

Good news: it's not a bug in the weekly core or any plugin!

Bad news: it's a UX issue for non-Jenkins-experts :'(

Many many thanks to everyone who helped and spent time on this to unblock us.
