[ci.jenkins.io] ACI agent are all failing #3274

Closed
dduportal opened this issue Nov 30, 2022 · 14 comments
Labels
bug Something isn't working ci.jenkins.io

Comments

@dduportal
Contributor

Service(s)

ci.jenkins.io

Summary

For the past ~24 hours, all the ACI agents have been failing.

On the Azure portal, the containers are present and the image is pulled successfully, but the container process fails immediately and is killed. It is then restarted a few times before Jenkins reaches the aci-plugin timeout and removes the ACI deployment.

The agent images were changed recently, but this was hard to catch because we use the "latest" tag for each of them :'( . This might be related to jenkinsci/docker-agent#320 (comment).

Reproduction steps

No response

@dduportal
Contributor Author

  • Short term (for the upcoming hours): we've pinned the container images to the latest known versions until we determine what is failing.

@dduportal dduportal removed the triage Incoming issues that need review label Nov 30, 2022
@smerle33
Contributor

smerle33 commented Dec 1, 2022

We were able to reproduce the issue during a debug session with the new JDK19.

The ACI container logs say:

Unrecognized VM option ''
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

@dduportal
Contributor Author

This sounds related to jenkinsci/docker-inbound-agent#227 (ping @slide for info)

@dduportal
Contributor Author

After creating a permanent agent named jenkins-infra-window-test on ci.jenkins.io, we ran an interactive inbound agent container on a Docker Windows machine to debug:

> docker run --rm -ti --entrypoint=pwsh jenkinsciinfra/inbound-agent-maven:jdk19-nanoserver
$env:JENKINS_JAVA_OPTS = '-XX:+PrintCommandLineFlags' # Same as our ACI containers
Set-PSDebug -Trace 2 # Enable low level debug for powershell
C:/ProgramData/Jenkins/jenkins-agent.ps1 -Url https://ci.jenkins.io/ -Secret <redacted> -Name jenkins-infra-window-test

Excerpt that points to https://github.com/jenkinsci/docker-inbound-agent/pull/227/files#diff-c0115ea87ae1728b4332277b7b3cd3742bb5ad159ec0a5578027b078b8aa123bR104 as the culprit:

# ...
DEBUG:   96+      >>>> $AgentArguments = @()
DEBUG:     ! SET $AgentArguments = ''.
DEBUG:   98+     if( >>>> ![System.String]::IsNullOrWhiteSpace($JenkinsJavaOpts)) {
DEBUG:  104+          >>>> $AgentArguments += Invoke-Expression "echo $JenkinsJavaOpts"
DEBUG:    1+  >>>> echo -XX:+PrintCommandLineFlags
DEBUG:     ! CALL function '<ScriptBlock>'
DEBUG:     ! SET $AgentArguments = '-XX: +PrintCommandLineFlags'.
DEBUG:  107+      >>>> $AgentArguments += @("-cp", "C:/ProgramData/Jenkins/agent.jar", "hudson.remoting.jnlp.Main", "-headless")
DEBUG:     ! SET $AgentArguments = '-XX: +PrintCommandLineFlags -cp C:/ProgramData…'.
# ...

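It seems that the content of the JENKINS_JAVA_OPTS variable is re-interpreted by PowerShell: the trace shows a space inserted after -XX:, which splits the flag in two. The JVM then appears to receive a bare -XX: with an empty option name, hence the error message Unrecognized VM option ''.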
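The splitting can be checked in isolation, outside the agent script. The following is a minimal sketch, not the script's actual code: `$reparsed` and `$verbatim` are hypothetical names introduced here, and the flag value is the one from the debug session above. Based on the trace, the re-parsed form is expected to come back as more than one element, while the verbatim form stays a single argument.

```powershell
# Stand-in for the value the agent script reads from JENKINS_JAVA_OPTS.
$JenkinsJavaOpts = '-XX:+PrintCommandLineFlags'

# Same pattern as the line flagged in the trace:
# the string is re-parsed as PowerShell code before being echoed.
$reparsed = @(Invoke-Expression "echo $JenkinsJavaOpts")

# Passing the value through without re-parsing keeps it as one argument.
$verbatim = @($JenkinsJavaOpts)

# Print each element between brackets so that any split is visible.
"re-parsed: $($reparsed.Count) element(s)"
$reparsed | ForEach-Object { "  [$_]" }
"verbatim:  $($verbatim.Count) element(s)"
$verbatim | ForEach-Object { "  [$_]" }
```

Comparing the two element counts is usually enough to tell whether a given JENKINS_JAVA_OPTS value survives the Invoke-Expression step intact.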

@dduportal
Contributor Author

dduportal commented Dec 1, 2022

Tested successfully: using

$AgentArguments += @(`"$JenkinsJavaOpts`")

instead of

$AgentArguments += Invoke-Expression "echo $JenkinsJavaOpts"

as proposed in jenkinsci/docker-inbound-agent#227 (comment)

@slide

slide commented Dec 1, 2022

We just need to revert that PR for now, the proposed change you have above won't work for all cases.

@dduportal
Contributor Author

> We just need to revert that PR for now, the proposed change you have above won't work for all cases.

Would a documentation PR be OK instead of rolling back the feature? (ref. jenkinsci/docker-agent#599)

@slide

slide commented Dec 1, 2022

Yes, I didn't see the other comment before commenting here.

@dduportal
Contributor Author

> Yes, I didn't see the other comment before commenting here.

Makes sense. I was annoyed at the idea of rolling back such a useful feature instead of fixing or documenting it (besides, as you underlined, we can fix the infra issue by using quotes correctly).
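As an illustration of what "using quotes correctly" could look like, here is a hedged sketch that assumes the agent script still uses the Invoke-Expression line from the trace. The exact value set in the ci.jenkins.io configuration is not shown in this thread, so the quoted string below is an assumption: the single quotes are embedded in the value itself, so PowerShell re-parses it as one string literal instead of a parameter followed by an argument.

```powershell
# Assumed value: the single quotes are part of the string stored in the variable.
$JenkinsJavaOpts = "'-XX:+PrintCommandLineFlags'"

# Same accumulation pattern as in the traced script.
$AgentArguments = @()
$AgentArguments += Invoke-Expression "echo $JenkinsJavaOpts"

# The flag should now be a single, unsplit element.
"elements: $($AgentArguments.Count)"
"first:    $($AgentArguments[0])"
```

The trade-off is that the quoting becomes the responsibility of whoever sets JENKINS_JAVA_OPTS, which is why documenting the behavior matters.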

@dduportal
Contributor Author

Test in progress with the new JDK19 ACI template and quoted flag:

[Screenshot, 2022-12-01 15:24:20]

@dduportal
Contributor Author

Test works! https://ci.jenkins.io/job/Infra/job/acceptance-tests/job/check-agent-availability/1832/console

Incoming PR for ci.jenkins.io config (along with updatecli)

@slide

slide commented Dec 1, 2022

Excellent!

@dduportal
Contributor Author

Closing as we do not have the problem anymore thanks to the tip from @slide 🍯
