Hetzner agents are (still) not cleaned up every time #56
Comments
Hi @sandrinr, thank you for the bug report. Can you please provide:
Hi @rkosegi, thanks for picking it up. It has happened again, again after a restart of the Jenkins Docker container, and again the first node(s) spawned were not shut down when they should have been. The output of the script:
As I see it, that node should have been shut down (all conditions are met). But somehow, I think, the code that decides whether to shut the node down is never called.
Adding another data point: the issue was happening for us all the time. Agents were no longer deleted automatically, and restarts did not help either. What I did then was go to the Cloud settings, change the shutdown policy to time-out based and save, then change it back to billing-cycle based and save again. After that, agents were deleted automatically as expected. My guess is that it will now work for a while until it stops working again.
@rkosegi I think I can share some updates. The issue came up again recently, so I started digging and made the following discovery. I wanted to see the actual retention strategy active on the Hetzner cloud nodes, so I ran the following script:

```groovy
// List every Hetzner-backed computer together with its active retention strategy.
def hetznerComputers = Jenkins.instance.computers.findAll { it instanceof cloud.dnation.jenkins.plugins.hetzner.HetznerServerComputer }
hetznerComputers.each { println("${it.name}: ${it.retentionStrategy}") }
```

The output of the script while nodes were not being shut down correctly was:
Also, after deleting all nodes manually and triggering a build that causes a new node to be created, the new node gets the same retention strategy:
Even restarting Jenkins did not change this. After a restart, a newly created node had the same strategy:
I was able to fix the issue by changing a setting in the Hetzner cloud configuration; in my case, I changed the maximum number of allowed nodes. After that, without restarting Jenkins, a newly created node had the retention policy I expected:
And the node was also shut down as expected by its retention strategy. So the issue seems to be that, after some point, all newly created nodes get the wrong retention policy, and I was unable to find out where it comes from. Even the main Jenkins configuration shows the expected shutdown policy:

```xml
<shutdownPolicy class="cloud.dnation.jenkins.plugins.hetzner.shutdown.BeforeHourWrapsPolicy"/>
```

Since the issue persists even across restarts, this state must be persisted somewhere... Also, I have no idea what is causing the nodes to get that retention policy.
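When many agents are running, it can help to tally the Hetzner nodes per retention-strategy class instead of reading them one by one. The following is a minimal sketch for the Jenkins script console, assuming only core Jenkins APIs (`Jenkins.get()`, `Computer.getRetentionStrategy()`) plus the `HetznerServerComputer` class from the script above; it only reads state and changes nothing. If all nodes come from one cloud with one shutdown policy, one would expect a single strategy class in the output.

```groovy
import jenkins.model.Jenkins

// Same filter as the original script: only computers backed by Hetzner servers.
def hetznerComputers = Jenkins.get().computers.findAll {
    it instanceof cloud.dnation.jenkins.plugins.hetzner.HetznerServerComputer
}

// Count nodes per fully qualified class name of their retention strategy.
// Nodes without a strategy are grouped under '<none>'.
def countsByStrategy = hetznerComputers.countBy { c ->
    c.retentionStrategy?.getClass()?.name ?: '<none>'
}

// Print one line per strategy class; more than one line suggests that some
// nodes did not get the strategy derived from the configured shutdown policy.
countsByStrategy.each { strategyClass, count ->
    println("${strategyClass}: ${count} node(s)")
}
```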
Hi @sandrinr, this is a very interesting finding. I will try to find some time soon to look into it.
@sandrinr I hope I have fixed the root cause, but I want to emphasize that when you restart the Jenkins controller, the plugin assumes that no servers are running.
@rkosegi We have had the fix in operation for a while now and have not experienced the issue since. Thank you very much for the fix 🙏.
@sandrinr that is great news, thanks for the confirmation.
Jenkins and plugins versions report
Environment
What Operating System are you using (both controller, and any agents involved in the problem)?
Ubuntu 20.04 as the host; Jenkins runs inside a Docker container using the `jenkins/jenkins:lts` image.
Reproduction steps
Unclear...
Expected Results
Agents are always deleted according to the configured shutdown policy.
Actual Results
After a while, agents are no longer deleted and are used essentially as if they were static agents.
Anything else?
After manually deleting those stale agents in Jenkins, Jenkins creates new ones when there is demand, and there is a chance that these new agents are actually cleaned up again according to the configured shutdown policy.
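Finding the stale agents before deleting them by hand can be scripted as well. A minimal sketch for the Jenkins script console, assuming only core Jenkins `Computer` APIs (`isIdle()`, `getIdleStartMilliseconds()`, `isOffline()`, `getRetentionStrategy()`) and the plugin's `HetznerServerComputer` class; it prints candidates and deletes nothing, so the decision stays manual.

```groovy
import jenkins.model.Jenkins

// Only computers backed by Hetzner servers.
def hetznerComputers = Jenkins.get().computers.findAll {
    it instanceof cloud.dnation.jenkins.plugins.hetzner.HetznerServerComputer
}

// Keep the ones whose executors are all idle.
def idleComputers = hetznerComputers.findAll { it.idle }

def now = System.currentTimeMillis()
idleComputers.each { c ->
    // getIdleStartMilliseconds() is the time the executors last became idle.
    def idleMinutes = (now - c.idleStartMilliseconds).intdiv(60000)
    println("${c.name}: idle for ${idleMinutes} min, offline=${c.offline}, strategy=${c.retentionStrategy}")
}
```

An agent that has been idle far longer than the shutdown policy should allow is a likely candidate for manual deletion.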
Example:
Consider the following screenshot from our Jenkins' cloud statistics:
In this screenshot you can see three areas: blue, red and purple.
Possibly important additional information: