Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKIN-57795] Orphan nodes logic #448

Merged
merged 9 commits into from
Jun 18, 2020
Merged

Conversation

pyieh
Copy link
Contributor

@pyieh pyieh commented Apr 14, 2020

Addresses issue of orphan nodes not being re-attached when the plugin has already hit an instance cap. Because the orphan re-attachment logic was tied to the EC2Cloud.provision() logic, if the instance cap was hit, then provisioning wasn't even attempted and the orphan re-attachment isn't triggered.

Add functionality to EC2Cloud to attempt to re-attach orphan/stopped nodes even when its unable to provision more nodes due to hitting the instance cap.

…nodes even when its unable to provision more nodes due to hitting the instance cap
@res0nance
Copy link
Contributor

The test seems to be failing

@res0nance res0nance self-requested a review April 15, 2020 02:22
@pyieh
Copy link
Contributor Author

pyieh commented Apr 15, 2020

The tests were passing for me locally. I've added some more ignores as per a suggestion online to see if that clears up the failing PowerMock test.

Copy link
Contributor

@res0nance res0nance left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also see which functions can be made static. There is also a visiblefortesting annotation that should help without making things non-private

@@ -645,6 +627,21 @@ else if (jenkinsInstance.isTerminating()) {
}
}

private void attachSlavesToJenkins(Jenkins jenkins, List<EC2AbstractSlave> slaves, SlaveTemplate t) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void attachSlavesToJenkins(Jenkins jenkins, List<EC2AbstractSlave> slaves, SlaveTemplate t) throws IOException {
private static void attachSlavesToJenkins(Jenkins jenkins, List<EC2AbstractSlave> slaves, SlaveTemplate t) throws IOException {

import java.io.PrintStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you not use star imports?

String subnetId = chooseSubnetId();
String subnetId;
if (rotateSubnet) {
subnetId = chooseSubnetId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can choose subnet id handle this specific?

Comment on lines 731 to 733
List<String> groupIds = null;
if (!getSecurityGroupSet().isEmpty()) {
groupIds = getEc2SecurityGroups(ec2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
List<String> groupIds = null;
if (!getSecurityGroupSet().isEmpty()) {
groupIds = getEc2SecurityGroups(ec2);
if (!getSecurityGroupSet().isEmpty()) {
List<String> groupIds = getEc2SecurityGroups(ec2);

Comment on lines +771 to +773
HashMap<RunInstancesRequest, List<Filter>> ret = new HashMap<>();
ret.put(riRequest, diFilters);
return ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HashMap<RunInstancesRequest, List<Filter>> ret = new HashMap<>();
ret.put(riRequest, diFilters);
return ret;
return Collections.singletonMap(riRequest, diFilters);

src/main/java/hudson/plugins/ec2/EC2Cloud.java Outdated Show resolved Hide resolved
tagList.add(tagSpecification.clone().withResourceType(ResourceType.Volume));
riRequest.setTagSpecifications(tagList);

return (HashMap<RunInstancesRequest, List<Filter>>) Collections.singletonMap(riRequest, diFilters);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast should not be required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the cast, there's a type check complaint. I've removed using Collections.singletonMap and just reverted it to using the HashMap with a single entry.

@thoulen thoulen requested a review from res0nance April 27, 2020 07:51
@pyieh
Copy link
Contributor Author

pyieh commented May 4, 2020

@res0nance there seems to be some upper bound dependencies checks failing. Can you explain how those can be fixed? They seem to be coming from other dependencies that this project doesn't manage?

@res0nance
Copy link
Contributor

Looks like this was caused by a pipeline library issue (It's failing on master as well), I'll work on fixing it

@pyieh
Copy link
Contributor Author

pyieh commented May 11, 2020

@res0nance Wondering if there's been any update on this? It seems to be failing with a different error now.

@res0nance
Copy link
Contributor

I've fixed the CI on our part it looks like an infra issue right now

@jetersen
Copy link
Member

jetersen commented May 12, 2020

I wrote this piece of groovy script to remove orphaned disconnected agents.
Would this PR solve this issue?

Specifally seeing EC2 agent on Linux that will stay connected for a good while and than suddenly get stuck on Connection terminated and never get cleaned up.

import jenkins.model.Jenkins
import hudson.plugins.ec2.EC2AbstractSlave

println "Cleaning up offline EC2 slaves..."
Jenkins.instance.slaves.each { node ->
  if(!(node in EC2AbstractSlave)) return
  def computer = node.getComputer()
  if (computer && !computer.isOffline()) return
  def logs = computer.getLog()
  if (!logs) return
  if (!"Connection terminated".equals(logs.readLines().last())) return
  println "Deleting ${node.name}"
  computer.doDoDelete()
}

null

@jetersen
Copy link
Member

link to see current commit author: https://github.com/jenkinsci/ec2-plugin/commit/fc6528041fa2d6958a265ba96ef73ab7c8d1165d.patch

@pyieh
You might want to set up an email alias on your GitHub account 🥇
https://help.github.com/articles/adding-an-email-address-to-your-github-account/

Or look into fixing your git user.email config 😅
https://help.github.com/articles/setting-your-commit-email-address-in-git
then you would need to amend your commits: https://stackoverflow.com/a/3042512
after changing author

to get proper credit for your commits 🏆

@pyieh
Copy link
Contributor Author

pyieh commented May 12, 2020

@jetersen yeah I realized I was pushing to my personal repo using my work email. Honestly I'm not that concerned with "proper credit". It does show under my name as the committer, albeit not tied to my Github account so no worries.

@pyieh
Copy link
Contributor Author

pyieh commented May 18, 2020

@res0nance if everything looks good, can we get this merged?

@pyieh
Copy link
Contributor Author

pyieh commented Jun 1, 2020

@res0nance Wanted to follow up and see if we can get this merged. Thank you

@pyieh
Copy link
Contributor Author

pyieh commented Jun 17, 2020

@fcojfernandez Can I get my PR merged? @res0nance approved it a while back, but it hasn't been merged yet.

@fcojfernandez fcojfernandez merged commit a54f226 into jenkinsci:master Jun 18, 2020
@mbaranczak
Copy link

A big thank-you to pyieh for fixing this annoying bug.

I see that the patch has been merged into master, but there hasn't been an official release yet. Does anybody know when it will be released?

@manojtr
Copy link

manojtr commented Oct 26, 2020

Hello Folks, any update on the release with this fix?

@res0nance
Copy link
Contributor

This has been released since 1.51 you can see the release notes here https://github.com/jenkinsci/ec2-plugin/releases

@manojtr
Copy link

manojtr commented Nov 4, 2020

Today I tried 1.53 version and the issue is not resolved.

SlaveTemplate{ami='ami-038e073abe89730b3', labels='win2016dlp'}. Attempting to provision slave needed by excess workload of 1 units
Nov 04, 2020 12:36:01 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlaveSlaveTemplate{ami='ami-038e073abe89730b3', labels='win2016dlp'}. Cannot provision - no capacity for instances: 0
Nov 04, 2020 12:36:01 PM WARNING hudson.plugins.ec2.EC2Cloud provisionCan't raise nodes for SlaveTemplate{ami='ami-038e073abe89730b3', labels='win2016dlp'}

However, Jenkins identified the node and the node details screen shows "Launch Agent" Button. But the agent is not running.

Please note, this is a windows agent.

@manojtr
Copy link

manojtr commented Nov 4, 2020

I wrote this piece of groovy script to remove orphaned disconnected agents.
Would this PR solve this issue?

Specifally seeing EC2 agent on Linux that will stay connected for a good while and than suddenly get stuck on Connection terminated and never get cleaned up.

import jenkins.model.Jenkins
import hudson.plugins.ec2.EC2AbstractSlave

println "Cleaning up offline EC2 slaves..."
Jenkins.instance.slaves.each { node ->
  if(!(node in EC2AbstractSlave)) return
  def computer = node.getComputer()
  if (computer && !computer.isOffline()) return
  def logs = computer.getLog()
  if (!logs) return
  if (!"Connection terminated".equals(logs.readLines().last())) return
  println "Deleting ${node.name}"
  computer.doDoDelete()
}

null

@jetersen how did you use this script? As an init script or a scheduled job running in a very short interval? Since the official fix is not working, I am looking to implement something like this. Thank you

@jetersen
Copy link
Member

jetersen commented Nov 4, 2020

As a pipeline job with a cron schedule that would curl the script console and use a jenkins token from credentials

@manojtr
Copy link

manojtr commented Nov 4, 2020

Thanks @jetersen for the quick reply.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants