Add re-submission of tasks during spot interruption disconnects #485

Closed
wants to merge 7 commits into from

Conversation


@blanked blanked commented Jul 27, 2020

This PR adds a new feature: re-submission of tasks for agents that are disconnected due to a spot interruption event in AWS.

Description

Whenever an agent is disconnected, two checks run: whether the disconnect was unexpected, and whether it was caused by a spot interruption event. If both are true, the tasks that were running on the agent are re-submitted to the queue.
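The decision described above can be sketched in plain Java. This is only an illustration of the two checks, not the code in this PR: the class name, the `DisconnectCause` enum, and the use of strings for tasks are all hypothetical stand-ins, and no Jenkins or AWS APIs are called.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: every name here is an assumption, not part of the plugin.
class SpotResubmitSketch {

    /** Simplified stand-in for the reason an agent went offline. */
    enum DisconnectCause { USER_REQUESTED, IDLE_TERMINATION, CHANNEL_LOST }

    /** Re-submit only when the disconnect was unexpected AND was a spot interruption. */
    static boolean shouldResubmit(DisconnectCause cause, boolean spotInterrupted) {
        // Assumption: only a lost channel counts as an unexpected disconnect.
        boolean unexpected = cause == DisconnectCause.CHANNEL_LOST;
        return unexpected && spotInterrupted;
    }

    /** Puts the agent's running tasks back on the queue; returns how many were re-queued. */
    static int onDisconnect(DisconnectCause cause, boolean spotInterrupted,
                            List<String> runningTasks, Queue<String> buildQueue) {
        if (!shouldResubmit(cause, spotInterrupted)) {
            return 0; // expected disconnect, or not a spot interruption: leave the queue alone
        }
        buildQueue.addAll(runningTasks);
        return runningTasks.size();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        int requeued = onDisconnect(DisconnectCause.CHANNEL_LOST, true,
                List.of("build-1", "build-2"), queue);
        System.out.println("re-queued " + requeued + " task(s): " + queue);
    }
}
```

Keeping the check in a small pure function like `shouldResubmit` would also make it testable without starting the Jenkins test harness.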

Motivation

Builds may fail when spot instances are terminated. This PR can help reduce the number of build failures caused by spot interruption events.

Notes

This may or may not prevent build failures. There doesn't seem to be any documentation on how tasks can be resubmitted. This PR is inspired by another Jenkins plugin that already implements the suggested behaviour: https://github.com/jenkinsci/ec2-fleet-plugin/blob/master/src/main/java/com/amazon/jenkins/ec2fleet/EC2FleetAutoResubmitComputerLauncher.java

@blanked blanked changed the title Add re-submission of tasks during spot interruption disconnect Add re-submission of tasks during spot interruption disconnects Jul 27, 2020
@blanked
Author

blanked commented Jul 29, 2020

Ah ok, there are some initialization errors in the tests I wrote that didn't occur when I ran them locally. I'll take a look at them soon.

@res0nance res0nance self-requested a review August 12, 2020 07:30
}

@DataBoundConstructor
- public EC2SpotSlave(String name, String spotInstanceRequestId, String templateDescription, String remoteFS, int numExecutors, Mode mode, String initScript, String tmpDir, String labelString, List<? extends NodeProperty<?>> nodeProperties, String remoteAdmin, String jvmopts, String idleTerminationMinutes, List<EC2Tag> tags, String cloudName, int launchTimeout, AMITypeData amiType, ConnectionStrategy connectionStrategy, int maxTotalUses)
+ public EC2SpotSlave(String name, String spotInstanceRequestId, String templateDescription, String remoteFS, int numExecutors, Mode mode, String initScript, String tmpDir, String labelString, List<? extends NodeProperty<?>> nodeProperties, String remoteAdmin, String jvmopts, String idleTerminationMinutes, List<EC2Tag> tags, String cloudName, int launchTimeout, AMITypeData amiType, ConnectionStrategy connectionStrategy, int maxTotalUses, boolean restartSpotInterruption)
Contributor


Please don't modify an existing constructor. Create a new one and deprecate the old one.

Author


Added the new constructor and added the @Deprecated annotation to the old constructor.
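For readers unfamiliar with the pattern requested in this thread, here is a minimal sketch with a much shorter, hypothetical parameter list. The class name `SpotAgentSketch` is made up, and defaulting the new flag to `false` in the old constructor is an assumption rather than something confirmed by this PR. In the real plugin the `@DataBoundConstructor` annotation would presumably move to the new constructor; it is left out here so the example compiles without Jenkins on the classpath.

```java
// Hypothetical sketch: a cut-down class, not the plugin's EC2SpotSlave.
class SpotAgentSketch {
    private final String name;
    private final int maxTotalUses;
    private final boolean restartSpotInterruption;

    /** New constructor that carries the added flag. */
    SpotAgentSketch(String name, int maxTotalUses, boolean restartSpotInterruption) {
        this.name = name;
        this.maxTotalUses = maxTotalUses;
        this.restartSpotInterruption = restartSpotInterruption;
    }

    /**
     * Old signature, kept so existing callers still compile.
     *
     * @deprecated use the three-argument constructor instead
     */
    @Deprecated
    SpotAgentSketch(String name, int maxTotalUses) {
        // Assumption: the previous behaviour (no re-submission) is the default.
        this(name, maxTotalUses, false);
    }

    String getName() { return name; }

    int getMaxTotalUses() { return maxTotalUses; }

    boolean isRestartSpotInterruption() { return restartSpotInterruption; }

    public static void main(String[] args) {
        SpotAgentSketch agent = new SpotAgentSketch("spot-agent", -1, true);
        System.out.println(agent.getName() + " restartSpotInterruption="
                + agent.isRestartSpotInterruption());
    }
}
```

Delegating from the deprecated constructor to the new one keeps field initialization in a single place, so the two signatures cannot drift apart.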

Comment on lines 35 to 36

@RunWith(PowerMockRunner.class)


Adding
@PowerMockIgnore({"javax.crypto.*", "org.hamcrest.*", "javax.net.ssl.*", "com.sun.org.apache.xerces.*", "javax.xml.*", "org.xml.*"})
as seen in AmazonEC2CloudUnitTest.java and EC2StepTest.java, might resolve the test failure.

Author


Oh hey, I actually forgot about this PR... let me try that, thanks!

@blanked blanked closed this Oct 16, 2020
@blanked blanked reopened this Oct 16, 2020
@blanked blanked closed this Oct 16, 2020
@blanked blanked reopened this Oct 16, 2020
@blanked blanked marked this pull request as ready for review October 16, 2020 10:52
@blanked blanked closed this Oct 16, 2020
@blanked blanked reopened this Oct 16, 2020