Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offload terminates to separate thread #433

Merged
merged 7 commits into from
Apr 9, 2020
Merged

Offload terminates to separate thread #433

merged 7 commits into from
Apr 9, 2020

Conversation

res0nance
Copy link
Contributor

@res0nance res0nance commented Feb 27, 2020

Offload termination of agents so as to not stall NodeProvisioner.update when NodeProvisioner.update is called it will hold the queue lock. In order to reduce the hotness of this lock schedule all terminates to be off thread

@res0nance res0nance marked this pull request as ready for review February 27, 2020 10:25
@@ -459,6 +465,11 @@ public Node reconfigure(final StaplerRequest req, JSONObject form) throws FormEx
return null;
}

@Override
public boolean isAcceptingTasks() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the javadocs this signals that the computer can't accept tasks so when terminating once out of the lock this node should not be able to receive a task

@res0nance res0nance added the enhancement Feature additions or enhancements label Feb 27, 2020
Copy link
Member

@jvz jvz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to look good overall, though I'd like to avoid introducing sleep calls in new tests.

rs.check(c);
EC2AbstractSlave node = c.getNode();
while (node.terminateScheduled)
Thread.sleep(TimeUnit.SECONDS.toMillis(1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I was going to suggest mocking the Clock instance in the test to advance time, but then I realized this is a spinlock. To be a more reliable test that depends less on the passage of time (i.e., stateful), use of CountDownLatch which counts down at the end of the Runnable submitted would be more appropriate.

Copy link
Contributor Author

@res0nance res0nance Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how do you propose that to work. Since the call happens inside terminate() I don't see a way of introducing it in tests.

EDIT: I think i understand what you mean, but do we want to change the boolean into something like a CountDownLatch we would need to reassign the latch everytime we call terminate (Which ideally should be once). I do agree that it makes the tests better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to make it repeatable, there's a CyclicBarrier class you can use to reset the counter each time. Basically, you count down on terminate, and in the unit test, you use await() on the latch or barrier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for testability, it'd be ok to expose as package-private for unit tests (I think there might be a @VisibleForTesting annotation or something similar you can use).

@res0nance res0nance requested a review from jvz March 16, 2020 08:11
@res0nance res0nance requested a review from jvz March 18, 2020 07:12
Copy link
Member

@jvz jvz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable approach! Perhaps one day someone will introduce RxJava or Reactor or something that makes it a little easier to do these types of things.

Copy link
Contributor

@fcojfernandez fcojfernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review, but these days have been a bit crazy

@res0nance res0nance merged commit 5d9cc45 into jenkinsci:master Apr 9, 2020
@res0nance res0nance deleted the offload-terminate branch April 9, 2020 04:17
@nigelarmstrong-bc
Copy link
Contributor

This PR seems to have introduced a critical bug @res0nance https://issues.jenkins-ci.org/browse/JENKINS-62062

@@ -109,6 +112,10 @@
/* The time at which we fetched the last instance data */
protected transient long lastFetchTime;

/** Terminate was scheduled */
@Nonnull
protected transient ResettableCountDownLatch terminateScheduled = new ResettableCountDownLatch(1, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@res0nance My java isn't very good, but I think when these objects are deserialized ResettableCountDownLatch is getting set to null by default and not a new object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can fix this in readResolve

@@ -459,6 +466,11 @@ public Node reconfigure(final StaplerRequest req, JSONObject form) throws FormEx
return null;
}

@Override
public boolean isAcceptingTasks() {
Copy link
Contributor

@roundtrip-rslobojan roundtrip-rslobojan Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of this method has resulted in a critical bug (as noted by @nigelarmstrong-bc) - this method does not exist on serialized versions of this class created by previous plugins, and you get this exception in Jenkins on any access:

Caught exception evaluating: c.iconClassName in /jenkins/ajaxExecutors. Reason: java.lang.reflect.InvocationTargetException
java.lang.NullPointerException
	at hudson.plugins.ec2.EC2AbstractSlave.isAcceptingTasks(EC2AbstractSlave.java:471)
	at hudson.model.Computer.isAcceptingTasks(Computer.java:1612)
	at hudson.slaves.SlaveComputer.isAcceptingTasks(SlaveComputer.java:177)
	at hudson.model.Computer.getIconClassName(Computer.java:753)
Caused: java.lang.reflect.InvocationTargetException

EDIT: upon further reflection (ha), I think @nigelarmstrong-bc is correct and readResolve is the place to fix this, as the new field's not getting initialized due to Java serialization semantics - the method is in fact there since the latest version of the class is present in the classloader, but its dependent field is not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @thoulen ^^

res0nance pushed a commit that referenced this pull request Apr 29, 2020
* [JENKINS-62043] Fix NPE which broke Jenkins builds

In PR #433, which was released in AWS EC2 Plugin 1.50, a bug was introduced due to improper handling of an
important transient field which is accessed very frequently by the Jenkins server due to it being referenced in
`EC2AbstractSlave.isAcceptingTasks()`. This broke any Jenkins user which had running EC2 Nodes during the
update to plugin 1.50 (see https://issues.jenkins-ci.org/browse/JENKINS-62043 for details).

* Excluded false-positive nullcheck Spotbugs rule

Turned off a Spotbugs rule which incorrectly understands how Java serialization works, and which thought a transient field
couldn't be null in Object.readResolve() when it actually can be.

* Removed @nonnull annotation from transient field 

Removed a javax.annotation.Nonnull annotation from a transient field to silence a Spotbugs warning which
incorrectly flagged a field as never being null when it can in fact be null due to Java serialization semantics

* Removed unneeded NONNULL exclusion

Code in EC2AbstractSlave was updated to remove a javax.annotation.Nonnull annotation, so this Spotbugs warning should
no longer trigger and we can remove the exclusion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature additions or enhancements
Projects
None yet
5 participants