[JENKINS-30101][JENKINS-30175] Simplify persistence design for temporarily offline status #9855

Merged
merged 12 commits into from
Oct 15, 2024

Conversation

Member

@Vlatombe Vlatombe commented Oct 11, 2024

There was complex code to synchronize the transient state of Computer from Node and vice versa.
But since any temporarily offline cause is set by a user and should be persisted in the Node, the Computer should delegate to the Node instead of trying to synchronize its state.

Fixes JENKINS-30101 and JENKINS-30175

Testing done

Proposed changelog entries

  • Keep user offline reason when agent connects or disconnects for technical reasons.

Proposed upgrade guidelines

N/A

Submitter checklist

Desired reviewers

@mention

Before the changes are marked as ready-for-merge:

Maintainer checklist

There was complex code to synchronize the transient state of Computer from Node. But since any temporarily offline cause is set by a user and should be persisted in the Node, the Computer should delegate to the Node instead of trying to synchronize it
@Vlatombe Vlatombe requested a review from jglick October 11, 2024 13:45
Comment on lines 702 to 707
getNodeOrDie().setTemporaryOfflineCause(temporarilyOfflineCause);
if (temporarilyOfflineCause != null) {
    Listeners.notify(ComputerListener.class, false, l -> l.onTemporarilyOffline(this, temporarilyOfflineCause));
} else {
    Listeners.notify(ComputerListener.class, false, l -> l.onTemporarilyOnline(this));
}
Member Author

The field offlineCause now holds only the technical offline cause. temporarilyOfflineCause now exists only in Node.

Comment on lines 358 to 359
var temporaryOfflineCause = getNodeOrDie().getTemporaryOfflineCause();
return temporaryOfflineCause == null ? offlineCause : temporaryOfflineCause;
Member Author

Retaining the previous behaviour: if a temporary offline cause is defined, then from an outside point of view it replaces the technical offline cause.
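
To make the delegation concrete, here is a minimal Java sketch, not the actual Jenkins code. `SketchNode` and `SketchComputer` are hypothetical stand-ins for `Node` and `Computer`, causes are plain strings rather than `OfflineCause` objects, and `connect`/`disconnect` are simplified placeholders. It illustrates the getter precedence quoted above, and why a user-set reason can survive a technical disconnect and reconnect.

```java
// Hypothetical, simplified stand-ins; these are NOT the real hudson.model classes.
class SketchNode {
    // Persisted, user-provided reason; null means "not temporarily offline".
    private String temporaryOfflineCause;

    String getTemporaryOfflineCause() { return temporaryOfflineCause; }
    void setTemporaryOfflineCause(String cause) { this.temporaryOfflineCause = cause; }
}

class SketchComputer {
    private final SketchNode node;
    // Technical cause only (for example a lost connection); never the user reason.
    private String offlineCause;

    SketchComputer(SketchNode node) { this.node = node; }

    // Delegate to the node instead of keeping a second copy to synchronize.
    void setTemporaryOfflineCause(String cause) { node.setTemporaryOfflineCause(cause); }

    // Simplified placeholders for technical state changes.
    void disconnect(String technicalCause) { offlineCause = technicalCause; }
    void connect() { offlineCause = null; }

    // When a user reason is present, it masks the technical cause.
    String getOfflineCause() {
        String temporary = node.getTemporaryOfflineCause();
        return temporary == null ? offlineCause : temporary;
    }
}

class OfflineCauseSketch {
    public static void main(String[] args) {
        SketchComputer computer = new SketchComputer(new SketchNode());
        computer.setTemporaryOfflineCause("maintenance");
        computer.disconnect("lost connection");
        computer.connect();
        // The user reason is still reported after the technical disconnect and reconnect.
        System.out.println(computer.getOfflineCause());
    }
}
```

Because the user reason lives only on the node in this sketch, there is nothing for the computer to overwrite when the connection state changes.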

Member

@jglick jglick left a comment

getNodeOrDie is probably wrong. Otherwise looks good.

core/src/main/java/hudson/model/Computer.java Outdated
core/src/main/java/hudson/model/Computer.java Outdated
core/src/main/java/hudson/model/Computer.java Outdated
core/src/main/java/hudson/model/Node.java Outdated
@Vlatombe Vlatombe requested a review from jglick October 11, 2024 16:05
Member

@jglick jglick left a comment

Just a nit in isOffline.

core/src/main/java/hudson/model/Computer.java Outdated
Co-authored-by: Jesse Glick <jglick@cloudbees.com>
Contributor

mawinter69 commented Oct 14, 2024

I think this will solve several issues that I also addressed in #6152.
What is missing here is showing the reason why an agent is temporarily offline on the agents page when that reason differs from the agent being disconnected (e.g. lost connection)

@mawinter69
Contributor

I think you will also need to adjust setOfflineCause.jelly in some way

Retain the original message for setting the node temporary offline
@Vlatombe
Member Author

> What is missing here is showing the reason why an agent is temporarily offline on the agents page when that reason differs from the agent being disconnected (e.g. lost connection)

Looks like a separate change impacting UX.

@Vlatombe Vlatombe requested review from jglick and a team October 14, 2024 07:50
Member

@timja timja left a comment

LGTM

Contributor

@mawinter69 mawinter69 left a comment

Previously, setting an agent temporarily offline also set the offline cause itself.
With this change, once an agent has been set temporarily offline, it is no longer possible to get the offline cause that might have been set when the agent was disconnected afterwards.
I am not sure whether this will have implications for plugins.

getOfflineCauseReason will return an empty string now if an agent is only temp offline. This will break these tests I think:
https://github.com/jenkinsci/node-sharing-plugin/blob/ae54615d1213196f3ca91073caf7c2850758e85a/jth-tests/src/test/java/com/redhat/jenkins/nodesharing/SharedNodeCloudTest.java#L531
https://github.com/jenkinsci/openstack-cloud-plugin/blob/fc5384db4d5f1f4f2ebdec2aec7e28ce312cf0df/plugin/src/test/java/jenkins/plugins/openstack/compute/SingleUseSlaveTest.java#L137
Maybe also tests in https://github.com/jenkinsci/java-client-api and others

@Vlatombe
Member Author

@mawinter69 Addressed your remark

Member

@jglick jglick left a comment

OK I think. Seems worth running tests of the plugins mentioned.

* Use {@link #setTemporaryOfflineCause(OfflineCause)} instead.
*/
@Deprecated
public void setTemporarilyOffline(boolean temporarilyOffline, OfflineCause cause) {
Contributor

The javadoc is lost here, in particular the note that the second argument is only considered when the first is true. But maybe this doesn't matter now that the method is deprecated.

Member Author

Yes, I'm dropping existing javadoc to avoid repeats, and only refer to the current methods.
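
For readers following this thread, the contract mentioned above (the cause is only considered when the flag is true) can be sketched as below. This is an assumption about one plausible forwarding shape, not the code merged in this PR: `ForwardingComputer` is a hypothetical class, causes are plain strings, and the case of a true flag with a null cause is deliberately left unhandled.

```java
// Hypothetical sketch; NOT the real hudson.model.Computer.
class ForwardingComputer {
    // null means the agent is not temporarily offline.
    private String temporaryOfflineCause;

    String getTemporaryOfflineCause() { return temporaryOfflineCause; }

    // The current method: a non-null cause marks the agent temporarily offline.
    void setTemporaryOfflineCause(String cause) { this.temporaryOfflineCause = cause; }

    // Deprecated two-argument form, forwarding to the current method.
    // The cause is only considered when the flag is true; false always clears it.
    // Assumption: a true flag with a null cause is not handled in this sketch.
    @Deprecated
    void setTemporarilyOffline(boolean temporarilyOffline, String cause) {
        setTemporaryOfflineCause(temporarilyOffline ? cause : null);
    }
}

class ForwardingSketch {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        ForwardingComputer computer = new ForwardingComputer();
        computer.setTemporarilyOffline(true, "maintenance");
        System.out.println(computer.getTemporaryOfflineCause());
        computer.setTemporarilyOffline(false, "ignored");
        System.out.println(computer.getTemporaryOfflineCause());
    }
}
```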

@Vlatombe
Member Author

/label ready-for-merge


This PR is now ready for merge, after ~24 hours, we will merge it if there's no negative feedback.

Thanks!

@comment-ops-bot comment-ops-bot bot added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Oct 14, 2024
@timja timja added the skip-changelog Should not be shown in the changelog label Oct 14, 2024
@mawinter69
Contributor

Can we mark this as a fix for:
JENKINS-30101
JENKINS-30175
JENKINS-50313

@Vlatombe Vlatombe changed the title Simplify persistence design for temporarily offline status [JENKINS-30101][JENKINS-30175][JENKINS-50313] Simplify persistence design for temporarily offline status Oct 15, 2024
@Vlatombe Vlatombe changed the title [JENKINS-30101][JENKINS-30175][JENKINS-50313] Simplify persistence design for temporarily offline status [JENKINS-30101][JENKINS-30175] Simplify persistence design for temporarily offline status Oct 15, 2024
@Vlatombe
Member Author

Keeping JENKINS-50313 out, since this PR doesn't cover showing both offline causes.

@Vlatombe Vlatombe removed the skip-changelog Should not be shown in the changelog label Oct 15, 2024
@timja timja merged commit b50cf51 into jenkinsci:master Oct 15, 2024
16 checks passed
@Vlatombe Vlatombe deleted the temporarily-offline-cause branch October 15, 2024 15:38
@MarkEWaite MarkEWaite added the bug For changelog: Minor bug. Will be listed after features label Oct 16, 2024
@MarkEWaite
Contributor

@Vlatombe is there anything that we can do from Jenkins core to resolve the change in behavior that has been seen in the OpenStack cloud plugin or is a change needed in the plugin?

@Vlatombe
Member Author

Vlatombe commented Nov 4, 2024

The semantics of the offlineCause field have changed slightly because the previous semantics were ambiguous.

Filed jenkinsci/openstack-cloud-plugin#385, compatibility is retained when using the getter.
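
The difference between reading the field and calling the getter can be illustrated with a small Java sketch. Everything here is hypothetical: `BaseComputer` and `PluginComputer` are stand-ins, causes are plain strings, the temporary cause is kept in the base class only to keep the example short (in this PR it lives in Node), and the sketch assumes a subclass is able to read the field directly. It does not reproduce the actual plugin code.

```java
// Hypothetical stand-in for the core class; NOT the real hudson.model.Computer.
class BaseComputer {
    // After the change: technical offline cause only.
    protected String offlineCause;
    // Kept here for brevity; in the PR this state is persisted in the Node.
    private String temporaryOfflineCause;

    void setTemporaryOfflineCause(String cause) { this.temporaryOfflineCause = cause; }

    // The getter still reports the user reason when one is set.
    String getOfflineCause() {
        return temporaryOfflineCause != null ? temporaryOfflineCause : offlineCause;
    }
}

// Hypothetical plugin subclass showing the two ways of reading the cause.
class PluginComputer extends BaseComputer {
    // Direct field access: sees only the technical cause.
    String causeViaField() { return offlineCause; }

    // Getter access: keeps the behaviour callers relied on before.
    String causeViaGetter() { return getOfflineCause(); }
}

class FieldVersusGetterSketch {
    public static void main(String[] args) {
        PluginComputer computer = new PluginComputer();
        computer.setTemporaryOfflineCause("reserved by user");
        System.out.println("field:  " + computer.causeViaField());
        System.out.println("getter: " + computer.causeViaGetter());
    }
}
```

In this sketch, code that reads the field sees no cause for an agent that is only temporarily offline, while code that goes through the getter still sees the user reason.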
