Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4460: QuorumPeer overrides Thread.getId with different sema… #1942

Merged
merged 1 commit into from
Nov 3, 2022

Conversation

cnauroth
Copy link
Contributor

@cnauroth cnauroth commented Nov 2, 2022

…ntics.

@cnauroth
Copy link
Contributor Author

cnauroth commented Nov 2, 2022

@eolivelli , @symat or @anmolnar , would you please review?

As discussed in JIRA issue ZOOKEEPER-4460, the ZooKeeper codebase currently contains an override of Thread#getId(), which will be incompatible with OpenJDK Project Loom, targeted to Java 19. I've taken the approach of renaming QuorumPeer#getId() to QuorumPeer#getMyId(), consistent with documented terminology for the ID assigned to a peer in a quorum. From what I can tell, it was never intentional to override Thread#getId(). It was just a logical choice for a method name in a class that coincidentally also extends Thread.

QuorumPeer is a server-side class that isn't part of a public API, so we don't need to preserve backward compatibility. As a safety measure, I reviewed the code for Apache Curator, where I suspected there might be a deeper integration with ZooKeeper internals for testing infrastructure. Even in Curator, I didn't find any calls to QuorumPeer#getId().

if (sid > self.getId()) {
LOG.info("Have smaller server identifier, so dropping the connection: (myId:{} --> sid:{})", self.getId(), sid);
if (sid > self.getMyId()) {
LOG.info("Have smaller server identifier, so dropping the connection: (myId:{} --> sid:{})", self.getMyId(), sid);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RESOURCE_LEAK: resource of type java.io.DataInputStream acquired by call to new() at line 498 is not released after line 514.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@@ -658,13 +658,13 @@ void lead() throws IOException, InterruptedException {
// us. We do this by waiting for the NEWLEADER packet to get
// acknowledged

waitForEpochAck(self.getId(), leaderStateSummary);
waitForEpochAck(self.getMyId(), leaderStateSummary);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 2 similar findings have been found in this PR


THREAD_SAFETY_VIOLATION: Read/Write race. Non-private method Leader.lead() indirectly reads with synchronization from this.leaderStateSummary. Potentially races with unsynchronized write in method Leader.lead().
Reporting because this access may occur on a background thread.


🔎 Expand here to view all instances of this finding
File Path Line Number
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java 606
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java 667

Visit the Lift Web Console to find more details in your report.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@sonatype-lift
Copy link

sonatype-lift bot commented Nov 2, 2022

⚠️ 52 God Classes were detected by Lift in this project. Visit the Lift web console for more details.

@cnauroth
Copy link
Contributor Author

cnauroth commented Nov 2, 2022

The sonatype-lift warnings are potentially interesting, but unrelated to code changed by this patch.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

We should cut a release in order to let people test this.
I will report this to the JDK Quality Outreach program (zookeeper is participating)

@eolivelli
Copy link
Contributor

I think that we should port this to the active branches, at least 3.8

@cnauroth
Copy link
Contributor Author

cnauroth commented Nov 2, 2022

Thanks for the review, @eolivelli ! I've confirmed that this patch backports cleanly to branch-3.8 and branch-3.7, so I'll plan on committing to those branches. (Going back any further than that will hit merge conflicts.)

@cnauroth
Copy link
Contributor Author

cnauroth commented Nov 3, 2022

There was one test failure, which appears to be unrelated.

[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.109 s <<< FAILURE! - in org.apache.zookeeper.server.ConnectionMetricsTest
[ERROR] testRevalidateCount  Time elapsed: 1.195 s  <<< ERROR!
java.lang.NullPointerException
	at org.apache.zookeeper.test.QuorumUtil.getConnectionStringForServer(QuorumUtil.java:343)
	at org.apache.zookeeper.server.ConnectionMetricsTest.testRevalidateCount(ConnectionMetricsTest.java:65)

@cnauroth cnauroth merged commit cedf093 into apache:master Nov 3, 2022
@cnauroth cnauroth deleted the ZOOKEEPER-4460 branch November 3, 2022 05:50
asfgit pushed a commit that referenced this pull request Nov 3, 2022
ZOOKEEPER-4460: QuorumPeer overrides Thread.getId with different sema…

Signed-off-by: Enrico Olivelli <eolivelli@apache.org>
(cherry picked from commit cedf093)
asfgit pushed a commit that referenced this pull request Nov 3, 2022
ZOOKEEPER-4460: QuorumPeer overrides Thread.getId with different sema…

Signed-off-by: Enrico Olivelli <eolivelli@apache.org>
(cherry picked from commit cedf093)
(cherry picked from commit 941151c)
@cnauroth
Copy link
Contributor Author

cnauroth commented Nov 3, 2022

I have merged this to master, branch-3.8 and branch-3.7. @eolivelli , thank you for reviewing!

@li4wang
Copy link
Contributor

li4wang commented Nov 9, 2022

The test failure ConnectionMetricsTest.testRevalidateCount is related to the changes in this PR.

int follower1 = (int) util.getFollowerQuorumPeers().get(0).getId()

The test calls getId() to get the server id. Since getId() was changed to getMyId(), the super class Thread.getId() is called and the thread id was returned. There is no corresponding peer associated to the thread id, that's why NPE is thrown.

The test passed after changing getId() to getMyId(). I will open a JIRA to check in the fix. @eolivelli can you do a quick review and get it merged once the PR is submitted. Thanks

https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/test/java/org/apache/zookeeper/server/ConnectionMetricsTest.java#L60-L65

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants