RATIS-2184. Improve TestRaftWithGrpc test stability #1177
@szetszwo @duongkame, could you help take a look?
@jianghuazhu , thanks a lot for working on this! Please see the comments inlined.
ReferenceCountedObject<EntryWithData> entryWithData = null;
try {
  entryWithData = getRaftLog().retainEntryWithData(next);
  if (!buffer.offer(entryWithData.get())) {
    entryWithData.release();
    break;
  }
  offered.put(next, entryWithData);
} catch (Exception e){
  if (entryWithData != null) {
    entryWithData.release();
  }
LogAppenderDaemon failed
org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index = 4269
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.retainEntryWithData(SegmentedRaftLog.java:334)
at org.apache.ratis.server.leader.LogAppenderBase.nextAppendEntriesRequest(LogAppenderBase.java:264)
For the above particular exception, this change won't help since, when retainEntryWithData(..) throws an exception, entryWithData must be null.
This change will help if other methods (e.g. get(), offer(..), put(..)) throw an exception. However, these methods throw only runtime exceptions/errors (e.g. OutOfMemoryError). We may not need to handle it.
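The point about entryWithData staying null can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins written for this illustration, not the Ratis classes; retainEntry(..) simply always throws, mimicking the "Log entry not found" case.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a reference-counted entry; not a Ratis class. */
final class CountedEntry {
  private final AtomicInteger releases = new AtomicInteger();
  void release() { releases.incrementAndGet(); }
  int releases() { return releases.get(); }
}

final class NullOnThrowDemo {
  /** Hypothetical lookup that always fails, like a missing log entry. */
  static CountedEntry retainEntry(long index) {
    throw new IllegalStateException("Log entry not found: index = " + index);
  }

  /** Returns true only if the catch block found something to release. */
  static boolean releasedInCatch(long index) {
    CountedEntry entry = null;
    try {
      // The call throws, so the assignment never happens and entry stays null.
      entry = retainEntry(index);
      return false;
    } catch (IllegalStateException e) {
      if (entry != null) { // never true when retainEntry(..) itself threw
        entry.release();
        return true;
      }
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(releasedInCatch(4269)); // prints false
  }
}
```

In this sketch the null check in the catch block can only matter when a later statement in the try-block throws, which matches the comment above.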
pom.xml
Outdated
@@ -643,7 +643,7 @@
 <enableProcessChecker>all</enableProcessChecker>
 <forkedProcessTimeoutInSeconds>600</forkedProcessTimeoutInSeconds>
 <!-- @argLine is filled by jacoco maven plugin. @{} means late evaluation -->
-<argLine>-Xmx2g -XX:+HeapDumpOnOutOfMemoryError @{argLine}</argLine>
+<argLine>-Xmx8g -XX:+HeapDumpOnOutOfMemoryError @{argLine}</argLine>
When the advanced reference trace is turned on, it does need more memory. I recall that I made a similar change for running with the advanced reference trace.
pom.xml
Outdated
-<maxmem>2048m</maxmem>
+<maxmem>4096m</maxmem>
Are you sure that it needs more memory for compilation?
@szetszwo, I updated some comments in RATIS-2184.
6e941ad to 415799f
@szetszwo , I updated it.
Therefore, I improved a few points:
What resources? We need to fix it if that is really the case.
That's great! Let me start a build that runs the tests repeatedly, many times.
@jianghuazhu , Thanks for the update! Please see the comments inlined.
/*grpcServerMetrics.unregister();
CompletableFuture<LifeCycle.State> future = super.stopAsync();
if (appendLogRequestObserver != null) {
  appendLogRequestObserver.stop();
  appendLogRequestObserver = null;
}
return future;*/
This is a good change. Could you remove the commented code?
@@ -33,7 +33,6 @@
 public final class ReferenceCountedLeakDetector {
   private static final Logger LOG = LoggerFactory.getLogger(ReferenceCountedLeakDetector.class);
   // Leak detection is turned off by default.
-
Let's revert this whitespace change.
  toReturn.set(entryRef);
} else {
  try {
    final LogEntryProto entry = entryRef.retain();
entryRef.retain() should be called before the try-block. Otherwise, if it throws an exception, we will call release() without having successfully retained the entry.
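A minimal sketch of the suggested ordering follows. Counted is a hypothetical holder written for this illustration, not Ratis's ReferenceCountedObject, and the finally-based release is an assumption about how the pairing could look rather than the code in this pull request.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted holder; a stand-in for illustration only. */
final class Counted<T> {
  private final T value;
  private final boolean failOnRetain;
  private final AtomicInteger count = new AtomicInteger();

  Counted(T value, boolean failOnRetain) {
    this.value = value;
    this.failOnRetain = failOnRetain;
  }

  T retain() {
    if (failOnRetain) {
      throw new IllegalStateException("retain failed");
    }
    count.incrementAndGet();
    return value;
  }

  void release() {
    if (count.decrementAndGet() < 0) {
      throw new IllegalStateException("released without retain");
    }
  }

  int count() { return count.get(); }
}

final class RetainBeforeTry {
  /** Retains outside the try-block so release() always pairs with a successful retain(). */
  static int lengthOf(Counted<String> ref) {
    // If retain() throws here, nothing was retained and nothing gets released.
    final String s = ref.retain();
    try {
      return s.length();
    } finally {
      ref.release();
    }
  }
}
```

With retain() inside the try-block instead, a failing retain() would still reach the release() call and drive the count below zero in this sketch.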
try {
  ref = retainLog(index);
Similar to the previous case, retainLog(index) should be called before the try-block.
Started https://github.com/szetszwo/ratis/actions/runs/12398883445
Sorry, there seem to be some errors or omissions that have not been discovered.
@jianghuazhu, the 10x100 build timed out. Let's retry with 10x10:
@jianghuazhu, compared with master, your branch has improved the success rate.
Could you clean up the code? We may merge it first and make further improvements in a separate JIRA.
Thanks @szetszwo.
What changes were proposed in this pull request?
Improve stability of TestRaftWithGrpc tests.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-2184
How was this patch tested?
ci:
https://github.com/jianghuazhu/ratis/actions/runs/11792452996