HDDS-11280. Add Synchronize in AbstractCommitWatcher.addAckDataLength #7032

Merged: jojochuang merged 3 commits into apache:HDDS-7593 on Aug 5, 2024

Conversation

ashishkumar50
Contributor

What changes were proposed in this pull request?

TestBlockOutputStream.testWriteMoreThanFlushSize has been flaky since the HDDS-9844 fix.
The cause is that addAckDataLength is not thread safe, so concurrent updates can leave totalAckDataLength with a wrong value.
This PR adds synchronized to AbstractCommitWatcher.addAckDataLength to make it thread safe.
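For illustration, here is a minimal standalone sketch of the synchronized approach described above. The class `SynchronizedAckCounter` and its `runConcurrently` helper are hypothetical names invented for this example and are not part of Ozone. Making the getter synchronized as well is an assumption made here so that readers see the latest value; the PR description only mentions `addAckDataLength`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the counter inside AbstractCommitWatcher.
final class SynchronizedAckCounter {
  private long totalAckDataLength;

  // Assumption: the getter is synchronized too, for visibility across threads.
  synchronized long getTotalAckDataLength() {
    return totalAckDataLength;
  }

  // The read-modify-write below is not atomic on its own;
  // synchronized makes the whole method body mutually exclusive.
  synchronized long addAckDataLength(long acked) {
    totalAckDataLength += acked;
    return totalAckDataLength;
  }

  // Hypothetical helper: several threads add to one counter, then the total is read.
  public static long runConcurrently(int threads, int addsPerThread, long acked) {
    SynchronizedAckCounter counter = new SynchronizedAckCounter();
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread worker = new Thread(() -> {
        for (int i = 0; i < addsPerThread; i++) {
          counter.addAckDataLength(acked);
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
    return counter.getTotalAckDataLength();
  }

  public static void main(String[] args) {
    // 8 threads x 10,000 adds x 4 bytes each.
    System.out.println(runConcurrently(8, 10_000, 4L));
  }
}
```

Without the `synchronized` keyword on `addAckDataLength`, two threads could read the same old total and one of the additions would be lost, which matches the kind of wrong total described above.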

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11280

How was this patch tested?

Verified with a 10X10 run, which is green after the fix:
https://github.com/ashishkumar50/ozone/actions/runs/10244918136/workflow

Contributor

@szetszwo szetszwo left a comment

@ashishkumar50, thanks for working on this! Let's use AtomicLong as below.

+++ b/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/AbstractCommitWatcher.java
@@ -38,6 +38,7 @@
 import java.util.concurrent.ConcurrentSkipListMap;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.TimeoutException;
+import java.util.concurrent.atomic.AtomicLong;
 
 /**
  * This class executes watchForCommit on ratis pipeline and releases
@@ -62,7 +63,7 @@ abstract class AbstractCommitWatcher<BUFFER> {
 
   private final XceiverClientSpi client;
 
-  private long totalAckDataLength;
+  private final AtomicLong totalAckDataLength = new AtomicLong();
 
   AbstractCommitWatcher(XceiverClientSpi client) {
     this.client = client;
@@ -80,12 +81,11 @@ synchronized void updateCommitInfoMap(long index, List<BUFFER> buffers) {
 
   /** @return the total data which has been acknowledged. */
   long getTotalAckDataLength() {
-    return totalAckDataLength;
+    return totalAckDataLength.get();
   }
 
   long addAckDataLength(long acked) {
-    totalAckDataLength += acked;
-    return totalAckDataLength;
+    return totalAckDataLength.addAndGet(acked);
   }
 
   /**
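
As a rough standalone illustration of why the `AtomicLong` version suggested in the diff above is safe under concurrent calls, the sketch below mirrors the two patched methods in a hypothetical class. `AtomicAckCounter` and `runConcurrently` are names made up for this example and are not Ozone code. Besides checking the final total, it records every value returned by `addAckDataLength`: because each `addAndGet` is atomic and every increment is positive, no two calls can return the same running total.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in mirroring the patched methods of AbstractCommitWatcher.
final class AtomicAckCounter {
  private final AtomicLong totalAckDataLength = new AtomicLong();

  long getTotalAckDataLength() {
    return totalAckDataLength.get();
  }

  long addAckDataLength(long acked) {
    // Atomic read-modify-write; returns the updated total.
    return totalAckDataLength.addAndGet(acked);
  }

  // Hypothetical helper. Returns {final total, number of distinct running totals seen}.
  public static long[] runConcurrently(int threads, int addsPerThread, long acked) {
    AtomicAckCounter counter = new AtomicAckCounter();
    Set<Long> seenTotals = ConcurrentHashMap.newKeySet();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < addsPerThread; i++) {
          seenTotals.add(counter.addAckDataLength(acked));
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
    return new long[] {counter.getTotalAckDataLength(), seenTotals.size()};
  }

  public static void main(String[] args) {
    long[] result = runConcurrently(4, 5_000, 2L);
    System.out.println("total=" + result[0] + ", distinct running totals=" + result[1]);
  }
}
```

Compared with a `synchronized` method, `AtomicLong` avoids taking the object's monitor for a single counter update, and `get()` needs no extra locking to observe the latest value.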

@ashishkumar50
Contributor Author

@szetszwo Thanks for the review.
Updated to use AtomicLong.
There still appears to be some flakiness: the second 10X10 run had a failure, while the first 10X10 run passed completely.

Contributor

@szetszwo szetszwo left a comment

+1 the change looks good.

Contributor

@jojochuang jojochuang left a comment

+1 looks correct to me. Merging it now.

@jojochuang jojochuang merged commit b781b54 into apache:HDDS-7593 Aug 5, 2024
39 checks passed
ashishkumar50 added a commit to ashishkumar50/ozone that referenced this pull request Aug 5, 2024
…apache#7032)

Co-authored-by: ashishk <ashishk@cloudera.com>
(cherry picked from commit b781b54)
errose28 added a commit to errose28/ozone that referenced this pull request Aug 7, 2024
* master: (181 commits)
  HDDS-11289. Bump docker-maven-plugin to 0.45.0 (apache#7024)
  HDDS-11287. Code cleanup in XceiverClientSpi (apache#7043)
  HDDS-11283. Refactor KeyValueStreamDataChannel to avoid spurious IDE build issues (apache#7040)
  HDDS-11257. Ozone write does not work when http proxy is set for the JVM. (apache#7036)
  HDDS-11249. Bump ozone-runner to 20240729-jdk17-1 (apache#7003)
  HDDS-10517. Recon - Add a UI component for showing DN decommissioning detailed status and info (apache#6724)
  HDDS-11270. [hsync] Add DN layout version (HBASE_SUPPORT/version 8) upgrade test. (apache#7021)
  HDDS-11272. Statistics some node status information (apache#7025)
  HDDS-11278. Move code out of Hadoop util package (apache#7028)
  HDDS-11274. (addendum) Replace Hadoop annotations/configs with Ozone-specific ones
  HDDS-11274. Replace Hadoop annotations/configs with Ozone-specific ones (apache#7026)
  HDDS-11280. Add Synchronize in AbstractCommitWatcher.addAckDataLength (apache#7032)
  HDDS-11235. Spare InfoBucket RPC call in FileSystem#mkdir() call. (apache#6990)
  HDDS-11273. Bump commons-compress to 1.26.2 (apache#7023)
  HDDS-11225. Increase ipc.server.read.threadpool.size (apache#7007)
  HDDS-11224. Increase hdds.datanode.handler.count (apache#7011)
  HDDS-11259. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock. (apache#7012)
  HDDS-11214. Added config to set rocksDB's max log file size and num of log files (apache#7014)
  HDDS-11226. Make ExponentialBackoffPolicy maxRetries configurable (apache#6985)
  HDDS-11260. [hsync] Add Ozone Manager protocol version (apache#7015)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/audit/DNAction.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerReportHandler.java
errose28 added a commit to errose28/ozone that referenced this pull request Aug 13, 2024
…p-supervisor

Merge conflicts are resolved but the change does not yet build.

* HDDS-10239-container-reconciliation: (183 commits)
  HDDS-10376. Add a Datanode API to supply a merkle tree for a given container. (apache#6945)
  HDDS-11289. Bump docker-maven-plugin to 0.45.0 (apache#7024)
  HDDS-11287. Code cleanup in XceiverClientSpi (apache#7043)
  HDDS-11283. Refactor KeyValueStreamDataChannel to avoid spurious IDE build issues (apache#7040)
  HDDS-11257. Ozone write does not work when http proxy is set for the JVM. (apache#7036)
  HDDS-11249. Bump ozone-runner to 20240729-jdk17-1 (apache#7003)
  HDDS-10517. Recon - Add a UI component for showing DN decommissioning detailed status and info (apache#6724)
  HDDS-10926. Block deletion should update container merkle tree. (apache#6875)
  HDDS-11270. [hsync] Add DN layout version (HBASE_SUPPORT/version 8) upgrade test. (apache#7021)
  HDDS-11272. Statistics some node status information (apache#7025)
  HDDS-11278. Move code out of Hadoop util package (apache#7028)
  HDDS-11274. (addendum) Replace Hadoop annotations/configs with Ozone-specific ones
  HDDS-11274. Replace Hadoop annotations/configs with Ozone-specific ones (apache#7026)
  HDDS-11280. Add Synchronize in AbstractCommitWatcher.addAckDataLength (apache#7032)
  HDDS-11235. Spare InfoBucket RPC call in FileSystem#mkdir() call. (apache#6990)
  HDDS-11273. Bump commons-compress to 1.26.2 (apache#7023)
  HDDS-11225. Increase ipc.server.read.threadpool.size (apache#7007)
  HDDS-11224. Increase hdds.datanode.handler.count (apache#7011)
  HDDS-11259. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock. (apache#7012)
  HDDS-11214. Added config to set rocksDB's max log file size and num of log files (apache#7014)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java