Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-10696. Fix test failure caused by empty snapshot installation #6659

Merged
merged 2 commits into from
May 9, 2024

Conversation

hemantk-12
Copy link
Contributor

@hemantk-12 hemantk-12 commented May 8, 2024

What changes were proposed in this pull request?

After applying RATIS-2045, we saw lots of integration test failures as described in the HDDS-10696.

It is because of how MiniOzoneHACluster is set up. It doesn't set value for config ozone.om.db.dirs which OMDBCheckpointServlet is trying to access here.
So when ozone.om.db.dirs is not set, ServerUtils#getDBPath returns ozone.metadata.dirs. Because of that snapshotDir path: is getting evaluated to/Users/iamgroot/ozone-ws/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-e919072d-914a-4de6-9bfc-5b610697b58a/ozone-meta/db.snapshots, and the test is trying to access that doesn't exist causing NoSuchFileException.
In the actual snapshotDir path is /Users/iamgroot/ozone-ws/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-1146c9db-bf52-4c4b-94c9-131053141484/omNode-1/db.snapshots which is inside the ozone data directory.

To fix the issue, I changed the way we are trying to get the snapshotDir. So rather than creating one, it will get from RDBStore after the change dir should exist by default after snapshot feature.

Also removed Flaky annotation from the test because jira: HDDS-7880 has been resolved after RATIS-1960.

What is the link to the Apache JIRA

HDDS-10696

How was this patch tested?

Applied the patch on top of the changes where we saw the failure. And verify by running the full workflow and flaky-test-workflow 5x5 time.

@@ -317,8 +318,7 @@ private boolean getFilesForArchive(DBCheckpoint checkpoint,

// Get the snapshot files.
Set<Path> snapshotPaths = waitForSnapshotDirs(checkpoint);
Path snapshotDir = Paths.get(OMStorage.getOmDbDir(getConf()).toString(),
OM_SNAPSHOT_DIR);
Path snapshotDir = Paths.get(getSnapshotsParentDir()).getParent();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hemantk-12 , Why the parentDir of the getSnapshotsParentDir is the snapshotDir? Is it a typo?

Copy link
Contributor Author

@hemantk-12 hemantk-12 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the naming issue in getSnapshotsParentDir.
I renamed and changed the function to return snapshotDir path directly.
I also added a comment explaining it.

Please take another look.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hemantk-12 , thanks for clarifying it.

+1 the change looks good.

@adoroszlai adoroszlai changed the title HDDS-10696. Fixed ozone integration test fails because of empty snapshot installation HDDS-10696. Fix test failure caused by empty snapshot installation May 9, 2024
@adoroszlai adoroszlai merged commit 3f1a7ed into apache:master May 9, 2024
41 checks passed
@adoroszlai
Copy link
Contributor

Thanks @hemantk-12 for the patch, @szetszwo for the review.

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request May 29, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 18, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants