Failed creating a merkle tree for [repair #c0c157f0-e14a-11ee-b320-bdc4e5fd08de on reaper_current/running_repairs, [(-8896895687978311387,-8888847627918162438], (-881535601694419919,-867088063190097011], (1356177826732174702,1357186491253880239], (6263469266031338450,6279809997892748801]]], /<IP>:7000 #1486

Open
kapilgit123 opened this issue Mar 26, 2024 · 3 comments

@kapilgit123
kapilgit123 commented Mar 26, 2024

[ERROR] [ValidationExecutor:4] 2024-03-13 10:02:47,558 ValidationManager.java:173 - Validation failed.
java.lang.RuntimeException: Parent repair session with id = c0bb8b90-e14a-11ee-b320-bdc4e5fd08de has failed.
at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
at org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
at org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
at org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
at org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
at org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
at org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
at org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:826)
[ERROR] [ValidationExecutor:4] 2024-03-13 10:02:47,558 CassandraDaemon.java:581 - Exception in thread Thread[ValidationExecutor:4,1,main]
java.lang.RuntimeException: Parent repair session with id = c0bb8b90-e14a-11ee-b320-bdc4e5fd08de has failed.
at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
at org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
at org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
at org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
at org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
at org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
at org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
at org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:826)

Please note that we ran the ./nodetool scrub command to see whether it resolves the issue, but we get the same errors on all 6 Cassandra nodes. The issue occurs for every keyspace and table on each Cassandra node.

Cassandra version :- 3.11.6
Reaper version :- 1.1.0

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: REAP-12

@kapilgit123
Author

@adejanovski

Please let me know if any other details are required for this issue

@adejanovski
Contributor

@kapilgit123, I sure hope you're not using Reaper 1.1.0 😅

These stack traces don't show why validation failed.
The segment may have hit its timeout; check the Reaper logs to see how long this segment had been running.
If that's the case, the adaptive nature of the repairs should extend the timeout on subsequent attempts (assuming you're running a recent version of Reaper).
Otherwise, you can set the segment timeout for this repair explicitly (or change the default timeout globally).

That's just an assumption and should be verified by checking the logs more thoroughly in both Reaper and Cassandra.
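
For anyone who wants to do that log check, here is a rough Python sketch (not part of Reaper or Cassandra) that pulls every line mentioning a given repair session ID out of a log and estimates the time between its first and last mention. The function names `session_timeline` and `elapsed_seconds` are made up for illustration, and the timestamp pattern is assumed from the log lines quoted in this issue, so adjust it if your logging layout differs.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the log lines quoted in this issue,
# e.g. "2024-03-13 10:02:47,558". This is an assumption about your log format.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def session_timeline(lines, session_id):
    """Return (timestamp, text) pairs for every line mentioning session_id.

    Stack-trace lines carry no timestamp of their own, so each match is
    stamped with the most recent timestamp seen on that or an earlier line.
    """
    current = None
    hits = []
    for line in lines:
        match = TS_RE.search(line)
        if match:
            current = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
        if session_id in line:
            hits.append((current, line.rstrip("\n")))
    return hits


def elapsed_seconds(hits):
    """Seconds between the first and last timestamped mention, or None."""
    stamps = [ts for ts, _ in hits if ts is not None]
    if len(stamps) < 2:
        return None
    return (stamps[-1] - stamps[0]).total_seconds()


if __name__ == "__main__":
    import sys

    # Usage: python session_timeline.py <path-to-log> <session-id>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        found = session_timeline(log, sys.argv[2])
    for ts, text in found:
        print(ts, text)
    print("elapsed seconds:", elapsed_seconds(found))
```

Running it against both the Cassandra and the Reaper logs with the parent session ID from the stack trace (here `c0bb8b90-e14a-11ee-b320-bdc4e5fd08de`) gives a rough idea of how long the session ran before it failed. You can then compare that figure with the segment timeout configured in Reaper.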

@kapilgit123
Author

@adejanovski

I just confirmed that the Cassandra and Reaper versions are as follows:
Cassandra 4.0.10 and Reaper 3.3.1
