You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
Replica shard fails with org.opensearch.OpenSearchException: Segment Replication failed due to NoSuchFileException on primary. This issue was initially observed in integration test and later while working on #2873. This is highly transient in nature and not easily reproducible.
[2022-08-11T12:24:19,715][ERROR][o.o.i.r.SegmentReplicationTargetService] [runTask-1] replication failure
org.opensearch.OpenSearchException: Segment Replication failed
at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onFailure(SegmentReplicationTargetService.java:235) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:103) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.util.ArrayList.forEach(ArrayList.java:1511) [?:?]
at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:178) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:149) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.StepListener.innerOnFailure(StepListener.java:82) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.NotifyOnceListener.onFailure(NotifyOnceListener.java:62) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$4.onFailure(ActionListener.java:190) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$6.onFailure(ActionListener.java:309) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.support.RetryableAction$RetryingListener.onFinalFailure(RetryableAction.java:201) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.support.RetryableAction$RetryingListener.onFailure(RetryableAction.java:193) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1379) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:420) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.opensearch.transport.RemoteTransportException: [runTask-0][127.0.0.1:9300][internal:index/shard/replication/get_checkpoint_info]
Caused by: java.nio.file.NoSuchFileException: /Users/singhnjb/OpenSearch/build/testclusters/runTask-0/data/nodes/0/indices/0uR4QToFSC6j8KIyqEsbVA/0/index/segments_2
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:298) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:357) ~[?:?]
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
at org.opensearch.index.store.FsDirectoryFactory$HybridDirectory.openInput(FsDirectoryFactory.java:166) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
at org.opensearch.index.store.Store$MetadataSnapshot.checksumFromLuceneFile(Store.java:1043) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.index.store.Store$MetadataSnapshot.loadMetadata(Store.java:1029) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.index.store.Store$MetadataSnapshot.<init>(Store.java:892) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.index.store.Store.getMetadata(Store.java:333) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.indices.replication.common.CopyState.<init>(CopyState.java:52) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.indices.replication.OngoingSegmentReplications.getCachedCopyState(OngoingSegmentReplications.java:81) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.indices.replication.OngoingSegmentReplications.prepareForReplication(OngoingSegmentReplications.java:140) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.indices.replication.SegmentReplicationSourceService$CheckpointInfoRequestHandler.messageReceived(SegmentReplicationSourceService.java:103) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.indices.replication.SegmentReplicationSourceService$CheckpointInfoRequestHandler.messageReceived(SegmentReplicationSourceService.java:86) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
[2022-08-11T12:24:19,864][WARN ][o.o.i.e.Engine ] [runTask-1] [test-index][0] failed engine [replication failure]
org.opensearch.OpenSearchException: Segment Replication failed
at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onFailure(SegmentReplicationTargetService.java:235) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:103) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.util.ArrayList.forEach(ArrayList.java:1511) [?:?]
at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:178) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:149) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.StepListener.innerOnFailure(StepListener.java:82) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.NotifyOnceListener.onFailure(NotifyOnceListener.java:62) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$4.onFailure(ActionListener.java:190) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$6.onFailure(ActionListener.java:309) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.support.RetryableAction$RetryingListener.onFinalFailure(RetryableAction.java:201) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.support.RetryableAction$RetryingListener.onFailure(RetryableAction.java:193) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1379) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:420) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
To Reproduce
This issue can be intermittently replicated by following below steps:
Describe the bug
Replica shard fails with
org.opensearch.OpenSearchException: Segment Replication failed
due toNoSuchFileException
on primary. This issue was initially observed in integration test and later while working on #2873. This is highly transient in nature and not easily reproducible.To Reproduce
This issue can be intermittently replicated by following below steps:
Repro via test
Expected behavior
Replica shard should not fail.
Plugins
Please list all plugins currently enabled.
Screenshots
If applicable, add screenshots to help explain your problem.
Host/Environment (please complete the following information):
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: