Skip to content

Commit

Permalink
Unmuted testMonitorClusterHealth test and added more logging.
Browse files Browse the repository at this point in the history
I suspect that the watch doesn't get removed after a index operation failure.

Based from the latest failure: elastic#32299 (comment)

Watch gets added several time (due to updates to the watch):
[2020-01-16T12:06:25,718][DEBUG][o.e.x.w.WatcherIndexingListener] [integTest-0] adding watch [cluster_health_watch] to trigger service
...
[2020-01-16T12:06:27,112][DEBUG][o.e.x.w.e.ExecutionService] [integTest-0] executing watch [cluster_health_watch]
...
[2020-01-16T12:06:27,159][DEBUG][o.e.x.w.WatcherIndexingListener] [integTest-0] adding watch [cluster_health_watch] to trigger service
...
[2020-01-16T12:06:27,975][DEBUG][o.e.x.w.e.ExecutionService] [integTest-0] executing watch [cluster_health_watch]
...
[2020-01-16T12:06:27,987][DEBUG][o.e.x.w.WatcherIndexingListener] [integTest-0] adding watch [cluster_health_watch] to trigger service

Then the watch gets deleted (no proof of that in the log),
but because of that the update to the watch fails:

```
[2020-01-16T12:06:28,009][DEBUG][o.e.x.w.e.ExecutionService] [integTest-0] failed to execute watch [cluster_health_watch]
org.elasticsearch.index.engine.VersionConflictEngineException: [cluster_health_watch]: version conflict, required seqNo [1], primary term [1]. but no document was found
        at org.elasticsearch.index.engine.InternalEngine.planIndexingAsPrimary(InternalEngine.java:1037) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1007) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:896) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:803) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:775) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:740) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:256) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:159) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:191) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:116) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:82) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:894) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.
0-SNAPSHOT]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:109) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:373) ~[elasticsearch-8.0.0
-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:296) ~[elasticsearch-8.0.0-SNAPSHOT.jar:
8.0.0-SNAPSHOT]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$24(IndexShard.java:2763) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:285) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2737) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:835) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:292) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:255) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:692) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:688) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
```

I think the additional logging will give more insight why the watch is not removed from the trigger service,
which causes the watch count to be off.

Relates to elastic#32299
  • Loading branch information
martijnvg committed Jan 21, 2020
1 parent d64e115 commit f6406c3
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,10 @@ public void postIndex(ShardId shardId, Engine.Index index, Exception ex) {
if (isWatchDocument(shardId.getIndexName())) {
logger.debug(() -> new ParameterizedMessage("removing watch [{}] from trigger", index.id()), ex);
triggerService.remove(index.id());
} else {
logger.debug(
() -> new ParameterizedMessage("not removing watch [{}] from trigger, because [{}] != [{}] configuration.active=[{}]",
index.id(), shardId.getIndexName(), configuration.index, configuration.active, ex));
}
}

Expand All @@ -167,9 +171,9 @@ public void postIndex(ShardId shardId, Engine.Index index, Exception ex) {
@Override
public Engine.Delete preDelete(ShardId shardId, Engine.Delete delete) {
if (isWatchDocument(shardId.getIndexName())) {
logger.debug("removing watch [{}] to trigger service via delete", delete.id());
triggerService.remove(delete.id());
}

return delete;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,6 @@ protected Settings restAdminSettings() {
return Settings.builder().put(ThreadContext.PREFIX + ".Authorization", token).build();
}

@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/32299")
public void testMonitorClusterHealth() throws Exception {
String watchId = "cluster_health_watch";

Expand Down

0 comments on commit f6406c3

Please sign in to comment.