Kibana version:
I am experiencing this on the main branch (HEAD: 6185ddc).
Describe the bug:
I noticed this behavior starting a few weeks ago, though admittedly I haven't attempted to run functional tests locally for several months, so I'm not sure when this started.
Steps to reproduce:
It appears that this error happens during the saved object (SO) migration process, which is initiated by the ES archiver that we use to load fixtures for most of our test suites.
The strange thing is, while I can reliably reproduce this locally, CI doesn't seem to run into this problem. I have reproduced this with two different functional test suites, saved_object_api_integration and spaces_api_integration.
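For reference, the archive load that kicks this off looks roughly like the following in these suites (a sketch; the import path and hook wiring are illustrative, but the `esArchiver.load()` call is what ultimately asks Kibana to run the migration):

```ts
import type { FtrProviderContext } from '../../common/ftr_provider_context'; // path illustrative

// Sketch of how these suites load the fixture archive (hook wiring approximate);
// the archive path and suite name are taken from the FTR logs below.
export default function ({ getService }: FtrProviderContext) {
  const esArchiver = getService('esArchiver');
  const archive =
    'x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces';

  describe('_bulk_resolve', () => {
    before(async () => {
      // esArchiver.load() indexes the fixture docs into .kibana and then asks the
      // Kibana test server (127.0.0.1:5620 in FTR runs) to migrate them -- the
      // "migrate saved objects" request that fails with ECONNREFUSED once the
      // server has crashed.
      await esArchiver.load(archive);
    });

    after(async () => {
      await esArchiver.unload(archive);
    });

    // ...tests...
  });
}
```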
Start the functional test server (FTS) for a given suite, for example: node scripts/functional_tests_server.js --config x-pack/test/saved_object_api_integration/spaces_only/config.ts
Wait for the server to finish loading; you'll see a message like this:
Elasticsearch and Kibana are ready for functional testing. Start the functional tests
in another terminal session by running this command from this directory:
node scripts/functional_test_runner --config x-pack/test/saved_object_api_integration/spaces_only/config.ts
Start the functional test runner (FTR) for the same suite, for example: node scripts/functional_test_runner.js --config x-pack/test/saved_object_api_integration/spaces_only/config.ts
Observe tests running normally (see logs below)
Wait a bit; the crash usually happens in less than 1 minute
Observe test error (see logs below)
Normal FTR logs
└-> "before all" hook: beforeTestSuite.trigger for "should return 200 success [resolvetype/exact-match,resolvetype/conflict] and bad request [hiddentype/any] and not found [resolvetype/alias-match,resolvetype/disabled,resolvetype/does-not-exist]"
└-> "before all" hook for "should return 200 success [resolvetype/exact-match,resolvetype/conflict] and bad request [hiddentype/any] and not found [resolvetype/alias-match,resolvetype/disabled,resolvetype/does-not-exist]"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Loading "mappings.json"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Loading "data.json"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Deleted existing index ".kibana_task_manager_8.1.0_001"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Deleted existing index ".kibana_8.1.0_001"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Created index ".kibana_8.1.0_001"
│ debg [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] ".kibana_8.1.0_001" settings {"index":{"auto_expand_replicas":"0-1","number_of_replicas":"0","number_of_shards":"1"}}
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Indexed 37 docs into ".kibana"
│ debg Migrating saved objects
│ debg [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Migrated Kibana index after loading Kibana data
│ debg [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Ensured that default space exists in .kibana
└-> should return 200 success [resolvetype/exact-match,resolvetype/conflict] and bad request [hiddentype/any] and not found [resolvetype/alias-match,resolvetype/disabled,resolvetype/does-not-exist]
└-> "before each" hook: global before each for "should return 200 success [resolvetype/exact-match,resolvetype/conflict] and bad request [hiddentype/any] and not found [resolvetype/alias-match,resolvetype/disabled,resolvetype/does-not-exist]"
└- ✓ pass (125ms)
└-> "after all" hook for "should return 200 success [resolvetype/exact-match,resolvetype/conflict] and bad request [hiddentype/any] and not found [resolvetype/alias-match,resolvetype/disabled,resolvetype/does-not-exist]"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Unloading indices from "mappings.json"
│ warn since spaces are enabled, all objects other than the default space were deleted from .kibana rather than deleting the whole index
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Deleted existing index ".kibana"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Unloading indices from "data.json"
└-> "after all" hook: afterTestSuite.trigger for "should return 200 success [resolvetype/exact-match,resolvetype/conflict] and bad request [hiddentype/any] and not found [resolvetype/alias-match,resolvetype/disabled,resolvetype/does-not-exist]"
└-> "after all" hook: afterTestSuite.trigger in "_bulk_resolve"
Normal FTS logs
│ info [o.e.c.m.MetadataDeleteIndexService] [node-01] [.kibana_8.1.0_001/-VEfrTVtSHSgYTB17bF2aQ] deleting index
│ info [o.e.c.m.MetadataDeleteIndexService] [node-01] [.kibana_task_manager_8.1.0_001/FWsDyhcZTOugR9MRc5Xrog] deleting index
│ info [o.e.c.m.MetadataCreateIndexService] [node-01] [.kibana_8.1.0_001] creating index, cause [api], templates [], shards [1]/[0]
│ info [o.e.c.r.a.AllocationService] [node-01] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana_8.1.0_001][0]]])." previous.health="YELLOW" reason="shards started [[.kibana_8.1.0_001][0]]"
│ info [o.e.c.m.MetadataMappingService] [node-01] [.kibana_8.1.0_001/_i26TM2mRj6709Up8xm4VQ] update_mapping [_doc]
│ proc [kibana] [2021-12-29T14:21:10.026-05:00][INFO ][savedobjects-service] [.kibana_task_manager] INIT -> CREATE_NEW_TARGET. took: 2ms.
│ proc [kibana] [2021-12-29T14:21:10.027-05:00][INFO ][savedobjects-service] [.kibana] INIT -> OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT. took: 4ms.
│ proc [kibana] [2021-12-29T14:21:10.029-05:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT -> OUTDATED_DOCUMENTS_SEARCH_READ. took: 2ms.
│ info [o.e.c.m.MetadataCreateIndexService] [node-01] [.kibana_task_manager_8.1.0_001] creating index, cause [api], templates [], shards [1]/[1]
│ info [o.e.c.r.a.AllocationService] [node-01] updating number_of_replicas to [0] for indices [.kibana_task_manager_8.1.0_001]
│ proc [kibana] [2021-12-29T14:21:10.033-05:00][INFO ][savedobjects-service] [.kibana] Starting to process 9 documents.
│ proc [kibana] [2021-12-29T14:21:10.033-05:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_READ -> OUTDATED_DOCUMENTS_TRANSFORM. took: 4ms.
│ proc [kibana] [2021-12-29T14:21:10.036-05:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_TRANSFORM -> TRANSFORMED_DOCUMENTS_BULK_INDEX. took: 3ms.
│ info [o.e.c.m.MetadataMappingService] [node-01] [.kibana_8.1.0_001/_i26TM2mRj6709Up8xm4VQ] update_mapping [_doc]
│ info [o.e.c.m.MetadataMappingService] [node-01] [.kibana_8.1.0_001/_i26TM2mRj6709Up8xm4VQ] update_mapping [_doc]
│ info [o.e.c.r.a.AllocationService] [node-01] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana_task_manager_8.1.0_001][0]]])." previous.health="YELLOW" reason="shards started [[.kibana_task_manager_8.1.0_001][0]]"
│ proc [kibana] [2021-12-29T14:21:11.068-05:00][INFO ][savedobjects-service] [.kibana_task_manager] CREATE_NEW_TARGET -> MARK_VERSION_INDEX_READY. took: 1042ms.
│ info [o.e.c.m.MetadataMappingService] [node-01] [.kibana_8.1.0_001/_i26TM2mRj6709Up8xm4VQ] update_mapping [_doc]
│ proc [kibana] [2021-12-29T14:21:11.761-05:00][INFO ][savedobjects-service] [.kibana_task_manager] MARK_VERSION_INDEX_READY -> DONE. took: 693ms.
│ proc [kibana] [2021-12-29T14:21:11.761-05:00][INFO ][savedobjects-service] [.kibana_task_manager] Migration completed after 1737ms
│ proc [kibana] [2021-12-29T14:21:11.838-05:00][INFO ][savedobjects-service] [.kibana] TRANSFORMED_DOCUMENTS_BULK_INDEX -> OUTDATED_DOCUMENTS_SEARCH_READ. took: 1802ms.
│ proc [kibana] [2021-12-29T14:21:11.842-05:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_READ -> OUTDATED_DOCUMENTS_SEARCH_CLOSE_PIT. took: 4ms.
│ proc [kibana] [2021-12-29T14:21:11.844-05:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_CLOSE_PIT -> OUTDATED_DOCUMENTS_REFRESH. took: 2ms.
│ proc [kibana] [2021-12-29T14:21:11.872-05:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_REFRESH -> UPDATE_TARGET_MAPPINGS. took: 28ms.
│ info [o.e.c.m.MetadataMappingService] [node-01] [.kibana_8.1.0_001/_i26TM2mRj6709Up8xm4VQ] update_mapping [_doc]
│ proc [kibana] [2021-12-29T14:21:11.976-05:00][INFO ][savedobjects-service] [.kibana] UPDATE_TARGET_MAPPINGS -> UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK. took: 104ms.
│ info [o.e.t.LoggingTaskListener] [node-01] 3939 finished with response BulkByScrollResponse[took=110.2ms,timed_out=false,sliceId=null,updated=37,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
│ proc [kibana] [2021-12-29T14:21:12.185-05:00][INFO ][savedobjects-service] [.kibana] UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK -> DONE. took: 209ms.
│ proc [kibana] [2021-12-29T14:21:12.185-05:00][INFO ][savedobjects-service] [.kibana] Migration completed after 2162ms
│ info [o.e.c.m.MetadataMappingService] [node-01] [.kibana_8.1.0_001/_i26TM2mRj6709Up8xm4VQ] update_mapping [_doc]
Error FTR logs
└-> "before all" hook for "should return 200 success [isolatedtype/space1-isolatedtype-id,sharedtype/all_spaces,sharedtype/default_and_space_1,sharedtype/only_space_1,sharecapabletype/only_space_1,globaltype/globaltype-id] and not found [isolatedtype/defaultspace-isolatedtype-id,isolatedtype/space2-isolatedtype-id,sharedtype/only_space_2,sharecapabletype/only_default_space,hiddentype/any,dashboard/does-not-exist]"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Loading "mappings.json"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Loading "data.json"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Deleted existing index ".kibana_task_manager_8.1.0_001"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Deleted existing index ".kibana_8.1.0_001"
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Created index ".kibana_8.1.0_001"
│ debg [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] ".kibana_8.1.0_001" settings {"index":{"auto_expand_replicas":"0-1","number_of_replicas":"0","number_of_shards":"1"}}
│ info [x-pack/test/saved_object_api_integration/common/fixtures/es_archiver/saved_objects/spaces] Indexed 37 docs into ".kibana"
│ debg Migrating saved objects
│ERROR [migrate saved objects] request failed (attempt=1/5): connect ECONNREFUSED 127.0.0.1:5620
│ERROR [migrate saved objects] request failed (attempt=2/5): connect ECONNREFUSED 127.0.0.1:5620
│ERROR [migrate saved objects] request failed (attempt=3/5): connect ECONNREFUSED 127.0.0.1:5620
│ERROR [migrate saved objects] request failed (attempt=4/5): connect ECONNREFUSED 127.0.0.1:5620
│ERROR [migrate saved objects] request failed (attempt=5/5): connect ECONNREFUSED 127.0.0.1:5620
└- ✖ fail: saved objects spaces only enabled _bulk_update within the space_1 space "before all" hook for "should return 200 success [isolatedtype/space1-isolatedtype-id,sharedtype/all_spaces,sharedtype/default_and_space_1,sharedtype/only_space_1,sharecapabletype/only_space_1,globaltype/globaltype-id] and not found [isolatedtype/defaultspace-isolatedtype-id,isolatedtype/space2-isolatedtype-id,sharedtype/only_space_2,sharecapabletype/only_default_space,hiddentype/any,dashboard/does-not-exist]"
│ Error: [migrate saved objects] request failed (attempt=5/5): connect ECONNREFUSED 127.0.0.1:5620 -- and ran out of retries
│ at KbnClientRequester.request (node_modules/@kbn/test/target_node/kbn_client/kbn_client_requester.js:158:15)
│ at runMicrotasks (<anonymous>)
│ at processTicksAndRejections (node:internal/process/task_queues:96:5)
│ at KbnClientSavedObjects.migrate (node_modules/@kbn/test/target_node/kbn_client/kbn_client_saved_objects.js:51:9)
│ at migrateKibanaIndex (node_modules/@kbn/es-archiver/target_node/lib/indices/kibana_index.js:67:3)
│ at loadAction (node_modules/@kbn/es-archiver/target_node/actions/load.js:84:5)
│ at EsArchiver.load (node_modules/@kbn/es-archiver/target_node/es_archiver.js:84:12)
│ at Object.apply (node_modules/@kbn/test/target_node/functional_test_runner/lib/mocha/wrap_function.js:87:16)
Error FTS logs
│ info [o.e.c.m.MetadataDeleteIndexService] [node-01] [.kibana_8.1.0_001/zy6LgiktQoKhy2U85FfHQQ] deleting index
│ info [o.e.c.m.MetadataDeleteIndexService] [node-01] [.kibana_task_manager_8.1.0_001/kQs6lzRsTDuyXE3tz5O6oQ] deleting index
│ info [o.e.c.m.MetadataCreateIndexService] [node-01] [.kibana_8.1.0_001] creating index, cause [api], templates [], shards [1]/[0]
│ info [r.suppressed] [node-01] path: /.kibana_8.1.0/_doc/telemetry%3Atelemetry, params: {index=.kibana_8.1.0, id=telemetry:telemetry}
│ org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.kibana_8.1.0][telemetry:telemetry]: routing [null]]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:217) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.onFailure(TransportSingleShardAction.java:202) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleException(TransportSingleShardAction.java:258) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1349) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1458) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1432) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:50) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:47) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.ActionRunnable.onFailure(ActionRunnable.java:77) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:763) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
│ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
│ at java.lang.Thread.run(Thread.java:833) [?:?]
│ Caused by: org.elasticsearch.transport.RemoteTransportException: [node-01][127.0.0.1:9300][indices:data/read/get[s]]
│ Caused by: org.elasticsearch.index.shard.IllegalIndexShardStateException: CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]
│ at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:2075) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:1194) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:179) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:95) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:79) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:113) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:34) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.lambda$asyncShardOperation$0(TransportSingleShardAction.java:104) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:775) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ ... 3 more
│ proc [kibana] Unhandled Promise rejection detected:
│ proc [kibana]
│ info [o.e.c.r.a.AllocationService] [node-01] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana_8.1.0_001][0]]])." previous.health="YELLOW" reason="shards started [[.kibana_8.1.0_001][0]]"
│ proc [kibana] ResponseError: No shard available for [get [.kibana_8.1.0][telemetry:telemetry]: routing [null]]: no_shard_available_action_exception: [illegal_index_shard_state_exception] Reason: CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]
│ proc [kibana] at KibanaTransport.request (/Users/joe/GitHub/kibana-3/node_modules/@elastic/transport/src/Transport.ts:517:17)
│ proc [kibana] at runMicrotasks (<anonymous>)
│ proc [kibana] at processTicksAndRejections (node:internal/process/task_queues:96:5)
│ proc [kibana] at ApmClient.GetApi [as get] (/Users/joe/GitHub/kibana-3/node_modules/@elastic/elasticsearch/src/api/api/get.ts:59:10) {
│ proc [kibana] meta: {
│ proc [kibana] body: { error: [Object], status: 503 },
│ proc [kibana] statusCode: 503,
│ proc [kibana] headers: {
│ proc [kibana] 'x-opaque-id': 'unknownId',
│ proc [kibana] 'x-elastic-product': 'Elasticsearch',
│ proc [kibana] 'content-type': 'application/json;charset=utf-8',
│ proc [kibana] 'content-length': '663'
│ proc [kibana] },
│ proc [kibana] meta: {
│ proc [kibana] context: null,
│ proc [kibana] request: [Object],
│ proc [kibana] name: 'elasticsearch-js',
│ proc [kibana] connection: [Object],
│ proc [kibana] attempts: 0,
│ proc [kibana] aborted: false
│ proc [kibana] },
│ proc [kibana] warnings: [Getter]
│ proc [kibana] },
│ proc [kibana] isBoom: true,
│ proc [kibana] isServer: true,
│ proc [kibana] data: null,
│ proc [kibana] output: {
│ proc [kibana] statusCode: 503,
│ proc [kibana] payload: {
│ proc [kibana] message: 'No shard available for [get [.kibana_8.1.0][telemetry:telemetry]: routing [null]]: no_shard_available_action_exception: [illegal_index_shard_state_exception] Reason: CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]',
│ proc [kibana] statusCode: 503,
│ proc [kibana] error: 'Service Unavailable'
│ proc [kibana] },
│ proc [kibana] headers: {}
│ proc [kibana] },
│ proc [kibana] [Symbol(SavedObjectsClientErrorCode)]: 'SavedObjectsClient/esUnavailable'
│ proc [kibana] }
│ proc [kibana]
│ proc [kibana] Terminating process...
│ proc [kibana] server crashed with status code 1
│ info [o.e.c.m.MetadataMappingService] [node-01] [.kibana_8.1.0_001/bHurwnsBTn2m5Akuc_ZXzA] update_mapping [_doc]
Expected behavior:
Kibana functional tests should not fail when run locally.
Any additional context:
The interesting thing to me in the logs is that, when the FTS errors out, I always see these lines, without fail:
│ info [r.suppressed] [node-01] path: /.kibana_8.1.0/_doc/telemetry%3Atelemetry, params: {index=.kibana_8.1.0, id=telemetry:telemetry}
│ org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.kibana_8.1.0][telemetry:telemetry]: routing [null]]
I believe that's originating from the Telemetry plugin's getIsOptedIn function here: src/plugins/telemetry/server/plugin.ts, line 137 (at 00cc5fe). This isn't enclosed in a try/catch, but if I add one:
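Roughly what I tried, as a sketch (the body of getIsOptedIn is abridged here and the exact internals in plugin.ts differ; the "WTF" prefix is just something I added so I could spot the caught error in the logs):

```ts
// Sketch of the change in src/plugins/telemetry/server/plugin.ts -- the point is
// the try/catch around the saved-object read, which is what throws while the
// .kibana shard is still recovering mid-migration.
getIsOptedIn: async () => {
  try {
    const internalRepository = new SavedObjectsClient(savedObjectsInternalRepository);
    // Reads the `telemetry:telemetry` saved object -- the GET that fails with
    // NoShardAvailableActionException in the ES logs above.
    const telemetrySavedObject = await getTelemetrySavedObject(internalRepository);
    // ...the existing opt-in calculation, unchanged...
    return getTelemetryOptIn({ telemetrySavedObject, /* ...existing settings... */ }) === true;
  } catch (err) {
    // Fail gracefully instead of letting the rejection bubble up and crash Kibana.
    // (The "WTF" prefix is just what I added to spot the caught error in the FTS log.)
    this.logger.error(`WTF: ${err.message}`);
    return false;
  }
},
```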
With that change in place, my functional tests work fine, and I see the caught error in the FTS log:
FTS logs with try/catch added
│ info [o.e.c.m.MetadataDeleteIndexService] [node-01] [.kibana_task_manager_8.1.0_001/K1ZYrqBPTuufNdeTzBQ_LA] deleting index
│ info [o.e.c.m.MetadataDeleteIndexService] [node-01] [.kibana_8.1.0_001/8X3-zVeiQoax82cS8WUvlw] deleting index
│ info [o.e.c.m.MetadataCreateIndexService] [node-01] [.kibana_8.1.0_001] creating index, cause [api], templates [], shards [1]/[0]
│ info [r.suppressed] [node-01] path: /.kibana_8.1.0/_doc/telemetry%3Atelemetry, params: {index=.kibana_8.1.0, id=telemetry:telemetry}
│ org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.kibana_8.1.0][telemetry:telemetry]: routing [null]]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:217) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.onFailure(TransportSingleShardAction.java:202) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleException(TransportSingleShardAction.java:258) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1349) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1458) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1432) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:50) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:47) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.ActionRunnable.onFailure(ActionRunnable.java:77) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:763) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
│ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
│ at java.lang.Thread.run(Thread.java:833) [?:?]
│ Caused by: org.elasticsearch.transport.RemoteTransportException: [node-01][127.0.0.1:9300][indices:data/read/get[s]]
│ Caused by: org.elasticsearch.index.shard.IllegalIndexShardStateException: CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]
│ at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:2075) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:1194) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:179) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:95) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:79) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:113) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:34) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.lambda$asyncShardOperation$0(TransportSingleShardAction.java:104) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:775) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
│ ... 3 more
│ proc [kibana] [2021-12-29T14:46:59.760-05:00][ERROR][plugins.telemetry] WTF: No shard available for [get [.kibana_8.1.0][telemetry:telemetry]: routing [null]]: no_shard_available_action_exception: [illegal_index_shard_state_exception] Reason: CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]
│ info [o.e.c.r.a.AllocationService] [node-01] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana_8.1.0_001][0]]])." previous.health="YELLOW" reason="shards started [[.kibana_8.1.0_001][0]]"
The only other thing that stands out is that there was a recent addition to the Fleet plugin which now consumes this function from the Telemetry plugin (this was added ~2 months ago via #115180): x-pack/plugins/fleet/server/telemetry/sender.ts, line 87 (at 905d6b5).
This is not enclosed in a try/catch, whereas the other usages of this function are.
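A guarded version of that call would look something like this (a sketch only; the method and field names in the Fleet sender are approximate):

```ts
// x-pack/plugins/fleet/server/telemetry/sender.ts (sketch; names approximate).
// Guarding the opt-in check like the other consumers do would turn a failed
// saved-object read into "not opted in" instead of an unhandled rejection.
private async isTelemetryOptedIn(): Promise<boolean> {
  try {
    this.isOptedIn = await this.telemetryStart?.getIsOptedIn();
  } catch (err) {
    this.logger.debug(`Error fetching telemetry opt-in status: ${err}`);
    this.isOptedIn = false;
  }
  return this.isOptedIn === true;
}
```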
So, it appears this function is getting called in the middle of the migration process, causing the server to crash.
What I can't work out is: why isn't this error happening in CI? Is the Fleet plugin disabled in CI? Edit: it looks like the Fleet plugin can't be disabled. But the Telemetry plugin can be disabled, so maybe that's disabled in CI?
At any rate, while the Fleet plugin appears to be causing this bug, I think the root cause is really the Telemetry plugin; IMO, this function should fail gracefully.
jportner added the bug and Team:Core labels on Dec 29, 2021
Is the Fleet plugin disabled in CI? Edit: it looks like the Fleet plugin can't be disabled. But the Telemetry plugin can be disabled, so maybe that's disabled in CI?
According to the functional test config files, telemetry is enabled and opted in for your specific config file (because it inherits that from x-pack/test/api_integration/config.ts).
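For reference, that inheritance works roughly like this (a sketch; the exact serverArgs in x-pack/test/api_integration/config.ts may differ, and the --telemetry.optIn flag is shown purely for illustration):

```ts
// Sketch of how an FTR config inherits and extends another one; this is the
// mechanism by which telemetry settings end up applied to the suite's Kibana.
import { FtrConfigProviderContext } from '@kbn/test';

export default async function ({ readConfigFile }: FtrConfigProviderContext) {
  const baseConfig = await readConfigFile(require.resolve('../../api_integration/config'));

  return {
    ...baseConfig.getAll(),
    kbnTestServer: {
      ...baseConfig.get('kbnTestServer'),
      serverArgs: [
        ...baseConfig.get('kbnTestServer.serverArgs'),
        // hypothetical flag shown for illustration -- telemetry opt-in is
        // controlled by kibana.yml-style settings passed this way
        '--telemetry.optIn=true',
      ],
    },
  };
}
```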
I just ran it with the latest main and it didn't fail for me either. Probably the upstream user of the API fixed their unhandled promises?
I just ran it with the latest main and it didn't fail for me either. Probably the upstream user of the API fixed their unhandled promises?
I pulled the latest main just now and reproduced it again locally. I also took a look at the problematic consumer (the Fleet plugin), and nothing has changed there.
I guess there is some sort of race condition that is causing this function to be called during the migration process, and when that happens, an error occurs and crashes the Kibana server.