Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race condition with migrations #101654

Closed
tylersmalley opened this issue Jun 8, 2021 · 4 comments
Closed

Possible race condition with migrations #101654

tylersmalley opened this issue Jun 8, 2021 · 4 comments
Labels
Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@tylersmalley
Copy link
Contributor

tylersmalley commented Jun 8, 2021

The backport of using the default distribution for all functional tests #94968 created many failures which were not seen during the original PR against master. These failures happen during the loading of esArchive data containing a .kibana index. During an investigation with @TinaHeiligers we determined that it appeared isolated to archives that were not utilizing an alias and directly creating a .kibana index and running migrations on it. The long-term fix here is to remove any usage of esArchiver with the saved object's indices and instead use the import API. This work has already begun, but it's going to take time and the concern here is this might be an actual issue. I think we need to understand if that is the case, or it's that we're doing something in esArchiver which is unsupported.

Minimal reproduction:

  • Switch to the 7.x branch (current SHA: 2e3b200)
  • Edit test/functional/config.js, removing the --oss Kibana server argument
diff --git a/test/functional/config.js b/test/functional/config.js
index 1048bd72dc5..b25d1da0218 100644
--- a/test/functional/config.js
+++ b/test/functional/config.js
@@ -40,7 +40,6 @@ export default async function ({ readConfigFile }) {
       ...commonConfig.get('kbnTestServer'),
       serverArgs: [
         ...commonConfig.get('kbnTestServer.serverArgs'),
-        '--oss',
         '--telemetry.optIn=false',
         '--savedObjects.maxImportPayloadBytes=10485760',
       ],
  • Download and extract esArchive which writes to the Kibana archive. Downloading the archive as there is a backport pending updating this mapping to use an alias. The esArchive this is taken from is the deprecations_service archive.
    wget -qO- https://gist.github.com/tylersmalley/cffdf820615ef661eec666e3fb1a1b96/archive/7b260c33ab207026b3a3782741a9449358707e2c.zip | jar xvf /dev/stdin
  • Start the functional tests server
    node scripts/functional_tests_server.js --config test/functional/config.js
  • Load the archive
node scripts/es_archiver.js -v --config test/functional/config.js load `pwd`/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c

The archive might need loaded a few times, as the failure doesn't appear consistent.

 debg Loading config file from "/Users/tyler/elastic/kibana/test/functional/config.js"
 debg Loading config file from "/Users/tyler/elastic/kibana/test/common/config.js"
 info [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  Loading "mappings.json"
 info [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  Loading "data.json"
 info [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  Deleted existing index ".kibana_7.14.0_001"
 info [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  Deleted existing index ".kibana_task_manager_7.14.0_001"
 info [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  Created index ".kibana"
 debg [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  ".kibana" settings {"index":{"number_of_replicas":"0","number_of_shards":"1"}}
 info [deprecations_service] Indexed 1 docs into ".kibana"
 debg Migrating saved objects
ERROR [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  request failed (attempt=1/5): socket hang up
ERROR [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  request failed (attempt=2/5): connect ECONNREFUSED 127.0.0.1:5620
ERROR [/home/tyler/elastic/kibana/cffdf820615ef661eec666e3fb1a1b96-7b260c33ab207026b3a3782741a9449358707e2c]  request failed (attempt=3/5): connect ECONNREFUSED 127.0.0.1:5620

The failure case is the migration fails and crashes the server.

   │ info [o.e.c.m.MetadataDeleteIndexService] [Tylers-MacBook-Pro.local] [.kibana_7.14.0_001/M-ENkb_8TXi9l-8t1WwAAw] deleting index
   │ info [o.e.c.m.MetadataDeleteIndexService] [Tylers-MacBook-Pro.local] [.kibana_task_manager_7.14.0_001/FVnOSmrDSNaXcgNdWsWVog] deleting index
   │ info [o.e.c.m.MetadataCreateIndexService] [Tylers-MacBook-Pro.local] [.kibana] creating index, cause [api], templates [], shards [1]/[0]
   │ info [o.e.c.r.a.AllocationService] [Tylers-MacBook-Pro.local] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana][0]]]).
   │ info [o.e.c.m.MetadataMappingService] [Tylers-MacBook-Pro.local] [.kibana/D5Ulw0umQk-b8IlxjRxj5g] update_mapping [_doc]
   │ proc [kibana] server    log   [11:41:17.548] [info][savedobjects-service] [.kibana_task_manager] INIT -> CREATE_NEW_TARGET. took: 5ms.
   │ proc [kibana] server    log   [11:41:17.552] [info][savedobjects-service] [.kibana] INIT -> LEGACY_SET_WRITE_BLOCK. took: 11ms.
   │ info [o.e.c.m.MetadataCreateIndexService] [Tylers-MacBook-Pro.local] [.kibana_task_manager_7.14.0_001] creating index, cause [api], templates [], shards [1]/[1]
   │ info [o.e.c.r.a.AllocationService] [Tylers-MacBook-Pro.local] updating number_of_replicas to [0] for indices [.kibana_task_manager_7.14.0_001]
   │ info [o.e.c.m.MetadataIndexStateService] [Tylers-MacBook-Pro.local] adding block write to indices [[.kibana/D5Ulw0umQk-b8IlxjRxj5g]]
   │ info [o.e.c.m.MetadataIndexStateService] [Tylers-MacBook-Pro.local] completed adding block write to indices [.kibana]
   │ proc [kibana] server    log   [11:41:18.424] [info][savedobjects-service] [.kibana] LEGACY_SET_WRITE_BLOCK -> LEGACY_CREATE_REINDEX_TARGET. took: 872ms.
   │ info [o.e.c.m.MetadataCreateIndexService] [Tylers-MacBook-Pro.local] [.kibana_pre6.5.0_001] creating index, cause [api], templates [], shards [1]/[1]
   │ info [o.e.c.r.a.AllocationService] [Tylers-MacBook-Pro.local] updating number_of_replicas to [0] for indices [.kibana_pre6.5.0_001]
   │ proc [kibana] server    log   [11:41:18.941] [info][savedobjects-service] [.kibana_task_manager] CREATE_NEW_TARGET -> MARK_VERSION_INDEX_READY. took: 1393ms.
   │ info [o.e.c.r.a.AllocationService] [Tylers-MacBook-Pro.local] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana_pre6.5.0_001][0]]]).
   │ proc [kibana] server    log   [11:41:19.336] [info][savedobjects-service] [.kibana_task_manager] MARK_VERSION_INDEX_READY -> DONE. took: 395ms.
   │ proc [kibana] server    log   [11:41:19.336] [info][savedobjects-service] [.kibana_task_manager] Migration completed after 1793ms
   │ proc [kibana] server    log   [11:41:19.526] [info][savedobjects-service] [.kibana] LEGACY_CREATE_REINDEX_TARGET -> LEGACY_REINDEX. took: 1102ms.
   │ proc [kibana] server    log   [11:41:19.540] [info][savedobjects-service] [.kibana] LEGACY_REINDEX -> LEGACY_REINDEX_WAIT_FOR_TASK. took: 14ms.
   │ info [o.e.c.m.MetadataCreateIndexService] [Tylers-MacBook-Pro.local] [.tasks] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]
   │ info [o.e.c.r.a.AllocationService] [Tylers-MacBook-Pro.local] updating number_of_replicas to [0] for indices [.tasks]
   │ proc [kibana] Unhandled Promise rejection detected:
   │ proc [kibana] 
   │ proc [kibana] ResponseError: Saved object index alias [.kibana_7.14.0] not found: index_not_found_exception
   │ proc [kibana]     at onBody (/Users/tyler/elastic/kibana/node_modules/@elastic/elasticsearch/lib/Transport.js:337:23)
   │ proc [kibana]     at IncomingMessage.onEnd (/Users/tyler/elastic/kibana/node_modules/@elastic/elasticsearch/lib/Transport.js:264:11)
   │ proc [kibana]     at IncomingMessage.emit (events.js:388:22)
   │ proc [kibana]     at endReadableNT (internal/streams/readable.js:1336:12)
   │ proc [kibana]     at processTicksAndRejections (internal/process/task_queues.js:82:21) {
   │ proc [kibana]   meta: {
   │ proc [kibana]     body: { error: [Object], status: 404 },
   │ proc [kibana]     statusCode: 404,
   │ proc [kibana]     headers: {
   │ proc [kibana]       'content-type': 'application/json; charset=UTF-8',
   │ proc [kibana]       'content-length': '445'
   │ proc [kibana]     },
   │ proc [kibana]     meta: {
   │ proc [kibana]       context: null,
   │ proc [kibana]       request: [Object],
   │ proc [kibana]       name: 'elasticsearch-js',
   │ proc [kibana]       connection: [Object],
   │ proc [kibana]       attempts: 0,
   │ proc [kibana]       aborted: false
   │ proc [kibana]     }
   │ proc [kibana]   },
   │ proc [kibana]   isBoom: true,
   │ proc [kibana]   isServer: true,
   │ proc [kibana]   data: null,
   │ proc [kibana]   output: {
   │ proc [kibana]     statusCode: 500,
   │ proc [kibana]     payload: {
   │ proc [kibana]       message: 'An internal server error occurred',
   │ proc [kibana]       statusCode: 500,
   │ proc [kibana]       error: 'Internal Server Error'
   │ proc [kibana]     },
   │ proc [kibana]     headers: {}
   │ proc [kibana]   },
   │ proc [kibana]   [Symbol(SavedObjectsClientErrorCode)]: 'SavedObjectsClient/generalError'
   │ proc [kibana] }
   │ proc [kibana] 
   │ proc [kibana] Terminating process...
   │ proc [kibana]  server crashed  with status code 1

#101537 was created in hopes of circumventing this issue for now in the functional tests by using an alias to avoid the legacy migration step - however, that seems to have now created a similar failure on master, which is also flaky.

https://kibana-ci.elastic.co/job/elastic+kibana+master/14654/execution/node/794/log/

09:04:26           └-: discover sidebar
09:04:26             └-> "before all" hook in "discover sidebar"
09:04:26             └-> "before all" hook in "discover sidebar"
09:04:26               │ proc [kibana] {"log.level":"info","@timestamp":"2021-06-08T16:04:25.748Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"Sending error to Elastic APM: {\"id\":\"b3fce02fd127bc3191214a6fa44d221d\"}"}
09:04:26               │ proc [kibana]   log   [16:04:25.751] [error][dataEnhanced][data_enhanced][plugins] ResponseError: index_not_found_exception: [index_not_found_exception] Reason: no such index [.kibana_8.0.0] and [require_alias] request flag is [true] and [.kibana_8.0.0] is not an alias
09:04:26               │ proc [kibana]     at onBody (/dev/shm/workspace/kibana-build-9/node_modules/@elastic/elasticsearch/lib/Transport.js:337:23)
09:04:26               │ proc [kibana]     at IncomingMessage.onEnd (/dev/shm/workspace/kibana-build-9/node_modules/@elastic/elasticsearch/lib/Transport.js:264:11)
09:04:26               │ proc [kibana]     at IncomingMessage.emit (events.js:388:22)
09:04:26               │ proc [kibana]     at endReadableNT (internal/streams/readable.js:1336:12)
09:04:26               │ proc [kibana]     at processTicksAndRejections (internal/process/task_queues.js:82:21) {
@botelastic botelastic bot added the needs-team Issues missing a team label label Jun 8, 2021
@tylersmalley tylersmalley added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc and removed needs-team Issues missing a team label labels Jun 8, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@tylersmalley
Copy link
Contributor Author

Leaning towards this not so much being an issue as it is a problem with our functional tests. Turns out that some of our archives are old enough they need to be processed through the legacy migration, which is actually determined by it not being an alias. Additionally, when importing another esArchive we are not currently clearing out all aliases which seems to be an issue.

tylersmalley pushed a commit to tylersmalley/kibana that referenced this issue Jun 11, 2021
Same as elastic#101950 - these archives are causing issues with a non-oss build elastic#101654

Last remaining fix for elastic#101118

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
tylersmalley pushed a commit that referenced this issue Jun 12, 2021
Same as #101950 - these archives are causing issues with a non-oss build #101654

Last remaining fix for #101118

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Jun 12, 2021
…101992)

Same as elastic#101950 - these archives are causing issues with a non-oss build elastic#101654

Last remaining fix for elastic#101118

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
kibanamachine added a commit that referenced this issue Jun 12, 2021
…#102011)

Same as #101950 - these archives are causing issues with a non-oss build #101654

Last remaining fix for #101118

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>

Co-authored-by: Tyler Smalley <tyler.smalley@elastic.co>
@joshdover
Copy link
Contributor

@tylersmalley Any updates here? I want to make sure that we get ahead of any races if we do indeed have any.

@tylersmalley
Copy link
Contributor Author

I don't think this is still an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

No branches or pull requests

3 participants