Handle dangling indices more safely #48366

Closed
6 of 7 tasks
pugnascotia opened this issue Oct 22, 2019 · 3 comments · Fixed by #59698
Assignees
Labels: :Distributed Indexing/Distributed, Team:Distributed, v8.0.0-alpha1

Comments

@pugnascotia
Contributor

pugnascotia commented Oct 22, 2019

Dangling indices are indices that exist on disk on one or more nodes but which do not currently exist in the cluster state. They arise in a number of situations, such as:

  • A user overflows the index graveyard by deleting more than 500 indices while a node is offline and then the node rejoins the cluster
  • A node (unsafely) moves from one cluster to another, perhaps because the original cluster lost all its master nodes
  • A user (unsafely) meddles with the contents of the data path, maybe restoring an old index folder from a backup
  • A disk partially fails and the user has no replicas and no snapshots and wants to (unsafely) recover whatever they can
  • A cluster loses all master nodes and those are (unsafely) restored from backup, but the backup does not contain the index.
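The graveyard-overflow case in the first bullet can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the graveyard is modelled as a bounded FIFO of tombstones, and the function name `dangling_after_rejoin` and its arguments are hypothetical, not Elasticsearch code.

```python
from collections import deque

GRAVEYARD_SIZE = 500  # tombstone limit mentioned in the list above


def dangling_after_rejoin(on_disk, deleted_while_offline, live_indices):
    """Return the on-disk index UUIDs that would look dangling on rejoin.

    Simplified model (an assumption, not the real implementation): the
    graveyard is a bounded FIFO, so the oldest tombstones are dropped once
    more than GRAVEYARD_SIZE deletions have happened.
    """
    graveyard = deque(maxlen=GRAVEYARD_SIZE)
    for uuid in deleted_while_offline:
        graveyard.append(uuid)  # evicts the oldest tombstone when full

    tombstoned = set(graveyard)
    live = set(live_indices)
    # Dangling: present on disk, absent from the cluster state, and no
    # tombstone left to tell the rejoining node to delete it.
    return [u for u in on_disk if u not in live and u not in tombstoned]


# 501 deletions while the node is offline: the first tombstone is evicted.
deleted = [f"uuid-{i}" for i in range(501)]
print(dangling_after_rejoin(["uuid-0", "uuid-500"], deleted, []))
```

In this model `uuid-500` still has a tombstone and is cleaned up, whereas `uuid-0` has lost its tombstone and so resurfaces as a dangling index.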

Today we greedily and automatically import any dangling indices found on disk if possible, with surprising results:

  • A deleted index may suddenly reappear when a node joins the cluster.
  • A user may delete an index and see the immediate creation of another index with the same name, containing stale mappings and old data. They may start to index into this ancient index before realising. Data loss abounds.
  • We may not be able to find copies of all of the shards of the index, resulting in a red cluster state.
  • We do not attempt to import the freshest metadata for the index, and use a possibly-stale copy of the in-sync set to pick primaries. Data loss abounds.

What can we do about this?

In the long run we would prefer to avoid auto-importing dangling indices, but we must recognise that there are some desperate situations where a dangling index import is the best option and must therefore continue to support it. Rather than automatically importing a dangling index as soon as it is discovered, we could offer an API to help users manage their dangling indices. Something like this:

GET /_dangling

Gets a list of the dangling indices across the cluster. The response could include the index metadata (the one with the highest version in case of conflict) and mappings and some information about the underlying shards to help the user decide whether it should be deleted without needing to import it first.
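As a rough illustration of how such a listing might be assembled, the sketch below groups per-node reports by index UUID and keeps the metadata with the highest version. The `DanglingCopy` type, its fields, and `summarize_dangling` are hypothetical names chosen for this example; the real response format is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DanglingCopy:
    """One node's report of a dangling index (hypothetical shape)."""
    node_id: str
    index_uuid: str
    index_name: str
    metadata_version: int


def summarize_dangling(copies):
    """Group reports by index UUID, keeping the highest-version metadata."""
    summary = {}
    for copy in copies:
        entry = summary.setdefault(
            copy.index_uuid,
            {
                "index_name": copy.index_name,
                "metadata_version": copy.metadata_version,
                "node_ids": [],
            },
        )
        entry["node_ids"].append(copy.node_id)
        # On conflict, prefer the copy with the highest metadata version.
        if copy.metadata_version > entry["metadata_version"]:
            entry["index_name"] = copy.index_name
            entry["metadata_version"] = copy.metadata_version
    return summary


reports = [
    DanglingCopy("node-1", "abc", "logs-old", 3),
    DanglingCopy("node-2", "abc", "logs", 7),
]
print(summarize_dangling(reports))
```

Keying on the UUID rather than the name matters because a dangling index can share its name with a live index created after the original was deleted.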

DELETE /_dangling/$INDEX_UUID

Marks the dangling index for deletion.

POST /_dangling/$INDEX_UUID

Imports the given index into the cluster. This would require a body with accept_data_loss: true. It may be necessary to allow dangling indices to be recovered under a different name too. Maybe we should allow specifying a particular node in case of conflicting metadata versions.

It should also be possible to use a wildcard i.e. POST /_dangling/*, so that if a user is in a desperate situation, they can still quickly import any dangling indices without having to iterate over the whole list.
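To make the three proposed calls concrete, here is a minimal sketch that only builds the method, path and body for each request and sends nothing. It follows the endpoints exactly as proposed in this comment, which may differ from whatever is eventually implemented; the helper names are hypothetical.

```python
import json
from urllib.parse import quote


def list_request():
    """GET /_dangling: list dangling indices across the cluster."""
    return ("GET", "/_dangling", None)


def delete_request(index_uuid):
    """DELETE /_dangling/$INDEX_UUID: mark a dangling index for deletion."""
    return ("DELETE", "/_dangling/" + quote(index_uuid, safe=""), None)


def import_request(index_uuid="*", accept_data_loss=False):
    """POST /_dangling/$INDEX_UUID: import one index, or all with '*'."""
    if not accept_data_loss:
        # The proposal requires an explicit acknowledgement of the risk.
        raise ValueError("importing requires accept_data_loss=True")
    path = "/_dangling/" + quote(index_uuid, safe="*")
    return ("POST", path, json.dumps({"accept_data_loss": True}))


print(import_request("*", accept_data_loss=True))
```

Refusing to build the import request without `accept_data_loss=True` mirrors the intent of the proposal: the caller has to opt in to the risk explicitly, even when using the wildcard.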

With this API we would warn the user about the existence of dangling indices through some UI (e.g. periodic log messages, or something in Kibana) and it would then be up to them to resolve that warning at their convenience.

The API sketch above is predicated on being able to disable the automatic import of dangling indices. We propose to introduce a new setting, which will default to disabling automatic imports. At a later date we will remove the setting, along with the automatic imports functionality since it is inherently unsafe.
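The intended behaviour can be sketched as a small decision function. The setting name `gateway.auto_import_dangling_indices` is taken from the commits referenced later in this thread; the dict-based settings, the function name, and the `("import", uuid)` / `("warn", uuid)` tuples are assumptions made for illustration only.

```python
def on_dangling_found(index_uuids, settings):
    """Decide what to do with newly discovered dangling indices.

    Assumes the proposed default: automatic import is off unless the
    setting is explicitly enabled.
    """
    auto_import = settings.get("gateway.auto_import_dangling_indices", False)
    actions = []
    for uuid in index_uuids:
        if auto_import:
            actions.append(("import", uuid))  # legacy, unsafe behaviour
        else:
            actions.append(("warn", uuid))  # surface it and let the user decide
    return actions


print(on_dangling_found(["abc"], {}))
print(on_dangling_found(["abc"], {"gateway.auto_import_dangling_indices": True}))
```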

Steps

@pugnascotia pugnascotia added the :Distributed Indexing/Distributed and v8.0.0 labels Oct 22, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Distributed)

@ywelsch ywelsch changed the title from "Handle dangling indices more safety" to "Handle dangling indices more safely" Oct 23, 2019
@pugnascotia pugnascotia self-assigned this Oct 23, 2019
pugnascotia added a commit that referenced this issue Nov 29, 2019
Introduce a new static setting, `gateway.auto_import_dangling_indices`, which prevents dangling indices from being automatically imported. Part of #48366.
ywelsch pushed a commit that referenced this issue Jan 8, 2020
Introduce a new static setting, `gateway.auto_import_dangling_indices`, which prevents dangling indices from being automatically imported. Part of #48366.
@pugnascotia
Contributor Author

@ywelsch / @DaveCTurner what do you think the behaviour should be in the POST /_dangling/* restore case where a dangling index exists on multiple nodes? Import everything else, and report the failure? Don't restore anything?

@DaveCTurner
Contributor

By default I think we should import the latest version of the index metadata that we can find. The metadata version number is not wholly trustworthy when it's detached from the cluster metadata, but it's close enough IMO.

SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020
…tic#49174)

Introduce a new static setting, `gateway.auto_import_dangling_indices`, which prevents dangling indices from being automatically imported. Part of elastic#48366.
@rjernst rjernst added the Team:Distributed label May 4, 2020
pugnascotia added a commit that referenced this issue Jun 16, 2020
Part of #48366. Implement an API for listing, importing and deleting dangling
indices.

Co-authored-by: David Turner <david.turner@elastic.co>
pugnascotia added a commit that referenced this issue Jun 16, 2020
Backport of #50920. Part of #48366. Implement an API for listing,
importing and deleting dangling indices.

Co-authored-by: David Turner <david.turner@elastic.co>
pugnascotia added a commit that referenced this issue Jun 18, 2020
The dangling_indices.import API name could cause issues in the client
libs because import is a reserved word in many languages. Rename the
API to avoid this, and rename the other APIs for consistency.

Related to #48366.
pugnascotia added a commit that referenced this issue Jun 18, 2020
The dangling_indices.import API name could cause issues in the client
libs because import is a reserved word in many languages. Rename the
API to avoid this, and rename the other APIs for consistency.

Related to #48366.
pugnascotia added a commit to pugnascotia/elasticsearch that referenced this issue Jul 2, 2020
Part of elastic#48366. Now that there is a dedicated API for dangling indices,
the auto-import behaviour can default to off.
pugnascotia added a commit that referenced this issue Jul 3, 2020
Part of #48366. Add documentation for the dangling indices
API added in #58176.

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
pugnascotia added a commit that referenced this issue Jul 9, 2020
Part of #48366. Now that there is a dedicated API for dangling indices,
the auto-import behaviour can default to off.
pugnascotia added a commit that referenced this issue Jul 9, 2020
Part of #48366. Add documentation for the dangling indices
API added in #58176.

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
pugnascotia added a commit to pugnascotia/elasticsearch that referenced this issue Jul 15, 2020
Part of elastic#48366. Now that there is a dedicated API for dangling indices,
the auto-import behaviour can default to off.
pugnascotia added a commit that referenced this issue Jul 15, 2020
Backport of #58898.

Part of #48366. Now that there is a dedicated API for dangling indices, the auto-import
behaviour can default to off. Also add a note to the breaking changes for 7.9.0.
pugnascotia added a commit that referenced this issue Jul 15, 2020
Backport of #58898.

Part of #48366. Now that there is a dedicated API for dangling indices, the auto-import
behaviour can default to off. Also add a note to the breaking changes for 7.9.0.
pugnascotia added a commit that referenced this issue Jul 17, 2020
Closes #48366. Remove all traces of automatically importing dangling indices. This functionality is
deprecated from 7.9.0.

5 participants