Restoring a snapshot from S3 to 5.6.2 results in a hung and incomplete restore. #26865

Closed
jdoss opened this issue Oct 3, 2017 · 14 comments
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs

Comments

@jdoss

jdoss commented Oct 3, 2017

Elasticsearch version (bin/elasticsearch --version):

# rpm -qa |grep elasticsearch
elasticsearch-5.6.2-1.noarch

Plugins installed:

discovery-ec2
repository-s3
x-pack

JVM version (java -version):

# java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

OS version (uname -a if on a Unix-like system):

Fedora 26
Linux 4.12.14-300.fc26.x86_64 #1 SMP Wed Sep 20 16:28:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

We have about twenty indexes stuck in a red state after trying to restore a snapshot taken on Elasticsearch 5.4.1 to a brand-new cluster running 5.6.2. For this issue, I will focus on one index, logstash-2017.09.20.

You can see here that the index is in a red state:

# curl -XGET 'localhost:9200/_cluster/health/logstash-2017.09.20?level=shards&pretty'
{
  "cluster_name" : "redacted",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 11,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 4,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 98.60064585575888,
  "indices" : {
    "logstash-2017.09.20" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 0,
      "active_primary_shards" : 4,
      "active_shards" : 4,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 1,
      "shards" : {
        "0" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "1" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "2" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "3" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "4" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        }
      }
    }
  }
}

You can see the restore says it finished with a SUCCESS:

# curl -XGET 'localhost:9200/_snapshot/my_cool_backup/snapshot_0?pretty'
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_0",
      "uuid" : "e_wavyGfTD-SwXC-imkF0g",
      "version_id" : 5040199,
      "version" : "5.4.1",
      "indices" : [
        ** SNIP **
      ],
      "state" : "SUCCESS",
      "start_time" : "2017-09-27T07:00:01.807Z",
      "start_time_in_millis" : 1506495601807,
      "end_time" : "2017-09-27T08:44:35.377Z",
      "end_time_in_millis" : 1506501875377,
      "duration_in_millis" : 6273570,
      "failures" : [ ],
      "shards" : {
        "total" : 929,
        "failed" : 0,
        "successful" : 929
      }
    }
  ]
}

Looking at the restore process in detail for the example index, you can see that it says this index has been put into the DONE state for each shard.

$ curl -XGET 'localhost:9200/_snapshot/my_cool_backup/snapshot_0/_status?pretty'
"snapshots" : [
    {
      "snapshot" : "snapshot_0",
      "repository" : "my_cool_backup",
      "uuid" : "e_wavyGfTD-SwXC-imkF0g",
      "state" : "SUCCESS",
      "shards_stats" : {
        "initializing" : 0,
        "started" : 0,
        "finalizing" : 0,
        "done" : 929,
        "failed" : 0,
        "total" : 929
      },
      "stats" : {
        "number_of_files" : 2364,
        "processed_files" : 2364,
        "total_size_in_bytes" : 15393945691,
        "processed_size_in_bytes" : 15393945691,
        "start_time_in_millis" : 1506495618226,
        "time_in_millis" : 6252967
      },
      "indices" : {
        "logstash-2017.09.20" : {
                  "shards_stats" : {
                    "initializing" : 0,
                    "started" : 0,
                    "finalizing" : 0,
                    "done" : 5,
                    "failed" : 0,
                    "total" : 5
                  },
                  "stats" : {
                    "number_of_files" : 31,
                    "processed_files" : 31,
                    "total_size_in_bytes" : 168664,
                    "processed_size_in_bytes" : 168664,
                    "start_time_in_millis" : 1506495678150,
                    "time_in_millis" : 2401656
                  },
                  "shards" : {
                    "0" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 7,
                        "processed_files" : 7,
                        "total_size_in_bytes" : 118135,
                        "processed_size_in_bytes" : 118135,
                        "start_time_in_millis" : 1506495720316,
                        "time_in_millis" : 1949
                      }
                    },
                    "1" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 16,
                        "processed_files" : 16,
                        "total_size_in_bytes" : 33918,
                        "processed_size_in_bytes" : 33918,
                        "start_time_in_millis" : 1506495722992,
                        "time_in_millis" : 2804
                      }
                    },
                    "2" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 0,
                        "processed_files" : 0,
                        "total_size_in_bytes" : 0,
                        "processed_size_in_bytes" : 0,
                        "start_time_in_millis" : 1506498067865,
                        "time_in_millis" : 11941
                      }
                    },
                    "3" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 4,
                        "processed_files" : 4,
                        "total_size_in_bytes" : 8434,
                        "processed_size_in_bytes" : 8434,
                        "start_time_in_millis" : 1506495678150,
                        "time_in_millis" : 1206
                      }
                    },
                    "4" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 4,
                        "processed_files" : 4,
                        "total_size_in_bytes" : 8177,
                        "processed_size_in_bytes" : 8177,
                        "start_time_in_millis" : 1506495684287,
                        "time_in_millis" : 1164
                      }
                    }
                  }
                }

Looking at /_cat/recovery, it says it's done too:

# curl -XGET localhost:9200/_cat/recovery|grep logstash-2017.09.20

logstash-2017.09.20         0 7.9s  snapshot       done n/a n/a redacted data-03 my_cool_backup snapshot_0 1   1   100.0% 109 1699       1699       100.0% 2911728303 0 0 100.0%
logstash-2017.09.20         1 14.5m snapshot       done n/a n/a redacted  data-04 my_cool_backup snapshot_0 136 136 100.0% 136 2842065772 2842065772 100.0% 2842065772 0 0 100.0%
logstash-2017.09.20         2 1.7s  snapshot       done n/a n/a redacted data-00 my_cool_backup snapshot_0 1   1   100.0% 109 1699       1699       100.0% 2889504028 0 0 100.0%
logstash-2017.09.20         3 13.9m snapshot       done n/a n/a redacted data-02 my_cool_backup snapshot_0 127 127 100.0% 127 2929823683 2929823683 100.0% 2929823683 0 0 100.0%

But if you try to close the index it says that it is still being restored:

$ curl -XPOST 'localhost:9200/logstash-2017.09.20/_close?pretty'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[master-01][redacted:9300][indices:admin/close]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Cannot close indices that are being restored: [[logstash-2017.09.20/crXjrjtwTEqkK6_ITG1HVQ]]"
  },
  "status" : 400
}

Looking in the logs it says that it failed to recover the index because the file already exists:

[2017-10-02T19:50:28,790][WARN ][o.e.c.a.s.ShardStateAction] [master-01] [logstash-2017.09.20][4] received shard failed for shard id [[logstash-2017.09.20][4]], allocation id [lW_4BSVGSc6phnI1vLEPWg], primary term [0], message [failed recovery], failure [RecoveryFailedException[[logstash-2017.09.20][4]: Recovery failed on {data-02}{Af43AKvBRf6r-PTr2s9KRg}{O1R6sKwAQK2FyYYmdFLjPA}{redacted}{redacted:9300}{aws_availability_zone=us-west-2c, ml.max_open_jobs=10, ml.enabled=true}]; nested: IndexShardRecoveryException[failed recovery]; nested: IndexShardRestoreFailedException[restore failed]; nested: IndexShardRestoreFailedException[failed to restore snapshot [snapshot_0/e_wavyGfTD-SwXC-imkF0g]]; nested: IndexShardRestoreFailedException[Failed to recover index]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si]; ]

org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-2017.09.20][4]: Recovery failed on {data-02}{Af43AKvBRf6r-PTr2s9KRg}{O1R6sKwAQK2FyYYmdFLjPA}{redacted}{redacted:9300}{aws_availability_zone=us-west-2c, ml.max_open_jobs=10, ml.enabled=true}
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1511) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.6.2.jar:5.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_141]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_141]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed recovery
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:299) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: restore failed
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:405) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$4(StoreRecovery.java:234) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: failed to restore snapshot [snapshot_0/e_wavyGfTD-SwXC-imkF0g]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:993) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:400) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$4(StoreRecovery.java:234) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: Failed to recover index
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restore(BlobStoreRepository.java:1679) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:991) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:400) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$4(StoreRecovery.java:234) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more
Caused by: java.nio.file.FileAlreadyExistsException: /var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) ~[?:1.8.0_141]
        at java.nio.file.Files.newOutputStream(Files.java:216) ~[?:1.8.0_141]
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.apache.lucene.store.RateLimitedFSDirectory.createOutput(RateLimitedFSDirectory.java:40) ~[elasticsearch-5.6.2.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:73) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.elasticsearch.index.store.Store.createVerifyingOutput(Store.java:463) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restoreFile(BlobStoreRepository.java:1734) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restore(BlobStoreRepository.java:1676) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:991) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:400) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$4(StoreRecovery.java:234) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more

And if you look for the file that it says already exists, it is not present on the data node:

# ll /var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si
ls: cannot access '/var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si': No such file or directory

The only way I have been able to get the cluster out of this hung state is to do a full cluster shutdown and start it back up again. From there I am able to close these red indexes and retry the restore. When I first encountered this issue, I had ~20 indexes that failed to restore. After retrying the restore of these failures with the process above, I was able to get all but seven of them restored. The remaining failures are in the same state.
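
For anyone following along, a minimal sketch of that close-and-retry procedure, assuming the same repository and snapshot names as above (the close only succeeds once the cluster no longer considers the index to be restoring):

$ curl -XPOST 'localhost:9200/logstash-2017.09.20/_close?pretty'
$ curl -XPOST 'localhost:9200/_snapshot/my_cool_backup/snapshot_0/_restore?pretty' -H 'Content-Type: application/json' -d '{
  "indices": "logstash-2017.09.20"
}'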

@danielmitterdorfer
Member

That sounds like two problems to me:

  • State handling during recovery seems to be inconsistent (the different APIs don't agree)
  • File system issues

Can you please tell us which file system you're using? Also, as you are on EC2: did you configure EBS volumes or instance storage on the nodes?

@danielmitterdorfer
Member

Also @imotov may have further ideas.

@danielmitterdorfer danielmitterdorfer added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs feedback_needed labels Oct 4, 2017
@jdoss
Author

jdoss commented Oct 4, 2017

These are on AWS I3 servers with NVMe SSD instance storage. We are using XFS with LUKS on these disks.

@danielmitterdorfer danielmitterdorfer added :Plugin Repository S3 and removed :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs feedback_needed labels Oct 5, 2017
@danielmitterdorfer
Member

Thanks for the feedback. I also talked to @imotov. As this is about an S3 snapshot, could you please have a look, @tlrx?

@ywelsch
Contributor

ywelsch commented Oct 5, 2017

There is a bit of confusion between the snapshot and the restore APIs on this issue:

@jdoss When you say

You can see the restore says it finished with a SUCCESS:

you're actually showing the result of the (successfully completed) snapshotting process, not the restore process (the same mistake applies to the detailed status output).

The /_cat/recovery output is also consistent with the cluster health. It shows that shards 0 to 3 have successfully recovered. Shard 4 (the one causing the cluster health to be red) is not reported as done.

From the output shown it is not clear that the restore process is stuck. Note that we don't allow an index that is being restored to be closed. However, you can delete the index, which will also abort the restore process (just as deleting a snapshot that is in progress aborts the snapshot).
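
For illustration only (using the index from this report), deleting the red index both removes it and aborts its in-progress restore:

$ curl -XDELETE 'localhost:9200/logstash-2017.09.20?pretty'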

The bug you're hitting here is the FileAlreadyExistsException, which we've seen already on other reports:
https://discuss.elastic.co/t/snapshot-restore-failed-recovery-of-index-getting-filealreadyexistsexception/100300

Could you perhaps share the snapshot (privately) with us?

@danielmitterdorfer I have my doubts that this is S3 related.

@danielmitterdorfer danielmitterdorfer added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Repository S3 labels Oct 6, 2017
@imotov imotov assigned imotov and unassigned tlrx Oct 6, 2017
@imotov imotov added the >bug label Oct 6, 2017
@imotov
Contributor

imotov commented Oct 6, 2017

@jdoss access to the snapshot would really help, but if this is not possible, would you be able to try reproducing this issue with additional logging enabled and send us the log files?
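
A sketch of how such extra logging could be turned up at runtime; the exact loggers imotov has in mind may differ, and these package names are an assumption:

$ curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
  "transient": {
    "logger.org.elasticsearch.snapshots": "TRACE",
    "logger.org.elasticsearch.repositories": "TRACE"
  }
}'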

@jdoss
Author

jdoss commented Oct 6, 2017

you're actually showing the result of the (successfully completed) snapshotting process, not the restore process (same mistake for showing the details).

@ywelsch I was following the documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html#_monitoring_snapshot_restore_progress which states to use the

curl -XGET 'localhost:9200/_snapshot/my_backup/snapshot_1?pretty'

and

curl -XGET 'localhost:9200/_snapshot/my_backup/snapshot_1/_status?pretty'

That is pretty confusing, as it mashes the snapshot status and restore monitoring documentation together. Re-reading the whole section, I see I misunderstood things and should have been using the indices recovery and cat recovery APIs.

I do wish it were easier to see what is going on with a restore; having the snapshot status documentation crammed together with the restore documentation is confusing. I wish there were a better way to see the status of a specific restore and a better way to stop one. I have nuked snapshots from S3 after misunderstanding that the DELETE method used for stopping a snapshot does not work on restores. It is good to know that you can just delete the index on the cluster to stop the restore.

It would be nice to be able to hit a restore API to see all this information and to stop a restore, rather than using the recovery APIs. I was looking for something that showed a clear status of the restore and mistook the snapshot status endpoint for something that reported on the restore of a snapshot. My bad.
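
For reference, the restore of a single index can be followed with the recovery APIs mentioned above, for example:

$ curl -XGET 'localhost:9200/logstash-2017.09.20/_recovery?pretty'
$ curl -XGET 'localhost:9200/_cat/recovery/logstash-2017.09.20?v'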

@imotov email me at jdoss at kennasecurity.com and I will talk to my higher ups about getting you this snapshot.

@imotov
Contributor

imotov commented Oct 9, 2017

@jdoss I think I might actually get by with just 2 files from your snapshot repository that contain no actual data (just a list of files that index consisted of at the time of the snapshot, their sizes and checksums). The files I am interested in are indices/logstash-2017.09.20/4/index-* (it might be also located in indices/crXjrjtwTEqkK6_ITG1HVQ/4/index-*) and snap-snapshot_0.dat or snap-e_wavyGfTD-SwXC-imkF0g.dat from the same directory as index-*. Could you send these two files to igor at elastic.co?
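
If the repository lives in S3, those blobs can be located and pulled with the AWS CLI along these lines; the bucket name and base path below are placeholders, and the exact index-* / snap-* file names will show up in the listing:

$ aws s3 ls s3://my-snapshot-bucket/my-base-path/indices/logstash-2017.09.20/4/
$ aws s3 cp s3://my-snapshot-bucket/my-base-path/indices/logstash-2017.09.20/4/ . --recursive --exclude '*' --include 'index-*' --include 'snap-*'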

@jdoss
Author

jdoss commented Oct 9, 2017

@imotov I have sent you the requested files.

@imotov
Contributor

imotov commented Oct 25, 2017

I was finally able to see a reproduction of this issue with enough trace logging to figure out what's going on. It looks like, in the case I was able to observe, the FileAlreadyExistsException was a secondary issue triggered by a previous failure (a missing blob in the repository). If you still have the log files from this failure around, can you check whether there are any exceptions for the same shard prior to the FileAlreadyExistsException?
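
One quick way to check, assuming the default RPM log location (adjust the path for your install), is to pull every log line for that shard and look at what precedes the FileAlreadyExistsException:

$ grep -F 'logstash-2017.09.20][4]' /var/log/elasticsearch/*.log | less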

@imotov
Contributor

imotov commented Nov 3, 2017

@tlrx this is the issue we talked about earlier today.

tlrx added a commit to tlrx/elasticsearch that referenced this issue Nov 23, 2017
Pull request elastic#20220 added a change where the store files that have the same name as, but differ from, the ones in the snapshot are deleted before the snapshot is restored. This logic was based on the `Store.RecoveryDiff.different` set of files, which works by computing a diff between an existing store and a snapshot.

This works well when the files on the filesystem form a valid shard store, i.e. there's a `segments` file and the store files are not corrupted. Otherwise, the existing store's snapshot metadata cannot be read (using Store#snapshotStoreMetadata()) and an exception is thrown (CorruptIndexException, IndexFormatTooOldException etc.) which is later caught at the beginning of the restore process (see RestoreContext#restore()) and translated into an empty store metadata (Store.MetadataSnapshot.EMPTY).

This makes the deletion of different files introduced in elastic#20220 useless, as the set of files will always be empty even when store files exist on the filesystem. And if some files are present within the store directory, then restoring a snapshot containing files with the same names will fail with a FileAlreadyExistsException.

This is part of the elastic#26865 issue.

There are various cases where some files could exist in the store directory before a snapshot is restored. One that Igor identified is a restore attempt that failed on a node after only the first files were restored, after which the shard is allocated again to the same node and the restore starts again (but fails because of the existing files). Another one is when some files of a closed index are corrupted / deleted and the index is restored.

This commit adds a test that uses the infrastructure provided by IndexShardTestCase in order to test that restoring a shard succeeds even when files with the same names exist on the filesystem.

Related to elastic#26865
tlrx added a commit to tlrx/elasticsearch that referenced this issue Dec 7, 2017
When the allocation of a shard has been retried too many times, the
MaxRetryDecider is engaged to prevent any future allocation of the
failed shard. If it happens while restoring a snapshot, the restore
hangs and never completes because it stays around waiting for the
shards to be assigned. It also blocks future attempts to restore the
snapshot again.

This commit changes the current behavior in order to fail the restore if
a shard has reached the maximum number of allocation attempts without
being successfully assigned.

This is the second part of the elastic#26865 issue.

closes elastic#26865
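
On an affected version, the stuck allocation described here can at least be inspected (and, once the underlying restore problem is resolved, retried) with the existing cluster APIs; a hedged example against the shard from this report:

$ curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d '{
  "index": "logstash-2017.09.20", "shard": 4, "primary": true
}'
$ curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true&pretty'
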
@tlrx tlrx closed this as completed in a1ed347 Dec 12, 2017
tlrx added a commit that referenced this issue Dec 22, 2017
…27493)

This commit changes the RestoreService so that it now fails the snapshot
restore if one of the shards to restore has failed to be allocated. It also adds
a new RestoreInProgressAllocationDecider that forbids such shards from being
allocated again. This way, when a restore is impossible or has failed too many
times, the user is forced to take manual action (like deleting the index whose
shards failed) in order to try to restore it again.

This behaviour has been implemented because when the allocation of a
shard has been retried too many times, the MaxRetryDecider is engaged
to prevent any future allocation of the failed shard. If this happened while
restoring a snapshot, the restore hung and never completed because
it stayed around waiting for the shards to be assigned (which won't happen).
It also blocked future attempts to restore the snapshot. With this commit,
the restore does not hang and is marked as failed, leaving failed shards
around for investigation.

This is the second part of the #26865 issue.

Closes #26865
@skwokie

skwokie commented Sep 16, 2018

Hi, I'd like to ask which version contains this fix. Thanks.

@danielmitterdorfer
Member

Please see the version labels in the corresponding pull request #27493: 5.6.6 is the earliest version in the 5.x series that contains this fix.

@skwokie

skwokie commented Sep 17, 2018

Thanks, @danielmitterdorfer, appreciate it. Can I also ask whether this affects only S3 repositories or shared-filesystem repositories as well?
