Restoring a snapshot from S3 to 5.6.2 results in a hung and incomplete restore. #26865

jdoss · 2017-10-03T15:14:52Z

Elasticsearch version (bin/elasticsearch --version):

# rpm -qa |grep elasticsearch
elasticsearch-5.6.2-1.noarch

Plugins installed:

discovery-ec2
repository-s3
x-pack

JVM version (java -version):

# java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

OS version (uname -a if on a Unix-like system):

Fedora 26
Linux 4.12.14-300.fc26.x86_64 #1 SMP Wed Sep 20 16:28:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

We have had about twenty indexes that are stuck in a red state after trying to restore a snapshot taken from elasticsearch 5.4.1 to a brand new cluster running 5.6.2. For this issue, I will focus on one index logstash-2017.09.20.

You can see here that the index is in a red state:

# curl -XGET 'localhost:9200/_cluster/health/logstash-2017.09.20?level=shards&pretty'
{
  "cluster_name" : "redacted",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 11,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 4,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 98.60064585575888,
  "indices" : {
    "logstash-2017.09.20" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 0,
      "active_primary_shards" : 4,
      "active_shards" : 4,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 1,
      "shards" : {
        "0" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "1" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "2" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "3" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "4" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        }
      }
    }
  }
}

You can see the restore says it finished with a SUCCESS:

# curl -XGET 'localhost:9200/_snapshot/my_cool_backup/snapshot_0?pretty'
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_0",
      "uuid" : "e_wavyGfTD-SwXC-imkF0g",
      "version_id" : 5040199,
      "version" : "5.4.1",
      "indices" : [
        ** SNIP **
      ],
      "state" : "SUCCESS",
      "start_time" : "2017-09-27T07:00:01.807Z",
      "start_time_in_millis" : 1506495601807,
      "end_time" : "2017-09-27T08:44:35.377Z",
      "end_time_in_millis" : 1506501875377,
      "duration_in_millis" : 6273570,
      "failures" : [ ],
      "shards" : {
        "total" : 929,
        "failed" : 0,
        "successful" : 929
      }
    }
  ]
}

Looking at the restore process in detail for the example index, you can see that it says this index has been put into the DONE state for each shard.

$ curl -XGET 'localhost:9200/_snapshot/my_cool_backup/snapshot_0/_status?pretty'
"snapshots" : [
    {
      "snapshot" : "snapshot_0",
      "repository" : "my_cool_backup",
      "uuid" : "e_wavyGfTD-SwXC-imkF0g",
      "state" : "SUCCESS",
      "shards_stats" : {
        "initializing" : 0,
        "started" : 0,
        "finalizing" : 0,
        "done" : 929,
        "failed" : 0,
        "total" : 929
      },
      "stats" : {
        "number_of_files" : 2364,
        "processed_files" : 2364,
        "total_size_in_bytes" : 15393945691,
        "processed_size_in_bytes" : 15393945691,
        "start_time_in_millis" : 1506495618226,
        "time_in_millis" : 6252967
      },
      "indices" : {
        "logstash-2017.09.20" : {
                  "shards_stats" : {
                    "initializing" : 0,
                    "started" : 0,
                    "finalizing" : 0,
                    "done" : 5,
                    "failed" : 0,
                    "total" : 5
                  },
                  "stats" : {
                    "number_of_files" : 31,
                    "processed_files" : 31,
                    "total_size_in_bytes" : 168664,
                    "processed_size_in_bytes" : 168664,
                    "start_time_in_millis" : 1506495678150,
                    "time_in_millis" : 2401656
                  },
                  "shards" : {
                    "0" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 7,
                        "processed_files" : 7,
                        "total_size_in_bytes" : 118135,
                        "processed_size_in_bytes" : 118135,
                        "start_time_in_millis" : 1506495720316,
                        "time_in_millis" : 1949
                      }
                    },
                    "1" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 16,
                        "processed_files" : 16,
                        "total_size_in_bytes" : 33918,
                        "processed_size_in_bytes" : 33918,
                        "start_time_in_millis" : 1506495722992,
                        "time_in_millis" : 2804
                      }
                    },
                    "2" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 0,
                        "processed_files" : 0,
                        "total_size_in_bytes" : 0,
                        "processed_size_in_bytes" : 0,
                        "start_time_in_millis" : 1506498067865,
                        "time_in_millis" : 11941
                      }
                    },
                    "3" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 4,
                        "processed_files" : 4,
                        "total_size_in_bytes" : 8434,
                        "processed_size_in_bytes" : 8434,
                        "start_time_in_millis" : 1506495678150,
                        "time_in_millis" : 1206
                      }
                    },
                    "4" : {
                      "stage" : "DONE",
                      "stats" : {
                        "number_of_files" : 4,
                        "processed_files" : 4,
                        "total_size_in_bytes" : 8177,
                        "processed_size_in_bytes" : 8177,
                        "start_time_in_millis" : 1506495684287,
                        "time_in_millis" : 1164
                      }
                    }
                  }
                }

Looking at /_cat/recovery it says it's done too

# curl -XGET localhost:9200/_cat/recovery|grep logstash-2017.09.20

logstash-2017.09.20         0 7.9s  snapshot       done n/a n/a redacted data-03 my_cool_backup snapshot_0 1   1   100.0% 109 1699       1699       100.0% 2911728303 0 0 100.0%
logstash-2017.09.20         1 14.5m snapshot       done n/a n/a redacted  data-04 my_cool_backup snapshot_0 136 136 100.0% 136 2842065772 2842065772 100.0% 2842065772 0 0 100.0%
logstash-2017.09.20         2 1.7s  snapshot       done n/a n/a redacted data-00 my_cool_backup snapshot_0 1   1   100.0% 109 1699       1699       100.0% 2889504028 0 0 100.0%
logstash-2017.09.20         3 13.9m snapshot       done n/a n/a redacted data-02 my_cool_backup snapshot_0 127 127 100.0% 127 2929823683 2929823683 100.0% 2929823683 0 0 100.0%

But if you try to close the index it says that it is still being restored:

$ curl -XPOST 'localhost:9200/logstash-2017.09.20/_close?pretty'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[master-01][redacted:9300][indices:admin/close]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Cannot close indices that are being restored: [[logstash-2017.09.20/crXjrjtwTEqkK6_ITG1HVQ]]"
  },
  "status" : 400
}

Looking in the logs it says that it failed to recover the index because the file already exists:

[2017-10-02T19:50:28,790][WARN ][o.e.c.a.s.ShardStateAction] [master-01] [logstash-2017.09.20][4] received shard failed for shard id [[logstash-2017.09.20][4]], allocation id [lW_4BSVGSc6phnI1vLEPWg], primary term [0], message [failed recovery], failure [RecoveryFailedException[[logstash-2017.09.20][4]: Recovery failed on {data-02}{Af43AKvBRf6r-PTr2s9KRg}{O1R6sKwAQK2FyYYmdFLjPA}{redacted}{redacted:9300}{aws_availability_zone=us-west-2c, ml.max_open_jobs=10, ml.enabled=true}]; nested: IndexShardRecoveryException[failed recovery]; nested: IndexShardRestoreFailedException[restore failed]; nested: IndexShardRestoreFailedException[failed to restore snapshot [snapshot_0/e_wavyGfTD-SwXC-imkF0g]]; nested: IndexShardRestoreFailedException[Failed to recover index]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si]; ]

[2017-10-02T19:50:28,790][WARN ][o.e.c.a.s.ShardStateAction] [master-01] [logstash-2017.09.20][4] received shard failed for shard id [[logstash-2017.09.20][4]
], allocation id [lW_4BSVGSc6phnI1vLEPWg], primary term [0], message [failed recovery], failure [RecoveryFailedException[[logstash-2017.09.20][4]: Recovery failed 
on {data-02}{Af43AKvBRf6r-PTr2s9KRg}{O1R6sKwAQK2FyYYmdFLjPA}{redacted}{redacted:9300}{aws_availability_zone=us-west-2c, ml.max_open_jobs=10, ml.enabled=
true}]; nested: IndexShardRecoveryException[failed recovery]; nested: IndexShardRestoreFailedException[restore failed]; nested: IndexShardRestoreFailedException[fa
iled to restore snapshot [snapshot_0/e_wavyGfTD-SwXC-imkF0g]]; nested: IndexShardRestoreFailedException[Failed to recover index]; nested: FileAlre
adyExistsException[/var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si]; ]
org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-2017.09.20][4]: Recovery failed on {data-02}{Af43AKvBRf6r-PTr2s9KRg}{O1R6sKwAQK2FyYYmdFL
jPA}{redacted}{redacted:9300}{aws_availability_zone=us-west-2c, ml.max_open_jobs=10, ml.enabled=true}
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1511) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.6.2.jar:5.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_141]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_141]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed recovery
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:299) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: restore failed
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:405) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$4(StoreRecovery.java:234) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: failed to restore snapshot [snapshot_0/e_wavyGfTD-SwXC-imkF0g]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:993) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:400) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$4(StoreRecovery.java:234) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: Failed to recover index
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restore(BlobStoreRepository.java:1679) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:991) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:400) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$4(StoreRecovery.java:234) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more
Caused by: java.nio.file.FileAlreadyExistsException: /var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) ~[?:1.8.0_141]
        at java.nio.file.Files.newOutputStream(Files.java:216) ~[?:1.8.0_141]
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.apache.lucene.store.RateLimitedFSDirectory.createOutput(RateLimitedFSDirectory.java:40) ~[elasticsearch-5.6.2.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:73) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.elasticsearch.index.store.Store.createVerifyingOutput(Store.java:463) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restoreFile(BlobStoreRepository.java:1734) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restore(BlobStoreRepository.java:1676) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:991) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:400) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$4(StoreRecovery.java:234) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:232) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1243) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$2(IndexShard.java:1507) ~[elasticsearch-5.6.2.jar:5.6.2]
        ... 4 more

And if you look on for that file it says is already exists, it is not present on the data node:

# ll /var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si
ls: cannot access '/var/lib/elasticsearch/nodes/0/indices/crXjrjtwTEqkK6_ITG1HVQ/4/index/_22g.si': No such file or directory

The only way I have been able to get the cluster out of this hung state is to do a full cluster shutdown and start it back up again. From there I am able to close these red indexes and retry the restore again. When I first encountered this issue, I had ~20 indexes that failed to restore. After retrying to restore these failures with the process above, I was able to get all but seven of them restored. The remaining failures are in the same state.

The text was updated successfully, but these errors were encountered:

danielmitterdorfer · 2017-10-04T16:01:10Z

That sounds like two problems to me:

State handling during recovery seems to be inconsistent / not to agree
File system issues

Can you please tell which file system you've used? Also, as you are on EC2: Did you configure EBS volumes or instance storage on the nodes?

danielmitterdorfer · 2017-10-04T16:01:22Z

Also @imotov may have further ideas.

jdoss · 2017-10-04T16:13:48Z

These are on AWS I3 servers with NVMe SSD instance storage. We are using XFS with LUKS on these disks.

danielmitterdorfer · 2017-10-05T07:48:13Z

Thanks for the feedback. I also talked to @imotov. As this is about S3 snapshot could you please have a look @tlrx?

ywelsch · 2017-10-05T08:24:03Z

There is a bit of confusion between the snapshot and the restore APIs on this issue:

@jdoss When you say

You can see the restore says it finished with a SUCCESS:

you're actually showing the result of the (successfully completed) snapshotting process, not the restore process (same mistake for showing the details).

The /_cat/recovery output also is consistent with the cluster health. It shows that shards 0 to 3 have successfully recovered. Shard 4 (the one causing the cluster health to be red) is not reported as done.

From the output shown it is not clear that the restore process is stuck. Note that we don't allow an index that is being restored to be closed. However, you can delete this index, which will also abort the restore process (same as when you delete a snapshot that's in progress, it will abort the snapshot).

The bug you're hitting here is the FileAlreadyExistsException, which we've seen already on other reports:
https://discuss.elastic.co/t/snapshot-restore-failed-recovery-of-index-getting-filealreadyexistsexception/100300

Could you perhaps share the snapshot (privately) with us?

@danielmitterdorfer I have my doubts that this is S3 related.

imotov · 2017-10-06T15:26:43Z

@jdoss access to snapshots would really help, but if this is not possible would you be able to try reproducing this issue with additional logging enabled and send us the logs files?

jdoss · 2017-10-06T19:46:53Z

you're actually showing the result of the (successfully completed) snapshotting process, not the restore process (same mistake for showing the details).

@ywelsch I was following the documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html#_monitoring_snapshot_restore_progress which states to use the

curl -XGET 'localhost:9200/_snapshot/my_backup/snapshot_1?pretty'

and

curl -XGET 'localhost:9200/_snapshot/my_backup/snapshot_1/status?pretty'

Which is pretty confusing mashing the snapshot and recovery status documentation together. Re-reading the whole section I see I misunderstood things and I should have been using the indices recovery and cat recovery APIs.

I do wish it was easier to see what is going on with a restore and having the snapshot status documentation crammed together with the restore documentation is confusing. I wish there was a better method to see what is going on with a specific restore and a better method on stopping a restore. I have nuked snapshots from S3 misunderstanding that the DELETE method used for stopping a snapshot does not work on restores. It is good to know that you can just delete the index on the cluster to stop the restore.

It would be nice to be able to ping a restore API to see all this information and to stop a restore vs using the recovery APIs. I was looking for something hat showed a clear status of the recovery and confused the snapshot status endpoint as something that worked with the recovery of a snapshot. My bad.

@imotov email me at jdoss at kennasecurity.com and I will talk to my higher ups about getting you this snapshot.

imotov · 2017-10-09T14:09:08Z

@jdoss I think I might actually get by with just 2 files from your snapshot repository that contain no actual data (just a list of files that index consisted of at the time of the snapshot, their sizes and checksums). The files I am interested in are indices/logstash-2017.09.20/4/index-* (it might be also located in indices/crXjrjtwTEqkK6_ITG1HVQ/4/index-*) and snap-snapshot_0.dat or snap-e_wavyGfTD-SwXC-imkF0g.dat from the same directory as index-*. Could you send these two files to igor at elastic.co?

jdoss · 2017-10-09T15:54:17Z

@imotov I have sent you the requested files.

imotov · 2017-10-25T11:12:34Z

I was finally able to see a reproduction of this issue with enough trace logging to figure out what's going on. It looks like in the case that I was able to observe, the FileAlreadyExists exception was the secondary issue on that was triggered by a previous failure (missing blob in the repository in the case that I was able to observe). If you still have the log files from this failure around, can you see if there are any exceptions for the same shard prior to the FileAlreadyExists.

imotov · 2017-11-03T23:41:32Z

@tlrx this is the issue we talked about earlier today.

Pull request elastic#20220 added a change where the store files that have the same name but are different from the ones in the snapshot are deleted first before the snapshot is restored. This logic was based on the `Store.RecoveryDiff.different` set of files which works by computing a diff between an existing store and a snapshot. This works well when the files on the filesystem form valid shard store, ie there's a `segments` file and store files are not corrupted. Otherwise, the existing store's snapshot metadata cannot be read (using Store#snapshotStoreMetadata()) and an exception is thrown (CorruptIndexException, IndexFormatTooOldException etc) which is later caught as the begining of the restore process (see RestoreContext#restore()) and is translated into an empty store metadata (Store.MetadataSnapshot.EMPTY). This will make the deletion of different files introduced in elastic#20220 useless as the set of files will always be empty even when store files exist on the filesystem. And if some files are present within the store directory, then restoring a snapshot with files with same names will fail with a FileAlreadyExistException. This is part of the elastic#26865 issue. There are various cases were some files could exist in the store directory before a snapshot is restored. One that Igor identified is a restore attempt that failed on a node and only first files were restored, then the shard is allocated again to the same node and the restore starts again (but fails because of existing files). Another one is when some files of a closed index are corrupted / deleted and the index is restored. This commit adds a test that uses the infrastructure provided by IndexShardTestCase in order to test that restoring a shard succeed even when files with same names exist on filesystem. Related to elastic#26865

Pull request #20220 added a change where the store files that have the same name but are different from the ones in the snapshot are deleted first before the snapshot is restored. This logic was based on the `Store.RecoveryDiff.different` set of files which works by computing a diff between an existing store and a snapshot. This works well when the files on the filesystem form valid shard store, ie there's a `segments` file and store files are not corrupted. Otherwise, the existing store's snapshot metadata cannot be read (using Store#snapshotStoreMetadata()) and an exception is thrown (CorruptIndexException, IndexFormatTooOldException etc) which is later caught as the begining of the restore process (see RestoreContext#restore()) and is translated into an empty store metadata (Store.MetadataSnapshot.EMPTY). This will make the deletion of different files introduced in #20220 useless as the set of files will always be empty even when store files exist on the filesystem. And if some files are present within the store directory, then restoring a snapshot with files with same names will fail with a FileAlreadyExistException. This is part of the #26865 issue. There are various cases were some files could exist in the store directory before a snapshot is restored. One that Igor identified is a restore attempt that failed on a node and only first files were restored, then the shard is allocated again to the same node and the restore starts again (but fails because of existing files). Another one is when some files of a closed index are corrupted / deleted and the index is restored. This commit adds a test that uses the infrastructure provided by IndexShardTestCase in order to test that restoring a shard succeed even when files with same names exist on filesystem. Related to #26865

When the allocation of a shard has been retried too many times, the MaxRetryDecider is engaged to prevent any future allocation of the failed shard. If it happens while restoring a snapshot, the restore hangs and never completes because it stays around waiting for the shards to be assigned. It also blocks future attempts to restore the snapshot again. This commit changes the current behavior in order to fail the restore if a shard reached the maximum allocations attempts without being successfully assigned. This is the second part of the elastic#26865 issue. closes elastic#26865

…27493) This commit changes the RestoreService so that it now fails the snapshot restore if one of the shards to restore has failed to be allocated. It also adds a new RestoreInProgressAllocationDecider that forbids such shards to be allocated again. This way, when a restore is impossible or failed too many times, the user is forced to take a manual action (like deleting the index which failed shards) in order to try to restore it again. This behaviour has been implemented because when the allocation of a shard has been retried too many times, the MaxRetryDecider is engaged to prevent any future allocation of the failed shard. If it happens while restoring a snapshot, the restore hanged and was never completed because it stayed around waiting for the shards to be assigned (and that won't happen). It also blocked future attempts to restore the snapshot again. With this commit, the restore does not hang and is marked as failed, leaving failed shards around for investigation. This is the second part of the #26865 issue. Closes #26865

skwokie · 2018-09-16T05:24:52Z

Hi, I'd like to ask which version contains this fix. Thanks.

danielmitterdorfer · 2018-09-17T05:00:51Z

Please see the version labels in the corresponding pull request #27493: 5.6.6 is the earliest version in the 5.x series that contains this fix.

skwokie · 2018-09-17T22:11:16Z

Thanks, @danielmitterdorfer. Appreciate it. Can I also ask if this affects the S3 destination only or the Shared FS as well?

danielmitterdorfer added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs feedback_needed labels Oct 4, 2017

danielmitterdorfer added :Plugin Repository S3 and removed :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs feedback_needed labels Oct 5, 2017

danielmitterdorfer assigned tlrx Oct 5, 2017

danielmitterdorfer added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Repository S3 labels Oct 6, 2017

imotov assigned imotov and unassigned tlrx Oct 6, 2017

imotov added the >bug label Oct 6, 2017

imotov assigned tlrx and unassigned imotov Nov 3, 2017

tlrx mentioned this issue Nov 21, 2017

Delete shard store files before restoring a snapshot #27476

Merged

tlrx mentioned this issue Nov 22, 2017

Fail restore when the shard allocations max retries count is reached #27493

Merged

tlrx closed this as completed in a1ed347 Dec 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restoring a snapshot from S3 to 5.6.2 results in a hung and incomplete restore. #26865

Restoring a snapshot from S3 to 5.6.2 results in a hung and incomplete restore. #26865

jdoss commented Oct 3, 2017

danielmitterdorfer commented Oct 4, 2017

danielmitterdorfer commented Oct 4, 2017

jdoss commented Oct 4, 2017

danielmitterdorfer commented Oct 5, 2017

ywelsch commented Oct 5, 2017

imotov commented Oct 6, 2017

jdoss commented Oct 6, 2017 •

edited

Loading

imotov commented Oct 9, 2017

jdoss commented Oct 9, 2017

imotov commented Oct 25, 2017

imotov commented Nov 3, 2017

skwokie commented Sep 16, 2018

danielmitterdorfer commented Sep 17, 2018

skwokie commented Sep 17, 2018

Restoring a snapshot from S3 to 5.6.2 results in a hung and incomplete restore. #26865

Restoring a snapshot from S3 to 5.6.2 results in a hung and incomplete restore. #26865

Comments

jdoss commented Oct 3, 2017

danielmitterdorfer commented Oct 4, 2017

danielmitterdorfer commented Oct 4, 2017

jdoss commented Oct 4, 2017

danielmitterdorfer commented Oct 5, 2017

ywelsch commented Oct 5, 2017

imotov commented Oct 6, 2017

jdoss commented Oct 6, 2017 • edited Loading

imotov commented Oct 9, 2017

jdoss commented Oct 9, 2017

imotov commented Oct 25, 2017

imotov commented Nov 3, 2017

skwokie commented Sep 16, 2018

danielmitterdorfer commented Sep 17, 2018

skwokie commented Sep 17, 2018

jdoss commented Oct 6, 2017 •

edited

Loading