add RemoveCorruptedShardDataCommand #32281
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
#!/bin/bash | ||
|
||
ES_MAIN_CLASS=org.elasticsearch.index.shard.ShardToolCli \ | ||
"`dirname "$0"`"/elasticsearch-cli \ | ||
"$@" |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -63,12 +63,6 @@ corruption is detected, it will prevent the shard from being opened. Accepts: | |
Check for both physical and logical corruption. This is much more | ||
expensive in terms of CPU and memory usage. | ||
|
||
`fix`:: | ||
|
||
Check for both physical and logical corruption. Segments that were reported | ||
as corrupted will be automatically removed. This option *may result in data loss*. | ||
Use with extreme caution! | ||
|
||
WARNING: Expert only. Checking shards may take a lot of time on large indices. | ||
-- | ||
|
||
|
@@ -279,6 +273,17 @@ Other index settings are available in index modules: | |
|
||
Control over the transaction log and background flush operations. | ||
|
||
<<index-modules-command-line-tools,Command-line tools>>:: | ||
|
||
Command-line tools if shard is corrupted | ||
|
||
If this is the single elasticsearch-shard tool, I'd refer to it as a single tool. This should be a floated level 3 heading instead of an item in the settings list. I'd go with: [float] You can use the <<index-modules-elasticsearch-shard,elasticsearch-shard>> recovery tool to remove corrupted translog or corrupted Lucene segments if a shard cannot be recovered automatically or restored from backup. |
||
[float] | ||
[[elasticsearch-shard-tool]] | ||
=== elasticsearch-shard tool | ||
|
||
You can use the <<index-modules-elasticsearch-shard,shard-tool>> recovery tool to remove corrupted translog or corrupted Lucene | ||
|
||
segments if a shard cannot be recovered automatically or restored from backup. | ||
|
||
-- | ||
|
||
include::index-modules/analysis.asciidoc[] | ||
|
@@ -297,4 +302,6 @@ include::index-modules/store.asciidoc[] | |
|
||
include::index-modules/translog.asciidoc[] | ||
|
||
include::index-modules/shard-tool.asciidoc[] | ||
|
||
@lcawl Instead of including this in the index-modules section, do we want to just go ahead and rename the X-Pack commands section "Command line tools" and include it there? It can still be linked to from the index-modules page, but it would be great if we could move toward having a single command reference.
Yes, that would be great!
I've created #33005 for that purpose. |
||
include::index-modules/index-sorting.asciidoc[] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
[[shard-tool]] | ||
== elasticsearch-shard | ||
|
||
In some cases (a bad drive, user error) the Lucene index or translog on a shard copy can become corrupted. | ||
|
||
The `elasticsearch-shard` command enables you to remove corrupted Lucene index segments or a corrupted translog | ||
if a shard cannot be recovered automatically or restored from backup. | ||
|
||
[WARNING] | ||
You will lose the corrupted data when you run `elasticsearch-shard`. | ||
This tool should only be used as a last resort if there is no way to recover from another copy of the shard | ||
or restore a snapshot. | ||
|
||
When Elasticsearch detects that a shard's translog or Lucene index is corrupted, it fails that shard copy | ||
and quits using it. Under normal conditions, the shard is automatically recovered from another copy. | ||
If no good copy of the shard is available and you cannot restore from backup, you can use `elasticsearch-shard` | ||
to remove only the corrupted data and restore access to the data in unaffected segments. | ||
|
||
[WARNING] | ||
Stop Elasticsearch before running `elasticsearch-shard`. | ||
|
||
To remove corrupted Lucene index segments and/or corrupted translog files, use the `remove-corrupted-data` subcommand. | ||
|
||
There are two ways to specify the path: | ||
|
||
* Specify the index name and shard name with the `--index` and `--shard-id` options. | ||
* Use the `--dir` option to specify the full path to the corrupted index or translog files. | ||
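
Both forms point at the same shard directory on disk. As a rough sketch, assuming the layout visible in the sample output in this file (`<data path>/nodes/<ordinal>/indices/<index UUID>/<shard id>/{index,translog}`), the hypothetical helper `shard_paths` below composes the directories involved. The sample paths are keyed by an index UUID rather than the index name, so the UUID has to be supplied by the caller. The function name and its parameters are illustrative and not part of the tool.

```python
from pathlib import PurePosixPath


def shard_paths(data_path, index_uuid, shard_id, node_ordinal=0):
    """Compose the Lucene index and translog directories for one shard copy.

    Assumption: the layout mirrors the sample output in this PR, i.e.
    <data_path>/nodes/<node_ordinal>/indices/<index_uuid>/<shard_id>/...
    """
    shard_dir = (
        PurePosixPath(data_path)
        / "nodes"
        / str(node_ordinal)
        / "indices"
        / index_uuid
        / str(shard_id)
    )
    return {"index": shard_dir / "index", "translog": shard_dir / "translog"}


# Values copied from the sample output shown in this file.
paths = shard_paths("/var/lib/elasticsearchdata", "P45vf_YQRhqjfwLMUvSqDw", 0)
print(paths["index"])
print(paths["translog"])
```

Either resulting directory is the kind of full path that the directory option expects.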
|
||
=== Listing corrupted Lucene index files and/or corrupted translog files | ||
|
||
You can get an overview of the corruption with the `--dry-run` option: | ||
|
||
[source,txt] | ||
-------------------------------------------------- | ||
$ bin/elasticsearch-shard remove-corrupted-data --dry-run --index twitter --shard-id 0 | ||
|
||
|
||
WARNING: ElasticSearch MUST be stopped before running this tool. | ||
|
||
Please make a complete backup of your index before using this tool. | ||
|
||
|
||
Opening Lucene index at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/ | ||
|
||
>> Lucene index is corrupted at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/ | ||
|
||
Opening translog at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/ | ||
|
||
>> Translog is clean at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/ | ||
|
||
|
||
Corrupted Lucene index segments found - 32 documents will be lost. | ||
|
||
|
||
-------------------------------------------------- | ||
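
The dry run and the real run differ only in the `--dry-run` flag, so a script can reuse one argument list for both and inspect the dry-run summary before going further. The following is a minimal sketch, not part of the tool: `shard_tool_command` and `documents_lost` are hypothetical helpers, and the summary wording matched by the regular expression is copied from the sample output above and may differ in a released version.

```python
import re

# Summary line as it appears in the sample dry-run output (an assumption
# about the real tool's wording, taken from this PR's example).
LOSS_LINE = re.compile(r"(\d+) documents will be lost")


def shard_tool_command(index, shard_id, dry_run=True, tool="bin/elasticsearch-shard"):
    """Build the argument list for the remove-corrupted-data subcommand."""
    command = [tool, "remove-corrupted-data"]
    if dry_run:
        command.append("--dry-run")
    command += ["--index", index, "--shard-id", str(shard_id)]
    return command


def documents_lost(output):
    """Return the document count from the summary line, or None if absent."""
    match = LOSS_LINE.search(output)
    return int(match.group(1)) if match else None


sample = "Corrupted Lucene index segments found - 32 documents will be lost."
print(shard_tool_command("twitter", 0))
print(documents_lost(sample))
```

Defaulting `dry_run` to `True` is a deliberate choice in this sketch: the destructive form has to be requested explicitly.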
|
||
=== Removing corrupted Lucene index files and/or corrupted translog files | ||
|
||
You must confirm that you want to remove the corrupted segments when you run `elasticsearch-shard` without `--dry-run`: | ||
|
||
[WARNING] | ||
Back up your data before running `elasticsearch-shard`. This is a destructive operation that removes corrupted data from the shard. | ||
|
||
[source,txt] | ||
-------------------------------------------------- | ||
$ bin/elasticsearch-shard remove-corrupted-data --index twitter --shard-id 0 | ||
|
||
|
||
WARNING: ElasticSearch MUST be stopped before running this tool. | ||
no capital 'S' in Elasticsearch please 😄
++ |
||
|
||
Please make a complete backup of your index before using this tool. | ||
|
||
|
||
Opening Lucene index at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/ | ||
|
||
>> Lucene index is corrupted at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/ | ||
|
||
Opening translog at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/ | ||
|
||
|
||
>> Translog is clean at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/ | ||
|
||
|
||
Corrupted Lucene index segments found - 32 documents will be lost. | ||
|
||
WARNING: YOU WILL LOSE DATA. | ||
|
||
Continue and remove docs from the index ? Y | ||
|
||
WARNING: 1 broken segments (containing 32 documents) detected | ||
Took 0.056 sec total. | ||
Writing... | ||
OK | ||
Wrote new segments file "segments_c" | ||
Marking index with the new history uuid : 0pIBd9VTSOeMfzYT6p0AsA | ||
Changing allocation id V8QXk-QXSZinZMT-NvEq4w to tjm9Ve6uTBewVFAlfUMWjA | ||
You should run follow command to apply allocation id changes: | ||
|
||
POST /_cluster/reroute | ||
{ | ||
"commands" : [ | ||
{ | ||
"allocate_stale_primary" : { | ||
"index" : "index42", | ||
"shard" : 0, | ||
"node" : "II47uXW2QvqzHBnMcl2o_Q", | ||
"accept_data_loss" : false | ||
} | ||
} | ||
] | ||
} | ||
|
||
Deleted corrupt marker corrupted_FzTSBSuxT7i3Tls_TgwEag | ||
|
||
-------------------------------------------------- | ||
|
||
|
||
When you use `elasticsearch-shard` to drop the corrupted data, the shard's allocation ID changes. | ||
After you restart the node, you must use the cluster reroute API to tell Elasticsearch to use the new ID. | ||
When you run the `elasticsearch-shard` command, it shows the request that you need to submit. | ||
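
As an illustration of that request, the sketch below assembles an `allocate_stale_primary` body for the cluster reroute API. The helper name `allocate_stale_primary_body` is hypothetical. The node ID is the one from the sample output above, and `twitter` is the index passed on the command line in the examples. The request printed by the tool in the sample has `accept_data_loss` set to `false`; this sketch assumes the caller sets it to `true` explicitly to acknowledge the loss.

```python
import json


def allocate_stale_primary_body(index, shard, node, accept_data_loss=False):
    """Build a cluster reroute body with one allocate_stale_primary command."""
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # Must be switched to True by the operator; left False by
                    # default so data loss is never accepted implicitly.
                    "accept_data_loss": accept_data_loss,
                }
            }
        ]
    }


body = allocate_stale_primary_body(
    "twitter", 0, "II47uXW2QvqzHBnMcl2o_Q", accept_data_loss=True
)
print("POST /_cluster/reroute")
print(json.dumps(body, indent=2))
```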
|
||
[WARNING] | ||
To proceed, you must acknowledge the data loss by changing the `accept_data_loss` parameter to `true`. | ||
I think this warning should be included in the output of the tool and consequently doesn't need to be here.
++ |
||
|
||
You can also use the `-h` option to get a list of all options and parameters | ||
that the `elasticsearch-shard` tool supports. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -88,59 +88,4 @@ file based sync. Defaults to `512mb` | |
The maximum duration for which translog files will be kept. Defaults to `12h`. | ||
|
||
|
||
[float] | ||
[[corrupt-translog-truncation]] | ||
=== What to do if the translog becomes corrupted? | ||
|
||
To avoid having to set up redirects & help steer people to the new tool, I'd keep this section heading and just xref the new one. |
||
In some cases (a bad drive, user error) the translog on a shard copy can become | ||
corrupted. When this corruption is detected by Elasticsearch due to mismatching | ||
checksums, Elasticsearch will fail that shard copy and refuse to use that copy | ||
of the data. If there are other copies of the shard available then | ||
Elasticsearch will automatically recover from one of them using the normal | ||
shard allocation and recovery mechanism. In particular, if the corrupt shard | ||
copy was the primary when the corruption was detected then one of its replicas | ||
will be promoted in its place. | ||
|
||
If there is no copy of the data from which Elasticsearch can recover | ||
successfully, a user may want to recover the data that is part of the shard at | ||
the cost of losing the data that is currently contained in the translog. We | ||
provide a command-line tool for this, `elasticsearch-translog`. | ||
|
||
[WARNING] | ||
The `elasticsearch-translog` tool should *not* be run while Elasticsearch is | ||
running. If you attempt to run this tool while Elasticsearch is running, you | ||
will permanently lose the documents that were contained only in the translog! | ||
|
||
In order to run the `elasticsearch-translog` tool, specify the `truncate` | ||
subcommand as well as the directory for the corrupted translog with the `-d` | ||
option: | ||
|
||
[source,txt] | ||
-------------------------------------------------- | ||
$ bin/elasticsearch-translog truncate -d /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/ | ||
Checking existing translog files | ||
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | ||
! WARNING: Elasticsearch MUST be stopped before running this tool ! | ||
! ! | ||
! WARNING: Documents inside of translog files will be lost ! | ||
! ! | ||
! WARNING: The following files will be DELETED! ! | ||
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | ||
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-41.ckp | ||
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-6.ckp | ||
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-37.ckp | ||
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-24.ckp | ||
--> data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-11.ckp | ||
|
||
Continue and DELETE files? [y/N] y | ||
Reading translog UUID information from Lucene commit from shard at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index] | ||
Translog Generation: 3 | ||
Translog UUID : AxqC4rocTC6e0fwsljAh-Q | ||
Removing existing translog files | ||
Creating new empty checkpoint at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog.ckp] | ||
Creating new empty translog at [data/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/translog-3.tlog] | ||
Done. | ||
-------------------------------------------------- | ||
|
||
You can also use the `-h` option to get a list of all options and parameters | ||
that the `elasticsearch-translog` tool supports. | ||
[float] |
The removal of the `elasticsearch-translog` tool is a breaking change, so this cannot happen in 6.5. At the moment this PR is only tagged for 7.0, which is ok, but it cannot be backported as-is.