add RemoveCorruptedShardDataCommand #32281

Merged
merged 93 commits into from
Sep 19, 2018
843f977
drop `index.shard.check_on_startup: fix`
Jul 23, 2018
4f01609
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Jul 31, 2018
a8f1488
add RemoveCorruptedSegmentsCommand; merge elasticsearch-translog and …
Jul 23, 2018
5f6b084
fix test with ClusterAllocationExplanation
Aug 20, 2018
1fc72e9
fix test with ClusterAllocationExplanation
Aug 21, 2018
153e4f2
create corrupted marker on `check_on_startup: true`; split testIndexC…
Aug 21, 2018
2964fef
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Aug 21, 2018
c71e306
create manually corruption marker (but don't corrupt index files) to …
Aug 21, 2018
a7668d6
checkstyle fix
Aug 21, 2018
6ee74a0
merge into ResolveShardCorruptionCommand
Aug 22, 2018
ee955b0
check is _state folder exist before reading state
Aug 22, 2018
918ce41
merge two commands into a single remove-corrupted-segments
Aug 24, 2018
97fa399
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Aug 24, 2018
ebef6d2
Merge remote-tracking branch 'remotes/origin/fix/31389_1' into fix/31…
Aug 24, 2018
5cddefb
fixes after merge with remote-tracking branch 'remotes/origin/fix/313…
Aug 25, 2018
fd407bb
move corruptIndex to CorruptionUtils
Aug 25, 2018
4bc9c95
reworked resolveShardPath
Aug 25, 2018
b29aa9a
split testShardLock; testCorruptedBothIndexAndTranslog is added
Aug 26, 2018
9ceeaf4
simplified test
Aug 26, 2018
addb03f
test code cleanup
Aug 26, 2018
0f29f0f
test code cleanup
Aug 27, 2018
e6c6d70
checkstyle
Aug 27, 2018
c155b36
addressed unit test comments
Aug 27, 2018
85b7eef
keep `fix` for 6.x branch
Aug 27, 2018
7f292e3
drop unused class
Aug 27, 2018
43ae3a1
remove-corrupted-data subcommand instead of remove-corrupted-segments
Aug 27, 2018
087d558
remove-corrupted-data subcommand instead of remove-corrupted-segments…
Aug 27, 2018
ad819ec
dropped `index.shard.check_on_startup: fix` - it has to go with anoth…
Aug 27, 2018
75fcafa
amendment on a CLI tool name
Aug 27, 2018
cf6837f
a bit of clean up + show translog file names in sorted order instead …
Aug 27, 2018
260a5f4
keep node lock on shard shamanizing; fix allocate empty primary; inst…
Aug 27, 2018
3de84e2
fix node lock scope
Aug 28, 2018
073d29f
renamed to RemoveCorruptedShardDataCommand
Aug 28, 2018
03bbc5f
added test for multi-node layout for a single env
Aug 28, 2018
3231803
added `fix` deprecation log message + test
Aug 28, 2018
64c29db
dropped `dry-run`
Aug 28, 2018
d1805d6
keep elasticsearch-translog for 6.x
Aug 28, 2018
c2b5b8a
added `fix` deprecation log message + test
Aug 28, 2018
14e6175
adjusted `fix` deprecation log message
Aug 28, 2018
fee8a5b
dropped `fix` to avoid deprecation warnings
Aug 28, 2018
e1808d6
Merge remote-tracking branch 'remotes/origin/fix/31389_1' into fix/31…
Aug 28, 2018
5b5d516
set 755 to elasticsearch-shard, elasticsearch-translog
Aug 28, 2018
5cee2b9
skip files added by Lucene's ExtrasFS
Aug 28, 2018
b11670c
skip files added by Lucene's ExtrasFS
Aug 28, 2018
e38238a
skip files added by Lucene's ExtrasFS
Aug 28, 2018
ad62da0
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Aug 28, 2018
6f6ca5a
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Aug 29, 2018
6763cf9
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Aug 29, 2018
7f1f6f3
Merge branch 'fix/31389_1' into fix/31389_2
Aug 29, 2018
5083e83
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Aug 31, 2018
2a9dbeb
resolved conflicts on Merge remote-tracking branch 'remotes/origin/ma…
Aug 31, 2018
d165a6c
Merge branch 'fix/31389_1' into fix/31389_2
Aug 31, 2018
f985de4
resolve conflict after Merge branch 'fix/31389_1' into fix/31389_2
Aug 31, 2018
aa16487
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Aug 31, 2018
f74c058
Merge remote-tracking branch 'remotes/origin/fix/31389_1' into fix/31…
Aug 31, 2018
28c6a5a
checkstyle
Aug 31, 2018
24bc3d4
added comment on the reason to keep index lock
Aug 31, 2018
2d2dd2b
dropped left-over
Aug 31, 2018
e196e9e
addressed documentation review comments (links, clean up)
Aug 31, 2018
4d89496
removed misleading comments
Aug 31, 2018
5bdb069
clean up; inlining of resolveShardPath; text adjustments
Aug 31, 2018
5349c72
extracted lock logic from NodeEnvironment ctor into NodeLock; reused …
Aug 31, 2018
4286800
reworked resolve shard path
Aug 31, 2018
01be5af
added Lucene.SOFT_DELETES_FIELD to IndexWriter
Aug 31, 2018
af64fd4
polish a bit NodeLock
Aug 31, 2018
d26fbfb
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_1
Aug 31, 2018
f8fd76a
Merge remote-tracking branch 'remotes/origin/fix/31389_1' into fix/31…
Aug 31, 2018
47fa3fa
Merge branch 'remote/origin/master' into fix/31389_2
Aug 31, 2018
3a4916a
checkstyle
Sep 1, 2018
9f3a7fb
dropped testCheckOnStartupDeprecatedValue due to wrong merge with master
Sep 1, 2018
abcff3c
fix NodeEnvironment.NodeLock
Sep 1, 2018
91dc295
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_2
Sep 4, 2018
33f3a45
improved message on delete marker
Sep 4, 2018
c796417
minor test code style change
Sep 5, 2018
4181988
fix test
Sep 5, 2018
a1593e8
fix test
Sep 5, 2018
185adc9
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_2
Sep 6, 2018
f5cf90a
move shard-tool doc next to other docs
Sep 6, 2018
8de0ae5
fix [float] Removing a corrupted data files header
Sep 6, 2018
5b29ad0
Merge remote-tracking branch 'remotes/origin/master' into fix/31716_2
Sep 10, 2018
8242bbb
after merge fixes
Sep 10, 2018
418c922
Tweaks to docs
DaveCTurner Sep 10, 2018
674d1ba
dropped unrelated checkIndexOnStartup = fix setting
Sep 10, 2018
53b404a
nodeEnv code style clean up
Sep 10, 2018
24ffdd1
do not expose node lock; code style adjustment; text comment adjustment
Sep 10, 2018
2a3f58d
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_2
Sep 10, 2018
e1eb32f
tiny doc amendment
Sep 13, 2018
ee1f6a2
NodeEnvironment.NodeLock can skip node path if it is required
Sep 13, 2018
1df4685
Merge remote-tracking branch 'remotes/origin/master' into fix/31389_2
Sep 13, 2018
844adaf
after merge fix
Sep 13, 2018
8210f3b
after merge fix
Sep 14, 2018
dab5125
inline nodeLock
Sep 18, 2018
54c4030
add javadoc comment for pathFunction
Sep 18, 2018
5 changes: 5 additions & 0 deletions distribution/src/bin/elasticsearch-shard
@@ -0,0 +1,5 @@
#!/bin/bash

ES_MAIN_CLASS=org.elasticsearch.index.shard.ShardToolCli \
"`dirname "$0"`"/elasticsearch-cli \
"$@"
12 changes: 12 additions & 0 deletions distribution/src/bin/elasticsearch-shard.bat
@@ -0,0 +1,12 @@
@echo off

setlocal enabledelayedexpansion
setlocal enableextensions

set ES_MAIN_CLASS=org.elasticsearch.index.shard.ShardToolCli
call "%~dp0elasticsearch-cli.bat" ^
%%* ^
|| exit /b 1

endlocal
endlocal
2 changes: 2 additions & 0 deletions docs/reference/commands/index.asciidoc
@@ -12,6 +12,7 @@ tasks from the command line:
 * <<migrate-tool>>
 * <<saml-metadata>>
 * <<setup-passwords>>
+* <<shard-tool>>
 * <<syskeygen>>
 * <<users-command>>
 
@@ -22,5 +23,6 @@ include::certutil.asciidoc[]
 include::migrate-tool.asciidoc[]
 include::saml-metadata.asciidoc[]
 include::setup-passwords.asciidoc[]
+include::shard-tool.asciidoc[]
 include::syskeygen.asciidoc[]
 include::users-command.asciidoc[]
107 changes: 107 additions & 0 deletions docs/reference/commands/shard-tool.asciidoc
@@ -0,0 +1,107 @@
[[shard-tool]]
== elasticsearch-shard

In some cases the Lucene index or translog of a shard copy can become
corrupted. The `elasticsearch-shard` command enables you to remove corrupted
parts of the shard if a good copy of the shard cannot be recovered
automatically or restored from backup.

[WARNING]
You will lose the corrupted data when you run `elasticsearch-shard`. This tool
should only be used as a last resort if there is no way to recover from another
copy of the shard or restore a snapshot.

When Elasticsearch detects that a shard's data is corrupted, it fails that
shard copy and refuses to use it. Under normal conditions, the shard is
automatically recovered from another copy. If no good copy of the shard is
available and you cannot restore from backup, you can use `elasticsearch-shard`
to remove the corrupted data and restore access to any remaining data in
unaffected segments.

[WARNING]
Stop Elasticsearch before running `elasticsearch-shard`.

To remove corrupted shard data, use the `remove-corrupted-data` subcommand.

There are two ways to specify the path:

* Specify the index name and shard ID with the `--index` and `--shard-id`
options.
* Use the `--dir` option to specify the full path to the corrupted index or
translog files.

[float]
=== Removing corrupted data

`elasticsearch-shard` analyses the shard copy and provides an overview of the
corruption found. To proceed you must then confirm that you want to remove the
corrupted data.

[WARNING]
Back up your data before running `elasticsearch-shard`. This is a destructive
operation that removes corrupted data from the shard.

[source,txt]
--------------------------------------------------
$ bin/elasticsearch-shard remove-corrupted-data --index twitter --shard-id 0


WARNING: Elasticsearch MUST be stopped before running this tool.

Please make a complete backup of your index before using this tool.


Opening Lucene index at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/

>> Lucene index is corrupted at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/

Opening translog at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/


>> Translog is clean at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/


Corrupted Lucene index segments found - 32 documents will be lost.

WARNING: YOU WILL LOSE DATA.

Continue and remove docs from the index ? Y

WARNING: 1 broken segments (containing 32 documents) detected
Took 0.056 sec total.
Writing...
OK
Wrote new segments file "segments_c"
Marking index with the new history uuid : 0pIBd9VTSOeMfzYT6p0AsA
Changing allocation id V8QXk-QXSZinZMT-NvEq4w to tjm9Ve6uTBewVFAlfUMWjA

You should run the following command to allocate this shard:

POST /_cluster/reroute
{
"commands" : [
{
"allocate_stale_primary" : {
"index" : "twitter",
"shard" : 0,
"node" : "II47uXW2QvqzHBnMcl2o_Q",
"accept_data_loss" : false
}
}
]
}

You must accept the possibility of data loss by changing parameter `accept_data_loss` to `true`.

Deleted corrupt marker corrupted_FzTSBSuxT7i3Tls_TgwEag from /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/

--------------------------------------------------

When you use `elasticsearch-shard` to drop the corrupted data, the shard's
allocation ID changes. After restarting the node, you must use the
<<cluster-reroute,cluster reroute API>> to tell Elasticsearch to use the new
ID. The `elasticsearch-shard` command shows the request that
you need to submit.

You can also use the `-h` option to get a list of all options and parameters
that the `elasticsearch-shard` tool supports.
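The reroute request that `elasticsearch-shard` prints always has the same shape, so it can be generated rather than hand-edited. The sketch below is a hypothetical helper (`StalePrimaryRerouteBody` is not part of Elasticsearch) that renders the `allocate_stale_primary` command for the index `twitter`, shard `0` and the node ID from the sample run, with `accept_data_loss` already set to `true` as the tool's output instructs. It only builds the JSON body; sending it to `POST /_cluster/reroute` is left to whichever HTTP client you use.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Elasticsearch) that renders the
 * allocate_stale_primary reroute body printed by elasticsearch-shard.
 */
public final class StalePrimaryRerouteBody {

    private StalePrimaryRerouteBody() {}

    /** Quotes a value as a JSON string, escaping quotes, backslashes and control characters. */
    public static String jsonString(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the request body; the reroute is only accepted when acceptDataLoss is true. */
    public static String build(String index, int shard, String nodeId, boolean acceptDataLoss) {
        if (shard < 0) {
            throw new IllegalArgumentException("shard must be >= 0 but was " + shard);
        }
        return "{\n"
            + "  \"commands\" : [\n"
            + "    {\n"
            + "      \"allocate_stale_primary\" : {\n"
            + "        \"index\" : " + jsonString(index) + ",\n"
            + "        \"shard\" : " + shard + ",\n"
            + "        \"node\" : " + jsonString(nodeId) + ",\n"
            + "        \"accept_data_loss\" : " + acceptDataLoss + "\n"
            + "      }\n"
            + "    }\n"
            + "  ]\n"
            + "}";
    }

    public static void main(String[] args) {
        // Node ID taken from the sample output; data loss is explicitly accepted
        System.out.println("POST /_cluster/reroute");
        System.out.println(build("twitter", 0, "II47uXW2QvqzHBnMcl2o_Q", true));
    }
}
```

Setting `accept_data_loss` in code rather than by editing the printed request keeps the explicit acknowledgement visible in review, which is the point of the parameter.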
4 changes: 4 additions & 0 deletions docs/reference/index-modules/translog.asciidoc
@@ -92,6 +92,10 @@ The maximum duration for which translog files will be kept. Defaults to `12h`.
 [[corrupt-translog-truncation]]
 === What to do if the translog becomes corrupted?
 
+[WARNING]
+The `elasticsearch-translog` tool is deprecated and will be removed in 7.0.
+Use the <<shard-tool,elasticsearch-shard tool>> instead.
+
 In some cases (a bad drive, user error) the translog on a shard copy can become
 corrupted. When this corruption is detected by Elasticsearch due to mismatching
 checksums, Elasticsearch will fail that shard copy and refuse to use that copy
7 changes: 6 additions & 1 deletion libs/cli/src/main/java/org/elasticsearch/cli/Terminal.java
@@ -85,12 +85,17 @@ public final void println(Verbosity verbosity, String msg) {
 
     /** Prints message to the terminal at {@code verbosity} level, without a newline. */
     public final void print(Verbosity verbosity, String msg) {
-        if (this.verbosity.ordinal() >= verbosity.ordinal()) {
+        if (isPrintable(verbosity)) {
             getWriter().print(msg);
             getWriter().flush();
         }
     }
 
+    /** Checks if is enough {@code verbosity} level to be printed */
+    public final boolean isPrintable(Verbosity verbosity) {
+        return this.verbosity.ordinal() >= verbosity.ordinal();
+    }
+
     /**
      * Prompt for a yes or no answer from the user. This method will loop until 'y' or 'n'
      * (or the default empty value) is entered.
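The new `isPrintable` method extracts the ordinal comparison that `print` previously inlined, so callers can ask whether a level would be shown before doing any work to produce the message. A minimal stand-alone sketch of that gate follows; the class name `VerbosityGate` is hypothetical, and the enum assumes the same declaration order as `Terminal.Verbosity` (`SILENT`, `NORMAL`, `VERBOSE`), which matters because the check compares ordinals.

```java
/**
 * Hypothetical stand-alone model of the Terminal verbosity gate.
 * Assumes levels are declared from least to most verbose.
 */
public final class VerbosityGate {

    public enum Verbosity { SILENT, NORMAL, VERBOSE }

    private final Verbosity configured;
    private final StringBuilder out = new StringBuilder();

    public VerbosityGate(Verbosity configured) {
        this.configured = configured;
    }

    /** A message is printable when the configured level is at least as verbose as the message level. */
    public boolean isPrintable(Verbosity level) {
        return configured.ordinal() >= level.ordinal();
    }

    /** Records the message only if it passes the gate, mirroring Terminal#print. */
    public void print(Verbosity level, String msg) {
        if (isPrintable(level)) {
            out.append(msg);
        }
    }

    public String output() {
        return out.toString();
    }

    public static void main(String[] args) {
        VerbosityGate terminal = new VerbosityGate(Verbosity.NORMAL);
        terminal.print(Verbosity.NORMAL, "shown ");
        terminal.print(Verbosity.VERBOSE, "hidden ");
        System.out.println(terminal.output()); // prints "shown "
    }
}
```

Exposing the boolean check, rather than only the printing methods, lets a caller skip building expensive diagnostic output entirely when it would be discarded anyway.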
@@ -325,4 +325,21 @@ public void test90SecurityCliPackaging() {
         }
     }
 
+    public void test100RepairIndexCliPackaging() {
+        assumeThat(installation, is(notNullValue()));
+
+        final Installation.Executables bin = installation.executables();
+        final Shell sh = new Shell();
+
+        Platforms.PlatformAction action = () -> {
+            final Result result = sh.run(bin.elasticsearchShard + " help");
+            assertThat(result.stdout, containsString("A CLI tool to remove corrupted parts of unrecoverable shards"));
+        };
+
+        if (distribution().equals(Distribution.DEFAULT_TAR) || distribution().equals(Distribution.DEFAULT_ZIP)) {
+            Platforms.onLinux(action);
+            Platforms.onWindows(action);
+        }
+    }
+
 }
@@ -186,6 +186,7 @@ private static void verifyOssInstallation(Installation es, Distribution distribu
             "elasticsearch-env",
             "elasticsearch-keystore",
             "elasticsearch-plugin",
+            "elasticsearch-shard",
             "elasticsearch-translog"
         ).forEach(executable -> {
 
@@ -100,8 +100,9 @@ public class Executables {
         public final Path elasticsearch = platformExecutable("elasticsearch");
         public final Path elasticsearchPlugin = platformExecutable("elasticsearch-plugin");
         public final Path elasticsearchKeystore = platformExecutable("elasticsearch-keystore");
-        public final Path elasticsearchTranslog = platformExecutable("elasticsearch-translog");
         public final Path elasticsearchCertutil = platformExecutable("elasticsearch-certutil");
+        public final Path elasticsearchShard = platformExecutable("elasticsearch-shard");
+        public final Path elasticsearchTranslog = platformExecutable("elasticsearch-translog");
 
         private Path platformExecutable(String name) {
             final String platformExecutableName = Platforms.WINDOWS
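Each new launcher field goes through `platformExecutable`, whose body is cut off in this diff. Because the PR ships both `bin/elasticsearch-shard` and `bin/elasticsearch-shard.bat`, a plausible reading is that the Windows branch appends `.bat`. The sketch below is written under that assumption only; `LauncherPaths` and its methods are hypothetical names, not the packaging-test API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/**
 * Hypothetical sketch of per-platform launcher resolution, assuming the
 * Windows variant of each script carries a ".bat" suffix.
 */
public final class LauncherPaths {

    private final Path binDir;
    private final boolean windows;

    public LauncherPaths(Path binDir, boolean windows) {
        this.binDir = binDir;
        this.windows = windows;
    }

    /** Detects Windows from the standard os.name system property. */
    public static boolean isWindowsHost() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Resolves a launcher inside bin/, adding ".bat" on Windows (assumption). */
    public Path executable(String name) {
        final String platformName = windows ? name + ".bat" : name;
        return binDir.resolve(platformName);
    }

    public static void main(String[] args) {
        LauncherPaths bin = new LauncherPaths(Paths.get("bin"), isWindowsHost());
        System.out.println(bin.executable("elasticsearch-shard"));
    }
}
```

Passing the platform flag into the constructor, instead of reading it inside `executable`, keeps the resolution testable on any host.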
@@ -187,6 +187,7 @@ private static void verifyOssInstallation(Installation es, Distribution distribu
             "elasticsearch",
             "elasticsearch-plugin",
             "elasticsearch-keystore",
+            "elasticsearch-shard",
             "elasticsearch-translog"
         ).forEach(executable -> assertThat(es.bin(executable), file(File, "root", "root", p755)));
 
@@ -95,6 +95,7 @@ verify_package_installation() {
     assert_file "$ESHOME/bin" d root root 755
     assert_file "$ESHOME/bin/elasticsearch" f root root 755
     assert_file "$ESHOME/bin/elasticsearch-plugin" f root root 755
+    assert_file "$ESHOME/bin/elasticsearch-shard" f root root 755
     assert_file "$ESHOME/bin/elasticsearch-translog" f root root 755
     assert_file "$ESHOME/lib" d root root 755
     assert_file "$ESCONFIG" d root elasticsearch 2750
1 change: 1 addition & 0 deletions qa/vagrant/src/test/resources/packaging/utils/tar.bash
@@ -94,6 +94,7 @@ verify_archive_installation() {
     assert_file "$ESHOME/bin/elasticsearch-env" f elasticsearch elasticsearch 755
     assert_file "$ESHOME/bin/elasticsearch-keystore" f elasticsearch elasticsearch 755
     assert_file "$ESHOME/bin/elasticsearch-plugin" f elasticsearch elasticsearch 755
+    assert_file "$ESHOME/bin/elasticsearch-shard" f elasticsearch elasticsearch 755
     assert_file "$ESHOME/bin/elasticsearch-translog" f elasticsearch elasticsearch 755
     assert_file "$ESCONFIG" d elasticsearch elasticsearch 755
     assert_file "$ESCONFIG/elasticsearch.yml" f elasticsearch elasticsearch 660
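The packaging assertions expect `elasticsearch-shard` to be installed with mode `755`, the same as the other launchers. When writing a similar check in Java, the octal mode has to be mapped onto `java.nio.file.attribute.PosixFilePermission`. The helper below is a hypothetical sketch of that mapping (`OctalMode` is not part of the test suite); it deliberately rejects modes above `0777`, such as the `2750` expected for `$ESCONFIG`, because the setgid bit has no `PosixFilePermission` equivalent.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper converting an octal mode such as 0755 into PosixFilePermissions. */
public final class OctalMode {

    // Ordered from the owner-read bit (0400) down to the others-execute bit (0001)
    private static final PosixFilePermission[] ORDER = {
        PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE, PosixFilePermission.OWNER_EXECUTE,
        PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE, PosixFilePermission.GROUP_EXECUTE,
        PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE
    };

    private OctalMode() {}

    public static Set<PosixFilePermission> fromOctal(int mode) {
        if (mode < 0 || mode > 0777) {
            // setuid/setgid/sticky bits cannot be expressed as PosixFilePermission values
            throw new IllegalArgumentException("unsupported mode: " + Integer.toOctalString(mode));
        }
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < ORDER.length; i++) {
            // Bit 8 maps to ORDER[0], bit 0 maps to ORDER[8]
            if ((mode & (1 << (8 - i))) != 0) {
                perms.add(ORDER[i]);
            }
        }
        return perms;
    }

    public static void main(String[] args) {
        System.out.println(PosixFilePermissions.toString(fromOctal(0755))); // prints rwxr-xr-x
    }
}
```

On a POSIX file system, comparing `Files.getPosixFilePermissions(path)` with `OctalMode.fromOctal(0755)` then gives the same check as the `assert_file ... 755` lines.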