Introduce Lucene-based metadata persistence #48733

DaveCTurner · 2019-10-31T08:01:06Z

This commit introduces LucenePersistedState which master-eligible nodes
can use to persist the cluster metadata in a Lucene index rather than in
many separate files.

Relates #48701


elasticmachine · 2019-10-31T08:01:08Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

DaveCTurner · 2019-10-31T08:35:44Z

CI failure is unrelated (reproduces on master, see #48735).

@elasticmachine please run elasticsearch-ci/1

DaveCTurner · 2019-10-31T09:10:55Z

CI failure looks like slowness (doesn't obviously reproduce or seem obviously related). Let's try again.

@elasticmachine please run elasticsearch-ci/2

DaveCTurner · 2019-11-05T14:40:08Z

@ywelsch as discussed, 85a7424 adds a consistency check to ensure that the freshest state also has the freshest current term instead of the previous lenience.
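A minimal sketch of the kind of consistency check described above. It is not the code from 85a7424. It assumes each data path yields a hypothetical `OnDiskState` summary holding a current term and a last-accepted (term, version) pair. The freshest accepted state is selected, and the whole set is rejected if some other path holds a higher current term. All class, field and method names are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;

final class FreshestStateCheck {

    /** Hypothetical summary of the persisted state found in one data path. */
    static final class OnDiskState {
        final String path;
        final long currentTerm;
        final long acceptedTerm;
        final long acceptedVersion;

        OnDiskState(String path, long currentTerm, long acceptedTerm, long acceptedVersion) {
            this.path = path;
            this.currentTerm = currentTerm;
            this.acceptedTerm = acceptedTerm;
            this.acceptedVersion = acceptedVersion;
        }

        /** True if this state was accepted more recently than the other one. */
        boolean isFresherThan(OnDiskState other) {
            if (acceptedTerm != other.acceptedTerm) {
                return acceptedTerm > other.acceptedTerm;
            }
            return acceptedVersion > other.acceptedVersion;
        }
    }

    /** Returns the freshest state, or throws if a staler state has a greater current term. */
    static OnDiskState selectFreshest(List<OnDiskState> states) {
        if (states.isEmpty()) {
            throw new IllegalArgumentException("no on-disk states found");
        }
        OnDiskState freshest = states.get(0);
        OnDiskState maxTerm = states.get(0);
        for (OnDiskState state : states) {
            if (state.isFresherThan(freshest)) {
                freshest = state;
            }
            if (state.currentTerm > maxTerm.currentTerm) {
                maxTerm = state;
            }
        }
        // No lenience: the freshest state must also carry the highest current term.
        if (freshest.currentTerm != maxTerm.currentTerm) {
            throw new IllegalStateException("inconsistent terms: freshest state is in [" + freshest.path
                + "] with term [" + freshest.currentTerm + "] but [" + maxTerm.path
                + "] has greater term [" + maxTerm.currentTerm + "]");
        }
        return freshest;
    }

    public static void main(String[] args) {
        List<OnDiskState> states = Arrays.asList(
            new OnDiskState("/data/0", 5, 4, 10),
            new OnDiskState("/data/1", 5, 4, 12));
        System.out.println(selectFreshest(states).path);
    }
}
```

Including the paths in the exception message follows the spirit of the later "Record paths in ISE messages" commit, although the exact wording here is invented.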

original-brownbear

Gave it a quick look and left some suggestions on the efficiency of things. Also, can you update the feature branch so it's at the same master height as this one and we get a neat diff for a full review? :)

server/src/main/java/org/elasticsearch/gateway/LucenePersistedStateFactory.java

DaveCTurner · 2019-11-12T12:17:11Z

CI failure looks to be a connection timeout in an unrelated test.

@elasticmachine please run elasticsearch-ci/2

ywelsch

Looks very good. I left one main comment around versioning of the metadata folder (which I think is not needed). I did not find time to go through the tests yet but am happy to defer that to the other reviewers.

server/src/main/java/org/elasticsearch/env/NodeEnvironment.java

server/src/main/java/org/elasticsearch/gateway/LucenePersistedStateFactory.java

ywelsch · 2019-11-12T15:24:13Z

server/src/main/java/org/elasticsearch/gateway/LucenePersistedStateFactory.java

+
+    public static String getMetaDataIndexDirectoryName(int majorVersion) {
+        // include the version in the directory name to create a completely new index when upgrading to the next major version.
+        return "_metadata_v" + majorVersion;


I think we can reuse the same directory, which will simplify things. To upgrade, we can just create a new IndexWriter with OpenMode.CREATE and use that one to write out the upgraded content, and finally commit. The IndexWriter will carry over the generation numbers, and version/counter, but otherwise create a new vanilla index (with new lucene created version). The benefit of this approach is that it solves atomic upgrades on the folder.

Update: This is actually how the implementation already works on "open", so it looks like we can just get rid of the versioned folders?

server/src/main/java/org/elasticsearch/gateway/LucenePersistedStateFactory.java

server/src/test/java/org/elasticsearch/gateway/GatewayMetaStatePersistedStateTests.java

DaveCTurner

Addressed some comments

DaveCTurner · 2019-11-12T18:44:54Z

Failure looks like #48951 again. @elasticmachine please run elasticsearch-ci/2

ywelsch

LGTM

ywelsch · 2019-11-13T08:50:24Z

server/src/main/java/org/elasticsearch/env/NodeEnvironment.java

+                    MetaDataStateFormat.STATE_DIR_NAME,
+
+                    // Lucene-based metadata folder
+                    LucenePersistedStateFactory.METADATA_DIRECTORY_NAME,


Do we need a separate metadata folder? Given that we're not addressing BWC in this PR, should we use the current _state folder to keep things as close as possible to what we have today? Should this metadata folder become a subfolder of _state at some point? Should it replace the full content of the state folder (i.e. incl. node id?)

DaveCTurner · 2019-11-13

I don't really see much difference either way, but I personally prefer the conceptual separation here. On dedicated master nodes we could indeed drop the node metadata file in due course. We avoid sharing a single folder between a Lucene index and some MetaDataStateFormat-based files in other places but we could break that pattern here if needed. I think repurposing will be a little simpler with separate folders, if only because there's no need to implement a new delete operation on MetaDataStateFormat when you can just wipe out the whole directory.
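To illustrate the last point, here is a rough sketch of what "wipe out the whole directory" can look like with plain `java.nio`. It assumes a hypothetical `_metadata` folder sitting next to the existing `_state` folder; the real folder name comes from `LucenePersistedStateFactory.METADATA_DIRECTORY_NAME`, whose value is not shown in this thread. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

final class MetadataDirectoryWiper {

    /** Deletes the given directory and everything beneath it; does nothing if it is absent. */
    static void deleteRecursively(Path root) throws IOException {
        if (Files.notExists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                // Children are gone by now, so the directory itself can be removed.
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path dataPath = Files.createTempDirectory("node-data");
        Path stateDir = Files.createDirectory(dataPath.resolve("_state"));
        Path metadataDir = Files.createDirectory(dataPath.resolve("_metadata")); // assumed name
        Files.write(stateDir.resolve("node-0.st"), new byte[] {1});
        Files.write(metadataDir.resolve("segments_1"), new byte[] {2});

        // Only the separate metadata folder is wiped; the _state files are untouched.
        deleteRecursively(metadataDir);
        System.out.println("metadata dir exists: " + Files.exists(metadataDir));
        System.out.println("state file exists: " + Files.exists(stateDir.resolve("node-0.st")));

        deleteRecursively(dataPath); // clean up the temporary directory
    }
}
```

With a shared folder, the same operation would need to pick out and delete individual files, which is the extra delete operation the comment refers to.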

server/src/test/java/org/elasticsearch/gateway/LucenePersistedStateFactoryTests.java

Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates only touch a few of these files, but some must write them all. If a node holds a large number of indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding.

This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version of the following PRs that were targeting a feature branch:

* Introduce Lucene-based metadata persistence (#48733)

  This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files. Relates #48701

* Remove per-index metadata without assigned shards (#49234)

  Today on master-eligible nodes we maintain per-index metadata files for every index. However, we also keep this metadata in the `LucenePersistedState`, and only use the per-index metadata files for importing dangling indices. However there is no point in importing a dangling index without any shard data, so we do not need to maintain these extra files any more. This commit removes per-index metadata files from nodes which do not hold any shards of those indices. Relates #48701

* Use Lucene exclusively for metadata storage (#50144)

  This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds an interoperability layer for upgrades from prior versions. This commit disables a number of tests related to dangling indices and command-line tools. Those will be addressed in follow-ups. Relates #48701

* Add command-line tool support for Lucene-based metadata storage (#50179)

  Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage. Relates #48701

* Use single directory for metadata (#50639)

  Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed though, and introduces an additional unnecessary cognitive burden to the users. Co-Authored-By: David Turner <david.turner@elastic.co>

* Add async dangling indices support (#50642)

  Adds support for writing out dangling indices in an asynchronous way. Also provides an option to avoid writing out dangling indices at all. Relates #48701

* Fold node metadata into new node storage (#50741)

  Moves node metadata to use the new storage mechanism (see #48701) as the authoritative source.

* Write CS asynchronously on data-only nodes (#50782)

  Writes cluster states out asynchronously on data-only nodes. The main reason for writing out the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a bit of bootstrap validation and so that the shard recovery tools work. Cluster states that are written asynchronously have their voting configuration adapted to a non-existent configuration so that these nodes cannot mistakenly become master even if their node role is changed back and forth. Relates #48701

* Remove persistent cluster settings tool (#50694)

  Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on-disk cluster state in cases where it contains incompatible settings that prevent the cluster from forming. Relates #48701

* Make cluster state writer resilient to disk issues (#50805)

  Adds handling to make the cluster state writer resilient to disk issues. Relates to #48701

* Omit writing global metadata if no change (#50901)

  Uses the same optimization for the new cluster state storage layer as the old one, writing global metadata only when changed. Avoids writing out the global metadata if none of the persistent fields changed. Speeds up server:integTest by ~10%. Relates #48701

* DanglingIndicesIT should ensure node removed first (#50896)

  These tests occasionally failed because the deletion was submitted before the restarting node was removed from the cluster, causing the deletion not to be fully acked. This commit fixes this by checking the restarting node has been removed from the cluster.

Co-authored-by: David Turner <david.turner@elastic.co>
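The write-avoidance idea in this commit message (touch only what changed, and skip the global metadata when it is unchanged) can be sketched as follows. This is not the `LucenePersistedState` implementation: in-memory counters stand in for Lucene document updates and deletes, and the class name, the string-valued global metadata and the per-index version map are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class IncrementalMetadataWriter {
    private String lastGlobalMetadata;                           // last global metadata written
    private final Map<String, Long> lastIndexVersions = new HashMap<>(); // index UUID -> version written
    private int documentsWritten;
    private int documentsDeleted;

    /** Persists a new state, writing only the parts that differ from the previous write. */
    void write(String globalMetadata, Map<String, Long> indexVersions) {
        // Global metadata: one document, rewritten only when it changed.
        if (!Objects.equals(globalMetadata, lastGlobalMetadata)) {
            documentsWritten++;
            lastGlobalMetadata = globalMetadata;
        }
        // Per-index metadata: one document per index, rewritten only for new or changed indices.
        for (Map.Entry<String, Long> entry : indexVersions.entrySet()) {
            Long previous = lastIndexVersions.get(entry.getKey());
            if (!entry.getValue().equals(previous)) {
                documentsWritten++;
            }
        }
        // Indices that disappeared from the state have their documents deleted.
        Set<String> removed = new HashSet<>(lastIndexVersions.keySet());
        removed.removeAll(indexVersions.keySet());
        documentsDeleted += removed.size();

        lastIndexVersions.clear();
        lastIndexVersions.putAll(indexVersions);
    }

    int documentsWritten() {
        return documentsWritten;
    }

    int documentsDeleted() {
        return documentsDeleted;
    }

    public static void main(String[] args) {
        IncrementalMetadataWriter writer = new IncrementalMetadataWriter();
        Map<String, Long> indices = new HashMap<>();
        indices.put("idx-a", 1L);
        indices.put("idx-b", 1L);
        indices.put("idx-c", 1L);

        writer.write("global-v1", indices);            // full write: 1 global + 3 index documents
        System.out.println(writer.documentsWritten()); // prints 4

        indices.put("idx-b", 2L);
        writer.write("global-v1", indices);            // only idx-b changed
        System.out.println(writer.documentsWritten()); // prints 5
    }
}
```

In the file-per-index layout described at the top of the message, a full update costs one file write per index; here the cost of an update tracks the number of changed entries instead.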


elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

Introduce Lucene-based metadata persistence

17ec60e

This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files. Relates elastic#48701

DaveCTurner added the >enhancement and :Distributed Coordination/Cluster Coordination labels Oct 31, 2019

DaveCTurner requested review from jpountz, andrershov and ywelsch October 31, 2019 08:01

DaveCTurner mentioned this pull request Oct 31, 2019

Reduce number of writes needed for metadata updates #48701

Closed

14 tasks

DaveCTurner added 2 commits October 31, 2019 08:08

Revert whitespace

669d3b2

Use IOUtils.close

9ac94cb

DaveCTurner added 7 commits November 1, 2019 10:27

Support persistent storage in coordinator tests as long as there is no shenanigans

7d65358

Proper awaits fixes

dea94dd

Fix up GatewayIndexStateIT

b431a1d

Precommit

6526b82

Merge branch 'reduce-metadata-writes-master' into 2019-10-31-lucene-persisted-state

ce19ee0

Merge branch 'master' into 2019-10-31-lucene-persisted-state

79fc46b

Require freshest state to have the freshest current term

85a7424

DaveCTurner added 4 commits November 5, 2019 21:15

Merge branch 'master' into 2019-10-31-lucene-persisted-state

2a736d5

Fix up test for previous commit

39b0358

Check that updating the state also works

fa22424

Use stored fields instead of docvalues

fc80816

original-brownbear reviewed Nov 7, 2019

jpountz reviewed Nov 8, 2019

DaveCTurner added 4 commits November 8, 2019 09:25

SerialMergeScheduler

9fbf557

No query cache

aa02a22

Reduce scope

3d34c6d

Fix doc ID handling

1027c8c

ywelsch reviewed Nov 12, 2019

DaveCTurner added 4 commits November 12, 2019 17:10

No need for a versioned folder (yet?)

492b3a7

Record paths in ISE messages

d2e4b70

Reject duplicate/missing bits even if assertions disabled

773b9f1

Right size for hashmap

3578297

DaveCTurner commented Nov 12, 2019

DaveCTurner added 2 commits November 12, 2019 18:09

Use SimpleFSDirectory again

e6e4d32

Use updateDocument instead of delete/add

7fa29f9

DaveCTurner requested a review from ywelsch November 12, 2019 18:25

ywelsch approved these changes Nov 13, 2019

super

21e154f

DaveCTurner merged commit 2a08dd1 into elastic:reduce-metadata-writes-master Nov 13, 2019

DaveCTurner deleted the 2019-10-31-lucene-persisted-state branch November 13, 2019 12:36

ywelsch mentioned this pull request Jan 13, 2020

Move metadata storage to Lucene #50928

Merged

mkleen added a commit to crate/crate that referenced this pull request Apr 14, 2021

bp: Introduce Lucene-based metadata persistence

881e671

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

mkleen added a commit to crate/crate that referenced this pull request Apr 14, 2021

bp: Introduce Lucene-based metadata persistence

b979d68

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

mkleen mentioned this pull request Apr 14, 2021

Move metadata storage to Lucene crate/crate#11270

Merged

5 tasks

mkleen added a commit to crate/crate that referenced this pull request Apr 14, 2021

bp: Introduce Lucene-based metadata persistence

a65504b

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

mkleen added a commit to crate/crate that referenced this pull request Apr 14, 2021

bp: Introduce Lucene-based metadata persistence

cc96d50

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

mkleen added a commit to crate/crate that referenced this pull request Apr 14, 2021

bp: Introduce Lucene-based metadata persistence

c038956

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

mkleen added a commit to crate/crate that referenced this pull request Apr 15, 2021

bp: Introduce Lucene-based metadata persistence

b52c7a6

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

mkleen added a commit to crate/crate that referenced this pull request Apr 16, 2021

bp: Introduce Lucene-based metadata persistence

5623080

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

mkleen added a commit to crate/crate that referenced this pull request Apr 16, 2021

bp: Introduce Lucene-based metadata persistence

6308d0c

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733

mkleen added a commit to crate/crate that referenced this pull request Apr 26, 2021

bp: Introduce Lucene-based metadata persistence

7fbc9d8

elastic/elasticsearch@2a08dd1 elastic/elasticsearch#48733
