Enhances get snapshots API to allow retrieving repository index only #24477

abeyad · 2017-05-04T02:34:18Z

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc. In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (snap-{snapshot_uuid}.dat) for
every snapshot. In cloud-based repositories, this can be expensive,
both from a cost and performance perspective. Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot. This minimal information can be
retrieved from the repository index blob (index-N) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional verbose
parameter. If verbose is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes #24288

StefanSchmidtOz · 2017-05-04T10:06:46Z

...n/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java

+                                                        final List<SnapshotInfo> currentSnapshots) {
+        List<SnapshotInfo> snapshotInfos = new ArrayList<>();
+        for (SnapshotInfo snapshotInfo : currentSnapshots) {
+            if (toResolve.contains(snapshotInfo.snapshotId())) {


Idea:

if (toResolve.remove(snapshotInfo.snapshotId())) { snapshotInfos.add(snapshotInfo.basic()); }

thank you @StefanSchmidtOz, i made the suggested change

imotov

Code looks good. One concern that I have is that the state is such an important part of the snapshot information that omitting state can lead to the situation when a user might thing that they have the latest snapshot for a particular index, while this snapshot actually failed. Could we add this information to the index file? What do you think?

Since we are modifying toXContent, I think it might be useful to add another get snapshot info test with verbose=false to rest tests. There is also some non-trivial bwc work going on here. So, I think having a bwc test might be helpful as well.

abeyad · 2017-05-04T17:40:46Z

@imotov I looked at adding the state to the index blob and I believe this will be relatively straight forward, and I agree with you its only a little bit of added information to the index blob, but the benefits are great as state is the most important thing one would care about for a given snapshot. I will work on adding this and push commits, and will add the tests you suggested as well.

abeyad · 2017-05-09T03:06:25Z

@imotov I pushed baa2055 that addresses your feedback and enables storing snapshot states in the index file

imotov · 2017-05-09T19:43:27Z

Looks good! Any chance we can add reading non-verbose listings to one of snapshot/restore bwc tests?

abeyad · 2017-05-09T19:59:29Z

@imotov the different repository index formats are implicitly executed in the backwards-5.0 and rolling upgrade tests - since we have different versioned nodes, with some of them on the old format, sometimes we get nodes that read/write the old format and sometimes we get nodes that read/write the new format. In fact, running those tests helped me find a bug in reading the old format from a new node.

I can add a test which just writes the old x-content and reads it in the newer version, and vice versa (a unit test where I'd be copying/pasting the old code in the test to test old serialization/deserialization with the new).

abeyad · 2017-05-09T20:12:02Z

retest this please

abeyad · 2017-05-10T02:31:54Z

@imotov i added the old snapshot bwc tests in 9ca8c06e635d3e246c70a288e43a0525f37695cf

imotov

Thanks! LGTM

imotov · 2017-05-10T11:24:01Z

This is going only to 6.0, right? Should we remove 5.5 label?

abeyad · 2017-05-10T14:59:19Z

@imotov I thought we could backport it too 5.5 too, do you feel its too risky?

abeyad · 2017-05-10T15:01:42Z

retest this please

imotov · 2017-05-10T15:11:40Z

@abeyad I don't think it's too risky. I just noticed this

VERBOSE_INTRODUCED = Version.V_6_0_0_alpha1_UNRELEASED;

but I guess you did it temporary to make sure BWC tests pass.

abeyad · 2017-05-10T15:21:04Z

VERBOSE_INTRODUCED = Version.V_6_0_0_alpha1_UNRELEASED;

Yes exactly, once I backport the change to 5.5, I will change the constant as well on both 5.5 and master.

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all) provides information about snapshots in the repository, including the snapshot state, number of shards snapshotted, failures, etc. In order to provide information about each snapshot in the repository, the call must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for every snapshot. In cloud-based repositories, this can be expensive, both from a cost and performance perspective. Sometimes, all the user wants is to retrieve all the names/uuids of each snapshot, and the indices that went into each snapshot, without any of the other status information about the snapshot. This minimal information can be retrieved from the repository index blob (`index-N`) without needing to read each snapshot metadata blob. This commit enhances the get snapshots API with an optional `verbose` parameter. If `verbose` is set to false on the request, then the get snapshots API will only retrieve the minimal information about each snapshot (the name, uuid, and indices in the snapshot), and only read this information from the repository index blob, thereby giving users the option to retrieve the snapshots in a repository in a more cost-effective and efficient manner. Closes elastic#24288

…24477) Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all) provides information about snapshots in the repository, including the snapshot state, number of shards snapshotted, failures, etc. In order to provide information about each snapshot in the repository, the call must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for every snapshot. In cloud-based repositories, this can be expensive, both from a cost and performance perspective. Sometimes, all the user wants is to retrieve all the names/uuids of each snapshot, and the indices that went into each snapshot, without any of the other status information about the snapshot. This minimal information can be retrieved from the repository index blob (`index-N`) without needing to read each snapshot metadata blob. This commit enhances the get snapshots API with an optional `verbose` parameter. If `verbose` is set to false on the request, then the get snapshots API will only retrieve the minimal information about each snapshot (the name, uuid, and indices in the snapshot), and only read this information from the repository index blob, thereby giving users the option to retrieve the snapshots in a repository in a more cost-effective and efficient manner. Closes #24288

abeyad · 2017-05-10T21:30:23Z

5.x commit: 8fc3f1c

abeyad · 2017-05-10T21:30:29Z

thanks for the review @imotov

In elastic#24477, a less verbose option was added to retrieve snapshot info via GET /_snapshot/{repo}/{snapshots}. The point of adding this less verbose option was so that if the repository is a cloud based one, and there are many snapshots for which the snapshot info needed to be retrieved, then each snapshot would require reading a separate snapshot metadata file to pull out the necessary information. This can be costly (performance and cost) on cloud based repositories, so a less verbose option was added that only retrieves very basic information about each snapshot that is all available in the index-N blob - requiring only one read! In order to display this less verbose snapshot info appropriately, logic was added to not display those fields which could not be populated. However, this broke integrators (e.g. ECE) that required these fields to be present, even if empty. This commit is to return these fields in the response, even if empty, if the verbose option is set.

In #24477, a less verbose option was added to retrieve snapshot info via GET /_snapshot/{repo}/{snapshots}. The point of adding this less verbose option was so that if the repository is a cloud based one, and there are many snapshots for which the snapshot info needed to be retrieved, then each snapshot would require reading a separate snapshot metadata file to pull out the necessary information. This can be costly (performance and cost) on cloud based repositories, so a less verbose option was added that only retrieves very basic information about each snapshot that is all available in the index-N blob - requiring only one read! In order to display this less verbose snapshot info appropriately, logic was added to not display those fields which could not be populated. However, this broke integrators (e.g. ECE) that required these fields to be present, even if empty. This commit is to return these fields in the response, even if empty, if the verbose option is set.

A NullPointerException is thrown when trying to create or delete a snapshot in a repository that has been used by a pre-5.5 cluster and a post-5.5 cluster. The way snapshots are formatted in the repository snapshots index file changed in elastic#24477 and it can cause a NPE if a very specific set of operations is written in the repository.

A NullPointerException is thrown when trying to create or delete a snapshot in a repository that has been written to by an older Elasticsearch after writing to it with a newer Elasticsearch version. This is because the way snapshots are formatted in the repository snapshots index file changed in #24477. This commit changes the parsing of the repository index file so that it now detects a corrupted index file and fails early the snapshot operation. closes #29052

abeyad added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement review v5.5.0 v6.0.0 labels May 4, 2017

abeyad requested a review from imotov May 4, 2017 02:34

StefanSchmidtOz reviewed May 4, 2017

View reviewed changes

abeyad force-pushed the simple_get_snapshots branch from 8091d03 to e2a6633 Compare May 4, 2017 15:38

imotov reviewed May 4, 2017

View reviewed changes

abeyad force-pushed the simple_get_snapshots branch from e2a6633 to baa2055 Compare May 9, 2017 02:57

imotov approved these changes May 10, 2017

View reviewed changes

Ali Beyad added 5 commits May 10, 2017 11:49

adds state to repo data

f99d4c4

fix test

41bb55d

add bwc test

9e29820

simpler remove logic

c37d9b8

abeyad force-pushed the simple_get_snapshots branch from 9ca8c06 to c37d9b8 Compare May 10, 2017 17:11

abeyad merged commit 743217a into elastic:master May 10, 2017

abeyad deleted the simple_get_snapshots branch May 10, 2017 19:48

untergeek mentioned this pull request May 15, 2017

[FUTURE] Changes to get snapshots API elastic/curator#965

Open

clintongormley added v6.0.0-alpha2 and removed v6.0.0 labels Jun 6, 2017

abeyad mentioned this pull request Jun 28, 2017

Output all empty snapshot info fields if in verbose mode #25455

Merged

tlrx mentioned this pull request Apr 25, 2018

Fail snapshot operations early on repository corruption #30140

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhances get snapshots API to allow retrieving repository index only #24477

Enhances get snapshots API to allow retrieving repository index only #24477

abeyad commented May 4, 2017

StefanSchmidtOz May 4, 2017

abeyad May 10, 2017

imotov left a comment

abeyad commented May 4, 2017

abeyad commented May 9, 2017 •

edited

Loading

imotov commented May 9, 2017

abeyad commented May 9, 2017

abeyad commented May 9, 2017

abeyad commented May 10, 2017

imotov left a comment

imotov commented May 10, 2017

abeyad commented May 10, 2017

abeyad commented May 10, 2017

imotov commented May 10, 2017

abeyad commented May 10, 2017

abeyad commented May 10, 2017

abeyad commented May 10, 2017

Enhances get snapshots API to allow retrieving repository index only #24477

Enhances get snapshots API to allow retrieving repository index only #24477

Conversation

abeyad commented May 4, 2017

StefanSchmidtOz May 4, 2017

Choose a reason for hiding this comment

abeyad May 10, 2017

Choose a reason for hiding this comment

imotov left a comment

Choose a reason for hiding this comment

abeyad commented May 4, 2017

abeyad commented May 9, 2017 • edited Loading

imotov commented May 9, 2017

abeyad commented May 9, 2017

abeyad commented May 9, 2017

abeyad commented May 10, 2017

imotov left a comment

Choose a reason for hiding this comment

imotov commented May 10, 2017

abeyad commented May 10, 2017

abeyad commented May 10, 2017

imotov commented May 10, 2017

abeyad commented May 10, 2017

abeyad commented May 10, 2017

abeyad commented May 10, 2017

abeyad commented May 9, 2017 •

edited

Loading