Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhances get snapshots API to allow retrieving repository index only #24477

Merged
merged 5 commits into from
May 10, 2017

Conversation

abeyad
Copy link

@abeyad abeyad commented May 4, 2017

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc. In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (snap-{snapshot_uuid}.dat) for
every snapshot. In cloud-based repositories, this can be expensive,
both from a cost and performance perspective. Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot. This minimal information can be
retrieved from the repository index blob (index-N) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional verbose
parameter. If verbose is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes #24288

@abeyad abeyad requested a review from imotov May 4, 2017 02:34
final List<SnapshotInfo> currentSnapshots) {
List<SnapshotInfo> snapshotInfos = new ArrayList<>();
for (SnapshotInfo snapshotInfo : currentSnapshots) {
if (toResolve.contains(snapshotInfo.snapshotId())) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea:

if (toResolve.remove(snapshotInfo.snapshotId())) {
        snapshotInfos.add(snapshotInfo.basic());
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @StefanSchmidtOz, i made the suggested change

@abeyad abeyad force-pushed the simple_get_snapshots branch from 8091d03 to e2a6633 Compare May 4, 2017 15:38
Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. One concern that I have is that the state is such an important part of the snapshot information that omitting state can lead to the situation when a user might thing that they have the latest snapshot for a particular index, while this snapshot actually failed. Could we add this information to the index file? What do you think?

Since we are modifying toXContent, I think it might be useful to add another get snapshot info test with verbose=false to rest tests. There is also some non-trivial bwc work going on here. So, I think having a bwc test might be helpful as well.

@abeyad
Copy link
Author

abeyad commented May 4, 2017

@imotov I looked at adding the state to the index blob and I believe this will be relatively straight forward, and I agree with you its only a little bit of added information to the index blob, but the benefits are great as state is the most important thing one would care about for a given snapshot. I will work on adding this and push commits, and will add the tests you suggested as well.

@abeyad abeyad force-pushed the simple_get_snapshots branch from e2a6633 to baa2055 Compare May 9, 2017 02:57
@abeyad
Copy link
Author

abeyad commented May 9, 2017

@imotov I pushed baa2055 that addresses your feedback and enables storing snapshot states in the index file

@imotov
Copy link
Contributor

imotov commented May 9, 2017

Looks good! Any chance we can add reading non-verbose listings to one of snapshot/restore bwc tests?

@abeyad
Copy link
Author

abeyad commented May 9, 2017

@imotov the different repository index formats are implicitly executed in the backwards-5.0 and rolling upgrade tests - since we have different versioned nodes, with some of them on the old format, sometimes we get nodes that read/write the old format and sometimes we get nodes that read/write the new format. In fact, running those tests helped me find a bug in reading the old format from a new node.

I can add a test which just writes the old x-content and reads it in the newer version, and vice versa (a unit test where I'd be copying/pasting the old code in the test to test old serialization/deserialization with the new).

@abeyad
Copy link
Author

abeyad commented May 9, 2017

retest this please

@abeyad
Copy link
Author

abeyad commented May 10, 2017

@imotov i added the old snapshot bwc tests in 9ca8c06e635d3e246c70a288e43a0525f37695cf

Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@imotov
Copy link
Contributor

imotov commented May 10, 2017

This is going only to 6.0, right? Should we remove 5.5 label?

@abeyad
Copy link
Author

abeyad commented May 10, 2017

@imotov I thought we could backport it too 5.5 too, do you feel its too risky?

@abeyad
Copy link
Author

abeyad commented May 10, 2017

retest this please

@imotov
Copy link
Contributor

imotov commented May 10, 2017

@abeyad I don't think it's too risky. I just noticed this

VERBOSE_INTRODUCED = Version.V_6_0_0_alpha1_UNRELEASED;

but I guess you did it temporary to make sure BWC tests pass.

@abeyad
Copy link
Author

abeyad commented May 10, 2017

VERBOSE_INTRODUCED = Version.V_6_0_0_alpha1_UNRELEASED;

Yes exactly, once I backport the change to 5.5, I will change the constant as well on both 5.5 and master.

Ali Beyad added 5 commits May 10, 2017 11:49
Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc.  In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot.  In cloud-based repositories, this can be expensive,
both from a cost and performance perspective.  Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot.  This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional `verbose`
parameter.  If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes elastic#24288
@abeyad abeyad force-pushed the simple_get_snapshots branch from 9ca8c06 to c37d9b8 Compare May 10, 2017 17:11
@abeyad abeyad merged commit 743217a into elastic:master May 10, 2017
@abeyad abeyad deleted the simple_get_snapshots branch May 10, 2017 19:48
abeyad pushed a commit that referenced this pull request May 10, 2017
…24477)

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc.  In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot.  In cloud-based repositories, this can be expensive,
both from a cost and performance perspective.  Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot.  This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional `verbose`
parameter.  If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes #24288
@abeyad
Copy link
Author

abeyad commented May 10, 2017

5.x commit: 8fc3f1c

@abeyad
Copy link
Author

abeyad commented May 10, 2017

thanks for the review @imotov

abeyad pushed a commit to abeyad/elasticsearch that referenced this pull request Jun 28, 2017
In elastic#24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, if the verbose option is set.
abeyad pushed a commit that referenced this pull request Jun 28, 2017
In #24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, if the verbose option is set.
abeyad pushed a commit that referenced this pull request Jun 28, 2017
In #24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, if the verbose option is set.
abeyad pushed a commit that referenced this pull request Jun 28, 2017
In #24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, if the verbose option is set.
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Apr 27, 2018
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been used by a pre-5.5 cluster
and a post-5.5 cluster. The way snapshots are formatted in the
repository snapshots index file changed in elastic#24477 and it can cause
a NPE if a very specific set of operations is written in the repository.
tlrx added a commit that referenced this pull request Apr 27, 2018
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been written to by an older 
Elasticsearch after writing to it with a newer Elasticsearch version.

This is because the way snapshots are formatted in the repository 
snapshots index file changed in #24477.

This commit changes the parsing of the repository index file so that 
it now detects a corrupted index file and fails early the snapshot 
operation.

closes #29052
tlrx added a commit that referenced this pull request Apr 27, 2018
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been written to by an older
Elasticsearch after writing to it with a newer Elasticsearch version.

This is because the way snapshots are formatted in the repository
snapshots index file changed in #24477.

This commit changes the parsing of the repository index file so that
it now detects a corrupted index file and fails early the snapshot
operation.

closes #29052
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants