Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose snapshot index through API for S3 #24288

Closed
ruflin opened this issue Apr 24, 2017 · 0 comments
Closed

Expose snapshot index through API for S3 #24288

ruflin opened this issue Apr 24, 2017 · 0 comments
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >feature

Comments

@ruflin
Copy link
Contributor

ruflin commented Apr 24, 2017

Elasticsearch allows to fetch snapshot information in two different ways. Either a specific snapshot of a repository or a pattern can be specified, or info for all snapshots can be retrieved:

GET /_snapshot/my_backup/_all

Having S3 as the snapshot location this means 1 request to get the repository index file from S3 and 1 request for each snapshot. This means each time _all is called N+1 request are made to S3.

It would be nice to have an API endpoint the exposes the data from the S3 repository index file with all the snapshot in it. This would allow to implement a logic on the client side for which snapshots the detailed stats should be fetched.

An example use case would be a snapshot that is created every 30 minutes with snapshot-name-timestamp and now on the client side I would like to fetch the stats for the most recent snapshot. If I don't know the exact timestamp without the above API I would have to do N+1 requests to S3 which can become expensive. Otherwise I could do 1 request for the repository index and 1, sort the snapshot list on the client side and fetch stats for the first entry.

@abeyad FYI

@abeyad abeyad self-assigned this Apr 24, 2017
abeyad pushed a commit to abeyad/elasticsearch that referenced this issue May 10, 2017
Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc.  In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot.  In cloud-based repositories, this can be expensive,
both from a cost and performance perspective.  Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot.  This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional `verbose`
parameter.  If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes elastic#24288
abeyad pushed a commit that referenced this issue May 10, 2017
…24477)

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc.  In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot.  In cloud-based repositories, this can be expensive,
both from a cost and performance perspective.  Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot.  This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional `verbose`
parameter.  If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes #24288
abeyad pushed a commit that referenced this issue May 10, 2017
…24477)

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc.  In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot.  In cloud-based repositories, this can be expensive,
both from a cost and performance perspective.  Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot.  This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional `verbose`
parameter.  If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes #24288
@clintongormley clintongormley added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Repository S3 labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >feature
Projects
None yet
Development

No branches or pull requests

3 participants