
Make BlobStoreRepository Aware of ClusterState #49639

Merged

Conversation

original-brownbear
Member

This is a preliminary to #49060.

It does not introduce any substantial behavior change to how the blob store repository
operates. It adds the infrastructure changes around passing the cluster service to the blob store,
the associated test changes, and a best-effort approach to tracking the latest repository
generation on all nodes from cluster state updates. This slightly improves the consistency with which non-master nodes (or the master directly after a failover) can determine the latest
repository generation. To keep this change simple, it does not do any tricky checks for the situation after a repository operation (create, delete, or cleanup) that could theoretically be used to get even greater accuracy.
This change does not alter the behavior of the blob store repository in any way other than adding a better "guess" for the value of the latest repo generation, and is mainly intended to isolate the actual logical change to how the repository operates into #49060.
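The best-effort generation tracking described above can be sketched roughly as follows. This is a self-contained illustration, not the actual Elasticsearch code; the class and method names are made up, with only `latestKnownRepoGen` echoing the real field name:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for BlobStoreRepository's generation tracking.
class RepoGenerationTracker {
    // Sentinel for "no generation known yet", akin to RepositoryData.EMPTY_REPO_GEN.
    static final long EMPTY_REPO_GEN = -1L;

    private final AtomicLong latestKnownRepoGen = new AtomicLong(EMPTY_REPO_GEN);

    // Called on every cluster state update with whatever generation hint was found.
    // Only ever moves the known generation forward, never backward, so a stale
    // or out-of-order hint cannot regress the tracker.
    void onGenerationHint(long genFromClusterState) {
        latestKnownRepoGen.updateAndGet(known -> Math.max(known, genFromClusterState));
    }

    long latestKnownGeneration() {
        return latestKnownRepoGen.get();
    }
}
```

The monotonic max is what makes this safe as a "guess": applying hints in any order converges on the newest generation seen.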

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Member Author

@ywelsch @tlrx I know that the practical improvement to resiliency from this change is pretty small (see PR description ... we're always picking up the best known state before an operation), but I think it should take a lot of the pain out of reviewing #49060 by containing almost all non-functional changes in that PR. Let me know what you think :)

Contributor

@ywelsch ywelsch left a comment

Left some comments. Thanks!

// Inspects all cluster state elements that contain a hint about what the current repository generation is and updates
// #latestKnownRepoGen if a newer than currently known generation is found
@Override
public void applyClusterState(ClusterChangedEvent event) {
Contributor

I wonder if RepositoriesService should update the relevant repository about changes to their snapshots. RepositoriesService is already a cluster state listener, which means that we don't need an additional lifecycle here.

This would also be closer to how we inform shards about updates (see IndexShard.updateShardState)

Member Author

I thought about that initially but then dropped the idea because that would've meant leaking the specifics of BlobStoreRepository into RepositoriesService.
But come to think of it now ... doing this would also save a bit of tricky code in #49060 to get the initialization of the repo on non-master nodes right.
Let's do it -> will do :)

Member

I wonder if RepositoriesService should update the relevant repository about changes to their snapshots.

I agree, that seems to be the right thing to do. I'm also not super happy to have BlobStoreRepository have its own lifecycle.

Member Author

Could I warm your heart for cfb104f maybe? :)
I think in the case of the repositories it's better to pass the whole ClusterState to the repo instead of figuring out what parts go to what repo in RepositoriesService. Otherwise we needlessly waste cycles on the repos that don't require the cluster state (e.g. read-only repos, ccr, ...). Also, in #49060 the parts of the CS that need to be inspected will change, but we may want to keep the current logic from here as BwC fallback which is much easier if we just pass the full ClusterState down.
Also, I realized that I actually had to move the cluster state application method to Repository because of FilterRepository (and upcoming encrypted repo wrapper etc.) so this didn't leak any blob store repo specifics to RepositoriesService after all :)
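The resulting shape is roughly the following (a simplified, self-contained sketch with stand-in types; the real Repository, FilterRepository, and RepositoriesService carry far more responsibilities):

```java
import java.util.List;

// Stand-in for org.elasticsearch.cluster.ClusterState.
class ClusterState {
    final long version;
    ClusterState(long version) { this.version = version; }
}

// Repositories that care about cluster state override updateState; others
// (e.g. read-only repos) can simply ignore it via the no-op default.
interface Repository {
    default void updateState(ClusterState state) {}
}

// A wrapper repository (like FilterRepository) just delegates, which is why
// the hook lives on the Repository interface rather than on one implementation.
class FilterRepository implements Repository {
    private final Repository delegate;
    FilterRepository(Repository delegate) { this.delegate = delegate; }
    @Override public void updateState(ClusterState state) { delegate.updateState(state); }
}

class RepositoriesService {
    private final List<Repository> repositories;
    RepositoriesService(List<Repository> repositories) { this.repositories = repositories; }

    // Forwards the full cluster state to every repository; each repository
    // decides for itself which parts it needs to inspect.
    void applyClusterState(ClusterState state) {
        for (Repository repo : repositories) {
            repo.updateState(state);
        }
    }
}
```

Passing the whole state down keeps RepositoriesService agnostic of which cluster state customs any particular repository implementation cares about.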

final SnapshotsInProgress snapshotsInProgress = state.custom(SnapshotsInProgress.TYPE);
if (snapshotsInProgress != null) {
final SnapshotsInProgress.Entry entry = snapshotsInProgress.entries().stream()
.filter(e -> e.snapshot().getRepository().equals(repoName)).findFirst()
Contributor

why findFirst? Let's take the max of all ongoing snapshots for this repo?

Member Author

Well, currently there's just one entry here at all times. I don't think that will change before this logic gets replaced by the more involved logic in #49060, and I'm not sure that, if and when we move to taking snapshots in parallel, those entries will in fact have different repository generations set (in my prototype for parallel ops they didn't, at least ...). Could we just assert that the count of snapshots in progress is always 1 here? :)
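For reference, the "max of all ongoing snapshots" variant suggested above could look roughly like this (a self-contained sketch; SnapshotEntry here is a hypothetical stand-in for SnapshotsInProgress.Entry and repositoryStateId for its generation field):

```java
import java.util.List;

class SnapshotEntry {
    final String repository;
    final long repositoryStateId;
    SnapshotEntry(String repository, long repositoryStateId) {
        this.repository = repository;
        this.repositoryStateId = repositoryStateId;
    }
}

class SnapshotGenerations {
    static final long EMPTY_REPO_GEN = -1L;

    // Instead of findFirst(), take the max generation across all in-progress
    // snapshots for the given repository; this stays correct even if snapshots
    // ever run in parallel with differing generations.
    static long bestGeneration(List<SnapshotEntry> entries, String repoName) {
        return entries.stream()
            .filter(e -> e.repository.equals(repoName))
            .mapToLong(e -> e.repositoryStateId)
            .max()
            .orElse(EMPTY_REPO_GEN);
    }
}
```

With a single entry per repo (the current invariant) this degenerates to the same result as findFirst, so it is a strictly safer choice.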

}

final SnapshotDeletionsInProgress deletionsInProgress = state.custom(SnapshotDeletionsInProgress.TYPE);
if (bestGenerationFromCS == RepositoryData.EMPTY_REPO_GEN && deletionsInProgress != null) {
Contributor

let's remove this extra condition (bestGenerationFromCS == RepositoryData.EMPTY_REPO_GEN); it's an optimization which does not matter I think and could hurt us later.

Member Author

No, actually, I don't think that's true. We have the situation where, if you abort a snapshot, the delete entry will be created with currentRepoGeneration + 1, since that's what the repo will be at when the delete actually runs. That's why I added this.

Contributor

I don't understand this condition. Why are we not just taking the max of all the entries that we see? Also, I suppose that we don't allow concurrent operations right now, so we assume that we have only one of these 3 metadata for the current repo?

Member Author

@original-brownbear original-brownbear Nov 28, 2019

There is the one special case of aborting a snapshot, where you can have an in-progress snapshot and a delete at the same time. The delete will contain the future repository generation for after the snapshot has finished -> we can't use that. That's why I added this condition.
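The resulting precedence can be sketched like this (stand-in code, not the actual implementation): a generation found on an in-progress snapshot entry wins, and the delete entry's generation is only consulted when no snapshot entry yielded one, because an aborted snapshot's delete entry carries a future generation.

```java
class GenerationFromState {
    static final long EMPTY_REPO_GEN = -1L;

    // genFromSnapshots: best generation found on in-progress snapshot entries,
    // or EMPTY_REPO_GEN if there are none.
    // genFromDeletions: generation recorded on an in-progress delete entry,
    // or EMPTY_REPO_GEN if there is none.
    static long bestGeneration(long genFromSnapshots, long genFromDeletions) {
        if (genFromSnapshots != EMPTY_REPO_GEN) {
            // An aborted snapshot's delete entry holds currentGen + 1, i.e. the
            // generation the repo will be at once the delete actually runs, so
            // it must be ignored while the snapshot entry is still present.
            return genFromSnapshots;
        }
        return genFromDeletions;
    }
}
```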

Contributor

Can we fix that situation? And can you add a comment to that effect? The condition as it stands here today is very unintuitive.

Member Author

Can we fix that situation?

Yes, once we have #49060 we can clean this up when implementing concurrent operations on the repo. That will require the whole business of associating an operation with a strict repo generation to go away anyway.

=> Added a comment for now :)

Member

@tlrx tlrx left a comment

This looks good already, I'm waiting for Yannick's feedback to be addressed before LGTM :)

@@ -411,14 +415,21 @@ private Repository createRepository(RepositoryMetaData repositoryMetaData, Map<S
throw new RepositoryException(repositoryMetaData.name(),
"repository type [" + repositoryMetaData.type() + "] does not exist");
}
boolean success = false;
Member Author

This isn't strictly necessary with the changes just now, but maybe it's fine to keep it here since we discovered it here, and I think we should definitely clean up on a failed start as a matter of principle?

@@ -345,6 +346,9 @@ public void applyClusterState(ClusterChangedEvent event) {
}
}
repositories = Collections.unmodifiableMap(builder);
for (Repository repo : repositories.values()) {
repo.updateState(state);
Member

@tlrx tlrx Nov 28, 2019

I think we need to catch a potential RepositoryException here, log it and continue to update the other repositories.

I misread the code, sorry... But it may still be worth it, just in case?

Member Author

I wonder; looking at the code above, we aren't defensive around closeRepository either (though that might be called in a loop as well), but we do catch on createRepository (which has a TODO about it).
Maybe we should rather wrap this in a

try {
    repo.updateState(state);
} catch (Exception e) {
    assert false : e; // trip assertions in tests, but still propagate in production
    throw e;
}

and do the same for closeRepository? Could do it in a follow up and remove that todo while we're at it :)

Member Author

My main objection to just catching and logging here would be: once the state update is important for the proper functioning of blob store repositories, what would an uncaught exception even mean? (imo it would mean that the repo can't be used for writes any longer ... but that would be something the repo would have to set in its internal state when handling exceptions)

Contributor

The bigger issue I think is to get rid of the condition:

 if ((oldMetaData == null && newMetaData == null) || (oldMetaData != null && oldMetaData.equals(newMetaData))) {

which currently compares the current with the previous cluster state. If anything went wrong applying the previous cluster state, we will not update the repo again.

Member

@tlrx tlrx left a comment

LGTM, thanks Armin. I left a small comment where I think we should be defensive, just in case an exception is thrown while updating the repository states.

@original-brownbear
Member Author

@tlrx as just discussed, fixed exposing the repositories map before it is updated in c257b0e (thanks for catching this!)

@tlrx
Member

tlrx commented Nov 28, 2019

@tlrx as just discussed, fixed exposing the repositories map before it is updated in c257b0e (thanks for catching this!)

Thanks Armin! As discussed I think we should have a look at the other issues in the applyClusterState() method (like maybe only create + update repositories before closing the existing ones) and that could be done in a follow up PR.

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/1 (unrelated transport test failure)

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/bwc (seems like a messed-up version got built ...)
Jenkins run elasticsearch-ci/default-distro

@original-brownbear
Member Author

original-brownbear commented Nov 28, 2019

Something's messed up with the BwC tests here (some version-constant test); it's not related to my changes.
Opened #49696 for test failures

Contributor

@ywelsch ywelsch left a comment

LGTM

@original-brownbear
Member Author

Thanks Yannick + Tanguy!

@original-brownbear original-brownbear merged commit 459d8ed into elastic:master Nov 29, 2019
@original-brownbear original-brownbear deleted the repo-uses-cs-light branch November 29, 2019 09:14
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 29, 2019
original-brownbear added a commit that referenced this pull request Nov 29, 2019
* Make BlobStoreRepository Aware of ClusterState (#49639)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 29, 2019
The current forwarding of the state is disabled if the repo metadata
did not change, so it's somewhat useless.
Relates elastic#49639
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
@mkleen mkleen mentioned this pull request Jun 1, 2021