Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lazy snapshot repository initialization #31606

Merged
merged 21 commits into from
Jul 13, 2018

Conversation

vladimirdolzhenko
Copy link
Contributor

Instead of eager initialization of underlying blob store in ctor this PR move actual initialization of underlying blob store to later stage.

It makes possible to register e.g. s3repo with bucket that is not available at the moment.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the tests but I looked at the main components, it looks great. I think this is the right way to go.

I know we already discussed this but I don't like the fact that settings are sometimes verified in the repository constructor and sometimes in the blobstore creation method. I think we need to check all settings at the same place if they don't require any external access (I think the ctor is the right place for this but Yannick might have a different opinion).

public abstract class BlobStoreRepository extends AbstractLifecycleComponent implements Repository {

private BlobContainer snapshotsBlobContainer;
public abstract class BlobStoreRepository<BS extends BlobStore> extends AbstractLifecycleComponent implements Repository {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a generic here? It looks to me that it's useful for innerBlobStore() / blobContainer() methods used in tests but otherwise it's not really required.

@@ -225,6 +224,12 @@

private final ChecksumBlobStoreFormat<BlobStoreIndexShardSnapshots> indexShardSnapshotsFormat;

private final Object lock = new Object();

private volatile BlobContainer snapshotsBlobContainer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's name this blobContainer too?

blobStore().close();
} catch (Exception t) {
logger.warn("cannot close blob store", t);
BlobStore store = blobStore;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this to be in a synchronized block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason for sync here as blobStore is volatile. It could be reasonable to add opened/closed flag to avoid reinit after repo is closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlobStoreRepository is a lifecycle component - no any need for open/close flag there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lifecycle component is not concurrency-aware, and not all methods that create the blobstore etc. are guarded by "if-not-closed" checks. We need to guard against creating a blobstore / container after the closing has started

}

// for test purposes only
protected BS innerBlobStore() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need this in test, you can still call it getBlobStore()

@vladimirdolzhenko
Copy link
Contributor Author

thanks @tlrx for your comments - make sense

will move checks (those don't require extra storage call) to ctor

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments. Overall approach makes sense to me.


@Override
protected BlobStore createBlobStore() {
return new URLBlobStore(settings, url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously this was using the normalizedURL? Why not anymore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why is this not breaking any tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lack of test, adding it

basePath = BlobPath.cleanPath();
url = URL_SETTING.exists(metadata.settings())
? URL_SETTING.get(metadata.settings()) : REPOSITORIES_URL_SETTING.get(settings);
checkURL(url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check should be done as part of createBlobStore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

new NamedXContentRegistry(Collections.emptyList()));

assertThat("blob store has to be lazy initialized", repository.getBlobStore(), is(nullValue()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be testing the lazy initialization of BlobStoreRepository, not URLRepository? Do we need to expose the methods in URLRepository and test this at this level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URLRepositoryTests doesn't extend ESBlobStoreRepositoryIntegTestCase (as FsBlobStoreRepositoryIT, S3BlobStoreRepositoryTests and GoogleCloudStorageBlobStoreRepositoryTests) - it looks like more due to read-only nature of url repo - therefore such kind of check is done on this level

if (Repository.READONLY_SETTING.exists(metadata.settings())) {
this.readonly = Repository.READONLY_SETTING.get(metadata.settings());
} else {
this.readonly = blobStore.getLocationMode() == LocationMode.SECONDARY_ONLY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can all be done in the constructor. The location mode is determined through a setting, just as the read-only setting. There's no need to create the BlobStore to determine the location mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++


GoogleCloudStorageRepository(RepositoryMetaData metadata, Environment environment,
NamedXContentRegistry namedXContentRegistry,
GoogleCloudStorageService storageService) throws Exception {
GoogleCloudStorageService storageService){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space missing

*/
protected abstract BlobStore blobStore();
protected BlobStore blobStore() {
BlobStore store = blobStore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add an assertion that this method is only called from the snapshot thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be done on generic thread e.g. from IndexShard.restoreFromRepository chain:

	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.verificationThreadCheck(BlobStoreRepository.java:608)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.blobContainer(BlobStoreRepository.java:301)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.listBlobsToGetLatestIndexId(BlobStoreRepository.java:809)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId(BlobStoreRepository.java:787)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:650)
	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:454)
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278)
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301)
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276)
	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1547)
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2015)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then assert that it's either snapshot or generic threadpool

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you address this comment, i.e. make sure that the blobStore is only accessed through the snapshot / generic thread.

* maintains single lazy instance of {@link BlobContainer}
*/
protected BlobContainer blobContainer() {
BlobContainer blobContainer = this.blobContainer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add an assertion that this method is only called from the snapshot thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you address this comment please?

try {
if (isReadOnly()) {
// It's readonly - so there is not much we can do here to verify it
// It's readonly - so there is not much we can do here to verify it apart try to create blobStore()
blobStore();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only verifies the repo on the master, not the other nodes...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean that comment has to be - try the best - try to create blobStore on master ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to extend verification so that it also works on read-only repos (and make it work in BWC-compatible fashion). Also, this should probably call blobStore().blobContainer(basePath()).
We can do that in a follow-up, but for now add a comment that this currently only checks that the repository is accessible on the master. // TODO: add repository verification for read-only repositories

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me

blobStore().close();
} catch (Exception t) {
logger.warn("cannot close blob store", t);
BlobStore store = blobStore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lifecycle component is not concurrency-aware, and not all methods that create the blobstore etc. are guarded by "if-not-closed" checks. We need to guard against creating a blobstore / container after the closing has started

createTestRepository(name, verify);

final RepositoriesService repositoriesService =
internalCluster().getDataOrMasterNodeInstances(RepositoriesService.class).iterator().next();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's check all of them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

…meters are moved to ctors; using SetOnce and lifecycle-aware lazy creation
@vladimirdolzhenko
Copy link
Contributor Author

Thanks @ywelsch for review - I did relevant changes - one comment is still under question (to me) - is to add assertion on snapshot thread (pls check comment)

@vladimirdolzhenko
Copy link
Contributor Author

test this please

1 similar comment
@vladimirdolzhenko
Copy link
Contributor Author

test this please


// only use for testing
@Override
protected BlobStore getBlobStore() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be removed now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, it is in use in AzureRepositorySettingsTests

synchronized (lock) {
store = blobStore.get();
if (store == null) {
if (lifecycle.started() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not concurrency-safe. Closing can be happening concurrently to this.

*/
protected abstract BlobStore blobStore();
protected BlobStore blobStore() {
BlobStore store = blobStore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then assert that it's either snapshot or generic threadpool

if (lifecycle.started() == false) {
throw new RepositoryException(metadata.name(), "repository is not in started state");
}
verificationThreadCheck();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's check this on every access, not only creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see your point - you'd like to enforce any access to blobStore rather just initialization; there are some non-integration test failures - gonna fix it soon

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left one more comment. No need for another iteration though. LGTM

@@ -543,11 +613,21 @@ public long getRestoreThrottleTimeInNanos() {
return restoreRateLimitingTimeInNanos.count();
}

protected void verificationThreadCheck() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're not only checking the verification thread here, but all operations that are accessing the blobstore. Can you rename this method?

…d couple comments for assertSnapshotOrGenericThread and the reason of FsLikeRepoPlugin in BlobStoreRepositoryTests.java
@vladimirdolzhenko vladimirdolzhenko merged commit b1bf643 into elastic:master Jul 13, 2018
@vladimirdolzhenko vladimirdolzhenko deleted the lazy_repo_init branch July 13, 2018 18:07
@vladimirdolzhenko
Copy link
Contributor Author

thanks @ywelsch for the review

vladimirdolzhenko pushed a commit that referenced this pull request Jul 13, 2018
dnhatn added a commit that referenced this pull request Jul 13, 2018
* 6.x:
  Watcher: Make settings reloadable (#31746)
  [Rollup] Histo group config should support scaled_floats (#32048)
  lazy snapshot repository initialization (#31606)
  Add secure setting for watcher email password (#31620)
  Watcher: cleanup ensureWatchExists use (#31926)
  Add second level of field collapsing (#31808)
  Added lenient flag for synonym token filter (#31484) (#31970)
  Test: Fix a second case of bad watch creation
  [Rollup] Use composite's missing_bucket (#31402)
  Docs: Restyled cloud link in getting started
  Docs: Change formatting of Cloud options
  Re-instate link in StringFunctionUtils javadocs
  Correct spelling of AnalysisPlugin#requriesAnalysisSettings (#32025)
  Fix problematic chars in javadoc
  [ML] Move open job failure explanation out of root cause (#31925)
  [ML] Switch ML native QA tests to use a 3 node cluster (#32011)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants