-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lazy snapshot repository initialization #31606
lazy snapshot repository initialization #31606
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look at the tests but I looked at the main components, it looks great. I think this is the right way to go.
I know we already discussed this but I don't like the fact that settings are sometimes verified in the repository constructor and sometimes in the blobstore creation method. I think we need to check all settings at the same place if they don't require any external access (I think the ctor is the right place for this but Yannick might have a different opinion).
public abstract class BlobStoreRepository extends AbstractLifecycleComponent implements Repository { | ||
|
||
private BlobContainer snapshotsBlobContainer; | ||
public abstract class BlobStoreRepository<BS extends BlobStore> extends AbstractLifecycleComponent implements Repository { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a generic here? It looks to me that it's useful for innerBlobStore() / blobContainer() methods used in tests but otherwise it's not really required.
@@ -225,6 +224,12 @@ | |||
|
|||
private final ChecksumBlobStoreFormat<BlobStoreIndexShardSnapshots> indexShardSnapshotsFormat; | |||
|
|||
private final Object lock = new Object(); | |||
|
|||
private volatile BlobContainer snapshotsBlobContainer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's name this blobContainer
too?
blobStore().close(); | ||
} catch (Exception t) { | ||
logger.warn("cannot close blob store", t); | ||
BlobStore store = blobStore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect this to be in a synchronized block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any reason for sync
here as blobStore
is volatile. It could be reasonable to add opened
/closed
flag to avoid reinit after repo is closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BlobStoreRepository
is a lifecycle component - no any need for open
/close
flag there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lifecycle component is not concurrency-aware, and not all methods that create the blobstore etc. are guarded by "if-not-closed" checks. We need to guard against creating a blobstore / container after the closing has started
} | ||
|
||
// for test purposes only | ||
protected BS innerBlobStore() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need this in test, you can still call it getBlobStore()
thanks @tlrx for your comments - make sense will move checks (those don't require extra storage call) to ctor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments. Overall approach makes sense to me.
|
||
@Override | ||
protected BlobStore createBlobStore() { | ||
return new URLBlobStore(settings, url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously this was using the normalizedURL? Why not anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why is this not breaking any tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lack of test, adding it
basePath = BlobPath.cleanPath(); | ||
url = URL_SETTING.exists(metadata.settings()) | ||
? URL_SETTING.get(metadata.settings()) : REPOSITORIES_URL_SETTING.get(settings); | ||
checkURL(url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check should be done as part of createBlobStore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
new NamedXContentRegistry(Collections.emptyList())); | ||
|
||
assertThat("blob store has to be lazy initialized", repository.getBlobStore(), is(nullValue())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be testing the lazy initialization of BlobStoreRepository, not URLRepository? Do we need to expose the methods in URLRepository and test this at this level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URLRepositoryTests
doesn't extend ESBlobStoreRepositoryIntegTestCase
(as FsBlobStoreRepositoryIT
, S3BlobStoreRepositoryTests
and GoogleCloudStorageBlobStoreRepositoryTests
) - it looks like more due to read-only nature of url repo - therefore such kind of check is done on this level
if (Repository.READONLY_SETTING.exists(metadata.settings())) { | ||
this.readonly = Repository.READONLY_SETTING.get(metadata.settings()); | ||
} else { | ||
this.readonly = blobStore.getLocationMode() == LocationMode.SECONDARY_ONLY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can all be done in the constructor. The location mode is determined through a setting, just as the read-only setting. There's no need to create the BlobStore to determine the location mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
|
||
GoogleCloudStorageRepository(RepositoryMetaData metadata, Environment environment, | ||
NamedXContentRegistry namedXContentRegistry, | ||
GoogleCloudStorageService storageService) throws Exception { | ||
GoogleCloudStorageService storageService){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space missing
*/ | ||
protected abstract BlobStore blobStore(); | ||
protected BlobStore blobStore() { | ||
BlobStore store = blobStore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add an assertion that this method is only called from the snapshot thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could be done on generic
thread e.g. from IndexShard.restoreFromRepository
chain:
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.verificationThreadCheck(BlobStoreRepository.java:608)
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.blobContainer(BlobStoreRepository.java:301)
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.listBlobsToGetLatestIndexId(BlobStoreRepository.java:809)
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId(BlobStoreRepository.java:787)
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:650)
at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:454)
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278)
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301)
at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276)
at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1547)
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2015)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then assert that it's either snapshot or generic threadpool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you address this comment, i.e. make sure that the blobStore is only accessed through the snapshot / generic thread.
* maintains single lazy instance of {@link BlobContainer} | ||
*/ | ||
protected BlobContainer blobContainer() { | ||
BlobContainer blobContainer = this.blobContainer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add an assertion that this method is only called from the snapshot thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you address this comment please?
try { | ||
if (isReadOnly()) { | ||
// It's readonly - so there is not much we can do here to verify it | ||
// It's readonly - so there is not much we can do here to verify it apart try to create blobStore() | ||
blobStore(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only verifies the repo on the master, not the other nodes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean that comment has to be - try the best - try to create blobStore on master
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to extend verification so that it also works on read-only repos (and make it work in BWC-compatible fashion). Also, this should probably call blobStore().blobContainer(basePath())
.
We can do that in a follow-up, but for now add a comment that this currently only checks that the repository is accessible on the master. // TODO: add repository verification for read-only repositories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me
blobStore().close(); | ||
} catch (Exception t) { | ||
logger.warn("cannot close blob store", t); | ||
BlobStore store = blobStore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lifecycle component is not concurrency-aware, and not all methods that create the blobstore etc. are guarded by "if-not-closed" checks. We need to guard against creating a blobstore / container after the closing has started
createTestRepository(name, verify); | ||
|
||
final RepositoriesService repositoriesService = | ||
internalCluster().getDataOrMasterNodeInstances(RepositoriesService.class).iterator().next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's check all of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
…meters are moved to ctors; using SetOnce and lifecycle-aware lazy creation
Thanks @ywelsch for review - I did relevant changes - one comment is still under question (to me) - is to add assertion on snapshot thread (pls check comment) |
test this please |
1 similar comment
test this please |
|
||
// only use for testing | ||
@Override | ||
protected BlobStore getBlobStore() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be removed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, it is in use in AzureRepositorySettingsTests
synchronized (lock) { | ||
store = blobStore.get(); | ||
if (store == null) { | ||
if (lifecycle.started() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not concurrency-safe. Closing can be happening concurrently to this.
*/ | ||
protected abstract BlobStore blobStore(); | ||
protected BlobStore blobStore() { | ||
BlobStore store = blobStore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then assert that it's either snapshot or generic threadpool
if (lifecycle.started() == false) { | ||
throw new RepositoryException(metadata.name(), "repository is not in started state"); | ||
} | ||
verificationThreadCheck(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's check this on every access, not only creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see your point - you'd like to enforce any access to blobStore
rather just initialization; there are some non-integration test failures - gonna fix it soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left one more comment. No need for another iteration though. LGTM
@@ -543,11 +613,21 @@ public long getRestoreThrottleTimeInNanos() { | |||
return restoreRateLimitingTimeInNanos.count(); | |||
} | |||
|
|||
protected void verificationThreadCheck() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're not only checking the verification thread here, but all operations that are accessing the blobstore. Can you rename this method?
…d couple comments for assertSnapshotOrGenericThread and the reason of FsLikeRepoPlugin in BlobStoreRepositoryTests.java
thanks @ywelsch for the review |
(cherry picked from commit b1bf643)
* 6.x: Watcher: Make settings reloadable (#31746) [Rollup] Histo group config should support scaled_floats (#32048) lazy snapshot repository initialization (#31606) Add secure setting for watcher email password (#31620) Watcher: cleanup ensureWatchExists use (#31926) Add second level of field collapsing (#31808) Added lenient flag for synonym token filter (#31484) (#31970) Test: Fix a second case of bad watch creation [Rollup] Use composite's missing_bucket (#31402) Docs: Restyled cloud link in getting started Docs: Change formatting of Cloud options Re-instate link in StringFunctionUtils javadocs Correct spelling of AnalysisPlugin#requriesAnalysisSettings (#32025) Fix problematic chars in javadoc [ML] Move open job failure explanation out of root cause (#31925) [ML] Switch ML native QA tests to use a 3 node cluster (#32011)
Instead of eager initialization of underlying blob store in ctor this PR move actual initialization of underlying blob store to later stage.
It makes possible to register e.g. s3repo with bucket that is not available at the moment.