Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support exchange spooling on Azure Blob Storage #12211

Merged
merged 4 commits into from
May 10, 2022

Conversation

linzebing
Copy link
Member

@linzebing linzebing commented May 2, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

New feature

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

trino-exchange-plugin

How would you describe this change to a non-technical end user or system administrator?

This enables users to set up exchange spooling on Azure Blob Storage.

Related issues, pull requests, and links

Documentation

#12467
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
(x) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

() No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Added support for exchange spooling on Azure Blob Storage.

}

@Override
public ListenableFuture<Void> finish()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that make sense to mark Writer as finishing so we throw an error if finish is called twice (or we can return previous future for second call).

(can be a followup - if to be done, same applies for s3 implementation)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in ExchangeSink's protocol, for finish() This method is guaranteed not be called more than once..

@Override
public long getRetainedSize()
{
return INSTANCE_SIZE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there are two lists also - but I guess it is fine to skip those as size should be negligible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Added accounting for blockIds, but not sure how to get estimated size of a ListenableFuture<Void>

if (secretKey.isPresent()) {
blockBlobAsyncClient = blockBlobAsyncClient.getCustomerProvidedKeyAsyncClient(new CustomerProvidedKey(secretKey.get().getEncoded()));
}
for (int i = 0; i < readableBlocks && fileOffset < fileSize; ++i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need && fileOffset < fileSize here? Isn't how readableBlocks is computed enough to not go over file size?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need. readableBlocks is calculated based on the remaining space in the buffer, but it's possible that the remaining file content is even shorter

if (sliceSize < 0) {
sliceSize = sliceInput.readInt();
}
Slice data = sliceInput.readSlice(sliceSize);
Copy link
Member

@losipiuk losipiuk May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not obvious that sliceInput will have at least sliceSize bytes at this point (after single call to fillBuffer) worth a comment.

Edit: well maybe it is ok - as fillBuffer literally ensures buffer is full AND we know a single row will always fit in the buffer. Still, a sentence which states just that would be helpful IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I find it a little awkward to add comment here as the code seems to explain itself better.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - some nitpicks.

public boolean exists(URI file)
throws IOException
{
return blobServiceClient.getBlobContainerClient(getContainerName(file)).getBlobClient(getPath(file)).exists();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is blobServiceClient.getBlobContainerClient cheap? Do we need to cache it?

Copy link
Member Author

@linzebing linzebing May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's purely a class wrapper, not sure it's worth it (because the cache will need to be concurrently accessed). Please advise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's just a matter of creating a wrapper object than it's fine

.transformAsync(response -> toListenableFuture(response.getValue().collectList().toFuture()), directExecutor())
.transform(byteBuffers -> {
int offset = finalBufferFill;
for (ByteBuffer byteBuffer : byteBuffers) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have the client directly write bytes into a destination buffer (to avoid double copying?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an easy way, as Azure SDK doesn't provide any kind of callback/transformer interface to do so.

@linzebing linzebing force-pushed the azure branch 2 times, most recently from 6beb139 to 7e82ec4 Compare May 9, 2022 16:22
@linzebing
Copy link
Member Author

CI: #12298

@linzebing
Copy link
Member Author

CI: #12300

@arhimondr arhimondr merged commit 260a244 into trinodb:master May 10, 2022
@github-actions github-actions bot added this to the 381 milestone May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants