Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use write api #20800

Merged
merged 1 commit into from
Mar 4, 2024
Merged

Use write api #20800

merged 1 commit into from
Mar 4, 2024

Conversation

elonazoulay
Copy link
Member

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@ebyhr
Copy link
Member

ebyhr commented Feb 22, 2024

/test-with-secrets sha=456142157c288d25d917965279ad2627008d7c43

Copy link

github-actions bot commented Feb 22, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/7999002596

@ebyhr
Copy link
Member

ebyhr commented Feb 22, 2024

/test-with-secrets sha=d8f54d89f31deee461f75758672ec579d0d9333c

Copy link

github-actions bot commented Feb 22, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/7999104709

@elonazoulay elonazoulay requested a review from wendigo February 22, 2024 05:39
@elonazoulay
Copy link
Member Author

/test-with-secrets sha=db78942fd7aa85ae23b51c3521f3eb5d82f66e7e

@electrum
Copy link
Member

Can you update the file system tests to verify that the file doesn’t exist until the output stream is closed?

@findepi
Copy link
Member

findepi commented Feb 22, 2024

/test-with-secrets sha=bf9f124989286f046fece43bad73613d3d38f788

Copy link

github-actions bot commented Feb 22, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/8001732829

@findepi
Copy link
Member

findepi commented Feb 23, 2024

/test-with-secrets sha=7e7d44c1a71902875866fab97ae1bc27c87bac69

Copy link

github-actions bot commented Feb 23, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/8019094596

}

@Test
@Disabled("This test takes 10 minutes to run locally")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writing 32M takes 10 minutes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep (!) writing and reading the file every 8k bytes to verify that it's not changed until the outputstream is closed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it suffice to verify it once just before writing last 8k,
or after writing last 8k and before doing close (= committing the write)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Trying that out and will push

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and now it takes 18 seconds:) removing from @Disabled. Thanks for that!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 seconds for the other test, enabled that one as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now down to about 9.7 - 10 seconds

@elonazoulay elonazoulay force-pushed the use_write_api branch 3 times, most recently from efdaffa to c4e3a3a Compare February 27, 2024 05:43
Comment on lines 1256 to 1269
for (int i = 0; i < bytes.length / 4; i++) {
//noinspection NumericCastThatLosesPrecision
bytes[i] = (byte) i;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow what is happening here.
Why are we filling only a quarter of the array?

This snippet is used in another method below as well - extract in a helper method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the hint from #20800 (comment) was not correctly applied?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the hint was correct: the contents don't matter as long as they differ from the original contents - +1 on extracting, will do. lmk if you have any concern w any of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow what is happening here. Why are we filling only a quarter of the array?

This snippet is used in another method below as well - extract in a helper method.

Update: misread this. This was a bug, thanks for catching! Also extracted to a helper method.

}
try (OutputStream outputStream = getFileSystem().newOutputFile(location).createOrOverwrite()) {
// write a 32 mb file
for (int i = 0; i < divide(32 * 1024 * 1024, bytes.length, UNNECESSARY); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 32 MB really necessary ? We do control the size of array being written (8K). Is there local caching that you're trying to avoid ?
There may be some issues on the CI when testing from a Github runner which is far from the region of the GCS bucket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were originally to test gcs filesystem: the write buffer under the hood is 16mb, so wanted to choose a size that would ensure we were exercising the case where the buffer flushes to storage. I can do 20mb - currently the 2 tests take 18 seconds and 14 seconds respectively, lmk if you'd like the size lowered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make it 17MB or 18MB

}

@Test
public void testLargeFileDoesNotOverwriteUntilClosed()
Copy link
Contributor

@findinpath findinpath Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testLargeFileDoesNotOverwriteUntilClosed -> testLargeFileDoesIsNotOverwrittenUntilClosed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about combining the 2? :) testLargeFileIsNotOverwrittenUntilClosed?

getFileSystem().deleteFile(location);
byte[] bytes = new byte[8192];
for (int i = 0; i < bytes.length / 4; i++) {
//noinspection NumericCastThatLosesPrecision
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this @SuppressWarnings

void testFileDoesNotExistUntilClosed()
throws Exception
{
Location location = getRootLocation().appendPath("testFile2");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's give these better names

public void testLargeFileDoesNotExistUntilClosed()
throws IOException
{
Location location = getRootLocation().appendPath("testFile3");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

bytes[i] = (byte) i;
}
try (OutputStream outputStream = getFileSystem().newOutputFile(location).create()) {
// write a 32 mb file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use uppercase MB to match other usages

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What buffer size do we care about here? If this is for a 16MB buffer, we should use 17MB, or 33MB for a 32MB buffer, to ensure we have written more than the buffer flush size.

Copy link
Member Author

@elonazoulay elonazoulay Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

16mb, so will try 17mb. But that only applies to gcs.

Location location = getRootLocation().appendPath("testFile3");
getFileSystem().deleteFile(location);
byte[] bytes = new byte[8192];
for (int i = 0; i < bytes.length / 4; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be / 4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! didn't catch that when I changed the function.

}
try (OutputStream outputStream = getFileSystem().newOutputFile(location).create()) {
// write a 32 mb file
for (int i = 0; i < divide(32 * 1024 * 1024, bytes.length, UNNECESSARY); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this divide() call does here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was from a previous review suggestion above - to ensure that the size is divisible by the bytes.length.

}
try (OutputStream outputStream = getFileSystem().newOutputFile(location).create()) {
// write a 32 mb file
for (int i = 0; i < divide(32 * 1024 * 1024, bytes.length, UNNECESSARY); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we do

assertThat(MEGABYTE % bytes.length).isEqualTo(0);
// write a 33 MB file
int target = 33 * MEGABYTE;
int count = 0;
while (count < target) {
    outputStream.write(bytes);
    count += bytes.length;
    if (count % MEGABYTE == 0) {
        assertFalse(fileExistsInListing(location));
        assertFalse(fileExists(location));
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! They both take about 10-11seconds now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supercool. do we need to check every megabyte (33 times)?
would we lose anything if we asserted only before final flush/close?
ie do we believe it is possible for a file to first show up, then disappear, and then show up again while we're writing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - updating now, will save this version just in case.

public void testLargeFileDoesNotOverwriteUntilClosed()
throws IOException
{
Location location = getRootLocation().appendPath("testFile4");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

private boolean fileExistsInListing(Location location)
throws IOException
{
for (FileIterator fileIterator = getFileSystem().listFiles(getRootLocation()); fileIterator.hasNext(); ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this reads better as

FileIterator fileIterator = getFileSystem().listFiles(getRootLocation());
while (fileIterator.hasNext()) {

@elonazoulay elonazoulay force-pushed the use_write_api branch 2 times, most recently from 5759144 to 4520de2 Compare February 28, 2024 09:41
throws IOException
{
if (!supportsIncompleteWriteNoClobber()) {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abort("...") to mark it skipped

throws IOException
{
if (!supportsIncompleteWriteNoClobber()) {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abort("...") to mark it skipped

}

@Test
@SuppressWarnings("NumericCastThatLosesPrecision")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not @SuppressWarnings on whole method (unless it's a very short method).
use either @SuppressWarnings on "a variable declaration with assignment" or use comment-based suppression

Comment on lines 1311 to 1304
// write a 17 MB file
int target = 17 * MEGABYTE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-comment why 17 and not some other number. (The idea was to write more than assumed buffer size)

(same above)

Comment on lines 1317 to 1311
if (count % MEGABYTE == 0) {
assertEquals(fileSize, readFile(inputFile, fileBytes));
assertEquals("test", new String(fileBytes, UTF_8));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be enough to do this once, before existing from try (OutputStream outputStream = getFileSystem().newOutputFile(location).createOrOverwrite()) { block

Comment on lines 1277 to 1276
if (count % MEGABYTE == 0) {
assertFalse(fileExistsInListing(location));
assertFalse(fileExists(location));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be enough to do this once, before existing from try (OutputStream outputStream = getFileSystem().newOutputFile(location).createOrOverwrite()) { block

@findepi
Copy link
Member

findepi commented Feb 29, 2024

there are (related) failures (eg in TestHdfsFileSystemLocal)

@electrum electrum merged commit 59db1f7 into trinodb:master Mar 4, 2024
62 checks passed
@github-actions github-actions bot added this to the 440 milestone Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants