Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update some methods to accept Path input #1681

Merged
merged 14 commits into from
Oct 11, 2023
Merged

Update some methods to accept Path input #1681

merged 14 commits into from
Oct 11, 2023

Conversation

takutosato
Copy link
Collaborator

Description

This PR makes changes needed for supporting cloud input/output files in some Picard tools. (see broadinstitute/picard#1804)

Things to think about before submitting:

  • Make sure your changes compile and new tests pass locally.
  • Add new tests or update existing ones:
    • A bug fix should include a test that previously would have failed and passes now.
    • New features should come with new tests that exercise and validate the new functionality.
  • Extended the README / documentation, if necessary
  • Check your code style.
  • Write a clear commit title and message
    • The commit message should describe what changed and is targeted at htsjdk developers
    • Breaking changes should be mentioned in the commit message.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@takutosato Looks good overall. I found a few issues that need to be resolved before we can merge it. (including a bug I seem to have added years ago an no one has noticed I guess).

I'm also not really sure we should deprecate the file methods. In general we've just been keeping them along side the existing ones. Maybe it's time to start flipping them to deprecated, maybe people will complain. I'm unsure...

*/
@Deprecated
public static void reheaderBamFile(final SAMFileHeader samFileHeader, final File inputFile, final File outputFile, final boolean createMd5, final boolean createIndex) {
reheaderBamFile(samFileHeader, inputFile.toPath(), outputFile.toPath(), createMd5, createIndex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IOUtil.toPath() instead of file.toPath() in these cases because it propagates the null in the case of a null input instead of NPE.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

final long skipLast = ((term == BlockCompressedInputStream.FileTermination.HAS_TERMINATOR_BLOCK) && skipTerminator) ?
BlockCompressedStreamConstants.EMPTY_GZIP_BLOCK.length : 0;
final long bytesToWrite = length - skipLast - currentPos;

IOUtil.transferByStream(in, outputStream, bytesToWrite);
IOUtil.transferByStream(in, outputStream, bytesToWrite); // tsato: this method is manually buffered so make sure BufferedReader/Writer is not used (or does that matter?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double buffering is a pretty minor issue in most cases. If it wasn't dealing with it before I'm not worried about it here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

IOUtil.assertFileIsReadable(inputFile);
IOUtil.assertFileIsWritable(outputFile);
// IOUtil.assertFileIsWritable(outputFile); // tsato: what do I do with this...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question.. We could presumably adapt that too take a path but maybe that's complicated. There is an existing method assertDirectoryIsWriteable that you could run on the parent of the outputPath (with a null check added to make sure output file isn't null...)

I'm really not a huge fan of these preemptive checks. The good thing is that they produce sane error messages. The bad thing is that they can potentially fail in misleading ways (i.e. transient internet events, weird implementation of directories by google. ), and they have non-negligible performance costs in some cases.

That's a long way to say that it's probably fine to drop it, although I might add a null check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Added the null check. Another possibility (which I started to adopt in Picard) is to check writability if the Path refers to a local file e.g.

ValidationUtils.nonNull(inputFile);
ValidationUtils.nonNull(outputFile);
IOUtil.assertFileIsReadable(inputFile);
if (outputFile.toUri().getScheme().equals("file")){
    IOUtil.assertFileIsWritable(outputFile.toFile());
}

Is this worth doing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that solution. Lets do that for now but wrap it into a utility method so you don't have to keep rewriting that logic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Done

final long length = inputFile.length();
final long currentPos = in.getCount(); // tsato: assuming currentPos is the position after writing the first block, this should be ok.
// final long length = inputFile.length();
final long length = Files.size(inputFile); // tsato: rename to size
final long skipLast = ((term == BlockCompressedInputStream.FileTermination.HAS_TERMINATOR_BLOCK) && skipTerminator) ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think length is still perfectly clear.

final long vOffsetOfFirstRecord = SAMUtils.findVirtualOffsetOfFirstRecordInBam(inputFile);
final BlockCompressedInputStream blockIn = new BlockCompressedInputStream(inputFile);
blockIn.seek(vOffsetOfFirstRecord);
if (skipHeader) { // tsato: this method assumes bam file...should I test cram?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about cram here, this method only works for bam files.


@Test
public void testReheaderBamFile(){
final File originalBam = new File(this.NA12878_8000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually don't use this when referring to a static variable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Confirm that the reads are the same as the original
Assert.assertTrue(compareBamReads(originalBam, output, DEFAULT_NUM_RECORDS_TO_COMPARE));
} catch (IOException e){
throw new HtsjdkException("Could not create a temporary output file.", e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just add throws IOException to the test method. It's not clear that this error message will be correct, there are a bunch of IO operations going on during this method so it could be more confusing than helpful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public void testBlockCopyBamFile(final boolean skipHeader, final boolean skipTerminator) throws IOException {
System.out.println(skipHeader + ", " + skipTerminator);
final File output = File.createTempFile("output", ".bam");
final OutputStream out = Files.newOutputStream(output.toPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try-with-resourcces

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return true;
}

// tsato: kinda ugly?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, seems fine

*
* @return true if the first (numRecordsToCompare) reads are equal, else false
*/
private boolean compareBamReads(final File bam1, final File bam2, final int numRecordsToCompare){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is subtly wrong. If either iterarator is too short it will fail with an exception instead of returning false. Instead of reimplementing this, you should just use Assert.assertEquals() which includes an overload for comparing two iterables.

It might make sense to keep this method and have it handle opening/closing the SamReaders but renaming it to assertBamRecordsEqual and use the existing comparators.

@takutosato
Copy link
Collaborator Author

@lbergelson could you take a look at my revised code? Thanks!

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks google to me!

}

final long pos = BlockCompressedFilePointerUtil.getBlockAddress(blockIn.getFilePointer());
blockIn.close(); // tsato: why doesn't IntelliJ say this is unnecessary?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could leave out that close. I don't know why intellij isn't telling you that but it looks like it's redundant to me.

@lbergelson lbergelson merged commit 6d1a796 into master Oct 11, 2023
4 checks passed
@lbergelson lbergelson deleted the ts_nio branch October 11, 2023 21:27
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants