-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement VCF bundles. #1703
Implement VCF bundles. #1703
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmnbroad I'm pretty sure the resolve index method is completely broken. I think either delete it or add some tests.
I'm finding it very confusing to differentiate where to use VARIANT_CONTEXTS vs VARIANTS_VCF. Maybe we should change the naming or put them in different classes? Alternatively, it might be less confusing if we added VARIANTS_BCF and constants for the different indexes? It's not super important but I found it confusing, especially in the tests where in some places you use VARIANTS_VCF to specify the extension while VARIANTS_INDEX is different. We can definitely punt on it in this pr.
Other than that, intellij found a few minor suggestions.
* this type. | ||
*/ | ||
public class VariantsBundle<T extends IOPath> extends Bundle implements Serializable { | ||
private static final long serialVersionUID = 1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably good practice to start annotating these with @Serial
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (on the variable itself - though I'm still not sure exactly what that does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It throws a compiler error if the variable or method isn't actually part of the serialization system. It prevents things like serialVersionID
which has been a problem for us back in the spark days. Annoyingly hard to notice. Sort of like the Override annotation, just a helpful sanity check.
* for variants sources that are backed by streams or otherv{@link BundleResource} types, the {@link Bundle} and | ||
* {@link BundleBuilder} classes can be used to construct such bundles directly. | ||
* | ||
* @param <T> The type to use when creating a new IOPathResource for a {@link htsjdk.beta.plugin.variants.VariantsBundle}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe someday java will add self typed calls.
* @param vcfPath An {@link IOPath}-derived object that represents a source of variants. | ||
*/ | ||
public VariantsBundle(final T vcfPath) { | ||
this(Arrays.asList(toInputResource( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're free from the tyranny of Arrays.asLis()! List.of(), unless of course that causes some horrible confusing serialization error...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
*/ | ||
public VariantsBundle(final T vcfPath, final T indexPath) { | ||
this(Arrays.asList( | ||
toInputResource(BundleResourceType.VARIANT_CONTEXTS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toInputResource()
is called with this verbose nonNull construct everywhere. It looks like that could be just moved into the method itself since the first parameter seems to always want to be the same as error message name in the null check.
(I think in this case it actually should VARIANT_CONTEXT and not VARIANTS_VCF also. It seems inconsistent.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there is an identical copy of this method in ReadsBundle that has the same unnecessary verbosity also. Maybe it should be pulled out into a utility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got rid of the nonNull business -it turns out the called method already had that anyway. As for the reads one, its slightly different in that they each use different (content type-specific) getInferredContentTypes
methods.
} | ||
|
||
/** | ||
* return the {@link BundleResourceType#VARIANTS_VCF} {@link BundleResource} for this {@link htsjdk.beta.plugin.variants.VariantsBundle} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the duplicate comment line and just keep the `@returns'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
public static boolean looksLikeAVariantsBundle(final IOPath rawVCFPath) { | ||
return rawVCFPath.getURI().getPath().endsWith(BundleJSON.BUNDLE_EXTENSION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this use rawVCFPath.hasExtension()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But I've removed this method until such a time as we actually need it (so far I don't).
// and correctly infer the types in all cases without reproducing all the logic embedded in all the | ||
// codecs (i.e., an htsget IOPath ends in ".bam", but has format HTSGET_BAM, not BAM - detecting | ||
// that here would require parsing the entire IOPath structure, which is best left to the codecs | ||
// themselves). So for now its advisory, but maybe it should be abandoned altogether. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this check is better than nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retained this for now, since it does some checking, though it could do a lot more.
/** | ||
* return the {@link BundleResourceType#VARIANTS_VCF} {@link BundleResource} for this {@link htsjdk.beta.plugin.variants.VariantsBundle} | ||
* | ||
* @return the {@link BundleResourceType#VARIANTS_VCF} {@link BundleResource} for this {@link htsjdk.beta.plugin.variants.VariantsBundle} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the doc say VARIANT_CONTEXTS
instead off VARIANTS_VCF
? There are few times it refers to requiring a resource with the type VARIANTs_VCF but the actual check seems to be for the more general VARIANTS_CONTEXT type instead of the VCF specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - those names are a bit of a mess. I'll update them.
"ALIGNED_READS":{"path":"my.cram","format":"READS_CRAM"}, | ||
"primary":"ALIGNED_READS" | ||
} | ||
""".formatted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I replaced some of the static text in these cases with additional parameters anyway.
} | ||
""".formatted(VCF_FILE, VCF_INDEX_FILE), | ||
// VariantsBundle doesn't automatically infer format, so create one manually | ||
new VariantsBundle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VariantsBundle<>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the type params from VariantsBundle altogether, since they really weren't needed anyway.
e418dc7
to
2c9878f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems pretty straight forward. Just a lot of it. I think it might makes ense to change the new utility method to be two different ones instead of one with a mode toggle. It's not a blocker though.
Another musing:
I wonder if t would make more sense to write the schema as:
schema: {verson: x, name: htsbundle}
instead of:
schemaVersion: x
schemaName: htsbundle
Maybe for version x+1...
|
||
/** | ||
* Takes an IOPath and returns a new IOPath object that keeps the same basename as the original but has | ||
* a new extension. If append is set to false, only the last component of an extension will be replaced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only the last -> the last
I read that as implying that append == true will replace all of the components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. Will separate append/replace to eliminate confusion.
* @param ioPathConstructor a function that takes a string and returns an IOPath-derived class of type <T> | ||
* @return A new IOPath object with the new extension | ||
*/ | ||
public static <T extends IOPath> T replaceExtension( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a favor to my future self seeing this fail at runtime, is there a reason we shouldn't make it accept it either with or without the dot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense split these into two different functions? replaceExtension() and appendExtension()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like that better. I back ported this from one of Takuto's Picard PRs since I needed it. Also, I made him add tests for it ;-). Will separate these and add some tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait I did add tests. Good for me. Now I have to update them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
String.format("A JSON string with more than one bundle was provided but only a single bundle is allowed in this context (%s)", | ||
e.getMessage())); | ||
} | ||
return bundles.stream().findFirst().get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a weird way to get the first element of a list. why not just bundles.get(0)? Or update to java 21 and just use bundles .getFirst() :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yeah. Done.
} | ||
} | ||
if (bundles.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should refer to a collection of bundles as a bindle for added alliteration. Definitely won't make it more confusing.
} else if (ext.equals((FileExtensions.SAM))) { | ||
return Optional.of(new Tuple<>(BundleResourceType.ALIGNED_READS, BundleResourceType.READS_SAM)); | ||
return Optional.of(new Tuple<>(BundleResourceType.CT_ALIGNED_READS, BundleResourceType.FMT_READS_SAM)); | ||
} | ||
//TODO: finish this, else SRA, htsget,... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably leave this as a todo for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, ReadsBundle needs some work since it was never really finished. But its not used by my current work, so I'll do that in a separate PR.
* this type. | ||
*/ | ||
public class VariantsBundle<T extends IOPath> extends Bundle implements Serializable { | ||
private static final long serialVersionUID = 1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It throws a compiler error if the variable or method isn't actually part of the serialization system. It prevents things like serialVersionID
which has been a problem for us back in the spark days. Annoyingly hard to notice. Sort of like the Override annotation, just a helpful sanity check.
src/main/java/htsjdk/io/IOPath.java
Outdated
@@ -77,6 +77,14 @@ default String getScheme() { | |||
return getURI().getScheme(); | |||
} | |||
|
|||
// default String resolveSibling(final String sibling) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Om, yeah, I have no idea why this is here. Deleted.
"schemaVersion":"%s", | ||
"primary":"%s", | ||
"%s":{"path":"%s","format":"BAM"} | ||
}""".formatted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The triple quotes string is so much more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next year in string interpolation!
@@ -194,49 +213,143 @@ public void testFromJSONValidWithPathOverride(final String jsonString, final Str | |||
}); | |||
} | |||
|
|||
// Make sure that any JSON that is a valid single Bundle can also be parsed as a Bundle collection, so that users | |||
// can provide an individual bundle to any client that expects a collection of bundles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requirement added complexity but I think it will save so much frustration.
2c9878f
to
e45cd78
Compare
Draft until #1702 goes in.