Incorporating changes from GVS to existing files #8256
@droazen @mcovarr If tests are passing we should probably do a light review to make sure nothing is obviously bonkers in unintentional ways from the merge.
We're failing with:
@lbergelson we're not sure why we'd be running out of space, we'll have a look.
Per team discussion:
@lbergelson The ~370 MB of files added under
@mcovarr Happy to remove these then! We're dangerously close to the VM disk limit it seems.
* Absorbing the changes to core GATK files from the long-running GVS work.
* Based on the vs_834_deletions branch, where code was factored to separate the GVS-specific classes from the shared GATK code.
* There were fairly minimal conflicts between the two, except in the new VQSR package. In this case we've taken the master version of those files, since the GVS version is out of date.
Remove some large files that don't seem to be referenced.
19f90bc to 3bee3c1
Yeah disk space definitely seems pretty tight in these builds right now... I'm not sure why those files are still in our branch. I did a cleanup pass on our branch a couple of weeks ago that was intended to identify files like this; I'll see if I can figure out how they were missed.
Maybe they were referenced in the VQSR stuff that I took the newer version of?
OK so the script I wrote earlier was really about figuring out which files in
Which are the 6 files we already knew about plus a bonus JSON. I'll make a PR against
Actually it turns out all of these files are still being used in the version of VQSR Lite code on
@mcovarr Interesting. Maybe they were additional test files added when you incorporated the VQSR Lite changes? It was very unclear to me if there were intentional changes / additions in the variant store branch or if things were just out of date / slightly different than master. Maybe we should have someone who was involved in that take a look so we can be sure to incorporate any positive changes that were made.
Hi Louis, we talked this over and the Variants folks who have done more VQSR Lite than I have say this looks fine as they expected the version of VQSR Lite on So basically this looks good to us. Thanks again for taking this on! 🙏
So I tried adding in the GVS files to this branch and found two compilation issues:
@lbergelson Just a few minor issues / questions
Can you also comment on what happened to the changes in 95b1548 during conflict resolution?
@@ -145,7 +145,7 @@ public abstract class GATKTool extends CommandLineProgram {
     /**
      * Our source of reference data (null if no reference was provided)
      */
-    ReferenceDataSource reference;
+    protected ReferenceDataSource reference;
Ideally we would keep this package-private. It's meant to be encapsulated for walker traversals (descendants of WalkerBase), and exposed for direct descendants of GATKTool (i.e., non-walkers). Non-walker tools can access this field via the inherited directlyAccessEngineReferenceDataSource() method. @mcovarr / @lbergelson, would it be a significant hardship for the GVS branch to migrate to using that method?
Would it be okay if we do the following in ExtractTool.java? Only the three subclasses of ExtractTool.java reference the reference field.
protected ReferenceDataSource reference = directlyAccessEngineReferenceDataSource();
@mcovarr Of course, that's totally fine! If those are the only usages in the GVS branch, then, would you be ok with @lbergelson changing this field back to package-accessible?
Sure that would be fine, thanks!
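One caveat on the ExtractTool suggestion in this thread, as a minimal sketch. The classes below (EngineTool, EagerTool, LazyTool) and the placeholder value "hg38.fasta" are hypothetical stand-ins, not GATK code. They only model the assumption that the engine populates its reference in a startup hook that runs after construction. If that assumption holds for GATKTool, a field initializer would capture null, while a lookup made after startup would see the loaded reference.

```java
// Hypothetical stand-ins; none of these classes are GATK code.
class ReferenceAccessDemo {
    public static void main(String[] args) {
        EagerTool eager = new EagerTool();
        eager.onStartup();
        LazyTool lazy = new LazyTool();
        lazy.onStartup();
        System.out.println("eager: " + eager.cached);          // still null
        System.out.println("lazy:  " + lazy.getReference());   // the loaded value
    }
}

abstract class EngineTool {
    // Package-private, like the field under discussion.
    String reference;

    // Assumption: the engine loads the reference here, after construction.
    void onStartup() {
        reference = "hg38.fasta"; // placeholder value
    }

    // Models the accessor that non-walker tools are meant to use.
    protected final String directlyAccessEngineReference() {
        return reference;
    }
}

class EagerTool extends EngineTool {
    // Runs during construction, before onStartup(), so it captures null.
    protected String cached = directlyAccessEngineReference();
}

class LazyTool extends EngineTool {
    // Defers the lookup until a caller actually needs the reference.
    protected String getReference() {
        return directlyAccessEngineReference();
    }
}
```

Under the stated assumption, a small accessor method in the base class (or assigning the field inside the startup hook) would avoid the null capture while still letting the engine field stay package-private.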
public GnarlyGenotyperEngine(final boolean keepAllSites, final int maxAltAllelesToOutput, final boolean stripASAnnotations) {
    this(keepAllSites, maxAltAllelesToOutput, true, stripASAnnotations);
Would be good for @ldgauthier to have a quick look at the GnarlyGenotyper changes here.
Actually, on second look these GnarlyGenotyper changes are trivial (just a new emitPls flag), and so don't require extra scrutiny.
@@ -200,7 +200,7 @@ public LineBuilder fill(final String filling) {
      * Constructs the line and writes it out to the output
      */
     public void write() {
-        Utils.validate(!Arrays.stream(lineToBuild).anyMatch(Objects::isNull), "Attempted to construct an incomplete line, make sure all columns are filled");
+        Utils.validate(!Arrays.stream(lineToBuild).anyMatch(Objects::isNull), "Attempted to construct an incomplete line, make sure all columns are filled: " + Arrays.toString(lineToBuild));
Perhaps this String should be lazily constructed (i.e., only when the validation check fails) instead of unconditionally on every call to write().
Made lazy
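How the change was actually made lazy is not shown in this thread. As a rough sketch of the idea only: the two validate overloads below are hypothetical stand-ins written for this example, and messagesBuilt is a counter added purely to show when the message string gets constructed. Passing a Supplier defers the Arrays.toString call until the check fails.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.function.Supplier;

class LazyMessageDemo {
    // Counts how many times the diagnostic string is actually built.
    static int messagesBuilt = 0;

    // Stand-in eager validator: the caller has already built the message.
    static void validate(boolean condition, String message) {
        if (!condition) throw new IllegalStateException(message);
    }

    // Stand-in lazy validator: the message is built only on failure.
    static void validate(boolean condition, Supplier<String> message) {
        if (!condition) throw new IllegalStateException(message.get());
    }

    static String describe(String[] line) {
        messagesBuilt++;
        return "Attempted to construct an incomplete line, make sure all columns are filled: "
                + Arrays.toString(line);
    }

    static boolean isComplete(String[] line) {
        return Arrays.stream(line).noneMatch(Objects::isNull);
    }

    public static void main(String[] args) {
        String[] line = {"chr1", "100", "A"};
        validate(isComplete(line), describe(line));        // eager: builds the string
        validate(isComplete(line), () -> describe(line));  // lazy: builds nothing here
        System.out.println("messages built: " + messagesBuilt);
    }
}
```

For a complete line, only the eager call pays for the string, which matters when write() runs once per output row.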
@@ -38,28 +36,42 @@ public void testExecuteQueryWithWhereClause() {
        expectedNamesAndAges.put("Fred", "35");

        final String query = String.format("SELECT * FROM `%s` WHERE name = 'Fred'", BIGQUERY_FULLY_QUALIFIED_TABLE);
        Map<String, String> labels = new HashMap<String, String>();
        labels.put("gatktestquery", "testwhereclause" + runUuid);
        System.out.print("testwhereclause" + runUuid);
Use the logger, not System.out.print()
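A sketch of what that could look like, with caveats: java.util.logging is swapped in here only to keep the example dependency-free, and the test class in the PR presumably has its own logger from a different framework. LabelLoggingDemo and labelFor are hypothetical names introduced for illustration.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

class LabelLoggingDemo {
    private static final Logger logger = Logger.getLogger(LabelLoggingDemo.class.getName());

    // Builds the label value the way the test does: fixed prefix plus a run UUID.
    static String labelFor(String runUuid) {
        return "testwhereclause" + runUuid;
    }

    public static void main(String[] args) {
        String runUuid = UUID.randomUUID().toString();
        // Routed through the logging framework, so level and destination stay configurable.
        logger.log(Level.INFO, "BigQuery label gatktestquery={0}", labelFor(runUuid));
    }
}
```

Unlike System.out.print(), the logged line carries a level and a source, and it can be silenced or redirected from configuration without touching the test.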
LGTM delta David's requested changes. Looking forward to having GVS synced up with master! 😄
@droazen I've responded to your comments.