Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix shard ordering bug on some filesystems. #138

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

lbergelson
Copy link
Contributor

  • We were relying on the results of FileInputFormat.listStatus() returning a sorted set of statuses when loading multiple parts files.
    This is not a safe assumption on some files systems which return them in essentially random order.
    This corrects the problem by overriding listStatus in FileSplitInputFormat and sorting the results.
  • See SortSam ShardedInput tests failing.  broadinstitute/gatk#5881 for additional information.

* We were relying on the results of FileInputFormat.listStatus() returning a sorted set of statuses when loading multiple parts files.
  This is not a safe assumption on some files systems which return them in essentially random order.
  This corrects the problem by overriding listStatus in FileSplitInputFormat and sorting the results.
* See broadinstitute/gatk#5881 for additional information.
@lbergelson lbergelson assigned heuermh and unassigned heuermh May 29, 2020
@lbergelson lbergelson requested a review from heuermh May 29, 2020 20:49
@lbergelson
Copy link
Contributor Author

@heuermh Could you take a look at this? It's a pretty nasty bug that has apparently been around for a while but was only manifesting locally on macs. It started happening in GATK's travis builds for some unknown reason ( I don't know what the configuration change was that made it start...).

@jamesemery Please take a look at this as well.

Copy link

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I'm not sure what the best way to test a transient-Filesystem dependent error like this would be...

@@ -4,7 +4,7 @@

<groupId>org.disq-bio</groupId>
<artifactId>disq</artifactId>
<version>0.3.7-SNAPSHOT</version>
<version>0.3.7-bugfix-SNAPSHOT</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change isn't necessary for this pull request -- versioning will be handled by the Maven release plugin on the next release cycle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops! That's funny, I literally told @tedsharpe the same thing on his last PR and then I did it myself...

@heuermh
Copy link
Contributor

heuermh commented Jun 2, 2020

Requested the version change be dropped, otherwise looks good to me.

@lbergelson
Copy link
Contributor Author

On further investigation disq is only passing because the reader tests don't assert that the files we get back are in the correct order... I'm seeing the sam issue with SAM files that isn't fixed by this patch. (I didn't even know we could write sharded sam files...)

@lbergelson
Copy link
Contributor Author

I suspect we may have the same issue with vcf too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants