Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/picard crosscheck fingerprints #1327

Merged
merged 13 commits into from
Mar 8, 2021

Conversation

sstadick
Copy link
Contributor

@sstadick sstadick commented Oct 23, 2020

This add support for picard CrosscheckFingerprints output. A column is added to General Stats for each sample that indicates True/False (colored Green/Red respectively) as to whether or not all comparisons for that sample return Expected relationships.

There is also a CrosscheckFingerprints table that includes the pairwise comparison, the categorical result, the LOD_SCORE, and the threshold used, to be referenced in the event that a False is observed in the General Statistics.

  • There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository or attached to this PR
  • Code is tested and works locally (including with --lint flag)
  • docs/README.md is updated with link to below
  • docs/modulename.md is created
  • Everything that can be represented with a plot instead of a table is a plot
  • Report sections have a description and help text (with self.add_section)
  • There aren't any huge tables with > 6 columns (explain reasoning if so)
  • Each table column has a different colour scale to its neighbour, which relates to the data (eg. if high numbers are bad, they're red)
  • Module does not do any significant computational work

For test data see MultiQC/test-data#172

@sstadick sstadick marked this pull request as ready for review October 23, 2020 21:12
docs/modules/picard.md Outdated Show resolved Hide resolved
docs/modules/picard.md Outdated Show resolved Hide resolved
docs/modules/picard.md Outdated Show resolved Hide resolved
Copy link
Contributor

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you can screenshot what this looks like with MultiQC/test-data#172?

docs/modules/picard.md Outdated Show resolved Hide resolved
docs/modules/picard.md Show resolved Hide resolved
multiqc/modules/picard/CrosscheckFingerprints.py Outdated Show resolved Hide resolved
multiqc/modules/picard/CrosscheckFingerprints.py Outdated Show resolved Hide resolved
@yfarjoun
Copy link

Do note that the "Expected" algorithm doesn't actually know if samples are from the same individual, and so LOD is "expected" to be positive is the sample ID is the same (from VCF or bam header) or if tool was run with "EXPECT_ALL_GROUPS_TO_MATCH=true". I've been toying with the idea of having a concept of "Individual", but that's not for here to discuss....

@nh13
Copy link
Contributor

nh13 commented Oct 27, 2020

@sstadick I see that the Sample Name column contains both the left and right sample names. Any chance we could have those also in two separate columns so they're sortable? What do you think?

@sstadick
Copy link
Contributor Author

@nh13 I don't think so. The input data for the table must come as a dictionary with the sample names as the key, so you can't have multiple rows with the same sample name.

Also of note, if someone were to run CrosscheckFingerprints and use ReadGroups as the level of granularity of the checking, I expand the names out to <left-sample>/<left-group> - <right-sample>/<right-group>.

My overall feeling is that plots.table just isn't made for this and it's a force fit for sure. The main "feature" here is in the general stats table with the red/green color of all expected.

sstadick and others added 3 commits October 28, 2020 10:30
Co-authored-by: Nils Homer <nh13@users.noreply.github.com>
This makes `Sample Name` in the table just be an index and moves the
sample names into the {LEFT,RIGHT}_SAMPLE column.

Additionally, this adds support for respecting the sample ignores
options, as well as renaming the samples based on the renaming rules.
@sstadick
Copy link
Contributor Author

sstadick commented Oct 29, 2020

Latest screenshot, run with --ignore-samples 'DNA_SAMPLE1_T_1F_tumor_dna' on the data in the test repo

image

@nh13
Copy link
Contributor

nh13 commented Oct 29, 2020

@sstadick that's pretty!

@ewels I think this is ready to go!

@ewels
Copy link
Member

ewels commented Nov 15, 2020

Thanks all! Test data merged, will take a look at the PR ASAP.

I think that there was only one sample in the test data PR? If you fancy adding a few more then that makes testing a bit easier as I can see how a typical report would look.. (But not critical).

Phil

@nh13
Copy link
Contributor

nh13 commented Dec 2, 2020

@ewels I think this is ready to go. The only issue I see is that when the sample names are long, the columns will overlap, but I am assuming this is an issue for all tables? If the columns were fit better, or re-sizeable, I think that would help.

@matthdsm
Copy link
Contributor

matthdsm commented Mar 1, 2021

Hi @ewels,

We're looking forward to using this module, can this be merged?

Thanks
M

@ewels
Copy link
Member

ewels commented Mar 8, 2021

Hi all,

Thanks for this! I've gone over the code and done a fair bit of refactoring, so please check that it still works as you expect.

I actually found a nasty little bug in the table code during this (maybe the reason why you disabled the colour schemes for LOD?) which was good to fix. I also added a new table config option to treat values as absolute for the bar width which works really nicely here to highlight the extreme LODs properly.

Along with a few other fixes and tweaks, I think that it looks even better now:

image

Note that the zero-point for this new feature is hardcoded to 0 for now. Surely someone will ask for this to be customisable but that can be rewritten in a future PR if so 😀

I'll merge this now but please give it a try and shout if I messed anything up 👍🏻

Phil

@ewels ewels merged commit 9c8aafa into MultiQC:master Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants