Sample distribution plots: account for multiple samples from same individual #170

jaclyn-taroni · 2019-10-24T18:21:57Z

Purpose/implementation

The sample distribution plots did not account for multiple samples from the same individual (#155) or the fact that there are multiple experimental strategies in the pbta-histologies.tsv file.

Here, I'm limiting the plots to tumor tissue only and to distinct pairs of participant ID and whichever variable is being plotted (e.g., broad_histology).

Note that a single participant identifier does not always map to a single disease_type_new/broad_histology/short_histology value, so the plots contain more samples than there are unique individuals. This seems appropriate for these figures.
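
For illustration only, here is a minimal sketch of the filtering and de-duplication described above. It is written in plain Python and is not the code in this pull request. The toy rows are made up, and the `participant_id`, `sample_type`, and `composition` column names and their values are assumptions, not necessarily what pbta-histologies.tsv uses.

```python
from collections import Counter

# Toy stand-in for pbta-histologies.tsv. All values are made up, and the
# participant_id / sample_type / composition column names are assumptions.
COLUMNS = ("participant_id", "sample_type", "composition",
           "experimental_strategy", "broad_histology")
rows = [dict(zip(COLUMNS, values)) for values in [
    ("P1", "Tumor", "Solid Tissue", "WGS", "Glioma"),
    ("P1", "Tumor", "Solid Tissue", "RNA-Seq", "Glioma"),   # same person, second strategy
    ("P1", "Tumor", "Solid Tissue", "WGS", "Ependymoma"),   # same person, other histology
    ("P2", "Tumor", "Derived Cell Line", "WGS", "Glioma"),  # cell line: excluded
    ("P2", "Normal", "Solid Tissue", "WGS", ""),            # not a tumor: excluded
    ("P3", "Tumor", "Solid Tissue", "WGS", "Glioma"),
]]


def is_tumor_tissue(row):
    """True for tumor tissue rows (hypothetical encoding of that filter)."""
    return row["sample_type"] == "Tumor" and row["composition"] == "Solid Tissue"


def count_distinct_pairs(rows, plot_column):
    """Count distinct (participant, plot_column value) pairs per value."""
    pairs = {(row["participant_id"], row[plot_column])
             for row in rows if is_tumor_tissue(row)}
    return Counter(value for _, value in pairs)


counts = count_distinct_pairs(rows, "broad_histology")
participants = {row["participant_id"] for row in rows if is_tumor_tissue(row)}

print(sorted(counts.items()))   # [('Ependymoma', 1), ('Glioma', 2)]
print(sum(counts.values()), "plotted vs", len(participants), "participants")
```

In this toy data, P1 contributes to two histologies, so three pairs are plotted for two unique participants. The second experimental strategy for P1 does not add to the Glioma count.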

Issue

#162 - does not look at the tumor_descriptor breakdown

Directions for reviewers

Does it seem reasonable to limit these plots to tumors and exclude cell lines? These will probably be included in an overview figure (not the interactive ones so much).
Does my filtering/grouping scheme seem consistent with what we'd like to present?

Results

Same overall picture, which is what I would expect, but the numbers are more consistent with the number of participants.

Docker and continuous integration

Check all those that apply or remove this section if it is not applicable.

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

jashapiro

Looks good to me. You had me wondering if I could just use distinct() in my code for a minute, but then I realized you weren't going to do anything with the other columns, so whether it picked randomly didn't matter.
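
The point about which row gets picked can be sketched as follows. This is a hypothetical Python helper, not dplyr's `distinct()` and not code from this pull request. `distinct_by` and the toy rows are assumptions made for illustration.

```python
def distinct_by(rows, keys, keep="first"):
    """Keep one full row per unique combination of `keys` (first or last seen)."""
    chosen = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        if keep == "last" or key not in chosen:
            chosen[key] = row
    return list(chosen.values())


def project(rows, keys):
    """Reduce rows to just the key columns, in a stable order."""
    return sorted(tuple(row[k] for k in keys) for row in rows)


# Made-up rows; column names other than broad_histology are assumptions.
rows = [
    {"participant_id": "P1", "broad_histology": "Glioma", "experimental_strategy": "WGS"},
    {"participant_id": "P1", "broad_histology": "Glioma", "experimental_strategy": "RNA-Seq"},
    {"participant_id": "P2", "broad_histology": "Glioma", "experimental_strategy": "WGS"},
]
keys = ("participant_id", "broad_histology")

first = distinct_by(rows, keys, keep="first")
last = distinct_by(rows, keys, keep="last")

# The retained rows differ in the columns that are not part of the key...
print(first[0]["experimental_strategy"], last[0]["experimental_strategy"])
# ...but the key columns are identical either way.
print(project(first, keys) == project(last, keys))
```

If later steps read only the key columns, the choice of retained row has no effect. If they read other columns, such as `experimental_strategy` here, the choice does matter.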

jaclyn-taroni · 2019-10-24T20:01:23Z

Funnily enough, I was looking at what you added with #167 and probably making this overly complicated for its specific purpose.

jaclyn-taroni added 2 commits October 24, 2019 14:05

Only include tumors; distinct individual disease type pairs

5bb8593

Only include tumors; distinct histologies/disease type

e50efab

jaclyn-taroni requested review from jashapiro and cbethell October 24, 2019 18:21

Safer bash while we're at it

d825751

jashapiro approved these changes Oct 24, 2019

Merge branch 'master' into 162-dedup-dist-plot

85ece99

jaclyn-taroni merged commit b38dc64 into AlexsLemonade:master Oct 24, 2019

jaclyn-taroni deleted the 162-dedup-dist-plot branch October 24, 2019 22:01

jaclyn-taroni mentioned this pull request Oct 25, 2019

Update: sample distribution plots accounting for multiple samples from the same individual #162

Closed
