This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Sample distribution plots: account for multiple samples from same individual #170

Merged
merged 4 commits into from
Oct 24, 2019

Conversation

jaclyn-taroni
Member

@jaclyn-taroni jaclyn-taroni commented Oct 24, 2019

Purpose/implementation

The sample distribution plots did not account for multiple samples from the same individual (#155) or the fact that there are multiple experimental strategies in the pbta-histologies.tsv file.

Here, I'm limiting the plots to tumor tissue only and to distinct pairs of participant ID and the variable being plotted (e.g., broad_histology).

Note that a single participant identifier does not necessarily map to a single disease_type_new/broad_histology/short_histology value, so the counts in these plots can exceed the number of unique individuals. This seems appropriate for these figures.
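The filtering described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in this pull request: the toy records, the `participant_id` and `sample_type` keys, the `"Tumor"` label, and the `count_by` helper are all hypothetical stand-ins for whatever pbta-histologies.tsv and the plotting code actually use.

```python
from collections import Counter

# Toy records standing in for rows of pbta-histologies.tsv.
# Key names and values here are hypothetical, chosen only for illustration.
rows = [
    {"participant_id": "P1", "sample_type": "Tumor",
     "experimental_strategy": "WGS", "broad_histology": "Glioma"},
    {"participant_id": "P1", "sample_type": "Tumor",
     "experimental_strategy": "RNA-Seq", "broad_histology": "Glioma"},
    {"participant_id": "P1", "sample_type": "Normal",
     "experimental_strategy": "WGS", "broad_histology": ""},
    {"participant_id": "P2", "sample_type": "Tumor",
     "experimental_strategy": "WGS", "broad_histology": "Glioma"},
    {"participant_id": "P2", "sample_type": "Tumor",
     "experimental_strategy": "WGS", "broad_histology": "Ependymoma"},
]


def count_by(rows, column):
    """Count distinct (participant, value) pairs among tumor rows only."""
    tumor_rows = (r for r in rows if r["sample_type"] == "Tumor")
    # A set collapses repeated samples and repeated experimental strategies
    # for the same participant and the same plotted value.
    pairs = {(r["participant_id"], r[column]) for r in tumor_rows}
    return Counter(value for _, value in pairs)


counts = count_by(rows, "broad_histology")
participants = {r["participant_id"] for r in rows}
```

In this toy data, participant P2 carries two different broad_histology values, so the per-category counts add up to more than the number of participants, which is the situation the note above describes.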

Issue

#162 - does not look at the tumor_descriptor breakdown

Directions for reviewers

  • Does it seem reasonable to limit these plots to tumors and exclude cell lines? Cell lines will probably be included in an overview figure (not the interactive ones so much).
  • Does my filtering/group scheme seem consistent with what we'd like to present?

Results

Same overall picture, which is what I would expect, but the numbers are more consistent with the number of participants.

Docker and continuous integration

Check all those that apply or remove this section if it is not applicable.

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Member

@jashapiro jashapiro left a comment

Looks good to me. You had me wondering if I could just use distinct() in my code for a minute, but then I realized you weren't going to do anything with the other columns, so whether it picked randomly didn't matter.
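To illustrate the point about the other columns, here is a small sketch in plain Python of a first-row-wins deduplication, which is how dplyr's distinct() behaves when .keep_all = TRUE. The `distinct_keep_first` helper and the toy records are hypothetical and are not taken from either pull request.

```python
def distinct_keep_first(rows, key_columns):
    """Keep the first row seen for each combination of key_columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Two hypothetical rows that share the key but differ in another column.
rows = [
    {"participant_id": "P1", "broad_histology": "Glioma",
     "experimental_strategy": "WGS"},
    {"participant_id": "P1", "broad_histology": "Glioma",
     "experimental_strategy": "RNA-Seq"},
]
keys = ["participant_id", "broad_histology"]

forward = distinct_keep_first(rows, keys)
backward = distinct_keep_first(list(reversed(rows)), keys)
```

Both calls keep exactly one row per key, but the surviving experimental_strategy depends on row order. That is harmless when only the key columns are counted, and a problem only if the other columns are used afterwards.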

@jaclyn-taroni
Member Author

Funnily enough I was looking at what you added with #167 and making this probably overly complicated for its specific purpose.
