This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
Sample distribution plots: account for multiple samples from same individual #170
Purpose/implementation
The sample distribution plots did not account for multiple samples from the same individual (#155) or the fact that there are multiple experimental strategies in the pbta-histologies.tsv file.
Here, I'm limiting the plots to tumor tissue only and to distinct pairs of participant ID and whatever variable is being plotted.
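As a rough illustration of the filtering described above (not the code in this PR), here is a minimal Python sketch that keeps tumor rows only and counts distinct (participant, plotted value) pairs. The column names Kids_First_Participant_ID and sample_type, the value "Tumor", the helper name count_distinct_participants, and the toy rows are all assumptions made for the example; only short_histology is named in this description.

```python
from collections import Counter


def count_distinct_participants(rows, group_col,
                                id_col="Kids_First_Participant_ID",  # assumed column name
                                type_col="sample_type",              # assumed column name
                                tumor_value="Tumor"):                # assumed label
    """Count distinct participants per value of group_col, using tumor rows only."""
    # A set collapses repeated (participant, group) pairs, so several samples
    # or several experimental strategies for one individual count once.
    pairs = {
        (row[id_col], row[group_col])
        for row in rows
        if row[type_col] == tumor_value
    }
    return Counter(group for _, group in pairs)


# Toy rows standing in for lines of the histologies file (invented values).
rows = [
    {"Kids_First_Participant_ID": "PT_1", "sample_type": "Tumor",
     "experimental_strategy": "WGS", "short_histology": "LGAT"},
    {"Kids_First_Participant_ID": "PT_1", "sample_type": "Tumor",
     "experimental_strategy": "RNA-Seq", "short_histology": "LGAT"},
    {"Kids_First_Participant_ID": "PT_1", "sample_type": "Normal",
     "experimental_strategy": "WGS", "short_histology": "LGAT"},
    {"Kids_First_Participant_ID": "PT_1", "sample_type": "Tumor",
     "experimental_strategy": "WGS", "short_histology": "HGAT"},
    {"Kids_First_Participant_ID": "PT_2", "sample_type": "Tumor",
     "experimental_strategy": "WGS", "short_histology": "LGAT"},
]

counts = count_distinct_participants(rows, "short_histology")
```

In this toy input, participant PT_1 carries two different short_histology values, so the per-group counts sum to more than the number of unique participants.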
Note that a single participant identifier does not necessarily map to a single disease_type_new/broad_histology/short_histology value. So, the counts in these plots are higher than the number of unique individuals. This seems appropriate for these figures.
Issue
#162 - does not look at the tumor_descriptor breakdown
Directions for reviewers
Results
Same overall picture, which is what I would expect, but the numbers are more consistent with the number of participants.