Convert the cnv consensus output table to a segfile #441

jashapiro · 2020-01-15T21:13:20Z

Purpose/implementation Section

What scientific question is your analysis addressing?

How should various CNV calls be merged for downstream analysis?

What was your approach?

I wrote an R script to take the output from the copy number consensus analysis, and reduce it to a set of data that can be consumed as if it had come from cnvkit (more or less). Notably, as the copy number consensus may merge multiple nearby CNVs into a single event, this this requires some way of merging seg.mean values from multiple CNV calls.

In order to merge seg.mean values, I am currently taking the approach of calculating the weighted mean of all CNVkit calls that comprise each consensus call. The code does include the possibility of various other schemes such as weighted medians with or without interpolation.

While not used in this output, the script does also calculate a weighted median of copy number values (integers).

What GitHub issue does your pull request address?

#392

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is the choice of weighted means to produce a single value reasonable?

I am using weighted median copy number from CNVkit where I can, and ControlFreeC values when CNVkit did not contribute to the consensus. Does that make sense?

Is there anything that you want to discuss further?

I do not think the discussions around #392 are complete: is this the format of output that we want to use, or should we include some of the other information in the final output file, such as alternative consensus seg.mean estimates?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

A segfile table: pbta-cnv-consensus.seg

What is your summary of the results?

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

Follows the format of the cnvkit seg file in OpenPBTA/data/

…ensus_segfile

jaclyn-taroni · 2020-01-16T20:24:45Z

@jashapiro CI is failing with:

Complete log: /rocker-build/analyses/copy_number_consensus_call/.snakemake/log/2020-01-16T171857.497395.snakemake.log
Error in file.path(analysis_dir, opts$cnv_file) : 
  object 'analysis_dir' not found
Calls: <Anonymous> ... read_delimited -> source_name -> is.connection -> file.path
Execution halted

Full output is here: build_2567_step_116_container_0.txt

jaclyn-taroni · 2020-01-17T13:30:27Z

@jashapiro I am unable to update this branch so it is up to date with the AlexsLemonade master branch. Can you allow edits from maintainers? If that is already enabled, then something else is going on and I'm not sure what it might be but I'm sure we could get to the bottom of it. Something similar happened on #439 so would love to figure it out if it's not that!

jaclyn-taroni

This looks good. I had a few questions about the output and I think the README needs more documentation before merging. You and I had talked in person about revisiting how CNVkit calculates seg.mean -- did you look at that and come to the decision that weighted mean is the way to go here?

analyses/copy_number_consensus_call/README.md

analyses/copy_number_consensus_call/scripts/bed_to_segfile.R

jaclyn-taroni · 2020-01-17T13:56:02Z

analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg

@@ -0,0 +1,7070 @@
+ID	chrom	loc.start	loc.end	num.mark	seg.mean	copy.num
+BS_007JTNB8	chr11	771036	866778	NA	0.214821	3


It looks like all of the num.mark values are NA - is this expected behavior? Also some NA values in the seg.mean column. I don't have an intuition for whether or not this is expected or if NA values should be handed in the seg_* functions.

Oh, does the seg.mean become NA when a call is the result of consensus between Manta and ControlFreeC but not CNVkit?

That is correct. Only CNVkit uses seg.mean, so if there is no CNVkit call, we have nothing to work from. The num.mark column is also something potentially specific to CNVkit, but was not carried through in the consensus scripts. It seems to be optional to the .seg file format; I could just leave it off if nothing downstream depends on it. I thought preserving the column structure was worth doing for the first pass.

As to the contents of num.mark: it is "bins" from CNVkit, so it doesn't really correspond to anything we can calculate easily, though I suppose we could fill in length or something like it.

What you've implemented sounds good to me in light of this additional information, thanks for the explanation!

Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

jaclyn-taroni

Spotted one typo - LGTM otherwise!

jaclyn-taroni · 2020-01-17T19:47:28Z

analyses/copy_number_consensus_call/README.md

@@ -35,6 +43,10 @@ Since there are 3 callers, there were 3 comparisons: `manta-cnvkit`, `manta-free

 11) **Sort and merge** the CNVs from the comparison pairs ,`manta-cnvkit` `manta-freec` `cnvkit-freec`, together into 1 file
 12) After every samples' consensus CNVs were called, **combine all merged files** from step 10 and output to `results/cnv_consensus.tsv`
+13) The `results/cnv_consensus.tsv` is translated into a `pbta-cnv-consensus.seg` file in the same format as `pbta-cnv-cnvkit.seg.gz`.
+When a consensus CNV contains from multiple source CNV segments, we take the mean of the CNVkit `seg.mean` values from the source segments, weighted by segment lenth.


Suggested change

When a consensus CNV contains from multiple source CNV segments, we take the mean of the CNVkit `seg.mean` values from the source segments, weighted by segment lenth.

When a consensus CNV contains from multiple source CNV segments, we take the mean of the CNVkit `seg.mean` values from the source segments, weighted by segment length.

jashapiro added 4 commits January 15, 2020 16:01

Convert the output table to a segfile

e24947c

Follows the format of the cnvkit seg file in OpenPBTA/data/

Add copy number to output

f2cad1d

Update file organization

a70fcc7

Update comments & simplify io locations

19968ac

jashapiro marked this pull request as ready for review January 16, 2020 16:24

jashapiro added 3 commits January 16, 2020 11:37

README updates

24123b4

Merge remote-tracking branch 'upstream/master' into jashapiro/cn_cons…

3c2152c

…ensus_segfile

Minor README updates

9967537

jaclyn-taroni self-requested a review January 16, 2020 19:50

Fix file paths

e77f47c

jaclyn-taroni reviewed Jan 17, 2020

View reviewed changes

jashapiro and others added 5 commits January 17, 2020 09:36

Merge branch 'master' into jashapiro/cn_consensus_segfile

afec0b7

Apply @jtaroni suggestions from code review

4dbe2c6

Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

Make seg.mean calculation method an option

88c189e

Update readme to describe seg.mean consensus values

d8ea8ad

Merge branch 'master' into jashapiro/cn_consensus_segfile

e587238

jaclyn-taroni approved these changes Jan 17, 2020

View reviewed changes

jaclyn-taroni added 2 commits January 17, 2020 14:48

Merge branch 'master' into jashapiro/cn_consensus_segfile

2924dcd

Update README.md

d8f3c71

jaclyn-taroni merged commit 9b1cb6f into AlexsLemonade:master Jan 17, 2020

jashapiro deleted the jashapiro/cn_consensus_segfile branch April 11, 2021 18:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert the cnv consensus output table to a segfile #441

Convert the cnv consensus output table to a segfile #441

jashapiro commented Jan 15, 2020 •

edited

Loading

jaclyn-taroni commented Jan 16, 2020

jaclyn-taroni commented Jan 17, 2020

jaclyn-taroni left a comment

jaclyn-taroni Jan 17, 2020

jaclyn-taroni Jan 17, 2020

jashapiro Jan 17, 2020

jaclyn-taroni Jan 17, 2020 •

edited

Loading

jaclyn-taroni left a comment

jaclyn-taroni Jan 17, 2020

		@@ -0,0 +1,7070 @@
		ID chrom loc.start loc.end num.mark seg.mean copy.num
		BS_007JTNB8 chr11 771036 866778 NA 0.214821 3

	When a consensus CNV contains from multiple source CNV segments, we take the mean of the CNVkit `seg.mean` values from the source segments, weighted by segment lenth.
	When a consensus CNV contains from multiple source CNV segments, we take the mean of the CNVkit `seg.mean` values from the source segments, weighted by segment length.

Convert the cnv consensus output table to a segfile #441

Convert the cnv consensus output table to a segfile #441

Conversation

jashapiro commented Jan 15, 2020 • edited Loading

Purpose/implementation Section

What scientific question is your analysis addressing?

What was your approach?

What GitHub issue does your pull request address?

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

Documentation Checklist

jaclyn-taroni commented Jan 16, 2020

jaclyn-taroni commented Jan 17, 2020

jaclyn-taroni left a comment

Choose a reason for hiding this comment

jaclyn-taroni Jan 17, 2020

Choose a reason for hiding this comment

jaclyn-taroni Jan 17, 2020

Choose a reason for hiding this comment

jashapiro Jan 17, 2020

Choose a reason for hiding this comment

jaclyn-taroni Jan 17, 2020 • edited Loading

Choose a reason for hiding this comment

jaclyn-taroni left a comment

Choose a reason for hiding this comment

jaclyn-taroni Jan 17, 2020

Choose a reason for hiding this comment

jashapiro commented Jan 15, 2020 •

edited

Loading

jaclyn-taroni Jan 17, 2020 •

edited

Loading