PBC1, PBC2 from pipeline? #97

Closed
nrnatesh opened this issue Mar 23, 2020 · 5 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@nrnatesh

Hello,

First of all, thank you so much for putting this pipeline together. It has been a tremendous help in my research, and in actually getting interpretable results from alignment and QC of sequencing data.

ENCODE has specific guidelines for how ATAC-seq data should be processed and QC'd. One of the required metrics is the PCR bottlenecking coefficient, reported as PBC1 and PBC2. I've been looking through the output files from my nf-core/atacseq run, in ataqv and MultiQC, and can't find anything on this. Is this something we'll have to calculate ourselves from the merged BAM files? Thanks for the help!
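
For reference, the ENCODE definitions are PBC1 = M1 / M_distinct and PBC2 = M1 / M2, where M1 is the number of genomic locations with exactly one uniquely mapped read, M2 is the number with exactly two, and M_distinct is the number of distinct locations with at least one read. Below is a minimal Python sketch of that arithmetic. It assumes one location key per read (or per read pair for paired-end data) has already been extracted from the BAM file. The function name and the key layout are hypothetical, and this is not what the pipeline or ataqc actually runs.

```python
from collections import Counter


def pbc_metrics(location_keys):
    """Compute PBC1/PBC2 from one hashable location key per read (or pair).

    A key could be, for example, (chrom, start, strand); that layout is an
    assumption made for illustration only.
    """
    # How many reads map to each distinct location.
    reads_per_location = Counter(location_keys)
    # How many locations carry exactly 1 read, exactly 2 reads, and so on.
    histogram = Counter(reads_per_location.values())

    m_distinct = len(reads_per_location)
    m1 = histogram[1]
    m2 = histogram[2]

    # Both ratios are undefined when their denominator is zero.
    pbc1 = m1 / m_distinct if m_distinct else float("nan")
    pbc2 = m1 / m2 if m2 else float("nan")
    return {"M1": m1, "M2": m2, "M_distinct": m_distinct,
            "PBC1": pbc1, "PBC2": pbc2}


if __name__ == "__main__":
    keys = [("chr1", 100, "+"),
            ("chr1", 200, "+"), ("chr1", 200, "+"),
            ("chr2", 50, "-"), ("chr2", 50, "-"), ("chr2", 50, "-"),
            ("chr3", 10, "+")]
    print(pbc_metrics(keys))
```

Because the function only sees location keys, the same code would cover single-end and paired-end data, as long as the key is built appropriately for each.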

@drpatelh
Member

Hi @nrnatesh! You're very welcome :)

You should be able to get an idea of the library complexity from the Preseq reports generated by the pipeline. Can you send me a link to where the ENCODE pipeline implements this, and I'll have a look?

The PBC metric should be reported by the nf-core/chipseq pipeline because it has an additional step of running phantompeakqualtools:
https://github.com/nf-core/chipseq/blob/21be3149542cdc84431e12d1e092359058aed32a/main.nf#L976

@drpatelh drpatelh added enhancement New feature or request question Further information is requested labels Mar 23, 2020
@nrnatesh
Author

https://github.com/kundajelab/ataqc/blob/59d6121c3ff85a6d04ff81c2e52923c1837dbec1/run_ataqc.py

The PBC calculations are in the "run_preseq" function. I ran nf-core/atacseq, so I'm not sure whether it calculated them. I also can't find the Preseq reports in my results directory: I only see the complexity curve in preseq_plot_1.pdf, but no specific values for PBC1/PBC2. I appreciate the help.

@drpatelh
Member

drpatelh commented Mar 23, 2020

Having had a quick glance it looks like they are using the standard error from the command to get these metrics:
https://github.com/kundajelab/ataqc/blob/59d6121c3ff85a6d04ff81c2e52923c1837dbec1/run_ataqc.py#L326-L351

I'm still not entirely sure how, though. Note that even if the data is paired-end, the pipeline isn't using the -P flag, because Preseq breaks on the test dataset I use for CI testing, which is too small. Maybe that flag can be added back in before the next release, with --skip_preseq as the workaround if this happens.

If you can come up with a robust solution to extract these metrics from the .command.err file generated by Nextflow in the work/ directory for any given Preseq process then we could think about adding it into the pipeline. It would have to work with --single_end data too.

@drpatelh
Member

@nrnatesh I've added the -pe parameter to the Preseq process by default now, and I am also copying the standard error out into the results directory, hoping that it can be used to calculate PBC1/PBC2. If you manage to figure out how to calculate them from this file, then we can think about getting them into MultiQC using a custom content file.

3d8e305
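
As a rough illustration of the custom content idea, here is a minimal Python sketch that formats already-computed PBC values as a tab-separated table with MultiQC-style header comments. The section id, section name, and function name are hypothetical. It also assumes, as I understand the MultiQC custom content convention, that the file would be saved with a name ending in `_mqc.tsv` so that MultiQC picks it up. How the values are obtained from the Preseq log is left open here.

```python
def format_pbc_mqc(pbc_by_sample):
    """Return the text of a MultiQC custom content table.

    pbc_by_sample maps a sample name to a (PBC1, PBC2) tuple. The header
    keys below (id, section_name, plot_type) follow the MultiQC custom
    content convention; their values are placeholders for illustration.
    """
    lines = [
        "# id: 'pbc_metrics'",
        "# section_name: 'PCR bottlenecking coefficients'",
        "# plot_type: 'table'",
        "Sample\tPBC1\tPBC2",
    ]
    # One row per sample, sorted so the output is deterministic.
    for sample, (pbc1, pbc2) in sorted(pbc_by_sample.items()):
        lines.append(f"{sample}\t{pbc1:.4f}\t{pbc2:.4f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only, not real results.
    text = format_pbc_mqc({"sample_B": (0.5, 2.0), "sample_A": (0.25, 4.0)})
    with open("pbc_metrics_mqc.tsv", "w") as handle:
        handle.write(text)
```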

@drpatelh
Member

drpatelh commented Jul 1, 2020

Hi @nrnatesh, did you manage to figure this out? The log files containing these metrics should now be written to the results directory. Closing for now, but please feel free to re-open if you find a way we can formally report this in MultiQC. Thanks!

@drpatelh drpatelh closed this as completed Jul 1, 2020