Does the calculation of sval in encode_task_macs2_signal_track_chip.py take into account the difference between single-end and paired-end data? #298

Open
ezioljj opened this issue Sep 1, 2023 · 2 comments

Comments

@ezioljj

ezioljj commented Sep 1, 2023

I am using your scripts to process ChIP-seq data (many datasets, both single-end and paired-end). While going through the pipeline, I noticed that the sval calculation (encode_task_macs2_signal_track_chip.py, line 145) does not appear to distinguish between single-end and paired-end ChIP-seq data. As I understand it, the tagAlign files generated by this pipeline may differ between single-end and paired-end data.

For single-end tagAlign files, the read count should be equal to the total number of lines in the file, while for paired-end tagAlign files, the read count should be half of the total number of lines in the file. However, I'm not entirely certain if my interpretation is correct. I would greatly appreciate your assistance in clarifying this matter. Thank you once again for creating such valuable tools.
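To make my interpretation concrete, here is a minimal sketch of the counting rule I have in mind. It is not code from the pipeline: the function name `count_tagalign_units` and the `paired_end` flag are hypothetical, and it assumes that a paired-end tagAlign file contains exactly two lines per fragment (one per mate).

```python
import gzip


def count_tagalign_units(path, paired_end):
    """Count reads (single-end) or fragments (paired-end) in a tagAlign file.

    Assumption (my interpretation, not confirmed by the pipeline):
    a paired-end tagAlign has two lines per fragment, one per mate.
    """
    # tagAlign files are usually gzipped; fall back to plain text otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Count non-empty lines only.
        n_lines = sum(1 for line in handle if line.strip())
    # Halve the line count for paired-end data, keep it as-is for single-end.
    return n_lines // 2 if paired_end else n_lines
```

Under this rule, a paired-end file with 4 lines would count as 2, while the same file treated as single-end would count as 4, so any value derived from the count would differ by a factor of two.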

@ezioljj ezioljj changed the title Does the calculation of sval in encode_task_macs2_signal_track_chip.py take into account the difference between single-ended and double-ended data? Does the calculation of sval in encode_task_macs2_signal_track_chip.py take into account the difference between single-ended and paired-ended data? Sep 1, 2023
@ezioljj ezioljj changed the title Does the calculation of sval in encode_task_macs2_signal_track_chip.py take into account the difference between single-ended and paired-ended data? Does the calculation of sval in encode_task_macs2_signal_track_chip.py take into account the difference between single-end and paired-end data? Sep 1, 2023
@akundaje
Collaborator

akundaje commented Sep 4, 2023 via email

@ezioljj
Author

ezioljj commented Sep 4, 2023

Thank you for your explanation. I have two additional questions I would like to ask:

  1. So all data, whether paired-end (PE) or single-end (SE), is treated as single-end. What is the reason for this? Is it to keep the pipeline uniform for larger projects that contain both SE and PE datasets? I also assume that some processing steps, such as removing reads in blacklisted regions or retaining only reads within mappable regions, may discard or keep one read of a pair without preserving the pairing. (Perhaps this happens rarely enough that it is not considered a problem?)

  2. In your opinion, which is the more suitable input for calling peaks with MACS2: fragments or reads? As I understand it, using reads can produce two peaks per binding site (which then need to be merged), whereas using fragments produces only one peak per binding site.
