Methylation pipeline #3700

Open
Adrian-Zet opened this issue Mar 2, 2023 · 2 comments

Comments

@Adrian-Zet

Version info

  • bcbio version: 1.2.9
  • OS: Ubuntu 20.04 LTS

To Reproduce
Exact bcbio command you have used:

bcbio_nextgen.py ../config/CTRL-MDD-S-MDD.yaml -n 72

Your yaml configuration file:

A few observations here:
The YAML file is very long because the study has 182 samples.
I therefore omitted most entries, which repeat the same structure for every sample (the omission is marked with ".................." below).

details:
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_087
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190430_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190430_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: male
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_086
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190431_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190431_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: male
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_089
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190432_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190432_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: male
.............................................................................
(excluded most samples from this point to the end for simplicity)
..............................................................................

- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_082
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190791_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190791_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: female
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_085
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190792_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190792_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: female
fc_name: ctrl-vs-mdd-vs-mdds-mRNA-Methylation
resources:
  bismark:
    bismark_threads: 16
    bowtie_threads: 2
  trim_galore:
    options:
    - --clip_r1 4
    - --clip_r2 4
    - --three_prime_clip_r1 4
    - --three_prime_clip_r2 4
upload:
  dir: ../final

Log files (could be found in work/log)
Please attach (10MB max):
The debug.log is huge (250 MB) due to the size of the workflow. If required, I can either compress it or run a workflow with just one sample and attach that debug log instead.

bcbio-nextgen.log
bcbio-nextgen-commands.log

Expected behavior:

  • I expected multiple Bismark instances to be launched, each using 50-100 GB of memory.

Resulting behavior:

  • Only one core is being used, with ~20 GB of memory, and it seems to be slowly processing only one sample at a time.
  • It has been stuck for days at the "Writing cytosine report for chromosome ..." stage.
@naumenko-sa
Contributor

Hi @Adrian-Zet !
The methylation pipeline does not support parallelization with ipython.
Please run one bcbio project per sample, or per small group of samples.
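
A minimal sketch of how such a split could be scripted, assuming the project config has already been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`). The function name `split_config`, the `group_size` parameter, and the `-partNNN` suffix on `fc_name` are hypothetical choices for illustration and are not part of bcbio:

```python
import copy


def split_config(config, group_size=1):
    """Split one bcbio project config (a dict) into several smaller ones.

    Each returned dict keeps the shared top-level keys (for example
    `resources` and `upload`) and holds at most `group_size` samples.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")

    samples = config["details"]
    # Everything except the sample list is shared by all sub-projects.
    shared = {key: value for key, value in config.items() if key != "details"}

    projects = []
    for start in range(0, len(samples), group_size):
        project = copy.deepcopy(shared)
        project["details"] = copy.deepcopy(samples[start:start + group_size])
        # Hypothetical naming scheme so that each sub-project is distinguishable.
        part = start // group_size + 1
        project["fc_name"] = f"{config.get('fc_name', 'project')}-part{part:03d}"
        projects.append(project)
    return projects


# Example with a tiny in-memory config; a real one would come from the YAML file.
example = {
    "details": [{"description": f"sample_{i}"} for i in range(5)],
    "fc_name": "demo",
    "upload": {"dir": "../final"},
}
for project in split_config(example, group_size=2):
    print(project["fc_name"], [s["description"] for s in project["details"]])
```

Each resulting dict could then be written back out (for example with `yaml.safe_dump`) and run as its own bcbio project, most likely in a separate work directory; that last point is an assumption rather than something stated in this thread.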
SN

@naumenko-sa
Contributor

Bismark parallelization is tricky; see the table at the bottom of this page:
https://bcbio-nextgen.readthedocs.io/en/latest/contents/methylation.html
Running times can range from 2 hours to more than 3 days, depending on the settings.

I am not surprised that -n 72 is not working.
I'd start with safer settings for one sample: -n 8, bismark/bowtie threads 4/2, and 50G RAM, and go from there,
then maybe increase to -n 16, bismark/bowtie threads 8/2, and 100G RAM if that works for you.
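
As a rough sketch, the two suggested tiers can be written down as data and turned into the matching `resources` block and command line. The tier names, the helper functions, and the config path `../config/one_sample.yaml` are hypothetical; only the `-n` values and thread counts come from the suggestion above. The RAM figures appear only as comments because the thread does not say how memory is requested:

```python
import shlex

# Values taken from the suggestion above; the tier names are made up here.
TIERS = {
    "safe": {"cores": 8, "bismark_threads": 4, "bowtie_threads": 2},     # with ~50G RAM
    "larger": {"cores": 16, "bismark_threads": 8, "bowtie_threads": 2},  # with ~100G RAM
}


def resources_block(tier):
    """Build the `resources` section of the config for the chosen tier."""
    settings = TIERS[tier]
    return {
        "bismark": {
            "bismark_threads": settings["bismark_threads"],
            "bowtie_threads": settings["bowtie_threads"],
        }
    }


def bcbio_command(config_path, tier):
    """Build the bcbio command line, using the same form as in the report."""
    return ["bcbio_nextgen.py", config_path, "-n", str(TIERS[tier]["cores"])]


# Hypothetical single-sample config path, for illustration only.
print(resources_block("safe"))
print(shlex.join(bcbio_command("../config/one_sample.yaml", "safe")))
```

Keeping the tiers in one place makes it easy to move from the first to the second only after a single-sample run has completed.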

SN
