Only 1 CPU was used in minigraph alignment #1254
Replies: 7 comments
-
I think most of the parallelism in minigraph -xggs comes from processing multiple chromosomes in parallel, so what you're seeing doesn't sound too surprising. For further details on that, you'll have to ask on the minigraph GitHub.
Minigraph-Cactus splits a lot of the work by chromosome automatically. So if running time is a concern, you are probably best to just run it on your entire input set at once.
-
Dear Glenn,
Thanks for your previous reply. For the Minigraph-Cactus pipeline, I ran into a memory issue at the cactus-graphmap-join step, which showed:
toil.batchSystems.abstractBatchSystem.InsufficientSystemResources: The job 'clip_vg' kind-clip_vg/instance-5m37i5ua v1 is requesting 597433447860 bytes of memory, more than the maximum of 300000000000 bytes of memory that SingleMachineBatchSystem was configured with, or enforced by --maxMemory. Scale is set to 1.0.
My command is:
cactus-pangenome \
./js.Chr09 \
./Chr09.evolver.txt \
--mapCores 90 \
--permissiveContigFilter 0.1 \
--maxLen 10000 --clip 10000 \
--outDir Chr09.out \
--outName Chr09 \
--reference Zhangshugang \
--vcf --giraffe --gfa --gbz --odgi
Since the pipeline took about 10 days to finally reach this step, I do not want to re-run all the previous steps. I simply re-ran cactus-graphmap-join with the following command:
cactus-graphmap-join js.Chr09 --vg chrom-alignments/Zhangshugang_Chr09.vg --hal chrom-alignments/Zhangshugang_Chr09.hal \
--outDir Chr09.out \
--outName Chr09 \
--maxMemory 300G --defaultCores 10 --maxCores 10 --scale 0.2 \
--reference Zhangshugang \
--clip 10000 --filter 2 \
--gbz \
--odgi \
--vcf \
--giraffe
However, the same memory issue is still there. Do you have any suggestions to solve this issue?
All the best,
Jiantao
---------------------------------------------------------
Postdoc, Boyce Thompson Institute
Cornell University
533 Tower Road, Ithaca, NY 14850
Email: ***@***.***
-
This is annoying. We need better overrides, where --maxMemory clamps the estimated limits of every single job. Toil doesn't support this. It would be nice to add this support to Cactus, but it would take a fair bit of work (this is a similar issue to #1187).
For this job in particular, it's asking for 20X the input size. So I guess Zhangshugang_Chr09.vg is about 30GB? That's pretty big and you may need a larger machine to index it. But it's also possible 300GB is okay, and Cactus is being too conservative. Out of curiosity, how long is your chromosome?
What I will do:
* (short term) Add some kind of option (may re-use --indexMemory) to limit memory requested by this job
* (long term) Hopefully get some better global memory controls in Cactus
What you can do as a work-around now:
Patch your Cactus as follows. Replace ./venv-cactus-v2.7.0 below with where you installed it:
sed -i $(find ./venv-cactus-v2.7.0 -name "cactus_graphmap_join.py") -e 's/ \* 20//g' -e 's/ \* 22//g' -e 's/ \* 32//g' -e 's/ \* 64//g'
and then rerun cactus-graphmap-join from the beginning. This removes all the big coefficients used for resource estimation.
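To make the numbers concrete: dividing the failing request of 597,433,447,860 bytes by the coefficient of 20 implies an input .vg of roughly 30 GB, and the sed patch simply strips those multipliers out of the memory-estimation expressions. A minimal sketch of both, assuming an illustrative estimation line (the `memory = input_size * 20` string is hypothetical, not verbatim Cactus source):

```python
import re

# clip_vg requests ~20x its input size; back out the implied input size
# from the request in the error message (illustrative arithmetic only).
requested = 597_433_447_860  # bytes, from the InsufficientSystemResources error
coefficient = 20
implied_input_gb = requested / coefficient / 1e9
print(f"implied input size: {implied_input_gb:.1f} GB")  # ~29.9 GB

# The suggested sed patch removes " * 20" (and 22/32/64) from the memory
# estimates; the same substitution expressed in Python:
line = "memory = input_size * 20"  # hypothetical line, for illustration
patched = re.sub(r" \* 20", "", line)
print(patched)  # memory = input_size
```

This is why the patched job falls back to requesting roughly the input size itself, which fits under the 300 GB limit.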
-
Dear Glenn,
Thanks for your timely reply. In our lab, we have several independent servers with about 512 GB of memory and 112 CPUs each. I am testing MC on various pepper accessions, where chromosome lengths range from 170 Mb to 450 Mb, with a total of about 50 genomes.
I tried adding --maxMemory 300G, but the memory issue is still there, so that does not seem to be a good solution.
As for your suggestion to modify cactus_graphmap_join.py, it seems the memory issue was solved and the program is running (I am still waiting to see whether there are any additional issues).
By the way, after modifying cactus_graphmap_join.py, could there be any other potential issues in the later steps?
Best,
Jiantao
-
Hi Glenn,
Following up on the same issue, the cactus-graphmap-join script hit another error:
[2024-01-03T02:41:48-0500] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-add4f6afcb83498595aaa57656a02cd7/Zhangshugang_Chr05.vg' to path '/data/zhaojiantao/pepper/pangenome/minigraphcactus/Chr05.out/af8d658b050a5a69b45c906a5db6e41c/943f/3a11/tmpwr52z1zt/Zhangshugang_Chr05.vg'
Traceback (most recent call last):
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/toil/worker.py", line 403, in workerScript
job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/toil/job.py", line 2774, in _runner
returnValues = self._run(jobGraph=None, fileStore=fileStore)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/toil/job.py", line 2691, in _run
return self.run(fileStore)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/toil/job.py", line 2919, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/cactus/refmap/cactus_graphmap_join.py", line 634, in clip_vg
cactus_call(parameters=cmd, outfile=clipped_path, job_memory=job.memory)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/cactus/shared/common.py", line 891, in cactus_call
raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command ['docker', 'run', '--interactive', '--net=host', '--log-driver=none', '-u', '0:0', '-v', '/data/zhaojiantao/pepper/pangenome/minigraphcactus/Chr05.out/af8d658b050a5a69b45c906a5db6e41c/943f/3a11/tmpwr52z1zt:/data', '--entrypoint', '/bin/bash', '--name', 'b1bb404e-c5c5-48b9-b47c-8a0480609c95', '--rm', 'quay.io/comparative-genomics-toolkit/cactus:v2.7.0', '-c', 'set -eo pipefail && clip-vg Zhangshugang_Chr05.vg.gfaffixed -f -e Zhangshugang -d _MINIGRAPH_ -L | vg clip -d 1 - -P Zhangshugang | vg ids -s -'] exited 1: stderr=
libgomp: Thread creation failed: Operation not permitted
error[VPKG::load_one]: Correct input type not found in standard input while loading handlegraph::MutablePathMutableHandleGraph
[2024-01-03T02:41:48-0500] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host apple
My command is:
cactus-graphmap-join js.Chr05 \
--vg chrom-alignments/Zhangshugang_Chr05.vg \
--hal chrom-alignments/Zhangshugang_Chr05.hal \
--outDir Chr05.out \
--outName Chr05 \
--reference Zhangshugang \
--clip 10000 --filter 2 \
--gbz --odgi --vcf --giraffe \
--workDir .
Do you have any ideas why this happened? Any suggestions on how to solve this issue?
Thanks,
Jiantao
-
Hard to say. I think this could happen if there's no path beginning with Zhangshugang in your input graph. (You can verify this with vg paths.)
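As a sketch of that check: `vg paths -L -v graph.vg` lists the path names in a graph, and you can scan that listing for a name starting with the reference prefix. The helper and sample path names below are invented for illustration; only the `vg paths` invocation in the comment is from the thread's toolchain:

```python
# Check whether any path name begins with the reference prefix, as
# suggested above. In practice you would feed this the output of:
#   vg paths -L -v Zhangshugang_Chr05.vg
def has_reference_path(path_listing: str, reference: str) -> bool:
    """Return True if any listed path name starts with the reference prefix."""
    return any(line.startswith(reference) for line in path_listing.splitlines())

# Invented sample output for illustration:
sample = "Zhangshugang#0#Chr05\nsample01#0#Chr05\nsample02#0#Chr05"
print(has_reference_path(sample, "Zhangshugang"))                # True
print(has_reference_path("sample01#0#Chr05", "Zhangshugang"))    # False
```

If the reference path is missing, the `vg clip -P Zhangshugang` step in the failing command would have no anchor path to preserve, which is consistent with the error above.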
-
Dear Glenn,
I also tried the yeast example data from the GitHub repository, and the pipeline ran properly. I am wondering whether it is possible to split the chrom-alignments/*.vg and chrom-alignments/*.hal files into smaller datasets and then run cactus-graphmap-join on each split?
Thanks,
Jiantao
-
Hi,
I recently tried Minigraph-Cactus on about 50 pepper genomes. To reduce the input size, I split the genomes into chromosomes and ran the MC pipeline for each chromosome separately. However, I found that at the minigraph alignment step only 1 CPU was used, even though the command line (minigraph -c -xggs -t 104 sample01.fa sample02.fa sample03.fa ...) was allowed all of the available CPUs (~100), and memory usage was only about 4%. Is there an explanation for this, or a way to run it faster so that most of the CPUs are used?
Thanks