Only 1 CPU was used in minigraph alignment #1254
Replies: 7 comments
-
I think most of the parallelism in minigraph -xggs comes from processing multiple chromosomes in parallel, so what you're seeing doesn't sound too surprising. For further details on that, you'll have to ask on the minigraph GitHub.
Minigraph-Cactus splits a lot of the work by chromosome automatically. So if running time is a concern, you are probably best to just run it on your entire input set at once.
-
Dear Glenn,
Thanks for your previous reply. For the Minigraph-Cactus pipeline, I ran into a memory issue at the cactus-graphmap-join step, which showed:
toil.batchSystems.abstractBatchSystem.InsufficientSystemResources: The job 'clip_vg' kind-clip_vg/instance-5m37i5ua v1 is requesting 597433447860 bytes of memory, more than the maximum of 300000000000 bytes of memory that SingleMachineBatchSystem was configured with, or enforced by --maxMemory. Scale is set to 1.0.
My command is:
cactus-pangenome \
./js.Chr09 \
./Chr09.evolver.txt \
--mapCores 90 \
--permissiveContigFilter 0.1 \
--maxLen 10000 --clip 10000 \
--outDir Chr09.out \
--outName Chr09 \
--reference Zhangshugang \
--vcf --giraffe --gfa --gbz --odgi
Since the pipeline took about 10 days to finally reach this step, I do not want to re-run all the previous steps. I simply re-ran cactus-graphmap-join with the following command:
cactus-graphmap-join js.Chr09 --vg chrom-alignments/Zhangshugang_Chr09.vg --hal chrom-alignments/Zhangshugang_Chr09.hal \
--outDir Chr09.out \
--outName Chr09 \
--maxMemory 300G --defaultCores 10 --maxCores 10 --scale 0.2 \
--reference Zhangshugang \
--clip 10000 --filter 2 \
--gbz \
--odgi \
--vcf \
--giraffe
However, the same memory issue is still there. Do you have any suggestions to solve this issue?
All the best,
Jiantao
---------------------------------------------------------
Postdoc, Boyce Thompson Institute
Cornell University
533 Tower Road, Ithaca, NY 14850
Email: ***@***.***
-
This is annoying. We need better overrides, where --maxMemory clamps the estimated limits of every single job. Toil doesn't support this. It would be nice to add this support to Cactus, but it would take a fair bit of work (this is a similar issue to #1187).
For this job in particular, it's asking for 20X the input size. So I guess Zhangshugang_Chr09.vg is about 30GB? That's pretty big and you may need a larger machine to index it. But it's also possible 300GB is okay, and Cactus is being too conservative. Out of curiosity, how long is your chromosome?
What I will do:
* (short term) Add some kind of option (may re-use --indexMemory) to limit memory requested by this job
* (long term) Hopefully get some better global memory controls in Cactus
What you can do as a work-around now:
Patch your Cactus as follows. Replace ./venv-cactus-v2.7.0 below with where you installed it:
sed -i $(find ./venv-cactus-v2.7.0 -name "cactus_graphmap_join.py") -e 's/ \* 20//g' -e 's/ \* 22//g' -e 's/ \* 32//g' -e 's/ \* 64//g'
and then rerun cactus-graphmap-join from the beginning. This removes all the big coefficients used for resource estimation.
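To make the numbers concrete: dividing the failing request of 597,433,447,860 bytes by the coefficient of 20 implies an input .vg of roughly 30 GB, and the sed patch simply strips those multipliers out of the memory-estimation expressions. A minimal sketch of both, assuming an illustrative estimation line (the `memory = input_size * 20` string is hypothetical, not verbatim Cactus source):

```python
import re

# clip_vg requests ~20x its input size; back out the implied input size
# from the request in the error message (illustrative arithmetic only).
requested = 597_433_447_860  # bytes, from the InsufficientSystemResources error
coefficient = 20
implied_input_gb = requested / coefficient / 1e9
print(f"implied input size: {implied_input_gb:.1f} GB")  # ~29.9 GB

# The suggested sed patch removes " * 20" (and 22/32/64) from the memory
# estimates; the same substitution expressed in Python:
line = "memory = input_size * 20"  # hypothetical line, for illustration
patched = re.sub(r" \* 20", "", line)
print(patched)  # memory = input_size
```

This is why the patched job falls back to requesting roughly the input size itself, which fits under the 300 GB limit.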
-
Dear Glenn,
Thanks for your timely reply. In our lab, we have several independent servers with about 512 GB of memory and 112 CPUs each. I am testing MC on various pepper accessions, where chromosome lengths range from 170 Mb to 450 Mb, with a total of about 50 genomes.
I tried adding --maxMemory 300G, but the memory issue is still there, so that does not seem to be a good solution.
As for your suggestion to modify cactus_graphmap_join.py, it seems the memory issue was solved and the program is running (I am still waiting to see whether there are any additional issues).
By the way, after modifying cactus_graphmap_join.py, could there be any other potential issues in the later steps?
Best,
Jiantao
-
Hi Glenn,
Following up on the same issue, the cactus-graphmap-join script hit another error:
[2024-01-03T02:41:48-0500] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-add4f6afcb83498595aaa57656a02cd7/Zhangshugang_Chr05.vg' to path '/data/zhaojiantao/pepper/pangenome/minigraphcactus/Chr05.out/af8d658b050a5a69b45c906a5db6e41c/943f/3a11/tmpwr52z1zt/Zhangshugang_Chr05.vg'
Traceback (most recent call last):
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/toil/worker.py", line 403, in workerScript
job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/toil/job.py", line 2774, in _runner
returnValues = self._run(jobGraph=None, fileStore=fileStore)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/toil/job.py", line 2691, in _run
return self.run(fileStore)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/toil/job.py", line 2919, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/cactus/refmap/cactus_graphmap_join.py", line 634, in clip_vg
cactus_call(parameters=cmd, outfile=clipped_path, job_memory=job.memory)
File "/data/zhaojiantao/tools/cactus/cactus/lib/python3.9/site-packages/cactus/shared/common.py", line 891, in cactus_call
raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command ['docker', 'run', '--interactive', '--net=host', '--log-driver=none', '-u', '0:0', '-v', '/data/zhaojiantao/pepper/pangenome/minigraphcactus/Chr05.out/af8d658b050a5a69b45c906a5db6e41c/943f/3a11/tmpwr52z1zt:/data', '--entrypoint', '/bin/bash', '--name', 'b1bb404e-c5c5-48b9-b47c-8a0480609c95', '--rm', 'quay.io/comparative-genomics-toolkit/cactus:v2.7.0', '-c', 'set -eo pipefail && clip-vg Zhangshugang_Chr05.vg.gfaffixed -f -e Zhangshugang -d _MINIGRAPH_ -L | vg clip -d 1 - -P Zhangshugang | vg ids -s -'] exited 1: stderr=
libgomp: Thread creation failed: Operation not permitted
error[VPKG::load_one]: Correct input type not found in standard input while loading handlegraph::MutablePathMutableHandleGraph
[2024-01-03T02:41:48-0500] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host apple
My command is:
cactus-graphmap-join js.Chr05 \
--vg chrom-alignments/Zhangshugang_Chr05.vg \
--hal chrom-alignments/Zhangshugang_Chr05.hal \
--outDir Chr05.out \
--outName Chr05 \
--reference Zhangshugang \
--clip 10000 --filter 2 \
--gbz --odgi --vcf --giraffe \
--workDir .
Do you have any ideas why this happened? Any suggestions on how to solve this issue?
Thanks,
Jiantao
-
Hard to say. I think this could happen if there's no path beginning with Zhangshugang in your input graph. (You can verify this with vg paths.)
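As a sketch of that check: `vg paths -L -v graph.vg` lists the path names in a graph, and you can scan that listing for a name starting with the reference prefix. The helper and sample path names below are invented for illustration; only the `vg paths` invocation in the comment is from the thread's toolchain:

```python
# Check whether any path name begins with the reference prefix, as
# suggested above. In practice you would feed this the output of:
#   vg paths -L -v Zhangshugang_Chr05.vg
def has_reference_path(path_listing: str, reference: str) -> bool:
    """Return True if any listed path name starts with the reference prefix."""
    return any(line.startswith(reference) for line in path_listing.splitlines())

# Invented sample output for illustration:
sample = "Zhangshugang#0#Chr05\nsample01#0#Chr05\nsample02#0#Chr05"
print(has_reference_path(sample, "Zhangshugang"))                # True
print(has_reference_path("sample01#0#Chr05", "Zhangshugang"))    # False
```

If the reference path is missing, the `vg clip -P Zhangshugang` step in the failing command would have no anchor path to preserve, which is consistent with the error above.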
-
Dear Glenn,
I also tried the yeast example data from the GitHub repository, and the pipeline ran properly. I am wondering whether it is possible to split the chrom-alignments/*.vg and chrom-alignments/*.hal files into smaller datasets and then run cactus-graphmap-join on each split?
Thanks,
Jiantao
-
Hi,
I recently tried Minigraph-Cactus on about 50 pepper genomes. To reduce the input size, I split the genomes into chromosomes and ran the MC pipeline for each chromosome separately. However, I found that at the minigraph alignment step only 1 CPU was used, even though the command line (minigraph -c -xggs -t 104 sample01.fa sample02.fa sample03.fa ...) was allowed all of the available CPUs (~100), and memory usage was only about 4%. Is there an explanation for this, or a way to run it faster so that most of the CPUs are used?
Thanks