
never ending run #27

Open
romaingroux opened this issue Nov 8, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@romaingroux

Hi @PengNi,

I recently tried to run ccsmeth and hit an unexpected issue: the run never ends. I run ccsmeth from a Docker image on an entire human genome (the T2T v1.0 release), with a 42X Sequel II coverage dataset as input.

Here is the command:

ccsmeth call_mods --tseed 20221101 --input all.ccs.chm13v1_0.bam --ref chm13.draft_v1.0.fasta --model_file /ccsmeth/models/model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v1.ckpt --output all.ccs.chm13v1_0.CpG-5mC.call_mods --threads 30 --threads_call 3 --model_type attbigru2s --rm_per_readsite --mode align

The genome file is of regular size, 2.9 GB. The BAM file is 353 GB, which corresponds to a coverage of ~40X. It is sorted and indexed, so normally nothing is wrong on this end.

What happens:

The program starts and I see 30 processes spawning, as expected. I also see the creation of an all.ccs.chm13v1_0.CpG-5mC.call_mods.per_readsite.tsv file, which is actively written to and keeps getting bigger. Memory consumption grows steadily until it reaches a huge amount, then some reallocation seems to happen: the memory used by the processes drops suddenly and the Docker container enters a "brain death" state. Hardly anything happens in there anymore, and the run never resumes nor finishes (I waited up to one week).
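(For context on how I observe the memory behaviour: I watch container-level usage with generic commands like the ones below; nothing ccsmeth-specific, and the container/pod names are placeholders.)

docker stats <container_id>   # live per-container CPU and memory usage
kubectl top pod <pod_name>    # equivalent snapshot when running inside a Kubernetes pod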

What I expect:

To get an error message, or to have the run complete, instead of this "death" state.

I have tried sub-sampling heavily from the BAM file, down to a final 1X coverage, in order to run a test. In that case the run finishes and I get a modbam file. Is this simply a matter of input size? Should we run ccsmeth on separate chromosomes, or avoid big datasets altogether?
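In case it matters, the subsampling and a per-chromosome split can be done with plain samtools, e.g. (the fraction and chromosome name are only examples, nothing ccsmeth-specific):

samtools view -b -s 0.025 all.ccs.chm13v1_0.bam > all.ccs.1x.bam   # keep ~2.5% of reads (~1X out of ~40X)
samtools index all.ccs.1x.bam
samtools view -b all.ccs.chm13v1_0.bam chr1 > all.ccs.chr1.bam     # extract one chromosome from the sorted, indexed BAM
samtools index all.ccs.chr1.bam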

Thank you in advance

@PengNi
Owner

PengNi commented Nov 8, 2022

Hi @romaingroux, thank you very much for using ccsmeth and for reporting this issue. I haven't encountered this issue before; I will try to replicate it and fix it. Also, could you show me the log of this run, so I can check where it got stuck?

In the meantime, I guess it may be related to the RAM size. How much RAM and how many CPU cores does your machine have? Maybe fewer threads, e.g. --threads 20, will work. Also, I'd suggest running ccsmeth with a GPU if one is available: using only CPUs may be 100 times slower (though maybe there will be a faster, lightweight ccsmeth model in the future). If a GPU is not available, PacBio's primrose seems a better option, because it is super fast.
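For example, your command with fewer worker threads would simply be (same flags as in your report, only --threads lowered; 20 is just a guess, adjust to your machine):

ccsmeth call_mods --tseed 20221101 --input all.ccs.chm13v1_0.bam --ref chm13.draft_v1.0.fasta --model_file /ccsmeth/models/model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v1.ckpt --output all.ccs.chm13v1_0.CpG-5mC.call_mods --threads 20 --threads_call 3 --model_type attbigru2s --rm_per_readsite --mode align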

Best,
Peng

@romaingroux
Author

Thank you for this very rapid answer.

Regarding the machine specs, I actually run jobs in pods on a Kubernetes cluster. The pod was allocated 30 CPUs and 260 GB of RAM; the maximum RAM allocation I can request is 386 GB. So, as you proposed, I'll try increasing the RAM allocation and decreasing the number of "threads". I assume the parallelization is based on Python processes, am I right?
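(To check that assumption, I can look inside the running pod with generic Linux commands, e.g.:)

ps aux | grep "[c]csmeth"     # list the ccsmeth worker processes and their memory use (RSS column)
top -b -n 1 | head -n 40      # one-shot snapshot of per-process CPU and memory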

For the log, I don't have any. I'll try to produce one.

Finally, it's a bit unfortunate, but I don't have access to a GPU machine. Thanks for pointing this out anyway.

@PengNi added the bug label Nov 15, 2022