
Step 6 tmp/phase_output/phase_bam/.bam not found error #23

Closed
aragornwubo opened this issue Jun 4, 2021 · 10 comments

Comments

@aragornwubo

Hi,

Thank you for developing Clair3.

I have run into the same unexpected error both when running Clair3 from the Singularity image and when using the version installed via conda.
The command I used:

            run_clair3.sh \
            --bam_fn=${NANO_BAM} \
            --ref_fn=${REF} \
            --threads=32 \
            --platform="ont" \
            --model_path="./models/ont" \
            --output=${BASE}/CLAIR3_CONDA \
            --sample_name='HG002' \
            --chunk_size=10000000 \
            --include_all_ctgs

Errors of the following pattern occurred multiple times at the end of the run:

            [INFO] 6/7 Calling variants using Full Alignment
            [ERROR] file /scratch1/bwu4/NEW_XIAO/CLAIR3_CONDA/tmp/phase_output/phase_bam/.bam not found
            parallel: This job failed:
            python3 /home/bwu4/bin/Clair3/scripts/../clair3.py CallVarBam     --chkpnt_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/./models/ont/full_alignment     --bam_fn /scratch1/bwu4/NEW_XIAO/CLAIR3_CONDA/tmp/phase_output/phase_bam/''.bam     --call_fn /scratch1/bwu4/NEW_XIAO/CLAIR3_CONDA/tmp/full_alignment_output/full_alignment_''.vcf     --sampleName HG002     --vcf_fn EMPTY     --ref_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/HG002_SHA_RAG.fasta     --full_aln_regions ''     --ctgName ''     --add_indel_length     --phasing_info_in_bam     --gvcf False     --python python3     --pypy pypy3     --samtools samtools     --platform ont
            
            real    0m0.464s
            user    0m0.499s
            sys     0m0.252s
            cat: '/scratch1/bwu4/NEW_XIAO/CLAIR3_CONDA/tmp/full_alignment_output/full_alignment_*.vcf': No such file or directory
            [ERROR] No vcf file found, please check the setting

The following are the chromosomal names in my reference fasta file:

      >chr1_RagTag
      >chr10_RagTag
      >chr11_RagTag
      >chr12_RagTag
      >chr13_RagTag
      >chr14_RagTag
      >chr15_RagTag
      >chr16_RagTag
      >chr17_RagTag
      >chr18_RagTag
      >chr19_RagTag
      >chr2_RagTag
      >chr20_RagTag
      >chr21_RagTag
      >chr22_RagTag
      >chr3_RagTag
      >chr4_RagTag
      >chr5_RagTag
      >chr6_RagTag
      >chr7_RagTag
      >chr8_RagTag
      >chr9_RagTag
      >chrX_RagTag
      >chrY_RagTag

The tagged bams were successfully generated for all 24 chromosomes. Could you help me figure out what the problem is? Thank you very much.

Best,
Bo

@aragornwubo
Author

I have checked with Huangneng and this problem seems to be the same as #20.

@zhengzhenxian
Collaborator

We reopened the issue because it might have a different cause from #20. There seems to be an empty contig name that produced an invalid bam filename in the command. To help us pinpoint the problem, could you send the fasta index .fai file and the running log ${OUTPUT_DIR}/run_clair3.log to my email address zxzheng@cs.hku.hk? Much appreciated.
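For illustration, an empty contig name substituted into the per-contig bam path pattern produces exactly the ''.bam path seen in the error above (a hypothetical sketch, not Clair3's actual code):

```shell
#!/bin/bash
# An empty contig name yields a filename that is just ".bam".
CTG=""
BAM="tmp/phase_output/phase_bam/${CTG}.bam"
echo "${BAM}"   # prints: tmp/phase_output/phase_bam/.bam
```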

@aragornwubo
Author

aragornwubo commented Jun 5, 2021

Thank you for replying. After posting this issue I noticed an error in Step 1 in the log file, the same as in #20. I also found that the pileup vcf files were missing for most chromosomes, which is why I closed this issue. I'm trying a run with 8 threads now. I have sent the two files to your email; please check them.
One more thing: there is a small bug in run_clair3.sh at line 215, "if [[ ${THREADS} > ${MAX_THREADS} ]]", which caps THREADS at MAX_THREADS even when I use the '--threads=8' option, because > inside [[ ]] compares the operands as strings rather than as numbers. I think "if [[ ${THREADS} -gt ${MAX_THREADS} ]]" is the right version.
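The difference between the two operators can be demonstrated in a few lines of bash ("8" sorts after "3" lexicographically, so the string comparison wrongly reports 8 > 32):

```shell
#!/bin/bash
THREADS=8
MAX_THREADS=32

# String (lexicographic) comparison: "8" > "32" is true, because '8' sorts after '3'.
if [[ ${THREADS} > ${MAX_THREADS} ]]; then
    echo "string compare: 8 > 32 is true"
fi

# Numeric comparison: 8 is not greater than 32, so the else branch runs.
if [[ ${THREADS} -gt ${MAX_THREADS} ]]; then
    echo "numeric compare: 8 -gt 32 is true"
else
    echo "numeric compare: 8 -gt 32 is false"
fi
```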

@aragornwubo
Author

The program seemed to run successfully with 8 threads. However, a new error appeared during the run:

parallel: This job failed:
python3 /home/bwu4/bin/Clair3/scripts/../clair3.py CallVarBam --chkpnt_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/./models/ont/full_alignment --bam_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/CLAIR3_CONDA/tmp/phase_output/phase_bam/chr15_RagTag.bam --call_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/CLAIR3_CONDA/tmp/full_alignment_output/full_alignment_chr15_RagTag.26_64.vcf --sampleName HG002 --vcf_fn EMPTY --ref_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/HG002_SHA_RAG.fasta --full_aln_regions /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/CLAIR3_CONDA/tmp/full_alignment_output/candidate_bed/chr15_RagTag.26_64 --ctgName chr15_RagTag --add_indel_length --phasing_info_in_bam --gvcf False --python python3 --pypy pypy3 --samtools samtools --platform ont

Will this error have an effect on the final output? Thank you.

@aquaskyline
Member

aquaskyline commented Jun 6, 2021

Hi, if you rerun this failed command individually, does it run successfully?

@aragornwubo
Author

It runs successfully.

@aquaskyline
Member

Many thanks for the feedback. It looks like a system resource limit has been exceeded. We are looking into the problem. What's the printout of ulimit -a in your running environment?

@aragornwubo
Author

      [bwu@node0183 chr2_RagTag_clair3_filt]$ ulimit -a
      core file size          (blocks, -c) unlimited
      data seg size           (kbytes, -d) unlimited
      scheduling priority             (-e) 0
      file size               (blocks, -f) unlimited
      pending signals                 (-i) 1540728
      max locked memory       (kbytes, -l) unlimited
      max memory size         (kbytes, -m) 387973120
      open files                      (-n) 16384
      pipe size            (512 bytes, -p) 8
      POSIX message queues     (bytes, -q) 819200
      real-time priority              (-r) 0
      stack size              (kbytes, -s) unlimited
      cpu time               (seconds, -t) unlimited
      max user processes              (-u) 1540728
      virtual memory          (kbytes, -v) unlimited
      file locks                      (-x) unlimited

@aquaskyline
Member

aquaskyline commented Jun 9, 2021

The reason some jobs failed is that Clair3 was requesting more processes than the user environment allows (ulimit -u). We have added more running-environment checks and automatic retries in v0.1-r3.

Clair3 uses TensorFlow and PyPy, and these libraries open quite a few threads in each running instance. The THREADS parameter controls how many Clair3 instances can run concurrently, but each instance, as we've measured, consumes up to 40-50 processes at peak. The number of processes a user can create is limited, and the limit can be checked with ulimit -u. On an Ubuntu system the limit is usually over 10k (unless deliberately reduced), thus not a problem. But on RedHat or CentOS, which are commonly used in grids and institutions, the limit is usually 1024 or 2048, so setting THREADS above 20 will hit the limit at some point. Setting ulimit -u to a higher number solves the problem, but that requires root privileges (or a blessing from the system admin team).

In v0.1-r3, we check ulimit -u and lower THREADS accordingly. We also added automatic retries of failed jobs before reporting them to the user.
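The check described above can be sketched in a few lines of bash. This is an illustrative sketch, not Clair3's actual code: the 45-processes-per-instance figure is taken from the 40-50 peak mentioned above, and the variable names are made up.

```shell
#!/bin/bash
# Sketch: cap THREADS so that concurrent instances stay within the
# per-user process limit reported by `ulimit -u`.
THREADS=32
PROCS_PER_INSTANCE=45          # assumed peak processes per instance (40-50 observed)
MAX_PROCS=$(ulimit -u)         # per-user process limit; may be "unlimited"

if [[ "${MAX_PROCS}" != "unlimited" ]]; then
    ALLOWED=$(( MAX_PROCS / PROCS_PER_INSTANCE ))
    if [[ ${THREADS} -gt ${ALLOWED} ]]; then
        echo "[WARNING] lowering THREADS from ${THREADS} to ${ALLOWED} (ulimit -u = ${MAX_PROCS})"
        THREADS=${ALLOWED}
    fi
fi
echo "Using THREADS=${THREADS}"
```

With a typical CentOS limit of 1024, this would cap THREADS at 22, which matches the observation that values above 20 start to fail.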

@aragornwubo
Copy link
Author

Thank you very much. Since the problem is solved, I'm going to close this issue and try a run with the new version.
