Could not repeat a CpG extraction with the same reference file and its index #148

Open
sttongjai opened this issue Jul 8, 2023 · 2 comments

Comments

@sttongjai

Hello,

I was using MethylDackel, installed via Conda, to extract CpGs from a BAM file created by bwa-meth. My first extraction succeeded, but my second attempt, which was meant to repeat the first, failed.

I received an error message saying that the program could neither load the index file of the reference genome (hg38 without alternative chromosomes) nor build one. It also suggested that the reference genome file might be corrupted. However, the index file from the previous run was in the same directory as the reference genome.

Could you please let me know how to solve this problem?

Thank you very much,

Siripong

P.S. The packages installed in my conda environment, with their versions, are listed below.

Name             Version       Build       Channel
_libgcc_mutex    0.1           main
_openmp_mutex    5.1           1_gnu
bzip2            1.0.8         h7b6447c_0
c-ares           1.19.0        h5eee18b_0
ca-certificates  2023.05.30    h06a4308_0
htslib           1.12          h9093b5e_1  bioconda
krb5             1.19.4        h568e23c_0
libcurl          7.88.1        h91b91d3_0
libdeflate       1.17          h5eee18b_0
libedit          3.1.20221030  h5eee18b_0
libev            4.33          h7f8727e_1
libgcc-ng        11.2.0        h1234567_1
libgomp          11.2.0        h1234567_1
libnghttp2       1.52.0        ha637b67_1
libssh2          1.10.0        h37d81fd_2
libstdcxx-ng     11.2.0        h1234567_1
methyldackel     0.6.1         h22771d5_0  bioconda
ncurses          6.4           h6a678d5_0
openssl          1.1.1u        h7f8727e_0
xz               5.2.10        h5eee18b_1
zlib             1.2.13        h5eee18b_0

@akshaydinesh26

I am also facing the same issue. When the program was run separately for each file it worked, but it failed when run from a bash script.

@dpryan79
Owner

One "fun" issue is that if there is no index for the genome (a .fai file), then running this or any similar tool in parallel will result in each process recreating that file at the same time and overwriting each other's results. My guess is that something like that happened in both of these cases. There's really nothing that can be done within MethylDackel itself to guard against that; "just" ensure that the fasta file is indexed with samtools faidx before running.
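
A minimal sketch of that advice, assuming a wrapper script that takes the reference fasta as its first argument and the BAM files as the remaining ones. The function name `ensure_fai` and the script layout are hypothetical, not part of MethylDackel; only `samtools faidx <ref.fa>` and `MethylDackel extract <ref.fa> <alignments.bam>` are real commands. The index is built once, serially, before any parallel job starts.

```shell
#!/usr/bin/env bash
# Sketch only: index the reference once, then launch the extractions.
# ensure_fai is a hypothetical helper name chosen for this example.

# Create <ref>.fai only when it is missing or empty, so an existing
# index is never rewritten while other processes may be reading it.
ensure_fai() {
    if [ ! -s "$1.fai" ]; then
        samtools faidx "$1" || return 1
    fi
}

# Usage (assumed): bash this_script.sh ref.fa sample1.bam sample2.bam ...
if [ "$#" -ge 2 ]; then
    ref="$1"
    shift
    # Serial step: must finish before any background job is started.
    ensure_fai "$ref" || { echo "could not index $ref" >&2; exit 1; }
    for bam in "$@"; do
        MethylDackel extract "$ref" "$bam" &
    done
    # Wait for all background extractions to finish.
    wait
fi
```

Because the `.fai` file already exists when the background jobs start, none of them should try to create it, which is the race described above.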
