
Error when running call_mods #8

Closed
Naish-M opened this issue Jun 18, 2021 · 11 comments
Labels: enhancement (New feature or request)

Comments

Naish-M commented Jun 18, 2021

Hi

I ran the following command, but numpy seemingly throws an error and all the reads then fail.
Any idea what may be causing this?

Thanks,
Matt

CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods --input_path 30_90_single_fast5/95/ --model_path model.dp2.CHH.arabnrice2-1_R9.4plus_tem.bn13_sn16.denoise_signal_bilstm.both_bilstm.b13_s16_epoch7.ckpt --result_file fast5s.CHH.call_mods.tsv --corrected_group RawGenomeCorrected_000 --reference_path t2t-col.20210610.fasta --motifs CHH --nproc 8 --nproc_gpu 2

====================================================
[main]call_mods starts..
4000 fast5 files in total..
parse the motifs string..
read genome reference file..
read position file if it is not None..
read_fast5 process-3095 starts
read_fast5 process-3093 starts
read_fast5 process-3092 starts
read_fast5 process-3096 starts
read_fast5 process-3094 starts
read_fast5 process-3091 starts
call_mods process-3098 starts
write_process-3099 starts
read_fast5 process-3090 starts
call_mods process-3097 starts
/home/nm359/anaconda3/envs/deepsignalpenv_cuda11_2/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/nm359/anaconda3/envs/deepsignalpenv_cuda11_2/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/home/nm359/anaconda3/envs/deepsignalpenv_cuda11_2/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/nm359/anaconda3/envs/deepsignalpenv_cuda11_2/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
read_fast5 process-3094 ending, proceed 420 fast5s
read_fast5 process-3092 ending, proceed 720 fast5s
read_fast5 process-3091 ending, proceed 320 fast5s
read_fast5 process-3095 ending, proceed 800 fast5s
read_fast5 process-3090 ending, proceed 180 fast5s
read_fast5 process-3093 ending, proceed 880 fast5s
read_fast5 process-3096 ending, proceed 680 fast5s
call_mods process-3098 ending, proceed 0 batches
call_mods process-3097 ending, proceed 0 batches
write_process-3099 finished
4000 of 4000 fast5 files failed..
[main]call_mods costs 14.04 seconds..

PengNi (Owner) commented Jun 19, 2021

Hi @Naish-M , thanks for your interest in our tool. Did you run guppy_basecaller and tombo resquiggle before running deepsignal_plant call_mods? I think the problem here is more likely caused by an unsuccessful tombo run than by numpy.
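
For reference, the usual preprocessing looks roughly like the commands below (a sketch only: the guppy config, process counts and output paths are placeholders, and your own steps may differ):

# basecall the single-read fast5s (config is a placeholder, pick the one matching your flow cell/kit)
guppy_basecaller -i 30_90_single_fast5/95/ -s guppy_out --config dna_r9.4.1_450bps_hac.cfg --fast5_out
# annotate the raw fast5s with the basecalls, then re-squiggle against the reference
tombo preprocess annotate_raw_with_fastqs --fast5-basedir 30_90_single_fast5/95/ --fastq-filenames guppy_out/*.fastq --processes 8
tombo resquiggle 30_90_single_fast5/95/ t2t-col.20210610.fasta --processes 8 --corrected-group RawGenomeCorrected_000 --overwrite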

Best,
Peng

Naish-M (Author) commented Jun 21, 2021

Hi @PengNi , thanks for the reply - yes, I did run the previous steps before call_mods, and the CG and CHG calls both completed successfully. While running CHH the process was killed (I think it exceeded the RAM limit), and I got this error when I tried to restart call_mods - maybe that did something to the fast5s? It seems to have affected all files involved, though: I tried a random selection of the fast5s and they all gave the same result. I will re-run the resquiggle to see if that resolves it.

Also, do you have any benchmarks on GPU vs CPU processing speed, or on the RAM requirements per CPU/GPU requested?

Thanks,
Matt

PengNi (Owner) commented Jun 21, 2021

@Naish-M , we haven't completed a benchmark of running time and memory usage yet. In our analysis, at least 70 GB of RAM is required for an A. thaliana run.
If the issue is caused by a CPU/GPU memory limit, I suggest trying a smaller --batch_size and --f5_batch_size (e.g., --batch_size 128 --f5_batch_size 4).
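
For example, that would be roughly your original command with only the two batch-size options added:

CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods --input_path 30_90_single_fast5/95/ --model_path model.dp2.CHH.arabnrice2-1_R9.4plus_tem.bn13_sn16.denoise_signal_bilstm.both_bilstm.b13_s16_epoch7.ckpt --result_file fast5s.CHH.call_mods.tsv --corrected_group RawGenomeCorrected_000 --reference_path t2t-col.20210610.fasta --motifs CHH --nproc 8 --nproc_gpu 2 --batch_size 128 --f5_batch_size 4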

Best,
Peng

Naish-M (Author) commented Jun 22, 2021

Hi @PengNi,

I re-ran resquiggle on the fast5s (it completed fine), but this did not resolve the error and I still get the

/home/nm359/anaconda3/envs/deepsignalpenv/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/nm359/anaconda3/envs/deepsignalpenv/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars

errors, and all the fast5s then fail. I have also tried creating a new environment, but that doesn't resolve the issue either.

Best,
Matt

PengNi (Owner) commented Jun 22, 2021

@Naish-M , did you try a smaller --batch_size and --f5_batch_size? These two args can reduce some of deepsignal-plant's memory usage. Also, can you tell me how much RAM and GPU memory your machine has?

Best,
Peng

Naish-M (Author) commented Jun 22, 2021

@PengNi - yes, I did try that, and I can see that neither RAM nor GPU memory was exceeded while it was running - RAM is 32 GB and GPU memory is 10 GB.

But the error persists and all the fast5s still fail.

Best, Matt

PengNi (Owner) commented Jun 22, 2021

@Naish-M , can you share one or two fast5s, so that I can do some tests?

Best,
Peng

Naish-M (Author) commented Jun 22, 2021

PengNi (Owner) commented Jun 23, 2021

Hi @Naish-M , this seems to be a VBZ compression issue (refer to tombo#254). In all 5 fast5s I tested, some data groups ([Events], [Signal]) are compressed and thus can't be read by h5py, which is used by both tombo and deepsignal-plant.

So here is what I did (refer to tombo#254):


  1. install hdf5-tools
sudo apt-get install hdf5-tools
  2. download ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz and set HDF5_PLUGIN_PATH
wget https://github.com/nanoporetech/vbz_compression/releases/download/v1.0.1/ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz
tar zxvf ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz
export HDF5_PLUGIN_PATH=/absolute/path/to/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin

NOTE: For most users, the above two steps (maybe only step 2) are enough for running tombo and deepsignal-plant. If that doesn't work, the fast5 files may need to be repacked using the following steps (steps 3 and 4).

  3. repack all fast5s in a directory
find 96 -name "*.fast5" | xargs -P 10 -I % h5repack -v -f GZIP=1 % %.unzip.fast5
  4. if the repacking succeeds (without warnings), move the new fast5s into a new directory
mkdir 96_new
mv 96/*.unzip.fast5 96_new

The new fast5s can be successfully processed by tombo and deepsignal-plant.
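
If needed, a quick way to check that a repacked file is readable is to list its datasets with the hdf5-tools installed in step 1 (a sketch; substitute any one of the repacked read files for the placeholder name):

h5ls -r 96_new/<some_read>.unzip.fast5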

Also, in my test, tombo resquiggle does not work before repacking either, yet there are already re-squiggle groups ([RawGenomeCorrected_000]) in the fast5s. In your case, did you process the fast5s with any vbz-related action after tombo resquiggle?
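
In case it helps to check which files are affected, the compression filter on the datasets can be inspected with h5dump (a sketch; <some_read>.fast5 is a placeholder, and VBZ normally shows up as a user-defined filter with ID 32020):

h5dump -p -H <some_read>.fast5 | grep -A 4 FILTERS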

Hope that can help!

Best,
Peng

Naish-M (Author) commented Jun 23, 2021

Hi @PengNi,

Thanks for having a look. I have already incorporated the fix for the vbz compression by adding the plugin path (export HDF5_PLUGIN_PATH=/absolute/path/to/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin), which was necessary before using tombo. After that plugin was added, the resquiggle process and the CG and CHG call_mods runs all worked successfully without error. I also resquiggled all the fast5s again this week (with no error) to see if that resolved the problem, but it didn't. After the tombo resquiggle step I moved directly on to call_mods as per the Quickstart guide.

I will try the unzipping to see if that solves the issue.

Best,
Matt

PengNi added the enhancement (New feature or request) label on Jul 10, 2021
PengNi (Owner) commented Jul 10, 2021

vbz_compression issue #5
