The output csv file is empty #12

Closed
weir12 opened this issue Jun 3, 2020 · 9 comments


@weir12

weir12 commented Jun 3, 2020

Hi @adnaniazi,
Thank you for your contribution to this project. I am trying to apply tailfindr to my RNA data.
However, I got an abnormal result: every column in the output CSV file is empty except file_path.

read_id,tail_start,tail_end,samples_per_nt,tail_length,file_path
,,,,,/home/weir/tair_rawdata/fixed_rawdata/basecalled/col0/run1/workspace/0/GXB01159_20180404_FAH71487_GA30000_mux_scan_20180404_RDS03_YF3_57044_read_10_ch_171_strand.fast5
,,,,,/home/weir/tair_rawdata/fixed_rawdata/basecalled/col0/run1/workspace/0/GXB01159_20180404_FAH71487_GA30000_mux_scan_20180404_RDS03_YF3_57044_read_10_ch_168_strand.fast5
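
To see how many rows are affected, the symptom above (an empty read_id with only file_path filled in) can be counted straight from the CSV. This is only a rough sketch: the function name count_empty_rows is made up for illustration, and it assumes one record per line with read_id as the first column, as in the header shown above.

```shell
# Hypothetical helper: count data rows whose first column (read_id) is empty.
# Assumes a comma-separated file with a single header line.
count_empty_rows() {
    awk -F',' 'NR > 1 && $1 == "" { n++ } END { print n + 0 }' "$1"
}

# Example call (path taken from the log in this report):
# count_empty_rows /home/weir/output/polya/tailfinder_res/col0/1/rna_tails.csv
```

If the count equals the number of data rows, no read was processed at all, which points to a problem reading the fast5 files rather than to individual bad reads.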

One of the input fast5 files may help you find the cause of the problem:
GXB01159_20180404_FAH71487_GA30000_mux_scan_20180404_RDS03_YF3_57044_read_10_ch_171_strand.zip
Here is my find_tails call with its parameters:

df <- find_tails(fast5_dir = paste(basedir,sample,paste('run',batch,sep=''),'workspace',sep='/'),
                 save_dir = save_dir,
                 csv_filename = 'rna_tails.csv',
                 num_cores = 16,
                 save_plots = TRUE,
                 basecall_group ='Basecall_1D_001',
                 plot_debug_traces = TRUE,
                 plotting_library = 'rbokeh')

And here are the parameters I passed to Guppy during basecalling:

Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 3.5.2+5b7a51b, client-server API version 1.1.0

--flowcell FLO-MIN106 --kit SQK-RNA001 --recursive \
--num_callers 8 --cpu_threads_per_caller 2 --records_per_fastq 0 --compress_fastq --fast5_out --qscore_filtering --min_qscore 7

Next is the tailfindr log file:

── Started tailfindr (version 0.1.0) ───────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────
☰ You have configured tailfindr as following:
❯ fast5_dir:         /home/weir/tair_rawdata/fixed_rawdata/basecalled/col0/run1/workspace
❯ save_dir:          /home/weir/output/polya/tailfinder_res/col0/1
❯ csv_filename:      rna_tails.csv
❯ num_cores:         16
❯ basecall_group:    Basecall_1D_001
❯ save_plots:        TRUE
❯ plot_debug_traces: TRUE
❯ plotting_library:  rbokeh
── Processing started at 2020-06-03 00:51:09 ───────────────────────────────────
● Creating a sub-directory to save the plots in.
  Done! All plots will be saved in the following direcotry:
  /home/weir/output/polya/tailfinder_res/col0/1/plots
● Searching for all Fast5 files...
  Done! Found 988103 Fast5 files.
● Analyzing a single Fast5 file to assess if your data 
  is in an acceptable format...
  ✔ The data has been basecalled using Guppy.
  ✔ Flipflop model was used during basecalling.
  ✔ Every read is in a single fast5 file of its own.
  ✔ The experiment type is RNA, so we will search
    for poly(A) tails.
  ✔ The reads are 1D reads.
● Starting a parallel compute cluster...
  Done!
● Searching for Poly(A) tails...
(omitted processing)
● Formatting the tail data...
  Done!
● Saving the data in the CSV file...
  Done! Below is the path of the CSV file:
  /home/weir/output/polya/tailfinder_res/col0/1/rna_tails.csv
● A logfile containing all this information has been saved in this path: 
  /home/weir/output/polya/tailfinder_res/col0/1/2020-06-03_00-50-35_tailfinder.log
── Processing ended at 2020-06-03 06:09:24 ─────────────────────────────────────
✔ tailfindr finished successfully!

Hmm... the log file doesn't seem to report any problem.
I noticed that enabling_trimming has to be turned off for DNA samples.
Perhaps RNA samples need something similar.
I would really appreciate it if you could help me.
Thank you

@adnaniazi
Owner

Hi,

Thank you for reporting the issue in detail. Could you please provide me with 5 of your fast5 files? I tried to debug it using the one fast5 file that you provided, but it is not enough for me to debug the issue.

Thank you.

Adnan

@weir12
Author

weir12 commented Jun 3, 2020

THANK YOU!!!
fast5_files.zip
If you need any other data, do not hesitate to contact me :)

@adnaniazi
Owner

Great. Thank you!

I have now fixed the issue. Please uninstall tailfindr, and then install it again from the GitHub repo. Hopefully, it will work this time. If there is any problem again, please feel free to report it.

One more thing: Only generate the plots for a subset of your reads, and not the whole dataset -- unless you absolutely need to. This is because generating the plots takes a lot of time. But it is your choice.
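
One possible way to follow this advice is to copy a random sample of fast5 files into a separate directory and generate plots only for that directory. The sketch below is not part of tailfindr: sample_fast5, the sample size, and the directory names are hypothetical, and it assumes GNU shuf is available and that file names are unique across subdirectories (later copies would otherwise overwrite earlier ones).

```shell
# Hypothetical helper: copy N randomly chosen fast5 files from SRC
# (searched recursively) into DEST.
sample_fast5() {
    sf_src=$1
    sf_dest=$2
    sf_n=$3
    mkdir -p "$sf_dest"
    # shuf -n picks at most N random lines from the list of files found
    find "$sf_src" -type f -name '*.fast5' | shuf -n "$sf_n" |
    while IFS= read -r sf_file; do
        cp "$sf_file" "$sf_dest"/
    done
}

# Example call with made-up paths:
# sample_fast5 /path/to/workspace /path/to/plot_subset 100
```

find_tails can then be run once on the subset directory with save_plots = TRUE, and once on the full dataset with save_plots = FALSE.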

Wish you all the best!

Adnan

@weir12
Author

weir12 commented Jun 4, 2020

Sorry for the late reply; I was waiting for the process to finish so that I could evaluate the result.
This seems to be a CPU-intensive task that needs more threads and time.
But that's no problem. I can afford to be patient.

Thank you for your wise advice & your timely assistance.

@jon-xu

jon-xu commented Mar 31, 2022

Hi Adnaniazi,

I got the same issue as weir12 with the newest version.
Could you please give me some advice?

Please download the fast5 file example I used as input:
https://cloudstor.aarnet.edu.au/plus/s/WlxHDt1lVmsRO64

Thanks,
Jon

@adnaniazi
Owner

Hi Jon,

Your FAST5 seems to be okay. Can you please try running tailfindr on the data that comes with tailfindr to see if it works? Use this path for fast5_dir in your tailfindr command:
fast5_dir = system.file('extdata', 'rna', package = 'tailfindr')
See if the CSV file is empty for the internal dataset as well.

Adnan

@jon-xu

jon-xu commented Mar 31, 2022

Hi Adnan,

Just tried and with the built-in data it did output some results (attached).

Thanks,
Jon
example.csv

@adnaniazi
Owner

This means that you have not installed the VBZ plugin or have not configured it properly.

Please download the VBZ plugin for your OS from this link:
https://github.com/nanoporetech/vbz_compression/releases

Then extract it somewhere and make it discoverable by the HDF5 library by exporting the path like this (edit the path to match your extracted vbz folder):
export HDF5_PLUGIN_PATH=/bla/bla/bla_path/vbz/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin

That's it. Tailfindr should now work on your data.
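
Before rerunning, it may be worth confirming that the exported path really points at a directory containing the plugin. The following is only a sketch: check_vbz_plugin is a made-up helper, and matching on a file name containing "vbz" is an assumption about how the plugin library is named in the release archive.

```shell
# Hypothetical helper: verify that a plugin directory exists and holds a
# file whose name contains "vbz". Defaults to $HDF5_PLUGIN_PATH.
check_vbz_plugin() {
    vp_dir=${1:-${HDF5_PLUGIN_PATH:-}}
    if [ -z "$vp_dir" ]; then
        echo "HDF5_PLUGIN_PATH is not set"
        return 1
    fi
    if [ ! -d "$vp_dir" ]; then
        echo "not a directory: $vp_dir"
        return 1
    fi
    for vp_file in "$vp_dir"/*vbz*; do
        if [ -e "$vp_file" ]; then
            echo "found plugin candidate: $vp_file"
            return 0
        fi
    done
    echo "no vbz plugin file found in $vp_dir"
    return 1
}

# Example call, using the variable exported above:
# check_vbz_plugin
```

The variable has to be set in the same shell session from which R is started (or added to a shell startup file), since a later export does not reach an R session that is already running.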

@jon-xu

jon-xu commented Mar 31, 2022

I see, many thanks Adnan!
