preprocessing for tombo #429

priyanagpal25 · 2023-04-12T10:16:34Z

Hi,
I am learning to use Tombo for my analysis. As suggested in the documentation, I followed these steps:

1. Converted multi-read FAST5 files to single-read FAST5 files.
2. Preprocessed these single-read FAST5 files.

The following commands were used:

multi_to_single_fast5 -i /home/akhilesh/Desktop/priya/Tombo/barcode01_fast5/ -s /home/akhilesh/Desktop/priya/Tombo/single_read/barcode01 -t 10
| 12 of 12|##################################################|100% Time: 0:00:07

$ tombo preprocess annotate_raw_with_fastqs --fast5-basedir /home/akhilesh/Desktop/priya/Tombo/single_read/barcode01/ --fastq-filenames /home/akhilesh/Desktop/priya/Tombo/barcode01_combined.fastq --overwrite
[20:07:26] Preparing reads and extracting read identifiers.
100%|███████████████████████████████████| 42313/42313 [00:18<00:00, 2257.30it/s]
[20:07:45] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:01, ?it/s]
[20:07:46] Added sequences to a total of 0 reads.

As shown above, preprocessing fails: no sequences are added to any reads.

Thanks
Priya

bayraktar1 · 2023-05-19T11:02:44Z

Hi,

Try fixing your sequencing_summary.txt after converting from multi to single fast5 with this code:

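with open("sequencing_summary.txt") as file, open("sequencing_summary_fix.txt", "w") as outfile:
    # Copy the header line unchanged.
    header = next(file)
    outfile.write(header)
    for line in file:
        fields = line.split()
        # Replace column 0 with "<column 1>.fast5".
        fields[0] = f"{fields[1]}.fast5"
        # Join with tabs and end the row with a newline (no trailing tab).
        outfile.write("\t".join(fields) + "\n")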
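
If the column order in your summary differs, the same fix can be keyed on header names instead of positions. This is only a minimal sketch: `fix_summary` is a hypothetical helper, and it assumes the summary is tab-separated, has columns named `filename` and `read_id` (check your own header, the names vary between basecaller versions), and that the single-read files are named `<read_id>.fast5`.

```python
import csv


def fix_summary(src, dst, filename_col="filename", id_col="read_id"):
    """Rewrite filename_col so that every row points at <read_id>.fast5.

    Column names are assumptions; pass the ones found in your header.
    """
    with open(src, newline="") as infile, open(dst, "w", newline="") as outfile:
        reader = csv.DictReader(infile, delimiter="\t")
        fields = reader.fieldnames or []
        if filename_col not in fields or id_col not in fields:
            raise KeyError(f"expected columns {filename_col!r} and {id_col!r}")
        writer = csv.DictWriter(
            outfile, fieldnames=fields, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for row in reader:
            # Point the filename column at the single-read file for this read.
            row[filename_col] = f"{row[id_col]}.fast5"
            writer.writerow(row)
```

Usage would be `fix_summary("sequencing_summary.txt", "sequencing_summary_fix.txt")`, with the two column names overridden if your header uses different ones.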

keenhl · 2023-11-10T20:58:40Z

Thanks for providing the code. I modified it as below and the sequencing summary file seems to be correct now, but I'm still getting an error.

[14:54:32] Getting read filenames.
[14:54:32] Parsing sequencing summary files.
******************** WARNING ********************
Some FASTQ records from sequencing summaries do not appear to have a matching file.
[14:54:35] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:00, ?it/s]
[14:54:35] Added sequences to a total of 0 reads.

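with open("sequencing_summary.txt") as file, open("sequencing_summary_fix.txt", "w") as outfile:
    # Copy the header line unchanged.
    header = next(file)
    outfile.write(header)
    for line in file:
        fields = line.split()
        # Replace column 1 with "<column 3>.fast5".
        fields[1] = f"{fields[3]}.fast5"
        # Strip the ".gz" suffix from column 0.
        fields[0] = fields[0].replace(".gz", "")
        # Join with tabs and end the row with a newline (no trailing tab).
        outfile.write("\t".join(fields) + "\n")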
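
Since the warning says the FASTQ read identifiers are not found in any FAST5 file, one check is whether the two sets of IDs overlap at all. Below is a rough sketch of that check, not something taken from Tombo. The helper names are made up, and it assumes an uncompressed FASTQ with four lines per record, with the read ID as the first token after `@`, and single-read FAST5 files named `<read_id>.fast5`.

```python
from pathlib import Path


def fastq_read_ids(fastq_path):
    """Collect read IDs from the header line of each 4-line FASTQ record."""
    ids = set()
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.startswith("@"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def fast5_read_ids(fast5_dir):
    """Use each *.fast5 file name (without extension) as its read ID."""
    return {path.stem for path in Path(fast5_dir).rglob("*.fast5")}


def compare_ids(fastq_path, fast5_dir):
    """Count the IDs on each side and how many appear on both."""
    fq = fastq_read_ids(fastq_path)
    f5 = fast5_read_ids(fast5_dir)
    return {"fastq": len(fq), "fast5": len(f5), "shared": len(fq & f5)}
```

If `shared` comes back as 0, the FASTQ and the FAST5 directory probably come from different reads or use different ID formats, and editing the sequencing summary alone is unlikely to help.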
