preprocessing for tombo #429

Open
priyanagpal25 opened this issue Apr 12, 2023 · 2 comments

@priyanagpal25

Hi,
I am learning to use Tombo for my analysis. As suggested in the documentation, I followed these steps:

  1. Converted the multi-read FAST5 files to single-read FAST5 files.
  2. Preprocessed these single-read FAST5 files.

The following commands were used:

multi_to_single_fast5 -i /home/akhilesh/Desktop/priya/Tombo/barcode01_fast5/ -s /home/akhilesh/Desktop/priya/Tombo/single_read/barcode01 -t 10
| 12 of 12|##################################################|100% Time: 0:00:07

$ tombo preprocess annotate_raw_with_fastqs --fast5-basedir /home/akhilesh/Desktop/priya/Tombo/single_read/barcode01/ --fastq-filenames /home/akhilesh/Desktop/priya/Tombo/barcode01_combined.fastq --overwrite
[20:07:26] Preparing reads and extracting read identifiers.
100%|███████████████████████████████████| 42313/42313 [00:18<00:00, 2257.30it/s]
[20:07:45] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:01, ?it/s]
[20:07:46] Added sequences to a total of 0 reads.

I am having trouble with the preprocessing step: as shown above, no sequences were added to any reads.

Thanks
Priya

@bayraktar1

Hi,

Try fixing your sequencing_summary.txt after converting from multi to single fast5 with this code:

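with open("sequencing_summary.txt") as file, open("sequencing_summary_fix.txt", "w") as outfile:
    # Copy the header line unchanged.
    header = next(file)
    outfile.write(header)
    for line in file:
        line = line.split()
        # Replace column 0 with "<value of column 1>.fast5".
        line[0] = f"{line[1]}.fast5"
        # Join with tabs and end the row with a newline (no trailing tab).
        outfile.write("\t".join(line) + "\n")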
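
If the column order in your summary file differs, a variant that looks columns up by name may be safer than fixed indices. This is only a sketch: the function name fix_summary is made up for illustration, and it assumes the summary is tab-separated with a "read_id" column and a FAST5 filename column called either "filename" or "filename_fast5". Check the header of your own file before relying on it.

```python
import csv


def fix_summary(src, dst, id_col="read_id"):
    """Rewrite the FAST5 filename column so each row reads <read_id>.fast5.

    Column names are assumptions; adjust them to match your header.
    Returns the number of data rows written.
    """
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        fields = reader.fieldnames or []
        # Assumed names for the FAST5 filename column.
        if "filename_fast5" in fields:
            name_col = "filename_fast5"
        elif "filename" in fields:
            name_col = "filename"
        else:
            raise ValueError("no FAST5 filename column found in header")
        if id_col not in fields:
            raise ValueError(f"no {id_col!r} column found in header")

        writer = csv.DictWriter(
            fout, fieldnames=fields, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        rows = 0
        for row in reader:
            # Point the row at the single-read file named after the read ID.
            row[name_col] = f"{row[id_col]}.fast5"
            writer.writerow(row)
            rows += 1
    return rows
```

Usage would be something like fix_summary("sequencing_summary.txt", "sequencing_summary_fix.txt"). All other columns are written back unchanged.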

@keenhl

keenhl commented Nov 10, 2023

Thanks for providing the code. I modified it as below and the sequencing summary file seems to be correct now, but I'm still getting an error.

[14:54:32] Getting read filenames.
[14:54:32] Parsing sequencing summary files.
******************** WARNING ********************
Some FASTQ records from sequencing summaries do not appear to have a matching file.
[14:54:35] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:00, ?it/s]
[14:54:35] Added sequences to a total of 0 reads.

with open("sequencing_summary.txt") as file, open("sequencing_summary_fix.txt", "w") as outfile:
    header = next(file)
    outfile.write(header)
    for line in file:
        line = line.split()
        # Replace column 1 with "<value of column 3>.fast5".
        line[1] = f"{line[3]}.fast5"
        # Strip the ".gz" suffix from column 0.
        line[0] = line[0].replace(".gz", "")
        outfile.write("\t".join(line) + "\n")
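
Since both logs end with "Added sequences to a total of 0 reads", it may be worth checking directly whether the read identifiers in the FASTQ match the single-read FAST5 files at all. The sketch below is not part of Tombo and the function names are hypothetical. It assumes the FASTQ uses plain four-line records and that multi_to_single_fast5 names each output file <read_id>.fast5; verify both for your data.

```python
import gzip
from pathlib import Path


def fastq_read_ids(fastq_path):
    """Collect read IDs from the header line of each 4-line FASTQ record."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    ids = set()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.startswith("@"):
                parts = line[1:].split()
                if parts:
                    # The read ID is the first token after "@".
                    ids.add(parts[0])
    return ids


def fast5_read_ids(fast5_basedir):
    """Collect file stems of all .fast5 files under the base directory.

    Assumption: each single-read file is named <read_id>.fast5.
    """
    return {p.stem for p in Path(fast5_basedir).rglob("*.fast5")}


def compare(fastq_path, fast5_basedir):
    """Count IDs on each side and how many appear on both."""
    fq = fastq_read_ids(fastq_path)
    f5 = fast5_read_ids(fast5_basedir)
    return {"fastq": len(fq), "fast5": len(f5), "shared": len(fq & f5)}
```

If "shared" comes back as 0 under these assumptions, the FASTQ and the FAST5 directory probably do not describe the same reads, and the sequencing summary is unlikely to be the only problem.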
