Can I use TRUST4 to deal with different length of reads? #39

Closed
hz1010 opened this issue May 17, 2021 · 10 comments

Comments

@hz1010

hz1010 commented May 17, 2021

Hi!
TRUST4 is great work. I have a my.fastq file whose reads are of different lengths (e.g. the first read is 1000 bp, the second read is 1001 bp). Can I run TRUST4 on it directly, or do I need to trim the reads to the same length first?

Thanks

@mourisl
Collaborator

mourisl commented May 17, 2021

Yes, TRUST4 works with such FASTQ files, including files where read lengths vary within a single file.

@hz1010
Author

hz1010 commented May 17, 2021

That's very kind of you. I have one more question: my fastq files come from third-generation sequencing (Nanopore). Does TRUST4 still work?

@mourisl
Collaborator

mourisl commented May 17, 2021

I have tested with PacBio HiFi data and it works fine. For your Nanopore data, you need to correct the sequencing errors first.

@hz1010
Author

hz1010 commented May 17, 2021

Thank you for your suggestion! However, I encountered an error when running TRUST4: failed: 256 at ./run-trust4 line 47. I don't know what it means or how to solve it.

@mourisl
Collaborator

mourisl commented May 17, 2021

Can you show me the running log? What is the length of the longest read in your file?
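
For readers who want to answer the read-length question for their own file, here is a minimal Python sketch. It is not part of TRUST4; the function names are hypothetical, and it assumes a standard 4-line-per-record FASTQ (no wrapped sequence lines), optionally gzip-compressed.

```python
import gzip
from typing import Dict, Iterable, Iterator, TextIO


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line FASTQ stream."""
    for line_number, line in enumerate(handle):
        # Assumption: line 2 of every 4-line record holds the whole sequence.
        if line_number % 4 == 1:
            yield len(line.strip())


def summarize(lengths: Iterable[int]) -> Dict[str, int]:
    """Return the read count plus the shortest and longest read length."""
    values = list(lengths)
    if not values:
        return {"reads": 0, "min": 0, "max": 0}
    return {"reads": len(values), "min": min(values), "max": max(values)}


def summarize_fastq(path: str) -> Dict[str, int]:
    """Summarize read lengths of a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize(read_lengths(handle))
```

Calling `summarize_fastq("my.fastq")` returns a dictionary whose `max` entry is the length of the longest read.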

@hz1010
Author

hz1010 commented May 17, 2021

The length of the longest read in my file is 57798 bp. Running log:
[Mon May 17 22:16:05 2021] TRUST4 begins.
[Mon May 17 22:16:05 2021] SYSTEM CALL: /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/fastq-extractor -t 1 -f /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/human_IMGT+C.fa -o test1_toassemble -u /xtdisk/jiangl_group/liyun/data/RNA-test/VDJ-ONT-10Xsp-sm_20210322/ONT/fastq_pass/PAG50586_pass_barcode05_3c2e0e42_0.fastq
[Mon May 17 22:16:05 2021] Start to extract candidate reads from read files.
[Mon May 17 22:16:12 2021] Finish extracting reads.
[Mon May 17 22:16:13 2021] SYSTEM CALL: /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/trust4 -f /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/human_IMGT+C.fa -o test1 -u test1_toassemble.fq
[Mon May 17 22:16:13 2021] SYSTEM CALL: /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/annotator -f /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/human_IMGT+C.fa -a test1_final.out -t 1 -o test1 -r test1_assembled_reads.fa > test1_annot.fa
Need to use -a to specify the assembly file.
system /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/annotator -f /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/human_IMGT+C.fa -a test1_final.out -t 1 -o test1 -r test1_assembled_reads.fa > test1_annot.fa failed: 256 at /xtdisk/jiangl_group/huangzh/software/TRUST4-1/TRUST4/run-trust4 line 47.

@mourisl
Collaborator

mourisl commented May 17, 2021

TRUST4 can handle read lengths up to 100K. It seems that in your data TRUST4 could not find any reads from the VDJ region, so the file test1_final.out was empty and the downstream annotation step failed. Just to make sure: has your data been error-corrected? With the error rate of raw Nanopore data, I don't think TRUST4 can identify hits to the V, J, and C genes.

The reads also look much longer than the gene sequences. Are they from other genes, or from DNA-based sequencing?
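
The failure mode described above (no candidate reads, hence an empty assembly file, hence the annotator error) can be checked before the annotation step. The following is a minimal Python sketch, not part of TRUST4: the helper names are hypothetical, and the file suffixes `_toassemble.fq` and `_final.out` are taken from the single-end run in the log above, so they are an assumption for other run modes.

```python
import os


def is_nonempty_file(path: str) -> bool:
    """True if the path is an existing file with at least one byte."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def count_fastq_records(path: str) -> int:
    """Count records in a 4-line FASTQ file; a missing file counts as 0."""
    if not os.path.isfile(path):
        return 0
    with open(path) as handle:
        return sum(1 for line in handle if line.strip()) // 4


def diagnose(prefix: str) -> str:
    """Report which intermediate output of a run with this prefix is empty."""
    # Assumed name of the extracted candidate reads (seen in the log).
    if count_fastq_records(prefix + "_toassemble.fq") == 0:
        return "no candidate reads"
    # Assumed name of the assembly file passed to the annotator with -a.
    if not is_nonempty_file(prefix + "_final.out"):
        return "no assembly"
    return "ok"
```

For the run in the log, `diagnose("test1")` would be called from the working directory. A result of `"no candidate reads"` or `"no assembly"` points at the input data rather than at the annotator itself.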

@hz1010
Author

hz1010 commented May 18, 2021

Thanks a lot for your help. It seems something is wrong with my sequencing data; I will check my files again.

@hz1010
Author

hz1010 commented May 20, 2021

Hello!
In my file, if VDJ regions are repeated within one read, can TRUST4 still find them? Or do I need to extract the repeated regions before running TRUST4?
Thanks.

@mourisl
Collaborator

mourisl commented May 20, 2021

If they are identical repeats, I think TRUST4 will pick the one that shows up first. If not, it may use V, J, C gene coordinates from different repeat structures and therefore be unable to identify the CDR3. To be safe, I think you should preprocess the data first.

mourisl closed this as completed Jul 22, 2021