Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trust-smartseq.pl Output #258

Closed
jakob-arnold opened this issue Apr 5, 2024 · 11 comments
Closed

trust-smartseq.pl Output #258

jakob-arnold opened this issue Apr 5, 2024 · 11 comments

Comments

@jakob-arnold
Copy link

jakob-arnold commented Apr 5, 2024

Hi,

I ran the trust 4 smartseq wrapper to analyze my FLASH-seq sequencing data (that should work pretty much the same I think).

perl /home/jakob/miniconda3/bin/trust-smartseq.pl -1 $R1 -2 $R2 -t 32 -f $BCRTCR --ref $IMGT -o "$OUT/VDJ/TRUST"

The code works and I get a proper TRUST_report.tsv and TRUST_annot.fa output. However, the script also produces a TRUST_airr.tsv output file, which is empty. Did something go wrong there? I'm not sure, but I think I need the airr format for downstream analysis of TCR clones with scRepertoire in R for example?

Some additional information: Each cell has its own FASTQ files, whose absolute file paths are provided in .txt files ($R1 and $R2).

@mourisl
Copy link
Collaborator

mourisl commented Apr 5, 2024

Can you show me a few lines of the report file and annot file? Do you see any error messages in the running log?

@jakob-arnold
Copy link
Author

jakob-arnold commented Apr 6, 2024

Of course, thank you for your reply!

TRUST_annot.fa
image

TRUST_report.tsv (I imported the tsv in excel, that's why the frequency values are wrong)
image

I just checked, from what I saw there appeared to be no error messages. Small excerpt:
[Sun Mar 31 13:16:46 2024] Found 1175 reads. [Sun Mar 31 13:16:46 2024] Finish sorting the reads. [Sun Mar 31 13:16:46 2024] Finish rough annotations. [Sun Mar 31 13:16:46 2024] Assembled 506 reads. [Sun Mar 31 13:16:46 2024] Try to rescue 26 reads for assembly. [Sun Mar 31 13:16:46 2024] Rescued 4 reads. [Sun Mar 31 13:16:46 2024] SYSTEM CALL: /home/jakob/miniconda3/bin/annotator -f /home/jakob/genome_human/human_IMGT+C.fa -a tmp_smartseq_final.out -t 28 -o tmp_smartseq -r tmp_smartseq_assembled_reads.fa --airrAlignment > tmp_smartseq_annot.fa [Sun Mar 31 13:16:46 2024] Start to annotate assemblies. [Sun Mar 31 13:16:46 2024] Start to realign reads for CDR3 analysis. [Sun Mar 31 13:16:46 2024] Compute CDR3 abundance. [Sun Mar 31 13:16:46 2024] Finish annotation. [Sun Mar 31 13:16:46 2024] SYSTEM CALL: perl /home/jakob/miniconda3/bin/trust-simplerep.pl tmp_smartseq_cdr3.out > tmp_smartseq_report.tsv [Sun Mar 31 13:16:46 2024] SYSTEM CALL: perl /home/jakob/miniconda3/bin/trust-airr.pl tmp_smartseq_report.tsv tmp_smartseq_annot.fa --airr-align tmp_smartseq_airr_align.tsv > tmp_smartseq_airr.tsv [Sun Mar 31 13:16:46 2024] TRUST4 finishes. [Sun Mar 31 13:16:47 2024] TRUST4 begins. [Sun Mar 31 13:16:47 2024] SYSTEM CALL: /home/jakob/miniconda3/bin/fastq-extractor -t 28 -f /home/jakob/genome_human/hg38_bcrtcr.fa -o tmp_smartseq_toassemble -1 /media/jakob/Elements/P2704_REBECCA/20240308112451965-75105/original/GFB-22183_HW3NFDRX3_1/raw_data/GFB-22183_HW3NFDRX3_1_SS2023-02E56C21_S35_L001_R1_001.fastq.gz -2 /media/jakob/Elements/P2704_REBECCA/20240308112451965-75105/original/GFB-22183_HW3NFDRX3_1/raw_data/GFB-22183_HW3NFDRX3_1_SS2023-02E56C21_S35_L001_R2_001.fastq.gz [Sun Mar 31 13:16:47 2024] Start to extract candidate reads from read files. [Sun Mar 31 13:16:47 2024] Finish extracting reads. [Sun Mar 31 13:16:47 2024] SYSTEM CALL: /home/jakob/miniconda3/bin/trust4 -t 28 --skipMateExtension -f /home/jakob/genome_human/hg38_bcrtcr.fa -o tmp_smartseq -1 tmp_smartseq_toassemble_1.fq -2 tmp_smartseq_toassemble_2.fq

@mourisl
Copy link
Collaborator

mourisl commented Apr 6, 2024

Thanks for sharing the files with me. The log and screenshot look quite normal to me. Can you directly run TRUST4 on one of the cell as a standard bulk RNA-seq or use the commands from the log to see whether its airr file is empty or not? Thank you.

@mourisl
Copy link
Collaborator

mourisl commented Apr 7, 2024

I think I've found the reason and updated the github repo to fix this bug. Could you please clone the github repo and give it a try?

@jakob-arnold
Copy link
Author

Thank you so much for your efforts. I will try it tomorrow!

@jakob-arnold
Copy link
Author

So I ran the updated version and now the airr.tsv looks good! Thanks a lot.

Just a quick follow-up question for downstream analysis in R: Which output file would be suitable for the scRepertoire package? In their vignette it says:

"We are beginning to support contig sequences from other pipelines, the first being VDJ reconstruction from TRUST4. We have added combineTRUST4() that will parse the contig output from TRUST4 into a format usable by the rest of the scRepertoire package."

However, the trust-smartseq command does not produce a contig file, does it?

@mourisl
Copy link
Collaborator

mourisl commented Apr 8, 2024

The contig sequence definition in scRepertoire might be different. It could be just the report file from TRUST4. I think scRepertoire may not support the analysis of smart-seq data. I just added the "cell_id" information (previously was empty) to the github repo for the smart-seq output. In this way, I think you can directly use the AIRR tsv file for scRepertoire.

@jakob-arnold
Copy link
Author

Perfect, thanks so much!

@jakob-arnold
Copy link
Author

jakob-arnold commented Apr 11, 2024

Hi,

I hate to reopen this issue, but I think there may now be another bug in the AIRR output file:

I realize that you added "cell_id" information. However, when I run the script, only the first two lines have entries in that column:
image

I'm not sure if that is related, but I also noticed that only the first 2 entries in the "sequence_id" have a suffix with the pattern "...assemble#...":
image

That was not the case, in the last version of the script (when "cell_id" was still empty):
image

Do you think that is fixable?

Thank you so much and kind regards from Germany!

@mourisl
Copy link
Collaborator

mourisl commented Apr 11, 2024

My bad. It should be fixed now in the github repo. Could you please give it a try? Sorry for the trouble.

@jakob-arnold
Copy link
Author

I ran the script for a few minutes, looks like it's fixed. Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants