The identifier following the barcode #330

Open
origami974 opened this issue Nov 16, 2024 · 1 comment
Comments

@origami974

Hello Dr. Li,
I have recently been processing single-cell TCR data. In the TRUST4 output file, I noticed that the numbers following the barcode are not consecutive (0, 1, 2, 3, ...):
CATCGAAGTTAGGGTG_0 0 TRBV5-1*01 TRBD1*01 TRBJ2-2*01 TRBC2 TCTGGGCATAGGAGT TACTTCAGTGAGACACAG TGCGCCAGCAGCCCCCAACAGGGCCCCGGGGAGCTGTTTTTT 1.00 696.00 100.00 1
CATCGAAGTTAGGGTG_2 0 TRAV21*01 * TRAJ18*01 TRAC GATAGCGCTATTTACAAC ATTCAGTCAAGTCAGAGAGAG TGTGCTGTGAGGCCCTAAGAGGCTTCCGACAGAGGCTCAACCCTGGGGAGGCTATACTTT 1.00 57.00 100.00 1
CATCGAAGTTAGGGTG_14 0 TRBV9*01 TRBD2*02 TRBJ2-3*01 TRBC2 * * TGTGCCAGCAGCGTAGAGGCAGGGCCACAGAATACGCAGTATTTT 1.00 2.00 94.87 0
CATCGAAGTTAGGGTG_16 0 TRBV5-1*01 TRBD1*01 TRBJ2-2*01 TRBC2 * * TGCGCCAGCAGCCCCCAACAGGGCCCCGGGGAGCTGTTTTT 1.00 2.00 100.00 0
I assume this number is also some kind of identifier. How are these identifiers generated, and do they carry any particular meaning? The command I used to generate the file is:
run-trust4 -1 /data/zhanqh/data_all/GSA/HRR/read1_clustered.fq -2 /data/data_all/GSA/HRR/read2_clustered.fq -f ./py_pre-test/ref_all/hg38_tcr.fa --ref ./py_pre-test/ref_all/human_IMGT_T.fa --barcode /data/data_all/GSA/HRR/read1_clustered.fq --readFormat bc:0:15 -o /data/data_all/GSA/HRR/
In addition, the single-cell section of the README mentions that TRUST4 clusters and assembles barcoded 10X data based on the barcode only. So when I assemble barcoded paired-end data with a command like the one above, without the UMI option, does the UMI take part in the assembly? In other words, does the UMI sequence have a significant impact on TCR assembly?
Thank you in advance for your reply

@mourisl
Collaborator

mourisl commented Nov 16, 2024

This is the ID of the internal contig that TRUST4 generated during the assembly steps. Some contigs are merged or filtered out during assembly, so their IDs do not appear in the output, which is why the remaining numbers are not consecutive.
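
As a rough illustration of the point above, the identifier can be split into the cell barcode and the internal contig ID, and the gaps left by merged or filtered contigs can be listed. This is only a sketch: the helper names (`split_contig_id`, `contig_ids_by_barcode`, `missing_ids`) are hypothetical and not part of TRUST4, and it assumes the identifier is always `BARCODE_N` with `N` an integer, as in the rows quoted in the question.

```python
from collections import defaultdict


def split_contig_id(identifier):
    """Split 'BARCODE_N' into (barcode, N).

    rpartition splits on the LAST underscore, so a barcode that itself
    contains underscores would still be handled (an assumption, not
    something the TRUST4 output format guarantees).
    """
    barcode, _, contig = identifier.rpartition("_")
    return barcode, int(contig)


def contig_ids_by_barcode(identifiers):
    """Group the contig IDs seen in the output by cell barcode."""
    grouped = defaultdict(list)
    for identifier in identifiers:
        barcode, contig = split_contig_id(identifier)
        grouped[barcode].append(contig)
    return {barcode: sorted(ids) for barcode, ids in grouped.items()}


def missing_ids(present):
    """IDs between 0 and the largest reported ID that are absent from the output."""
    return sorted(set(range(max(present) + 1)) - set(present))


# First-column values taken from the rows in the question.
sample_ids = [
    "CATCGAAGTTAGGGTG_0",
    "CATCGAAGTTAGGGTG_2",
    "CATCGAAGTTAGGGTG_14",
    "CATCGAAGTTAGGGTG_16",
]

grouped = contig_ids_by_barcode(sample_ids)
gaps = missing_ids(grouped["CATCGAAGTTAGGGTG"])
```

Here `gaps` holds the contig IDs that are not reported for this barcode, which under the explanation above would correspond to contigs that were merged or filtered. The numbers carry no meaning beyond identifying a contig.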

For 10X single-cell data, the UMI information is only used to count how many UMIs support each assembled contig; it is not used during the assembly step. It may, however, affect how you select the representative receptor for each cell barcode.
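
One possible way to pick a representative receptor per cell barcode is to keep, for each barcode and chain, the contig with the highest abundance. The sketch below is an assumption-laden illustration, not TRUST4's own selection logic: `pick_representatives` and `chain_of` are hypothetical helpers, the abundance is assumed to be the 11th whitespace-separated field (the one holding 696.00, 57.00 and 2.00 in the rows from the question), the CDR3 is assumed to be the 9th field, and the chain is guessed from the first three letters of the first gene name that is not `*`.

```python
def chain_of(fields):
    """Guess the chain (e.g. 'TRA', 'TRB') from the V/D/J/C gene columns."""
    for gene in fields[2:6]:
        if gene != "*":
            return gene[:3]
    return None


def pick_representatives(lines, abundance_col=10, cdr3_col=8):
    """Keep the most abundant contig for each (barcode, chain) pair.

    Returns {(barcode, chain): (abundance, contig_identifier, cdr3)}.
    The column positions are assumptions about the file layout.
    """
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) <= abundance_col:
            continue  # skip blank or truncated lines
        barcode = fields[0].rpartition("_")[0]
        key = (barcode, chain_of(fields))
        abundance = float(fields[abundance_col])
        if key not in best or abundance > best[key][0]:
            best[key] = (abundance, fields[0], fields[cdr3_col])
    return best


# Rows from the question.
rows = [
    "CATCGAAGTTAGGGTG_0 0 TRBV5-1*01 TRBD1*01 TRBJ2-2*01 TRBC2 TCTGGGCATAGGAGT "
    "TACTTCAGTGAGACACAG TGCGCCAGCAGCCCCCAACAGGGCCCCGGGGAGCTGTTTTTT 1.00 696.00 100.00 1",
    "CATCGAAGTTAGGGTG_2 0 TRAV21*01 * TRAJ18*01 TRAC GATAGCGCTATTTACAAC "
    "ATTCAGTCAAGTCAGAGAGAG "
    "TGTGCTGTGAGGCCCTAAGAGGCTTCCGACAGAGGCTCAACCCTGGGGAGGCTATACTTT 1.00 57.00 100.00 1",
    "CATCGAAGTTAGGGTG_14 0 TRBV9*01 TRBD2*02 TRBJ2-3*01 TRBC2 * * "
    "TGTGCCAGCAGCGTAGAGGCAGGGCCACAGAATACGCAGTATTTT 1.00 2.00 94.87 0",
    "CATCGAAGTTAGGGTG_16 0 TRBV5-1*01 TRBD1*01 TRBJ2-2*01 TRBC2 * * "
    "TGCGCCAGCAGCCCCCAACAGGGCCCCGGGGAGCTGTTTTT 1.00 2.00 100.00 0",
]

representatives = pick_representatives(rows)
for (barcode, chain), (abundance, contig, cdr3) in sorted(representatives.items()):
    print(barcode, chain, contig, abundance, cdr3)
```

Whether the abundance is counted in reads or in UMIs changes the ranking input, which is the sense in which UMI information can influence the choice of representative without touching the assembly itself.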

Hope this helps.
