CDR3 sequences length #295

Closed
lishuangshuang0616 opened this issue Jul 25, 2024 · 12 comments

Comments

@lishuangshuang0616

Hi Dr. Li,

When I use Cell Ranger and TRUST4 for analysis,
I noticed that some CDR3 sequences obtained by TRUST4 are longer than those from Cell Ranger.
What caused the difference?

# trust4
TRA:CAVSSTNTGKLTFGD  TRA:TGTGCAGTGAGTAGCACCAATACAGGCAAATTAACCTTTGGGGAT
TRB:CSSREGLQDTQYF    TRB:TGTAGTTCTAGAGAAGGACTCCAAGACACCCAGTACTTT

# cellranger
TRA:CAVSSTNTGKLTF    TRA:TGTGCAGTGAGTAGCACCAATACAGGCAAATTAACCTTT
TRB:CSSREGLQDTQYF    TRB:TGTAGTTCTAGAGAAGGACTCCAAGACACCCAGTACTTT

I am using the latest version of the repo.
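
To make the difference concrete, here is a minimal Python sketch that translates the two TRA nucleotide sequences quoted above with the standard genetic code and reports what TRUST4 appends relative to Cell Ranger. The helper name `translate` and the variable names are illustrative only; this is not code from TRUST4 or Cell Ranger.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt):
    """Translate a nucleotide string in frame 0, ignoring a trailing partial codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


# TRA CDR3 nucleotide sequences copied from the report above.
trust4_tra = "TGTGCAGTGAGTAGCACCAATACAGGCAAATTAACCTTTGGGGAT"
cellranger_tra = "TGTGCAGTGAGTAGCACCAATACAGGCAAATTAACCTTT"

# The TRUST4 call is the Cell Ranger call plus a few trailing bases.
extra_nt = ""
if trust4_tra.startswith(cellranger_tra):
    extra_nt = trust4_tra[len(cellranger_tra):]

print("TRUST4     :", translate(trust4_tra))
print("Cell Ranger:", translate(cellranger_tra))
print("Extra bases:", extra_nt, "->", translate(extra_nt))
```

The two calls share the same start and differ only at the 3' end, where the TRUST4 sequence continues for two more codons past the final F of the Cell Ranger call.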

@mourisl
Collaborator

mourisl commented Jul 25, 2024

Which version of TRUST4 are you using? Could you please send me the corresponding row for TGTGCAGTGAGTAGCACCAATACAGGCAAATTAACCTTTGGGGAT in the AIRR file? Thank you!

@lishuangshuang0616
Author

airr_select.txt

I am using version 1.1.2.

@mourisl
Collaborator

mourisl commented Jul 26, 2024

Thank you for sharing the file. I think I've found the issue and have pushed the fix to the GitHub repo. Could you clone the GitHub version and give it a try? This is a pretty serious bug (it only happens on the mouse TRA chain), and if the fix works on your data set, I will draft a new release soon. Thank you!

@lishuangshuang0616
Author

Thank you for your quick work. I will try to re-analyze using the latest repo.

@lishuangshuang0616
Author

I checked, and the CDR3 no longer includes the extra trailing bases; it is now consistent with Cell Ranger.
Another question: for this particular sequence, TRUST4 reports both CDR1 and CDR2 as null, but Cell Ranger is able to identify them.
Is something wrong with the annotation?

@mourisl
Collaborator

mourisl commented Jul 26, 2024

Thank you! The mouse IMGT TRA sequences contain some special gaps, so the common coordinates for CDR1, CDR2, and CDR3 do not hold. The CDR3 coordinates can still be inferred from the motifs, but CDR1 and CDR2 are more difficult to recalibrate. I think Cell Ranger uses its own reference gene annotation, so it doesn't have this issue. I need some time to fix the CDR1/CDR2 issue.

@lishuangshuang0616
Author

Thank you very much for your answer.

@mourisl
Collaborator

mourisl commented Jul 30, 2024

I just added the option "--imgtAdditionalGap" in the "special_gap" branch of the GitHub repo. If the CDR1 and CDR2 information for the TRA chain is needed, you can add the option "--imgtAdditionalGap TRAV:7,83". If you get a chance, could you please give this branch a try and let me know whether it works on your data? Thank you!
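
As a rough sketch of how the option might be appended to an existing invocation, the Python snippet below builds a `run-trust4` argument list and adds `--imgtAdditionalGap TRAV:7,83` to it. Only the option name and its value come from this thread; the helper `add_imgt_additional_gap`, the reference file names, the read file names, and the output prefix are all placeholders to be replaced with your own. The snippet only prints the command and does not run it.

```python
import shlex


def add_imgt_additional_gap(command, gap_spec="TRAV:7,83"):
    """Return a copy of `command` with --imgtAdditionalGap appended (hypothetical helper)."""
    return list(command) + ["--imgtAdditionalGap", gap_spec]


# Placeholder invocation: every file name and the output prefix are assumptions.
base_command = [
    "run-trust4",
    "-f", "mouse_bcrtcr.fa",     # placeholder V/D/J/C gene coordinate reference
    "--ref", "mouse_IMGT+C.fa",  # placeholder IMGT annotation reference
    "-1", "reads_R1.fastq.gz",   # placeholder read 1 file
    "-2", "reads_R2.fastq.gz",   # placeholder read 2 file
    "-o", "sample",              # placeholder output prefix
]

full_command = add_imgt_additional_gap(base_command)

# Print a shell-ready command line for inspection instead of executing it.
print(shlex.join(full_command))
```

Building the command as a list keeps the option and its value as two separate arguments, so the comma in `TRAV:7,83` needs no special quoting.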

@lishuangshuang0616
Author

lishuangshuang0616 commented Aug 1, 2024

I checked some of the results:

# trust4
TSGFNG,ACATCTGGGTTCAACGGG,NVLDGL,AATGTTCTGGATGGTTTG,CAVRNAGNMLTF,TGTGCTGTGAGGAATGCAGGCAACATGCTCACCTTT
NSASQS,AACAGTGCTTCTCAGTCT,VYSSGN,GTATACTCCAGTGGTAAT,CVVNPESGSARQLTF,TGTGTGGTGAACCCGGAATCTGGTTCTGCAAGGCAACTGACCTTT
NSASDY,AACAGCGCCTCAGACTAC,IRSNMDK,ATTCGTTCAAATATGGACAAA,CAENSPDNAGNMLTF,TGTGCAGAGAATTCCCCCGATAATGCAGGCAACATGCTCACCTTT
KALYS,AAGGCTTTATATTCT,LLKGGEQ,TTACTGAAGGGTGGAGAACAG,CGTEIRGDAGGTSYGKLTF,TGTGGCACAGAGATAAGAGGGGATGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
SSYSPS,TCTTCTTATTCACCATCT,YTSAATLV,TACACATCAGCGGCCACCCTGGTT,CVVRRGQNFVF,TGTGTTGTGAGGAGGGGTCAGAATTTTGTCTTT
# 10x
TSGFNG,ACATCTGGGTTCAACGGG,NVLDGL,AATGTTCTGGATGGTTTG,CAVRNAGNMLTF,TGTGCTGTGAGGAATGCAGGCAACATGCTCACCTTT
NSASQS,AACAGTGCTTCTCAGTCT,VYSSG,GTATACTCCAGTGGT,CVVNPESGSARQLTF,TGTGTGGTGAACCCGGAATCTGGTTCTGCAAGGCAACTGACCTTT
NSASDY,AACAGCGCCTCAGACTAC,IRSNMDK,ATTCGTTCAAATATGGACAAA,CAENSPDNAGNMLTF,TGTGCAGAGAATTCCCCCGATAATGCAGGCAACATGCTCACCTTT
KALYS,AAGGCTTTATATTCT,LLKGGEQ,TTACTGAAGGGTGGAGAACAG,CGTEIRGDAGGTSYGKLTF,TGTGGCACAGAGATAAGAGGGGATGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
SSYSPS,TCTTCTTATTCACCATCT,YTSAATLV,TACACATCAGCGGCCACCCTGGTT,CVVRRGQNFVF,TGTGTTGTGAGGAGGGGTCAGAATTTTGTCTTT

I've captured some of the results for you. If you need more, I can send them by email.
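
A field-by-field comparison like the one above can be sketched in Python as follows. The column names in `FIELDS` are an assumption inferred from the data (CDR1, CDR2, CDR3, each as amino acids followed by nucleotides); the thread does not label the columns. Only the first two rows of each block are included to keep the example short.

```python
# Assumed column order, inferred from the rows above (not stated in the thread).
FIELDS = ["cdr1_aa", "cdr1_nt", "cdr2_aa", "cdr2_nt", "cdr3_aa", "cdr3_nt"]

# First two rows of each block, copied from the comment above.
trust4_rows = [
    "TSGFNG,ACATCTGGGTTCAACGGG,NVLDGL,AATGTTCTGGATGGTTTG,CAVRNAGNMLTF,TGTGCTGTGAGGAATGCAGGCAACATGCTCACCTTT",
    "NSASQS,AACAGTGCTTCTCAGTCT,VYSSGN,GTATACTCCAGTGGTAAT,CVVNPESGSARQLTF,TGTGTGGTGAACCCGGAATCTGGTTCTGCAAGGCAACTGACCTTT",
]
tenx_rows = [
    "TSGFNG,ACATCTGGGTTCAACGGG,NVLDGL,AATGTTCTGGATGGTTTG,CAVRNAGNMLTF,TGTGCTGTGAGGAATGCAGGCAACATGCTCACCTTT",
    "NSASQS,AACAGTGCTTCTCAGTCT,VYSSG,GTATACTCCAGTGGT,CVVNPESGSARQLTF,TGTGTGGTGAACCCGGAATCTGGTTCTGCAAGGCAACTGACCTTT",
]


def diff_rows(rows_a, rows_b):
    """Return (row index, field name, value in A, value in B) for every mismatch."""
    diffs = []
    for index, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        for name, a, b in zip(FIELDS, row_a.split(","), row_b.split(",")):
            if a != b:
                diffs.append((index, name, a, b))
    return diffs


diffs = diff_rows(trust4_rows, tenx_rows)
for index, name, trust4_value, tenx_value in diffs:
    print(f"row {index} {name}: TRUST4={trust4_value} 10x={tenx_value}")
```

In the five rows posted, the only mismatch is the CDR2 of the second row, where the TRUST4 call (VYSSGN) is one residue longer than the 10x call (VYSSG).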

@mourisl
Collaborator

mourisl commented Aug 1, 2024

Thank you! This looks quite consistent. I've merged the branch into master and will make a new release this week.

@lishuangshuang0616
Author

So this parameter should only be added for analyses such as mouse TCR analysis, and adding it to other analyses, such as BCR analysis, will result in incorrect CDR1/CDR2, right?

@mourisl
Collaborator

mourisl commented Aug 2, 2024

The parameter only takes effect when TRUST4 detects that IMGT introduces additional gaps, that is, when the canonical CDR3 motif does not match the IMGT coordinate system. When that happens, TRUST4 uses the --imgtAdditionalGap option to adjust the CDR1 and CDR2 coordinates. Therefore, adding the option shouldn't affect the analysis for most chains; only the chain specified in the option, such as "TRAV:7,83", will be affected.
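
The scoping described here can be illustrated with a small Python sketch. It is not TRUST4's implementation: it assumes the text before the colon is matched as a prefix of the gene name, it leaves the two numbers uninterpreted because their meaning is not stated in this thread, and it does not model the additional-gap detection step. The function names and the example gene names are illustrative.

```python
def parse_gap_option(spec):
    """Split a spec such as 'TRAV:7,83' into a gene prefix and two integers.

    The meaning of the two integers is not described in this thread, so they
    are returned as-is without interpretation.
    """
    prefix, numbers = spec.split(":")
    first, second = (int(value) for value in numbers.split(","))
    return prefix, first, second


def is_targeted(gene_name, spec):
    """Assumed rule: a gene is affected only if its name starts with the prefix."""
    prefix, _, _ = parse_gap_option(spec)
    return gene_name.startswith(prefix)


spec = "TRAV:7,83"
# Example gene names chosen for illustration only.
for gene in ["TRAV7-3*01", "TRBV13-2*01", "IGHV1-2*01"]:
    status = "affected" if is_targeted(gene, spec) else "unchanged"
    print(f"{gene}: {status}")
```

Under this assumed rule, TRB and IGH genes never match the `TRAV` prefix, which is consistent with the statement that BCR analyses should be unaffected by the option.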
