Normalized Hamming Distance #256
Comments
Pinging @ktpolanski, who added the Hamming Distance Feature: What do you think about the normalization? Regarding the |
Does normalisation imply dividing by the length of the sequence? If so, then that puts us pretty close to what Dandelion does, no? |
This would put it more in line with what The normalization in if (normalize == "len") {
dist_mat <- dist_mat / seq_length for This should allow BCRs with different lengths to be grouped as a clonotype if they pass the similarity cut-off, while keeping it a substitution-only context. It's frequently used, and I can understand its appeal, as it allows for more relaxed/unbiased grouping and discovery of potentially related BCR patterns that use the same V- and J-genes. It does "violate" the same-length requirement for BCR SHM that textbooks teach us, but you could potentially argue that's due to technical things like sequencing error. An easy way to do all this is to parse to AIRR format and access the |
I think we're all talking about slightly different things. If we take the Hamming distance and divide it by sequence length, we'll obtain a "percent of mismatches" measure, which is what dandelion does. That is, if this is what OP meant by "normalised Hamming". |
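To make the "percent of mismatches" reading concrete, here is a minimal sketch in plain Python. The function name `normalized_hamming` is hypothetical and is not part of this project's API or of dandelion. It assumes both sequences have the same length, as the plain Hamming distance requires.

```python
def normalized_hamming(seq1: str, seq2: str) -> float:
    """Fraction of positions at which two equal-length sequences differ.

    Hypothetical helper for illustration only; not an existing library function.
    """
    if len(seq1) != len(seq2):
        raise ValueError("Hamming distance is only defined for equal-length sequences")
    if not seq1:
        # Two empty sequences have no mismatching positions.
        return 0.0
    # Count mismatching positions, then divide by the sequence length.
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)


# One mismatch (Y vs. F) in a sequence of length 6.
distance = normalized_hamming("CARDYW", "CARDFW")
```

With this definition a cut-off becomes a fraction of the sequence length rather than an absolute number of substitutions, so the same threshold tolerates more mismatches in longer sequences.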
It would be helpful to have a normalization option in the distance metric for BCR support. I'm also still not clear on how to use the abstract class DistanceCalculator (a tutorial could help).