feature: Allow computing percent-based scores from diffs #10
Comments
Would this apply to any two words? Or do the two words have to be the same length?
Any two words. With words that aren't the same length, the total would be the max length of the two words. So for example:
The similarity score would be 5.5/7, because:
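Separately from the example above, the max-length rule for the total can be sketched in a few lines. The helper name `total_len` is hypothetical and not part of the crate, and counting `char`s rather than bytes is an assumption about how word length should be measured.

```rust
/// Denominator for a percent-based score: the length of the longer word.
/// Counts `char`s (not bytes), so non-ASCII words are measured per character.
/// NOTE: `total_len` is a hypothetical helper used only for illustration.
fn total_len(a: &str, b: &str) -> usize {
    a.chars().count().max(b.chars().count())
}
```

With this sketch, two words of lengths 3 and 6 would be scored out of 6.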
I have test implementations at the moment: `similarity_score` and `difference_score` functions. They work by first calling
The idea sounds nice, but I don't think this would scale well. This would only make it possible to compute a similarity/difference score from a Levenshtein distance algorithm, but this type of value can be computed generally, like the
An implementation would actually be as simple as:

    // Fraction of items that differ: number of diff ops over total items.
    fn compute_difference(diff_ops: &[StringDiffOp], total_len: usize) -> f32 {
        (diff_ops.len() as f32) / (total_len as f32)
    }

    // Similarity is the complement of the difference.
    fn compute_similarity(diff_ops: &[StringDiffOp], total_len: usize) -> f32 {
        1.0 - compute_difference(diff_ops, total_len)
    }

The number of differences, divided by the total number of items in a type. This helps with 2 things:
Ideally a function should do as little as possible, and these two would achieve exactly that. However, I was thinking that we could represent the code in a way where
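To illustrate the point that the score does not depend on how the diff was produced, here is a minimal sketch that is generic over the op type. The names `difference_ratio` and `similarity_ratio` are hypothetical, and the zero-length guard and the clamp to 1.0 are assumptions about desirable behavior, not something decided in this thread.

```rust
/// Fraction of items that differ, in 0.0..=1.0.
/// Generic over the op type `T`, so any diff algorithm's output can be scored.
/// NOTE: hypothetical name and signature, for illustration only.
fn difference_ratio<T>(diff_ops: &[T], total_len: usize) -> f32 {
    if total_len == 0 {
        // Assumption: two empty inputs have nothing that differs.
        return 0.0;
    }
    // Clamp in case an algorithm reports more ops than there are items.
    (diff_ops.len() as f32 / total_len as f32).min(1.0)
}

/// Complement of the difference ratio.
fn similarity_ratio<T>(diff_ops: &[T], total_len: usize) -> f32 {
    1.0 - difference_ratio(diff_ops, total_len)
}
```

Because only the slice length is used, the caller stays responsible for choosing `total_len` (for strings, the length of the longer input).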
It's possible to retrieve the distance (as an integer), representing the number of operations needed to get from string A to string B. It's also possible to retrieve a list of the individual diff operations with `diff()`.
In order to really compare similarity though, it'd be nice to get the percent (from 0.0 to 1.0, representing 0% and 100%), returned as a floating-point value (`f32` would fit this).
For example, the similarity score between `cattle` and `battle` would be 5/6, or 83.333% (repeating). Likewise, the difference score between them would be 1/6, or 16.666% (repeating).
would be 5/6, or 83.333% (repeating). Likewise, the difference score between them would be 1/6, or 16.666% (repeating).The text was updated successfully, but these errors were encountered: