refactor: better architecture for differ library (#38)
fixes #36
notalfredo authored Jan 13, 2023
1 parent 827a1e5 commit cf74525
Showing 5 changed files with 317 additions and 300 deletions.
75 changes: 34 additions & 41 deletions crates/differ/README.md
@@ -3,7 +3,7 @@
[![CI](https://github.com/nlp-rs/differ.rs/actions/workflows/main.yml/badge.svg)](https://github.com/nlp-rs/differ.rs/actions/workflows/main.yml)
[![Security audit](https://github.com/nlp-rs/differ.rs/actions/workflows/security-audit.yml/badge.svg)](https://github.com/nlp-rs/differ.rs/actions/workflows/security-audit.yml)
> warning: **Differ.rs is currently experimental**
This crate provides edit distance, delta vectors between 2 words, and lets you apply delta vectors in order to transform words.
This crate provides edit distance, deltas between 2 words, and lets you apply deltas in order to transform words.

## Install
```shell
@@ -16,81 +16,75 @@ differ-rs = "0.0.0"
```

## Features
* `apply_diff`: Allows users to apply delta vectors in order to transform a word.
* `extra_traits`: all `struct`s implemented in `differ-rs` are `HammingDistance` and `LevenshteinDistance`. Each Struct implements the `diff` and `distance` methods.

* `Diff` struct: Holds the boxed operations between two strings and tracks the length of the longer string. Its methods let users get the edit distance between two words and view the delta operations.
* `apply_diff()`: Lets users apply deltas in order to transform a word.
* `levenshtein()`: Returns a `Diff` struct between string 1 and string 2. The Levenshtein algorithm can detect insertions, deletions, and substitutions.
* `hamming()`: Returns a `Diff` struct between string 1 and string 2. The Hamming algorithm can only detect substitutions, and string 1 and string 2 must be of equal length.

## How it works
* `apply_diff` works by giving a string and a transformation vector to the method. Then the transformation vector is applied to the string given in the first argument.
* `StringDiffAlgorithm` provides two methods: `diff`, which gives you a transformation vector from the first string to the second, and `distance`, which gives you the edit distance from the first argument to the second argument. The structs `HammingDistance` and `LevenshteinDistance` have their own implementations for each method.
* `apply_diff()` works by giving a string and a transformation vector to the method. Then the transformation vector is applied to the string given in the first argument.
* `Diff` holds a `Box<[StringDiffOp]>` and the length of the longer of the two strings. Both `levenshtein()` and `hamming()` return this struct.
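
The idea behind applying a transformation vector can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `Op` enum and `apply_ops` function are hypothetical stand-ins for `StringDiffOp` and `apply_diff()`, and the sketch assumes the operations are applied one after another, in the order given, against the intermediate string.

```rust
/// Hypothetical stand-in for the crate's `StringDiffOp`; not the real type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    /// Insert `ch` so that it ends up at position `index`.
    Insert { ch: char, index: usize },
    /// Remove the character at `index`.
    Delete { index: usize },
    /// Overwrite the character at `index` with `ch`.
    Substitute { ch: char, index: usize },
}

/// Applies `ops` to `source` one after another, in the order given.
/// Panics if an index is out of range for the intermediate string.
fn apply_ops(source: &str, ops: &[Op]) -> String {
    // Work on chars rather than bytes so indices count characters.
    let mut chars: Vec<char> = source.chars().collect();
    for op in ops {
        match *op {
            Op::Insert { ch, index } => chars.insert(index, ch),
            Op::Delete { index } => {
                chars.remove(index);
            }
            Op::Substitute { ch, index } => chars[index] = ch,
        }
    }
    chars.into_iter().collect()
}
```

Under these assumptions, applying "insert `g` at 6, substitute `i` at 4, substitute `s` at 0" to `"kitten"` yields `"sitting"`. Listing operations from the highest index down keeps earlier indices valid while later characters are inserted or removed.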

## Examples

Getting the edit distance between two words using Levenshtein algorithm
```rs
use differ_rs::{LevenshteinDistance, StringDiffAlgorithm};
use differ_rs::levenshtein;

fn main(){
let my_levensthein = LevenshteinDistance {};
let levensthein_edit_distance = levenshtein("Sitting", "Kitten").distance();

let edit_distance = my_levensthein.distance("Sitting", "Kitten");

assert_eq!(3, edit_distance)
assert_eq!(3, levensthein_edit_distance);
}
```
> **Note**: We are getting the edit distance to get from "Sitting" to "Kitten".
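
For readers who want to see where the distance of 3 comes from, here is a minimal sketch of the classic dynamic-programming Levenshtein distance. The function name `levenshtein_distance` is hypothetical, and this is not necessarily how `differ-rs` computes it; the sketch only returns the count and does not build a `Diff`.

```rust
/// Hypothetical helper, not part of `differ-rs`: computes the Levenshtein
/// distance between `s1` and `s2` using two rows of the DP table.
fn levenshtein_distance(s1: &str, s2: &str) -> usize {
    let a: Vec<char> = s1.chars().collect();
    let b: Vec<char> = s2.chars().collect();

    // prev[j] = distance between the processed prefix of `a` and b[..j].
    // For the empty prefix of `a`, that is j insertions.
    let mut prev: Vec<usize> = (0..=b.len()).collect();

    for (i, &ca) in a.iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        // Turning a[..=i] into the empty string takes i + 1 deletions.
        curr.push(i + 1);
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitute, or keep if equal
                .min(prev[j + 1] + 1)   // delete from `a`
                .min(curr[j] + 1);      // insert into `a`
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}
```

The comparison is case-sensitive, so `"Sitting"` to `"Kitten"` needs two substitutions (`S` to `K`, `i` to `e`) and one deletion (`g`).
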
Getting the delta vectors between two words using Levenshtein algorithm
Viewing the delta between two words using the Levenshtein algorithm
```rs
use differ_rs::{LevenshteinDistance, StringDiffAlgorithm};
use differ_rs::levenshtein;

fn main(){
let my_levensthein = LevenshteinDistance {};
let my_levensthein = levenshtein("Sitting", "Kitten");

let delta_vec = my_levensthein.diff("Sitting", "Kitten");

for i in delta_vec.iter(){
println!("{:?}", i);
for diff_op in my_levensthein.ops.iter() {
println!("{:?}", diff_op);
}
}
```

This example outputs:

```text
StringDiffOp { kind: Delete('g'), index: 6 }
StringDiffOp { kind: Delete, index: 6 }
StringDiffOp { kind: Substitute('i', 'e'), index: 4 }
StringDiffOp { kind: Substitute('S', 'K'), index: 0 }
```

Getting the edit distance between two words using Hamming algorithm
```rs
use differ_rs::{HammingDistance, StringDiffAlgorithm};
use differ_rs::hamming;

fn main(){
let my_hamming = HammingDistance {};
let kathrin_edit_distance = hamming("karolin", "kathrin").distance();

let edit_distance = my_hamming.distance("karolin", "kathrin");

assert_eq!(3, edit_distance);
assert_eq!(3, kathrin_edit_distance);
}
```
Note: We are getting the edit distance to get from "karolin" to "kathrin",
> **Note**: We are getting the edit distance to get from "karolin" to "kathrin",
additionally, the first and second strings must be the same length; otherwise
a panic is triggered.
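
Callers who would rather not panic on mismatched input could check the lengths first. The sketch below is a hypothetical helper (`hamming_checked` is not part of `differ-rs`) that returns `None` instead of panicking. It also assumes lengths should be compared in characters; Rust's `str::len()` returns the length in bytes, which differs from the character count for non-ASCII text.

```rust
/// Hypothetical helper, not part of `differ-rs`: returns the positions where
/// the two strings differ as `(index, char_in_s1, char_in_s2)`, or `None`
/// when the strings do not have the same number of characters.
fn hamming_checked(s1: &str, s2: &str) -> Option<Vec<(usize, char, char)>> {
    // Compare character counts, not byte lengths.
    if s1.chars().count() != s2.chars().count() {
        return None;
    }
    Some(
        s1.chars()
            .zip(s2.chars())
            .enumerate()
            .filter(|(_, (a, b))| a != b)
            .map(|(i, (a, b))| (i, a, b))
            .collect(),
    )
}
```

The Hamming distance is then the length of the returned vector: `"karolin"` and `"kathrin"` differ at indices 2, 3, and 4, giving a distance of 3.
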


Getting the delta vectors between two words using Hamming algorithm
Getting the deltas between two words using Hamming algorithm
```rs
use differ_rs::{HammingDistance, StringDiffAlgorithm};
use differ_rs::hamming;

fn main(){
let my_hamming = HammingDistance {};
let kathrin_edit_distance = hamming("karolin", "kathrin");

let delta_vec = my_hamming.diff("karolin", "kathrin");

for i in delta_vec.iter(){
println!("{:?}", i);
}
for diff_op in kathrin_edit_distance.ops.iter() {
println!("{:?}", diff_op);
}
}
```
This example outputs:
@@ -101,18 +95,17 @@ StringDiffOp { kind: Substitute('o', 'h'), index: 3 }
StringDiffOp { kind: Substitute('l', 'r'), index: 4 }
```

Applying delta vectors to words
Applying deltas to words
```rs
use differ_rs::{HammingDistance, LevenshteinDistance, StringDiffAlgorithm,apply_diff};
use differ_rs::{hamming, levenshtein, apply_diff};

fn main(){
let my_levensthein = LevenshteinDistance {};
let levensthein_delta_vec = my_levensthein.diff("sitting", "kitten");
let delta_applied_v1 = apply_diff("sitting", levensthein_delta_vec);
let my_levensthein = levenshtein("sitting", "kitten");
let delta_applied_v1 = apply_diff("sitting", my_levensthein.ops.to_vec());


let my_hamming = HammingDistance {};
let hamming_delta_vec = my_hamming.diff("karolin", "kathrin");
let delta_applied_v2 = apply_diff("karolin", hamming_delta_vec);
let my_hamming = hamming("karolin", "kathrin");
let delta_applied_v2 = apply_diff("karolin", my_hamming.ops.to_vec());

assert_eq!("kitten", delta_applied_v1);
assert_eq!("kathrin", delta_applied_v2);
33 changes: 18 additions & 15 deletions crates/differ/src/apply_diff.rs
@@ -37,47 +37,50 @@ mod tests {

#[test]
fn test_apply_diffs() {
let test_vec: Vec<StringDiffOp> = vec![
let test_box: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_insert('g', 6),
StringDiffOp::new_substitute('e', 'i', 4),
StringDiffOp::new_substitute('k', 's', 0),
];
]);

let test_vec_2: Vec<StringDiffOp> = vec![
let test_box_2: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_substitute('r', 'n', 4),
StringDiffOp::new_delete(2),
StringDiffOp::new_delete(1),
];
]);

let test_vec_3: Vec<StringDiffOp> = vec![
let test_box_3: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_insert('S', 5),
StringDiffOp::new_delete(1),
StringDiffOp::new_delete(0),
];
]);

let test_vec_4 = vec![
let test_box_4: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_insert('e', 1),
StringDiffOp::new_insert('o', 3),
];
]);

let test_vec_5 = vec![
let test_box_5: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_insert('r', 4),
StringDiffOp::new_insert('s', 0),
];
]);

assert_eq!(
String::from("sitting"),
super::apply_diff("kitten", test_vec)
super::apply_diff("kitten", test_box.to_vec())
);
assert_eq!(
String::from("Sunday"),
super::apply_diff("Saturday", test_vec_2)
super::apply_diff("Saturday", test_box_2.to_vec())
);
assert_eq!(String::from("SETS"), super::apply_diff("RESET", test_vec_3));
assert_eq!(String::from("heeoy"), super::apply_diff("hey", test_vec_4));
assert_eq!(
String::from("SETS"),
super::apply_diff("RESET", test_box_3.to_vec())
);
assert_eq!(String::from("heeoy"), super::apply_diff("hey", test_box_4.to_vec()));
assert_eq!(
String::from("skater"),
super::apply_diff("kate", test_vec_5)
super::apply_diff("kate", test_box_5.to_vec())
);
}
}
124 changes: 61 additions & 63 deletions crates/differ/src/hamming.rs
@@ -1,84 +1,82 @@
use crate::{StringDiffAlgorithm, StringDiffOp};
use crate::{Diff, StringDiffOp};
use std::iter::zip;

pub struct HammingDistance {}
impl StringDiffAlgorithm for HammingDistance {
fn diff<'a>(&self, s1: &'a str, s2: &'a str) -> Vec<StringDiffOp> {
if s1.len() != s2.len() {
panic!("Strings must be same length");
}
pub fn hamming<'a>(s1: &'a str, s2: &'a str) -> Diff {
if s1.len() != s2.len() {
panic!("Strings must be same length");
}

let mut opp_vec: Vec<StringDiffOp> = Vec::new();
let iter = zip(s1.chars(), s2.chars());
let mut opp_vec: Vec<StringDiffOp> = Vec::new();
let iter = zip(s1.chars(), s2.chars());

for (i, (char1, char2)) in iter.enumerate() {
if char1 != char2 {
opp_vec.push(StringDiffOp::new_substitute(char1, char2, i));
}
for (i, (char1, char2)) in iter.enumerate() {
if char1 != char2 {
opp_vec.push(StringDiffOp::new_substitute(char1, char2, i));
}
opp_vec
}

fn distance<'a>(&self, s1: &'a str, s2: &'a str) -> usize {
self.diff(s1, s2).len()
}
Diff::new(opp_vec, s1.len())
}

#[cfg(test)]
mod tests {
use crate::{StringDiffAlgorithm, StringDiffOp};

#[test]
fn test_hamming_distance_edit_distance() {
let test_struct = super::HammingDistance {};

assert_eq!(3, test_struct.distance("karolin", "kathrin"));
assert_eq!(3, test_struct.distance("karolin", "kerstin"));
assert_eq!(4, test_struct.distance("kathrin", "kerstin"));
assert_eq!(4, test_struct.distance("0000", "1111"));
assert_eq!(3, test_struct.distance("2173896", "2233796"));
}
use crate::StringDiffOp;

#[test]
fn test_hamming_distance_op_distance() {
let test_struct = super::HammingDistance {};
use crate::hamming::hamming;
use crate::Diff;

let test_vec: Vec<StringDiffOp> = vec![
StringDiffOp::new_substitute('r', 't', 2),
StringDiffOp::new_substitute('o', 'h', 3),
StringDiffOp::new_substitute('l', 'r', 4),
];
let test_diff = Diff {
ops: Box::new([
StringDiffOp::new_substitute('r', 't', 2),
StringDiffOp::new_substitute('o', 'h', 3),
StringDiffOp::new_substitute('l', 'r', 4),
]),
total_len: 7,
};

let test_vec_2: Vec<StringDiffOp> = vec![
StringDiffOp::new_substitute('a', 'e', 1),
StringDiffOp::new_substitute('o', 's', 3),
StringDiffOp::new_substitute('l', 't', 4),
];
let test_diff_2 = Diff {
ops: Box::new([
StringDiffOp::new_substitute('a', 'e', 1),
StringDiffOp::new_substitute('o', 's', 3),
StringDiffOp::new_substitute('l', 't', 4),
]),
total_len: 7,
};

let test_vec_3: Vec<StringDiffOp> = vec![
StringDiffOp::new_substitute('a', 'e', 1),
StringDiffOp::new_substitute('t', 'r', 2),
StringDiffOp::new_substitute('h', 's', 3),
StringDiffOp::new_substitute('r', 't', 4),
];
let test_diff_3 = Diff {
ops: Box::new([
StringDiffOp::new_substitute('a', 'e', 1),
StringDiffOp::new_substitute('t', 'r', 2),
StringDiffOp::new_substitute('h', 's', 3),
StringDiffOp::new_substitute('r', 't', 4),
]),
total_len: 7,
};

let test_vec_4: Vec<StringDiffOp> = vec![
StringDiffOp::new_substitute('0', '1', 0),
StringDiffOp::new_substitute('0', '1', 1),
StringDiffOp::new_substitute('0', '1', 2),
StringDiffOp::new_substitute('0', '1', 3),
];
let test_diff_4 = Diff {
ops: Box::new([
StringDiffOp::new_substitute('0', '1', 0),
StringDiffOp::new_substitute('0', '1', 1),
StringDiffOp::new_substitute('0', '1', 2),
StringDiffOp::new_substitute('0', '1', 3),
]),
total_len: 4,
};

let test_vec_5: Vec<StringDiffOp> = vec![
StringDiffOp::new_substitute('1', '2', 1),
StringDiffOp::new_substitute('7', '3', 2),
StringDiffOp::new_substitute('8', '7', 4),
];
let test_diff_5 = Diff {
ops: Box::new([
StringDiffOp::new_substitute('1', '2', 1),
StringDiffOp::new_substitute('7', '3', 2),
StringDiffOp::new_substitute('8', '7', 4),
]),
total_len: 7,
};

assert_eq!(&test_vec, &test_struct.diff("karolin", "kathrin"));
assert_eq!(&test_vec_2, &test_struct.diff("karolin", "kerstin"));
assert_eq!(&test_vec_3, &test_struct.diff("kathrin", "kerstin"));
assert_eq!(&test_vec_4, &test_struct.diff("0000", "1111"));
assert_eq!(&test_vec_5, &test_struct.diff("2173896", "2233796"));
assert_eq!(test_diff, hamming("karolin", "kathrin"));
assert_eq!(test_diff_2, hamming("karolin", "kerstin"));
assert_eq!(test_diff_3, hamming("kathrin", "kerstin"));
assert_eq!(test_diff_4, hamming("0000", "1111"));
assert_eq!(test_diff_5, hamming("2173896", "2233796"));
}
}