refactor: better architecture for differ library
fixes #36
notalfredo committed Jan 6, 2023
1 parent 827a1e5 commit 8e7d892
Showing 7 changed files with 369 additions and 300 deletions.
71 changes: 31 additions & 40 deletions crates/differ/README.md
@@ -3,7 +3,7 @@
[![CI](https://github.com/nlp-rs/differ.rs/actions/workflows/main.yml/badge.svg)](https://github.com/nlp-rs/differ.rs/actions/workflows/main.yml)
[![Security audit](https://github.com/nlp-rs/differ.rs/actions/workflows/security-audit.yml/badge.svg)](https://github.com/nlp-rs/differ.rs/actions/workflows/security-audit.yml)
> warning: **Differ.rs is currently experimental**
This crate provides edit distance, delta vectors between 2 words, and lets you apply delta vectors in order to transform words.
This crate provides edit distance, deltas between 2 words, and lets you apply deltas in order to transform words.

## Install
```shell
@@ -16,82 +16,73 @@ differ-rs = "0.0.0"
```

## Features
* `apply_diff`: Allows users to apply delta vectors in order to transform a words.
* `extra_traits`: all `struct`s implemented in `differ-rs` are `HammingDistance` and `LevenshteinDistance`. Each Struct implements the `diff` and `distance` methods.
* `apply_diff` function: Applies deltas to a word in order to transform it.
* `Diff` struct: Holds a boxed slice (`Box<[StringDiffOp]>`) of the operations between two strings and keeps track of the length of the longer string. Its methods let users get the edit distance between two words and view the delta operations.
* `levenshtein` function: Returns a `Diff` struct between string 1 and string 2 using the Levenshtein algorithm.
* `hamming` function: Returns a `Diff` struct between string 1 and string 2 using the Hamming algorithm.

## How it works
* `apply_diff` works by giving a string and a transformation vector to the method. Then the transformation vector is applied to the string given in the first argument.
* `StringDiffAlgorithm` provides two methods. The `diff` method gives you a transformation vector from the first string to the second. The `distance` method gives you the edit distance from the first argument to the second argument. The structs `HammingDistance` and `LevenshteinDistance` have their own implementations of each method.
* `Diff` works by holding a `Box<>` of operations and the length of the longer of the two strings. Both the `levenshtein` and `hamming` functions return this struct.

## Examples

Getting the edit distance between two words using Levenshtein algorithm
```rs
use differ_rs::{LevenshteinDistance, StringDiffAlgorithm};
use differ_rs::levenshtein;

fn main(){
let my_levensthein = LevenshteinDistance {};
let levensthein_edit_distance = levenshtein("Sitting", "Kitten").distance();

let edit_distance = my_levensthein.distance("Sitting", "Kitten");

assert_eq!(3, edit_distance)
assert_eq!(3, levensthein_edit_distance);
}
```
> **Note**: We are getting the edit distance to get from "Sitting" to "Kitten".
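
For readers who want to see where the `3` comes from, here is a minimal sketch of the classic Wagner-Fischer dynamic program. `levenshtein_distance` is a hypothetical helper written for this illustration; it is not the crate's API, which returns a `Diff` and derives the distance from the number of operations. The sketch assumes comparison by `char` and unit cost for every edit.

```rust
/// Hypothetical helper (not part of differ-rs): Levenshtein distance
/// computed with two rolling rows of the dynamic-programming table.
fn levenshtein_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // Row 0: turning the empty prefix of `a` into the first j chars of `b`
    // takes j insertions.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        // Turning the first i chars of `a` into the empty string takes i deletions.
        curr[0] = i;
        for j in 1..=b.len() {
            let sub_cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // delete a[i - 1]
                .min(curr[j - 1] + 1) // insert b[j - 1]
                .min(prev[j - 1] + sub_cost); // substitute or keep
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    // After the final swap, `prev` holds the last completed row.
    prev[b.len()]
}
```

Because the comparison is case-sensitive, "Sitting" to "Kitten" costs two substitutions (`S` to `K`, `i` to `e`) plus one deletion (`g`).
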
Getting the delta vectors between two words using Levenshtein algorithm
To view the delta between two words using Levenshtein algorithm
```rs
use differ_rs::{LevenshteinDistance, StringDiffAlgorithm};
use differ_rs::levenshtein;

fn main(){
let my_levensthein = LevenshteinDistance {};
let my_levensthein = levenshtein("Sitting", "Kitten");

let delta_vec = my_levensthein.diff("Sitting", "Kitten");

for i in delta_vec.iter(){
println!("{:?}", i);
}
my_levensthein.operations();
}
```

This example outputs:

```text
StringDiffOp { kind: Delete('g'), index: 6 }
StringDiffOp { kind: Delete, index: 6 }
StringDiffOp { kind: Substitute('i', 'e'), index: 4 }
StringDiffOp { kind: Substitute('S', 'K'), index: 0 }
```

Getting the edit distance between two words using Hamming algorithm
```rs
use differ_rs::{HammingDistance, StringDiffAlgorithm};
use differ_rs::hamming;

fn main(){
let my_hamming = HammingDistance {};
let kathrin_edit_distance = hamming("karolin", "kathrin").distance();

let edit_distance = my_hamming.distance("karolin", "kathrin");

assert_eq!(3, edit_distance);
assert_eq!(3, kathrin_edit_distance);
}
```
Note: We are getting the edit distance to get from "karolin" to "kathrin",
> **Note**: This computes the edit distance from "karolin" to "kathrin".
Additionally, the first string and the second string must be the same length;
otherwise a panic is triggered.
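
The equal-length requirement can be sketched as follows. `Substitution` and `hamming_ops` are hypothetical stand-ins invented for this example, and measuring length in `char`s rather than bytes is an assumption; the crate's `hamming` function may check length differently.

```rust
/// Hypothetical stand-in for a substitution operation (not the crate's type).
#[derive(Debug, PartialEq)]
struct Substitution {
    old: char,
    new: char,
    index: usize,
}

/// Collects one substitution per differing position.
/// Panics if the inputs differ in length, as the note above describes.
fn hamming_ops(a: &str, b: &str) -> Vec<Substitution> {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    assert_eq!(
        a.len(),
        b.len(),
        "Hamming distance requires equal-length strings"
    );

    a.iter()
        .zip(b.iter())
        .enumerate()
        // Keep only the positions where the characters differ.
        .filter(|(_, (x, y))| x != y)
        .map(|(index, (&old, &new))| Substitution { old, new, index })
        .collect()
}
```

Callers that cannot guarantee equal lengths may prefer to compare lengths first and fall back to `levenshtein`, which has no such restriction.
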


Getting the delta vectors between two words using Hamming algorithm
Getting the deltas between two words using Hamming algorithm
```rs
use differ_rs::{HammingDistance, StringDiffAlgorithm};
use differ_rs::hamming;

fn main(){
let my_hamming = HammingDistance {};
let kathrin_edit_distance = hamming("karolin", "kathrin");

let delta_vec = my_hamming.diff("karolin", "kathrin");

for i in delta_vec.iter(){
println!("{:?}", i);
}
kathrin_edit_distance.operations();
}

```
This example outputs:

@@ -101,22 +92,22 @@ StringDiffOp { kind: Substitute('o', 'h'), index: 3 }
StringDiffOp { kind: Substitute('l', 'r'), index: 4 }
```

Applying delta vectors to words
Applying deltas to words
```rs
use differ_rs::{HammingDistance, LevenshteinDistance, StringDiffAlgorithm,apply_diff};
use differ_rs::{hamming, levenshtein, apply_diff};

fn main(){
let my_levensthein = LevenshteinDistance {};
let levensthein_delta_vec = my_levensthein.diff("sitting", "kitten");
let delta_applied_v1 = apply_diff("sitting", levensthein_delta_vec);
let my_levensthein = levenshtein("sitting", "kitten");
let delta_applied_v1 = apply_diff("sitting", &my_levensthein.ops);


let my_hamming = HammingDistance {};
let hamming_delta_vec = my_hamming.diff("karolin", "kathrin");
let delta_applied_v2 = apply_diff("karolin", hamming_delta_vec);
let my_hamming = hamming("karolin", "kathrin");
let delta_applied_v2 = apply_diff("karolin", &my_hamming.ops);

assert_eq!("kitten", delta_applied_v1);
assert_eq!("kathrin", delta_applied_v2);
}

```

## License
35 changes: 19 additions & 16 deletions crates/differ/src/apply_diff.rs
@@ -11,7 +11,7 @@ pub(crate) fn remove(start: usize, stop: usize, s: &str) -> String {
result
}

pub fn apply_diff(s: &str, diffs: Vec<StringDiffOp>) -> String {
pub fn apply_diff(s: &str, diffs: &Box<[StringDiffOp]>) -> String {
let mut new_string: String = s.into();

for i in diffs.iter() {
@@ -37,47 +37,50 @@ mod tests {

#[test]
fn test_apply_diffs() {
let test_vec: Vec<StringDiffOp> = vec![
let test_box: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_insert('g', 6),
StringDiffOp::new_substitute('e', 'i', 4),
StringDiffOp::new_substitute('k', 's', 0),
];
]);

let test_vec_2: Vec<StringDiffOp> = vec![
let test_box_2: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_substitute('r', 'n', 4),
StringDiffOp::new_delete(2),
StringDiffOp::new_delete(1),
];
]);

let test_vec_3: Vec<StringDiffOp> = vec![
let test_box_3: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_insert('S', 5),
StringDiffOp::new_delete(1),
StringDiffOp::new_delete(0),
];
]);

let test_vec_4 = vec![
let test_box_4: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_insert('e', 1),
StringDiffOp::new_insert('o', 3),
];
]);

let test_vec_5 = vec![
let test_box_5: Box<[StringDiffOp]> = Box::new([
StringDiffOp::new_insert('r', 4),
StringDiffOp::new_insert('s', 0),
];
]);

assert_eq!(
String::from("sitting"),
super::apply_diff("kitten", test_vec)
super::apply_diff("kitten", &test_box)
);
assert_eq!(
String::from("Sunday"),
super::apply_diff("Saturday", test_vec_2)
super::apply_diff("Saturday", &test_box_2)
);
assert_eq!(String::from("SETS"), super::apply_diff("RESET", test_vec_3));
assert_eq!(String::from("heeoy"), super::apply_diff("hey", test_vec_4));
assert_eq!(
String::from("SETS"),
super::apply_diff("RESET", &test_box_3)
);
assert_eq!(String::from("heeoy"), super::apply_diff("hey", &test_box_4));
assert_eq!(
String::from("skater"),
super::apply_diff("kate", test_vec_5)
super::apply_diff("kate", &test_box_5)
);
}
}
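
A possible follow-up to the new signature: a parameter of type `&[T]` accepts both `&Box<[T]>` and `&Vec<T>` through deref coercion, so callers would not need to change. The sketch below uses stand-in types (`Op`, `OpKind`, `apply_ops`) because the body of `apply_diff` and the definition of `StringDiffOp` are not shown in this diff. The semantics (operations applied in order, indices counted in `char`s) are an assumption chosen to be consistent with the test cases above.

```rust
/// Stand-in for the kinds of edit seen in the tests above (hypothetical).
#[derive(Debug, PartialEq)]
enum OpKind {
    Insert(char),
    Delete,
    /// (old, new); only `new` is needed when applying.
    Substitute(char, char),
}

/// Stand-in for `StringDiffOp` (hypothetical).
#[derive(Debug, PartialEq)]
struct Op {
    kind: OpKind,
    index: usize,
}

/// Applies `ops` to `s` in the order given.
/// Taking `&[Op]` lets callers pass `&Box<[Op]>`, `&Vec<Op>`, or an array reference.
fn apply_ops(s: &str, ops: &[Op]) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    for op in ops {
        match op.kind {
            OpKind::Insert(c) => chars.insert(op.index, c),
            OpKind::Delete => {
                chars.remove(op.index);
            }
            OpKind::Substitute(_, new) => chars[op.index] = new,
        }
    }
    chars.into_iter().collect()
}
```

Like the tests above, this sketch panics on an out-of-range index; a library version might return a `Result` instead.
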
26 changes: 26 additions & 0 deletions crates/differ/src/diff.rs
@@ -0,0 +1,26 @@
use crate::StringDiffOp;

#[derive(Debug, PartialEq)]
pub struct Diff {
pub ops: Box<[StringDiffOp]>,
pub total_len: usize,
}

impl Diff {
pub fn new(diffs: Vec<StringDiffOp>, total_len: usize) -> Self {
Self {
ops: diffs.into_boxed_slice(),
total_len: total_len,
}
}

pub fn distance(&self) -> usize {
self.ops.len()
}

pub fn operations(&self) {
for i in self.ops.iter() {
println!("{:?}", i);
}
}
}
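
One design alternative, offered only as a sketch: `operations()` above prints to stdout, which is hard to test or redirect. The generic `DiffSketch<T>` below is a hypothetical stand-in for `Diff` (with `T` playing the role of `StringDiffOp`) that returns a borrowed slice and offers a separate `render` method for the debug output. None of these names come from the crate.

```rust
use std::fmt::Debug;

/// Hypothetical generic stand-in for `Diff`.
#[derive(Debug, PartialEq)]
struct DiffSketch<T> {
    ops: Box<[T]>,
    total_len: usize,
}

impl<T: Debug> DiffSketch<T> {
    fn new(ops: Vec<T>, total_len: usize) -> Self {
        Self {
            // Dropping the Vec's spare capacity signals the ops are final.
            ops: ops.into_boxed_slice(),
            // Field-init shorthand replaces `total_len: total_len`.
            total_len,
        }
    }

    /// One operation per edit, so the count is the edit distance.
    fn distance(&self) -> usize {
        self.ops.len()
    }

    /// Borrows the operations instead of printing them.
    fn operations(&self) -> &[T] {
        &self.ops
    }

    /// Renders one operation per line using its `Debug` form.
    fn render(&self) -> String {
        self.ops.iter().map(|op| format!("{:?}\n", op)).collect()
    }
}
```

With this shape, a caller who wants the original behavior can write `print!("{}", diff.render())`, while tests can assert on the returned slice or string directly.
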
41 changes: 41 additions & 0 deletions crates/differ/src/diff_score.rs
@@ -0,0 +1,41 @@
pub struct DiffScoreConfig {
pub sub_cost: f32,
pub lowercase_sub_cost: f32,
pub indel_cost: f32,
pub transpose_cost: f32,
// future properties here as needed
}

impl Default for DiffScoreConfig {
fn default() -> Self {
Self {
sub_cost: 1.0,
lowercase_sub_cost: 1.0,
indel_cost: 1.0,
transpose_cost: 1.0,
}
}
}

#[cfg(test)]
mod tests {

#[test]
fn test_default() {
let test_struct = super::DiffScoreConfig::default();
assert_eq!(test_struct.sub_cost, 1.0);
assert_eq!(test_struct.lowercase_sub_cost, 1.0);
assert_eq!(test_struct.indel_cost, 1.0);
assert_eq!(test_struct.transpose_cost, 1.0);

let mut test_struct = super::DiffScoreConfig::default();
test_struct.sub_cost = 2.0;
test_struct.lowercase_sub_cost = 2.0;
test_struct.indel_cost = 2.0;
test_struct.transpose_cost = 2.0;
assert_eq!(test_struct.sub_cost, 2.0);
assert_eq!(test_struct.lowercase_sub_cost, 2.0);
assert_eq!(test_struct.indel_cost, 2.0);
assert_eq!(test_struct.transpose_cost, 2.0);
}
}
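
This commit does not show how `DiffScoreConfig` is consumed, so the following is only a guess at one possible use: summing a per-operation cost. `EditKind` and `weighted_score` are hypothetical names, the struct is redeclared here so the example is self-contained, and `lowercase_sub_cost` is left unused because its intended meaning is not stated in the source.

```rust
/// Local copy of the config struct from the diff above.
#[derive(Debug, Clone, PartialEq)]
struct DiffScoreConfig {
    sub_cost: f32,
    lowercase_sub_cost: f32,
    indel_cost: f32,
    transpose_cost: f32,
}

impl Default for DiffScoreConfig {
    fn default() -> Self {
        Self {
            sub_cost: 1.0,
            lowercase_sub_cost: 1.0,
            indel_cost: 1.0,
            transpose_cost: 1.0,
        }
    }
}

/// Hypothetical operation kinds; not the crate's `StringDiffOp`.
enum EditKind {
    Insert,
    Delete,
    Substitute,
    Transpose,
}

/// Sums the configured cost of each operation (an assumed scoring rule).
fn weighted_score(ops: &[EditKind], cfg: &DiffScoreConfig) -> f32 {
    ops.iter()
        .map(|op| match op {
            EditKind::Insert | EditKind::Delete => cfg.indel_cost,
            EditKind::Substitute => cfg.sub_cost,
            EditKind::Transpose => cfg.transpose_cost,
        })
        .sum()
}
```

Struct update syntax keeps overrides short: `DiffScoreConfig { sub_cost: 2.0, ..Default::default() }` changes one field and leaves the rest at their defaults, which avoids the `let mut` plus field-by-field assignment used in the test above.
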