Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate code between diff() and distance() #9

Closed
neoncitylights opened this issue Dec 29, 2022 · 1 comment · Fixed by #15
Closed

Duplicate code between diff() and distance() #9

neoncitylights opened this issue Dec 29, 2022 · 1 comment · Fixed by #15
Assignees
Labels
lvl-2-medium Medium-ranking issue p3-high Priority 3: Someone is planning to work on this task very soon or immediately. t-code-quality Improvements to code to make it more maintainable, readable, etc

Comments

@neoncitylights
Copy link
Contributor

The initial implementation seems to have caused some duplicate code between both methods defined for the implementations of the StringDiffAlgorithm trait.

One reason the distance() method exists is that it's less memory-intensive. It always returns a usize type (statically-sized), since it's guaranteed to either be 32-bit or 64-bit integer. as creating a struct requires allocating memory.

diff() allocates more memory since it's creating instances of StringDiffOps, and also returns a vector, where vectors are a DST (dynamically-sized type), and therefore heap-allocated.


Is there a way we can deduplicate the code? The alternative is to let the library user call the len() method, like:

let hamming = HammingDistance {};
let distance: usize = hamming.diff().len();

What happens though, as mentioned above, is that if they only need the distance, then it's allocating memory that'll never be used. On the other hand, it'll be a maintenance burden to manage both duplicate implementations, and it'll be easier for bugs to pop up. Thoughts?

@neoncitylights neoncitylights added p3-high Priority 3: Someone is planning to work on this task very soon or immediately. t-code-quality Improvements to code to make it more maintainable, readable, etc lvl-2-medium Medium-ranking issue labels Dec 29, 2022
@notalfredo
Copy link
Member

I like this idea. Another alternative I can also think of is that we can make diff return the vector AND also the usize distance ? the only downside to this would be that now the user has to handle two variables when they might only need one. While the suggestion you provided the user can choose just to get the length or the vector. I think maybe the suggestion you made might be better. Thoughts ?

Either way would require some changes to the readme instructions not a problem tho.

notalfredo added a commit that referenced this issue Dec 30, 2022
Fixes #9

I had a different approach let me know what you think. For levenshtein distance I converted the matrix computations functions as a helper function. Then diff and distance call this helper function. This way we dont have any dupe code. Hamming works similar
notalfredo added a commit that referenced this issue Dec 31, 2022
Fixes #9

I had a different approach let me know what you think. For levenshtein distance I converted the matrix computations functions as a helper function. Then diff and distance call this helper function. This way we dont have any dupe code. Hamming works similar
notalfredo added a commit that referenced this issue Dec 31, 2022
Fixes #9

I had a different approach let me know what you think. For levenshtein distance I converted the matrix computations functions as a helper function. Then diff and distance call this helper function. This way we dont have any dupe code. Hamming works similar
notalfredo added a commit that referenced this issue Dec 31, 2022
Fixes #9

I had a different approach let me know what you think. For levenshtein distance I converted the matrix computations functions as a helper function. Then diff and distance call this helper function. This way we dont have any dupe code. Hamming works similar
notalfredo added a commit that referenced this issue Dec 31, 2022
Fixes #9

I had a different approach let me know what you think. For levenshtein distance I converted the matrix computations functions as a helper function. Then diff and distance call this helper function. This way we dont have any dupe code. Hamming works similar
This was referenced Dec 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lvl-2-medium Medium-ranking issue p3-high Priority 3: Someone is planning to work on this task very soon or immediately. t-code-quality Improvements to code to make it more maintainable, readable, etc
Projects
None yet
2 participants