Commit
hamming_distance can hang or produce incorrect results (facebookincubator#9349)

Summary:
Pull Request resolved: facebookincubator#9349

The call function for hamming_distance iterates through two strings, comparing UTF-8 characters. It uses utf8proc_codepoint to read each character; that function returns the character, or the negative length of the invalid code point if the input is not valid UTF-8. The call function then advances its position in the string by either the number of bytes in the character or the length of the invalid code point.

The logic currently treats ASCII 0 (the null character) as an invalid code point. The external library correctly treats it as a valid UTF-8 character, so it returns 0 for the character. The logic in hamming_distance interprets that 0 as the negated length of an invalid code point, meaning it does not change its position in the string.

As a result, we return incorrect results if a null character appears in either string, because we incorrectly compute the length of the string containing the null character. If both strings contain null characters, we end up in an infinite loop, as neither string makes progress.

Note that callAscii handles this correctly.

Reviewed By: amitkdutta, kgpai

Differential Revision: D55670296

fbshipit-source-id: 73d15b48b67f5342fe1c7904146c32dc5c34bd2e
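The failure mode described in this message can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual code from the change: decodeCodePoint is a hypothetical, simplified stand-in for utf8proc_codepoint, and advanceBuggy, advanceFixed, and hammingDistance are illustrative names. The sketch assumes the bug amounts to classifying a decoded value of 0 as invalid, and that the fix is to treat every non-negative result as a valid character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>

// Hypothetical stand-in for the decoder: returns the code point (>= 0) and
// sets `size` to its byte length, or returns the negative number of bytes
// to skip when the input is not valid UTF-8. Simplified: it does not reject
// overlong encodings or surrogates.
int32_t decodeCodePoint(std::string_view s, size_t pos, int& size) {
  const auto b0 = static_cast<unsigned char>(s[pos]);
  int len = 0;
  int32_t cp = 0;
  if (b0 < 0x80) {
    len = 1;
    cp = b0;
  } else if ((b0 & 0xE0) == 0xC0) {
    len = 2;
    cp = b0 & 0x1F;
  } else if ((b0 & 0xF0) == 0xE0) {
    len = 3;
    cp = b0 & 0x0F;
  } else if ((b0 & 0xF8) == 0xF0) {
    len = 4;
    cp = b0 & 0x07;
  } else {
    size = 1;
    return -1; // stray continuation byte or invalid lead byte
  }
  for (int i = 1; i < len; ++i) {
    const size_t next = pos + static_cast<size_t>(i);
    if (next >= s.size() ||
        (static_cast<unsigned char>(s[next]) & 0xC0) != 0x80) {
      size = i;
      return -i; // truncated or malformed sequence: skip the bytes read
    }
    cp = (cp << 6) | (static_cast<unsigned char>(s[next]) & 0x3F);
  }
  size = len;
  return cp;
}

// Assumed shape of the bug: 0 falls into the "invalid" branch, so the
// computed step is -0 == 0 and the position never moves.
size_t advanceBuggy(int32_t cp, int size) {
  return static_cast<size_t>(cp > 0 ? size : -cp);
}

// Assumed shape of the fix: any non-negative value is a valid character.
size_t advanceFixed(int32_t cp, int size) {
  return static_cast<size_t>(cp >= 0 ? size : -cp);
}

// Illustrative Hamming distance over characters, using the fixed step.
// Throws if the inputs do not contain the same number of characters.
int64_t hammingDistance(std::string_view left, std::string_view right) {
  size_t l = 0;
  size_t r = 0;
  int64_t distance = 0;
  while (l < left.size() && r < right.size()) {
    int lSize = 0;
    int rSize = 0;
    const int32_t lCp = decodeCodePoint(left, l, lSize);
    const int32_t rCp = decodeCodePoint(right, r, rSize);
    const size_t lStep = advanceFixed(lCp, lSize);
    const size_t rStep = advanceFixed(rCp, rSize);
    assert(lStep > 0 && rStep > 0); // both sides must make progress
    if (left.substr(l, lStep) != right.substr(r, rStep)) {
      ++distance;
    }
    l += lStep;
    r += rStep;
  }
  if (l != left.size() || r != right.size()) {
    throw std::invalid_argument("inputs must have the same length");
  }
  return distance;
}
```

In this sketch, advanceBuggy(0, 1) yields a step of 0, which is the stalled position the message describes, while advanceFixed(0, 1) yields 1. Strings with embedded null characters must be built with an explicit length, for example std::string_view("a\0c", 3), since a plain C-string literal would stop at the first null.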