hamming_distance can hang or produce incorrect results #9349

Closed

Commits on Apr 3, 2024

  1. hamming_distance can hang or produce incorrect results (facebookincub…

    …ator#9349)
    
    Summary:
    
    The call function for hamming_distance iterates through two strings, comparing them one UTF-8 character at a time.  It uses
    utf8proc_codepoint to read each character; this returns the code point, or the negative length of the invalid sequence if the
    bytes are not valid UTF-8.  The function then advances its position in the string by either the number of bytes in the
    character or the length of the invalid sequence.
    
    The logic currently treats ASCII 0 (the null character) as an invalid code point.  Since the external library correctly
    treats it as a valid UTF-8 character, it returns 0 for the character.  The logic in hamming_distance interprets that 0 as the
    negated length of an invalid code point, so it advances by zero bytes and its position in the string does not change.
    
    This means we return incorrect results if a null character appears in either string, as we incorrectly compute the length of the
    string with the null character.  If both strings contain null characters, we end up in an infinite loop as neither string will make
    progress.
    
    Note that callAscii handles this correctly.
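
The failure mode described in the summary can be sketched in a few lines. This is only an illustration, not the Velox code or the committed change: `decodeCodepoint` is a hypothetical, simplified stand-in for utf8proc_codepoint that follows the contract described above (it handles only 1- and 2-byte sequences), `buggyAdvance` and `fixedAdvance` are made-up helper names, and returning `std::nullopt` for strings of unequal length is an assumption made to keep the sketch self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical stand-in for utf8proc_codepoint. Returns the code point and
// sets `size` to its byte length, or returns the negative length of the
// invalid sequence. Simplified: only 1- and 2-byte sequences are decoded.
int32_t decodeCodepoint(const char* p, const char* end, int& size) {
  const auto b0 = static_cast<unsigned char>(p[0]);
  if (b0 < 0x80) {
    size = 1;
    return b0; // Includes '\0', which is valid and decodes to 0.
  }
  if ((b0 & 0xE0) == 0xC0 && p + 1 < end &&
      (static_cast<unsigned char>(p[1]) & 0xC0) == 0x80) {
    size = 2;
    return ((b0 & 0x1F) << 6) | (static_cast<unsigned char>(p[1]) & 0x3F);
  }
  size = 1;
  return -1; // Invalid sequence of length 1.
}

// Buggy check: a code point of 0 falls into the "invalid" branch, and -0 is
// 0, so the position never moves.
int buggyAdvance(int32_t codePoint, int size) {
  return codePoint > 0 ? size : -codePoint;
}

// One plausible fix: treat 0 as a valid code point.
int fixedAdvance(int32_t codePoint, int size) {
  return codePoint >= 0 ? size : -codePoint;
}

// Counts the positions at which the two strings differ, character by
// character. Returns std::nullopt if the strings have different lengths
// (an assumption of this sketch). Invalid sequences are compared by their
// negative return value, which is a simplification.
std::optional<int64_t> hammingDistance(std::string_view a, std::string_view b) {
  size_t i = 0;
  size_t j = 0;
  int64_t distance = 0;
  while (i < a.size() && j < b.size()) {
    int sizeA = 0;
    int sizeB = 0;
    const int32_t cpA = decodeCodepoint(a.data() + i, a.data() + a.size(), sizeA);
    const int32_t cpB = decodeCodepoint(b.data() + j, b.data() + b.size(), sizeB);
    if (cpA != cpB) {
      ++distance;
    }
    i += fixedAdvance(cpA, sizeA);
    j += fixedAdvance(cpB, sizeB);
  }
  if (i != a.size() || j != b.size()) {
    return std::nullopt;
  }
  return distance;
}
```

With `buggyAdvance` in place of `fixedAdvance`, a null character in both inputs would leave `i` and `j` unchanged on every iteration, which is the infinite loop described above. Note that inputs with embedded nulls must be built with an explicit length, for example `std::string_view("a\0b", 3)`.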
    
    Reviewed By: kgpai
    
    Differential Revision: D55670296
    Kevin Wilfong authored and facebook-github-bot committed Apr 3, 2024
    87bc046