Character \x00 in to_ascii() raises an exception #6

Closed
philippemilink opened this issue Jun 17, 2023 · 1 comment · Fixed by #7

@philippemilink

import homoglyphs as hg
hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD).to_ascii('\x00')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "homoglyphs/core.py", line 240, in to_ascii
    return self.uniq_and_sort(self._to_ascii(text))
  File "homoglyphs/core.py", line 169, in uniq_and_sort
    result = list(set(data))
  File "homoglyphs/core.py", line 235, in _to_ascii
    for variant in self._get_combinations(text, ascii=True):
  File "homoglyphs/core.py", line 218, in _get_combinations
    alt_chars = self._get_char_variants(char)
  File "homoglyphs/core.py", line 195, in _get_char_variants
    if not self._update_alphabet(char):
  File "homoglyphs/core.py", line 182, in _update_alphabet
    category = Categories.detect(char)
  File "homoglyphs/core.py", line 66, in detect
    category = unicodedata.name(char).split()[0]
ValueError: no such name

I guess it should rather return [].

(BTW, is this fork still maintained?)

@wesinator

This exception occurs in the Python standard library unicodedata module:

import unicodedata
print(unicodedata.name("\x00"))

Other similar characters (control characters such as "\x01") also raise an exception. This seems odd, since a name alias seems to be defined in https://www.unicode.org/Public/15.0.0/ucd/Index.txt
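To illustrate the behaviour described above, here is a minimal sketch using only the standard library. The helper name safe_name is hypothetical and is not part of homoglyphs. It relies on the documented behaviour of unicodedata.name(), which raises ValueError when a character has no name unless a default is passed as the second argument.

```python
import unicodedata


def safe_name(char, default=None):
    """Return the Unicode name of char, or default if it has none.

    Hypothetical helper for illustration only.
    """
    try:
        return unicodedata.name(char)
    except ValueError:
        # Raised for characters without a name, e.g. control characters.
        return default


# Collect the code points from this sample that have no name.
sample = ("\x00", "\x01", "\x7f", "A")
unnamed = [hex(ord(c)) for c in sample if safe_name(c) is None]
print(unnamed)

# unicodedata.name() also accepts a default, which avoids the exception.
print(unicodedata.name("\x00", "<no name>"))
```

Passing a default directly to unicodedata.name() is the shorter option when no special handling is needed.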

However, the actual issue in this library is that it catches the wrong exception type:

except TypeError:

It should catch ValueError, which is what unicodedata.name() raises here.
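A rough sketch of what such a fix could look like follows. The function detect_category is a hypothetical stand-in modelled on the line shown in the traceback (unicodedata.name(char).split()[0]); it is not the library's actual code, and the real change in #7 may differ. Returning None for unnamed characters is also an assumption.

```python
import unicodedata


def detect_category(char):
    """Return the first word of the character's Unicode name, or None.

    Hypothetical stand-in for the category detection in the traceback.
    """
    try:
        return unicodedata.name(char).split()[0]
    except TypeError:
        # Input is not a single-character string.
        return None
    except ValueError:
        # Character has no Unicode name, e.g. "\x00".
        return None


print(detect_category("a"))
print(detect_category("\x00"))
```

With ValueError handled, a caller such as to_ascii() can treat the unnamed character as having no category instead of propagating the exception.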
