Pylance reports "Invalid character in token" but character is valid #1286

Closed
p-i- opened this issue May 14, 2021 · 8 comments
Labels: bug (Something isn't working); fixed in next version (main) (A fix has been implemented and will appear in an upcoming version)


@p-i-

p-i- commented May 14, 2021

The new version of this plugin uses Pylance, which complains here:

𐌎 = 42  # Invalid character in token

But this is a valid character:

> ipython
Python 3.7.4 (default, Aug 13 2019, 15:17:50)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: 𐌎 = 42

In [2]: 𐌎
Out[2]: 42

ref: https://www.asmeurer.com/python-unicode-variable-names/ -- this links to a list of all valid characters for variable names

@karthiknadig karthiknadig transferred this issue from microsoft/vscode-python May 14, 2021
@erictraut added the "bug" label and removed the "triage" label May 14, 2021
@erictraut
Contributor

Thanks for the bug report.

Pylance supports Unicode characters in general, but it doesn't currently support "supplementary characters", which lie outside the range that a single 16-bit UTF-16 code unit can represent. The character above falls into that category. It is encoded as the "surrogate" code unit 0xD800 followed by a second 16-bit value to form the "𐌎" symbol, which is a character in the "Old Italic" character set (from a dead language). We'll need to add support for these supplementary characters in the pyright tokenizer so it properly handles the complete "Lo" character set, which PEP 3131 indicates is supported for identifiers.
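
To make the surrogate-pair point concrete, here is a minimal sketch in plain Python (standard library only) that splits the reported character into its UTF-16 code units and checks its Unicode category. It is only an illustration of the encoding described above, not the pyright tokenizer's actual logic; the variable names are made up for this example.

```python
import struct
import unicodedata

ch = "\U0001030E"  # the character from the report (code point U+1030E)

# Encode as big-endian UTF-16 and read the result back as two 16-bit code units.
units = struct.unpack(">2H", ch.encode("utf-16-be"))

# The same pair computed by hand from the code point.
offset = ord(ch) - 0x10000
high = 0xD800 + (offset >> 10)    # high (leading) surrogate
low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate

category = unicodedata.category(ch)   # general category of the code point
is_identifier = ch.isidentifier()     # what CPython itself accepts

print([hex(u) for u in units])  # ['0xd800', '0xdf0e']
print(category)                 # Lo
print(is_identifier)            # True
```

A tokenizer that walks a string one 16-bit unit at a time sees 0xD800 and 0xDF0E as two separate values, neither of which is a letter on its own, which would be consistent with the "Invalid character in token" message.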

@erictraut
Contributor

This will be addressed in the next release.

@erictraut added the "fixed in next version (main)" label May 15, 2021
@p-i-
Author

p-i- commented May 16, 2021

@erictraut
Contributor

There's no way to disable the warning short of disabling pylance. You can work around it by choosing identifier names that don't rely on supplementary characters. Out of curiosity, why are you using characters from dead languages in your variable names? I know that it's supported by Python, but it doesn't seem to be a good idea for code readability.

We typically release a new version of pylance each week, so you won't have to wait long for the fix.
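
For anyone applying the workaround in the meantime, a rough sketch of how to locate the affected identifiers follows. It uses the standard `tokenize` module; the helper name `find_supplementary_identifiers` is hypothetical and not part of Pylance or pyright.

```python
import io
import tokenize


def find_supplementary_identifiers(source):
    """Return (name, line) pairs for identifiers containing code points above U+FFFF."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover identifiers and keywords; keywords are pure ASCII.
        if tok.type == tokenize.NAME and any(ord(c) > 0xFFFF for c in tok.string):
            hits.append((tok.string, tok.start[0]))
    return hits


sample = "\U0001030E = 42\nx = \U0001030E + 1\ny = x\n"
for name, line in find_supplementary_identifiers(sample):
    print(f"line {line}: {name!r} uses a supplementary character")
```

Identifiers made only of characters at or below U+FFFF are not reported, so names such as `x` and `y` in the sample pass through untouched.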

@p-i-
Author

p-i- commented May 16, 2021

My brain parses single characters more easily than words.

I wouldn't do it on a team project, but for my own code it makes a big difference.

[Screenshot 2021-05-16 at 09 05 36]

This code generates a file I work from:

import requests
from bs4 import BeautifulSoup

# original URL from localhost
URL = "https://www.asmeurer.com/python-unicode-variable-names/start-characters.html"
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')
# table = soup.find('table')


lines = soup.tbody.find_all('tr')

groups = {}

for tr in lines:
    td = tr.find_all('td')
    group = td[2].text.split(' ')[0]
    symbol = td[1].text

    # Append to the group's list, creating the list on first use
    groups.setdefault(group, []).append(symbol)

exclude = 'HANGUL CJK SOGDIAN ELYMAIC DOGRA NANDINAGARI ZANABAZAR SOYOMBO MASARAM' \
    ' GUNJALA MAKASAR ANATOLIAN MEDEFAIDRIN TANGUT NUSHU (unknown) HENTAIGANA NYIAKENG'.split(' ')

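# Write as UTF-8 explicitly; the platform default encoding may not cover these symbols
with open('symbols.txt', 'w', encoding='utf-8') as _file: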
    for group, symbols in groups.items():
        if group not in exclude:
            _file.write(str(group) + '\n' + " ".join(symbols) + '\n\n')

There's also https://www.asmeurer.com/python-unicode-variable-names/continue-characters.html

@jakebailey
Member

This issue has been fixed in version 2021.5.3, which we've just released. You can find the changelog here: https://github.com/microsoft/pylance-release/blob/main/CHANGELOG.md#202153-19-may-2021

@baggiponte

Hi! I am still having this issue (VSCode version is 1.58.2) with Jupyter Notebook cells running shell scripts via ! like !<my_bash_script>.

@jakebailey
Member

That's unrelated; see #1579 (comment) and https://github.com/microsoft/vscode-jupyter/issues/6635.
