Pylance reports "Invalid character in token" but character is valid #1286

p-i- · 2021-05-14T07:19:57Z

The new version of this plugin uses Pylance, which complains here:

𐌎 = 42  # Invalid character in token

But this is a valid character:

> ipython
Python 3.7.4 (default, Aug 13 2019, 15:17:50)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: 𐌎 = 42

In [2]: 𐌎
Out[2]: 42

ref: https://www.asmeurer.com/python-unicode-variable-names/ -- this links to a list of all valid characters for variable names
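
The same check can be made without an interactive session. This is only a minimal sketch using the standard library; the helper name `describe_identifier_char` is made up for illustration and is not part of Pylance or Python.

```python
import unicodedata


def describe_identifier_char(ch: str) -> dict:
    """Summarize whether a single character can stand alone as an identifier."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "category": unicodedata.category(ch),   # general category, e.g. 'Lo'
        "valid_identifier": ch.isidentifier(),  # the interpreter's own rule
    }


info = describe_identifier_char("𐌎")
print(info)
```

`str.isidentifier()` applies the interpreter's own identifier rules, so it is a reasonable ground truth to compare a linter's verdict against.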


erictraut · 2021-05-14T18:00:06Z

Thanks for the bug report.

Pylance supports Unicode characters in general, but it doesn't currently support "supplementary characters", which lie outside the Basic Multilingual Plane and therefore need two 16-bit code units in UTF-16. The character above falls into that category. It is encoded as the "surrogate" value 0xD800 followed by a second 16-bit value to form the "𐌎" symbol, which is a character in the "Old Italic" script (from a dead language). We'll need to add support for these supplementary characters in the pyright tokenizer so it properly handles the complete "Lo" character set, which PEP 3131 indicates is supported for identifiers.
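
To make the surrogate encoding described above concrete, here is a small sketch that splits a character into its UTF-16 code units. Pyright is written in TypeScript, whose strings are sequences of UTF-16 code units, so a tokenizer that inspects one unit at a time would see two values for this character. The function names are hypothetical and this is not the pyright implementation.

```python
def utf16_code_units(ch: str) -> list:
    """Return the 16-bit UTF-16 code units that encode one character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_supplementary(ch: str) -> bool:
    """True when the character lies outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


units = utf16_code_units("𐌎")
print([hex(u) for u in units], is_supplementary("𐌎"))
```

A character inside the Basic Multilingual Plane, such as `a`, yields a single code unit, which is why ordinary non-ASCII identifiers were not affected.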

erictraut · 2021-05-15T23:31:15Z

This will be addressed in the next release.

p-i- · 2021-05-16T01:55:54Z

Is there any way I can disable the warning for now?

https://stackoverflow.com/questions/67552574/in-vscode-prevent-pylance-invalid-character-in-identifierpylance-errors

erictraut · 2021-05-16T02:02:40Z

There's no way to disable the warning short of disabling pylance. You can work around it by choosing identifier names that don't rely on supplementary characters. Out of curiosity, why are you using characters from dead languages in your variable names? I know that it's supported by Python, but it doesn't seem to be a good idea for code readability.

We typically release a new version of pylance each week, so you won't have to wait long for the fix.

p-i- · 2021-05-16T02:10:30Z

My brain parses single characters more easily than words.

I wouldn't do it on a team project, but for my own code it makes a big difference.

This code generates a file I work from:

import requests
from bs4 import BeautifulSoup

# Page listing the characters that may start a Python identifier
# (the original URL pointed at a localhost copy).
URL = "https://www.asmeurer.com/python-unicode-variable-names/start-characters.html"
page = requests.get(URL)
page.raise_for_status()  # stop early on an HTTP error

soup = BeautifulSoup(page.content, 'html.parser')

# One table row per character.
lines = soup.tbody.find_all('tr')

# Map the first word of the third column (e.g. 'HANGUL') to its symbols.
groups = {}

for tr in lines:
    td = tr.find_all('td')
    group = td[2].text.split(' ')[0]
    symbol = td[1].text
    groups.setdefault(group, []).append(symbol)

# Groups to leave out of the output file.
exclude = set(
    'HANGUL CJK SOGDIAN ELYMAIC DOGRA NANDINAGARI ZANABAZAR SOYOMBO MASARAM'
    ' GUNJALA MAKASAR ANATOLIAN MEDEFAIDRIN TANGUT NUSHU (unknown) HENTAIGANA NYIAKENG'.split(' ')
)

# Write UTF-8 explicitly; the platform default encoding may not cover these symbols.
with open('symbols.txt', 'w', encoding='utf-8') as _file:
    for group, symbols in groups.items():
        if group not in exclude:
            _file.write(str(group) + '\n' + " ".join(symbols) + '\n\n')

There's also https://www.asmeurer.com/python-unicode-variable-names/continue-characters.html
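
A similar grouping can be built offline by asking the interpreter directly instead of scraping the page. This is a sketch, not the script above: `start_characters_by_group` is a hypothetical name, and the result depends on the Unicode version bundled with the Python build, so it may not match the linked tables exactly.

```python
import sys
import unicodedata
from collections import defaultdict


def start_characters_by_group(limit: int = sys.maxunicode + 1) -> dict:
    """Group every valid single-character identifier by the first word of its name."""
    groups = defaultdict(list)
    for code_point in range(limit):
        ch = chr(code_point)
        if not ch.isidentifier():
            continue  # cannot start an identifier
        # Characters without a name fall into an '(unknown)' group.
        name = unicodedata.name(ch, "(unknown)")
        groups[name.split(" ")[0]].append(ch)
    return dict(groups)


groups = start_characters_by_group()
print(len(groups), "groups")
```

The resulting dictionary has the same shape as `groups` in the scraping script, so the same exclusion list and file-writing loop could be applied to it.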

jakebailey · 2021-05-19T23:43:13Z

This issue has been fixed in version 2021.5.3, which we've just released. You can find the changelog here: https://github.com/microsoft/pylance-release/blob/main/CHANGELOG.md#202153-19-may-2021

baggiponte · 2021-07-22T20:32:31Z

Hi! I am still having this issue (VSCode version is 1.58.2) with Jupyter Notebook cells running shell scripts via ! like !<my_bash_script>.

jakebailey · 2021-07-22T20:35:47Z

That's unrelated; see #1579 (comment) and https://github.com/microsoft/vscode-jupyter/issues/6635.

karthiknadig transferred this issue from microsoft/vscode-python May 14, 2021

github-actions bot added the triage label May 14, 2021

erictraut mentioned this issue May 14, 2021

Invalid character in token for Unicode character microsoft/pyright#1858

Closed

erictraut added bug Something isn't working and removed triage labels May 14, 2021

erictraut added the fixed in next version (main) A fix has been implemented and will appear in an upcoming version label May 15, 2021

jakebailey closed this as completed May 19, 2021
