[BUG] Support for custom Python environment that ignore PEP 3120 #114

kivhub · 2021-09-22T21:42:38Z

Describe the bug
With requests library using charset-normalizer I am getting an error when calling Python via User-Defined Transform in SAP BODS:

File "EXPRESSION", line 6, in <module>
File "c:\program files\python39\lib\site-packages\requests\__init__.py", line 48, in <module>
from charset_normalizer import __version__ as charset_normalizer_version
File "c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py", line 11
SyntaxError: Non-ASCII character '\xd1' in file c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py on line 12, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details.

I cannot declare a source encoding with a magic comment on the first or second line of the source files, probably because the app modifies the script itself (adding # -*- coding: utf-8 -*- doesn't help). Setting the environment variable PYTHONUTF8=1 doesn't help either.
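For context, PEP 263 only honors the declaration on the first or second line of a file, so anything a host application prepends to a script can push the comment out of range. Here is a minimal sketch of that rule using the standard library's tokenize.detect_encoding under Python 3. The sample sources and the helper name declared_encoding are made up for illustration, and this does not reproduce whatever SAP BODS actually does to the script.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python 3's tokenizer would pick for these bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Declaration on line 2, after a comment line: honored.
on_line_two = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nx = 1\n"

# Same declaration pushed down to line 3 by a prepended header: ignored.
on_line_three = b"# injected header\n" + on_line_two

print(declared_encoding(on_line_two))
print(declared_encoding(on_line_three))
```

When no declaration is found, Python 3 falls back to UTF-8 (PEP 3120), whereas Python 2 falls back to ASCII and rejects any non-ASCII byte in the file.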

To Reproduce
I am not able to provide code to reproduce the issue; it arises when calling Python via User-Defined Transform in SAP BODS.
Please check: apache/superset#15631
This could be the same problem: https://stackoverflow.com/questions/68594538/syntaxerror-non-ascii-character-xd1-in-file-charset-normalizer-init-py-i

Expected behavior
No error. With a requests version that uses the chardet library there is no problem. Maybe avoiding non-ASCII characters in __init__.py could help...?

Logs
Please see the bug description.

Desktop (please complete the following information):

OS: Windows 2016 Server
Python version 3.9.6
Package version 2.0.6
Requests version 2.26.0

Additional context
N/A


Ousret · 2021-09-23T15:39:24Z

Hi,

Thanks for the detailed report.
Yes, for some reason, some environments do not take UTF-8 as the default source encoding.

The PEP 3120 gets ignored https://www.python.org/dev/peps/pep-3120/

It's not just the top-level __init__.py that has non-ASCII characters; assets/__init__.py does too.
Since, by your tests, it seems to also ignore PEP 263 https://www.python.org/dev/peps/pep-0263/ I do not have any silver bullet for this one.
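To see which files in an installed package would trip an ASCII-only parser, something along these lines could be used. It is only a sketch: the function name files_with_non_ascii is hypothetical, and the code just looks for bytes above 0x7F without trying to tell string literals from comments.

```python
import os


def files_with_non_ascii(package_dir):
    """Yield (path, line_number) for the first non-ASCII byte in each .py file."""
    for root, _dirs, files in os.walk(package_dir):
        for name in sorted(files):
            if not name.endswith(".py"):
                continue
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                for line_number, line in enumerate(handle, start=1):
                    # bytearray() yields integers on both Python 2 and Python 3.
                    if any(byte > 0x7F for byte in bytearray(line)):
                        yield path, line_number
                        break
```

Pointing it at the charset_normalizer directory inside site-packages should list the files concerned, each with the first offending line number.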

The obvious solution would be to find a proper way to represent UTF-8 characters without writing them literally in the source strings.
I am going to think more about this.
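One way to picture that idea is to write the characters as escape sequences, so the file itself stays pure ASCII. The sketch below uses the built-in ascii() under Python 3. The helper name to_ascii_literal and the sample character are assumptions chosen for illustration (any character whose UTF-8 encoding starts with 0xd1 would do), and this is not necessarily what the project ended up doing.

```python
import ast


def to_ascii_literal(text):
    """Return a Python string literal for `text` that contains only ASCII characters."""
    return ascii(text)


# CYRILLIC SMALL LETTER YA, encoded as b"\xd1\x8f" in UTF-8 (arbitrary example).
sample = "\u044f"
literal = to_ascii_literal(sample)
print(literal)

# The escaped literal evaluates back to the original string.
assert ast.literal_eval(literal) == sample
```

Note that a \u escape in a plain string literal is only interpreted by Python 3; on Python 2 it would need a u"" prefix.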

Open for suggestions/PR.

Ousret · 2021-09-23T18:17:33Z

There is something of interest here: https://bugs.python.org/issue29240

Ousret · 2021-09-24T22:12:26Z

@kivhub I have found something interesting regarding the NT platform + python.

On Windows, the PYTHONLEGACYWINDOWSFSENCODING environment variable (PEP 529) has the priority over UTF-8 Mode.

https://www.python.org/dev/peps/pep-0540/

What do the following return for you:

sys.getfilesystemencoding()
locale.getpreferredencoding()

And could you test against dev-master to see if the patch in #116 does anything at all?
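For anyone wanting to run the same check inside a host application, the two calls above can be wrapped in a small report. This is a sketch only: the function name encoding_report is hypothetical, and the interpreter version and executable path are extra fields added here because they help show which Python is actually running. The code avoids Python 3-only syntax so that it also runs on an old interpreter.

```python
import locale
import sys


def encoding_report():
    """Collect the interpreter facts asked for above as (label, value) pairs."""
    return [
        ("sys.getfilesystemencoding()", sys.getfilesystemencoding()),
        ("locale.getpreferredencoding()", locale.getpreferredencoding()),
        ("sys.version_info.major", sys.version_info.major),
        ("sys.version_info.minor", sys.version_info.minor),
        ("sys.executable", sys.executable),
    ]


for item in encoding_report():
    print(item)
```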

kivhub · 2021-09-27T11:49:51Z

> @kivhub I have found something interesting regarding the NT platform + python.
>
> On Windows, the PYTHONLEGACYWINDOWSFSENCODING environment variable (PEP 529) has the priority over UTF-8 Mode.
>
> https://www.python.org/dev/peps/pep-0540/

Thank you very much, @Ousret, for your effort. It looks like the problem is an old version of Python that comes into play with SAP BODS.

> What do the following return for you:
> sys.getfilesystemencoding()
> locale.getpreferredencoding()

Python (3.9.6) itself returns UTF-8 for both. But when running Python via a SAP BODS job, the results are as follows:

('sys.getfilesystemencoding()', 'mbcs')
('locale.getpreferredencoding()', 'cp1252')
('sys.version_info.major', 2)
('sys.version_info.minor', 7)

The Python 3 libraries are used via:

#this works (requests + chardet)
sys.path.insert(0, 'c:\program files\python37\lib\site-packages')

#this doesn't work (requests + charset-normalizer)
sys.path.insert(0, 'c:\program files\python39\lib\site-packages')

> And could you test against dev-master to see if the patch in #116 does anything at all?

I replaced the contents of these 2 files, and after changing the path to:

sys.path.insert(0, 'c:\program files\python39\lib\site-packages')

I am getting this error message:

File "EXPRESSION", line 7, in <module>
File "c:\program files\python39\lib\site-packages\requests\__init__.py", line 48, in <module>
from charset_normalizer import __version__ as charset_normalizer_version
File "c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py", line 20, in <module>
from .api import from_bytes, from_fp, from_path, normalize
File "c:\program files\python39\lib\site-packages\charset_normalizer\api.py", line 38
sequences: bytes,
^
SyntaxError: invalid syntax.

kivhub · 2021-09-27T12:09:08Z

> And could you test against dev-master to see if the patch in #116 does anything at all?
>
> I replaced the contents of these 2 files, and after changing the path to:
>
> sys.path.insert(0, 'c:\program files\python39\lib\site-packages')
>
> I am getting this error message:
> File "EXPRESSION", line 7, in <module>
> File "c:\program files\python39\lib\site-packages\requests\__init__.py", line 48, in <module>
> from charset_normalizer import __version__ as charset_normalizer_version
> File "c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py", line 20, in <module>
> from .api import from_bytes, from_fp, from_path, normalize
> File "c:\program files\python39\lib\site-packages\charset_normalizer\api.py", line 38
> sequences: bytes,
> ^
> SyntaxError: invalid syntax.

I have an update here. After moving sys.path.insert before all imports:

sys.path.insert(0, 'c:\program files\python39\lib\site-packages')
import json
import sys
import os
import requests
from datetime import datetime

the code runs fine without any errors.

Ousret · 2021-09-27T20:23:28Z

I have not been able to force Python to trigger the decode error on an NT platform.

('sys.getfilesystemencoding()', 'mbcs')
('locale.getpreferredencoding()', 'cp1252')
('sys.version_info.major', 2)  # ??
('sys.version_info.minor', 7)

Well, from the look of it, your setup invokes Python 2.7, and from that standpoint I cannot do anything.

The SyntaxError: invalid syntax raised on the annotated parameter sequences: bytes is a pretty good indicator, since Python 2 does not support that syntax.

I would recommend altering the PATH outside of Python.
So I am closing this as there is nothing that can be done from here.
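For setups like this one, a guard at the top of the embedded script can at least make the mismatch obvious before any Python 3 package is imported. The following is a minimal sketch, not something charset-normalizer provides: the name check_interpreter is hypothetical and the (3, 5) floor is a placeholder to adjust to whatever the installed packages require. It is written without Python 3-only syntax so that an old interpreter can still parse and run it.

```python
import sys

# Assumption: placeholder floor; set it to what the installed packages need.
MINIMUM_VERSION = (3, 5)


def check_interpreter(version_info=None, minimum=MINIMUM_VERSION):
    """Raise RuntimeError when the running interpreter is older than `minimum`."""
    current = tuple((version_info or sys.version_info)[:2])
    if current < minimum:
        raise RuntimeError(
            "Python %d.%d is running, but the packages on sys.path need %d.%d or newer"
            % (current + minimum)
        )
    return current


# Fail fast, before importing anything from a Python 3 site-packages directory.
check_interpreter()
```

With the interpreter reported in this thread, the guard would raise immediately instead of surfacing later as a SyntaxError inside a third-party module.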

kivhub added the bug and help wanted labels Sep 22, 2021

Ousret changed the title ~~[BUG] Non-ASCII character '\xd1' in file ...\lib\site-packages\charset_normalizer\__init__.py~~ [BUG] Support for custom Python environment that ignore PEP 3120 Sep 23, 2021

Ousret mentioned this issue Sep 23, 2021

🔧 Trying to leverage PEP263 when PEP3120 is ignored #116

Merged

Ousret closed this as completed Sep 27, 2021

Ousret mentioned this issue Jan 3, 2022

[BUG] api.py "sequences: bytes" SyntaxError #157

Closed
