fix type mismatch #1154
After json.load, the values in the dictionary are of type unicode. These cannot be used in os.path.join or in other string expressions where a str is expected.
@@ -191,14 +191,15 @@ def parse_compile_commands_json(logfile, parseLogOptions):
filtered_build_actions = {}

logfile.seek(0)
# WARN: Every value will be unicode.
How did it work before? Do we have this problem only when the compile command file is utf encoded?
json.load will always produce unicode type values; I'm not sure why this was not a problem so far.
Do we have a functional test for the dependency collection? Either the test or the dependency collection should have failed.
Yes we have tests for dependency collection:
tests/functional/test_analyze.py test_failure()
Would be good to know why these tests do not fail currently.
I recall a previous case where suddenly this unicode-ness became a problem, and it wasn't earlier. Perhaps it's environment-specific.
@gyorb What are you basing this statement on?
json.load will always produce unicode type values
I can't find the issue in particular, but I recall the error only happened to some particular user and we couldn't even reproduce it initially. I remember myself trying to find something related to this in the Python changelog...
Nothing about os.path.join, nor its code (/usr/lib/python2.7/posixpath.py), says that it needs to be unicode. It gets variables and uses .endswith() and += on them. The implementation of string concatenation lives in the Python interpreter's world, in C files. Something tells me there is a conversion going on in there.
I use python 2.7.12 and the dev_venv.
If I load a json from a string:
>>> json.loads('{ "name":"John", "age":31, "city":"New York" }')
{u'city': u'New York', u'age': 31, u'name': u'John'}
or from a file
>>> with open("test.json") as json_file:
...     json.load(json_file)
...
{u'city': u'New York', u'age': 31, u'name': u'John'}
the values will be unicode.
A JSON string becomes unicode in Python; you can check out this table.
I get this error when I run the analyze functional tests.
The problem is the type mismatch in the join: one of the values is str and the other is unicode.
Traceback (most recent call last):
  File "...libcodechecker/analyze/analysis_manager.py", line 372, in check
    dependent_source)
  File ".../lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
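The failing step can be reproduced in isolation. What follows is a minimal sketch, not the code from analysis_manager.py: it assumes one side of the join is a byte string holding non-ASCII bytes, and the file name is made up. When Python 2 concatenates a str with a unicode value, it decodes the str with the ASCII codec, which is the same operation as the explicit decode('ascii') call in the hypothetical helper below.

```python
# Hypothetical byte-string path component. The bytes e2 80 9c and e2 80 9d
# are the UTF-8 encodings of the curly quotes U+201C and U+201D.
raw_component = b'a\xe2\x80\x9cquoted\xe2\x80\x9d.cpp'


def implicit_ascii_decode(byte_string):
    """Mimic the decode Python 2 performs when a str meets a unicode value."""
    return byte_string.decode('ascii')


# Pure ASCII bytes decode without any problem.
ascii_result = implicit_ascii_decode(b'gyorb')

# Non-ASCII bytes make the same call fail.
try:
    implicit_ascii_decode(raw_component)
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)

print(error_message)
```

The sketch runs on both Python 2.7 and Python 3, because it performs the decode explicitly instead of relying on the implicit conversion that only Python 2 has.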
I tried to run
os.path.join(os.getcwd(), u'whisperity', "gyorb")
in the same environment and it worked.
So I think my explanation is the closest to reality (if not the explanation) for why we didn't see this error earlier: Python can implicitly convert between str and unicode, and the error is only raised if there is a special character in there that it can't convert.
Note, though, that if at least one of the operands of the join is unicode, the result is also unicode, even if there are no non-ASCII characters in it, which would make using non-UTF strings reasonable.
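One way to avoid depending on the implicit conversion is to turn every component into text explicitly before joining. This is only a sketch under assumptions: the helper names to_text and join_as_text are hypothetical and not part of CodeChecker, the byte strings are assumed to be UTF-8 encoded, and posixpath.join is used directly so the result is the same on every platform.

```python
import posixpath


def to_text(value, encoding='utf-8'):
    """Return value as text, decoding byte strings explicitly.

    The UTF-8 default is an assumption; the real encoding depends on
    where the bytes came from.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_as_text(*parts):
    """Join path components after converting each one to text."""
    return posixpath.join(*[to_text(part) for part in parts])


# Mixed input: a byte string with a non-ASCII character (c3 a1 is the
# UTF-8 encoding of U+00E1), a text value and a plain ASCII byte string.
joined = join_as_text(b'/tmp/\xc3\xa1', u'whisperity', b'gyorb')
print(repr(joined))
```

With this approach the result is always text, and a decoding failure surfaces at the point where the bytes enter the program rather than deep inside the join.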
0xe2 is the â letter according to a Google search.
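A single byte does not identify a character on its own; it depends on the encoding. The short sketch below shows two readings of 0xe2: as one character in Latin-1, and as the first byte of a three-byte UTF-8 sequence. Which of these applies to the path in the traceback cannot be told from this thread, so the curly quote is only an illustrative example.

```python
byte = b'\xe2'

# Read as Latin-1, the byte is a complete character: U+00E2.
as_latin1 = byte.decode('latin-1')

# In UTF-8, 0xe2 only starts a three-byte sequence. For example, the
# left curly quote U+201C is encoded as e2 80 9c.
left_quote_utf8 = u'\u201c'.encode('utf-8')
starts_with_e2 = left_quote_utf8[:1] == byte

print(repr(as_latin1), repr(left_quote_utf8), starts_with_e2)
```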
Ok, but in this case we would still fail if there is a special character in the path.
The output decoding fix in #1173 makes this pull request irrelevant.
In our current case, the following code would fail:
codechecker/libcodechecker/analyze/analysis_manager.py
Lines 371 to 372 in 8a84dbe