fix type mismatch #1154

Closed
wants to merge 1 commit into from
Contributor

@gyorb gyorb commented Nov 20, 2017

After a JSON load, the value type in the dictionary will be unicode.
This cannot be used in an os.path.join or in other string expressions
where a string is expected.

In our current case this would fail:

dependent_source = os.path.join(action.directory,
                                dependent_source)

@gyorb gyorb requested a review from martong November 20, 2017 13:02
@gyorb gyorb requested review from whisperity and dkrupp November 20, 2017 13:03
@gyorb gyorb added this to the release 6.3 milestone Nov 20, 2017
@@ -191,14 +191,15 @@ def parse_compile_commands_json(logfile, parseLogOptions):
filtered_build_actions = {}

logfile.seek(0)
# WARN: Every value will be unicode.
Contributor

How did it work before? Do we have this problem only when the compile command file is utf encoded?

Contributor Author

@gyorb gyorb Nov 20, 2017

json.load will always produce unicode type values; I'm not sure why this was not a problem so far.
Do we have a functional test for the dependency collection? Either the test or the dependency collection should have failed.

Contributor

Yes, we have tests for dependency collection:
tests/functional/test_analyze.py test_failure()

Contributor

Would be good to know why these tests do not fail currently.

Contributor

I recall a previous case where suddenly this unicode-ness became a problem, and it wasn't earlier. Perhaps it's environment-specific.

@gyorb What are you basing this statement on?

json.load will always produce unicode type values

I can't find the issue in particular, but I recall the error only happened to some particular user and we couldn't even reproduce it initially. I remember myself trying to find something related to this in the Python changelog...

Contributor

Neither the documentation of os.path.join nor its code (/usr/lib/python2.7/posixpath.py) says anything about whether its arguments need to be unicode. It takes its arguments and uses .endswith() and += on them. The implementation of string concatenation lives in the Python interpreter itself, in C files. Something tells me there is a conversion going on in there.

Contributor Author

I use python 2.7.12 and the dev_venv.

If I load a json from a string:

>>> json.loads('{ "name":"John", "age":31, "city":"New York" }')
{u'city': u'New York', u'age': 31, u'name': u'John'}

or from a file

>>> with open("test.json") as json_file:
...     json.load(json_file)
... 
{u'city': u'New York', u'age': 31, u'name': u'John'}

the values will be unicode.
A JSON string will be unicode in Python; you can check out this table.

I was running the analyze functional tests when I got this error.
The problem is the type mismatch in the join: one of the values is str and the other is unicode.

Traceback (most recent call last):
  File "...libcodechecker/analyze/analysis_manager.py", line 372, in check
    dependent_source)
  File ".../lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
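
The position in the message fits the line `path += '/' + b` shown in the traceback: `'/' + b` is built first as a byte string, and only then appended to a unicode `path`, which forces an implicit ASCII decode. A minimal sketch of that decode step follows, written for Python 3, where the decode has to be spelled out. The file name is made up for illustration; the real value is not shown in this thread.

```python
# Python 3 sketch: bytes stands in for Python 2 str, str for Python 2 unicode.
# The file name is hypothetical; only its first byte (0xe2) matters here.
b = b"\xe2\x80\x98quoted\xe2\x80\x99.h"

# '/' + b is evaluated before the +=, so 0xe2 ends up at index 1.
suffix = b"/" + b

try:
    # Python 2 performs this decode implicitly when `path` is unicode.
    suffix.decode("ascii")
    message = None
except UnicodeDecodeError as error:
    message = str(error)
```

Under these assumptions `message` carries the same codec, byte, and position as the traceback above.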

Contributor

I tried to run

os.path.join(os.getcwd(), u'whisperity', "gyorb")

in the same environment, and it worked.

So I think my explanation is the closest to reality (if not the explanation) for why we didn't see this error earlier: Python can implicitly convert between str and unicode, and the error is only raised if there is a special character in there that it can't convert.

Though notice that if at least one of the operands of the join is unicode, the result is also unicode, even if it contains no non-ASCII characters, in which case using non-unicode strings would be reasonable.
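
That behaviour can be modelled in Python 3, where the conversion has to be written out by hand. The sketch below assumes Python 2's default ASCII codec; `py2_style_concat` is a hypothetical helper, not part of CodeChecker or the standard library.

```python
def py2_style_concat(left, right):
    """Mimic Python 2 `left + right`, with bytes playing str and str playing unicode."""
    if isinstance(left, bytes) and isinstance(right, bytes):
        return left + right  # str + str stays str, no decoding involved
    # Mixed operands: the byte string is decoded with the ASCII codec.
    if isinstance(left, bytes):
        left = left.decode("ascii")
    if isinstance(right, bytes):
        right = right.decode("ascii")
    return left + right


# ASCII-only input converts silently and the result is text.
ascii_only = py2_style_concat("/home/whisperity", b"/gyorb")

# Two byte strings never trigger a conversion.
bytes_only = py2_style_concat(b"/home", b"/gyorb")
```

In this model a mixed call only fails once the byte string contains something outside ASCII, which would explain why the problem stayed hidden for ASCII-only paths.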

Contributor

0xe2 is the â letter according to a Google search.
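
Note that 0xe2 maps to â only in single-byte encodings such as Latin-1. In UTF-8 it is the lead byte of a three-byte sequence, so the byte in the traceback may equally belong to a character such as a typographic quote. Which character actually occurred is not stated in this thread; the quote below is only an example.

```python
# Read as Latin-1, the single byte 0xe2 is the letter U+00E2.
as_latin1 = b"\xe2".decode("latin-1")

# In UTF-8, 0xe2 starts three-byte sequences,
# for example U+2018 LEFT SINGLE QUOTATION MARK.
quote_bytes = "\u2018".encode("utf-8")
first_byte = quote_bytes[0]
```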

Contributor Author

Ok, but in this case we would still fail if there is a special character in the path.
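
One way to avoid that, sketched for Python 3 under the assumption that paths are UTF-8 encoded, is to convert every operand to text before joining. `to_text` and `join_as_text` are hypothetical names, not CodeChecker functions.

```python
import os


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding byte strings with the assumed encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_as_text(directory, source):
    """Join two path parts after normalising both to text."""
    return os.path.join(to_text(directory), to_text(source))


# A directory as loaded from JSON (text) and a dependency as raw bytes.
joined = join_as_text("/build", b"include/\xe2\x80\x98a\xe2\x80\x99.h")
```

With both operands of the same type, no implicit conversion is left for the join to perform, so non-ASCII characters in the path no longer matter.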

@gyorb gyorb modified the milestones: release 6.3, release 6.2.1 Nov 21, 2017
Contributor Author

gyorb commented Nov 27, 2017

The output decoding fix in #1173 makes this pull request irrelevant.

output = codecs.decode(output, 'utf-8', 'replace')
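
The third argument, 'replace', makes the decode lossy instead of failing: bytes that are not valid UTF-8 become U+FFFD. A short Python 3 sketch with made-up byte strings:

```python
import codecs

# Valid UTF-8 decodes normally.
valid = codecs.decode(b"caf\xc3\xa9.h", "utf-8", "replace")

# A stray Latin-1 byte (0xe9) is not valid UTF-8 here and becomes U+FFFD.
invalid = codecs.decode(b"caf\xe9.h", "utf-8", "replace")
```

Either way the result is text, so a later join with values loaded from JSON does not mix types.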

@gyorb gyorb closed this Nov 27, 2017
@gyorb gyorb deleted the type_mismatch branch April 11, 2019 13:45