fix type mismatch #1154

gyorb · 2017-11-20T13:02:54Z

After json.load, the string values in the dictionary are of type unicode.
They cannot be used safely in os.path.join or other string expressions
where a str is expected.

In our current case this would fail:

codechecker/libcodechecker/analyze/analysis_manager.py

Lines 371 to 372 in 8a84dbe

    
    dependent_source = os.path.join(action.directory,
                                    dependent_source)


martong · 2017-11-20T14:10:13Z

libcodechecker/analyze/log_parser.py

@@ -191,14 +191,15 @@ def parse_compile_commands_json(logfile, parseLogOptions):
    filtered_build_actions = {}

    logfile.seek(0)
+    # WARN: Every value will be unicode.


How did it work before? Do we have this problem only when the compile command file is UTF-encoded?

json.load will always produce unicode type values. I'm not sure why this was not a problem so far.
Do we have a functional test for the dependency collection? Either the test or the dependency collection should have failed.

Yes, we have tests for dependency collection:
tests/functional/test_analyze.py test_failure()

It would be good to know why these tests do not fail currently.

I recall a previous case where suddenly this unicode-ness became a problem, and it wasn't earlier. Perhaps it's environment-specific.

@gyorb What are you basing this statement on?

> json.load will always produce unicode type values

I can't find the issue in particular, but I recall the error only happened to some particular user and we couldn't even reproduce it initially. I remember myself trying to find something related to this in the Python changelog...

Nothing in the documentation of os.path.join, nor in its code (/usr/lib/python2.7/posixpath.py), says that the arguments need to be unicode. It takes its arguments and uses .endswith() and += on them. The implementation of string concatenation lies in the Python interpreter's world, in C files. Something tells me there is a conversion going on in there.

I use Python 2.7.12 and the dev_venv.

If I load a json from a string:

    >>> json.loads('{ "name":"John", "age":31, "city":"New York" }')
    {u'city': u'New York', u'age': 31, u'name': u'John'}

or from a file

    >>> with open("test.json") as json_file:
    ...     json.load(json_file)
    ...
    {u'city': u'New York', u'age': 31, u'name': u'John'}

the values will be unicode.
A JSON string is always decoded to unicode in Python; you can check this table.
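A quick way to check this claim, as a sketch intended to run on both Python 2 and Python 3: `type(u'')` is `unicode` on Python 2 and `str` on Python 3, so the comparison below only states that `json.loads` returns text strings rather than byte strings. The sample document is the one used in the interpreter session above; the names `text_type` and `string_types` are made up for this example.

```python
import json

# type(u'') is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u'')

data = json.loads('{ "name":"John", "age":31, "city":"New York" }')

# Collect the type of every key and of every non-integer value.
string_types = {type(key) for key in data}
string_types |= {type(value) for value in data.values()
                 if not isinstance(value, int)}

# True when all keys and string values are text (unicode) strings.
print(string_types == {text_type})
```

On Python 2 this means any value taken from the parsed compile command database is unicode, regardless of whether it contains non-ASCII characters.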

I get this error when I run the analyze functional tests.
The problem is the type mismatch in the join: one of the values is str and the other is unicode.

    Traceback (most recent call last):
      File "...libcodechecker/analyze/analysis_manager.py", line 372, in check
        dependent_source)
      File ".../lib/python2.7/posixpath.py", line 73, in join
        path += '/' + b
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

I tried to run

    os.path.join(os.getcwd(), u'whisperity', "gyorb")

in the same environment and it worked.

So I think my explanation is the closest to reality (if not the explanation) for why we didn't see this error earlier: Python can implicitly convert between str and unicode, and the error is only raised if there is a special (non-ASCII) character in there that it can't convert.

Note, though, that if at least one of the operands of the join is unicode, the result is also unicode, even if there are no non-ASCII characters in it, in which case a plain str would have been reasonable.
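The behaviour described above can be imitated explicitly. The sketch below is not CodeChecker code: `py2_style_concat` is a hypothetical helper that mimics what Python 2 does when a byte string and a unicode string are concatenated, namely decoding the byte string with the ASCII codec first. It is written so that it also runs on Python 3, where this implicit step no longer exists. The sample paths are invented.

```python
def py2_style_concat(left, right):
    """Hypothetical helper imitating Python 2's implicit str -> unicode step.

    Any bytes operand is decoded with the 'ascii' codec before the
    concatenation, so the result is always a text (unicode) string.
    """
    if isinstance(left, bytes):
        left = left.decode('ascii')
    if isinstance(right, bytes):
        right = right.decode('ascii')
    return left + right


# Pure ASCII bytes: the conversion succeeds silently.
joined = py2_style_concat(b'/home/user/', u'main.cpp')
print(repr(joined))

# A non-ASCII byte (0xe2, as in the traceback) cannot be decoded as ASCII.
try:
    py2_style_concat(b'a\xe2b/', u'main.cpp')
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print(err)
```

This matches the observation in the thread: the mixed-type join only fails when the byte string actually contains a non-ASCII byte.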

0xe2 is the â letter according to a Google search.

Ok, but in this case we would still fail if there is a special character in the path.

gyorb · 2017-11-27T15:47:51Z

The output decoding fix in #1173 makes this pull request irrelevant.

codechecker/libcodechecker/analyze/analysis_manager.py

Line 152 in ff2db7c

output = codecs.decode(output, 'utf-8', 'replace')
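For reference, a small sketch of what that call does with input that is or is not valid UTF-8. The sample byte strings are made up for illustration and do not come from a real analyzer run.

```python
import codecs

# Valid UTF-8: the bytes e2 80 98 and e2 80 99 encode U+2018 and U+2019
# (left and right single quotation marks).
valid = codecs.decode(b'error: \xe2\x80\x98x\xe2\x80\x99', 'utf-8', 'replace')

# Invalid UTF-8: a lone 0xe2 is replaced by U+FFFD instead of raising an error.
broken = codecs.decode(b'a\xe2b', 'utf-8', 'replace')

print(repr(valid))
print(repr(broken))
```

With the 'replace' error handler the decode never raises on malformed input, and the result is a text string throughout, so later string operations no longer mix byte strings and unicode.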

fix type mismatch

2c7259a


gyorb requested a review from martong November 20, 2017 13:02

gyorb added the bugfix 🔨 label Nov 20, 2017

gyorb requested review from whisperity and dkrupp November 20, 2017 13:03

gyorb added this to the release 6.3 milestone Nov 20, 2017

martong reviewed Nov 20, 2017


gyorb modified the milestones: release 6.3, release 6.2.1 Nov 21, 2017

gyorb closed this Nov 27, 2017

gyorb deleted the type_mismatch branch April 11, 2019 13:45
