fix type mismatch #1154

Closed
wants to merge 1 commit

13 changes: 8 additions & 5 deletions libcodechecker/analyze/log_parser.py
@@ -191,14 +191,15 @@ def parse_compile_commands_json(logfile, parseLogOptions):
     filtered_build_actions = {}
 
     logfile.seek(0)
+    # WARN: Every value will be unicode.

Contributor

How did it work before? Do we have this problem only when the compile command file is UTF-encoded?

Contributor Author (@gyorb, Nov 20, 2017)

json.load will always produce unicode values; I'm not sure why this was not a problem so far.
Do we have a functional test for the dependency collection? Either the test or the dependency collection should have failed.

Contributor

Yes, we have tests for the dependency collection:
tests/functional/test_analyze.py, test_failure()

Contributor

It would be good to know why these tests do not currently fail.

Contributor

I recall a previous case where this unicode-ness suddenly became a problem when it hadn't been one earlier. Perhaps it's environment-specific.

@gyorb What are you basing this statement on?

"json.load will always produce unicode type values"

I can't find the particular issue, but I recall the error only happened to one particular user and we couldn't even reproduce it initially. I remember trying to find something related to this in the Python changelog...

Contributor

Nothing in the documentation of os.path.join, nor in its code (/usr/lib/python2.7/posixpath.py), says that it needs unicode. It takes its arguments and uses .endswith() and +=. The implementation of string concatenation lives in the Python interpreter itself, in C files; something tells me there is a conversion going on in there.
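For illustration, a minimal Python 2.7 sketch of the implicit coercion being suspected here (the literals are made up for the example):

>>> 'build/' + u'main.cpp'          # pure-ASCII str: Python silently decodes it to unicode
u'build/main.cpp'
>>> '\xc3\xa2-dir/' + u'main.cpp'   # non-ASCII byte: the implicit ASCII decode fails
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)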

Contributor Author

I use Python 2.7.12 and the dev_venv.

If I load JSON from a string:

>>> json.loads('{ "name":"John", "age":31, "city":"New York" }')
{u'city': u'New York', u'age': 31, u'name': u'John'}

or from a file

with open("test.json") as json_file:
...     json.load(json_file)
... 
{u'city': u'New York', u'age': 31, u'name': u'John'}

the values will be unicode.
A JSON string is decoded to unicode in Python; see the conversion table in the json module documentation.

I got this error when I ran the analyze functional tests.
The problem is the type mismatch in the join: one of the values is str and the other is unicode.

Traceback (most recent call last):
  File "...libcodechecker/analyze/analysis_manager.py", line 372, in check
    dependent_source)
  File ".../lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

Contributor

I tried to run

os.path.join(os.getcwd(), u'whisperity', "gyorb")

in the same environment, and it worked.

So I think my explanation is the closest to reality (if not the full explanation) of why we didn't see this error earlier: Python can implicitly convert between str and unicode, and the error is only raised when a special (non-ASCII) character is in there, so the conversion can't be done.

Though notice that if at least one of the operands of the join is unicode, the result is also unicode, even if it contains no non-ASCII characters, in which case plain (non-unicode) strings would have been perfectly reasonable.
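A minimal Python 2.7 illustration of both points (the directory name is made up):

>>> import os
>>> type(os.path.join(os.getcwd(), u'whisperity', "gyorb"))  # ASCII-only mix: works, but the result is unicode
<type 'unicode'>
>>> os.path.join('dir-with-\xc3\xa2', u'file.cpp')           # non-ASCII byte in the str operand
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)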

Contributor

0xe2 is the â letter (in Latin-1), according to a Google search.

Contributor Author

OK, but in that case we would still fail if there is a special character in the path.
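A minimal Python 2.7 sketch of that failure mode (the path is made up; the explicit .encode('utf-8') line is only one possible alternative, not what this patch does):

>>> path = u'/home/user/proj\u00e2/main.cpp'  # unicode path containing 'â'
>>> str(path)                                 # str() implies an implicit ASCII encode
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 15: ordinal not in range(128)
>>> path.encode('utf-8')                      # an explicit UTF-8 encode would not raise
'/home/user/proj\xc3\xa2/main.cpp'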

     data = json.load(logfile)
 
     compiler_includes = {}
     compiler_target = {}
 
     counter = 0
     for entry in data:
-        sourcefile = entry['file']
+        sourcefile = str(entry['file'])
 
         if not os.path.isabs(sourcefile):
             # Newest versions of intercept-build can create the 'file' in the
@@ -213,7 +214,7 @@ def parse_compile_commands_json(logfile, parseLogOptions):
 
         action = build_action.BuildAction(counter)
         if 'command' in entry:
-            command = entry['command']
+            command = str(entry['command'])
 
             # Old versions of intercept-build (confirmed to those shipping
             # with upstream clang-5.0) do escapes in another way:
@@ -229,9 +230,11 @@ def parse_compile_commands_json(logfile, parseLogOptions):
             command = ' '.join(entry['arguments'])
         else:
             raise KeyError("No valid 'command' or 'arguments' entry found!")
+
+        command = str(command)
         results = option_parser.parse_options(command)
 
-        action.original_command = command
+        action.original_command = str(command)
         action.analyzer_options = results.compile_opts
 
         action.lang = results.lang
@@ -266,8 +269,8 @@ def parse_compile_commands_json(logfile, parseLogOptions):
             action.skip = False
 
         # TODO: Check arch.
-        action.directory = entry['directory']
-        action.sources = sourcefile
+        action.directory = str(entry['directory'])
+        action.sources = str(sourcefile)
         # Filter out duplicate compilation commands.
         unique_key = action.cmp_key
         if filtered_build_actions.get(unique_key) is None: