
Don't attempt to add the same file multiple times to the ZIP #1234

Merged
merged 1 commit into Ericsson:master from analysis-zip-multiadd on Dec 8, 2017

Conversation

whisperity (Contributor)

Related to #1219. Not sure yet if this closes it!

When the same file is added to a ZIP multiple times, warnings about it are raised at the .close() of the ZipFile context manager. (ZIP files can seemingly contain the same path twice, but clients can behave nondeterministically about which actual content to use if the contents differ between the entries!)

I am not at all sure why the analysis manager attempts to put the same file into the ZIP twice for @sylvestre. But instead of letting the warnings flow at the end of failure-ZIP creation, we now preemptively check whether the archive already contains the path and show a message about this fact.
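A minimal sketch of the idea being described, with made-up names (add_file_once, failure.zip) rather than the actual CodeChecker code: test whether the archive already has an entry at the target path before writing, and log and skip the duplicate instead of letting zipfile warn later.

import os
import tempfile
import zipfile

def add_file_once(archive, file_path, archive_path):
    # ZipFile.getinfo() raises KeyError when the archive has no entry at
    # the given path, so the try/except doubles as a membership test.
    try:
        archive.getinfo(archive_path)
        print("'{0}' is already in the ZIP, not adding it again."
              .format(archive_path))
    except KeyError:
        archive.write(file_path, archive_path, zipfile.ZIP_DEFLATED)

# Tiny demo: adding the same source file twice stores it only once.
source = os.path.join(tempfile.mkdtemp(), "baz.cpp")
with open(source, "w") as handle:
    handle.write("int main() { return 0; }\n")

with zipfile.ZipFile(os.path.join(tempfile.mkdtemp(), "failure.zip"),
                     "w") as archive:
    arcname = os.path.join("sources-root", source.lstrip(os.sep))
    add_file_once(archive, source, arcname)
    add_file_once(archive, source, arcname)  # skipped with a message
    print(archive.namelist())                # the path appears only once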

@whisperity whisperity added bugfix 🔨 analyzer 📈 Related to the analyze commands (analysis driver) labels Dec 7, 2017
# The file was not in the ZIP file, because we already have the
# content. Let's check if we already have a file record in the
# database or we need to add one.
# The file was not in the ZIP file, because we already
whisperity (Contributor, Author)

The changes in this file are irrelevant to the PR, but creating the commit failed locally for me due to PEP 8 errors:

libcodechecker/server/api/report_server.py:1924:80: E501 line too long (83 > 79 characters)
libcodechecker/server/api/report_server.py:1925:80: E501 line too long (82 > 79 characters)
libcodechecker/server/api/report_server.py:1934:80: E501 line too long (80 > 79 characters)

Contributor

I fixed them in #1166 too.

whisperity (Contributor, Author)

Okay, undid the changes.

gyorb (Contributor) left a comment

Can checking whether the file is already in the archive contents be done based only on the path?

Otherwise LGTM.

whisperity (Contributor, Author)

@gyorb I think we are safe to assume that once the user has /home/foo/bar/baz.cpp and we save it as sources-root/..., it won't change. Hopefully. If it does, we are done for...

@whisperity whisperity force-pushed the analysis-zip-multiadd branch from 9a19ee5 to 00c34d0 on December 7, 2017 14:50
@whisperity whisperity requested a review from gyorb on December 7, 2017 14:50
Xazax-hun (Contributor) left a comment

LG!

os.path.join("sources-root",
archive_path),
zipfile.ZIP_DEFLATED)
_ = archive.getinfo(archive_path)
Contributor

I wonder if this can be a bottleneck. An alternative approach would be to query the file list upfront and not query the info for each path. If this query is cheap, it should not be a problem.
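A sketch of that alternative, with illustrative names (add_files, files_to_add) rather than project code: read the entry list once into a set and test membership against it, instead of asking the archive about every path individually.

import zipfile

def add_files(zip_path, files_to_add):
    # files_to_add: iterable of (path_on_disk, path_inside_zip) pairs.
    with zipfile.ZipFile(zip_path, "a") as archive:
        existing = set(archive.namelist())  # one upfront query
        for file_path, archive_path in files_to_add:
            if archive_path in existing:
                continue                    # already stored, skip it
            archive.write(file_path, archive_path, zipfile.ZIP_DEFLATED)
            existing.add(archive_path)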

whisperity (Contributor, Author)

The ZipFile class (see /usr/lib/python2.7/zipfile.py) contains a NameToInfo dict which maps paths to ZipInfo objects. Dicts have O(n) worst-case lookup time, but O(1) on average.

However, this is currently a strange patch. If you look at the code, you can see that we use sets everywhere for file lists, which also have linear worst-case complexity. As explained earlier, I have no idea why some files are marked to be added to the ZIP multiple times. Perhaps something is wrong with parsing the analyzer output, or perhaps this simply happens because @sylvestre runs with --enable debug checkers, which produce strange output.
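A small illustration of that lookup (NameToInfo is an undocumented implementation detail of zipfile, shown here only to make the cost of the per-path check concrete):

import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("sources-root/tmp/baz.cpp", b"int main() { return 0; }")
    # getinfo() is essentially one lookup in this path -> ZipInfo dict,
    # so the per-file duplicate check is an average-case O(1) hash lookup.
    print(sorted(archive.NameToInfo))
    print(archive.getinfo("sources-root/tmp/baz.cpp").file_size)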

whisperity (Contributor, Author)

And we are using string keys, so this is hopefully fast enough. We also aren't expecting a lot of analyses to fail. The ZIP file already goes over its internal structure at .close() (to finish up the ZIP structure, calculate CRCs and such), which is where it figures out the path duplications (hence the warning in #1219).

Xazax-hun (Contributor)

Maybe it would be good to open a bug to investigate the underlying issue.

whisperity (Contributor, Author)

There already is one, and it's mentioned in the first line of this PR.

@gyorb gyorb added this to the release 6.3 milestone Dec 8, 2017
@gyorb gyorb merged commit eceb906 into Ericsson:master Dec 8, 2017
@whisperity whisperity deleted the analysis-zip-multiadd branch December 18, 2017 09:39