Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jar: skip corrupt jar files #1049

Merged
merged 1 commit into from
Sep 7, 2023
Merged

Conversation

frostmar
Copy link
Contributor

Some types of jar file corruption don't cause errors as soon as the file is opened with archive/zip, but only later when the scanner looks for specific zip entries. Currently the error causes the indexing to fail completely.

So: Log and skip any jar files where archive/zip errors with "not a zip file"


Example of a weirdly corrupt jar:
https://github.com/IBM/db2-samples/blob/master/repl/xmlpubtk/loadqueue/LoadQueue.jar

# file /tmp/LoadQueue.jar
/tmp/LoadQueue.jar: Java archive data (JAR)
# jar tf /tmp/LoadQueue.jar
java.util.zip.ZipException: invalid END header (bad central directory offset)
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:247)
	at java.util.zip.ZipFile.<init>(ZipFile.java:172)
	at java.util.zip.ZipFile.<init>(ZipFile.java:143)
	at sun.tools.jar.Main.list(Main.java:1127)
	at sun.tools.jar.Main.run(Main.java:305)
	at sun.tools.jar.Main.main(Main.java:1300)

@frostmar frostmar requested a review from a team as a code owner August 25, 2023 16:29
@frostmar frostmar requested review from hdonnay and removed request for a team August 25, 2023 16:29
@crozzy crozzy requested review from crozzy and removed request for hdonnay August 25, 2023 16:32
@codecov
Copy link

codecov bot commented Aug 25, 2023

Codecov Report

Patch coverage is 60.00% of modified lines.

Files Changed Coverage
java/jar/jar.go 60.00%

📢 Thoughts on this report? Let us know!.

@crozzy
Copy link
Contributor

crozzy commented Aug 25, 2023

Great stuff, thanks for the example jar. The change LGTM, it'd be create to have a test case around it. I'm thinking of either:

  • If a public c/image exists somewhere with this jar we could add it to the packagescanner_test.TestScan testcases
  • We could include the LoadQueue.jar in java/testdata and build a layer with it during the test and just check it no longer fails (gobin/gobin_test.go does something similar to embed a binary in a layer).

Thoughts?

@frostmar
Copy link
Contributor Author

Yup, very reasonable to have a test, I only didn't include one as I couldn't see any existing ones to extend covering error scenarios.

I'm unsure if there's a freely available image containing the corrupt LoadQueue.jar, I'll go searching... if not, I'll check it's licencing and try the second approach to build a layer to scan

@frostmar
Copy link
Contributor Author

frostmar commented Aug 30, 2023

I've added an integration test in the existing pattern.
The DB2 layer containing a corrupt file is ~2Gb (unpacked), so it may take some time to fetch - is that OK?

Passes for me locally, and fails without the code fix.

@crozzy
Copy link
Contributor

crozzy commented Aug 30, 2023

I've added an integration test in the existing pattern. The DB2 layer containing a corrupt file is ~2Gb (unpacked), so it may take some time to fetch - is that OK?

Passes for me locally, and fails without the code fix.

Great, thanks for adding that, it looks like the average CI time is no slower so LGTM, just going to check this flaky test isn't indicative of something in this patch...

@hdonnay
Copy link
Member

hdonnay commented Aug 30, 2023

Why not figure out how the jar is corrupted and create a jar in-test that's suitably mangled?

@frostmar
Copy link
Contributor Author

frostmar commented Aug 31, 2023

Why not figure out how the jar is corrupted and create a jar in-test that's suitably mangled?

If you'd like that, I'd appreciate assistance as I fear it may take some time to pick through the zip specs and fiddling to get an equivalent manufactured recreate. My team has several other urgent priorities, so the time I can invest right now is limited.

As the jar example in the description is a public DB2 sample, I think it could be used without licencing concerns, which may make reverse engineering to create an equivalent unnecessary.

@frostmar
Copy link
Contributor Author

frostmar commented Sep 7, 2023

Is there anything needed from me to make this mergable? I'd like to get into the next release, as we have users affected by this

@hdonnay
Copy link
Member

hdonnay commented Sep 7, 2023

So I opened up the offending jar in a hex editor (after finding it, the comment in the test is wrong) and found that the offset for the central directory footer somehow has the wrong offset: It's off by 3 bytes (for 3 entries, so I think the producer has an off-by-one error somewhere).

I'll knock together some code to generate a similarly broken zip in the test.

Here's an IPS patch to fix if you're interested:
LoadQueue.patch.gz
(gzipped to placate GitHub). It's literally swapping one byte.

@hdonnay
Copy link
Member

hdonnay commented Sep 7, 2023

Should be good to go.

Copy link
Contributor

@crozzy crozzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work debugging the corruption, just one question

java/jar/jar_test.go Show resolved Hide resolved
Signed-off-by: Mark Frost <frostmar@uk.ibm.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
@hdonnay hdonnay merged commit c3ca57f into quay:main Sep 7, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants