This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Properly handle non-UTF8 filenames in Zip #53

Open
punkeel opened this issue Dec 15, 2018 · 1 comment

@punkeel
Contributor

punkeel commented Dec 15, 2018

	at java.base/java.lang.StringCoding.throwMalformed(StringCoding.java:685)
	at java.base/java.lang.StringCoding.decodeUTF8_0(StringCoding.java:768)
	at java.base/java.lang.StringCoding.newStringUTF8NoRepl(StringCoding.java:965)
	at java.base/java.lang.System$2.newStringUTF8NoRepl(System.java:2197)
	at java.base/java.util.zip.ZipCoder$UTF8.toString(ZipCoder.java:60)
	at java.base/java.util.zip.ZipCoder.toString(ZipCoder.java:87)
	at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:301)
	at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:123)
	at xyz.docbleach.module.zip.ArchiveBleach.sanitize(ArchiveBleach.java:44)
	at xyz.docbleach.api.bleach.CompositeBleach.sanitize(CompositeBleach.java:74)
	at xyz.docbleach.api.BleachSession.sanitize(BleachSession.java:71)
	at xyz.docbleach.cli.Main.sanitize(Main.java:81)
	at xyz.docbleach.cli.Main.main(Main.java:54)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
	... 13 more

Process finished with exit code 1
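The trace shows `ZipInputStream.getNextEntry` decoding the entry name with its default UTF-8 coder and failing on the first byte that is not valid UTF-8. The sketch below reproduces this in memory and shows one possible mitigation: the `ZipInputStream(InputStream, Charset)` constructor (Java 7+), which applies the given charset only to entries whose UTF-8 flag (general-purpose bit 11) is clear. The class `ZipNameRepro`, its methods, and the name bytes `BD E2 D1 B9` are hypothetical stand-ins, not taken from the sample. ISO-8859-1 is used as the fallback because it maps every byte to a character and so cannot reject a name; the price is mojibake instead of the intended text. The failure is caught as `RuntimeException` because the exact exception type may differ between JDK versions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical reproduction: an archive whose entry name is not valid UTF-8.
class ZipNameRepro {

    // Builds a one-entry archive with the given name bytes written verbatim.
    // ISO-8859-1 turns each char U+0000..U+00FF back into one byte, and a
    // non-UTF-8 charset leaves the UTF-8 flag (bit 11) unset.
    static byte[] archiveWithRawName(byte[] rawName) {
        String name = new String(rawName, StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf, StandardCharsets.ISO_8859_1)) {
            out.putNextEntry(new ZipEntry(name));
            out.write("hello".getBytes(StandardCharsets.US_ASCII));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Lists entry names; names without the UTF-8 flag are decoded with `charset`.
    static List<String> listNames(byte[] zip, Charset charset) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip), charset)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] raw = {'d', 'i', 'r', '/', (byte) 0xBD, (byte) 0xE2, (byte) 0xD1, (byte) 0xB9, '.', 't', 'x', 't'};
        byte[] zip = archiveWithRawName(raw);
        try {
            listNames(zip, StandardCharsets.UTF_8); // same decoding as new ZipInputStream(in)
        } catch (RuntimeException e) {
            System.out.println("UTF-8 decoding failed: " + e);
        }
        System.out.println(listNames(zip, StandardCharsets.ISO_8859_1));
    }
}
```

Because `ZipInputStream` reads the archive in a single forward pass, the failing entry cannot simply be re-read with another charset; picking a byte-preserving charset up front avoids having to buffer the whole archive for a retry.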

Sample file (VirusShare, MD5): e35d68feda25f401da03883da1e9c437

Archive:  VirusShare_e35d68feda25f401da03883da1e9c437
Zip file size: 1978422 bytes, number of entries: 4
drwx---     3.1 fat        0 bx stor 13-Jun-24 09:44 bulletstorm-trainer18/
-rwxa--     3.1 fat  2030080 bx defN 11-Mar-12 21:16 bulletstorm-trainer18/BS+28Tr-LinGon.exe
-rw-a--     3.1 fat      893 tx defN 13-Jun-24 09:48 bulletstorm-trainer18/+�+���+��+��.txt
-rw-a--     3.1 fat      151 tx defN 13-Mar-29 17:14 bulletstorm-trainer18/+�+���+��+��.url
4 files, 2031124 bytes uncompressed, 1977223 bytes compressed:  2.7%
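Once the raw bytes of a name like the two mangled ones above are available, a decoder can try UTF-8 strictly and then fall back to legacy charsets. A minimal sketch, assuming a caller-supplied list of candidates; for this archive a Chinese code page such as GBK is a plausible guess, but that is an assumption, since the archive does not record it. `NameDecoder` and its methods are hypothetical. A successful strict decode only shows the bytes are well-formed in that charset, not that the guess is right, so the candidate order matters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: pick the first charset that decodes raw entry-name bytes cleanly.
class NameDecoder {

    // Strict decode: REPORT makes malformed or unmappable input throw
    // instead of being silently replaced with U+FFFD.
    static String decodeStrict(byte[] raw, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    // Tries UTF-8 first, then each fallback in order. ISO-8859-1 is the last
    // resort because it accepts every byte sequence (and keeps the bytes recoverable).
    static String decode(byte[] raw, List<Charset> fallbacks) {
        try {
            return decodeStrict(raw, StandardCharsets.UTF_8);
        } catch (CharacterCodingException notUtf8) {
            // fall through to the legacy candidates
        }
        for (Charset cs : fallbacks) {
            try {
                return decodeStrict(raw, cs);
            } catch (CharacterCodingException wrongGuess) {
                // try the next candidate
            }
        }
        return new String(raw, StandardCharsets.ISO_8859_1);
    }
}
```

If names were read through `new ZipInputStream(in, StandardCharsets.ISO_8859_1)`, then `entry.getName().getBytes(StandardCharsets.ISO_8859_1)` returns the original bytes for entries whose UTF-8 flag is clear. For flagged entries the stream has already decoded UTF-8, so that round trip does not apply, and `ZipEntry` does not expose the flag to tell the two cases apart.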
@punkeel punkeel added the zip label Dec 15, 2018
@punkeel
Contributor Author

punkeel commented Dec 15, 2018

Full zipinfo:

Archive:  VirusShare_e35d68feda25f401da03883da1e9c437
The zipfile comment is 241 bytes long and contains the following text:
======================== zipfile comment begins ==========================
�������԰ http://www.cr173.com

�������������������û���⡣

�ٶ�һ�¡��������԰������ϲ������Ŷ��

--------------------------------------------------
--------------------------------------------------

��ѹ���룺www.cr173.com
========================= zipfile comment ends ===========================

End-of-central-directory record:
-------------------------------

  Zip archive file size:                   1978422 (00000000001E3036h)
  Actual end-cent-dir record offset:       1978159 (00000000001E2F2Fh)
  Expected end-cent-dir record offset:     1978159 (00000000001E2F2Fh)
  (based on the length of the central directory and its expected offset)

  This zipfile constitutes the sole disk of a single-part archive; its
  central directory contains 4 entries.
  The central directory is 572 (000000000000023Ch) bytes long,
  and its (expected) offset in bytes from the beginning of the zipfile
  is 1977587 (00000000001E2CF3h).
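These numbers are mutually consistent: the central directory starts at 1977587 and is 572 bytes long, so it ends at 1977587 + 572 = 1978159, the record offset reported above; the record itself is 22 fixed bytes plus the 241-byte comment, and 1978159 + 22 + 241 = 1978422, the file size. A reader that wants raw name bytes instead of decoded strings can start from this record. The sketch below (hypothetical class `Eocd`) scans backwards for the `PK\5\6` signature and reads the fields little-endian; it ignores ZIP64 and assumes nothing follows the comment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: locate and decode the end-of-central-directory (EOCD) record.
class Eocd {
    static final int SIG = 0x06054b50;   // bytes "PK\5\6" read as a little-endian int
    static final int FIXED_SIZE = 22;    // EOCD length without the comment

    final long offset;         // where the EOCD record starts
    final int totalEntries;    // total number of central directory entries
    final long cdSize;         // central directory size in bytes
    final long cdOffset;       // central directory offset from the start of the archive
    final byte[] comment;      // raw comment bytes, left undecoded

    Eocd(long offset, int totalEntries, long cdSize, long cdOffset, byte[] comment) {
        this.offset = offset;
        this.totalEntries = totalEntries;
        this.cdSize = cdSize;
        this.cdOffset = cdOffset;
        this.comment = comment;
    }

    static Eocd find(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        // The comment is at most 0xFFFF bytes, which bounds the backwards scan.
        int min = Math.max(0, zip.length - FIXED_SIZE - 0xFFFF);
        for (int pos = zip.length - FIXED_SIZE; pos >= min; pos--) {
            if (buf.getInt(pos) != SIG) {
                continue;
            }
            int commentLen = Short.toUnsignedInt(buf.getShort(pos + 20));
            if (pos + FIXED_SIZE + commentLen != zip.length) {
                continue; // signature bytes inside the comment: a false match
            }
            return new Eocd(pos,
                    Short.toUnsignedInt(buf.getShort(pos + 10)),
                    Integer.toUnsignedLong(buf.getInt(pos + 12)),
                    Integer.toUnsignedLong(buf.getInt(pos + 16)),
                    Arrays.copyOfRange(zip, pos + FIXED_SIZE, zip.length));
        }
        throw new IllegalArgumentException("no end-of-central-directory record found");
    }
}
```

Applied to the sample, it should report 4 entries and a 572-byte central directory at offset 1977587 with a 241-byte comment, matching zipinfo. The comment is returned as bytes because the ZIP format gives the archive comment no encoding flag of its own.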


Central directory entry #1:
---------------------------

  bulletstorm-trainer18/

  offset of local header from start of archive:   0
                                                  (0000000000000000h) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   3.1
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   1.0
  compression method:                             none (stored)
  file security status:                           not encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          2013 Jun 24 09:44:28
  32-bit CRC value (hex):                         00000000
  compressed size:                                0 bytes
  uncompressed size:                              0 bytes
  length of filename:                             22 characters
  length of extra field:                          36 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (10 hex):                dir

  The central-directory extra field contains:
  - A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes.  The first
    20 are:   00 00 00 00 01 00 18 00 d4 e6 d4 5d 7c 70 ce 01 d4 e6 d4 5d.

  There is no file comment.
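Each central directory entry is a 46-byte fixed header followed by the file name, the extra field, and the file comment; for entry #1 that is a 22-byte name (`bulletstorm-trainer18/`) and a 36-byte extra field. Walking these headers yields each name as raw bytes together with the general-purpose flags, including bit 11, which `ZipEntry` does not expose. A minimal sketch with the hypothetical class `CentralDirectory`; the caller is assumed to supply the central directory offset and entry count (for example from the end-of-central-directory record), and ZIP64 is not handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: walk central directory headers, keeping names as raw bytes.
class CentralDirectory {
    static final int CEN_SIG = 0x02014b50;   // bytes "PK\1\2" read as a little-endian int
    static final int CEN_FIXED = 46;         // header length before the file name
    static final int FLAG_UTF8 = 1 << 11;    // general-purpose bit 11: name is UTF-8

    static final class RawEntry {
        final int flags;     // general-purpose bit flag
        final byte[] name;   // undecoded file-name bytes
        final byte[] extra;  // undecoded extra field (e.g. 0x000a, 0x7075 subfields)

        RawEntry(int flags, byte[] name, byte[] extra) {
            this.flags = flags;
            this.name = name;
            this.extra = extra;
        }

        boolean isUtf8() {
            return (flags & FLAG_UTF8) != 0;
        }
    }

    static List<RawEntry> read(byte[] zip, int cdOffset, int count) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        List<RawEntry> entries = new ArrayList<>();
        int pos = cdOffset;
        for (int i = 0; i < count; i++) {
            if (buf.getInt(pos) != CEN_SIG) {
                throw new IllegalArgumentException("bad central directory header at " + pos);
            }
            int flags = Short.toUnsignedInt(buf.getShort(pos + 8));
            int nameLen = Short.toUnsignedInt(buf.getShort(pos + 28));
            int extraLen = Short.toUnsignedInt(buf.getShort(pos + 30));
            int commentLen = Short.toUnsignedInt(buf.getShort(pos + 32));
            int nameStart = pos + CEN_FIXED;
            int extraStart = nameStart + nameLen;
            entries.add(new RawEntry(flags,
                    Arrays.copyOfRange(zip, nameStart, extraStart),
                    Arrays.copyOfRange(zip, extraStart, extraStart + extraLen)));
            pos = extraStart + extraLen + commentLen; // next header
        }
        return entries;
    }
}
```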

Central directory entry #2:
---------------------------

  There are an extra -36 bytes preceding this file.

  bulletstorm-trainer18/BS+28Tr-LinGon.exe

  offset of local header from start of archive:   52
                                                  (0000000000000034h) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   3.1
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           not encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          2011 Mar 12 21:16:50
  32-bit CRC value (hex):                         3083c7e1
  compressed size:                                1976519 bytes
  uncompressed size:                              2030080 bytes
  length of filename:                             40 characters
  length of extra field:                          36 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (20 hex):                arc

  The central-directory extra field contains:
  - A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes.  The first
    20 are:   00 00 00 00 01 00 18 00 20 37 fb bf b7 e0 cb 01 66 31 2f 59.

  There is no file comment.

Central directory entry #3:
---------------------------

  There are an extra -36 bytes preceding this file.

  bulletstorm-trainer18/+�+���+��+��.txt

  offset of local header from start of archive:   1976641
                                                  (00000000001E2941h) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   3.1
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           not encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          2013 Jun 24 09:48:12
  32-bit CRC value (hex):                         558ee91a
  compressed size:                                570 bytes
  uncompressed size:                              893 bytes
  length of filename:                             38 characters
  length of extra field:                          89 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             text
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (20 hex):                arc

  The central-directory extra field contains:
  - A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes.  The first
    20 are:   00 00 00 00 01 00 18 00 aa 21 43 e3 7c 70 ce 01 1c 87 ee 5b.
  - A subfield with ID 0x7075 (UTF8 path name) and 49 data bytes. The first
    24 UTF8 bytes in the extra field (V1, ASCII name CRC `f37f78d7') are:
    62 75 6c 6c 65 74 73 74 6f 72 6d 2d 74 72 61 69 6e 65 72 31 38 2f e8 a5.

  There is no file comment.
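Entry #3 (and #4) carries an Info-ZIP Unicode Path extra field (ID 0x7075) next to the 0x000a field, which accounts for the 89-byte extra length: (4 + 32) + (4 + 49). Its data is a 1-byte version, a 4-byte CRC-32 of the entry's ordinary name bytes, and the name in UTF-8, so here 49 = 1 + 4 + 44. The lengths fit six CJK characters: after the 22-byte directory prefix and `.txt`, they take 12 bytes in the 38-byte raw name (two each) and 18 bytes in the 44-byte UTF-8 name (three each). `java.util.zip` does not use this field when naming entries, so the sketch below (hypothetical class `UnicodePathField`) parses it by hand and only trusts the UTF-8 name when the CRC still matches the raw name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.CRC32;

// Hypothetical sketch: read the Info-ZIP Unicode Path extra field (header ID 0x7075).
class UnicodePathField {
    static final int ID = 0x7075;

    // rawName: the entry's undecoded file-name bytes; extra: its whole extra field.
    // Returns the UTF-8 name only for a version-1 field whose CRC-32 matches rawName;
    // a mismatch suggests the ordinary name was changed after the field was written.
    static Optional<String> unicodeName(byte[] rawName, byte[] extra) {
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {                  // subfield: id(2) size(2) data(size)
            int id = Short.toUnsignedInt(buf.getShort());
            int size = Short.toUnsignedInt(buf.getShort());
            int start = buf.position();
            if (size > buf.remaining()) {
                break;                                  // truncated subfield: stop parsing
            }
            if (id == ID && size >= 5 && extra[start] == 1) {
                long storedCrc = Integer.toUnsignedLong(buf.getInt(start + 1));
                CRC32 crc = new CRC32();
                crc.update(rawName, 0, rawName.length);
                if (crc.getValue() == storedCrc) {
                    return Optional.of(new String(extra, start + 5, size - 5, StandardCharsets.UTF_8));
                }
            }
            buf.position(start + size);                 // skip to the next subfield
        }
        return Optional.empty();
    }
}
```

When the field is missing or its CRC does not match, a reader would fall back to the UTF-8 flag or to charset guessing. Apache Commons Compress's `ZipArchiveInputStream` has a `useUnicodeExtraFields` constructor option that covers the same case.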

Central directory entry #4:
---------------------------

  There are an extra -36 bytes preceding this file.

  bulletstorm-trainer18/+�+���+��+��.url

  offset of local header from start of archive:   1977332
                                                  (00000000001E2BF4h) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   3.1
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           not encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          2013 Mar 29 17:14:02
  32-bit CRC value (hex):                         fcf30365
  compressed size:                                134 bytes
  uncompressed size:                              151 bytes
  length of filename:                             38 characters
  length of extra field:                          89 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             text
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (20 hex):                arc

  The central-directory extra field contains:
  - A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes.  The first
    20 are:   00 00 00 00 01 00 18 00 b8 1b 98 c1 5d 2c ce 01 1c 87 ee 5b.
  - A subfield with ID 0x7075 (UTF8 path name) and 49 data bytes. The first
    24 UTF8 bytes in the extra field (V1, ASCII name CRC `1b3e623c') are:
    62 75 6c 6c 65 74 73 74 6f 72 6d 2d 74 72 61 69 6e 65 72 31 38 2f e8 a5.

  There is no file comment.
4 files, 2031124 bytes uncompressed, 1977223 bytes compressed:  2.7%
