[zip] 7z complains about "Headers Error" when large files are added to a zip archive #623

Closed
riton opened this issue Jun 9, 2022 · 10 comments · Fixed by #624

Comments

@riton

riton commented Jun 9, 2022

Hi,

Context

I'm trying to generate a zip archive of huge files (something like 2GB each).

The generated zip archive can successfully be extracted using my Ubuntu unzip command.
But it raises an error when I try to extract using my graphical user interface or 7z.

Here is the output of 7z in test integrity mode:

$ 7z t archive.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (806EC),ASM,AES-NI)

Scanning the drive for archives:
1 file, 4294967634 bytes (4097 MiB)

Testing archive: archive.zip

ERRORS:
Headers Error

--
Path = archive.zip
Type = zip
ERRORS:
Headers Error
Physical Size = 4294967634
64-bit = +

              

Archives with Errors: 1

Open Errors: 1

Since unzip is able to extract the files, I would say that the archive is valid.
But I definitely do not understand why 7z is complaining here.

I have absolutely no experience with ZIP / ZIP64 archives, and I may be using your library the wrong way...

How to reproduce

I've used the following code to generate the archive:

package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"

	"github.com/klauspost/compress/zip"
)

var (
	// generated using "dd if=/dev/zero of=./file1 bs=1G count=2" or equivalent
	fileNames = []string{
		"./file1",
		"./file2",
	}
)

func main() {
	outFileW, err := os.Create("./archive.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer outFileW.Close()

	zipWriter := zip.NewWriter(outFileW)

	for _, filename := range fileNames {

		fileToZip, err := os.Open(filename)
		if err != nil {
			log.Fatal(err)
		}
		defer fileToZip.Close()

		// Get the file information
		info, err := fileToZip.Stat()
		if err != nil {
			log.Fatal(err)
		}

		header, err := zip.FileInfoHeader(info)
		if err != nil {
			log.Fatal(err)
		}

		// FileInfoHeader() only records the base name of the file. To preserve the
		// folder structure, set header.Name to the full path instead; here the base name is kept.
		header.Name = filepath.Base(filename)

		// Change to store to avoid compression
		// see http://golang.org/pkg/archive/zip/#pkg-constants
		header.Method = zip.Store

		writer, err := zipWriter.CreateHeader(header)
		if err != nil {
			log.Fatal(err)
		}
		_, err = io.Copy(writer, fileToZip)
		if err != nil {
			log.Fatal(err)
		}
	}

	if err := zipWriter.Close(); err != nil {
		log.Fatal(err)
	}

	fmt.Println("archive.zip file created")
}

using

  • go 1.18.3
  • github.com/klauspost/compress v1.15.6

Test files are filled with zeros and were created using a command such as dd if=/dev/zero of=./file1 bs=1G count=2

Thanks in advance for your time and consideration

@riton
Author

riton commented Jun 9, 2022

Note: I get the same error if I use the standard library's archive/zip package instead of yours, so this may well be unrelated to your library.

I'm quite confused about what I'm missing here.

@klauspost
Owner

Without knowing what 7z is complaining about, it is rather hard to tell.

"Zip64", and Zip in general, have a lot of legacy and poorly defined extensions, so "Headers Error" is of little use for finding out what 7z expects.

Is 7z able to decompress the file?

@klauspost
Owner

Seems like the UI has slightly more information:

image

@klauspost
Owner

OK, that led to https://sourceforge.net/p/sevenzip/discussion/45797/thread/13e7d575/

TL;DR: When a file is added, we do not know whether its size will exceed 32 bits, and therefore whether it should be added as Zip64.

So the file header written at that point may not be marked as Zip64, even if the entry ends up needing it.

7z decompresses it fine anyway, so I am not going to worry about it.
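
To make the 32-bit limit mentioned above concrete, here is a minimal Go sketch that lists the entries of an archive and reports which ones have sizes that no longer fit in the 32-bit fields of a classic zip record. The helper name needsZip64 is hypothetical and not part of any library; zip.OpenReader and the CompressedSize64 / UncompressedSize64 fields come from the standard archive/zip package. In the reproduction above each 2 GiB file (2147483648 bytes) is below that limit, while the archive as a whole (4294967634 bytes) is above it, so offsets near the end of the archive may be what requires Zip64 there; that is an inference, not something confirmed in this thread.

```go
package main

import (
	"archive/zip"
	"fmt"
	"log"
	"math"
	"os"
)

// needsZip64 is a hypothetical helper: it reports whether a value is too
// large for a 32-bit field of a classic zip record. 0xFFFFFFFF itself is
// used as the "look in the Zip64 extra field" marker, hence ">=".
func needsZip64(v uint64) bool {
	return v >= math.MaxUint32
}

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s archive.zip", os.Args[0])
	}

	r, err := zip.OpenReader(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Sizes are read from the central directory, which is written after
	// all file data and therefore always knows the final values.
	for _, f := range r.File {
		big := needsZip64(f.CompressedSize64) || needsZip64(f.UncompressedSize64)
		fmt.Printf("%-30s %12d bytes  needs Zip64 sizes: %v\n",
			f.Name, f.UncompressedSize64, big)
	}
}
```

This only inspects what the reader reports; it does not show how any particular writer decides when to emit Zip64 records.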

@riton
Copy link
Author

riton commented Jun 9, 2022

So if I understand correctly:

  • 7z is more strict than other archive managers here
  • I'm doing things as they're supposed to be done to create the zip archive

Thanks for your investigation and time.

@riton
Author

riton commented Jun 9, 2022

Note: the issue mentioned in https://sourceforge.net/p/sevenzip/discussion/45797/thread/13e7d575/#e90f may be golang/go#33116

Is this issue related to mine?

@klauspost
Owner

Yes. I see that I accidentally reverted the fix for it in #432.

klauspost added a commit that referenced this issue Jun 9, 2022
Accidentally reverted #313 in #432

Fixes #623
@riton
Author

riton commented Jun 9, 2022

For information, I've tried the version at 999ca10 but 7z still emits warnings with the generated archive:

$ cat go.mod 
module github.com/riton/zip_issue

go 1.18

require github.com/klauspost/compress v1.15.7-0.20220609131744-999ca1093d2e

and the output of 7z:

$ 7z t archive.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (806EC),ASM,AES-NI)

Scanning the drive for archives:
1 file, 4294967634 bytes (4097 MiB)

Testing archive: archive.zip

ERRORS:
Headers Error

--
Path = archive.zip
Type = zip
ERRORS:
Headers Error
Physical Size = 4294967634
64-bit = +

              

Archives with Errors: 1

Open Errors: 1

7z is still able to extract the archive but still exits with an error status.

I'll let you reopen this issue if you think there is something to fix here. Otherwise, I accept, and will keep in mind, that 7z is stricter than other zip archivers out there.

Thanks for your time

Regards

Rémi

@klauspost
Owner

Hmmm... the 7zip UI no longer shows any error:

image

WinZip is also happy about the file:

Archive: E:\gopath\src\github.com\mholt\archiver\cmd\arc\out.zip   942726480 bytes   2022-06-10 11:48:04
Current Location part 1 offset 942726458
End central directory record PK0506 (4+18)
==========================================
    location of end-of-central-dir record:          942726458 (0x3830dd3a) bytes
    part number of this part (0000):                1
    part number of start of central dir (0000):     1
    number of entries in central dir in this part:  1
    total number of entries in central dir:         1
    size of central dir:                            110 (0x0000006e) bytes
    relative offset of central dir:                 942726348 (0x3830dccc) bytes
    zipfile comment length:                         0

Current Location part 1 offset 942726348
Central directory entry PK0102 (4+42): #1
======================================
    part number in which file begins (0000):        1
    relative offset of local header:                4294967295 (0xffffffff) bytes
    version made by operating system (03):          Unix
    version made by zip software (20):              2.0
    operat. system version needed to extract (00):  MS-DOS, OS/2, NT FAT
    unzip software version needed to extract (45):  4.5
    general purpose bit flag (0x0008) (bit 15..0):  0000.0000 0000.1000
      file security status  (bit 0):                not encrypted
      extended local header (bit 3):                yes
    compression method (08):                        deflated
      compression sub-type (deflation):             normal
    file last modified on (0x00004efb 0x000059e4):  2019-07-27 11:15:08
    32-bit CRC value:                               0x2e234f73
    compressed size:                                4294967295 bytes
    uncompressed size:                              4294967295 bytes
    length of filename:                             27 characters
    length of extra field:                          37 bytes
    length of file comment:                         0 characters
    internal file attributes:                       0x0000
      apparent file type:                           binary
    external file attributes:                       0x81b60000
      Unix file attributes (100666 octal):          -rw-rw-rw-
      MS-DOS file attributes (0x00):                none
    filename: github-june-2days-2019.json
    extra field 0x5455 (universal time), 4 header and 5 data bytes:
    01 3c 32 3c 5d                                  .<2<]             
    extra field 0x0001 (ZIP64 Tag), 4 header and 24 data bytes:
    ZIP64 Tag Value(s):
      Value #1:                                     6273951764
      Value #2:                                     942726258
      Value #3:                                     0

Testing github-june-2days-2019.json
******* github-june-2days-2019.json Tested OK

Current Location part 1 offset 0
Local directory entry PK0304 (4+26): #1
------------------------------------
    operat. system version needed to extract (00):  MS-DOS, OS/2, NT FAT
    unzip software version needed to extract (20):  2.0
    general purpose bit flag (0x0008) (bit 15..0):  0000.0000 0000.1000
      file security status  (bit 0):                not encrypted
      extended local header (bit 3):                yes
    compression method (08):                        deflated
      compression sub-type (deflation):             normal
    file last modified on (0x00004efb 0x000059e4):  2019-07-27 11:15:08
    32-bit CRC value:                               0x00000000
    compressed size:                                0 bytes
    uncompressed size:                              0 bytes
  note: "real" crc and sizes are in the extended local header
    length of filename:                             27 characters
    length of extra field:                          9 bytes
    filename: github-june-2days-2019.json
    extra field 0x5455 (universal time), 4 header and 5 data bytes:
    01 3c 32 3c 5d                                  .<2<]             

Current Location part 0 offset 942726324
Extended local dir entry PK0708 (4+12): #1
---------------------------------------
    32-bit CRC value:                               0x2e234f73
    compressed size:                                942726258 bytes
    uncompressed size:                              0 bytes


No errors detected in compressed data of E:\gopath\src\github.com\mholt\archiver\cmd\arc\out.zip.
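
The local header in the dump above ("PK0304 (4+26)") is a 4-byte signature followed by 26 bytes of fixed-size fields. As a rough illustration, the following Go sketch decodes those fields from the first 30 bytes of an archive so that "version needed to extract" and the data descriptor flag (bit 3) can be inspected without a dedicated tool. The localHeader type and parseLocalHeader function are illustrative names, not part of any library, and the sketch assumes the first entry starts at offset 0, as it does in the dump.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
)

// localHeader holds the fixed-size fields of a zip local file header.
// Hypothetical type, for illustration only.
type localHeader struct {
	VersionNeeded    uint16
	Flags            uint16
	Method           uint16
	CRC32            uint32
	CompressedSize   uint32
	UncompressedSize uint32
	NameLen          uint16
	ExtraLen         uint16
}

const localHeaderLen = 30 // 4-byte signature + 26 bytes of fields

// parseLocalHeader decodes the little-endian fields of a local file header.
func parseLocalHeader(b []byte) (localHeader, error) {
	if len(b) < localHeaderLen {
		return localHeader{}, errors.New("short local file header")
	}
	le := binary.LittleEndian
	if le.Uint32(b[0:4]) != 0x04034b50 { // "PK\x03\x04"
		return localHeader{}, errors.New("not a zip local file header")
	}
	return localHeader{
		VersionNeeded: le.Uint16(b[4:6]),
		Flags:         le.Uint16(b[6:8]),
		Method:        le.Uint16(b[8:10]),
		// b[10:14] holds the MS-DOS modification time and date (skipped here).
		CRC32:            le.Uint32(b[14:18]),
		CompressedSize:   le.Uint32(b[18:22]),
		UncompressedSize: le.Uint32(b[22:26]),
		NameLen:          le.Uint16(b[26:28]),
		ExtraLen:         le.Uint16(b[28:30]),
	}, nil
}

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s archive.zip", os.Args[0])
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	buf := make([]byte, localHeaderLen)
	if _, err := io.ReadFull(f, buf); err != nil {
		log.Fatal(err)
	}
	h, err := parseLocalHeader(buf)
	if err != nil {
		log.Fatal(err)
	}

	// The low byte of VersionNeeded is the version times ten (20 = 2.0, 45 = 4.5).
	v := h.VersionNeeded & 0xff
	fmt.Printf("version needed to extract: %d.%d\n", v/10, v%10)
	fmt.Printf("data descriptor follows (flag bit 3): %v\n", h.Flags&0x8 != 0)
	fmt.Printf("sizes in local header: %d compressed, %d uncompressed\n",
		h.CompressedSize, h.UncompressedSize)
}
```

When flag bit 3 is set, the zero sizes in the local header are expected, because the real values follow the file data in the data descriptor (the "PK0708" record in the dump).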

@riton
Author

riton commented Jun 10, 2022

Okay, understood. This seems related to the default version of 7-Zip present on my Ubuntu workstation.

Different 7zip versions

My Ubuntu workstation default version

By using the default 7z utility on my workstation, I get:

$ 7z t archive.zip 

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (806EC),ASM,AES-NI)

Scanning the drive for archives:
1 file, 4294967634 bytes (4097 MiB)

Testing archive: archive.zip

ERRORS:
Headers Error

--
Path = archive.zip
Type = zip
ERRORS:
Headers Error
Physical Size = 4294967634
64-bit = +

Archives with Errors: 1

Open Errors: 1

With the latest 7-Zip version from the official website

With the latest version of 7-Zip retrieved from https://www.7-zip.org/download.html no error is detected:

$ /tmp/7zz t archive.zip

7-Zip (z) 21.07 (x64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-12-26
 64-bit locale=en_US.UTF-8 Threads:8, ASM

Scanning the drive for archives:
1 file, 4294967634 bytes (4097 MiB)

Testing archive: archive.zip
--
Path = archive.zip
Type = zip
Physical Size = 4294967634
64-bit = +
Characteristics = Zip64

Everything is Ok

Files: 2
Size:       4294967296
Compressed: 4294967634

Regarding the patch applied in 7020af7:

  • The p7zip version detects invalid headers in the generated archives, both with and without the patch
  • The 7-Zip version from https://www.7-zip.org/download.html does not detect invalid headers, even without the patch applied

Version details on my workstation

$ 7z -h 2>&1

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (806EC),ASM,AES-NI)
[...]

provided by

$ dpkg -S /usr/bin/7z
p7zip-full: /usr/bin/7z

$ apt info p7zip-full
Package: p7zip-full
Version: 16.02+dfsg-7build1
Priority: optional
Section: universe/utils
Source: p7zip
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Robert Luberda <robert@debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 4 887 kB
Depends: p7zip (= 16.02+dfsg-7build1), libc6 (>= 2.14), libgcc-s1 (>= 3.0), libstdc++6 (>= 5)
Suggests: p7zip-rar
Breaks: p7zip (<< 15.09+dfsg-3~)
Replaces: p7zip (<< 15.09+dfsg-3~)
Homepage: http://p7zip.sourceforge.net/
Task: kubuntu-desktop, kubuntu-full, xubuntu-desktop, lubuntu-desktop, ubuntustudio-desktop, ubuntukylin-desktop, ubuntu-mate-core, ubuntu-mate-desktop
Download-Size: 1 187 kB
APT-Manual-Installed: no
APT-Sources: http://fr.archive.ubuntu.com/ubuntu focal/universe amd64 Packages
Description: 7z and 7za file archivers with high compression ratio
 p7zip is the Unix command-line port of 7-Zip, a file archiver that
 handles the 7z format which features very high compression ratios.
 .
 p7zip-full provides utilities to pack and unpack 7z archives within
 a shell or using a GUI (such as Ark, File Roller or Nautilus).
 .
 Installing p7zip-full allows File Roller to use the very efficient 7z
 compression format for packing and unpacking files and directories.
 Additionally, it provides the 7z and 7za commands.
 .
 List of supported formats:
   - Packing / unpacking: 7z, ZIP, GZIP, BZIP2, XZ and TAR
   - Unpacking only: APM, ARJ, CAB, CHM, CPIO, CramFS, DEB, DMG, FAT,
     HFS, ISO, LZH, LZMA, LZMA2, MBR, MSI, MSLZ, NSIS, NTFS, RAR (only
     if non-free p7zip-rar package is installed), RPM, SquashFS, UDF,
     VHD, WIM, XAR and Z.
 .
 The dependent package, p7zip, provides 7zr, a light version of 7za,
 and p7zip, a gzip-like wrapper around 7zr.

For information, my workstation is:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

Conclusion

I'm not quite sure why p7zip complains here. It may be because the version on my workstation is old, although a coworker confirmed that the same problem is still present with the p7zip version shipped with the latest Ubuntu release.

Anyhow, this does not seem to be related to your library.

Thanks again

Regards

Rémi
