Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generated zip archives now support utf8 file names #1117

Closed
wants to merge 4 commits into from

Conversation

DeepDiver1975
Copy link
Member

refs #1086, #930, #578

File/folder names:
中文blah中文blah.txt
undaçao.doc
öäüß.txt

Currently known issues/observations:

  • Chinese characters are not displayed within my Gnome Archive Tools
  • On Windows XP with built in zip support the file and folder names are totally screwed

Next steps:

  • either generate zips depending on the client os
  • or normalize the file names to ASCII 😧

@owncloud/core-developers
I need reviewers and tester on Windows, Mac and Linux

@skodak thanks a lot for relicensing your code!
Can I ask you to add a comment on this PR where you state again that you agree to relicense? THX

@ghost
Copy link

ghost commented Jan 8, 2013

--- Original Nachricht ---
Betreff: Re: Relicensing of zip_archive::fix_utf8_flags() under AGPL3
Von: Petr Skoda info@skodak.org
An: "Thomas Müller" thomas.mueller@ownCloud.org
Datum: 08.01.2013 1:08

Hello,

sure, I do not mind dual licensing it under GPL3 and AGPL3, I give you the permission to reuse and adopt my code that hacks around utf8 limitations in standard zip extension for PHP.

Petr Skoda

On 7. 1. 2013, at 23:46, Thomas Müller thomas.mueller@ownCloud.org wrote:

Hi Petr,

I'm a open source developer within the ownCloud project.

While searching for a solution to the well known zip and utf8 filename issue
I found your code within the moodle project.

As ownCloud uses AGPL3 license and your code has been released under
GPL3 I cannot simply reuse your code right away.

Can I kindly ask you to grant me the permission to reuse and adopt your code
under the AGPL3 license?

Thanks a lot,

Thomas Müller

ownCloud
Your Cloud, Your Data, Your Way!

@DeepDiver1975
Copy link
Member Author

PLEASE DO NOT MERGE - LICENSING HAS TO BE CLARIFIED!

@skodak
Copy link

skodak commented Jan 9, 2013

Hello, I hope the code is going to work fine for you. If you find any problems or discover some new way to improve it please ping me. Ciao.

@karlitschek
Copy link
Contributor

👍 @skodak licensed this as MIT so this is ready for merging. If we find another reviewers of course.

@DeepDiver1975
Copy link
Member Author

Summing reviewers .....
@butonic @Raydiation @LukasReschke @bartv2 .... maybe? THX

@dragotin
Copy link
Contributor

The code uses fseek and others which are subject of doubts about big file handling, ie. file with size > 2GB. It should be tested with files > 2GB and > 4GB to verify it works, at least on 64 bit (PHP-) platforms.

@DeepDiver1975
Copy link
Member Author

@dragotin good point - even I doubt that is makes sense to generate zips pf that size 😄

@dragotin
Copy link
Contributor

@DeepDiver1975 sure, it does not make sense to have that large files in general but people do... Users don't think in terms of file sizes, they just do stuff. The thing is that these overflow errors cause all funny side effect errors which are hard to track down, because you usually do not get a warning or error somewhere in the logfiles. So its better to keep in mind...

@DeepDiver1975
Copy link
Member Author

sure - I'll test this code with big files.
FYI: there is a max zip size which can be defined by the ownCloud admin - default is 800MB

@dragotin
Copy link
Contributor

On 16.01.2013 10:19, Thomas Müller wrote:

sure - I'll test this code with big files.
Thanks.

FYI: there is a max zip size which can be defined by the ownCloud admin - default is 800MB
Thats very good. Maybe it should simply be forbidden to extend this
value over 2GB. That makes it secure for all cases.

@jacotec
Copy link

jacotec commented Jan 25, 2013

Just want to add that also common ZIP software like 7-zip got the special chars broken in OC generated ZIP files, not only Windows internal zip support. (OC 4.5.6)

@butonic
Copy link
Member

butonic commented Feb 10, 2013

still broken. I locally merged master and downloaded a folder as zip. not only are utf8 symbols replaced with ?? but subfolders get pulled to the root folder of the zip. with the following folder structure:

+ tozip
  + empty folder
  + ö ä ü ß
  | + öä+
  + 中文blah中文blah
  | + äöü#+
  + undaçao.doc
  + öäüß.txt
  + 中文blah中文blah.txt

first of all the 'empty folder' does not show up in the zip at all
the dir structure of the zip looks like this (looking at it with gnome archive viewer):

+ tozip.zip
  + tozip
  | + unda?????ao.doc
  | + ?????????????blah?????????????blah.txt
  | + ?????????????blah?????????????blah.txt
  | + ??????????????????????.txt
  + ?????????????blah?????????????blah
  | + ?????????????????#+
  + ????????
    + ????+

@DeepDiver1975 rebase please

@nicolaskarp
Copy link

Still broken for me as well with the current release :

  • Server OS : Linux / kernel 3.2.0-33
  • OwnCloud 5
  • Browser : Firefox 19.0.2
  • Client OS : Windows 7

The issue is happening when I try to download several files (zip) :

For Instance :

Filename with accent :  Carte_Identité_Recto.jpg
Inside the zip after unzip : Carte_Identit+®_Recto.jpg 

When I download only a file, it's working.

Nic

@mahiuchun
Copy link

For Gnome Archive Tools, aka, File Roller, the solution is to install p7zip as the issue of UTF-8 zip is caused by Info-Zip (patch it a little bit can also help, which is being done in some distro now).

For Mac and Windows, I think recent versions of third-party software should all be fine with UTF-8 zip.

@karlitschek
Copy link
Contributor

What is the status here?

@PVince81
Copy link
Contributor

PVince81 commented Nov 7, 2013

Just observed the issue on OC6 beta3.

Please rebase.

@mahiuchun
Copy link

Ubuntu is going to fix its unzip of 12.04-13.04 for utf8 file names issue (13.10 is already okay). It is in SRU verification phase now. Can anyone help the verification?

http://pad.lv/1199239

@PVince81
Copy link
Contributor

Going to try and rebase this onto master to revive it a bit.

@PVince81
Copy link
Contributor

Here you go: mega-rebase onto master.

I've tested it and it still doesn't work, which, by reading the previous comments, seem to never have worked properly... at least we can use this as a base to continue experimenting.

@PVince81
Copy link
Contributor

Trying to debug this: the file name workaround only seem to operate on the first file in the zip. There is still hope 😄

@ghost
Copy link

ghost commented Dec 10, 2013

Test failed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/2327/

@PVince81
Copy link
Contributor

That should explain those numbers: http://unix.stackexchange.com/a/14727
I think I'll just use them as is as they may contain the correct attribute flags for Windows as well.

@PVince81
Copy link
Contributor

Ladies and gentlemen, please help testing this: download this ZIP file generated with this fix and let us know whether you were able to
1) See file names without encoding issues
and
2) Extract it with correct file names

https://owncloud.vincentpetry.net/public.php?service=files&t=6955b2fa7fc20fc76d2b94fb6ed433fc&download

  • Linux unzip 6.00
  • Ark (KDE)
  • Gnome Fileroller
  • P7Zip (Linux)
  • WinZIP
  • WinRAR WinXP
  • WinRAR Win7
  • 🚫 7Zip WinXP
  • 7Zip Win7
  • 7Zip Win8
  • 🚫 Windows' ZIP directory WinXP
  • 🚫 Windows' ZIP directory Win7
  • Windows' ZIP directory Win8
  • Mac (not sure how it's called)
  • other tools... feel free to add

@georgehrke
Copy link
Contributor

Works fine on mac:
zipwithsubdirs at 08 56 02

@PVince81
Copy link
Contributor

Summonning the original reporters: @Hausmarke @ricardoar7 @bureautranslations @RealRancor @PKduck @frisi

See #1117 (comment) if you'd like to help testing ZIP with UTF-8 characters.

@frisi
Copy link

frisi commented Dec 11, 2013

i'm on kde too @PVince81. ark works fine, as does unzip in console.
owncloud-zip

@ghost
Copy link

ghost commented Dec 11, 2013

Test failed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/2333/

@georgehrke
Copy link
Contributor

@PVince81 and the folder with the Chinese characters is the third and not the second folder :P

@PVince81
Copy link
Contributor

@georgehrke it was the second one I created until I decided to test umlauts as well 😉

@DeepDiver1975
Copy link
Member Author

bildschirmfoto vom 2013-12-11 14 25 56

@DeepDiver1975
Copy link
Member Author

on WinXP the content is not properly displayed using 7zip and integrated zip feature in explorer
will retest on win8 soon ...

@PVince81
Copy link
Contributor

From my research it looks like Windows expects the file name to be encoded based on the system's encoding, not UTF-8. Which means we'd need to detect Windows and encode using the current locale... which means that the zip files wouldn't be the same according to which OS it has been created from... which might be acceptable if the purpose of the zip file is only to download multiple files that are going to be extracted locally anyway.

Needs more research...

@futal
Copy link

futal commented Dec 11, 2013

Apparently, not all zip implementations have UTF8 support so the standard specifies a "Unicode Path Extra Field" understood by most popular software. See evanmiller/mod_zip#3

The base filename should then default to a "most likely" encoding, possibly with transliteration and accent stripping from the original filenames for better compatibility. Different solutions for PHP are discussed at http://stackoverflow.com/questions/1284535/php-transliteration

@PVince81
Copy link
Contributor

@futal thanks for the info.
In the current fix we set the utf-8 bit but it doesn't seem to be enough.
I haven't found any info that says that the "Unicode Path Extra Field" solution also works in Windows (Winzip, Winrar, 7zip).

To implement this we'd need to rewrite the whole file and insert the relevant fields. The current workarounds were only about replacing bytes in the existing zip files generated by PHP.

@PVince81
Copy link
Contributor

According to http://support.microsoft.com/kb/2704299 there is a slight chance that the test file generated above can be extracted properly on Windows 7. Would be good if someone could test this 😄

@futal
Copy link

futal commented Dec 11, 2013

I have tested the following files with Windows 7 Pro 64 SP1:

For all these zip files, filenames were good with 7zip 9.22beta and wrong with Windows Explorer.

@PVince81
Copy link
Contributor

@futal cool, thanks.
Can someone test with WinZIP, WinRAR on WinXP/Win7 ?

@georgehrke
Copy link
Contributor

I'll test winrar on win7 this evening

@georgehrke
Copy link
Contributor

Works fine with WinRAR on Windows 7 Pro

@georgehrke
Copy link
Contributor

maybe we should check winrar on xp too

Do not close and reopen for each file entry, but only do the utf-8
fixing once after the final close.
@ghost
Copy link

ghost commented Dec 12, 2013

Test failed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/2344/

@PVince81
Copy link
Contributor

After discussing with @DeepDiver1975 we decided to use ZipStreamer instead as it will not only fix the utf-8 file names issues but also significantly improve download performance.

See #6893

@PVince81 PVince81 closed this Feb 20, 2014
@jancborchardt jancborchardt deleted the fixing-1086-master branch February 26, 2014 11:15
@PepeN
Copy link

PepeN commented Apr 24, 2014

Hello,

I have the same problem.

Server OC 6.0.3RC1 (CentOS release 6.5 (i686) httpd-2.2.15-30.el6.centos.i686 php-5.4.27-1.el6.remi.i686)
Clients Windows 2003 IE 8, Chrome, 7zip 9.20 x64

oc_6 0 3rc1_zip

@DeepDiver1975
Copy link
Member Author

@PepeN this issue has been fixed for the upcoming OC7

@lock lock bot locked as resolved and limited conversation to collaborators Aug 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet