file-type treats LibreOffice-saved .pptx as .zip #119
Comments
Can you submit a PR with a failing test? Otherwise, there's not much we can do about it.
The
The comment is incorrect. The entire file, starting from the 0x1E offset, is searched for that string.

@sindresorhus In a related issue I included an example that shows how the original magic check is broken. The "correct" way to do the zip checks is to jump to the central directory and look at the file names.

ping @thejoshwolfe -- any thoughts on the most efficient way to scan a zip file's entry names?
from the readme:
This is not adequate for correctly detecting a zip file, because the starting point for the zip file format is near the end of the file, not the beginning. Here's some discussion from https://en.wikipedia.org/wiki/Zip_(file_format) :
So in the case of a self-extracting zip, there are actually two correct answers for "what type of file is this?": both "exe" and "zip".

You may be able to get away with reading near the beginning of the file for formats that are based on .zip but more strictly constrained, but I don't know of any file types that do this. I expect most of them just say "it's zip with files that match these patterns", e.g. JAR.

If you want to correctly identify whether something is a zipfile, here's what you do (reference https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ):
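As a rough sketch of that kind of check (not necessarily the exact steps from the comment), the function below scans backwards from the end of an in-memory buffer for the end of central directory record described in APPNOTE.TXT. The name `findEndOfCentralDirectory` and the shape of the returned object are made up for this illustration. Requiring the comment length to line up with the end of the buffer is a strictness assumption that rejects files with trailing junk.

```javascript
// Layout of the end of central directory record (EOCD), per APPNOTE.TXT.
const EOCD_SIGNATURE = 0x06054b50; // bytes "PK\x05\x06", read little-endian
const EOCD_MIN_SIZE = 22;          // fixed-size part of the record
const MAX_COMMENT_SIZE = 0xffff;   // the comment length field is 16 bits

// Hypothetical helper: returns the EOCD fields, or null if no record is found.
function findEndOfCentralDirectory(buffer) {
  const lowest = Math.max(0, buffer.length - EOCD_MIN_SIZE - MAX_COMMENT_SIZE);
  // Start at the last position where a comment-less record could sit.
  for (let i = buffer.length - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (buffer.readUInt32LE(i) !== EOCD_SIGNATURE) continue;
    // Assumption: the record's comment must run exactly to the end of the
    // buffer. This filters out signature bytes that merely appear in data.
    const commentLength = buffer.readUInt16LE(i + 20);
    if (i + EOCD_MIN_SIZE + commentLength !== buffer.length) continue;
    return {
      offset: i,
      entryCount: buffer.readUInt16LE(i + 10),
      centralDirectorySize: buffer.readUInt32LE(i + 12),
      centralDirectoryOffset: buffer.readUInt32LE(i + 16),
    };
  }
  return null;
}
```

Because the scan starts from the end, a self-extracting archive (executable stub followed by zip data) is still recognized as a zip, while a buffer that only begins with a local file header signature is not.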
Then if you want to correctly detect whether files of a certain name are in the archive (as opposed to just being mentioned in file contents, e.g. in the source code for this project), decide whether you want to support the ZIP64 format. I suppose you don't, since the API for this project reads from a RAM buffer. So with no ZIP64 support:
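A minimal sketch of reading entry names from the central directory without ZIP64 support, assuming the whole file is in a buffer and that the offset stored in the end of central directory record is correct relative to the start of the buffer (which may not hold for archives with prepended data). `listZipEntryNames` is a hypothetical name, and names are decoded as `latin1` only to preserve the raw bytes; ASCII names come out unchanged.

```javascript
// Hypothetical helper: returns an array of entry names, or null if the
// buffer does not parse as a (non-ZIP64) zip file.
function listZipEntryNames(buffer) {
  // Locate the end of central directory record by scanning from the end.
  let eocd = -1;
  const lowest = Math.max(0, buffer.length - 22 - 0xffff);
  for (let i = buffer.length - 22; i >= lowest; i--) {
    if (buffer.readUInt32LE(i) === 0x06054b50 &&
        i + 22 + buffer.readUInt16LE(i + 20) === buffer.length) {
      eocd = i;
      break;
    }
  }
  if (eocd === -1) return null;

  const entryCount = buffer.readUInt16LE(eocd + 10);
  let cursor = buffer.readUInt32LE(eocd + 16); // start of central directory
  const names = [];

  for (let n = 0; n < entryCount; n++) {
    // Each central directory header has a 46-byte fixed part.
    if (cursor + 46 > eocd || buffer.readUInt32LE(cursor) !== 0x02014b50) {
      return null;
    }
    const nameLength = buffer.readUInt16LE(cursor + 28);
    const extraLength = buffer.readUInt16LE(cursor + 30);
    const commentLength = buffer.readUInt16LE(cursor + 32);
    const nameEnd = cursor + 46 + nameLength;
    if (nameEnd > eocd) return null;
    names.push(buffer.toString('latin1', cursor + 46, nameEnd));
    // Skip the variable-length extra field and file comment.
    cursor = nameEnd + extraLength + commentLength;
  }
  return names;
}
```

A caller could then test for format-specific names, for example `names.includes('[Content_Types].xml')`, instead of searching the raw bytes for that string.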
Then to be even more correct, you can check the general purpose bit flag for bit 11, which indicates UTF-8 vs CP437 encoding for the file name. You probably don't need that though, since the magic file names you'll be looking for are probably going to be the same in those two encodings.

Then to be more correct, you may want to check for the Info-ZIP Unicode Path Extra Field which can override the file name, for some reason.

Then maybe you might want to convert

In summary: if a file format is "just zip plus ...", then it's actually an extremely complicated and convoluted format that takes a nearly full-featured parser to detect correctly. Give me any quick-and-dirty detector, and I'll give you an example file that fools it. (Looks like you've already got an example for this issue, but I can provide more if you like.)

You might consider using https://github.com/thejoshwolfe/yauzl (e.g.

I don't have an easy solution for you. The approach started in this project is fundamentally incapable of correctly detecting zip-based formats. You could change your API, or you could accept that there will be unsolvable bugs like this issue.
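For the bit 11 check mentioned above, a small sketch: it reads the general purpose bit flag from a central directory header and decodes the name accordingly. `decodeEntryName` is a hypothetical helper. Node.js has no built-in CP437 decoder, so this sketch only handles the non-UTF-8 case when every byte is plain ASCII (where CP437 and UTF-8 agree) and returns null otherwise. The Info-ZIP Unicode Path Extra Field is not handled.

```javascript
const UTF8_NAME_FLAG = 0x0800; // bit 11 of the general purpose bit flag

// Hypothetical helper: headerOffset points at a central directory header
// (signature 0x02014b50) inside buffer.
function decodeEntryName(buffer, headerOffset) {
  const flags = buffer.readUInt16LE(headerOffset + 8);
  const nameLength = buffer.readUInt16LE(headerOffset + 28);
  const raw = buffer.subarray(headerOffset + 46, headerOffset + 46 + nameLength);

  if ((flags & UTF8_NAME_FLAG) !== 0) {
    return raw.toString('utf8');
  }
  // Without the flag the name is CP437. Bytes below 0x80 are identical to
  // ASCII, so they can be decoded without a CP437 table.
  if (raw.every((byte) => byte < 0x80)) {
    return raw.toString('ascii');
  }
  return null; // would need a real CP437 table, which this sketch omits
}
```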
I just checked that; it is easy to reproduce.