ZipArchiveInputStream vs ZipFile

Author: 董江鹏 | Published 2020-08-07 16:11

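While adapting an app to Android 11 recently, I wanted to decompress archives directly from an InputStream, but some archives failed with the exception "unsupported feature data descriptor used in entry". When I hit a problem I am not familiar with, my reflex is to reach for a search engine.

The decompression library I use is Apache Commons Compress. The solutions online all copy from one another: every one of them says to set the allowStoredEntriesWithDataDescriptor parameter to true when constructing the ZipArchiveInputStream, but that did not help.

With no other option, I had to grit my teeth and read the documentation. This problem drove home that skipping the docs and improvising does not work; time spent sharpening the axe is not time lost cutting wood.

The official documentation is here: http://commons.apache.org/proper/commons-compress/zip.html#ZipArchiveInputStream_vs_ZipFile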

ZIP archives store archive entries in sequence and contain a registry of all entries at the very end of the archive. It is acceptable for an archive to contain several entries of the same name and have the registry (called the central directory) decide which entry is actually to be used (if any).
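To make the "registry at the very end" concrete, here is a minimal sketch that locates the end-of-central-directory record in an in-memory archive and reads the entry count and the central directory offset from it. It uses only the JDK (java.util.zip, java.nio), not Commons Compress. The class and method names are hypothetical, and it assumes a small, single-disk archive without ZIP64 extensions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; assumes no ZIP64 and a single-disk archive.
class CentralDirectoryLocator {
    private static final int EOCD_SIGNATURE = 0x06054b50; // "PK\5\6" read little-endian
    private static final int EOCD_MIN_SIZE = 22;           // record size without a comment

    /** Scans backwards for the end-of-central-directory record; returns -1 if absent. */
    static int findEocd(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        for (int pos = archive.length - EOCD_MIN_SIZE; pos >= 0; pos--) {
            if (buf.getInt(pos) == EOCD_SIGNATURE) {
                return pos;
            }
        }
        return -1;
    }

    /** Total number of central directory entries (2 bytes at offset 10 of the record). */
    static int entryCount(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        return Short.toUnsignedInt(buf.getShort(requireEocd(archive) + 10));
    }

    /** Offset of the central directory from the start of the archive (4 bytes at offset 16). */
    static long centralDirectoryOffset(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        return Integer.toUnsignedLong(buf.getInt(requireEocd(archive) + 16));
    }

    private static int requireEocd(byte[] archive) {
        int pos = findEocd(archive);
        if (pos < 0) {
            throw new IllegalArgumentException("no end-of-central-directory record found");
        }
        return pos;
    }

    /** Builds a small in-memory archive with one entry per name, for experimenting. */
    static byte[] sampleArchive(String... names) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(name.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The backward scan only works because the whole archive is available for random access. A reader that consumes a forward-only stream sees every entry before it ever reaches this record.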

In addition the ZIP format stores certain information only inside the central directory but not together with the entry itself, this is:

internal and external attributes
different or additional extra fields

This means the ZIP format cannot really be parsed correctly while reading a non-seekable stream, which is what ZipArchiveInputStream is forced to do. As a result ZipArchiveInputStream

may return entries that are not part of the central directory at all and shouldn't be considered part of the archive.
may return several entries with the same name.
will not return internal or external attributes.
may return incomplete extra field data.
may return unknown sizes and CRC values for entries until the next entry has been reached if the archive uses the data descriptor feature (see below).
can not skip over bytes that occur before the real zip stream.
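The data descriptor feature in the list above is signaled by bit 3 of the general purpose flags in an entry's local file header: when it is set, the sizes and CRC follow the entry data instead of preceding it. The sketch below checks that bit for the first entry of an in-memory archive. It uses the JDK's java.util.zip to build the samples rather than Commons Compress, and the names are hypothetical. The deflated sample relies on how OpenJDK's ZipOutputStream behaves when sizes are not known up front, which is an observation about that implementation and not a guarantee.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; inspects only the first local file header.
class DataDescriptorFlag {
    private static final int LOCAL_HEADER_SIGNATURE = 0x04034b50; // "PK\3\4" read little-endian
    private static final int LOCAL_HEADER_SIZE = 30;               // fixed part of the header
    private static final int DATA_DESCRIPTOR_BIT = 1 << 3;

    /** True if the first local file header sets general purpose bit 3. */
    static boolean firstEntryUsesDataDescriptor(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        if (archive.length < LOCAL_HEADER_SIZE || buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("archive does not start with a local file header");
        }
        int flags = Short.toUnsignedInt(buf.getShort(6)); // flags sit at offset 6
        return (flags & DATA_DESCRIPTOR_BIT) != 0;
    }

    /** Builds a one-entry archive; STORED entries get size and CRC set up front. */
    static byte[] singleEntryArchive(boolean stored) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            ZipEntry entry = new ZipEntry("hello.txt");
            if (stored) {
                CRC32 crc = new CRC32();
                crc.update(data);
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                entry.setCrc(crc.getValue());
            }
            zip.putNextEntry(entry);
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

A check like this can help when diagnosing which archives trigger the exception, since the flag is visible in the first few bytes of each entry.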

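Roughly, this means:

A ZIP archive stores its entries in sequence and keeps a registry of all entries at the very end of the archive. An archive may contain several entries with the same name, and the registry (called the central directory) decides which of them, if any, is actually used.

In addition, the ZIP format stores certain information (internal and external attributes, different or additional extra fields) only in the central directory, not with the entry itself.

This means the ZIP format cannot be parsed correctly while reading a non-seekable input stream. Forcing the job through ZipArchiveInputStream can lead to the following problems:

It may return entries that are not in the central directory at all and should not count as part of the archive
It may return several entries with the same name
It will not return internal or external attributes
It may return incomplete extra field data
If the archive uses the data descriptor feature, it may return unknown sizes and CRC values for an entry until the next entry is reached
It cannot skip over bytes that precede the actual zip stream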
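The last point is easy to reproduce. The sketch below prepends a few bytes to an archive and counts the entries with a streaming reader and with a file-based reader. It uses the JDK's java.util.zip.ZipInputStream and java.util.zip.ZipFile as stand-ins for the Commons Compress classes, so it shows the same structural limitation but not necessarily the exact behavior of ZipArchiveInputStream. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class built on java.util.zip, not on Commons Compress.
class PrefixedArchiveDemo {
    /** Counts entries by reading forward through the bytes. */
    static int countViaStream(byte[] data) {
        int count = 0;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            while (zip.getNextEntry() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Counts entries via the central directory, which requires a seekable file. */
    static int countViaFile(byte[] data) {
        try {
            Path tmp = Files.createTempFile("prefixed", ".zip");
            try {
                Files.write(tmp, data);
                try (ZipFile zip = new ZipFile(tmp.toFile())) {
                    return zip.size();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the given prefix followed by a one-entry archive. */
    static byte[] prefixedArchive(String prefix) {
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(zipBytes)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] head = prefix.getBytes(StandardCharsets.UTF_8);
        byte[] body = zipBytes.toByteArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }
}
```

With a non-empty prefix, the streaming reader does not find a local file header at the start and reports no entries, while the file-based reader finds the central directory from the end of the file and works out where the entries begin.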

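In the end, parsing an archive through an input stream is unreliable, so avoid it whenever you can.
With that, the problem was solved. To save storage space on the phone, I still try the input stream first; if it throws UnsupportedZipFeatureException, I copy the file to an accessible directory and extract it with ZipFile.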
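The stream-first, file-as-fallback strategy can be sketched as follows. This is not my production code: it uses the JDK's java.util.zip classes so that it runs without third-party dependencies, and it catches ZipException where the Commons Compress version would catch UnsupportedZipFeatureException. The class name, the method names, and the Supplier used to reopen the source are all assumptions made for illustration. Entries are collected in memory to keep the example short; real code would write them to disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch built on java.util.zip; the source must be re-openable.
class StreamFirstExtractor {
    /** Tries the streaming reader; on ZipException, copies to a temp file and uses ZipFile. */
    static Map<String, byte[]> extract(Supplier<InputStream> source) {
        try {
            try {
                return readViaStream(source.get());
            } catch (ZipException streamingFailed) {
                // Partial results from the stream are discarded; start over from a file.
                Path copy = Files.createTempFile("archive", ".zip");
                try (InputStream in = source.get()) {
                    Files.copy(in, copy, StandardCopyOption.REPLACE_EXISTING);
                    return readViaFile(copy);
                } finally {
                    Files.deleteIfExists(copy);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, byte[]> readViaStream(InputStream in) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (!e.isDirectory()) result.put(e.getName(), readAll(zip));
            }
        }
        return result;
    }

    static Map<String, byte[]> readViaFile(Path path) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(path.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                if (e.isDirectory()) continue;
                try (InputStream in = zip.getInputStream(e)) {
                    result.put(e.getName(), readAll(in));
                }
            }
        }
        return result;
    }

    /** Reads until end of stream (for ZipInputStream: end of the current entry). */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        for (int n; (n = in.read(buffer)) != -1; ) out.write(buffer, 0, n);
        return out.toByteArray();
    }

    /** Builds a one-entry STORED archive in memory, for experimenting. */
    static byte[] storedArchive(String name, String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(data);
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(entry);
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The design trades a second pass over the data for lower storage use in the common case: the temporary copy is only made for archives the streaming reader cannot handle, and it is deleted as soon as extraction finishes.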

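Had I not read the documentation, though, I might have wasted a lot of time trying things, changing code, and searching for solutions, and might even have started to wonder whether all my years of work counted for nothing, since I could not even use an API properly.

Summary

In the comparison of ZipArchiveInputStream vs ZipFile, ZipFile wins hands down.

First-rate programmers write documentation, second-rate programmers read documentation, and third-rate programmers do not know what documentation is.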

Article link: https://www.haomeiwen.com/subject/exofdktx.html