Parsing ZIP Files

Author: devilisdevil | Published: 2021-02-02 23:34

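ZIP encryption and ZIP64 are not covered in detail for now.

Introduction

  • Common file formats built on ZIP include .JAR, .WAR, .DOCX, .XLSX, .PPTX, .ODT, .ODS and .ODP
  • Several compression methods are supported. Deflate (compression method 8) is the most widely used and is the default. The Stored method (compression method 0) writes the data as-is, with no compression
  • Every file has a CRC32 field used for verification
  • Every ZIP file must contain exactly one end of central directory record
  • Every archived file is preceded by a local file header, and every local file header has a corresponding central directory record
  • File data MAY be followed by a "data descriptor" for the file. Data descriptors are used to facilitate ZIP file streaming.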
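The CRC32 check from the list above can be reproduced with Python's zlib.crc32. This is only a minimal sketch; the helper name crc_matches is made up for illustration and is not part of any ZIP library.

```python
import zlib

def crc_matches(uncompressed: bytes, stored_crc: int) -> bool:
    """Return True if the CRC-32 of the uncompressed data equals the stored value."""
    # In Python 3, zlib.crc32 already returns an unsigned 32-bit integer.
    return zlib.crc32(uncompressed) == stored_crc

data = b"hello zip"
stored = zlib.crc32(data)                 # the value a ZIP writer would store
print(crc_matches(data, stored))          # True
print(crc_matches(b"hello zap", stored))  # False: one bit differs
```

Note that the CRC is computed over the uncompressed bytes, so a reader has to inflate the data before it can check it.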

ZIP and ZIP64

The local file header and the central directory entry have the same layout in ZIP and ZIP64. In ZIP64, however, a size that does not fit in 32 bits is stored as 0xffffffff, and the real value is placed in a ZIP64 extended information extra field.

The EOCD, on the other hand, has a slightly different format in ZIP64 than in the normal ZIP version.
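As a rough illustration of the extra field mentioned above, the sketch below pulls the 64-bit sizes out of an extra field blob. It assumes the layout described in the PKWARE APPNOTE (header ID 0x0001, uncompressed size first, then compressed size, each present only when the 32-bit field holds 0xffffffff). The function name zip64_sizes is hypothetical.

```python
import struct

ZIP64_EXTRA_ID = 0x0001  # header ID of the ZIP64 extended information field

def zip64_sizes(extra: bytes, csize: int, usize: int) -> tuple:
    """Return (compressed_size, uncompressed_size), using ZIP64 values if present."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4:pos + 4 + size]
        if header_id == ZIP64_EXTRA_ID:
            off = 0
            if usize == 0xFFFFFFFF:            # real value stored as 8 bytes
                usize, = struct.unpack_from("<Q", body, off)
                off += 8
            if csize == 0xFFFFFFFF:
                csize, = struct.unpack_from("<Q", body, off)
            return csize, usize
        pos += 4 + size                        # skip unrelated extra fields
    return csize, usize

# Hand-built extra field for illustration only.
extra = struct.pack("<HHQQ", ZIP64_EXTRA_ID, 16, 5_000_000_000, 4_900_000_000)
print(zip64_sizes(extra, 0xFFFFFFFF, 0xFFFFFFFF))  # (4900000000, 5000000000)
```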

Encryption

TODO

general purpose bit flag

Bit 0: If set, indicates that the file is encrypted.
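A small sketch of testing individual flag bits. Only bit 0 comes from the text above; bit 3 (a data descriptor follows the file data) and bit 11 (file name is UTF-8) are taken from the PKWARE APPNOTE and are included here as assumptions for illustration. The function name describe_flags is made up.

```python
def describe_flags(flags: int) -> dict:
    """Decode a few bits of the 16-bit general purpose bit flag."""
    return {
        "encrypted": bool(flags & 0x0001),        # bit 0
        "data_descriptor": bool(flags & 0x0008),  # bit 3 (per APPNOTE)
        "utf8_names": bool(flags & 0x0800),       # bit 11 (per APPNOTE)
    }

print(describe_flags(0x0008))
# {'encrypted': False, 'data_descriptor': True, 'utf8_names': False}
```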

File structure

Overall layout

[local file header 1]
[encryption header 1]
[file data 1]
[data descriptor 1]
. 
.
.
[local file header n]
[encryption header n]
[file data n]
[data descriptor n]
[archive decryption header] 
[archive extra data record] 
[central directory header 1]
.
.
.
[central directory header n]
[zip64 end of central directory record]
[zip64 end of central directory locator] 
[end of central directory record]

local file header

offset description
0 Local file header signature = 0x04034b50 (read as a little-endian number)
4 Version needed to extract (minimum)
6 General purpose bit flag
8 Compression method
10 File last modification time
12 File last modification date
14 CRC-32 of uncompressed data
18 Compressed size (or 0xffffffff for ZIP64)
22 Uncompressed size (or 0xffffffff for ZIP64)
26 File name length (n)
28 Extra field length (m)
30 File name
30+n Extra field
30+n+m the end
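The offsets in the table above map directly onto a struct format string. The following is a sketch rather than a complete reader: the name parse_local_header and the returned dictionary keys are chosen for illustration, and the demo archive is built with Python's zipfile module just to have something to parse.

```python
import io
import struct
import zipfile

# signature, version, flags, method, time, date, crc, csize, usize, n, m
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # fixed 30-byte part, little-endian

def parse_local_header(buf: bytes, pos: int = 0) -> dict:
    """Decode the local file header that starts at pos."""
    (sig, _version, flags, method, _mtime, _mdate,
     crc32, csize, usize, n, m) = LOCAL_HEADER.unpack_from(buf, pos)
    if sig != 0x04034B50:
        raise ValueError("not a local file header")
    name_start = pos + LOCAL_HEADER.size
    return {
        "flags": flags,
        "method": method,
        "crc32": crc32,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "name": buf[name_start:name_start + n],
        "extra": buf[name_start + n:name_start + n + m],
        "data_offset": name_start + n + m,  # file data starts here
    }

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"hello" * 10)
hdr = parse_local_header(buf.getvalue())
print(hdr["name"], hdr["method"], hdr["uncompressed_size"])
```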

encryption header

TODO

data descriptor

central directory header

offset description
0 Central directory file header signature = 0x02014b50
4 Version made by
6 Version needed to extract (minimum)
8 General purpose bit flag
10 Compression method
12 File last modification time
14 File last modification date
16 CRC-32 of uncompressed data
20 Compressed size (or 0xffffffff for ZIP64)
24 Uncompressed size (or 0xffffffff for ZIP64)
28 File name length (n)
30 Extra field length (m)
32 File comment length (k)
34 Disk number where file starts
36 Internal file attributes
38 External file attributes
42 Relative offset of local file header. This is the number of bytes between the start of the first disk on which the file occurs, and the start of the local file header. This allows software reading the central directory to locate the position of the file inside the ZIP file.
46 File name
46+n Extra field
46+n+m File comment
46+n+m+k the end
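The same approach works for the central directory header. This sketch follows the offsets in the table above; parse_cd_header and the "next" key are illustrative names. In the demo, the position of the first header is found with a plain byte search, which is a shortcut for the example only. A real parser takes that offset from the EOCD.

```python
import io
import struct
import zipfile

CD_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")  # fixed 46-byte part

def parse_cd_header(buf: bytes, pos: int) -> dict:
    """Decode one central directory header starting at pos."""
    f = CD_HEADER.unpack_from(buf, pos)
    if f[0] != 0x02014B50:
        raise ValueError("not a central directory header")
    n, m, k = f[10], f[11], f[12]          # name, extra, comment lengths
    name_start = pos + CD_HEADER.size
    return {
        "method": f[4],
        "crc32": f[7],
        "compressed_size": f[8],
        "uncompressed_size": f[9],
        "name": buf[name_start:name_start + n],
        "local_header_offset": f[16],
        "next": name_start + n + m + k,    # where the following record starts
    }

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"one")
    zf.writestr("b.txt", b"two")
raw = buf.getvalue()
first = parse_cd_header(raw, raw.find(b"PK\x01\x02"))  # demo shortcut
second = parse_cd_header(raw, first["next"])
print(first["name"], first["local_header_offset"])
print(second["name"], second["local_header_offset"])
```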

end of central directory record

EOCD

offset description
0 End of central directory signature = 0x06054b50
4 Number of this disk
6 Disk where central directory starts
8 Number of central directory records on this disk
10 Total number of central directory records
12 Size of central directory (bytes)
16 Offset of start of central directory, relative to start of archive
20 Comment length (n)
22 Comment
22+n the end
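Because the EOCD sits at the end of the archive and its comment is at most 65535 bytes long (the length is a 2-byte field), a reader can search backwards for the signature within the last 22 + 65535 bytes. The sketch below does this with bytes.rfind; find_eocd is an illustrative name, and the code does not guard against the signature bytes appearing inside the comment itself.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<IHHHHIIH")  # fixed 22-byte part of the EOCD
MAX_COMMENT = 0xFFFF               # comment length is a 2-byte field

def find_eocd(buf: bytes) -> dict:
    """Search backwards for the EOCD signature and decode the record."""
    start = max(0, len(buf) - EOCD.size - MAX_COMMENT)
    pos = buf.rfind(b"PK\x05\x06", start)  # 0x06054b50 in little-endian bytes
    if pos < 0:
        raise ValueError("EOCD signature not found")
    (_sig, _disk, _cd_disk, _entries_here, entries_total,
     cd_size, cd_offset, comment_len) = EOCD.unpack_from(buf, pos)
    return {
        "offset": pos,
        "entries_total": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment_len": comment_len,
    }

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"one")
    zf.writestr("b.txt", b"two")
raw = buf.getvalue()
eocd = find_eocd(raw)
print(eocd["entries_total"])  # 2
```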

EOCD64

offset description
0 Zip64 end of central directory record signature = 0x06064b50
4 Size of the EOCD64 minus 12 (8-byte field)
12 Version made by
14 Version needed to extract (minimum)
16 Number of this disk
20 Disk where central directory starts
24 Number of central directory records on this disk
32 Total number of central directory records
40 Size of central directory (bytes)
48 Offset of start of central directory, relative to start of archive
56 Extensible data sector (up to the size of EOCD64)
56+n the end
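A sketch of decoding the EOCD64, assuming the field widths given in the PKWARE APPNOTE: the size field at offset 4 is 8 bytes wide, and the fixed part of the record is 56 bytes long. The record in the demo is built by hand for illustration, and parse_eocd64 is a hypothetical name.

```python
import struct

# signature, size, made by, needed, disk, cd disk,
# entries on disk, entries total, cd size, cd offset
EOCD64 = struct.Struct("<IQHHIIQQQQ")  # fixed 56-byte part

def parse_eocd64(buf: bytes, pos: int = 0) -> dict:
    """Decode the fixed part of a zip64 end of central directory record."""
    (sig, size, _made_by, _needed, _disk, _cd_disk,
     _entries_here, entries_total, cd_size, cd_offset) = EOCD64.unpack_from(buf, pos)
    if sig != 0x06064B50:
        raise ValueError("not a zip64 EOCD record")
    return {
        "record_size": size + 12,  # stored size excludes signature and size field
        "entries_total": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
    }

# Hand-built record: 3 entries, a 150-byte central directory at offset 5 GiB.
record = EOCD64.pack(0x06064B50, 44, 45, 45, 0, 0, 3, 3, 150, 5 * 2**30)
info = parse_eocd64(record)
print(info["record_size"], info["cd_offset"])  # 56 5368709120
```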

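Steps for parsing a ZIP file

Normal parsing: back to front

Typically, software first locates the EOCD, which records the offset at which the central directory starts. From there it reads the central directory, where each entry records the offset of the corresponding local file header. Each local file header is followed directly by the contents of its file, so the software can then decompress every file.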
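The back-to-front walk described above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions: no encryption, no ZIP64, a single disk, only the Stored (0) and Deflate (8) methods, and file names decoded as UTF-8. The function name extract_all is made up.

```python
import io
import struct
import zipfile
import zlib

def extract_all(buf: bytes) -> dict:
    """Walk EOCD -> central directory -> local headers; return {name: data}."""
    eocd = buf.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("EOCD not found")
    # EOCD offsets 10, 12, 16: total records, directory size, directory offset.
    total, _cd_size, cd_offset = struct.unpack_from("<HII", buf, eocd + 10)

    files = {}
    pos = cd_offset
    for _ in range(total):
        if buf[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("bad central directory header")
        method, = struct.unpack_from("<H", buf, pos + 10)
        crc, csize, _usize, n, m, k = struct.unpack_from("<IIIHHH", buf, pos + 16)
        local_offset, = struct.unpack_from("<I", buf, pos + 42)
        name = buf[pos + 46:pos + 46 + n].decode("utf-8")  # assumption: UTF-8

        # The local header carries its own name and extra lengths.
        ln, lm = struct.unpack_from("<HH", buf, local_offset + 26)
        data_start = local_offset + 30 + ln + lm
        raw = buf[data_start:data_start + csize]

        if method == 8:
            data = zlib.decompress(raw, -zlib.MAX_WBITS)  # raw deflate
        elif method == 0:
            data = raw
        else:
            raise ValueError(f"unsupported compression method {method}")
        if zlib.crc32(data) != crc:
            raise ValueError(f"CRC-32 mismatch for {name}")
        files[name] = data
        pos += 46 + n + m + k  # next central directory header
    return files

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"stored data", compress_type=zipfile.ZIP_STORED)
    zf.writestr("b.txt", b"hello " * 100, compress_type=zipfile.ZIP_DEFLATED)
result = extract_all(buf.getvalue())
print(sorted(result))  # ['a.txt', 'b.txt']
```

Sizes and CRC are taken from the central directory rather than from the local header, because the local header may hold zeros when a data descriptor is used.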

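Streaming parsing: front to back

Sometimes ZIP data can only be read front to back, so it is not possible to locate the EOCD at the end first and then go back to parse everything else. In that case parsing can start directly at the first local file header. The deflate algorithm normally used for compression also supports this kind of streaming operation.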
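A front-to-back reader can be sketched as follows. It relies on the fact that a deflate stream marks its own end, so zlib's decompressobj reports eof and leaves the remaining bytes in unused_data even when the local header carries no sizes. Assumptions in this sketch: no encryption, no ZIP64, the data descriptor uses 4-byte sizes, and its optional signature 0x08074b50 is skipped when present. The names stream_entries and WriteOnly are made up; WriteOnly exists only so that zipfile writes to an unseekable target and therefore emits data descriptors.

```python
import struct
import zipfile
import zlib

def stream_entries(buf: bytes):
    """Yield (name, data) for each entry, moving strictly front to back."""
    pos = 0
    while buf[pos:pos + 4] == b"PK\x03\x04":
        flags, method = struct.unpack_from("<HH", buf, pos + 6)
        crc, csize, _usize, n, m = struct.unpack_from("<IIIHH", buf, pos + 14)
        name = buf[pos + 30:pos + 30 + n]
        pos += 30 + n + m
        if method == 8:
            inflator = zlib.decompressobj(-zlib.MAX_WBITS)
            data = inflator.decompress(buf[pos:])
            if not inflator.eof:
                raise ValueError("truncated deflate stream")
            pos = len(buf) - len(inflator.unused_data)  # first byte after the stream
        elif method == 0 and not flags & 0x0008:
            data = buf[pos:pos + csize]
            pos += csize
        else:
            raise ValueError("entry cannot be streamed by this sketch")
        if flags & 0x0008:
            # Data descriptor: optional signature, then crc, csize, usize.
            if buf[pos:pos + 4] == b"PK\x07\x08":
                pos += 4
            crc, csize, _usize = struct.unpack_from("<III", buf, pos)
            pos += 12
        if zlib.crc32(data) != crc:
            raise ValueError("CRC-32 mismatch")
        yield name, data

class WriteOnly:
    """Unseekable sink, so that zipfile has to emit data descriptors."""
    def __init__(self):
        self.chunks = []
    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)
    def flush(self):
        pass

sink = WriteOnly()
with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"hello " * 100)
    zf.writestr("b.txt", b"world " * 100)
entries = list(stream_entries(b"".join(sink.chunks)))
print([name for name, _ in entries])  # [b'a.txt', b'b.txt']
```

The loop stops at the first record that is not a local file header, which in a well-formed archive is the start of the central directory. A Stored entry with a data descriptor has no self-terminating stream, so this sketch rejects it.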

Decompressing deflate data

Deflate data can be decompressed with Python. The data itself can be cut out of the ZIP file.

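import zlib
# deflate_data: the raw deflate bytes of one entry, cut out of the ZIP file.
# A negative wbits value selects raw deflate, with no zlib or gzip header.
inflator = zlib.decompressobj(-zlib.MAX_WBITS)
x = inflator.decompress(deflate_data)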

Decompressing deflate data with C/C++

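#include <stdio.h>
#include <zlib.h>

/* Inflate raw deflate data from src into dst, then save it to 123.txt.
 * Returns 0 on success and a negative value on failure. */
int inflate_to_file(unsigned char *src, unsigned src_len,
                    unsigned char *dst, unsigned dst_len) {
  z_stream strm = {0};
  strm.zalloc = Z_NULL;
  strm.zfree = Z_NULL;
  strm.opaque = Z_NULL;
  strm.next_in = src;
  strm.avail_in = src_len;
  strm.next_out = dst;
  strm.avail_out = dst_len;
  /* Negative window bits select raw deflate (no zlib or gzip header). */
  if (inflateInit2(&strm, -MAX_WBITS) != Z_OK) {
    fprintf(stderr, "init fail!\n");
    return -1;
  }
  int ret = inflate(&strm, Z_NO_FLUSH);
  size_t out_len = dst_len - strm.avail_out; /* bytes actually produced */
  (void)inflateEnd(&strm);
  printf("ret: %d\n", ret);
  if (ret != Z_STREAM_END) {
    /* Z_NEED_DICT, Z_DATA_ERROR, Z_MEM_ERROR, or dst was too small. */
    return -2;
  }
  printf("output bytes: %zu\n", out_len);
  FILE *fp = fopen("123.txt", "wb");
  if (fp == NULL) {
    return -3;
  }
  /* Write only the bytes that were produced, not the whole buffer. */
  fwrite(dst, sizeof(unsigned char), out_len, fp);
  fclose(fp);
  return 0;
}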

Summary of signatures

sig description
0x02014b50 Central directory file header signature
0x04034b50 Local file header signature
0x06054b50 End of central directory signature
0x06064b50 Zip64 End of central directory record
0x07064b50 Zip64 end of central directory locator
0x08074b50 Optional data descriptor signature
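Since the signatures are stored little-endian, each one appears in the file as the bytes "PK" followed by two more bytes. The sketch below shows that mapping and a lookup by signature; the identify helper is an illustrative name.

```python
import struct

SIGNATURES = {
    0x02014B50: "central directory file header",
    0x04034B50: "local file header",
    0x06054B50: "end of central directory",
    0x06064B50: "zip64 end of central directory record",
    0x07064B50: "zip64 end of central directory locator",
    0x08074B50: "optional data descriptor",
}

def identify(record: bytes) -> str:
    """Name the record type from its first four bytes."""
    sig, = struct.unpack_from("<I", record)
    return SIGNATURES.get(sig, "unknown")

print(struct.pack("<I", 0x04034B50))        # b'PK\x03\x04'
print(identify(b"PK\x05\x06" + bytes(18)))  # end of central directory
```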

Article link: https://www.haomeiwen.com/subject/hegutltx.html