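Android APK pseudo-encryption and how to work around it
As everyone knows, an Android apk file is just a file in zip format. For work I often need to unzip apk files to get at the resources inside, but recently many apks fail to extract with every unzip tool I try, even though they still install normally and aapt2 can still list the package contents. I used to batch-extract them with Python's zip module; having to install every apk instead would be the death of me. So I dug into the zip library in the Android 11 source to see what it does differently.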
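The zip format
pkware.cachefly.net/webdocs/APP… is the official specification; see it for the full details of the format.
Roughly speaking, a zip file has three parts. The first part stores the file data. The second is the central directory, which holds the metadata for the files in the first part. The last is the end record, which both marks the end of the zip file and stores the location of the central directory, so a zip file is actually parsed from back to front.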
end of central directory record (EOCD)
I. End of central directory record:
end of central dir signature 4 bytes (0x06054b50) // 4-byte signature 0x06054b50, used to locate the EOCD
number of this disk 2 bytes // number of the current disk
number of the disk with the
start of the central directory 2 bytes // number of the disk on which the central directory starts
total number of entries in the
central directory on this disk 2 bytes // number of central directory entries stored on this disk
total number of entries in
the central directory 2 bytes // total number of central directory entries
size of the central directory 4 bytes // size of the central directory
offset of start of central
directory with respect to
the starting disk number 4 bytes // offset of the start of the central directory, relative to the starting disk
.ZIP file comment length 2 bytes // comment length
.ZIP file comment (variable size) // comment contents
The first step in extracting a zip is to read the location and size of the central directory from the EOCD.
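As a minimal sketch of that first step, the snippet below unpacks the 22-byte EOCD record with Python's struct module. The names parse_eocd and make_sample_zip are hypothetical helpers made up for this illustration, and the sample archive is built in memory without a comment, so the EOCD is simply assumed to be the last 22 bytes.

```python
import io
import struct
import zipfile

# EOCD layout from the spec above: signature, two disk numbers, two entry
# counts, central directory size and offset, comment length (22 bytes).
EOCD = struct.Struct("<IHHHHIIH")
EOCD_SIGNATURE = 0x06054B50


def parse_eocd(record: bytes) -> dict:
    """Unpack one 22-byte EOCD record into named fields (hypothetical helper)."""
    (sig, _disk_no, _cd_disk, _entries_on_disk, entries_total,
     cd_size, cd_offset, comment_len) = EOCD.unpack(record)
    if sig != EOCD_SIGNATURE:
        raise ValueError("not an EOCD record")
    return {
        "entries": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment_len": comment_len,
    }


def make_sample_zip() -> bytes:
    """Build a small archive in memory (made-up sample data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("res/a.txt", "hello " * 20)
        zf.writestr("res/b.txt", "world " * 20)
    return buf.getvalue()


data = make_sample_zip()
# Assumption: no archive comment, so the EOCD is exactly the last 22 bytes.
eocd = parse_eocd(data[-EOCD.size:])
print(eocd["entries"], eocd["cd_offset"], eocd["cd_size"])
```

Reading only the last 22 bytes is the shortcut that breaks as soon as an archive carries a trailing comment, which is why a real parser has to search backward for the signature.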
central directory
Central directory structure:
[file header 1]
.
.
.
[file header n]
[digital signature]
File header:
central file header signature 4 bytes (0x02014b50) // magic number
version made by 2 bytes // version of the tool that created the entry
version needed to extract 2 bytes // minimum version needed to extract
general purpose bit flag 2 bytes // general purpose bit flag; lowest bit 1 means encrypted, 0 means not encrypted
compression method 2 bytes // compression method
last mod file time 2 bytes // last modification time
last mod file date 2 bytes // last modification date
crc-32 4 bytes // CRC-32 checksum
compressed size 4 bytes // compressed size
uncompressed size 4 bytes // uncompressed size
file name length 2 bytes // file name length
extra field length 2 bytes // extra field length
file comment length 2 bytes // file comment length
disk number start 2 bytes // number of the disk on which the file starts
internal file attributes 2 bytes // internal file attributes
external file attributes 4 bytes // external file attributes
relative offset of local header 4 bytes // relative offset of the local file header
file name (variable size) // file name
extra field (variable size) // extra field
file comment (variable size) // file comment contents
Digital signature:
header signature 4 bytes (0x05054b50)
size of data 2 bytes
signature data (variable size)
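The central directory is a sequence of file headers, each describing one file: its name plus the location and size of its data. With those you can go to the data section and extract the file. general purpose bit flag & 0x01 yields the lowest bit, which says whether the entry is encrypted. Setting that bit to 1 is the simplest form of pseudo-encryption: nothing was actually encrypted and no password was set at packaging time, only the flag was changed. Android does not check this flag when installing, whereas many zip libraries and unzip tools use it to decide whether to ask for a password, which is what makes the trick work as an anti-extraction measure.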
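The effect can be reproduced on a small archive. The sketch below uses a hypothetical helper, set_encrypted_bit, that walks the central directory and sets or clears bit 0 of each general purpose bit flag. It assumes a single-disk, non-zip64 archive whose EOCD signature can be found with a plain rfind; clearing the bit is the simplest way to undo this kind of pseudo-encryption.

```python
import io
import struct
import zipfile

CD_SIG = b"PK\x01\x02"    # 0x02014b50 as stored on disk (little-endian)
EOCD_SIG = b"PK\x05\x06"  # 0x06054b50 as stored on disk
FLAG_OFFSET = 8           # general purpose bit flag within a file header
CD_HEADER_SIZE = 46       # fixed part of a central directory file header


def set_encrypted_bit(data: bytes, encrypted: bool) -> bytes:
    """Return a copy with bit 0 of every central directory flag set or cleared."""
    out = bytearray(data)
    # Assumption: the last occurrence of the signature is the real EOCD.
    eocd = out.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("EOCD not found")
    # Total entries (offset 10), directory size (12), directory offset (16).
    entries, _cd_size, cd_offset = struct.unpack_from("<HII", out, eocd + 10)
    pos = cd_offset
    for _ in range(entries):
        if out[pos:pos + 4] != CD_SIG:
            raise ValueError("bad central directory signature")
        (flags,) = struct.unpack_from("<H", out, pos + FLAG_OFFSET)
        flags = (flags | 0x0001) if encrypted else (flags & 0xFFFE)
        struct.pack_into("<H", out, pos + FLAG_OFFSET, flags)
        # Skip the variable-length name, extra field and comment.
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", out, pos + 28)
        pos += CD_HEADER_SIZE + name_len + extra_len + comment_len
    return bytes(out)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("assets/demo.txt", "payload " * 10)
original = buf.getvalue()

locked = set_encrypted_bit(original, True)
try:
    zipfile.ZipFile(io.BytesIO(locked)).read("assets/demo.txt")
    blocked = False
except RuntimeError:  # zipfile wants a password that was never set
    blocked = True

unlocked = set_encrypted_bit(locked, False)
text = zipfile.ZipFile(io.BytesIO(unlocked)).read("assets/demo.txt")
print(blocked, text == b"payload " * 10)
```

Only the central directory copy of the flag is touched here. A tool that also trusts the flag in the local file header would need the same patch applied there.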
File data
[local file header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[file data n]
[data descriptor n]
A. Local file header:
local file header signature 4 bytes (0x04034b50) // signature
version needed to extract 2 bytes // minimum version needed to extract
general purpose bit flag 2 bytes // general purpose bit flag
compression method 2 bytes // compression method
last mod file time 2 bytes // last modification time
last mod file date 2 bytes // last modification date
crc-32 4 bytes // CRC-32 checksum
compressed size 4 bytes // compressed size
uncompressed size 4 bytes // uncompressed size
file name length 2 bytes // file name length
extra field length 2 bytes // extra field length
file name (variable size) // file name
extra field (variable size) // extra field
B. File data
Immediately following the local header for a file
is the compressed or stored data for the file.
The series of [local file header][file data][data
descriptor] repeats for each file in the .ZIP archive.
C. Data descriptor: // usually absent; only present when bit 3 of the general purpose bit flag is set
crc-32 4 bytes
compressed size 4 bytes
uncompressed size 4 bytes
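The Local file header is almost identical to the corresponding entry in the central directory. The file data follows immediately after the Local file header and can be extracted using the data length and the compression method.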
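To make that concrete, here is a hedged sketch that reads one local file header and decompresses the data behind it. read_entry is a hypothetical helper; it assumes the sizes are present in the local header (bit 3 of the flag is clear) and only handles the stored (0) and deflate (8) methods. It never looks at the encryption bit.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIGNATURE = 0x04034B50
# signature, version, flags, method, time, date, crc, compressed size,
# uncompressed size, name length, extra length: 30 bytes in total.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def read_entry(data: bytes, header_offset: int):
    """Return (name, contents) for the entry whose local header is at header_offset."""
    (sig, _version, flags, method, _time, _date, crc, csize, usize,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(data, header_offset)
    if sig != LOCAL_SIGNATURE:
        raise ValueError("bad local file header signature")
    if flags & 0x0008:
        # Sizes are stored in a trailing data descriptor instead.
        raise ValueError("sizes not in local header; use the central directory")
    name_start = header_offset + LOCAL_HEADER.size
    name = data[name_start:name_start + name_len].decode("utf-8")
    body_start = name_start + name_len + extra_len
    body = data[body_start:body_start + csize]
    if method == 0:        # stored
        raw = body
    elif method == 8:      # deflate, raw stream without a zlib header
        raw = zlib.decompress(body, wbits=-15)
    else:
        raise ValueError(f"unsupported compression method {method}")
    if len(raw) != usize or zlib.crc32(raw) != crc:
        raise ValueError("size or CRC mismatch")
    return name, raw


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("res/values.txt", "value " * 50)
    offset = zf.getinfo("res/values.txt").header_offset

name, raw = read_entry(buf.getvalue(), offset)
print(name, len(raw))
```

In a real extractor the offset would come from the relative offset of local header field in the central directory rather than from zipfile.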
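How Android extracts a zip
In frameworks, files can be extracted through frameworks/base/libs/androidfw/ZipUtils.cpp. A closer look at the code shows that this class only wraps functions from the ziparchive library, and every call ends up there. The library's source lives in system/core/libziparchive/.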
system/core/libziparchive/zip_archive.cc
int32_t OpenArchive(const char* fileName, ZipArchiveHandle* handle) {
const int fd = open(fileName, O_RDONLY | O_BINARY, 0);
ZipArchive* archive = new ZipArchive(fd, true);
*handle = archive;
if (fd < 0) {
ALOGW("Unable to open '%s': %s", fileName, strerror(errno));
return kIoError;
}
return OpenArchiveInternal(archive, fileName);
}
- Open the file by path to get an fd
- Create a ZipArchive object
- Call OpenArchiveInternal to parse the file
static int32_t OpenArchiveInternal(ZipArchive* archive, const char* debug_file_name) {
int32_t result = -1;
if ((result = MapCentralDirectory(debug_file_name, archive)) != 0) { // parse the EOCD to get the central directory's location and other information
return result;
}
if ((result = ParseZipArchive(archive))) { // parse the zip file
return result;
}
return 0;
}
And here the long-awaited central directory finally appears. Let's see how MapCentralDirectory gets hold of it.
/*
* Find the zip Central Directory and memory-map it.
*
* On success, returns 0 after populating fields from the EOCD area:
* directory_offset
* directory_ptr
* num_entries
*/
static int32_t MapCentralDirectory(const char* debug_file_name, ZipArchive* archive) {
// some error-handling code omitted
/*
* Perform the traditional EOCD snipe hunt.
*
* We're searching for the End of Central Directory magic number,
* which appears at the start of the EOCD block. It's followed by
* 18 bytes of EOCD stuff and up to 64KB of archive comment. We
* need to read the last part of the file into a buffer, dig through
* it to find the magic number, parse some values out, and use those
* to determine the extent of the CD.
*
* We start by pulling in the last part of the file.
*/
off64_t read_amount = kMaxEOCDSearch;
if (file_length < read_amount) {
read_amount = file_length;
}
std::vector<uint8_t> scan_buffer(read_amount);
int32_t result =
MapCentralDirectory0(debug_file_name, archive, file_length, read_amount, scan_buffer.data());
return result;
}
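This function only does some error handling; the actual parsing happens in MapCentralDirectory0. The error-handling code refers to the familiar EocdRecord, the struct that describes the EOCD.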
static int32_t MapCentralDirectory0(const char* debug_file_name, ZipArchive* archive,
off64_t file_length, off64_t read_amount, uint8_t* scan_buffer) {
const off64_t search_start = file_length - read_amount;
if (!archive->mapped_zip.ReadAtOffset(scan_buffer, read_amount, search_start)) {
ALOGE("Zip: read %" PRId64 " from offset %" PRId64 " failed", static_cast<int64_t>(read_amount),
static_cast<int64_t>(search_start));
return kIoError;
}
/*
* Scan backward for the EOCD magic. In an archive without a trailing
* comment, we'll find it on the first try. (We may want to consider
* doing an initial minimal read; if we don't find it, retry with a
* second read as above.)
*/
// loop to find the EOCD
int i = read_amount - sizeof(EocdRecord);
for (; i >= 0; i--) {
if (scan_buffer[i] == 0x50) {
uint32_t* sig_addr = reinterpret_cast<uint32_t*>(&scan_buffer[i]);
if (get_unaligned<uint32_t>(sig_addr) == EocdRecord::kSignature) { // kSignature = 0x06054b50; the signature identifies the EOCD
ALOGV("+++ Found EOCD at buf+%d", i);
break;
}
}
}
if (i < 0) {
ALOGD("Zip: EOCD not found, %s is not zip", debug_file_name);
return kInvalidFile;
}
const off64_t eocd_offset = search_start + i;
const EocdRecord* eocd = reinterpret_cast<const EocdRecord*>(scan_buffer + i); // view the bytes as an EocdRecord, which reads the fields according to the zip EOCD layout
/*
* Verify that there's no trailing space at the end of the central directory
* and its comment.
*/
const off64_t calculated_length = eocd_offset + sizeof(EocdRecord) + eocd->comment_length;
if (calculated_length != file_length) {
ALOGW("Zip: %" PRId64 " extraneous bytes at the end of the central directory",
static_cast<int64_t>(file_length - calculated_length));
return kInvalidFile;
}
/*
* Grab the CD offset and size, and the number of entries in the
* archive and verify that they look reasonable.
*/
if (static_cast<off64_t>(eocd->cd_start_offset) + eocd->cd_size > eocd_offset) {
ALOGW("Zip: bad offsets (dir %" PRIu32 ", size %" PRIu32 ", eocd %" PRId64 ")",
eocd->cd_start_offset, eocd->cd_size, static_cast<int64_t>(eocd_offset));
#if defined(__ANDROID__)
if (eocd->cd_start_offset + eocd->cd_size <= eocd_offset) {
android_errorWriteLog(0x534e4554, "31251826");
}
#endif
return kInvalidOffset;
}
if (eocd->num_records == 0) {
ALOGW("Zip: empty archive?");
return kEmptyArchive;
}
// all sanity checks are done: the EOCD is valid and gives the number of File headers in the central directory
ALOGV("+++ num_entries=%" PRIu32 " dir_size=%" PRIu32 " dir_offset=%" PRIu32, eocd->num_records,
eocd->cd_size, eocd->cd_start_offset);
/*
* It all looks good. Create a mapping for the CD, and set the fields
* in archive.
*/
// InitializeCentralDirectory sets up and stores the related fields
if (!archive->InitializeCentralDirectory(debug_file_name,
static_cast<off64_t>(eocd->cd_start_offset),
static_cast<size_t>(eocd->cd_size))) {
ALOGE("Zip: failed to intialize central directory.\n");
return kMmapFailed;
}
archive->num_entries = eocd->num_records;
archive->directory_offset = eocd->cd_start_offset;
return 0;
}
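- The search for the EOCD starts at file_length - read_amount. read_amount is the maximum possible size of the EOCD, so the EOCD is looked up within the last read_amount bytes of the file
- After the various checks, the EOCD that was found is known to be valid. This is also where many pseudo-encryption tricks operate: Android searches the whole read_amount region, whereas many libraries and unzip tools assume there is no comment or extra data
- InitializeCentralDirectory maps the central directory and stores the related fields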
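The backward scan and its sanity checks can be sketched in Python as follows. find_eocd is a hypothetical port written for illustration, not the Android implementation itself; it keeps the same order of checks as MapCentralDirectory0 and takes the search window as the maximum comment length (65535) plus the 22-byte record.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<IHHHHIIH")           # 22-byte EOCD record
EOCD_SIGNATURE = 0x06054B50
MAX_EOCD_SEARCH = 0xFFFF + EOCD.size         # longest comment plus the record


def find_eocd(data: bytes) -> dict:
    """Scan backward for the EOCD and validate it (illustrative port)."""
    file_length = len(data)
    read_amount = min(file_length, MAX_EOCD_SEARCH)
    search_start = file_length - read_amount
    scan = data[search_start:]

    # Walk backward from the last position where a full record still fits.
    i = read_amount - EOCD.size
    while i >= 0:
        if scan[i] == 0x50 and struct.unpack_from("<I", scan, i)[0] == EOCD_SIGNATURE:
            break
        i -= 1
    if i < 0:
        raise ValueError("EOCD not found, not a zip")

    eocd_offset = search_start + i
    fields = EOCD.unpack_from(scan, i)
    entries, cd_size, cd_offset, comment_len = fields[4:8]

    # The record plus its comment must end exactly at the end of the file.
    if eocd_offset + EOCD.size + comment_len != file_length:
        raise ValueError("extraneous bytes after the EOCD")
    # The central directory must lie entirely before the EOCD.
    if cd_offset + cd_size > eocd_offset:
        raise ValueError("bad central directory offsets")
    if entries == 0:
        raise ValueError("empty archive")
    return {"eocd_offset": eocd_offset, "entries": entries,
            "cd_offset": cd_offset, "cd_size": cd_size}


comment = b"some trailing comment"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "data")
    zf.comment = comment
data = buf.getvalue()

found = find_eocd(data)
print(found["eocd_offset"], len(data) - EOCD.size - len(comment))
```

With a comment present, the last 22 bytes of the file no longer start with the signature, so a parser that skips the scan would reject this archive even though it is well formed.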
Back in OpenArchiveInternal: once MapCentralDirectory has collected this information, ParseZipArchive does the parsing.
// the function is long, so part of the error handling has been removed
static int32_t ParseZipArchive(ZipArchive* archive) {
const uint8_t* const cd_ptr = archive->central_directory.GetBasePtr();
const size_t cd_length = archive->central_directory.GetMapLength();
const uint16_t num_entries = archive->num_entries;
/*
* Create hash table. We have a minimum 75% load factor, possibly as
* low as 50% after we round off to a power of 2. There must be at
* least one unused entry to avoid an infinite loop during creation.
*/
archive->hash_table_size = RoundUpPower2(1 + (num_entries * 4) / 3); // size the hash table
archive->hash_table =
reinterpret_cast<ZipStringOffset*>(calloc(archive->hash_table_size, sizeof(ZipStringOffset)));
/*
* Walk through the central directory, adding entries to the hash
* table and verifying values.
*/
const uint8_t* const cd_end = cd_ptr + cd_length;
const uint8_t* ptr = cd_ptr;
for (uint16_t i = 0; i < num_entries; i++) { // loop over every CentralDirectoryRecord
if (ptr > cd_end - sizeof(CentralDirectoryRecord)) {
ALOGW("Zip: ran off the end (item #%" PRIu16 ", %zu bytes of central directory)", i,
cd_length);
#if defined(__ANDROID__)
android_errorWriteLog(0x534e4554, "36392138");
#endif
return kInvalidFile;
}
const CentralDirectoryRecord* cdr = reinterpret_cast<const CentralDirectoryRecord*>(ptr);
if (cdr->record_signature != CentralDirectoryRecord::kSignature) { // kSignature = 0x02014b50; the signature is checked on every record
ALOGW("Zip: missed a central dir sig (at %" PRIu16 ")", i);
return kInvalidFile;
}
const off64_t local_header_offset = cdr->local_file_header_offset;
const uint16_t file_name_length = cdr->file_name_length;
const uint16_t extra_length = cdr->extra_field_length;
const uint16_t comment_length = cdr->comment_length;
const uint8_t* file_name = ptr + sizeof(CentralDirectoryRecord);
// Add the CDE filename to the hash table.
std::string_view entry_name{reinterpret_cast<const char*>(file_name), file_name_length}; // build entry_name from the file name
const int add_result = AddToHash(archive->hash_table, archive->hash_table_size, entry_name,
archive->central_directory.GetBasePtr()); // add to the hash table; the key is entry_name and the value locates the current CentralDirectoryRecord
ptr += sizeof(CentralDirectoryRecord) + file_name_length + extra_length + comment_length;
}
ALOGV("+++ zip good scan %" PRIu16 " entries", num_entries);
return 0;
}
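- Create a hash table
- Using the start address and entry count taken from the EOCD, loop over every CentralDirectoryRecord
- Store every parsed CentralDirectoryRecord in the hash table
At this point the hash table of CentralDirectoryRecord entries is ready. To extract a file, look up its CentralDirectoryRecord in the hash table, use it to find the address and length of the data, and slice the data out.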
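The same walk can be sketched in Python, with a dict standing in for the hash table. build_index is a hypothetical helper; it repeats the bounds check and signature check from ParseZipArchive and records, per entry name, the fields needed later to find and decompress the data.

```python
import io
import struct
import zipfile

CD_SIGNATURE = 0x02014B50
# Fixed 46-byte part of a central directory file header, in spec order.
CD_HEADER = struct.Struct("<I6H3I5H2I")


def build_index(data: bytes, cd_offset: int, cd_size: int, num_entries: int) -> dict:
    """Map each entry name to the fields needed to locate its data."""
    cd_end = cd_offset + cd_size
    index = {}
    pos = cd_offset
    for i in range(num_entries):
        if pos > cd_end - CD_HEADER.size:
            raise ValueError(f"ran off the end of the central directory at item #{i}")
        fields = CD_HEADER.unpack_from(data, pos)
        if fields[0] != CD_SIGNATURE:
            raise ValueError(f"missed a central dir signature at item #{i}")
        flags, method = fields[3], fields[4]
        compressed_size = fields[8]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_header_offset = fields[16]
        name_start = pos + CD_HEADER.size
        name = data[name_start:name_start + name_len].decode("utf-8")
        index[name] = {
            "local_header_offset": local_header_offset,
            "compressed_size": compressed_size,
            "method": method,
            "flags": flags,
        }
        # Advance past the variable-length name, extra field and comment.
        pos = name_start + name_len + extra_len + comment_len
    return index


names = ["AndroidManifest.xml", "classes.dex", "res/layout/main.xml"]
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for n in names:
        zf.writestr(n, n * 10)   # made-up contents
data = buf.getvalue()

# Assumption: no archive comment, so the EOCD is the last 22 bytes.
entries, cd_size, cd_offset = struct.unpack_from("<HII", data, len(data) - 22 + 10)
index = build_index(data, cd_offset, cd_size, entries)
print(sorted(index))
```

Nothing in this lookup depends on bit 0 of the flag, which matches the article's observation that the Android path keeps working on pseudo-encrypted archives.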
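Summary
That is the whole zip extraction flow; Android follows the standard procedure. Find the EOCD and parse the CentralDirectory -> build a hash table of CentralDirectoryRecord entries from the CentralDirectory -> finally use the file offset, length and compression method in the CentralDirectoryRecord to fetch and decompress the data. If some other field is tampered with in the future and extraction fails, it should now be easy to work out as well.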