ProGuard Source Code Analysis, Part 2: Parsing Class Bytecode

Published: 2022-01-04 16:10


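In the previous part we analyzed how ProGuard parses its arguments, obtains its configuration, and stores it. In this part we continue with how ProGuard reads class files, parses class bytecode, and stores the parsed class structure.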

Data structures

Before we start, let's introduce a few data structures.

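Clazz: an interface whose implementations are ProgramClass and LibraryClass.
ProgramClass: implements Clazz; ProGuard uses it to represent application (program) classes.
LibraryClass: implements Clazz; ProGuard uses it to represent classes from third-party dependency libraries.
In other words, ProgramClass and LibraryClass are two sibling implementations of the Clazz interface.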

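ProGuard also has two class pools, programClassPool and libraryClassPool, both of type ClassPool. programClassPool holds the Clazz instances of all application classes, and libraryClassPool holds those of all library classes. ClassPool itself is simple: it is essentially a TreeMap from class name to Clazz instance: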

import java.util.Map;
import java.util.TreeMap;

/**
 * This is a set of representations of classes. They can be enumerated or
 * retrieved by name. They can also be accessed by means of class visitors.
 *
 * @author Eric Lafortune
 */
public class ClassPool
{
    // We're using a sorted tree map instead of a hash map to store the classes,
    // in order to make the processing more deterministic.
    private final Map<String, Clazz> classes = new TreeMap<>();

    /**
     * Adds the given Clazz to the class pool.
     */
    public void addClass(Clazz clazz)
    {
        classes.put(clazz.getName(), clazz);
    }

    /**
     * Removes the given Clazz from the class pool.
     */
    public void removeClass(Clazz clazz)
    {
        removeClass(clazz.getName());
    }

    /**
     * Returns a Clazz from the class pool based on its name. Returns
     * <code>null</code> if the class with the given name is not in the class
     * pool.
     */
    public Clazz getClass(String className)
    {
        return classes.get(className);
    }
    // ...remaining code omitted...
}
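The comment in ClassPool says a sorted TreeMap is used so that processing is deterministic. Below is a minimal sketch of that idea. It assumes a class can be reduced to its internal name. `MiniClassPool` is a hypothetical stand-in, not ProGuard's ClassPool. Whatever order names are added in, iteration follows the sorted names, and adding the same name twice keeps a single entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for ProGuard's ClassPool: a "class"
// is represented only by its internal name (e.g. "com/example/Foo").
final class MiniClassPool {
    // Sorted by key, so iteration order depends only on the class names,
    // never on the order in which jar entries happened to be read.
    private final Map<String, String> classes = new TreeMap<>();

    void addClass(String internalName) {
        classes.put(internalName, internalName); // same name replaces, never duplicates
    }

    void removeClass(String internalName) {
        classes.remove(internalName);
    }

    // Returns null when the class is not in the pool, like ClassPool.getClass.
    String getClass(String internalName) {
        return classes.get(internalName);
    }

    int size() {
        return classes.size();
    }

    // All class names in sorted order.
    List<String> classNames() {
        return new ArrayList<>(classes.keySet());
    }

    public static void main(String[] args) {
        MiniClassPool pool = new MiniClassPool();
        pool.addClass("com/example/Zeta");
        pool.addClass("com/example/Alpha");
        pool.addClass("com/example/Alpha");
        System.out.println(pool.classNames()); // [com/example/Alpha, com/example/Zeta]
    }
}
```

A HashMap would give the same lookups, but its iteration order is not tied to the names. Later passes that walk the whole pool could then produce different output for the same input.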

Reading Java classes

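Once the arguments are parsed (covered in the previous part), ProGuard creates a ProGuard object and calls its execute method, which runs all of ProGuard's core steps. execute looks roughly like this: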

/**
 * Performs all subsequent ProGuard operations.
 */
public void execute() throws IOException
{
    System.out.println(VERSION);

    GPL.check();

    if (configuration.printConfiguration != null)
    {
        printConfiguration();
    }

    new ConfigurationChecker(configuration).check();

    if (configuration.programJars != null     &&
        configuration.programJars.hasOutput() &&
        new UpToDateChecker(configuration).check())
    {
        return;
    }

    readInput();

    if (configuration.shrink    ||
        configuration.optimize  ||
        configuration.obfuscate ||
        configuration.preverify)
    {
        clearPreverification();
    }

    if (configuration.printSeeds != null ||
        configuration.shrink    ||
        configuration.optimize  ||
        configuration.obfuscate ||
        configuration.preverify)
    {
        initialize();
    }

    if (configuration.targetClassVersion != 0)
    {
        target();
    }

    if (configuration.printSeeds != null)
    {
        printSeeds();
    }

    if (configuration.shrink)
    {
        shrink();
    }

    if (configuration.preverify)
    {
        inlineSubroutines();
    }

    if (configuration.optimize)
    {
        for (int optimizationPass = 0;
                optimizationPass < configuration.optimizationPasses;
                optimizationPass++)
        {
            if (!optimize())
            {
                // Stop optimizing if the code doesn't improve any further.
                break;
            }

            // Shrink again, if we may.
            if (configuration.shrink)
            {
                // Don't print any usage this time around.
                configuration.printUsage       = null;
                configuration.whyAreYouKeeping = null;

                shrink();
            }
        }
    } else if (configuration.optimizeNoSideEffects) {
        new Optimizer(configuration).executeNoSideEffects(programClassPool, libraryClassPool);
    }

    if (configuration.optimize)
    {
        linearizeLineNumbers();
    }

    if (configuration.obfuscate)
    {
        obfuscate();
    }

    if (configuration.optimize)
    {
        trimLineNumbers();
    }

    if (configuration.preverify)
    {
        preverify();
    }

    if (configuration.shrink    ||
        configuration.optimize  ||
        configuration.obfuscate ||
        configuration.preverify)
    {
        sortClassElements();
    }

    if (configuration.programJars.hasOutput())
    {
        writeOutput();
    }

    if (configuration.dump != null)
    {
        dump();
    }
}

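As you can see, shrinking, optimization, obfuscation and the other features are all driven from execute. Below we look at how classes are read and loaded.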

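Configuration holds the programJars and libraryJars parsed earlier, that is, the application jars and the library jars. Based on these paths, readInput reads and unpacks the jar files. It ultimately reads application class bytecode into ProgramClass objects and library class files into LibraryClass objects.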

private void readInput() throws IOException
{
    if (configuration.verbose)
    {
        System.out.println("Reading input...");
    }
    // Fill the program class pool and the library class pool.
    new InputReader(configuration).execute(programClassPool, libraryClassPool);
}

readInput itself is very simple: it creates an InputReader and calls its execute method to read the jar files.

/**
 * Fills the given program class pool and library class pool by reading
 * class files, based on the current configuration.
 */
public void execute(ClassPool programClassPool,
                    ClassPool libraryClassPool) throws IOException
{

    // ...unrelated code omitted...
    readInput("Reading program ",
                configuration.programJars,
                new ClassFilter(
                new ClassReader(false,
                                configuration.skipNonPublicLibraryClasses,
                                configuration.skipNonPublicLibraryClassMembers,
                                warningPrinter,
                new ClassPresenceFilter(programClassPool, duplicateClassPrinter,
                new ClassPoolFiller(programClassPool)))));
    // ...unrelated code omitted...
}
/**
 * Reads all input entries from the given section of the given class path.
 */
public void readInput(String          messagePrefix,
                        ClassPath       classPath,
                        int             fromIndex,
                        int             toIndex,
                        DataEntryReader reader) throws IOException
{
    for (int index = fromIndex; index < toIndex; index++)
    {
        ClassPathEntry entry = classPath.get(index);
        if (!entry.isOutput())
        {
            readInput(messagePrefix, entry, reader);
        }
    }
}

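ClassPath maintains a list of ClassPathEntry objects, each essentially describing one input jar. readInput iterates over the entries in the given index range, skips output entries, and parses the rest one at a time. The three-argument call in execute goes through an overload, omitted here, that covers the whole class path.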
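The range loop above can be sketched with stand-in types. `MiniClassPath`, its `Entry` class and the returned list of paths are hypothetical simplifications. The sketch assumes that the three-argument readInput call in execute delegates to the range version with the full range; that overload is modeled here as `readInput(classPath)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of ClassPath/ClassPathEntry: each entry is a
// jar path plus a flag saying whether it is an output entry.
final class MiniClassPath {
    static final class Entry {
        final String path;
        final boolean isOutput;

        Entry(String path, boolean isOutput) {
            this.path = path;
            this.isOutput = isOutput;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String path, boolean isOutput) { entries.add(new Entry(path, isOutput)); }
    int size() { return entries.size(); }
    Entry get(int index) { return entries.get(index); }

    // Whole-class-path overload (assumed): delegates to the range version.
    static List<String> readInput(MiniClassPath classPath) {
        return readInput(classPath, 0, classPath.size());
    }

    // Range version: visits entries [fromIndex, toIndex) and skips outputs,
    // mirroring the !entry.isOutput() check in InputReader.readInput.
    static List<String> readInput(MiniClassPath classPath, int fromIndex, int toIndex) {
        List<String> read = new ArrayList<>();
        for (int index = fromIndex; index < toIndex; index++) {
            Entry entry = classPath.get(index);
            if (!entry.isOutput) {
                read.add(entry.path); // a real reader would open and pump the jar here
            }
        }
        return read;
    }

    public static void main(String[] args) {
        MiniClassPath cp = new MiniClassPath();
        cp.add("in.jar", false);
        cp.add("out.jar", true);
        cp.add("lib.jar", false);
        System.out.println(readInput(cp));       // [in.jar, lib.jar]
        System.out.println(readInput(cp, 1, 3)); // [lib.jar]
    }
}
```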

/**
 * Reads the given input class path entry.
 */
private void readInput(String          messagePrefix,
                       ClassPathEntry  classPathEntry,
                       DataEntryReader dataEntryReader) throws IOException
{
    // ...some code omitted...
    // Create a reader that can unwrap jars, wars, ears, and zips.
    DataEntryReader reader =
        DataEntryReaderFactory.createDataEntryReader(messagePrefix,
                                                        classPathEntry,
                                                        dataEntryReader);

    // Create the data entry pump.
    DirectoryPump directoryPump =
        new DirectoryPump(classPathEntry.getFile());

    // Pump the data entries into the reader.
    directoryPump.pumpDataEntries(reader);    
}

It first creates a DataEntryReader. This is not the reader that actually parses class bytecode. It only applies the right unpacking for each file type, such as .apk, .jar or .aar. For the jar inputs of an Android build, the reader that does the work here is a JarReader.

Next it creates a DirectoryPump and calls its pumpDataEntries method:

public void pumpDataEntries(DataEntryReader dataEntryReader)
throws IOException
{
    if (!directory.exists())
    {
        throw new IOException("No such file or directory");
    }

    readFiles(directory, dataEntryReader);
}


/**
 * Reads the given subdirectory recursively, applying the given DataEntryReader
 * to all files that are encountered.
 */
private void readFiles(File file, DataEntryReader dataEntryReader)
throws IOException
{
    // Pass the file data entry to the reader.
    dataEntryReader.read(new FileDataEntry(directory, file));

    if (file.isDirectory())
    {
        // Recurse into the subdirectory.
        File[] listedFiles = file.listFiles();

        for (int index = 0; index < listedFiles.length; index++)
        {
            File listedFile = listedFiles[index];
            try
            {
                readFiles(listedFile, dataEntryReader);
            }
            catch (IOException e)
            {
                throw (IOException)new IOException("Can't read ["+listedFile.getName()+"] ("+e.getMessage()+")").initCause(e);
            }
        }
    }
}
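readFiles hands each file to the reader before recursing into it. The sketch below performs the same walk using only java.io.File. It assumes a `Consumer<File>` in place of DataEntryReader and an unchecked exception in place of IOException; `MiniDirectoryPump` is hypothetical. It also guards against listFiles() returning null, which the JDK allows on an I/O error.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of DirectoryPump.readFiles: the consumer stands in for
// DataEntryReader and receives every file and directory, parent first.
final class MiniDirectoryPump {
    static void pump(File root, Consumer<File> reader) {
        if (!root.exists()) {
            throw new IllegalArgumentException("No such file or directory: " + root);
        }
        readFiles(root, reader);
    }

    private static void readFiles(File file, Consumer<File> reader) {
        // Hand the entry itself to the reader first...
        reader.accept(file);
        if (file.isDirectory()) {
            File[] listedFiles = file.listFiles();
            if (listedFiles == null) {
                return; // I/O error or permission problem: nothing to recurse into
            }
            // ...then recurse into every child.
            for (File listedFile : listedFiles) {
                readFiles(listedFile, reader);
            }
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        pump(new File(args.length > 0 ? args[0] : "."), f -> seen.add(f.getPath()));
        System.out.println(seen.size() + " entries");
    }
}
```

When the input is a single jar file rather than a directory, the walk ends after the first accept call. This is why the same pump works for both cases.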

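pumpDataEntries itself just walks the files recursively and hands every entry to the DataEntryReader created earlier. Since JarReader implements this interface, let's look at JarReader's read method directly: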

public void read(DataEntry dataEntry) throws IOException
{
    ZipInputStream zipInputStream = new ZipInputStream(dataEntry.getInputStream());
    // Get all entries from the input jar.
    while (true)
    {
        // Can we get another entry?
        ZipEntry zipEntry = zipInputStream.getNextEntry();
        if (zipEntry == null)
        {
            break;
        }

        // Delegate the actual reading to the data entry reader.
        dataEntryReader.read(new ZipDataEntry(dataEntry,
                                                zipEntry,
                                                zipInputStream));
    }
    // ...some code omitted...
}
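The unzip-and-delegate loop in JarReader.read, plus a filter step that keeps only .class entries, can be sketched as a chain of small readers. `MiniReaderChain`, its `EntryReader` interface and the helper `jarOf` are hypothetical. They use only java.util.zip and mimic the shape of ProGuard's DataEntryReader chain, not its actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical, simplified reader chain: a jar reader unpacks entries, an
// extension filter keeps ".class" entries, and the last reader gets the rest.
final class MiniReaderChain {
    interface EntryReader {
        void read(String name, InputStream in) throws IOException;
    }

    // Like JarReader: unzip and delegate every entry to the next reader.
    static EntryReader jarReader(EntryReader next) {
        return (name, in) -> {
            ZipInputStream zip = new ZipInputStream(in);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                next.read(entry.getName(), zip); // zip is positioned at this entry's data
            }
        };
    }

    // Like a class filter: only entries ending in ".class" reach the next reader.
    static EntryReader classFilter(EntryReader next) {
        return (name, in) -> {
            if (name.endsWith(".class")) {
                next.read(name, in);
            }
        };
    }

    // Runs jar bytes through the chain and returns the class entry names.
    static List<String> classNamesIn(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        EntryReader chain = jarReader(classFilter((name, in) -> names.add(name)));
        try {
            chain.read("input.jar", new ByteArrayInputStream(jarBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds an in-memory zip with empty entries, for demonstration only.
    static byte[] jarOf(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] jar = jarOf("META-INF/MANIFEST.MF", "com/example/Foo.class", "res/icon.png");
        System.out.println(classNamesIn(jar)); // [com/example/Foo.class]
    }
}
```

Each reader only knows its successor. That is what lets the chain stack unpacking, filtering and parsing independently of one another.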

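As its name suggests, JarReader only unpacks the jar, and each extracted entry is passed on to its own dataEntryReader. ProGuard's source uses this delegation pattern heavily: each layer handles one concern and passes the data on. Here, the dataEntryReader is ultimately the ClassReader created in InputReader.execute. It sits behind a ClassFilter, which lets only .class entries through.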

public void read(DataEntry dataEntry) throws IOException
{

    // Get the input stream.
    InputStream inputStream = dataEntry.getInputStream();

    // Wrap it into a data input stream.
    DataInputStream dataInputStream = new DataInputStream(inputStream);

    // Create a Clazz representation.
    Clazz clazz;
    if (isLibrary)
    {
        clazz = new LibraryClass();
        clazz.accept(new LibraryClassReader(dataInputStream, skipNonPublicLibraryClasses, skipNonPublicLibraryClassMembers));
    }
    else
    {
        clazz = new ProgramClass();
        clazz.accept(new ProgramClassReader(dataInputStream));
    }
    // ...some code omitted...
}
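ClassReader starts parsing with `clazz.accept(new ProgramClassReader(dataInputStream))`. This works because each Clazz implementation calls back the visitor method for its own type (double dispatch): ProgramClass.accept calls visitProgramClass and LibraryClass.accept calls visitLibraryClass. The sketch below is a hypothetical miniature of that pairing. `MiniClazz`, `MiniClassVisitor` and `KindRecorder` are not ProGuard types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of ProGuard's Clazz/ClassVisitor pair.
interface MiniClassVisitor {
    void visitProgramClass(MiniProgramClass programClass);
    void visitLibraryClass(MiniLibraryClass libraryClass);
}

interface MiniClazz {
    String getName();
    void accept(MiniClassVisitor visitor);
}

final class MiniProgramClass implements MiniClazz {
    private final String name;
    MiniProgramClass(String name) { this.name = name; }
    public String getName() { return name; }
    // Double dispatch: the class picks the visitor method for its own type.
    public void accept(MiniClassVisitor visitor) { visitor.visitProgramClass(this); }
}

final class MiniLibraryClass implements MiniClazz {
    private final String name;
    MiniLibraryClass(String name) { this.name = name; }
    public String getName() { return name; }
    public void accept(MiniClassVisitor visitor) { visitor.visitLibraryClass(this); }
}

// Example visitor: records which kind of class it saw, without instanceof.
final class KindRecorder implements MiniClassVisitor {
    final List<String> log = new ArrayList<>();

    public void visitProgramClass(MiniProgramClass c) { log.add("program:" + c.getName()); }
    public void visitLibraryClass(MiniLibraryClass c) { log.add("library:" + c.getName()); }

    public static void main(String[] args) {
        KindRecorder recorder = new KindRecorder();
        new MiniProgramClass("com/example/App").accept(recorder);
        new MiniLibraryClass("java/lang/Object").accept(recorder);
        System.out.println(recorder.log); // [program:com/example/App, library:java/lang/Object]
    }
}
```

A reader, a pool filler or a shrinker can all be written as visitors this way. None of them needs to know in advance which concrete Clazz it will receive.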

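In ClassReader.read we finally see the class file being read. As mentioned earlier, application class files are read into ProgramClass objects and library class files into LibraryClass objects.

Interestingly, ClassReader does not parse the bytecode itself; the actual parsing is done by ProgramClassReader and LibraryClassReader. From here on we follow only ProgramClass and ProgramClassReader.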

ProgramClassReader

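ProgramClassReader implements the ClassVisitor interface. ProGuard uses this xxxVisitor design throughout its class reading and writing code. Classes are handled through ClassVisitor, class members through MemberVisitor, the constant pool through its own visitor interface, and so on. ProgramClassReader implements all of these interfaces. Let's start with visitProgramClass, the entry point for reading a class: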

public void visitProgramClass(ProgramClass programClass)
{
    // Magic number: four bytes, always 0xCAFEBABE.
    // Read and check the magic number.
    programClass.u4magic = dataInput.readInt();

    ClassUtil.checkMagicNumber(programClass.u4magic);

    // After the magic number come the version numbers, two bytes each (minor first, then major).
    // Read and check the version numbers.
    int u2minorVersion = dataInput.readUnsignedShort();
    int u2majorVersion = dataInput.readUnsignedShort();

    programClass.u4version = ClassUtil.internalClassVersion(u2majorVersion,
                                                            u2minorVersion);

    ClassUtil.checkVersionNumbers(programClass.u4version);

    // After the version comes the constant pool count, two bytes (number of slots plus one).
    // Read the constant pool. Note that the first entry is not used.
    programClass.u2constantPoolCount = dataInput.readUnsignedShort();

    // Create the constant pool array, then fill in each entry.
    programClass.constantPool = new Constant[programClass.u2constantPoolCount];
    for (int index = 1; index < programClass.u2constantPoolCount; index++)
    {
        Constant constant = createConstant();
        constant.accept(programClass, this);
        programClass.constantPool[index] = constant;

        // Long constants and double constants take up two entries in the
        // constant pool.
        int tag = constant.getTag();
        if (tag == ClassConstants.CONSTANT_Long ||
            tag == ClassConstants.CONSTANT_Double)
        {
            programClass.constantPool[++index] = null;
        }
    }

    // After the constant pool come the access flags, two bytes.
    // Read the general class information.
    programClass.u2accessFlags = dataInput.readUnsignedShort();
    // Then the this_class index, two bytes.
    programClass.u2thisClass   = dataInput.readUnsignedShort();
    // Then the super_class index, two bytes.
    programClass.u2superClass  = dataInput.readUnsignedShort();

    // Then the interfaces count, two bytes.
    // Read the interfaces.
    programClass.u2interfacesCount = dataInput.readUnsignedShort();

    // Create the interface index array and fill in each entry.
    programClass.u2interfaces = new int[programClass.u2interfacesCount];
    for (int index = 0; index < programClass.u2interfacesCount; index++)
    {
        programClass.u2interfaces[index] = dataInput.readUnsignedShort();
    }

    // After the interfaces comes the fields count, two bytes.
    // Read the fields.
    programClass.u2fieldsCount = dataInput.readUnsignedShort();

    // Create the field array and fill in each field.
    programClass.fields = new ProgramField[programClass.u2fieldsCount];
    for (int index = 0; index < programClass.u2fieldsCount; index++)
    {
        ProgramField programField = new ProgramField();
        this.visitProgramField(programClass, programField);
        programClass.fields[index] = programField;
    }

    // After the fields comes the methods count, two bytes.
    // Read the methods.
    programClass.u2methodsCount = dataInput.readUnsignedShort();

    // Create the method array and fill in each method.
    programClass.methods = new ProgramMethod[programClass.u2methodsCount];
    for (int index = 0; index < programClass.u2methodsCount; index++)
    {
        ProgramMethod programMethod = new ProgramMethod();
        this.visitProgramMethod(programClass, programMethod);
        programClass.methods[index] = programMethod;
    }

    // After the methods comes the attributes count, two bytes.
    // Read the class attributes.
    programClass.u2attributesCount = dataInput.readUnsignedShort();

    // Create the attribute array and fill in each attribute.
    programClass.attributes = new Attribute[programClass.u2attributesCount];
    for (int index = 0; index < programClass.u2attributesCount; index++)
    {
        Attribute attribute = createAttribute(programClass);
        attribute.accept(programClass, this);
        programClass.attributes[index] = attribute;
    }
}

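I've annotated each step of visitProgramClass. In short, it reads the class file sequentially, item by item (u2/u4 values in the order the class file format defines), and stores each item in the ProgramClass object. If you know the class file format, the logic is easy to follow.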
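The sketch below makes the read order concrete. It follows the same sequence as visitProgramClass up to the interfaces table: magic, minor and major version, constant pool, access flags, this_class, super_class, interfaces. `MiniClassFileReader` is hypothetical. It skips constant values and keeps only each slot's tag, using the tag numbers and payload sizes from section 4.4 of the JVM specification. A Long (tag 5) or Double (tag 6) consumes two slots, matching the `constantPool[++index] = null` line above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch mirroring the read order of
// ProgramClassReader.visitProgramClass, stopping after the interfaces table.
final class MiniClassFileReader {
    int magic, minorVersion, majorVersion, constantPoolCount;
    int[] constantTags; // slot 0 and the slot after a Long/Double stay 0 (unused)
    int accessFlags, thisClass, superClass;
    int[] interfaces;

    static MiniClassFileReader parse(byte[] classBytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
            MiniClassFileReader c = new MiniClassFileReader();
            c.magic = in.readInt();
            if (c.magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file");
            }
            c.minorVersion = in.readUnsignedShort(); // minor comes before major
            c.majorVersion = in.readUnsignedShort();
            c.constantPoolCount = in.readUnsignedShort();
            c.constantTags = new int[c.constantPoolCount];
            for (int index = 1; index < c.constantPoolCount; index++) {
                int tag = in.readUnsignedByte();
                c.constantTags[index] = tag;
                skipConstantBody(in, tag);
                if (tag == 5 || tag == 6) {
                    index++; // Long and Double occupy two constant pool slots
                }
            }
            c.accessFlags = in.readUnsignedShort();
            c.thisClass = in.readUnsignedShort();
            c.superClass = in.readUnsignedShort();
            c.interfaces = new int[in.readUnsignedShort()];
            for (int i = 0; i < c.interfaces.length; i++) {
                c.interfaces[i] = in.readUnsignedShort();
            }
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Consumes the payload that follows each constant tag.
    private static void skipConstantBody(DataInputStream in, int tag) throws IOException {
        switch (tag) {
            case 1: in.readFully(new byte[in.readUnsignedShort()]); break;      // Utf8
            case 7: case 8: case 16: case 19: case 20: in.readUnsignedShort(); break;
            case 15: in.readUnsignedByte(); in.readUnsignedShort(); break;      // MethodHandle
            case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                in.readInt(); break;
            case 5: case 6: in.readLong(); break;                               // Long, Double
            default: throw new IllegalArgumentException("Unknown constant tag " + tag);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MiniClassFileReader path/to/Some.class
        MiniClassFileReader c = parse(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0])));
        System.out.printf("major=%d minor=%d cp=%d interfaces=%d%n",
                c.majorVersion, c.minorVersion, c.constantPoolCount, c.interfaces.length);
    }
}
```

Fields, methods and attributes follow the interfaces in the same counted-table style. ProgramClassReader reads them with the same "read a u2 count, then loop" pattern.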

Back in ClassReader: once the bytecode has been read into a ProgramClass object, the next step is adding it to a ClassPool:

public void read(DataEntry dataEntry) throws IOException
{
    // ...a lot of code omitted...
    clazz = new ProgramClass();
    clazz.accept(new ProgramClassReader(dataInputStream));
    // classVisitor here is the ClassPresenceFilter that wraps the ClassPoolFiller.
    clazz.accept(classVisitor);
}

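classVisitor is created in InputReader. It is a ClassPresenceFilter that passes classes not yet in programClassPool on to a ClassPoolFiller, and reports duplicates through duplicateClassPrinter: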

new ClassReader(false,
                configuration.skipNonPublicLibraryClasses,
                configuration.skipNonPublicLibraryClassMembers,
                warningPrinter,
                new ClassPresenceFilter(programClassPool, duplicateClassPrinter,
                new ClassPoolFiller(programClassPool)))

public class ClassPoolFiller extends SimplifiedVisitor implements ClassVisitor
{
    private final ClassPool classPool;


    /**
     * Creates a new ClassPoolFiller.
     */
    public ClassPoolFiller(ClassPool classPool)
    {
        this.classPool = classPool;
    }


    // Implementations for ClassVisitor.

    public void visitAnyClass(Clazz clazz)
    {
        classPool.addClass(clazz);
    }
}
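The ClassPresenceFilter in front of ClassPoolFiller keeps duplicate class names out of the pool. A class that is already present goes to the duplicate printer, and a missing one goes on to the filler. Below is a minimal sketch of that routing, with classes reduced to names and visitors reduced to `Consumer<String>`. `MiniPoolChain` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical sketch of the ClassPresenceFilter -> ClassPoolFiller chain.
final class MiniPoolChain {
    // Routes each class to onPresent if the pool already has that name,
    // otherwise to onMissing (the role ClassPresenceFilter plays).
    static Consumer<String> presenceFilter(Map<String, String> pool,
                                           Consumer<String> onPresent,
                                           Consumer<String> onMissing) {
        return name -> (pool.containsKey(name) ? onPresent : onMissing).accept(name);
    }

    // Adds the class to the pool (the role of ClassPoolFiller.visitAnyClass).
    static Consumer<String> poolFiller(Map<String, String> pool) {
        return name -> pool.put(name, name);
    }

    public static void main(String[] args) {
        Map<String, String> programClassPool = new TreeMap<>();
        List<String> duplicates = new ArrayList<>();
        Consumer<String> chain =
                presenceFilter(programClassPool, duplicates::add, poolFiller(programClassPool));
        for (String name : new String[] {"a/A", "b/B", "a/A"}) {
            chain.accept(name);
        }
        System.out.println(programClassPool.keySet() + " duplicates=" + duplicates);
        // [a/A, b/B] duplicates=[a/A]
    }
}
```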

The code is simple: ClassPoolFiller holds a reference to programClassPool, and each successfully initialized ProgramClass is added to it. The whole flow can be summarized as:

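DirectoryPump walks the directory tree
JarReader unpacks the jar
ClassReader reads the class byte stream
ProgramClassReader parses the byte stream into a ProgramClass object
ClassPoolFiller adds the ProgramClass to programClassPool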

Summary

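This part used the source code to show how ProGuard reads jars into a ClassPool. Once the class bytecode has been read in and organized, the shrinking and obfuscation work can begin. The next part continues with how ProGuard shrinks code.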
