ProGuard Source Code Analysis, Part 2: Parsing Class Bytecode

Author: unavailable | Published: 2022-01-04 16:10

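    The previous part covered how ProGuard parses its arguments and how it obtains and stores its configuration. This part looks at how ProGuard reads class files, parses class bytecode, and stores the parsed result in memory.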

    Data structures

    Before we start, here are a few data structures that the rest of this part relies on:

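    • Clazz: an interface, implemented by ProgramClass and LibraryClass
    • ProgramClass: implements Clazz; ProGuard uses it to describe the application's own classes
    • LibraryClass: implements Clazz; ProGuard uses it to describe classes from third-party libraries

    In short, ProgramClass and LibraryClass are the two implementations of Clazz.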
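
The relationship between the three types can be sketched as follows. This is only an illustration: the nested types below are hypothetical, heavily simplified stand-ins that share names with ProGuard's types, and the `describe` helper is made up for the example.

```java
// Hypothetical, heavily simplified stand-ins; ProGuard's real types carry far more state.
class ClazzHierarchySketch {
    interface Clazz {
        String getName();
    }

    // Describes a class that belongs to the application itself.
    static final class ProgramClass implements Clazz {
        private final String name;
        ProgramClass(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    // Describes a class that comes from a third-party library.
    static final class LibraryClass implements Clazz {
        private final String name;
        LibraryClass(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    // Code that only needs the name can work against the shared interface.
    static String describe(Clazz clazz) {
        String kind = (clazz instanceof ProgramClass) ? "program" : "library";
        return kind + " class " + clazz.getName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new ProgramClass("com/example/Main")));
        System.out.println(describe(new LibraryClass("java/lang/Object")));
    }
}
```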

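    ProGuard also has two class pools, programClassPool and libraryClassPool, both of type ClassPool. programClassPool holds the Clazz instances of all application classes, and libraryClassPool holds the Clazz instances of all third-party library classes. ClassPool itself is simple: it is essentially a TreeMap that stores Clazz instances by class name. The code is as follows: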

    /**
     * This is a set of representations of classes. They can be enumerated or
     * retrieved by name. They can also be accessed by means of class visitors.
     *
     * @author Eric Lafortune
     */
    public class ClassPool
    {
        // We're using a sorted tree map instead of a hash map to store the classes,
        // in order to make the processing more deterministic.
        private final Map<String, Clazz> classes = new TreeMap();
    
        /**
         * Adds the given Clazz to the class pool.
         */
        public void addClass(Clazz clazz)
        {
            classes.put(clazz.getName(), clazz);
        }
    
        /**
         * Removes the given Clazz from the class pool.
         */
        public void removeClass(Clazz clazz)
        {
            removeClass(clazz.getName());
        }
    
        /**
         * Returns a Clazz from the class pool based on its name. Returns
         * <code>null</code> if the class with the given name is not in the class
         * pool.
         */
        public Clazz getClass(String className)
        {
            return (Clazz)classes.get(className);
        }
        // ... remaining code omitted ...
    }
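
The comment in ClassPool explains why a sorted map is used: iteration order then depends only on the class names, not on hash values or insertion order. A minimal sketch of that effect, assuming a hypothetical `ClassPoolSketch` that stores arbitrary objects instead of Clazz instances:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical miniature of a class pool; not ProGuard's actual ClassPool.
class ClassPoolSketch {
    // Keys are class names; the TreeMap keeps them in sorted order.
    private final Map<String, Object> classes = new TreeMap<>();

    void add(String className, Object clazz) {
        classes.put(className, clazz);
    }

    // Returns null if no class with that name is in the pool.
    Object get(String className) {
        return classes.get(className);
    }

    // Class names in iteration order, which for a TreeMap is sorted order.
    List<String> classNames() {
        return new ArrayList<>(classes.keySet());
    }

    public static void main(String[] args) {
        ClassPoolSketch pool = new ClassPoolSketch();
        pool.add("com/example/Zebra", new Object());
        pool.add("com/example/Apple", new Object());
        pool.add("com/example/Mango", new Object());
        // Prints [com/example/Apple, com/example/Mango, com/example/Zebra]
        System.out.println(pool.classNames());
    }
}
```

Whatever order the classes are added in, later passes that enumerate the pool see them in the same order, which makes runs reproducible.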
    

    Reading Java classes

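    The previous part covered argument parsing. Once the arguments are parsed, a ProGuard object is created and its execute method is called to run ProGuard's core steps. The execute method looks roughly like this: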

    /**
     * Performs all subsequent ProGuard operations.
     */
    public void execute() throws IOException
    {
        System.out.println(VERSION);
    
        GPL.check();
    
        if (configuration.printConfiguration != null)
        {
            printConfiguration();
        }
    
        new ConfigurationChecker(configuration).check();
    
        if (configuration.programJars != null     &&
            configuration.programJars.hasOutput() &&
            new UpToDateChecker(configuration).check())
        {
            return;
        }
    
        readInput();
    
        if (configuration.shrink    ||
            configuration.optimize  ||
            configuration.obfuscate ||
            configuration.preverify)
        {
            clearPreverification();
        }
    
        if (configuration.printSeeds != null ||
            configuration.shrink    ||
            configuration.optimize  ||
            configuration.obfuscate ||
            configuration.preverify)
        {
            initialize();
        }
    
        if (configuration.targetClassVersion != 0)
        {
            target();
        }
    
        if (configuration.printSeeds != null)
        {
            printSeeds();
        }
    
        if (configuration.shrink)
        {
            shrink();
        }
    
        if (configuration.preverify)
        {
            inlineSubroutines();
        }
    
        if (configuration.optimize)
        {
            for (int optimizationPass = 0;
                    optimizationPass < configuration.optimizationPasses;
                    optimizationPass++)
            {
                if (!optimize())
                {
                    // Stop optimizing if the code doesn't improve any further.
                    break;
                }
    
                // Shrink again, if we may.
                if (configuration.shrink)
                {
                    // Don't print any usage this time around.
                    configuration.printUsage       = null;
                    configuration.whyAreYouKeeping = null;
    
                    shrink();
                }
            }
        } else if (configuration.optimizeNoSideEffects) {
            new Optimizer(configuration).executeNoSideEffects(programClassPool, libraryClassPool);
        }
    
        if (configuration.optimize)
        {
            linearizeLineNumbers();
        }
    
        if (configuration.obfuscate)
        {
            obfuscate();
        }
    
        if (configuration.optimize)
        {
            trimLineNumbers();
        }
    
        if (configuration.preverify)
        {
            preverify();
        }
    
        if (configuration.shrink    ||
            configuration.optimize  ||
            configuration.obfuscate ||
            configuration.preverify)
        {
            sortClassElements();
        }
    
        if (configuration.programJars.hasOutput())
        {
            writeOutput();
        }
    
        if (configuration.dump != null)
        {
            dump();
        }
    }
    

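    As you can see, shrinking, optimization, obfuscation, and the other features are all driven from execute. Next we look at how classes are read and loaded.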

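    Configuration holds the programJars and libraryJars parsed earlier, which correspond to the application's jars and the third-party library jars. readInput uses these paths to read and unpack the jar files. Application class bytecode ends up in ProgramClass objects, and library class files end up in LibraryClass objects.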

    private void readInput() throws IOException
    {
        if (configuration.verbose)
        {
            System.out.println("Reading input...");
        }
        // Fill the program class pool and the library class pool.
        new InputReader(configuration).execute(programClassPool, libraryClassPool);
    }
    

    readInput is very simple: it instantiates an InputReader and calls its execute method to read the jar files.

    /**
     * Fills the given program class pool and library class pool by reading
     * class files, based on the current configuration.
     */
    public void execute(ClassPool programClassPool,
                        ClassPool libraryClassPool) throws IOException
    {
    
        // unrelated code omitted...
        readInput("Reading program ",
                    configuration.programJars,
                    new ClassFilter(
                    new ClassReader(false,
                                    configuration.skipNonPublicLibraryClasses,
                                    configuration.skipNonPublicLibraryClassMembers,
                                    warningPrinter,
                    new ClassPresenceFilter(programClassPool, duplicateClassPrinter,
                    new ClassPoolFiller(programClassPool)))));
        // unrelated code omitted...
    }
    /**
     * Reads all input entries from the given section of the given class path.
     */
    public void readInput(String          messagePrefix,
                            ClassPath       classPath,
                            int             fromIndex,
                            int             toIndex,
                            DataEntryReader reader) throws IOException
    {
        for (int index = fromIndex; index < toIndex; index++)
        {
            ClassPathEntry entry = classPath.get(index);
            if (!entry.isOutput())
            {
                readInput(messagePrefix, entry, reader);
            }
        }
    }
    

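    ClassPath maintains an internal list of ClassPathEntry objects, each of which essentially describes one jar file. readInput iterates over these entries and processes them one by one, skipping the ones that are marked as output.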
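
The filtering loop can be sketched in isolation. The `Entry` type and `inputsOf` helper below are hypothetical and only mirror the shape of the loop above; they are not ProGuard's ClassPath or ClassPathEntry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a class path; the names are illustrative only.
class ClassPathSketch {
    // One entry describes one jar (or directory) and whether it is an output.
    static final class Entry {
        final String path;
        final boolean output;
        Entry(String path, boolean output) {
            this.path = path;
            this.output = output;
        }
    }

    // Returns the paths that would be read: every entry that is not an output.
    static List<String> inputsOf(List<Entry> classPath) {
        List<String> inputs = new ArrayList<>();
        for (Entry entry : classPath) {
            if (!entry.output) {
                inputs.add(entry.path);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<Entry> classPath = new ArrayList<>();
        classPath.add(new Entry("in1.jar", false));
        classPath.add(new Entry("in2.jar", false));
        classPath.add(new Entry("out.jar", true));
        // Prints [in1.jar, in2.jar]
        System.out.println(inputsOf(classPath));
    }
}
```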

    /**
     * Reads the given input class path entry.
     */
    private void readInput(String          messagePrefix,
                            ClassPathEntry  classPathEntry,
                            DataEntryReader dataEntryReader) throws IOException
    {
        // some code omitted...
        // Create a reader that can unwrap jars, wars, ears, and zips.
        DataEntryReader reader =
            DataEntryReaderFactory.createDataEntryReader(messagePrefix,
                                                            classPathEntry,
                                                            dataEntryReader);
    
        // Create the data entry pump.
        DirectoryPump directoryPump =
            new DirectoryPump(classPathEntry.getFile());
    
        // Pump the data entries into the reader.
        directoryPump.pumpDataEntries(reader);    
    }
    

    First a DataEntryReader is created, but this one is not the reader that actually reads class bytecode. It is only responsible for unpacking, handling each file type (.apk, .jar, .aar, and so on) in its own way. On Android, the DataEntryReader returned here is a JarReader.

    Next a DirectoryPump is created and its pumpDataEntries method is called. Its implementation is as follows:

    public void pumpDataEntries(DataEntryReader dataEntryReader)
    throws IOException
    {
        if (!directory.exists())
        {
            throw new IOException("No such file or directory");
        }
    
        readFiles(directory, dataEntryReader);
    }
    
    
    /**
     * Reads the given subdirectory recursively, applying the given DataEntryReader
     * to all files that are encountered.
     */
    private void readFiles(File file, DataEntryReader dataEntryReader)
    throws IOException
    {
        // Pass the file data entry to the reader.
        dataEntryReader.read(new FileDataEntry(directory, file));
    
        if (file.isDirectory())
        {
            // Recurse into the subdirectory.
            File[] listedFiles = file.listFiles();
    
            for (int index = 0; index < listedFiles.length; index++)
            {
                File listedFile = listedFiles[index];
                try
                {
                    readFiles(listedFile, dataEntryReader);
                }
                catch (IOException e)
                {
                    throw (IOException)new IOException("Can't read ["+listedFile.getName()+"] ("+e.getMessage()+")").initCause(e);
                }
            }
        }
    }
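
The recursion in readFiles can be tried in isolation with the sketch below. It assumes a hypothetical `walk` helper that takes a plain `Consumer<File>` in place of a DataEntryReader, and it adds a null check on `listFiles()` that the code above does not have.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a directory pump; not ProGuard's DirectoryPump.
class DirectoryWalkSketch {
    // Passes the file to the visitor first, then recurses if it is a directory.
    static void walk(File file, Consumer<File> visitor) {
        visitor.accept(file);
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) { // listFiles() returns null on I/O errors
                for (File child : children) {
                    walk(child, visitor);
                }
            }
        }
    }

    // Builds a tiny temporary tree (root/sub/A.class) and returns the visited names.
    // The temporary files are left behind for the OS to clean up.
    static List<String> demo() {
        try {
            File root = Files.createTempDirectory("pump-sketch").toFile();
            File sub = new File(root, "sub");
            Files.createDirectory(sub.toPath());
            Files.createFile(new File(sub, "A.class").toPath());

            List<String> seen = new ArrayList<>();
            walk(root, f -> seen.add(f.getName()));
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first name is the randomly named temporary root directory.
        System.out.println(demo());
    }
}
```

Note that directories themselves are passed to the visitor too, just as readFiles hands every file and directory to the DataEntryReader before recursing.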
    

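    As you can see, pumpDataEntries recursively walks the files and hands each one to the DataEntryReader created earlier. Since JarReader implements that interface, let's go straight to JarReader's read method: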

    public void read(DataEntry dataEntry) throws IOException
    {
        ZipInputStream zipInputStream = new ZipInputStream(dataEntry.getInputStream());
        // Get all entries from the input jar.
        while (true)
        {
            // Can we get another entry?
            ZipEntry zipEntry = zipInputStream.getNextEntry();
            if (zipEntry == null)
            {
                break;
            }
    
            // Delegate the actual reading to the data entry reader.
            dataEntryReader.read(new ZipDataEntry(dataEntry,
                                                    zipEntry,
                                                    zipInputStream));
        }
        // some code omitted...
    }
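
The same unwrap-and-delegate loop can be reproduced with nothing but `java.util.zip`. In the sketch below, `EntryHandler` is a hypothetical callback that stands in for the wrapped DataEntryReader, and the jar is built in memory so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a jar-unwrapping reader; not ProGuard's JarReader.
class JarUnwrapSketch {
    // Stands in for the wrapped DataEntryReader. It must not close the stream.
    interface EntryHandler {
        void handle(String entryName, InputStream content) throws IOException;
    }

    // Walks all entries of a zip/jar stream and delegates each one to the handler.
    static void readEntries(InputStream jar, EntryHandler handler) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(jar)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                handler.handle(entry.getName(), zip);
            }
        }
    }

    // Counts the bytes of the current entry by reading until the entry ends.
    static int count(InputStream in) throws IOException {
        byte[] buffer = new byte[1024];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    // Builds a small in-memory jar with two entries, then lists what the handler saw.
    static List<String> demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(bytes)) {
                out.putNextEntry(new ZipEntry("com/example/A.class"));
                out.write(new byte[] {1, 2, 3});
                out.closeEntry();
                out.putNextEntry(new ZipEntry("META-INF/MANIFEST.MF"));
                out.write(new byte[] {4});
                out.closeEntry();
            }
            List<String> seen = new ArrayList<>();
            readEntries(new ByteArrayInputStream(bytes.toByteArray()),
                        (name, content) -> seen.add(name + ":" + count(content)));
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints [com/example/A.class:3, META-INF/MANIFEST.MF:1]
        System.out.println(demo());
    }
}
```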
    

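    As its name suggests, JarReader only takes care of unpacking the jar file; the entries it extracts are handed to dataEntryReader for processing. ProGuard's source code uses this kind of delegation heavily, passing work along one layer at a time. The dataEntryReader here is ultimately the ClassReader created in InputReader's execute method (wrapped in a ClassFilter). Its read method: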

    public void read(DataEntry dataEntry) throws IOException
    {
    
        // Get the input stream.
        InputStream inputStream = dataEntry.getInputStream();
    
        // Wrap it into a data input stream.
        DataInputStream dataInputStream = new DataInputStream(inputStream);
    
        // Create a Clazz representation.
        Clazz clazz;
        if (isLibrary)
        {
            clazz = new LibraryClass();
            clazz.accept(new LibraryClassReader(dataInputStream, skipNonPublicLibraryClasses, skipNonPublicLibraryClassMembers));
        }
        else
        {
            clazz = new ProgramClass();
            clazz.accept(new ProgramClassReader(dataInputStream));
        }
        // some code omitted...
    }
    

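    Here we finally see class files being read. As mentioned earlier, application class files are read into ProgramClass objects and third-party library class files into LibraryClass objects.

    Interestingly, ClassReader does not parse the class bytecode itself; the actual parsing is done by ProgramClassReader and LibraryClassReader. We only follow the ProgramClass path here, so we focus on ProgramClassReader.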
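
The call `clazz.accept(new ProgramClassReader(dataInputStream))` is a visitor-style double dispatch: the class object decides which visit method is called, and the visitor does the work. A minimal sketch of that mechanism follows; the nested types are hypothetical, reduced versions that reuse ProGuard's names, and `NameFiller` is made up for the example.

```java
// Hypothetical, reduced versions of the visitor-related types.
class VisitorSketch {
    interface ClassVisitor {
        void visitProgramClass(ProgramClass programClass);
        void visitLibraryClass(LibraryClass libraryClass);
    }

    interface Clazz {
        void accept(ClassVisitor visitor);
    }

    static final class ProgramClass implements Clazz {
        String name;
        @Override public void accept(ClassVisitor visitor) {
            visitor.visitProgramClass(this);
        }
    }

    static final class LibraryClass implements Clazz {
        String name;
        @Override public void accept(ClassVisitor visitor) {
            visitor.visitLibraryClass(this);
        }
    }

    // A "reader" visitor: instead of inspecting the class, it fills it in.
    static final class NameFiller implements ClassVisitor {
        private final String name;
        NameFiller(String name) { this.name = name; }
        @Override public void visitProgramClass(ProgramClass c) { c.name = "program:" + name; }
        @Override public void visitLibraryClass(LibraryClass c) { c.name = "library:" + name; }
    }

    public static void main(String[] args) {
        ProgramClass programClass = new ProgramClass();
        programClass.accept(new NameFiller("com/example/Main"));
        // Prints program:com/example/Main
        System.out.println(programClass.name);
    }
}
```

Using a visitor to populate an empty object, rather than to inspect a finished one, is exactly what the reader classes do with a freshly created ProgramClass or LibraryClass.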

    ProgramClassReader

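    ProgramClassReader implements the ClassVisitor interface. It is worth noting that ProGuard relies heavily on the visitor pattern for reading and writing class bytecode: classes are handled through ClassVisitor, class members through MemberVisitor, the constant pool through its own set of visitors, and so on. ProgramClassReader implements all of these interfaces. We start with visitProgramClass, the entry point for reading a class: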

    public void visitProgramClass(ProgramClass programClass)
    {
        // The magic number: four bytes, 0xCAFEBABE.
        // Read and check the magic number.
        programClass.u4magic = dataInput.readInt();
    
        ClassUtil.checkMagicNumber(programClass.u4magic);
    
        // After the magic number come the minor and major version numbers, two bytes each.
        // Read and check the version numbers.
        int u2minorVersion = dataInput.readUnsignedShort();
        int u2majorVersion = dataInput.readUnsignedShort();
    
        programClass.u4version = ClassUtil.internalClassVersion(u2majorVersion,
                                                                u2minorVersion);
    
        ClassUtil.checkVersionNumbers(programClass.u4version);
    
        // After the version numbers comes the constant pool count, two bytes.
        // Read the constant pool. Note that the first entry is not used.
        programClass.u2constantPoolCount = dataInput.readUnsignedShort();
    
        // Create the constant pool, then read and fill in each entry.
        programClass.constantPool = new Constant[programClass.u2constantPoolCount];
        for (int index = 1; index < programClass.u2constantPoolCount; index++)
        {
            Constant constant = createConstant();
            constant.accept(programClass, this);
            programClass.constantPool[index] = constant;
    
            // Long constants and double constants take up two entries in the
            // constant pool.
            int tag = constant.getTag();
            if (tag == ClassConstants.CONSTANT_Long ||
                tag == ClassConstants.CONSTANT_Double)
            {
                programClass.constantPool[++index] = null;
            }
        }
    
        // After the constant pool come the access flags, two bytes.
        // Read the general class information.
        programClass.u2accessFlags = dataInput.readUnsignedShort();
        // After the access flags comes the index of this class, two bytes.
        programClass.u2thisClass   = dataInput.readUnsignedShort();
        // After the class index comes the superclass index, two bytes.
        programClass.u2superClass  = dataInput.readUnsignedShort();
    
        // After the superclass index comes the interface count, two bytes.
        // Read the interfaces.
        programClass.u2interfacesCount = dataInput.readUnsignedShort();
    
        // Create the interface index array, then read each entry.
        programClass.u2interfaces = new int[programClass.u2interfacesCount];
        for (int index = 0; index < programClass.u2interfacesCount; index++)
        {
            programClass.u2interfaces[index] = dataInput.readUnsignedShort();
        }
    
        // After the interface indices comes the field count, two bytes.
        // Read the fields.
        programClass.u2fieldsCount = dataInput.readUnsignedShort();
    
        // Create the field array, then read each field.
        programClass.fields = new ProgramField[programClass.u2fieldsCount];
        for (int index = 0; index < programClass.u2fieldsCount; index++)
        {
            ProgramField programField = new ProgramField();
            this.visitProgramField(programClass, programField);
            programClass.fields[index] = programField;
        }
    
        // After the fields comes the method count, two bytes.
        // Read the methods.
        programClass.u2methodsCount = dataInput.readUnsignedShort();
    
        // Create the method array, then read each method.
        programClass.methods = new ProgramMethod[programClass.u2methodsCount];
        for (int index = 0; index < programClass.u2methodsCount; index++)
        {
            ProgramMethod programMethod = new ProgramMethod();
            this.visitProgramMethod(programClass, programMethod);
            programClass.methods[index] = programMethod;
        }
    
        // After the methods comes the attribute count, two bytes.
        // Read the class attributes.
        programClass.u2attributesCount = dataInput.readUnsignedShort();
    
        // Create the attribute array, then read each attribute.
        programClass.attributes = new Attribute[programClass.u2attributesCount];
        for (int index = 0; index < programClass.u2attributesCount; index++)
        {
            Attribute attribute = createAttribute(programClass);
            attribute.accept(programClass, this);
            programClass.attributes[index] = attribute;
        }
    }
    

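    Every step of visitProgramClass is annotated in the comments above. In short, it reads the class file field by field, in the order laid down by the class file format, and stores the values in a ProgramClass object. If you are familiar with the class file format, the logic is easy to follow.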
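
The first few reads can be tried on a hand-written header using only `java.io`. The sketch below is not ProGuard code: `readHeader` and `poolIndices` are hypothetical helpers. The first reads the magic number, the two version numbers and the constant pool count; the second shows why the loop above skips a slot after a Long or Double entry (tags 5 and 6 in the class file format).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of reading the start of a class file.
class ClassHeaderSketch {
    static final int MAGIC = 0xCAFEBABE;
    static final int CONSTANT_LONG = 5;
    static final int CONSTANT_DOUBLE = 6;

    // Returns { minor_version, major_version, constant_pool_count }.
    static int[] readHeader(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            if (magic != MAGIC) {
                throw new IllegalArgumentException("Not a class file");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int constantPoolCount = in.readUnsignedShort();
            return new int[] { minor, major, constantPoolCount };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Given entry tags in file order, returns the constant pool index of each entry.
    static int[] poolIndices(int[] tags) {
        int[] indices = new int[tags.length];
        int index = 1; // index 0 is unused
        for (int i = 0; i < tags.length; i++) {
            indices[i] = index;
            boolean wide = tags[i] == CONSTANT_LONG || tags[i] == CONSTANT_DOUBLE;
            index += wide ? 2 : 1; // Long and Double occupy two slots
        }
        return indices;
    }

    public static void main(String[] args) {
        // Hand-written first 10 bytes of a class file: version 52.0 (Java 8),
        // constant_pool_count = 16. The rest of the file is not needed here.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00,
            0x00, 0x34,
            0x00, 0x10
        };
        int[] h = readHeader(header);
        // Prints minor=0 major=52 constantPoolCount=16
        System.out.println("minor=" + h[0] + " major=" + h[1] + " constantPoolCount=" + h[2]);

        // Tags: Utf8 (1), Long (5), Class (7) get indices 1, 2 and 4.
        int[] indices = poolIndices(new int[] {1, 5, 7});
        System.out.println(indices[0] + " " + indices[1] + " " + indices[2]);
    }
}
```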

    Back in ClassReader: once the class bytecode has been read into the ProgramClass object, the next step is to add it to the ClassPool:

    public void read(DataEntry dataEntry) throws IOException
    {
        // a lot of code omitted...
        clazz = new ProgramClass();
        clazz.accept(new ProgramClassReader(dataInputStream));
        // classVisitor here ends up delegating to the ClassPoolFiller.
        clazz.accept(classVisitor);
    }
    

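    classVisitor is the visitor that was passed to ClassReader's constructor in InputReader: a ClassPresenceFilter wrapping a ClassPoolFiller. The code is as follows: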

    new ClassReader(false,
                    configuration.skipNonPublicLibraryClasses,
                    configuration.skipNonPublicLibraryClassMembers,
                    warningPrinter,
                    new ClassPresenceFilter(programClassPool, duplicateClassPrinter,
                    new ClassPoolFiller(programClassPool)))
    
    public class ClassPoolFiller extends SimplifiedVisitor implements ClassVisitor
    {
        private final ClassPool classPool;
    
    
        /**
         * Creates a new ClassPoolFiller.
         */
        public ClassPoolFiller(ClassPool classPool)
        {
            this.classPool = classPool;
        }
    
    
        // Implementations for ClassVisitor.
    
        public void visitAnyClass(Clazz clazz)
        {
            classPool.addClass(clazz);
        }
    }
    

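    The code is straightforward: ClassPoolFiller holds a reference to programClassPool, and once a ProgramClass has been read successfully it is added to programClassPool. The whole flow can be summarized as: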


    • DirectoryPump: walks the directory tree
    • JarReader: unpacks jar files
    • ClassReader: handles the I/O and reads the class byte stream
    • ProgramClassReader: parses the byte stream into a ProgramClass object
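
The last hop of this chain, from a freshly read class into the pool, can be sketched as below. The sketch assumes that ClassPresenceFilter routes classes already in the pool to its first visitor (the duplicate printer) and new classes to its second (the ClassPoolFiller), as the constructor call in InputReader suggests. All types here are hypothetical and pass around class names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical miniature of the visitor chain; only class names are passed around.
class PoolFillingSketch {
    interface NameVisitor {
        void visit(String className);
    }

    // Plays the role of ClassPoolFiller: adds every class it visits to the pool.
    static final class PoolFiller implements NameVisitor {
        private final Set<String> pool;
        PoolFiller(Set<String> pool) { this.pool = pool; }
        @Override public void visit(String className) { pool.add(className); }
    }

    // Plays the role of ClassPresenceFilter: picks a visitor based on pool membership.
    static final class PresenceFilter implements NameVisitor {
        private final Set<String> pool;
        private final NameVisitor present;
        private final NameVisitor missing;

        PresenceFilter(Set<String> pool, NameVisitor present, NameVisitor missing) {
            this.pool = pool;
            this.present = present;
            this.missing = missing;
        }

        @Override public void visit(String className) {
            NameVisitor target = pool.contains(className) ? present : missing;
            target.visit(className);
        }
    }

    // Feeds the names through the chain and returns the ones reported as duplicates.
    static List<String> fill(Set<String> pool, List<String> classNames) {
        List<String> duplicates = new ArrayList<>();
        NameVisitor chain = new PresenceFilter(pool, duplicates::add, new PoolFiller(pool));
        for (String className : classNames) {
            chain.visit(className);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        Set<String> pool = new TreeSet<>();
        List<String> input = new ArrayList<>();
        input.add("com/example/A");
        input.add("com/example/B");
        input.add("com/example/A");
        // Prints duplicates=[com/example/A] pool=[com/example/A, com/example/B]
        System.out.println("duplicates=" + fill(pool, input) + " pool=" + pool);
    }
}
```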

    Summary

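    This part walked through the source code to show how ProGuard reads jar files into the ClassPool. Once the class bytecode has been read and is managed in the pools, ProGuard can start shrinking and obfuscating it. The next part continues with how ProGuard shrinks code.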
