Hermes Source Code Analysis (2): Parsing Bytecode


Author: FingerStyle | Published 2022-03-23 17:40

    The previous part explained that bytecode is serialized to binary in a fixed format. Here we look at how the source code handles that.

    I. Writing bytecode

    1. Writing the header
      Start with the serialize method of BytecodeSerializer. It initializes a BytecodeFileHeader object and writes it to the file via writeBinary.
    void BytecodeSerializer::serialize(BytecodeModule &BM, const SHA1 &sourceHash) {
      bytecodeModule_ = &BM;
      uint32_t cjsModuleCount = BM.getBytecodeOptions().cjsModulesStaticallyResolved
          ? BM.getCJSModuleTableStatic().size()
          : BM.getCJSModuleTable().size();
      BytecodeFileHeader header{MAGIC,
                                BYTECODE_VERSION,
                                sourceHash,
                                fileLength_,
                                BM.getGlobalFunctionIndex(),
                                BM.getNumFunctions(),
                                static_cast<uint32_t>(BM.getStringKinds().size()),
                                BM.getIdentifierCount(),
                                BM.getStringTableSize(),
                                overflowStringEntryCount_,
                                BM.getStringStorageSize(),
                                static_cast<uint32_t>(BM.getRegExpTable().size()),
                                static_cast<uint32_t>(BM.getRegExpStorage().size()),
                                BM.getArrayBufferSize(),
                                BM.getObjectKeyBufferSize(),
                                BM.getObjectValueBufferSize(),
                                BM.getCJSModuleOffset(),
                                cjsModuleCount,
                                debugInfoOffset_,
                                BM.getBytecodeOptions()};
      writeBinary(header);
      // Sizes of file and function headers are tuned for good cache line packing.
      // If you reorder the format, try to avoid headers crossing cache lines.
      visitBytecodeSegmentsInOrder(*this);
      serializeFunctionsBytecode(BM);
    
      for (auto &entry : BM.getFunctionTable()) {
        serializeFunctionInfo(*entry);
      }
    
      serializeDebugInfo(BM);
    
      if (isLayout_) {
        finishLayout(BM);
        serialize(BM, sourceHash);
      }
    }
    

    As you can see, the first thing written is the magic number, whose value is

    const static uint64_t MAGIC = 0x1F1903C103BC1FC6;
    

    The corresponding bytes are shown in the figure below. Note the little-endian byte order.


    [Figure: magic]
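To make the byte order concrete, here is a minimal sketch that splits the magic value into bytes, least significant first, which is the order they appear in a hex dump of the file. The helper names `toLittleEndian` and `toHex` are made up for this illustration and are not part of Hermes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Magic value quoted in the article.
constexpr uint64_t kMagic = 0x1F1903C103BC1FC6ULL;

// Hypothetical helper: split a 64-bit value into bytes, least significant first.
std::array<uint8_t, 8> toLittleEndian(uint64_t value) {
  std::array<uint8_t, 8> out{};
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint8_t>(value >> (8 * i));
  }
  return out;
}

// Hypothetical helper: render the bytes as a space-separated hex string.
std::string toHex(const std::array<uint8_t, 8> &bytes) {
  std::string s;
  char buf[4];
  for (size_t i = 0; i < bytes.size(); ++i) {
    std::snprintf(buf, sizeof(buf), "%02x", bytes[i]);
    if (i != 0) {
      s += ' ';
    }
    s += buf;
  }
  return s;
}
```

Calling `toHex(toLittleEndian(kMagic))` yields `c6 1f bc 03 c1 03 19 1f`, so a bytecode file should start with the byte `c6` rather than `1f`.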

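    The second field is the bytecode version. The author's version is 74, i.e. the 4a00 0000 in the figure above.
    The third field is the hash of the source code. It uses the SHA1 algorithm, which produces a 160-bit hash and therefore occupies 20 bytes.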


    [Figure: source hash]

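    The fourth field is the file length. This field is 32 bits wide; in the figure below it reads 0x0aa030, which is 696368 in decimal, and the actual file size matches.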


    [Figure: file length]
    [Figure: file size]
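The two numbers above can be checked by hand. The sketch below decodes a little-endian 32-bit field from raw bytes; the offsets are derived only from the field sizes described so far (8-byte magic, 4-byte version, 20-byte hash), and `readU32LE` is a hypothetical helper, not a Hermes function.

```cpp
#include <cstddef>
#include <cstdint>

// Offsets implied by the field order described in the article:
// magic (8 bytes), version (4 bytes), sourceHash (20 bytes), fileLength.
constexpr size_t kVersionOffset = 8;
constexpr size_t kFileLengthOffset = 8 + 4 + 20; // = 32

// Hypothetical helper: decode four bytes stored least significant first.
uint32_t readU32LE(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) |
      (static_cast<uint32_t>(p[1]) << 8) |
      (static_cast<uint32_t>(p[2]) << 16) |
      (static_cast<uint32_t>(p[3]) << 24);
}
```

With the bytes `4a 00 00 00` this returns 74, and with `30 a0 0a 00` (the value 0x0aa030 as stored on disk) it returns 696368, matching the version and file size quoted above.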

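    The remaining fields work the same way, so we will not go through them one by one. The types of all header fields can be found in BytecodeFileHeader.h. Hermes fills in the fields according to this fixed memory layout and then serializes the struct, which yields the bytecode file we see.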

    struct BytecodeFileHeader {
      uint64_t magic;
      uint32_t version;
      uint8_t sourceHash[SHA1_NUM_BYTES];
      uint32_t fileLength;
      uint32_t globalCodeIndex;
      uint32_t functionCount;
      uint32_t stringKindCount; // Number of string kind entries.
      uint32_t identifierCount; // Number of strings which are identifiers.
      uint32_t stringCount; // Number of strings in the string table.
      uint32_t overflowStringCount; // Number of strings in the overflow table.
      uint32_t stringStorageSize; // Bytes in the blob of string contents.
      uint32_t regExpCount;
      uint32_t regExpStorageSize;
      uint32_t arrayBufferSize;
      uint32_t objKeyBufferSize;
      uint32_t objValueBufferSize;
      uint32_t cjsModuleOffset; // The starting module ID in this segment.
      uint32_t cjsModuleCount; // Number of modules.
      uint32_t debugInfoOffset;
      BytecodeOptions options;
      // (the excerpt ends here; the rest of the definition is omitted)
    };
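As a rough illustration of "write the struct using its memory layout", the following sketch appends the raw bytes of a trivially copyable struct to a byte vector and reads them back. It is a simplified stand-in and not the Hermes implementation: `MiniHeader`, `writeBinarySketch` and `readBinarySketch` are hypothetical names, and the real header has many more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical miniature header; it only borrows a few field names
// from BytecodeFileHeader and is not the real struct.
struct MiniHeader {
  uint64_t magic;
  uint32_t version;
  uint32_t fileLength;
};

// Append the in-memory representation of a trivially copyable value.
template <typename T>
void writeBinarySketch(std::vector<uint8_t> &out, const T &value) {
  static_assert(
      std::is_trivially_copyable<T>::value,
      "only plain data can be dumped byte for byte");
  const auto *p = reinterpret_cast<const uint8_t *>(&value);
  out.insert(out.end(), p, p + sizeof(T));
}

// Copy a value back out of the byte vector at the given offset.
template <typename T>
T readBinarySketch(const std::vector<uint8_t> &in, size_t offset) {
  assert(offset + sizeof(T) <= in.size() && "read past end of buffer");
  T value;
  std::memcpy(&value, in.data() + offset, sizeof(T));
  return value;
}
```

Because the bytes are copied verbatim, the on-disk byte order is simply the host's byte order, which is why the dump above shows little-endian values on the author's machine.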
    
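    2. Writing the other segments in order
      Still in BytecodeSerializer::serialize: after writeBinary it calls visitBytecodeSegmentsInOrder, which writes the remaining segments using the visitor pattern. It is a function template that calls the corresponding methods on the visitor. Both BytecodeSerializer and BytecodeFileFields provide their own visitor implementations; here we only look at BytecodeSerializer's.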
    template <typename Visitor>
    void visitBytecodeSegmentsInOrder(Visitor &visitor) {
      visitor.visitFunctionHeaders();
      visitor.visitStringKinds();
      visitor.visitIdentifierTranslations();
      visitor.visitSmallStringTable();
      visitor.visitOverflowStringTable();
      visitor.visitStringStorage();
      visitor.visitArrayBuffer();
      visitor.visitObjectKeyBuffer();
      visitor.visitObjectValueBuffer();
      visitor.visitRegExpTable();
      visitor.visitRegExpStorage();
      visitor.visitCJSModuleTable();
    }
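The benefit of this design is that the segment order lives in exactly one place, so a writer and a reader cannot drift apart. Below is a toy version of the idea with two made-up segments; `visitSegmentsInOrder`, `SegmentWriter` and `SegmentReader` are hypothetical names for this sketch, and the one-byte count limits the payload to 255 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The only place that knows the segment order.
template <typename Visitor>
void visitSegmentsInOrder(Visitor &visitor) {
  visitor.visitCount();
  visitor.visitPayload();
}

// Writer visitor: appends each segment to `out`.
struct SegmentWriter {
  std::vector<uint8_t> payload; // data to serialize (at most 255 bytes)
  std::vector<uint8_t> out; // serialized bytes
  void visitCount() {
    out.push_back(static_cast<uint8_t>(payload.size()));
  }
  void visitPayload() {
    out.insert(out.end(), payload.begin(), payload.end());
  }
};

// Reader visitor: consumes the same segments in the same order.
struct SegmentReader {
  explicit SegmentReader(const std::vector<uint8_t> &bytes) : in(bytes) {}
  const std::vector<uint8_t> &in;
  size_t pos = 0;
  size_t count = 0;
  std::vector<uint8_t> payload;
  void visitCount() {
    count = in.at(pos++); // at() throws if the input is truncated
  }
  void visitPayload() {
    for (size_t i = 0; i < count; ++i) {
      payload.push_back(in.at(pos++));
    }
  }
};
```

Running `visitSegmentsInOrder` on a `SegmentWriter` and then on a `SegmentReader` over the writer's output reproduces the original payload. Adding a segment means adding one call to the shared function, and both visitors fail to compile until they implement it.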
    

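    A lot of data is written here. Taking the function headers as an example: visitFunctionHeaders is called, which obtains each function's header from the bytecode module and writes it to the function table (uncertain: the author did not find this part in the actual file). Note that the data must be written in this order, because it is read back in the same order.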

    void BytecodeSerializer::visitFunctionHeaders() {
      pad(BYTECODE_ALIGNMENT);
      serializeFunctionTable(*bytecodeModule_);
    }
    
    void BytecodeSerializer::serializeFunctionTable(BytecodeModule &BM) {
      for (auto &entry : BM.getFunctionTable()) {
        if (options_.stripDebugInfoSection) {
          // Change flag on the actual BF, so it's seen by serializeFunctionInfo.
          entry->mutableFlags().hasDebugInfo = false;
        }
        FunctionHeader header = entry->getHeader();
        writeBinary(SmallFuncHeader(header));
      }
    }
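The `pad(BYTECODE_ALIGNMENT)` call at the top of visitFunctionHeaders keeps each segment on an aligned offset. A minimal sketch of what such padding amounts to is shown here, assuming an alignment of 4 bytes purely for illustration (check the Hermes sources for the actual BYTECODE_ALIGNMENT value); `padTo` and `alignUp` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed value for this sketch only.
constexpr size_t kAssumedAlignment = 4;

// Writer side: append zero bytes until the size is a multiple of `alignment`.
void padTo(std::vector<uint8_t> &out, size_t alignment) {
  while (out.size() % alignment != 0) {
    out.push_back(0);
  }
}

// Reader side: round an offset up to the next multiple of `alignment`.
size_t alignUp(size_t offset, size_t alignment) {
  return (offset + alignment - 1) / alignment * alignment;
}
```

A writer that has emitted 5 bytes ends up at size 8 after `padTo(out, 4)`, and a reader at offset 5 skips to `alignUp(5, 4) == 8`, so both sides agree on where the next segment starts.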
    

    II. Reading bytecode

    We know that react-native has to call Hermes's prepareJavaScript method when loading bytecode. So what does this method do?

    std::shared_ptr<const jsi::PreparedJavaScript>
    HermesRuntimeImpl::prepareJavaScript(
        const std::shared_ptr<const jsi::Buffer> &jsiBuffer,
        std::string sourceURL) {
      std::pair<std::unique_ptr<hbc::BCProvider>, std::string> bcErr{};
      auto buffer = std::make_unique<BufferAdapter>(std::move(jsiBuffer));
      vm::RuntimeModuleFlags runtimeFlags{};
      runtimeFlags.persistent = true;
    
      bool isBytecode = isHermesBytecode(buffer->data(), buffer->size());
    #ifdef HERMESVM_PLATFORM_LOGGING
      hermesLog(
          "HermesVM", "Prepare JS on %s.", isBytecode ? "bytecode" : "source");
    #endif
    
      // Construct the BC provider either from buffer or source.
      if (isBytecode) {
        bcErr = hbc::BCProviderFromBuffer::createBCProviderFromBuffer(
            std::move(buffer));
      } else {
        compileFlags_.lazy =
            (buffer->size() >=
             ::hermes::hbc::kDefaultSizeThresholdForLazyCompilation);
    #if defined(HERMESVM_LEAN)
        bcErr.second = "prepareJavaScript source compilation not supported";
    #else
        bcErr = hbc::BCProviderFromSrc::createBCProviderFromSrc(
            std::move(buffer), sourceURL, compileFlags_);
    #endif
      }
      if (!bcErr.first) {
        LOG_EXCEPTION_CAUSE(
            "Compiling JS failed: %s", bcErr.second.c_str());
        throw jsi::JSINativeException(
            "Compiling JS failed: \n" + std::move(bcErr.second));
      }
      return std::make_shared<const HermesPreparedJavaScript>(
          std::move(bcErr.first), runtimeFlags, std::move(sourceURL));
    }
    

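    Two things happen here:
    1. It checks whether the buffer is bytecode. If so it calls createBCProviderFromBuffer, otherwise createBCProviderFromSrc. Here we only look at createBCProviderFromBuffer.
    2. The BCProviderFromBuffer constructor obtains the file header and function header information (via populateFromBuffer). The constructor is implemented as follows.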

    BCProviderFromBuffer::BCProviderFromBuffer(
       std::unique_ptr<const Buffer> buffer,
       BytecodeForm form)
       : buffer_(std::move(buffer)),
         bufferPtr_(buffer_->data()),
         end_(bufferPtr_ + buffer_->size()) {
     ConstBytecodeFileFields fields;
     if (!fields.populateFromBuffer(
             {bufferPtr_, buffer_->size()}, &errstr_, form)) {
       return;
     }
     const auto *fileHeader = fields.header;
     options_ = fileHeader->options;
     functionCount_ = fileHeader->functionCount;
     globalFunctionIndex_ = fileHeader->globalCodeIndex;
     debugInfoOffset_ = fileHeader->debugInfoOffset;
     functionHeaders_ = fields.functionHeaders.data();
     stringKinds_ = fields.stringKinds;
     identifierTranslations_ = fields.identifierTranslations;
     stringCount_ = fileHeader->stringCount;
     stringTableEntries_ = fields.stringTableEntries.data();
     overflowStringTableEntries_ = fields.stringTableOverflowEntries;
     stringStorage_ = fields.stringStorage;
     arrayBuffer_ = fields.arrayBuffer;
     objKeyBuffer_ = fields.objKeyBuffer;
     objValueBuffer_ = fields.objValueBuffer;
     regExpTable_ = fields.regExpTable;
     regExpStorage_ = fields.regExpStorage;
     cjsModuleOffset_ = fileHeader->cjsModuleOffset;
     cjsModuleTable_ = fields.cjsModuleTable;
     cjsModuleTableStatic_ = fields.cjsModuleTableStatic;
    }
    

    BytecodeFileFields::populateFromBuffer is also a template method. Note that it is called here on a ConstBytecodeFileFields object, which represents immutable bytecode fields.

    template <bool Mutable>
    bool BytecodeFileFields<Mutable>::populateFromBuffer(
        Array<uint8_t> buffer,
        std::string *outError,
        BytecodeForm form) {
      if (!sanityCheck(buffer, form, outError)) {
        return false;
      }
    
      // Helper type which populates a BytecodeFileFields. This is nested inside the
      // function so we can leverage BytecodeFileFields template types.
      struct BytecodeFileFieldsPopulator {
        /// The fields being populated.
        BytecodeFileFields &f;
    
        /// Current buffer position.
        Pointer<uint8_t> buf;
    
        /// A pointer to the bytecode file header.
        const BytecodeFileHeader *h;
    
        /// End of buffer.
        const uint8_t *end;
    
        BytecodeFileFieldsPopulator(
            BytecodeFileFields &fields,
            Pointer<uint8_t> buffer,
            const uint8_t *bufEnd)
            : f(fields), buf(buffer), end(bufEnd) {
          f.header = castData<BytecodeFileHeader>(buf);
          h = f.header;
        }
    
        void visitFunctionHeaders() {
          align(buf);
          f.functionHeaders =
              castArrayRef<SmallFuncHeader>(buf, h->functionCount, end);
        }
      // ... (the rest is omitted in the article)
    }
    

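    Attentive readers will notice that a visitFunctionHeaders method appears here too. The point is to reuse the logic of visitBytecodeSegmentsInOrder: the populator acts as a visitor that reads the buffer contents in order and loads them into BytecodeFileFields up front, which reduces parsing time when the bytecode is executed later.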
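The `castArrayRef(buf, h->functionCount, end)` call above takes the cursor, a count and the end of the buffer. The sketch below shows one plausible shape for that kind of step: hand out a zero-copy view over `count` fixed-size records and advance the cursor, refusing to run past the end. It is an assumption about the general technique, not the Hermes code; `RecordView` and `takeRecords` are hypothetical names, and this version returns false where real code might abort instead.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical zero-copy view: points into the original buffer.
struct RecordView {
  const uint8_t *data;
  size_t count;
};

// Hand out `count` records of `recordSize` bytes starting at `cursor`,
// then advance `cursor` past them. Returns false and leaves everything
// untouched if the records would not fit before `end`.
bool takeRecords(
    const uint8_t *&cursor,
    const uint8_t *end,
    size_t count,
    size_t recordSize,
    RecordView &out) {
  if (cursor > end) {
    return false;
  }
  size_t remaining = static_cast<size_t>(end - cursor);
  // Compare with a division so that count * recordSize cannot overflow.
  if (recordSize != 0 && count > remaining / recordSize) {
    return false;
  }
  out = RecordView{cursor, count};
  cursor += count * recordSize;
  return true;
}
```

Nothing is copied: the view simply records where the segment lives, which fits the article's point that this pass is cheap and happens before execution.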

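    After reading the bytecode, the Hermes engine parses the fields of the BytecodeFileHeader struct to obtain key information, such as whether the bundle is in bytecode format, whether it contains functions, and whether the bytecode version matches. Note that at this point only the header has been parsed, not the whole bytecode; the rest is parsed later, when the bytecode is executed.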
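A header check of this kind might look like the sketch below. It only covers the three things mentioned in this article (magic, version, file length) and uses the offsets implied by the header layout shown earlier. The names `checkHeader`, `HeaderCheck` and `readLE`, as well as the exact set of checks, are assumptions for illustration and not a copy of what Hermes does.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint64_t kMagic = 0x1F1903C103BC1FC6ULL;
constexpr uint32_t kSupportedVersion = 74; // the version quoted in the article
constexpr size_t kVersionOffset = 8;
constexpr size_t kFileLengthOffset = 32;
constexpr size_t kMinHeaderBytes = 36; // up to and including fileLength

enum class HeaderCheck {
  Ok,
  TooShort,
  BadMagic,
  VersionMismatch,
  LengthMismatch
};

// Decode `numBytes` bytes stored least significant first.
uint64_t readLE(const uint8_t *p, size_t numBytes) {
  uint64_t v = 0;
  for (size_t i = 0; i < numBytes; ++i) {
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  }
  return v;
}

// Hypothetical sanity check over the first header fields.
HeaderCheck checkHeader(const uint8_t *data, size_t size) {
  if (size < kMinHeaderBytes) {
    return HeaderCheck::TooShort;
  }
  if (readLE(data, 8) != kMagic) {
    return HeaderCheck::BadMagic;
  }
  if (readLE(data + kVersionOffset, 4) != kSupportedVersion) {
    return HeaderCheck::VersionMismatch;
  }
  // The header must not claim more bytes than the buffer holds.
  if (readLE(data + kFileLengthOffset, 4) > size) {
    return HeaderCheck::LengthMismatch;
  }
  return HeaderCheck::Ok;
}
```

Checks like these are cheap because they touch only the first few dozen bytes, which is why they can run before any function is loaded.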

    III. Executing bytecode

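    The evaluatePreparedJavaScript method mainly calls the runtime's runBytecode method. Here hermesPrep is the HermesPreparedJavaScript object created in the previous step, which wraps the BCProviderFromBuffer instance obtained while parsing the header.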

    jsi::Value HermesRuntimeImpl::evaluatePreparedJavaScript(
        const std::shared_ptr<const jsi::PreparedJavaScript> &js) {
      return maybeRethrow([&] {
        assert(
            dynamic_cast<const HermesPreparedJavaScript *>(js.get()) &&
            "js must be an instance of HermesPreparedJavaScript");
        auto &stats = runtime_.getRuntimeStats();
        const vm::instrumentation::RAIITimer timer{
            "Evaluate JS", stats, stats.evaluateJS};
        const auto *hermesPrep =
            static_cast<const HermesPreparedJavaScript *>(js.get());
        vm::GCScope gcScope(&runtime_);
        auto res = runtime_.runBytecode(
            hermesPrep->bytecodeProvider(),
            hermesPrep->runtimeFlags(),
            hermesPrep->sourceURL(),
            vm::Runtime::makeNullHandle<vm::Environment>());
        checkStatus(res.getStatus());
        return valueFromHermesValue(*res);
      });
    }
    

    runBytecode is fairly long. It mainly does the following:

    1. Get the globalFunctionIndex
    auto globalFunctionIndex = bytecode->getGlobalFunctionIndex();
    
    2. Create the global scope, the Domain used for garbage collection, and the corresponding runtime module, then use globalFunctionIndex to obtain the global entry code
    GCScope scope(this);
    
      Handle<Domain> domain = makeHandle(Domain::create(this));
    
      auto runtimeModuleRes = RuntimeModule::create(
          this, domain, nextScriptId_++, std::move(bytecode), flags, sourceURL);
      if (LLVM_UNLIKELY(runtimeModuleRes == ExecutionStatus::EXCEPTION)) {
        return ExecutionStatus::EXCEPTION;
      }
      auto runtimeModule = *runtimeModuleRes;
      auto globalCode = runtimeModule->getCodeBlockMayAllocate(globalFunctionIndex);
    

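    A note on Domain: it is a garbage-collected proxy for runtime modules. A Domain is empty when created and is propagated along with its runtime modules, existing for their entire lifetime. Every function created under a Domain holds a strong reference to that Domain. Once a Domain is collected, none of the functions under it can be used.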
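The ownership rule can be pictured with a loose analogy in plain C++. Hermes uses a tracing garbage collector, not reference counting, so the sketch below is only a mental model: `DomainModel`, `FunctionModel` and `makeFunction` are hypothetical types, and `std::shared_ptr` stands in for a strong GC reference.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a Domain: owns whatever the functions need at run time.
struct DomainModel {
  std::vector<std::string> modules;
};

// Stand-in for a function: its strong reference keeps the domain alive.
struct FunctionModel {
  std::shared_ptr<DomainModel> domain;
  std::string name;
};

// Create a function "under" a domain by sharing ownership of it.
FunctionModel makeFunction(
    const std::shared_ptr<DomainModel> &domain,
    std::string name) {
  return FunctionModel{domain, std::move(name)};
}
```

In this model the domain stays alive after the creator drops its own pointer, as long as some function still refers to it, and it goes away once the last function does. That mirrors the rule above that a Domain lives at least as long as the functions created under it.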

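    3. Execute the global entry function: through runRequireCall when the module contains CommonJS modules, otherwise by interpreting globalCode directly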
    if (runtimeModule->hasCJSModules()) {
        auto requireContext = RequireContext::create(
            this, domain, getPredefinedStringHandle(Predefined::dotSlash));
        return runRequireCall(
            this, requireContext, domain, *domain->getCJSModuleOffset(this, 0));
      } else if (runtimeModule->hasCJSModulesStatic()) {
        return runRequireCall(
            this,
            makeNullHandle<RequireContext>(),
            domain,
            *domain->getCJSModuleOffset(this, 0));
      } else {
        // Create a JSFunction which will reference count the runtime module.
        // Note that its handle gets registered in the scope, so we don't need to
        // save it. Also note that environment will often be null here, except if
        // this is local eval.
        auto func = JSFunction::create(
            this,
            domain,
            Handle<JSObject>::vmcast(&functionPrototype),
            environment,
            globalCode);
    
        ScopedNativeCallFrame newFrame{this,
                                       0,
                                       func.getHermesValue(),
                                       HermesValue::encodeUndefinedValue(),
                                       *thisArg};
        if (LLVM_UNLIKELY(newFrame.overflowed()))
          return raiseStackOverflow(StackOverflowKind::NativeStack);
        return shouldRandomizeMemoryLayout_
            ? interpretFunctionWithRandomStack(this, globalCode)
            : interpretFunction(globalCode);
      }
    

    To be continued...
