Phasar-IFDS框架学习笔记

作者: Nevv | 来源:发表于2019-03-10 13:45 被阅读0次

Phasar-IFDS框架学习笔记
20160923 学习笔记 express框架
Android数据库框架 - LitePal
【山外笔记-工具框架】SVN版本控制系统
mapreduce框架详解
Flask学习笔记之第一个Flask程序
学习笔记-框架
react学习笔记二----nodejs服务器搭建及异常处理
Retrofit学习笔记
Android Chart 框架 MPAndroidChart

IFDSAnalysis

Phasar框架中，为了解决IFDS / IDE问题，主要分成了以下三个步骤：

IFDS / IDE 求解器（slover）。
流函数的形式定义IFDS / IDE分析问题（不同IFDS问题定义不同的流函数）。
程序间控制流图的实现（ICFG）。

1.phasar 主函数流程

AnalysisController

在处理完配置文件和命令行解析后，实例化一个 AnalysisController 管理整个分析的流程：

    /********* 处理完参数解析后，下面的代码开始进行实际的分析 *********/
    // At this point we have set-up all the parameters and can start the actual
    // analyses that have been choosen.
    AnalysisController Controller(
        [&lg](bool usingModules) {
          PAMM_GET_INSTANCE;
          START_TIMER("IRDB Construction", PAMM_SEVERITY_LEVEL::Full);
          LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO) << "Set-up IR database.");
          IRDBOptions Opt = IRDBOptions::NONE;
          if (VariablesMap["wpa"].as<bool>()) {
            Opt |= IRDBOptions::WPA;
          }
          if (VariablesMap["mem2reg"].as<bool>()) {
            Opt |= IRDBOptions::MEM2REG;
          }
          if (usingModules) {
            ProjectIRDB IRDB(
                VariablesMap["module"].as<std::vector<std::string>>(), Opt);
            STOP_TIMER("IRDB Construction", PAMM_SEVERITY_LEVEL::Full);
            return IRDB;
          } else {
            // perform a little trick to make OptionsParser only responsible for
            // the project sources
            int OnlyTakeCareOfSources = 2;
            const char *ProjectSources =
                VariablesMap["project"].as<std::string>().c_str();
            const char *DummyProgName = "not_important";
            const char *DummyArgs[] = {DummyProgName, ProjectSources};
            clang::tooling::CommonOptionsParser OptionsParser(
                OnlyTakeCareOfSources, DummyArgs, StaticAnalysisCategory,
                OccurrencesFlag);
            clang::tooling::CompilationDatabase &CompileDB =
                OptionsParser.getCompilations();
            ProjectIRDB IRDB(CompileDB, Opt);
            STOP_TIMER("IRDB Construction", PAMM_SEVERITY_LEVEL::Full);
            return IRDB;
          }
        }(VariablesMap.count("module")),
        // IRDB 管理和维护llvm生成的IR代码和相关信息
        ChosenDataFlowAnalyses,
        // 要执行的数据流分析实例，如IFDSTaintAnalysis
        VariablesMap["wpa"].as<bool>(),  
        // wpa 是否分析整个程序 Whole-program analysis mode (1 or 0)
        VariablesMap["printedgerec"].as<bool>(),
        VariablesMap["graph-id"].as<std::string>());
        // 指定图的id，其后由exportJson将图转化后发送到可视化的服务端

        /********* 使用一个ChosenDataFlowAnalyses来管理整个分析流程 *********/

接下来看下 AnalysisController 的构造方法，其主要是完成以下功能：
- 检查入口点合法性
- 生成 ICFG
- 调用用户指定的 Analyzer

AnalysisController::AnalysisController(
    ProjectIRDB &&IRDB, std::vector<DataFlowAnalysisType> Analyses,
    bool WPA_MODE, bool PrintEdgeRecorder, std::string graph_id)
    : FinalResultsJson() {
  PAMM_GET_INSTANCE;
  auto &lg = lg::get();
  LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO)
                << "Constructed the analysis controller.");
  LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO)
                << "Found the following IR files for this project: ");
  for (auto file : IRDB.getAllSourceFiles()) {
    LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO) << "\t" << file);
  }
  // 检查入口点是否合法
  LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO) << "Check for chosen entry points.");
  vector<string> EntryPoints = {"main"};
  if (VariablesMap.count("entry-points")) {
    std::vector<std::string> invalidEntryPoints;
    for (auto &entryPoint : VariablesMap["entry-points"].as<vector<string>>()) {
      if (IRDB.getFunction(entryPoint) == nullptr) {
        invalidEntryPoints.push_back(entryPoint);
      }
    }
    if (invalidEntryPoints.size()) {
      for (auto &invalidEntryPoint : invalidEntryPoints) {
        LOG_IF_ENABLE(BOOST_LOG_SEV(lg, ERROR)
                      << "Entry point '" << invalidEntryPoint
                      << "' is not valid.");
      }
      throw logic_error("invalid entry points");
    }
    if (VariablesMap["entry-points"].as<vector<string>>().size()) {
      EntryPoints = VariablesMap["entry-points"].as<vector<string>>();
    }
  }
  if (WPA_MODE) {
    // here we link every llvm module into a single module containing the entire
    // IR
    LOG_IF_ENABLE(
        BOOST_LOG_SEV(lg, INFO)
        << "link all llvm modules into a single module for WPA ...\n");
    START_TIMER("Link to WPA Module", PAMM_SEVERITY_LEVEL::Full);
    IRDB.linkForWPA();
    STOP_TIMER("Link to WPA Module", PAMM_SEVERITY_LEVEL::Full);
    LOG_IF_ENABLE(
        BOOST_LOG_SEV(lg, INFO)
        << "link all llvm modules into a single module for WPA ended\n");
  }
  IRDB.preprocessIR();

  // 重构 inter-modular 的类层次结构和虚函数表
  LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO) << "Reconstruct the class hierarchy.");
  START_TIMER("CH Construction", PAMM_SEVERITY_LEVEL::Core);
  LLVMTypeHierarchy CH(IRDB);
  STOP_TIMER("CH Construction", PAMM_SEVERITY_LEVEL::Core);
  LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO)
                << "Reconstruction of class hierarchy completed.");

  // 生成 CG
  CallGraphAnalysisType CGType(
      (VariablesMap.count("callgraph-analysis"))
          ? StringToCallGraphAnalysisType.at(
                VariablesMap["callgraph-analysis"].as<string>())
          : CallGraphAnalysisType::OTF);
  // 执行 WPA 分析
  if (WPA_MODE) {
    START_TIMER("CG Construction", PAMM_SEVERITY_LEVEL::Core);
    LLVMBasedICFG ICFG(CH, IRDB, CGType, EntryPoints);
    // 生成 ICFG
    if (VariablesMap.count("callgraph-plugin")) {
      throw runtime_error("callgraph plugin not found");
    }
    STOP_TIMER("CG Construction", PAMM_SEVERITY_LEVEL::Core);
    // ICFG.printAsDot("call_graph.dot");

    // CFG 只在 monotone 函数内分析框架中使用
    LLVMBasedCFG CFG;
    START_TIMER("DFA Runtime", PAMM_SEVERITY_LEVEL::Core);
    /*
     * 根据选择的分析模块执行
     */
    for (DataFlowAnalysisType analysis : Analyses) {
      LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO)
                    << "Performing analysis: " << analysis);
      switch (analysis) {
      case DataFlowAnalysisType::IFDS_TaintAnalysis: {
        TaintSensitiveFunctions TSF;
        // 使用build-in的source和sink
        IFDSTaintAnalysis TaintAnalysisProblem(ICFG, TSF, EntryPoints);
        LLVMIFDSSolver<const llvm::Value *, LLVMBasedICFG &> LLVMTaintSolver(
            TaintAnalysisProblem, false);
        // 实例化一个 solver 进行数据流分析
        cout << "IFDS Taint Analysis ..." << endl;
        LLVMTaintSolver.solve();
        cout << "IFDS Taint Analysis ended" << endl;
        // FinalResultsJson += LLVMTaintSolver.getAsJson();
        if (PrintEdgeRecorder) {
          LLVMTaintSolver.exportJson(graph_id);
        }
        break;
      }
      case DataFlowAnalysisType::IDE_TaintAnalysis: {
        IDETaintAnalysis taintanalysisproblem(ICFG, EntryPoints);
        LLVMIDESolver<const llvm::Value *, const llvm::Value *, LLVMBasedICFG &>
            llvmtaintsolver(taintanalysisproblem, true);
        llvmtaintsolver.solve();
        FinalResultsJson += llvmtaintsolver.getAsJson();
        if (PrintEdgeRecorder) {
          llvmtaintsolver.exportJson(graph_id);
        }
        break;
      }
     …………
     …………
  }
  // 智能化模块分析？
  // Perform module-wise (MW) analysis
  else {
    throw runtime_error(
        "This code will follow soon with an accompanying paper!");
  }
}

下边代码是调用具体的 IFDS 框架求解污点分析问题的代码，其中 LLVMIFDSSolver 继承自 IFDSSolver ，最终的祖先类是 IDESolver ，最基本的函数基本上都封装在这里。

        IFDSTaintAnalysis TaintAnalysisProblem(ICFG, TSF, EntryPoints);
        LLVMIFDSSolver<const llvm::Value *, LLVMBasedICFG &> LLVMTaintSolver(
            TaintAnalysisProblem, false);
        // 实例化一个 solver 进行数据流分析
        cout << "IFDS Taint Analysis ..." << endl;
        LLVMTaintSolver.solve();

2. IDESolver(IFDS的基类)

solve 函数实现流程如下：

调用 submitInitalSeeds
- 根据 initseed 初始化
- 生成 edge function 构建 exploded supergraph
调用 computeValues
- 根据 edge function 计算并保存函数节点的数据流值并保存到 Table<N, D, V> valtab 中
调用 computeAndPrintStatistics
- 计算并通过之前保存的valtab打印运行中的一些信息
- 通过从 computedIntraPathEdges 和 computedInterPathEdges 结构中获取信息
- IntraPathEdges 和 InterPathEdges 由分析过程中调用 saveEdges 函数存储

  virtual void solve() {
      ……
    // computations starting here
    START_TIMER("DFA Phase I", PAMM_SEVERITY_LEVEL::Full);
    // We start our analysis and construct exploded supergraph
    LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO)
                  << "Submit initial seeds, construct exploded super graph");
    /* 初始化，构建超级图 */
    submitInitalSeeds();
    STOP_TIMER("DFA Phase I", PAMM_SEVERITY_LEVEL::Full);
    if (computevalues) {
      START_TIMER("DFA Phase II", PAMM_SEVERITY_LEVEL::Full);
      // 计算最终的值
      // Computing the final values for the edge functions
      LOG_IF_ENABLE(
          BOOST_LOG_SEV(lg, INFO)
          << "Compute the final values according to the edge functions");
      computeValues();
      STOP_TIMER("DFA Phase II", PAMM_SEVERITY_LEVEL::Full);
    }
    LOG_IF_ENABLE(BOOST_LOG_SEV(lg, INFO) << "Problem solved");
    if constexpr (PAMM_CURR_SEV_LEVEL >= PAMM_SEVERITY_LEVEL::Core) {
      computeAndPrintStatistics();
    }
  }

subitInitalSeeds

submitInitalSeeds 的主要功能是初始化种子和初始化分析。

  void submitInitalSeeds() {
    auto &lg = lg::get();
    PAMM_GET_INSTANCE;
    for (const auto &seed : initialSeeds) {
      N startPoint = seed.first;
      LOG_IF_ENABLE(BOOST_LOG_SEV(lg, DEBUG)
                    << "Start point: "
                    << ideTabulationProblem.NtoString(startPoint));
      for (const D &value : seed.second) {
        LOG_IF_ENABLE(BOOST_LOG_SEV(lg, DEBUG)
                      << "      Value: "
                      << ideTabulationProblem.DtoString(value));
        LOG_IF_ENABLE(BOOST_LOG_SEV(lg, DEBUG) << ' ');
        if (!ideTabulationProblem.isZeroValue(value)) {
          INC_COUNTER("Gen facts", 1, PAMM_SEVERITY_LEVEL::Core);
        }
        propagate(zeroValue, startPoint, value, EdgeIdentity<V>::getInstance(),
                  nullptr, false);
      }
      jumpFn->addFunction(zeroValue, startPoint, zeroValue,
                          EdgeIdentity<V>::getInstance()); // 添加边函数
// jumpFn 为指向 std::shared_ptr<JumpFunctions<N, D, M, V, I>> 的智能指针
// addFunction 函数的功能为记录数据流前向和后向的流向，存储为{key=llvm::value,value=edgefunction}的形式
    }
  }

initialSeeds 变量来源于 tabulationProblem.initialSeeds() 方法，根据要分析的特定问题的不同，其生成初始化种子的方法也不同。比如对于 IFDSTaintAnalysis 来说，EntryPoint 为 main 函数的时候，插入的初始值是命令行参数（标记为污染的）和零值，对于其他入口点，只插入零值。

map<IFDSTaintAnalysis::n_t, set<IFDSTaintAnalysis::d_t>>
IFDSTaintAnalysis::initialSeeds() {
  auto &lg = lg::get();
  LOG_IF_ENABLE(BOOST_LOG_SEV(lg, DEBUG)
                << "IFDSTaintAnalysis::initialSeeds()");
  // If main function is the entry point, commandline arguments have to be
  // tainted. Otherwise we just use the zero value to initialize the analysis.
  map<IFDSTaintAnalysis::n_t, set<IFDSTaintAnalysis::d_t>> SeedMap;
  for (auto &EntryPoint : EntryPoints) {
    if (EntryPoint == "main") {
      set<IFDSTaintAnalysis::d_t> CmdArgs;
      for (auto &Arg : icfg.getMethod(EntryPoint)->args()) {
        CmdArgs.insert(&Arg);
      }
      CmdArgs.insert(zeroValue());
      SeedMap.insert(
          make_pair(&icfg.getMethod(EntryPoint)->front().front(), CmdArgs));
    } else {
      SeedMap.insert(make_pair(&icfg.getMethod(EntryPoint)->front().front(),
                               set<IFDSTaintAnalysis::d_t>({zeroValue()})));
    }
  }
  return SeedMap;
}

propagate

propagate 函数负责将数据流值沿着超级图传播，合并已经在目标语句处计算出的边函数
- 参数 sourceVal 为边函数上的源 value
- 参数 target 为目标语句
- 参数 targetval 为目标处的 value
- 参数 f 为 (s0,sourceVal) 到(target,targetVal) 计算出来的 edge function

void
  propagate(D sourceVal, N target, D targetVal,
            std::shared_ptr<EdgeFunction<V>> f,
            /* deliberately exposed to clients */ N relatedCallSite,
            /* deliberately exposed to clients */ bool isUnbalancedReturn) {
    auto &lg = lg::get();
    LOG_IF_ENABLE(BOOST_LOG_SEV(lg, DEBUG) << "Propagate flow");
    std::shared_ptr<EdgeFunction<V>> jumpFnE = nullptr;
    std::shared_ptr<EdgeFunction<V>> fPrime;
    if (!jumpFn->reverseLookup(target, targetVal).empty()) {
       // reverseLookup 对于给定的目标语句和值，返回所有关联的源值，返回值是从源值到edge函数的映射。
       // 如果不为空，返回 sourceVal 到 targetVal 的所有 edge function
      jumpFnE = jumpFn->reverseLookup(target, targetVal)[sourceVal];
    }
    if (jumpFnE == nullptr) {
      jumpFnE = allTop; // jump function is initialized to all-top
    }
    fPrime = jumpFnE->joinWith(f); // 合并？
    bool newFunction = !(fPrime->equal_to(jumpFnE));
    if (newFunction) { // 如果是一个新edge function的话，进行传播，
      jumpFn->addFunction(sourceVal, target, targetVal, fPrime);
      PathEdge<N, D> edge(sourceVal, target, targetVal); // 添加边
      PathEdgeCount++;
      pathEdgeProcessingTask(edge);
      /*
      pathEdgeProcessingTask
      处理边，根据其 target 的不同做以下处理(对应于 tabulation 算法)：
        processExit
        processNormalFlow
        processCall
      */
  }

pathEdgeProcessingTask

这个函数主要根据传递过来的边，判断 target node 是不是 call 或者 exit 节点，分别做不同的处理传播

 void pathEdgeProcessingTask(PathEdge<N, D> edge) {
    bool isCall = icfg.isCallStmt(edge.getTarget());
    if (!isCall) {   
      if (icfg.isExitStmt(edge.getTarget())) {
        processExit(edge);
      }
      if (!icfg.getSuccsOf(edge.getTarget()).empty()) {
        processNormalFlow(edge);
      }
    } else {
      processCall(edge);
    }
  }

调用的这三个函数实现的是论文中 IFDS 的基本算法

函数名	功能
processCall	对应算法的13-20行，处理call
processExit	对应算法的21-32行，处理exit
processNormalFlow	对应算法的33-37行，处理普通情况下的数据流传递（除了 call 和 exit ）

processCall 主要流程：
- 对于每一个可能的 callee，首先会检查是否存在 summary 边，有的话直接当作普通的数据流传递，否则就根据call 的函数计算数据流传递。
- 然后处理 call2ret 节点间的数据流传递
processExit 主要流程：
- 对于 caller 传入的数据流并且能最终通过 retsite 传递出来的，添加 summary 边
- 将 caller 函数开始处的数据流值传递到 ret 处

computeValues

根据 edge functions 计算最终值

  void computeValues() {
    auto &lg = lg::get();
    LOG_IF_ENABLE(BOOST_LOG_SEV(lg, DEBUG) << "Start computing values");
    // Phase II(i)
    std::map<N, std::set<D>> allSeeds(initialSeeds);
    for (N unbalancedRetSite : unbalancedRetSites) {
      if (allSeeds[unbalancedRetSite].empty()) {
        allSeeds.insert(make_pair(unbalancedRetSite, std::set<D>({zeroValue})));
      }
    }
    // do processing
      // 这里是设置节点的数据流值，并将其保存到 valtab 中
    for (const auto &seed : allSeeds) {
      N startPoint = seed.first;
      for (D val : seed.second) {
        // 设置起始节点数据流值
        setVal(startPoint, val, ideTabulationProblem.bottomElement());
        std::pair<N, D> superGraphNode(startPoint, val);
        valuePropagationTask(superGraphNode);
      }
    }
    // Phase II(ii)
    // we create an array of all nodes and then dispatch fractions of this array
    // to multiple threads
      
    std::set<N> allNonCallStartNodes = icfg.allNonCallStartNodes();
   // 返回既不是call也不是invoke节点的集合
    std::vector<N> nonCallStartNodesArray(allNonCallStartNodes.size());
    size_t i = 0;
    for (N n : allNonCallStartNodes) {
      nonCallStartNodesArray[i] = n;
      i++;
    }
    valueComputationTask(nonCallStartNodesArray);
  }

setVal

给 node 设置数据流值，如果 l 是 top value 的话就从 valtab 中移除

  void setVal(N nHashN, D nHashD, V l) {
    auto &lg = lg::get();
    // 如果是 top value，则不存储
    // TOP is the implicit default value which we do not need to store.
    if (l == ideTabulationProblem.topElement()) {
      valtab.remove(nHashN, nHashD);
    } else {
      valtab.insert(nHashN, nHashD, l);
    }
  }

valuePropagationTask

  // should be made a callable at some point
  void valuePropagationTask(std::pair<N, D> nAndD) {
    N n = nAndD.first;
    // 初始化的种子不一定是函数的入口点
    // our initial seeds are not necessarily method-start points but here they
    // should be treated as such the same also for unbalanced return sites in
    // an unbalanced problem
    if (icfg.isStartPoint(n) || initialSeeds.count(n) ||
        unbalancedRetSites.count(n)) {
      propagateValueAtStart(nAndD, n);
    }
    if (icfg.isCallStmt(n)) {
      propagateValueAtCall(nAndD, n);
    }
  }

如果是函数入口点或者在 initialSeeds 中的话，执行 propagateValueAtStart 进行数据流传递。

propagateValueAtStart

  void propagateValueAtStart(std::pair<N, D> nAndD, N n) {
    PAMM_GET_INSTANCE;
    D d = nAndD.second;
    M p = icfg.getMethodOf(n); // 获取 n 所在的函数
    for (N c : icfg.getCallsFromWithin(p)) {
      for (auto entry : jumpFn->forwardLookup(d, c)) { 
        // 这里的c最开始是 initialSeeds 中的初始数据流值
        /* forwardLookup函数根据给定的source value和目标语句，返回所有目标语句中受到影响的
           目标value和 edge function，（key=target value，value=edge function)      */ 
        D dPrime = entry.first;
        std::shared_ptr<EdgeFunction<V>> fPrime = entry.second;
        N sP = n;
        V value = val(sP, d);
        INC_COUNTER("Value Propagation", 1, PAMM_SEVERITY_LEVEL::Full);
        propagateValue(c, dPrime, fPrime->computeTarget(value)); // 传递数据流
      }
    }
  }

getCallsFromWithin 获取给定函数的所有调用位置

propagateValueAtCall

  void propagateValueAtCall(std::pair<N, D> nAndD, N n) {
    PAMM_GET_INSTANCE;
    D d = nAndD.second;
    for (M q : icfg.getCalleesOfCallAt(n)) { // 获取到 callee
      std::shared_ptr<FlowFunction<D>> callFlowFunction = // call指令的数据流传递函数
          cachedFlowEdgeFunctions.getCallFlowFunction(n, q); 
        // getCallFlowFunction 完成实参和形参的映射
      INC_COUNTER("FF Queries", 1, PAMM_SEVERITY_LEVEL::Full);
      for (D dPrime : callFlowFunction->computeTargets(d)) {
        std::shared_ptr<EdgeFunction<V>> edgeFn =
            cachedFlowEdgeFunctions.getCallEdgeFunction(n, d, q, dPrime); 
          // 获取 edge function
        INC_COUNTER("EF Queries", 1, PAMM_SEVERITY_LEVEL::Full);
        for (N startPoint : icfg.getStartPointsOf(q)) {
          INC_COUNTER("Value Propagation", 1, PAMM_SEVERITY_LEVEL::Full);
          propagateValue(startPoint, dPrime, edgeFn->computeTarget(val(n, d))); 
        }
      }
    }
  }

这里的 cachedFlowEdgeFunctions = tabulationProblem

propagateValue

 void propagateValue(N nHashN, D nHashD, V v) {
    V valNHash = val(nHashN, nHashD);
    V vPrime = joinValueAt(nHashN, nHashD, valNHash, v);
    if (!(vPrime == valNHash)) {
      setVal(nHashN, nHashD, vPrime);
      valuePropagationTask(std::pair<N, D>(nHashN, nHashD));
    }
  }

3.IFDS问题分析实例

IFDSTaintAnalysis

这里以 IFDSTaintAnalysis 为例，这个类中主要实例化了 Tablution 算法中所定义的四个数据流传递函数：

getNormalFlowFunction 普通节点
getCallFlowFunction 函数调用点到被调用函数
getRetFlowFunction ret 节点，在污点分析问题中，用于在函数调用返回后，处理返回值和参数是否被污染，只关注 pointer/reference 类型的指针
getCallToRetFlowFunction call结点其return结点

其实例化的 IFDSTaintAnalysis 类如下：

namespace psr {

class LLVMBasedICFG;

/**
 *
 * 主要处理的数据流向是从 source 点到 sink 点，一旦发现有被标记为污染的数据
 * 流到达 sink 点，就会 leak 报告出来。
 * 
 * @see TaintSensitiveFunctions on how to specify your own
 * taint-sensitive source and sink functions.
 */
class IFDSTaintAnalysis : public DefaultIFDSTabulationProblem<
                              const llvm::Instruction *, const llvm::Value *,
                              const llvm::Function *, LLVMBasedICFG &> {
                                  // {N D M I}
public:
  typedef const llvm::Value *d_t;
  typedef const llvm::Instruction *n_t;
  typedef const llvm::Function *m_t;
  typedef LLVMBasedICFG &i_t;

private:
  TaintSensitiveFunctions SourceSinkFunctions   ; // 存储SourceSink函数
  std::vector<std::string> EntryPoints;  // 函数入口点

public:
  /// Holds all leaks found during the analysis
  std::map<n_t, std::set<d_t>> Leaks;

  /**
   *
   * @param icfg
   * @param TSF
   * @param EntryPoints
   */
  IFDSTaintAnalysis(i_t icfg, TaintSensitiveFunctions TSF,
                    std::vector<std::string> EntryPoints = {"main"});                               
                                  
  virtual ~IFDSTaintAnalysis() = default;

  std::shared_ptr<FlowFunction<d_t>> getNormalFlowFunction(n_t curr,
                                                           n_t succ) override;
  std::shared_ptr<FlowFunction<d_t>> getCallFlowFunction(n_t callStmt,
                                                         m_t destMthd) override;

  std::shared_ptr<FlowFunction<d_t>> getRetFlowFunction(n_t callSite,
                                                        m_t calleeMthd,
                                                        n_t exitStmt,
                                                        n_t retSite) override;
  std::shared_ptr<FlowFunction<d_t>>
  getCallToRetFlowFunction(n_t callSite, n_t retSite,
                           std::set<m_t> callees) override;

// 计算summary的函数，将一个函数内部的数据流传递情况计算summary
  std::shared_ptr<FlowFunction<d_t>>
  getSummaryFlowFunction(n_t callStmt, m_t destMthd) override;

  std::map<n_t, std::set<d_t>> initialSeeds() override;
  d_t createZeroValue() override;
  bool isZeroValue(d_t d) const override;
  void printNode(std::ostream &os, n_t n) const override;
  void printDataFlowFact(std::ostream &os, d_t d) const override;
  void printMethod(std::ostream &os, m_t m) const override;
  void printIFDSReport(std::ostream &os,
                       SolverResults<n_t, d_t, BinaryDomain> &SR) override;
};
} // namespace psr

IFDSTaintAnalysis 在初始化的时候传入了一个 TaintSensitiveFunctions 对象，这个类主要是维护用于污点分析 source 函数和 sink 函数，并且缺省定义了一些常见的函数，并标记了其 Critical argument

DefaultIFDSTabulationProblem

这是一个 IFDS 问题的默认模板类，用于定义一个 IFDS 问题，设置其 solver 的参数：

namespace psr {
template <typename N, typename D, typename M, typename I>
class DefaultIFDSTabulationProblem : public IFDSTabulationProblem<N, D, M, I> {
protected:
  I icfg;
  virtual D createZeroValue() = 0;
  D zerovalue;

public:
    // 进行一些默认的 solver 的配置
  DefaultIFDSTabulationProblem(I icfg) : icfg(icfg) {
    this->solver_config.followReturnsPastSeeds = false;
    this->solver_config.autoAddZero = true;
    this->solver_config.computeValues = true;
    this->solver_config.recordEdges = true;
    this->solver_config.computePersistedSummaries = true;
  }

  virtual ~DefaultIFDSTabulationProblem() = default;

  virtual std::shared_ptr<FlowFunction<D>>
  getSummaryFlowFunction(N callStmt, M destMthd) override {
    return nullptr;
  }
    
  I interproceduralCFG() override { return icfg; }
  D zeroValue() override { return zerovalue; }
};

} // namespace psr

#endif

因此在用户实现自己的 IFDS分析问题的时候，只需要 override 特定的数据流传递函数即可。

4. ICFG 实现

TODO

5. 代码结构

Phasar使用AnalysisController管理整个分析流程，根据选择的特定的数据流问题进行分析。并将IFDS问题分别使用两个类管理，一个是Problem类，用于定义要分析的问题，另一类是Solver类，用于求解IFDS问题：

Problem类的继承关系及基类简述
- DefaultIFDSTabulationProblem：
  - IFDSTabulationProblem
  - FlowFunctions
  - NodePrinter
  - DataFlowFactPrinter
  - MethodPrinter

类	简述
DefaultIFDSTabulationProblem	所有特定 IFDS 问题的基类
IFDSTabulationProblem	保存了zerovalue、配置信息和初始化值
FlowFunctions	定义基本的数据流传递函数
NodePrinter、DataFlowFactPrinter、MethodPrinter	打印特定数据结构的模板类

FlowFunctions 类中实现的方法接口

方法名	功能
getNormalFlowFunction	普通节点的数据流传递函数
getCallFlowFunction	函数调用的数据流传递函数，实参映射到形参
getRetFlowFunction	返回节点的数据流传递函数，在被调函数的最后将数据流值传回caller中其ret节点
getCallToRetFlowFunction	call到ret的数据流传递函数，处理没有被callee影响到的数据流值
getSummaryFlowFunction	默认是返回空指针，用于处理libc库函数或者llvm生成的不能follow进去的函数

IFDSTabulationProblem 类中的方法接口和成员变量

名称	简述
initialSeeds	分析起始程序点和其的初始数据流值
setSolverConfiguration	设置solver
isZeroValue	判断数据流值是否是0值
interproceduralCFG	返回分析问题的ICFG

Slover类的继承关系及简述

LLVMIFDSSolver最终也是继承自 IDESolver，对于特定的IFDS问题的求解主要是调用solve( )方法。
- LLVMIFDSSolver
  - IFDSSolver
    - IDESolver

LLVMIFDSSolver

LLVMIFDSSolver这个类中实现了打印过程间、过程内 pathEdge 方法以及将结果可视化传送给web端的接口。
IFDSSolver

IFDSSolver实现了一个返回给定stmt处ifds结果的方法。
IDESolver

这个类中slove( )函数实现的是对IFDS问题的求解，solve函数主要有三个步骤：
1. submitInitalSeeds 提交初始化数据流值的seed，构建 exploded supergraph
2. computeValues 根据 edge functions 计算最终的数据流值
3. 主要是计算和打印一些分析过程中的信息，比如gen/kill的数据流数量等。

函数名	功能
propagate	负责将数据流沿着exploded super graph向下传递，合并任何可能已经在目标语句处算出数据流值的edge function
setVal	保存每个程序点的数据流值
saveEdges	向computedInterPathEdges和computedIntraPathEdges存储已经计算出来的pathEdges
computeValues	根据边函数计算每一个程序点的数据流值，调用setval函数存入valtab
addEndSummary	添加函数的摘要边

变量名	功能
std::shared_ptr<JumpFunctions<N, D, M, V, I>> jumpFn;	存储并管理exploded super graph中的数据流边
Table<N, D, V> valtab;	存储节点的数据流值
computedInterPathEdges	函数间的边（打印信息用）
computedIntraPathEdges	函数内的边（打印信息用）
Table<N, D, std::map<N, std::set<D>>> incomingtab;	记录所有的函数调用信息（处理exit节点的时候使用）
Table<N, D, Table<N, D, std::shared_ptr<EdgeFunction<V>>>> endsummarytab	存储已经计算出来的summay边。

propagate函数

从jumpFn中反向找能够从source处传递到target处的edge function，如果找不到就用alltop
- 如果是新的edgefuntion的话，说明可以直接从最初点到当前点添加一条边（比如存在a-b的边，然后当前的数据流传递是b-c，会添加一条a-c的边到jumpFunctions中）
- 然后会调用pathEdgeProcessingTask处理从source到target的pathEdge
- pathEdgeProcessingTask函数会根据target的类型做三种不同处理：
  - processExit
    - 首先找到该exit节点的所有caller
    - 根据这些caller传递过来的数据流值和从函数start节点传递到exit节点的数据流值，调用propagate将数据流值从callee的start处传递到ret site处
  - processNormalFlow
    - 处理正常的数据流传递，通过getNormalFlowFunction获取数据流传递函数，然后通过computeNormalFlowFunction计算传递后的数据流值
    - 计算出传递后的数据流值后再次调用propagate函数传递数据流值
  - processCall
    - 处理call语句的时候，首先看callee的函数是不是之前计算过，即通过getSummaryFlowFunction获取其summary flow function，如果存在就当作普通的数据流传递
    - 否则就处理函数实参和形参的映射（记录在incomingtab中）
    - 最后把能够直接传递到ret-site的数据流值进行传递

6.参考:

1. github

2. wiki

7.问题

Table<N, D, std::map<N, std::set<D>>> incomingtab

edges going along calls
incomingtab相关的方法接口

  std::map<N, std::set<D>> incoming(D d1, N sP) {
    return incomingtab.get(sP, d1);
  }

  void addIncoming(N sP, D d3, N n, D d2) {
    incomingtab.get(sP, d3)[n].insert(d2);
  }

使用 incomingtab

processCall 函数中，遇到 call 节点的并找到其 call 的函数的时候，就注册下 callee 的 startpoint 处的数据流值是来自哪里。

for (N sP : startPointsOf) {
  saveEdges(n, sP, d2, res, true);
  // for each result node of the call-flow function
  for (D d3 : res) {
    // create initial self-loop
    propagate(d3, sP, d3, EdgeIdentity<V>::getInstance(), n,
              false); // line 15
    // register the fact that <sp,d3> has an incoming edge from <n,d2>
    // line 15.1 of Naeem/Lhotak/Rodriguez
    addIncoming(sP, d3, n, d2);

processExit 函数中，处理 exit 节点的时候，先找到其对应的 start point，然后对于 start point，添加一条从 start point 处到 exit 的边（存储在 fSummaryReuse 中），然后通过去寻找其 incoming calls，计算 call 到 ret 的数据流值，最后将数据流值从 call 传递到 retsite。

D d1 = edge.factAtSource();
D d2 = edge.factAtTarget();
// for each of the method's start points, determine incoming calls
std::set<N> startPointsOf = icfg.getStartPointsOf(methodThatNeedsSummary);
std::map<N, std::set<D>> inc;
for (N sP : startPointsOf) {
  // line 21.1 of Naeem/Lhotak/Rodriguez
  // register end-summary
  addEndSummary(sP, d1, n, d2, f);
  for (auto entry : incoming(d1, sP)) {
    inc[entry.first] = std::set<D>{entry.second};
  }
}
printEndSummaryTab();
printIncomingTab();
// for each incoming call edge already processed
//(see processCall(..))
for (auto entry : inc) {
  // line 22
  N c = entry.first;
  // for each return site
  for (N retSiteC : icfg.getReturnSitesOfCallAt(c)) {
    // compute return-flow function
    std::shared_ptr<FlowFunction<D>> retFunction =
        cachedFlowEdgeFunctions.getRetFlowFunction(
            c, methodThatNeedsSummary, n, retSiteC);
    INC_COUNTER("FF Queries", 1, PAMM_SEVERITY_LEVEL::Full);
    // for each incoming-call value
    for (D d4 : entry.second) {
      std::set<D> targets =
          computeReturnFlowFunction(retFunction, d1, d2, c, entry.second);
      ADD_TO_HISTOGRAM("Data-flow facts", targets.size(), 1,
                       PAMM_SEVERITY_LEVEL::Full);

实参形参映射

以污点分析为例，在 getCallFlowFunction 函数中实现参数的映射：

shared_ptr<FlowFunction<IFDSTaintAnalysis::d_t>>
IFDSTaintAnalysis::getCallFlowFunction(IFDSTaintAnalysis::n_t callStmt,
                                       IFDSTaintAnalysis::m_t destMthd) {
  auto &lg = lg::get();
  LOG_IF_ENABLE(BOOST_LOG_SEV(lg, DEBUG)
                << "IFDSTaintAnalysis::getCallFlowFunction()");
  string FunctionName = cxx_demangle(destMthd->getName().str());
  // Check if a source or sink function is called:
  // We then can kill all data-flow facts not following the called function.
  // The respective taints or leaks are then generated in the corresponding
  // call to return flow function.
  if (SourceSinkFunctions.isSource(FunctionName) ||
      (SourceSinkFunctions.isSink(FunctionName))) {
    return KillAll<IFDSTaintAnalysis::d_t>::getInstance();
  }
  // Map the actual into the formal parameters
  if (llvm::isa<llvm::CallInst>(callStmt) ||
      llvm::isa<llvm::InvokeInst>(callStmt)) {
    return make_shared<MapFactsToCallee>(llvm::ImmutableCallSite(callStmt),
                                         destMthd);
  }
  // Pass everything else as identity
  return Identity<IFDSTaintAnalysis::d_t>::getInstance();
}

具体实现：

MapFactsToCallee::MapFactsToCallee(
    const llvm::ImmutableCallSite &callSite, const llvm::Function *destMthd,
    function<bool(const llvm::Value *)> predicate)
    : destMthd(destMthd), predicate(predicate) {
  // Set up the actual parameters
  for (unsigned idx = 0; idx < callSite.getNumArgOperands(); ++idx) {
    actuals.push_back(callSite.getArgOperand(idx));
  }
  // Set up the formal parameters
  for (unsigned idx = 0; idx < destMthd->arg_size(); ++idx) {
    formals.push_back(getNthFunctionArgument(destMthd, idx));
  }
}

在 processCall 函数中：

    D d1 = edge.factAtSource();
    N n = edge.getTarget(); // a call node; line 14...
    D d2 = edge.factAtTarget();
    std::shared_ptr<FlowFunction<D>> function =
            cachedFlowEdgeFunctions.getCallFlowFunction(n, sCalledProcN);
    std::set<D> res = computeCallFlowFunction(function, d1, d2);
    for (N sP : startPointsOf) {
          saveEdges(n, sP, d2, res, true);
          // for each result node of the call-flow function
          for (D d3 : res) {
            // create initial self-loop
            propagate(d3, sP, d3, EdgeIdentity<V>::getInstance(), n,
                      false); // line 15
            // register the fact that <sp,d3> has an incoming edge from <n,d2>
            // line 15.1 of Naeem/Lhotak/Rodriguez
            addIncoming(sP, d3, n, d2);

Phasar-IFDS框架学习笔记
IFDSAnalysis Phasar框架中，为了解决IFDS / IDE问题，主要分成了以下三个步骤： IFDS...
20160923 学习笔记 express框架
20160923 学习笔记 express框架安装 espress express框架文件相关知识 http:...
Android数据库框架 - LitePal
Android数据库框架 - LitePal学习笔记
【山外笔记-工具框架】SVN版本控制系统
【山外笔记-框架工具】SVN版本控制系统学习资料： 1、本文打印版下载地址：[山外笔记-框架工具笔记]SVN版本...
mapreduce框架详解
参考：hadoop 学习笔记：mapreduce框架详解 [toc] 总结 Mapreduce是一个计算框架，既然...
Flask学习笔记之第一个Flask程序
笔记大纲最近在学习Flask框架，本系列课程为学习笔记。大纲： Python环境创建Flask项目 Debug...
学习笔记-框架
感谢并不相识的朋友，把他的所学分享于我，在这里再次说声谢谢，真的非常感谢！日醒难以做到，日学可以坚持。今天学习的...
react学习笔记二----nodejs服务器搭建及异常处理
承接部分 react学习笔记一-----react前端框架的初步搭建说明本次学习笔记可能和react没有关系，...
Retrofit学习笔记
Retrofit学习笔记一、什么是Retrofit框架 Retrofit框架是Square公司出品的目前非常流行...
Android Chart 框架 MPAndroidChart
Android Chart 框架 MPAndroidChart 学习笔记1_框架初始化点击这里查看项目源码 1....

Phasar-IFDS框架学习笔记

IFDSAnalysis

1.phasar 主函数流程

AnalysisController

2. IDESolver(IFDS的基类)

subitInitalSeeds

propagate

pathEdgeProcessingTask

computeValues

setVal

valuePropagationTask

propagateValueAtStart

propagateValueAtCall

propagateValue

3.IFDS问题分析实例

IFDSTaintAnalysis

DefaultIFDSTabulationProblem

4. ICFG 实现

TODO

5. 代码结构

6.参考:

1. github

2. wiki

7.问题

相关文章

Phasar-IFDS框架学习笔记

20160923 学习笔记 express框架

Android数据库框架 - LitePal

【山外笔记-工具框架】SVN版本控制系统

mapreduce框架详解

Flask学习笔记之第一个Flask程序

学习笔记-框架

react学习笔记二----nodejs服务器搭建及异常处理

Retrofit学习笔记

Android Chart 框架 MPAndroidChart

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读