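A Journey Through Orthogonal Design

Author: 刘光聪 | Published 2019-04-29 19:17

Legacy Code

Suppose there is a piece of legacy code that uses vector<vector<int>> to represent a complex domain object. The program has two basic features, counting and caching, and the counting changes whenever the rules change.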

#include <vector>
using std::vector;

static vector<vector<int>> getFlaggedCells(vector<vector<int>>& board) {
  vector<vector<int>> result;
  for(auto x : board) {
    if(x[0] == 4) {
      result.push_back(x);  // std::vector has no add(); push_back appends
    }
  }
  return result;
}

int countFlaggedCells(vector<vector<int>>& board) {
  vector<vector<int>> result = getFlaggedCells(board);
  return result.size();
}

static vector<vector<int>> getUnflaggedCells(vector<vector<int>>& board) {
  vector<vector<int>> result;
  for(auto x : board) {
    if(x[0] == 3) {   
      result.push_back(x);
    }   
  }
  return result;
}

int countUnflaggedCells(vector<vector<int>>& board) {
  vector<vector<int>> result = getUnflaggedCells(board);
  return result.size();
}

static vector<vector<int>> getAliveCells(vector<vector<int>>& board) {
  vector<vector<int>> result;
  for(auto x : board) {
    if(x[0] == 2) {   
      result.push_back(x);
    }   
  }
  return result;
}

int countAliveCells(vector<vector<int>>& board) {
  vector<vector<int>> result = getAliveCells(board);
  return result.size();
}

// global variable
vector<vector<int>> store;

void saveAliveCells(vector<vector<int>>& board) {
  store = getAliveCells(board);
}

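Code Smells

Clearly, this code carries quite a few smells.

Multi-level containers: the heavyweight syntax of vector<vector<int>> is maddening;
Duplicated design: getFlaggedCells, getUnflaggedCells and getAliveCells are obvious copies of one another;
Global variable: store was hastily made a global variable, which deserves a second look;
Magic numbers: the business meaning of 0, 2, 3 and 4 has yet to be made explicit;
Performance: result is only an intermediate value, so it may incur needless copying.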
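The performance smell can be made concrete even before any refactoring. The sketch below counts matching rows directly on the legacy structure, without building an intermediate vector. The helper name countCellsInState is hypothetical and not part of the original code; it assumes, as the legacy code does, that every row has at least one element.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not in the original): counts the rows whose first
// element equals `state`, without copying any row into a result vector.
inline int countCellsInState(const std::vector<std::vector<int>>& board,
                             int state) {
  return static_cast<int>(std::count_if(
      board.begin(), board.end(),
      [state](const std::vector<int>& row) {
        assert(!row.empty());  // the legacy code also assumes x[0] exists
        return row.front() == state;
      }));
}
```

Under this assumption, countFlaggedCells(board) would reduce to countCellsInState(board, 4). The magic number is still there, which is exactly what the later sections address.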

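Refactoring is more than atomic operations such as renaming and extracting functions; the logic that guides it comes mostly from software design itself, for example encapsulating unstable changes and isolating the client from the implementation.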

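Multi-level Containers

Passing or returning multi-level containers is commonplace in legacy systems. The definition of such a data structure is hard to read in its own right, and the code that processes it tends to become entangled, which makes it both hard to understand and extremely fragile: a change to the container at any one level ripples through all the code that handles the structure.

The remedy for multi-level containers is simple: encapsulate each level. Hiding the implementation details of the data structure and exposing a more stable API makes client code more stable. Clients obtain or operate on a subset of the data by supplying their own operators, since it is the subset that they actually care about.

Eliminating Magic Numbers

Refactor int into the enumeration type State to eliminate the magic numbers. State may or may not need to evolve into a more flexible class later, but an enumeration is sufficient for now.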

enum State {
  INIT, SELECTED, ALIVE, FLAGGED
};

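Encapsulating the First-level Container

Extract Cell to encapsulate the first-level container. Extract a query method, master, which also removes the magic number 0 and gives it a clearer business meaning.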

#include <vector>

struct Cell {
  bool flagged() const {
    return master() == FLAGGED; 
  }

  bool alive() const {
    return master() == ALIVE; 
  }

private:
  State master() const {
    return states.front();
  }

private:
  std::vector<State> states;
};
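As written, Cell has no way to receive its states, so it cannot be exercised on its own. The following is a minimal sketch, assuming a constructor that takes an initializer list of states; that constructor is an illustration and not part of the original design. It treats the first state as the master state, which is what master() reads.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

enum State { INIT, SELECTED, ALIVE, FLAGGED };

struct Cell {
  // Assumed constructor (not in the original): takes the cell's states,
  // the first of which is treated as the master state.
  Cell(std::initializer_list<State> init) : states(init) {
    assert(!states.empty());  // master() relies on front()
  }

  bool flagged() const { return master() == FLAGGED; }
  bool alive() const { return master() == ALIVE; }

private:
  State master() const { return states.front(); }

  std::vector<State> states;
};
```

With this in place, Cell{FLAGGED, INIT} reports flagged() as true and alive() as false, and callers never see the index 0.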

Encapsulating the Second-level Container

Extract GameBoard to encapsulate the second-level container.

struct GameBoard {
  const std::vector<Cell>& getCells() const {
    return cells;
  }

private:
  std::vector<Cell> cells;
};

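For now, cells is returned directly to the client; the sections that follow look at the trouble this causes for the client.

Client Implementation

Counting the "flagged" cells is straightforward.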

int countFlaggedCells(const GameBoard& board) {
    int num = 0;
    for (auto& cell : board.getCells()) {
      if (cell.flagged()) {
        num++;
      }
    }
    return num;
}

However, when the client counts the "unflagged" cells, countUnflaggedCells and countFlaggedCells turn out to duplicate each other.

int countUnflaggedCells(const GameBoard& board) {
    int num = 0;
    for (auto& cell : board.getCells()) {
      if (!cell.flagged()) {
        num++;
      }
    }
    return num;
}

Parameterized Design

To remove the duplication between the two, extract a common function and apply "parameterized design".

namespace {
int count(const GameBoard& board, bool flagged) {
    int num = 0;
    for (auto& cell : board.getCells()) {
      if (cell.flagged() == flagged) {
        num++;
      }
    }
    return num;
}
} // end namespace

int countFlaggedCells(const GameBoard& board) {
  return count(board, true);
}

int countUnflaggedCells(const GameBoard& board) {
  return count(board, false);
}

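Passing true/false to distinguish the two cases removes the duplication but may hurt readability; from the caller's point of view a bare true/false is indeed not very clear. However, among the four rules of "simple design", "no duplication" ranks above "readability", so there is no need to agonize over it.

The rules of simple design, in decreasing order of priority and importance:

Passes the tests
No duplication
Easy to understand
No redundancy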

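Duplication Reappears

According to the business requirements, the application must be highly fault tolerant: it needs to stash every "alive" cell so that the state of GameBoard can be restored from them after a crash.

Extract CellSaver to get rid of the global variable in the legacy system. In fact, following DDD (Domain-Driven Design) thinking, the CellSaver instance should ultimately be held by a high-level object whose lifetime matches that of the application; that is not elaborated here.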

struct CellSaver {
  void save(const GameBoard& board) {
    for (const Cell& cell : board.getCells()) {
      if (cell.alive()) {
        cache.push_back(cell);
      }
    }
  }

private:
  std::vector<Cell> cache;
};

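Unfortunately, CellSaver::save duplicates the logic of the count function in the anonymous namespace above, so the duplicated-design smell is back.

Moving Responsibilities

To see the duplication between the two more clearly, and to make counting on GameBoard simpler for the client, move the traversal logic back into GameBoard and try to eliminate the duplicated implementations inside it.

Hollywood Principle: Tell, Don't Ask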

struct GameBoard {
  int countFlaggedCells() const {
    return count(true);
  }

  int countUnflaggedCells() const {
    return count(false);
  }

  std::vector<Cell> getAliveCells() const {
    std::vector<Cell> result;
    for (const Cell& cell : cells) {
      if (cell.alive()) {
        result.push_back(cell);
      }
    }
    return result;
  }

private:
  int count(bool flagged) const {
    int num = 0;
    for (const Cell& cell : cells) {
      if (cell.flagged() == flagged) {
        num++;
      }
    }
    return num;
  }

private:
  std::vector<Cell> cells;
};

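After this series of responsibility moves, counting is provided directly by GameBoard. count and getAliveCells still duplicate each other, but the difference between them is now easier to see and compare.

Separating the Changes

Looking at the duplicated logic, the program changes along two directions.

The linear traversal algorithm;
The client logic: counting and caching.

Extract the abstract interface CellCollector to decouple the traversal algorithm in GameBoard from the client logic (counting, caching). GameBoard is the producer of Cell objects and the client is the consumer; CellCollector keeps the two sides apart so that they can change independently of each other, that is, orthogonally.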

struct CellCollector {
  virtual void add(const Cell&) = 0;
  virtual ~CellCollector() {}
};

struct GameBoard {
  void list(CellCollector& col) const {
    for (auto& cell : cells) {
      col.add(cell);
    }
  }

private:
  std::vector<Cell> cells;
};
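To see the producer/consumer split in action, here is a self-contained sketch. The constructors on Cell and GameBoard and the AliveTally collector are assumptions added only so the example can be built and run; they are not part of the original design.

```cpp
#include <cassert>
#include <initializer_list>
#include <utility>
#include <vector>

enum State { INIT, SELECTED, ALIVE, FLAGGED };

struct Cell {
  // Assumed constructor, added only so the sketch can build cells.
  Cell(std::initializer_list<State> init) : states(init) {
    assert(!states.empty());
  }
  bool flagged() const { return states.front() == FLAGGED; }
  bool alive() const { return states.front() == ALIVE; }

private:
  std::vector<State> states;
};

struct CellCollector {
  virtual void add(const Cell&) = 0;
  virtual ~CellCollector() {}
};

struct GameBoard {
  // Assumed constructor, added only so the sketch can populate the board.
  explicit GameBoard(std::vector<Cell> cells) : cells(std::move(cells)) {}

  // Producer side: the board owns the traversal and nothing else.
  void list(CellCollector& col) const {
    for (auto& cell : cells) col.add(cell);
  }

private:
  std::vector<Cell> cells;
};

// Hypothetical consumer: records how many cells it was offered
// and how many of them were alive.
struct AliveTally : CellCollector {
  int seen = 0;
  int alive = 0;

private:
  void add(const Cell& cell) override {
    ++seen;
    if (cell.alive()) ++alive;
  }
};
```

A client would create an AliveTally, pass it to board.list(...), and then read seen and alive. Swapping in a different collector changes what is done with each Cell without touching the traversal.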

Clients implement the CellCollector contract to define how a single Cell is consumed.

Counting Logic

namespace {
struct FlaggedCellCounter : CellCollector {
  FlaggedCellCounter(bool flagged) : flagged(flagged) {
  }

  int get() const {
    return num;
  }

private:
  void add(const Cell& cell) override {
    if (cell.flagged() == flagged) {
      num++;
    }
  }

  bool flagged;
  int num = 0;
};
} // end namespace 

// private; the declaration in the header is omitted here
inline int GameBoard::count(bool flagged) const {
  FlaggedCellCounter counter(flagged);
  list(counter);
  return counter.get(); 
}

int GameBoard::countFlaggedCells() const {
  return count(true); 
}

int GameBoard::countUnflaggedCells() const {
  return count(false); 
}

However, FlaggedCellCounter is tightly coupled to the "flagged/unflagged" distinction. Counting the "alive" cells therefore forces the counting logic to be duplicated.

namespace {
struct AliveCellCounter : CellCollector {
  int get() const {
    return num;
  }

private:
  void add(const Cell& cell) override {
    if (cell.alive()) {
      num++;
    }
  }

  int num = 0;
};
} // end namespace 

int GameBoard::countAliveCells() const {
  AliveCellCounter counter;
  list(counter);
  return counter.get(); 
}

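Clearly, the public countAliveCells duplicates the private count, and AliveCellCounter structurally duplicates FlaggedCellCounter.

A More Stable Abstraction

A closer look at the structural duplication between AliveCellCounter and FlaggedCellCounter shows that they share the same counting rule and the same way of returning the result; only the predicate that guards the count differs. So extract the predicate as a template parameter, which decouples the predicate logic from the concrete Counter implementation.

Compile-time polymorphism: C++ templates are a classic compile-time polymorphism technique and follow the plain idea of duck typing.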

namespace {

template <typename Pred>
struct CellCounter : CellCollector {
  CellCounter(Pred pred) : pred(pred) {
  }

  int get() const {
    return num;
  }

private:
  void add(const Cell& cell) override {
    if (pred(cell)) {
      num++;
    }
  }

  int  num = 0;
  Pred pred;
};
} // end namespace 

// private; the declaration in the header is omitted here.
template <typename Pred>
inline int GameBoard::count(Pred pred) const {
  CellCounter counter(pred);
  list(counter);
  return counter.get();
}

int GameBoard::countAliveCells() const {
  return count([](auto& cell) {
    return cell.alive();
  }); 
}

int GameBoard::countUnflaggedCells() const {
  return count([](auto& cell) {
    return !cell.flagged();
  }); 
}

int GameBoard::countFlaggedCells() const {
  return count([](auto& cell) {
    return cell.flagged();
  }); 
}
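The pieces above can be assembled into one compilable sketch. The constructors on Cell and GameBoard are assumptions added so the example can run. The sketch also spells out CellCounter<Pred> and const Cell& explicitly, so it does not depend on class template argument deduction (C++17) or generic lambdas (C++14); those spellings are a choice made here, not the article's.

```cpp
#include <cassert>
#include <initializer_list>
#include <utility>
#include <vector>

enum State { INIT, SELECTED, ALIVE, FLAGGED };

struct Cell {
  // Assumed constructor, added only so the sketch can build cells.
  Cell(std::initializer_list<State> init) : states(init) {
    assert(!states.empty());
  }
  bool flagged() const { return states.front() == FLAGGED; }
  bool alive() const { return states.front() == ALIVE; }

private:
  std::vector<State> states;
};

struct CellCollector {
  virtual void add(const Cell&) = 0;
  virtual ~CellCollector() {}
};

// Counts the cells accepted by an arbitrary predicate.
template <typename Pred>
struct CellCounter : CellCollector {
  explicit CellCounter(Pred pred) : pred(pred) {}
  int get() const { return num; }

private:
  void add(const Cell& cell) override {
    if (pred(cell)) ++num;
  }

  int num = 0;
  Pred pred;
};

struct GameBoard {
  // Assumed constructor, added only so the sketch can populate the board.
  explicit GameBoard(std::vector<Cell> cells) : cells(std::move(cells)) {}

  void list(CellCollector& col) const {
    for (auto& cell : cells) col.add(cell);
  }

  int countAliveCells() const {
    return count([](const Cell& c) { return c.alive(); });
  }
  int countFlaggedCells() const {
    return count([](const Cell& c) { return c.flagged(); });
  }
  int countUnflaggedCells() const {
    return count([](const Cell& c) { return !c.flagged(); });
  }

private:
  // One traversal, one counter; only the predicate varies.
  template <typename Pred>
  int count(Pred pred) const {
    CellCounter<Pred> counter(pred);
    list(counter);
    return counter.get();
  }

  std::vector<Cell> cells;
};
```

Each public counting method now differs from the others only in its lambda, so adding a new counting rule means adding one predicate rather than one more collector class.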

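Caching Logic

Finally, the caching logic moves to the client code, because the target storage lives on the client side.

Private inheritance: inside CellStore::save, CellStore can stand in for CellCollector, so the Liskov Substitution Principle holds there while the relationship stays hidden from outside code; this is one of C++'s signature moves. Languages such as Java, by contrast, can only express this as public inheritance, which is flawed both logically and semantically.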

struct CellStore : private CellCollector {
  void save(const GameBoard& board) {
    board.list(*this);
  }

private:
  void add(const Cell& cell) override {
    if (cell.alive()) {
      cache.push_back(cell);
    }
  }

private:
  std::vector<Cell> cache;
};
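The effect of private inheritance can be checked in a small sketch. The constructors on Cell and GameBoard and the size() accessor on CellStore are assumptions added for illustration. The static_assert uses std::is_convertible to show that code outside CellStore cannot treat it as a CellCollector, while save can still pass *this to list.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <utility>
#include <vector>

enum State { INIT, SELECTED, ALIVE, FLAGGED };

struct Cell {
  // Assumed constructor, added only so the sketch can build cells.
  Cell(std::initializer_list<State> init) : states(init) {
    assert(!states.empty());
  }
  bool alive() const { return states.front() == ALIVE; }

private:
  std::vector<State> states;
};

struct CellCollector {
  virtual void add(const Cell&) = 0;
  virtual ~CellCollector() {}
};

struct GameBoard {
  // Assumed constructor, added only so the sketch can populate the board.
  explicit GameBoard(std::vector<Cell> cells) : cells(std::move(cells)) {}

  void list(CellCollector& col) const {
    for (auto& cell : cells) col.add(cell);
  }

private:
  std::vector<Cell> cells;
};

struct CellStore : private CellCollector {
  void save(const GameBoard& board) {
    board.list(*this);  // the base conversion is accessible inside the class
  }

  // Assumed accessor (not in the original) for inspecting the cache.
  std::size_t size() const { return cache.size(); }

private:
  void add(const Cell& cell) override {
    if (cell.alive()) cache.push_back(cell);
  }

  std::vector<Cell> cache;
};

// Outside the class the base is inaccessible, so no implicit conversion exists.
static_assert(!std::is_convertible<CellStore*, CellCollector*>::value,
              "CellStore must not be usable as a CellCollector by clients");
```

Note that save appends to the cache on every call, as in the article's version; whether the cache should be cleared first is left open here.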

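Summary

Looking back, the legacy system suffered from obvious duplicated design, dependence on a global variable, an obscure multi-level container as its data structure, counting and caching logic that could not be reused, and low-level problems such as magic numbers and poor naming.

Encapsulation split the complex multi-level container into the domain objects GameBoard and Cell, with the value object State representing the state of a Cell. Finally, "separation of concerns" moved the traversal algorithm back into GameBoard, so that the code is highly reusable.

On the client side, to make counting simpler to call, the Hollywood Principle was applied and the counting logic moved into GameBoard; caching, by contrast, stays with the client because the client owns the target storage, and is implemented as an extension through private inheritance.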