[Moving to JavaScript Series] AST in Modern Ja

Author: ronniegong | Published: 2017-12-20 15:08 | Views: 439

What is AST

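What is an AST? AST stands for Abstract Syntax Tree.
Legend has it that a programmer's three great romances are compiler theory, computer graphics, and operating systems, so you can hardly claim much credibility without getting comfortable with ASTs. This article aims to show how ASTs are used in modern JavaScript projects.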

var a = 42;
function addA(d){
  return a + d;
}

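Code written according to a language's grammar is meant to be read and understood by developers. Tools such as compilers, on the other hand, work with the abstract syntax tree. The javascript-ast website renders the syntax tree generated from a piece of source code as a diagram.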

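Generating an abstract syntax tree takes two stages:

Tokenizing (tokenize)
Syntactic analysis (parse)

Tokenizing splits the source code into syntactic units called tokens; syntactic analysis then works out how those tokens relate to one another.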

Take the statement var a = 42; as an example. In simplified form, tokenizing it yields something like:

[
    {type:'keyword',value:'var'},
    {type:'whitespace',value:' '},
    {type:'identifier',value:'a'},
    {type:'whitespace',value:' '},
    {type:'operator',value:'='},
    {type:'whitespace',value:' '},
    {type:'num',value:'42'},
    {type:'sep',value:';'}
]
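
The tokenizing stage can be sketched in a few lines. This is a minimal illustration, not how babylon or any real parser tokenizes: the tokenize function, the rule table, and the KEYWORDS set are all assumptions made for this example, and it handles only the handful of characters that a statement like var a = 42; needs. It labels var as a keyword rather than an identifier.

```javascript
// Minimal tokenizer sketch (hypothetical; covers only a tiny subset of JavaScript).
const KEYWORDS = new Set(["var", "function", "return"]);

// Each rule matches one kind of token at the start of the remaining input.
const RULES = [
  { type: "whitespace", re: /^\s+/ },
  { type: "num", re: /^\d+/ },
  { type: "word", re: /^[A-Za-z_$][\w$]*/ },
  { type: "operator", re: /^[=+\-*\/]/ },
  { type: "sep", re: /^[;(){},]/ },
];

function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    const rule = RULES.find((r) => r.re.test(rest));
    if (!rule) {
      throw new SyntaxError(`Unexpected character: ${rest[0]}`);
    }
    const value = rest.match(rule.re)[0];
    // A word is either a reserved keyword or an ordinary identifier.
    let type = rule.type;
    if (type === "word") {
      type = KEYWORDS.has(value) ? "keyword" : "identifier";
    }
    tokens.push({ type, value });
    rest = rest.slice(value.length);
  }
  return tokens;
}

const tokens = tokenize("var a = 42;");
console.log(tokens.map((t) => t.type).join(" "));
// prints: keyword whitespace identifier whitespace operator whitespace num sep
```

Real tokenizers usually drop whitespace tokens and record source positions; both are left out here to keep the sketch short.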

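When this code is actually parsed with babylon 6, the token output is:

[Figure: babylon 6 token output]

The resulting abstract syntax tree (simplified) is:

{
    "type":"Program",
    "body":[
        {
            "type":"VariableDeclaration",
            "kind":"var",
            "declarations":[
                {
                    "type":"VariableDeclarator",
                    "id":{
                        "type":"Identifier",
                        "name":"a"
                    },
                    "init":{
                        "type":"Literal",
                        "value":42
                    }
                }
            ]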
        }
    ]
}
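
Because the tree is plain nested data, tools process it by walking it node by node. The sketch below walks a hand-written, ESTree-style tree for var a = 42; (in ESTree an Identifier carries its name in the name field) and lists the node types in visiting order. The walk function is a hypothetical helper written for this example, not part of any parser library.

```javascript
// Hand-written, simplified ESTree-style AST for `var a = 42;`.
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "a" },
          init: { type: "Literal", value: 42 },
        },
      ],
    },
  ],
};

// Depth-first walk: call `visit` on every object that has a string `type`.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (node === null || typeof node !== "object") {
    return; // strings, numbers, etc. are leaf values, not nodes
  }
  if (typeof node.type === "string") {
    visit(node);
  }
  Object.keys(node).forEach((key) => walk(node[key], visit));
}

const visited = [];
walk(ast, (node) => visited.push(node.type));
console.log(visited.join(" > "));
// prints: Program > VariableDeclaration > VariableDeclarator > Identifier > Literal
```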

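The community offers a range of AST parser implementations:

Early ones include uglifyjs and esprima.
espree: originally forked from esprima, used by eslint. See "Introducing Espree, an Esprima alternative".
acorn: claimed to outperform esprima. See "Acorn: yet another JavaScript parser".
babylon: derived from acorn, used by babel.
babel-eslint: maintained by the Babel team, for using ESLint together with Babel. See "GitHub - babel/babel-eslint: ESLint using Babel as the parser."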

AST in ESLint

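ESLint is a pluggable tool for checking and reporting on JavaScript coding conventions; you enforce a style by configuring rules. Take the no-cond-assign rule: when it is enabled, assignments are not allowed inside conditional statements, which guards against mistakenly writing an assignment where a comparison was intended.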

// check the user's job title
if (user.jobTitle = "manager") {
  // user.jobTitle is now incorrect
}

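ESLint's checks are based on the AST. Beyond the built-in rules, ESLint exposes an API that lets us use the AST generated from the source code to develop custom plugins and custom rules.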

module.exports = {
    rules: {
        "var-length": {
            create: function (context) {
                // rule implementation goes here
            }
        }
    }
};

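A custom rule plugin has the structure shown above. In the create method we declare which node types we care about and implement the rule logic for each; as ESLint traverses the syntax tree, it runs our checks whenever it enters a node of one of those types.

For example, suppose we want a rule requiring that declared variable names be at least two characters long: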

module.exports = {
    rules: {
        "var-length": {
            create: function (context) {
                return {
                    VariableDeclarator: node => {
                        // skip destructuring patterns, whose id has no name
                        if (node.id.type === 'Identifier' && node.id.name.length < 2) {
                            context.report({
                                node,
                                message: 'Variable names should be longer than 1 character'
                            });
                        }
                    }
                };
            }
        }
    }
};
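
To make the "ESLint calls our handler when it enters a matching node" idea concrete, here is a dependency-free sketch of that dispatch. It is an assumption-laden toy, not ESLint's implementation: runRule, the fake context object, and the declaration helper are all hypothetical, and the AST is written by hand instead of being produced by a parser.

```javascript
// The rule, in the same shape as above.
const varLengthRule = {
  create(context) {
    return {
      VariableDeclarator(node) {
        if (node.id.type === "Identifier" && node.id.name.length < 2) {
          context.report({
            node,
            message: "Variable names should be longer than 1 character",
          });
        }
      },
    };
  },
};

// Hypothetical helper: builds the AST of `var <name> = <value>;`.
function declaration(name, value) {
  return {
    type: "VariableDeclaration",
    kind: "var",
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name },
        init: { type: "Literal", value },
      },
    ],
  };
}

// Toy runner: walks the tree and calls the handler registered for each node type.
function runRule(rule, ast) {
  const problems = [];
  const context = { report: (problem) => problems.push(problem) };
  const handlers = rule.create(context);

  function visit(node) {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child));
      return;
    }
    if (node === null || typeof node !== "object") {
      return;
    }
    if (typeof handlers[node.type] === "function") {
      handlers[node.type](node);
    }
    Object.keys(node).forEach((key) => visit(node[key]));
  }

  visit(ast);
  return problems;
}

// Equivalent to: var a = 1; var total = 2;
const program = { type: "Program", body: [declaration("a", 1), declaration("total", 2)] };
const problems = runRule(varLengthRule, program);
console.log(problems.map((p) => `${p.node.id.name}: ${p.message}`));
```

Only the declaration of a is reported; total passes the length check.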

Write a package.json for the plugin:

{
    "name": "eslint-plugin-my-eslint-plugin",
    "version": "0.0.1",
    "main": "index.js",
    "devDependencies": {
        "eslint": "~2.6.0"
    },
    "engines": {
        "node": ">=0.10.0"
    }
}

To use it in a project, install the dependency through npm, then enable the plugin and its rule in the configuration:

"plugins": [
    "my-eslint-plugin"
]

"rules": {
    "my-eslint-plugin/var-length": "warn"
}

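With this configuration in place, the custom plugin is ready to use.

Sometimes we don't want to publish a new plugin and only need a local custom rule. A custom rule has roughly the same structure as a plugin rule. The one below disallows calls to console methods; it relies on a looksLike helper, not shown in the snippet, that checks whether a node matches a nested pattern.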

const disallowedMethods = ["log", "info", "warn", "error", "dir"];
module.exports = {
    meta: {
        docs: {
            description: "Disallow use of console",
            category: "Best Practices",
            recommended: true
        }
    },
    create(context) {
        return {
            Identifier(node) {
                const isConsoleCall = looksLike(node, {
                    name: "console",
                    parent: {
                        type: "MemberExpression",
                        property: {
                            name: val => disallowedMethods.includes(val)
                        }
                    }
                });
                // find the identifier with name 'console'
                if (!isConsoleCall) {
                    return;
                }

                context.report({
                    node,
                    message: "Using console is not allowed"
                });
            }
        };
    }
};
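
The rule above calls looksLike without defining it. One plausible way to write such a helper is sketched below; this is an assumption about its behavior inferred from how the rule uses it (nested objects are matched recursively, functions act as predicates, everything else is compared with ===), not the original author's code.

```javascript
// Hypothetical pattern matcher: does `node` have the shape described by `pattern`?
function looksLike(node, pattern) {
  if (node === null || typeof node !== "object") {
    return false;
  }
  return Object.keys(pattern).every((key) => {
    const expected = pattern[key];
    const actual = node[key];
    if (typeof expected === "function") {
      return Boolean(expected(actual)); // predicate, e.g. val => list.includes(val)
    }
    if (expected !== null && typeof expected === "object") {
      return looksLike(actual, expected); // nested pattern
    }
    return actual === expected; // primitive comparison
  });
}

const disallowed = ["log", "info", "warn", "error", "dir"];
const consolePattern = {
  name: "console",
  parent: {
    type: "MemberExpression",
    property: { name: (val) => disallowed.includes(val) },
  },
};

// Hand-written stand-ins for the Identifier nodes in `console.log` and `console.table`.
const consoleLog = {
  type: "Identifier",
  name: "console",
  parent: { type: "MemberExpression", property: { type: "Identifier", name: "log" } },
};
const consoleTable = {
  type: "Identifier",
  name: "console",
  parent: { type: "MemberExpression", property: { type: "Identifier", name: "table" } },
};

console.log(looksLike(consoleLog, consolePattern)); // true
console.log(looksLike(consoleTable, consolePattern)); // false
```

Expressing the check as a pattern avoids a long chain of null checks such as node.parent && node.parent.property && node.parent.property.name.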

AST in Babel

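Babel is a compiler that lets us develop with next-generation JavaScript syntax. The project was originally named 6to5, meaning it converted ES6 syntax to ES5. Babel has since grown into a powerful ecosystem.

As one industry heavyweight put it: Babel is the new jQuery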

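Babel works in three stages: parse, transform, and generate, as the diagram in the original post illustrates. In the parse stage, the babylon library turns the source code into an AST. In the transform stage, plugins rewrite the AST; the JSX transform, for example, converts React JSX into React.createElement calls. In the generate stage, a code generator turns the AST back into code.

Babel provides APIs that let us turn code into an AST and manipulate it in various ways: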

import * as babylon from "babylon";
import traverse from "babel-traverse";
import generate from "babel-generator";

const code = `function square(n) {
    return n * n;
}`

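// parse: source code -> AST
const ast = babylon.parse(code);
// transform: rename every identifier `n` to `x`
traverse(ast, {
    enter(path) {
        if (path.node.type === 'Identifier' && path.node.name === 'n') {
            path.node.name = 'x';
        }
    }
});
// generate: AST -> source code
const output = generate(ast, {}, code);
console.log(output.code);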
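
The transform and generate stages can also be tried without installing Babel. The sketch below is a toy under stated assumptions: renameIdentifier and printNode are hypothetical helpers written for this example (they are not Babel APIs), the AST is hand-written, and the printer supports only the five node types that appear in it.

```javascript
// Hand-written, simplified AST for `var a = 42;`.
const tree = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "a" },
          init: { type: "Literal", value: 42 },
        },
      ],
    },
  ],
};

// Transform: rename every Identifier called `from` to `to`, mutating the tree in place.
function renameIdentifier(node, from, to) {
  if (Array.isArray(node)) {
    node.forEach((child) => renameIdentifier(child, from, to));
  } else if (node !== null && typeof node === "object") {
    if (node.type === "Identifier" && node.name === from) {
      node.name = to;
    }
    Object.values(node).forEach((child) => renameIdentifier(child, from, to));
  }
}

// Generate: turn the (tiny subset of) AST back into source text.
function printNode(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(printNode).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(printNode).join(", ")};`;
    case "VariableDeclarator":
      return node.init
        ? `${printNode(node.id)} = ${printNode(node.init)}`
        : printNode(node.id);
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

renameIdentifier(tree, "a", "answer");
const generated = printNode(tree);
console.log(generated); // var answer = 42;
```

Note that the printer decides all spacing itself, so whatever formatting the original source had is lost. That limitation matters later, in the codemod discussion.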

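We rarely need to call these APIs directly. What projects use all the time are Babel plugins. The babel-plugin-transform-remove-console plugin, for example, removes every console method call from the code. Its core code is as follows: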

module.exports = function({ types: t }) {
  return {
    name: "transform-remove-console",
    visitor: {
      CallExpression(path, state) {
        const callee = path.get("callee");

        if (!callee.isMemberExpression()) return;

        if (isIncludedConsole(callee, state.opts.exclude)) {
          // console.log()
          if (path.parentPath.isExpressionStatement()) {
            path.remove();
          } else {
          //var a = console.log()
            path.replaceWith(createVoid0());
          }
        } else if (isIncludedConsoleBind(callee, state.opts.exclude)) {
          // console.log.bind()
          path.replaceWith(createNoop());
        }
      },
      MemberExpression: {
        exit(path, state) {
          if (
            isIncludedConsole(path, state.opts.exclude) &&
            !path.parentPath.isMemberExpression()
          ) {
          //console.log = func
            if (
              path.parentPath.isAssignmentExpression() &&
              path.parentKey === "left"
            ) {
              path.parentPath.get("right").replaceWith(createNoop());
            } else {
            //var a = console.log
              path.replaceWith(createNoop());
            }
          }
        }
      }
    }
  };
};

With this plugin, calls like the following are transformed:

console.log()
var a = console.log()
console.log.bind()
var b = console.log
console.log = func

//output
var a = void 0
(function(){})
var b = function(){}
console.log = function(){}

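This Babel plugin works much like the custom ESLint plugin and rule described earlier: as the tool traverses the AST generated from the source, it runs our handlers for the node types we specify, performing checks in ESLint's case and transformations in Babel's.

How do we work out the shape of the AST while developing a plugin? AST explorer makes it easy to inspect the AST generated from a piece of source code.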

AST in Codemod

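A codemod helps you automate changes to your code across a large codebase.
jscodeshift is a JavaScript toolkit for running codemods. It relies mainly on two libraries, recast and ast-types: recast parses JavaScript and exposes the AST, and ast-types provides the node type definitions.

Using the jscodeshift API, we can do something similar to the earlier example and delete console method calls from the code: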

export default (fileInfo,api)=>{
    const j = api.jscodeshift;
    
    const root = j(fileInfo.source);
    
    const callExpressions = root.find(j.CallExpression,{
        callee:{
            type:'MemberExpression',
            object:{
                type:'Identifier',
                name:'console'
            }
        }
    });
    
    callExpressions.remove();
    
    return root.toSource();
}

For more concise code, the calls can also be chained:

export default (fileInfo,api)=>{
    const j = api.jscodeshift;

    return j(fileInfo.source)
        .find(j.CallExpression,{
            callee:{
                type:'MemberExpression',
                object:{
                    type:'Identifier',
                    name:'console'
                }
            }
        })
        .remove()
        .toSource();
}

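After learning about jscodeshift, a question immediately came to mind: why do we need jscodeshift at all? Hasn't Babel already got AST-based code transformation fully covered?

Searching with that question in mind turned up this commit message from the Babel team: babel-core: add options for different parser/generator.

As mentioned earlier, Babel's pipeline consists of three steps: parse, transform, and generate. When generating code, Babel does not care about the formatting of its output, because the compiled code is not meant to be read by developers; it is written to a release directory to be run, and is usually minified along the way.

This shows in how we invoke the Babel command, which typically looks like: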

babel src -d dist

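In the codemod scenario, however, the goal is to process the source code in place in the codebase. The processed code must remain readable, because we continue developing on it. Expressed as a Babel command, the process is really: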

babel src -d src

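In this workflow we review exactly what changes the transform script made to the source, to confirm that the transformation is correct. That requires a readable diff, but when Babel alone performs the transformation, the output of git diff is a jumbled, unreadable mess.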

To meet this need, the Babel team now allows a custom parser and generator to be configured:

{
    "plugins":[
        "./plugin.js"
    ],
    "parserOpts":{
        "parser":"recast"
    },
    "generatorOpts":{
        "generator":"recast"
    }
}

Suppose we have the following code and want a script to change how it imports modules:

import fs, {readFile} from 'fs'
import {resolve} from 'path'
import cp from 'child_process'

resolve(__dirname, './thing')

readFile('./thing.js', 'utf8', (err, string) => {
  console.log(string)
})

fs.readFile('./other-thing', 'utf8', (err, string) => {
  const resolve = string => string
  console.log(resolve())
})

cp.execSync('echo "hi"')

// is transformed into
import fs from 'fs';
import _path from 'path';
import cp from 'child_process'

_path.resolve(__dirname, './thing')

fs.readFile('./thing.js', 'utf8', (err, string) => {
  console.log(string)
})

fs.readFile('./other-thing', 'utf8', (err, string) => {
  const resolve = string => string
  console.log(resolve())
})

cp.execSync('echo "hi"')

The plugin.js that performs this transformation is:

module.exports = function(babel) {
  const { types: t } = babel
  // could just use https://www.npmjs.com/package/is-builtin-module
  const nodeModules = [
    'fs', 'path', 'child_process',
  ]

  return {
    name: 'node-esmodule', // not required
    visitor: {
      ImportDeclaration(path) {
        const specifiers = []
        let defaultSpecifier
        path.get('specifiers').forEach(specifier => {
          if (t.isImportSpecifier(specifier)) {
            specifiers.push(specifier)
          } else {
            defaultSpecifier = specifier
          }
        })
        const {node: {value: source}} = path.get('source')
        if (!specifiers.length || !nodeModules.includes(source)) {
          return
        }
        let memberObjectNameIdentifier
        if (defaultSpecifier) {
          memberObjectNameIdentifier = defaultSpecifier.node.local
        } else {
          memberObjectNameIdentifier = path.scope.generateUidIdentifier(source)
          path.node.specifiers.push(t.importDefaultSpecifier(memberObjectNameIdentifier))
        }
        specifiers.forEach(specifier => {
          const {node: {imported: {name}}} = specifier
          const {referencePaths} = specifier.scope.getBinding(name)
          referencePaths.forEach(refPath => {
            refPath.replaceWith(
              t.memberExpression(memberObjectNameIdentifier, t.identifier(name))
            )
          })
          specifier.remove()
        })
      }
    }
  }
}

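Run the transformation twice, once with and once without the parserOpts and generatorOpts settings, and compare the git diff output; the difference is obvious.

[Figure: git diff output with recast]

[Figure: git diff output without recast]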

AST in Webpack

Webpack is a bundler for the JavaScript ecosystem. The bundle it emits is structured as an IIFE (immediately invoked function expression):

(function(module){})([function(){},function(){}]);

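Webpack also needs AST support during bundling: it uses the acorn library to parse source code into an AST and extract the dependencies between modules.

One feature introduced by Rollup, and now supported by Webpack as well, is tree shaking. Tree shaking removes code that is never referenced from the bundled output, which can reduce bundle size substantially.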

//math.js
export {doMath, sayMath}

const add = (a, b) => a + b
const subtract = (a, b) => a - b
const divide = (a, b) => a / b
const multiply = (a, b) => a * b

function doMath(a, b, operation) {
  switch (operation) {
    case 'add':
      return add(a, b)
    case 'subtract':
      return subtract(a, b)
    case 'divide':
      return divide(a, b)
    case 'multiply':
      return multiply(a, b)
    default:
      throw new Error(`Unsupported operation: ${operation}`)
  }
}

function sayMath() {
  return 'MATH!'
}

//main.js
import {doMath} from './math'
doMath(2, 3, 'multiply') // 6

In the code above, math.js exports doMath and sayMath, while main.js uses only doMath. With Webpack's tree shaking, plus uglify, the code for sayMath can be removed from the output bundle, leaving a math.js like this:

export {doMath}

const add = (a, b) => a + b
const subtract = (a, b) => a - b
const divide = (a, b) => a / b
const multiply = (a, b) => a * b

function doMath(a, b, operation) {
  switch (operation) {
    case 'add':
      return add(a, b)
    case 'subtract':
      return subtract(a, b)
    case 'divide':
      return divide(a, b)
    case 'multiply':
      return multiply(a, b)
    default:
      throw new Error(`Unsupported operation: ${operation}`)
  }
}
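
The core bookkeeping behind this pruning can be sketched as "collect every imported name, then report exports nobody imports". The example below is a deliberately simplified assumption: it starts from a hand-written summary of each module's imports and exports (a real bundler derives that from the AST), findUnusedExports is a hypothetical function, and it ignores re-exports, namespace imports, and side effects.

```javascript
// Hand-written summary of what each module exports and imports.
const moduleGraph = {
  "math.js": { exports: ["doMath", "sayMath"], imports: [] },
  "main.js": { exports: [], imports: [{ from: "math.js", names: ["doMath"] }] },
};

// Returns, per file, the exported names that no other module imports.
function findUnusedExports(graph) {
  const used = new Set();
  for (const mod of Object.values(graph)) {
    for (const imp of mod.imports) {
      for (const name of imp.names) {
        used.add(`${imp.from}#${name}`);
      }
    }
  }

  const unused = {};
  for (const [file, mod] of Object.entries(graph)) {
    const dead = mod.exports.filter((name) => !used.has(`${file}#${name}`));
    if (dead.length > 0) {
      unused[file] = dead;
    }
  }
  return unused;
}

const unusedExports = findUnusedExports(moduleGraph);
console.log(unusedExports); // { 'math.js': [ 'sayMath' ] }
```

Static import and export declarations are what make this analysis feasible: the names are known without running the code.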

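Looking more closely at the call in main.js, doMath(2, 3, 'multiply') only ever executes one branch of doMath. The helper functions add, subtract, and divide defined in math.js are not actually needed, so in theory math.js could be reduced, at best, to: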

export {doMath}

const multiply = (a, b) => a * b

function doMath(a, b) {
  return multiply(a, b)
}

A more thorough AST-based code coverage analysis should be able to achieve this. It is only an idea here, with no concrete implementation; see "Faster JavaScript with SliceJS".