美文网首页
语法分析

语法分析

作者: 小小青蛙不怕风吹雨打 | 来源:发表于2017-03-10 17:30 被阅读0次

手写语法分析使用递归下降分析法算符优先分析法

BNF

语法分析对应上下文无关文法。定义时一般用BNF描述出来。lua的BNF大致如下

chunk ::= block

block ::= {stat [";"]}

stat ::=
     "do" block "end" |
     "while" exp "do" block "end" |
     "if" exp "then" block {"elseif" exp "then" block} ["else" block] "end" |
     "for" Name "=" exp "," exp ["," exp] "do" block "end" |
     "for" namelist "in" explist "do" block "end" |
     "function" funcname funcbody |
     "local" "function" Name funcbody |
     "local" namelist ["=" explist] |
     "return" [explist] |
     "break" |
     varlist "=" explist |
     Name {tableindex | funccall} funccall

namelist ::= Name {"," Name}

varlist ::= var {"," var}

var ::= Name [{tableindex | funccall} tableindex]

funcname ::= Name {"." Name} [":" Name]

funcbody ::= "(" [parlist] ")" block "end"

parlist ::= Name {"," Name} ["," "..."] | "..."

explist ::= {exp ","} exp

tableconstructor ::= "{" [fieldlist] "}"

fieldlist ::= field {fieldsep field} [fieldsep]

field ::= "[" exp "]" "=" exp | Name "=" exp | exp

fieldsep ::= "," | ";"

exp ::= mainexp | exp binop exp

mainexp ::= nil | false | true | Number | String |
     "..." | function | tableconstructor |
     prefixexp |
     unop exp|

function ::= "function" funcbody

prefixexp ::= (Name | "(" exp ")") {tableindex | funccall}

tableindex ::= "[" exp "]" | "." Name

funccall ::= args | ":" Name args

args ::=  "(" [explist] ")" | tableconstructor

binop ::= "+" | "-" | "*" | "/" | "^" | "%" | ".." |
     "<" | "<=" | ">" | ">=" | "==" | "~=" |
     "and" | "or"

unop ::= "-" | "not" | "#"

这个语法定义经过整理,便于手写解析器。

递归下降解析

递归下降分析法用于处理常规语句。上面的BNF,经过特殊设计,读取至多两个前缀单词就可以确定选什么解析函数,可以算是LL(2)文法了。
代码实现比较无脑,差不多是一个BNF语句对应一个解析函数和一个语法树结点类型。读取前缀单词,看看应该选择什么解析函数。

以stat为例伪代码

Block ParseBlock()
{
    var block = new Block();
    for (; ; )
    {
        SyntaxTree statement = null;
        var token_ahead = LookAhead();
        switch (token_ahead.m_type)
        {
            case (int)';':
                NextToken(); continue;
            case (int)TokenType.DO:
                statement = ParseDoStatement(); break;
            case (int)TokenType.WHILE:
                statement = ParseWhileStatement(); break;
            case (int)TokenType.IF:
                statement = ParseIfStatement(); break;
            case (int)TokenType.FOR:
                statement = ParseForStatement(); break;
            case (int)TokenType.FOREACH:
                statement = ParseForEachStatement(); break;
            case (int)TokenType.FUNCTION:
                statement = ParseFunctionStatement(); break;
            case (int)TokenType.LOCAL:
                statement = ParseLocalStatement(); break;
            case (int)TokenType.RETURN:
                statement = ParseReturnStatement(); break;
            case (int)TokenType.BREAK:
                statement = ParseBreakStatement(); break;
            case (int)TokenType.CONTINUE:
                statement = ParseContinueStatement(); break;
            default:
                statement = ParseOtherStatement();
                break;
        }
        if (statement == null)
            break;
        block.statements.Add(statement);
    }
    return block;
}

一个特殊的地方,最后两个stat,赋值语句和函数调用语句不容易做区分。
他们的开头结构类似,都是Name {tableindex | funccall},区别是函数调用语句的结尾是funccall,赋值语句不是。
也就是需要比较后缀结构,但是结构是一样的,用同一个函数解析,检查结果类型就行了。

bool IsVar(SyntaxTree t)
{
    return t is TableAccess || t is Terminator;
}

SyntaxTree ParseOtherStatement()
{
    // lua做了限制,其他语句只有两种,assign statement and func call
    SyntaxTree exp;
    if (LookAhead().m_type == (int)TokenType.NAME)
    {
        exp = ParsePrefixExp();
        if(IsVar(exp))
        {
            // assign statement
            var assign_statement = new AssignStatement();
            assign_statement.var_list.Add(exp);
            while(LookAhead().m_type != (int)'=')
            {
                if (NextToken().m_type != (int)',')
                    throw new ParserException("expect ',' to split var-list");
                if (LookAhead().m_type != (int)TokenType.NAME)
                    throw new ParserException("expect 'id' to start var");
                exp = ParsePrefixExp();
                if (!IsVar(exp))
                    throw new ParserException("expect var here");
                assign_statement.var_list.Add(exp);
            }
            NextToken();// skip '='
            assign_statement.exp_list = ParseExpList();
            return assign_statement;
        }
        else
        {
            Debug.Assert(exp is FuncCall);
            return exp;
        }
    }
    else
    {
        if (IsMainExp())
            throw new ParserException("incomplete statement");
        return null;
    }
}

算符优先分析法

上面的BNF有个地方是递归下降方法解析不了的,文法定义里存在递归结构,基本所有的语言都有这类表达式语句。

exp ::= mainexp | exp binop exp

这儿做了相当的简化,mainexp包含了"(" exp ")" | unop exp,简化后表达式就是最简单的二元表达式了。
二元表达式是最基本的表达式了,算符优先分析法可以很好的解析这种语法结构。
文字简述过程:
设置两个栈,操作数栈和操作符栈。
扫描表达式,遇到操作数直接入栈,遇到操作符时,做下优先级判断确定下一步操作。
如果操作符优先级高于栈顶操作符,则入栈;否则,弹出栈顶操作符并从操作数栈里弹出两个操作数,组合成二元表达式,当成新的操作数入栈,并继续这个过程,直到当前操作符入栈。
写代码时有个特别的技巧,可以用函数调用栈代替操作数栈和操作符栈。

        SyntaxTree ParseExp(int left_priority = 0)
        {
            var exp = ParseMainExp();
            while(true)
            {
                int right_priority = GetOpPriority(LookAhead());
                if (left_priority < right_priority ||(left_priority == right_priority && IsRightAssociation(LookAhead())))
                {
                    // C++的函数参数执行顺序没有明确定义,方便起见,不在函数参数里搞出两个有依赖的函数调用,方便往C++里迁移
                    var op = NextToken();
                    exp = new BinaryExpression(exp, op, ParseExp(right_priority));
                }
                else
                {
                    return exp;
                }
            }
        }

相关文章

网友评论

      本文标题:语法分析

      本文链接:https://www.haomeiwen.com/subject/xplogttx.html