美文网首页
The Definitive ANTLR 4 Reference

The Definitive ANTLR 4 Reference

作者: letsgetdrunk | 来源:发表于2019-03-27 17:14 被阅读0次

    Chapter 2 The Big Picture

    2.1 Let's Get Meta!

    • interpreter
      if an application computes or “executes” sentences, we call that
      application an interpreter. Examples include calculators, configuration file
      readers, and Python interpreters.
    • translator
      If we’re converting sentences from one language to another, we call that application a translator. Examples include Java to C# converters and compilers.
    • parsers or syntax analyzers
      Programs that recognize (identify the various components in a phrase and differentiate it from other phrases) languages.
    • Two stages of parsing
      • lexical analysis or tokenizing (lexer)
        The process of grouping characters into words or symbols (tokens)
      • tokens
        Tokens consist of at least two pieces of information: the token type (identifying the lexical structure) and the text matched for that token by the lexer.
      • parser
        The second stage is the actual parser and feeds off of these tokens to recognize
        the sentence structure.
        By default, ANTLR-generated parsers build a data structure called a parse tree or syntax
        tree
        that records how the parser recognized the structure of the input sentence
        and its component phrases.
    Example

    We would specify phrase structure with a set of rules. Here is the grammar rule that corresponds to the first level of the assign subtree from the diagram":
    assign : ID '=' expr ';' ; // match an assignment statement like "sp = 100;"

    2.2 Implementing Parsers (之后有时间重新读一遍)

    The ANTLR tool generates recursive-descent parsers from grammar rules.

    A more general term for this kind of parsing is top-down parsing; recursive-descent parsers are just one kind of top-down parser implementation.

    The stat symbol means calling method stat() for the parse tree. To get an idea of what recursive-descent parsers look like, here’s the (slightly cleaned up) method that ANTLR generates for rule assign:

    // assign : ID '=' expr ';' ;
    void assign() { // method generated from rule assign
    match(ID); // compare ID to current input symbol then consume
    match('=');
    expr(); // match an expression by calling expr()
    match(';');
    }
    

    For example, the stat rule that invokes assign
    likely has a list of other kinds of statements.

    /** Match any kind of statement starting at the current input position */
    stat: assign // First alternative ('|' is alternative separator)
    | ifstat // Second alternative
    | whilestat
    ...
    ;
    

    A parsing rule for stat looks like a switch.

    void stat() {
    switch ( «current input token» ) {
    CASE ID : assign(); break;
    CASE IF : ifstat(); break; // IF is token type for keyword 'if'
    CASE WHILE : whilestat(); break;
    ...
    default : «raise no viable alternative exception»
    }
    }
    

    2.3 You Can't Put Too Much Water into a Nuclear Reactor (ambiguous phrase)

    We have to provide unambiguous grammars.

    • duplicate is an obvious ambiguities
    • some more subtle ambiguities:
    stat: expr ';' // expression statement
    | ID '(' ')' ';' // function call statement
    ;
    expr: ID '(' ')'
    | INT
    ;
    

    Here are the two interpretations of input f(); starting in rule stat:


    Ambiguities can occur in the lexer as well as the parser, but ANTLR resolves
    them so the rules behave naturally

    BEGIN : 'begin' ; // match b-e-g-i-n sequence; ambiguity resolves to BEGIN
    ID : [a-z]+ ; // match one or more of any lowercase letter
    

    that input beginner would match only to rule ID. The lexer would not match beginner as BEGIN followed by an ID matching input ner.

    2.4 Building Language Applications Using Parse Trees

    相关文章

      网友评论

          本文标题:The Definitive ANTLR 4 Reference

          本文链接:https://www.haomeiwen.com/subject/wwizvqtx.html