Java Lexical Structure

Author: Yuan2021 | Published 2017-08-24 22:57

    Lexical analysis

    Lexical analysis is the process of translating a raw stream of Unicode characters into a sequence of tokens. The tokens are the terminal symbols of the syntactic grammar. A program that performs lexical analysis may be called a lexer, tokenizer, or scanner, although "scanner" is also used for the first stage of a lexer. In Java, the raw input is translated in three steps, in order:

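    1. Translate each Unicode escape (\uXXXX) into the Unicode character it denotes; for example, \u0041 becomes A. Note that \n is not a Unicode escape: it is an escape sequence inside character and string literals, and it is interpreted later as part of the literal token.
    2. Divide the stream resulting from step 1 into input characters and line terminators (CR, LF, or CR LF). Recognizing line terminators lets the compiler track line numbers, so that error messages can point to the line where a problem occurs.
    3. Reduce the input characters and line terminators from step 2 to a sequence of input elements: white space (including line terminators), comments, and tokens. White space and comments are discarded; only the tokens are passed on to the parser.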
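
    Steps 1 and 2 can be sketched in Java as follows. This is a minimal illustration under our own assumptions, not how javac works internally; the class LexSteps and the methods translateUnicodeEscapes and splitLines are hypothetical names. The sketch applies the rule that a backslash begins a Unicode escape only when it is preceded by an even number of contiguous backslashes, accepts several u markers (as in \uuu0041), and treats CR, LF, and CR LF as line terminators.

```java
import java.util.ArrayList;
import java.util.List;

public class LexSteps {

    // Returns true if s[from, to) contains only ASCII hexadecimal digits.
    static boolean allHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char h = s.charAt(k);
            boolean hex = (h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') || (h >= 'A' && h <= 'F');
            if (!hex) return false;
        }
        return true;
    }

    // Step 1: replace every Unicode escape (a backslash, one or more 'u' characters,
    // then four hex digits) with the character it denotes. A backslash is eligible
    // only when preceded by an even number of contiguous raw backslashes.
    static String translateUnicodeEscapes(String raw) {
        StringBuilder out = new StringBuilder();
        int backslashRun = 0; // raw backslashes immediately before position i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean eligible = c == '\\' && backslashRun % 2 == 0
                    && i + 1 < raw.length() && raw.charAt(i + 1) == 'u';
            if (eligible) {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') j++; // skip the 'u' markers
                if (j + 4 > raw.length() || !allHex(raw, j, j + 4)) {
                    throw new IllegalArgumentException("malformed Unicode escape at index " + i);
                }
                out.append((char) Integer.parseInt(raw.substring(j, j + 4), 16));
                i = j + 4;
                backslashRun = 0; // the translated character never starts another escape
            } else {
                out.append(c);
                backslashRun = (c == '\\') ? backslashRun + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    // Step 2: split the translated text at line terminators (CR, LF, or CR LF),
    // so that the index of each element is its line number minus one.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                lines.add(text.substring(start, i));
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') i++; // CR LF is one terminator
                start = i + 1;
            }
        }
        lines.add(text.substring(start));
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical raw input: the six characters backslash, u, 0, 0, 4, 1 form an escape for A,
        // while backslash-n inside the string literal is not a Unicode escape.
        String raw = "int \\u0041 = 1;\r\nString s = \"\\n\";";
        List<String> lines = splitLines(translateUnicodeEscapes(raw));
        for (int n = 0; n < lines.size(); n++) {
            System.out.println((n + 1) + ": " + lines.get(n));
        }
    }
}
```

    Running main should print two lines, 1: int A = 1; and 2: String s = "\n"; because the escape \u0041 becomes A in step 1 while the two characters \n inside the string literal pass through untouched. Since javac applies step 1 to this source file as well, the comments avoid spelling out a literal backslash-u sequence, which would be rejected as an illegal Unicode escape.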

    Tokens

    Tokens are a central concept in compilers. Java has five kinds of tokens:

    • Identifier
    • Keyword
    • Literal
    • Separator
    • Operator
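
    The sketch below illustrates step 3 together with the five token kinds: it skips white space and comments and classifies everything else. It is a simplified, hypothetical MiniLexer, not the javac scanner. It knows only a subset of the keywords, separators, and operators, recognizes only decimal integer literals, simple string literals, and the literal words true, false, and null, and does not handle character literals. Among separators and operators it always takes the longest match, which is how Java resolves inputs such as >= or ::.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MiniLexer {

    enum Kind { IDENTIFIER, KEYWORD, LITERAL, SEPARATOR, OPERATOR }

    record Token(Kind kind, String text) {}

    // Only a subset of Java's keywords, separators, and operators (an assumption for brevity).
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "boolean", "class", "else", "for", "if", "int", "new", "public", "return", "static", "void", "while"));
    // true, false, and null look like keywords but are literals in Java.
    static final Set<String> LITERAL_WORDS = new HashSet<>(Arrays.asList("true", "false", "null"));
    static final List<String> SEPARATORS = Arrays.asList(
            "...", "::", "(", ")", "{", "}", "[", "]", ";", ",", ".", "@");
    static final List<String> OPERATORS = Arrays.asList(
            ">>>=", "<<=", ">>=", ">>>", "<<", ">>", "==", "<=", ">=", "!=", "&&", "||", "++", "--",
            "+=", "-=", "*=", "/=", "->", "=", "<", ">", "!", "~", "?", ":", "+", "-", "*", "/", "%", "&", "|", "^");

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }          // white space: discarded
            if (src.startsWith("//", i)) {                                // end-of-line comment: discarded
                while (i < src.length() && src.charAt(i) != '\n' && src.charAt(i) != '\r') i++;
                continue;
            }
            if (src.startsWith("/*", i)) {                                // traditional comment: discarded
                int end = src.indexOf("*/", i + 2);
                if (end < 0) throw new IllegalArgumentException("unterminated comment");
                i = end + 2;
                continue;
            }
            int start = i;
            if (Character.isJavaIdentifierStart(c)) {                     // identifier, keyword, or literal word
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                String word = src.substring(start, i);
                Kind kind = KEYWORDS.contains(word) ? Kind.KEYWORD
                        : LITERAL_WORDS.contains(word) ? Kind.LITERAL : Kind.IDENTIFIER;
                tokens.add(new Token(kind, word));
            } else if (c >= '0' && c <= '9') {                            // decimal integer literal only
                while (i < src.length() && src.charAt(i) >= '0' && src.charAt(i) <= '9') i++;
                tokens.add(new Token(Kind.LITERAL, src.substring(start, i)));
            } else if (c == '"') {                                        // simple string literal
                i++;
                while (i < src.length() && src.charAt(i) != '"') {
                    if (src.charAt(i) == '\\') i++;                       // skip the character after a backslash
                    i++;
                }
                if (i >= src.length()) throw new IllegalArgumentException("unterminated string literal");
                i++;                                                      // closing quote
                tokens.add(new Token(Kind.LITERAL, src.substring(start, i)));
            } else {                                                      // separator or operator: longest match wins
                String best = null;
                Kind bestKind = null;
                for (String s : SEPARATORS) {
                    if (src.startsWith(s, i) && (best == null || s.length() > best.length())) { best = s; bestKind = Kind.SEPARATOR; }
                }
                for (String s : OPERATORS) {
                    if (src.startsWith(s, i) && (best == null || s.length() > best.length())) { best = s; bestKind = Kind.OPERATOR; }
                }
                if (best == null) throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
                tokens.add(new Token(bestKind, best));
                i += best.length();
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "if (count >= 10) { done = true; } // stop";
        for (Token t : tokenize(src)) {
            System.out.println(t.kind() + " " + t.text());
        }
    }
}
```

    For the sample input, if comes out as a KEYWORD, count and done as IDENTIFIERs, 10 and true as LITERALs, >= and = as OPERATORs, and the parentheses, braces, and semicolon as SEPARATORs; the trailing // comment produces no token. Classifying true as a literal rather than a keyword matches the BooleanLiteral production in the Java grammar.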

    In the lexical grammar, tokens are nonterminal symbols whose terminal symbols are characters. For example, the production for boolean literals is:

    BooleanLiteral:
           true
           false

    In the syntactic grammar, however, tokens are the terminal symbols. A parser, which analyzes the syntax of a program, takes the token stream as input and produces an abstract syntax tree (AST) as output.
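
    To make this division of labor concrete, here is a small recursive-descent parser in Java for a hypothetical subset of boolean expressions (true, false, identifiers, !, &&, ||, and parentheses). It is an illustrative sketch, not part of any real compiler: the class MiniParser, its grammar, and its AST record types are assumptions made for this example. It consumes an already-tokenized stream, so a BooleanLiteral such as true arrives as one terminal token rather than as the characters t, r, u, e. Precedence follows Java, where && binds tighter than ||.

```java
import java.util.Arrays;
import java.util.List;

public class MiniParser {

    // Hypothetical AST node types for a tiny boolean-expression language.
    interface Node {}
    record Lit(boolean value) implements Node {}
    record Var(String name) implements Node {}
    record Not(Node operand) implements Node {}
    record Binary(String op, Node left, Node right) implements Node {}

    private final List<String> tokens; // token stream from a lexer: the terminals of this grammar
    private int pos = 0;

    private MiniParser(List<String> tokens) { this.tokens = tokens; }

    // Or := And ('||' And)*
    private Node parseOr() {
        Node left = parseAnd();
        while (accept("||")) left = new Binary("||", left, parseAnd());
        return left;
    }

    // And := Unary ('&&' Unary)*
    private Node parseAnd() {
        Node left = parseUnary();
        while (accept("&&")) left = new Binary("&&", left, parseUnary());
        return left;
    }

    // Unary := '!' Unary | Primary
    private Node parseUnary() {
        if (accept("!")) return new Not(parseUnary());
        return parsePrimary();
    }

    // Primary := 'true' | 'false' | Identifier | '(' Or ')'
    private Node parsePrimary() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        String t = tokens.get(pos++);
        if (t.equals("true") || t.equals("false")) return new Lit(Boolean.parseBoolean(t)); // one terminal token
        if (t.equals("(")) {
            Node inner = parseOr();
            expect(")");
            return inner;
        }
        if (Character.isJavaIdentifierStart(t.charAt(0))) return new Var(t);
        throw new IllegalStateException("unexpected token: " + t);
    }

    private boolean accept(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) { pos++; return true; }
        return false;
    }

    private void expect(String expected) {
        if (!accept(expected)) throw new IllegalStateException("expected " + expected);
    }

    // Parses a complete token stream into an AST and rejects leftover tokens.
    static Node parse(List<String> tokens) {
        MiniParser p = new MiniParser(tokens);
        Node root = p.parseOr();
        if (p.pos != tokens.size()) throw new IllegalStateException("trailing tokens");
        return root;
    }

    // Renders the AST as a prefix S-expression so its shape is easy to see.
    static String show(Node n) {
        if (n instanceof Lit l) return String.valueOf(l.value());
        if (n instanceof Var v) return v.name();
        if (n instanceof Not x) return "(! " + show(x.operand()) + ")";
        Binary b = (Binary) n;
        return "(" + b.op() + " " + show(b.left()) + " " + show(b.right()) + ")";
    }

    public static void main(String[] args) {
        // Token stream for the expression: true || !false && ready
        List<String> tokens = Arrays.asList("true", "||", "!", "false", "&&", "ready");
        System.out.println(show(parse(tokens)));
    }
}
```

    Running main should print (|| true (&& (! false) ready)), showing that the parser grouped && before || purely from the token sequence, without ever looking at individual characters.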

    Permalink: https://www.haomeiwen.com/subject/cjsydxtx.html