美文网首页
CS536 Reading Notes: The Scanner

CS536 Reading Notes: The Scanner

作者: Nahnuj | 来源:发表于2014-09-10 06:32 被阅读26次

    The Scanner

    Overview

    • use a scanner generator such as lex or jlex
    • The input to a scanner includes one regular expression for each token

    Finite State Machine

    • S is the start state, each FSM has exactly one.
    • A is a final state, drawn using a double circle. A FSM may have more than one final state.
    • The finite-state machine stops when it gets stuck or when it has consumed all of the input characters.

    An input string is accepted by a FSM if

    * The entire string is consumed, and
    * The machine ends in a final state.
    

    Formal Definition

    An FSM is a 5-tuple: (Q,Σ,δ,q,F)
    • Q is a finite set of states ({S,A,B} in the above example).
    • Σ (an uppercase sigma) is the alphabet of the machine, a finite set of characters that label the edges ({+,−,0,1,...,9} in the above example).
    • q is the start state, an element of Q (S in the above example).
    • F is the set of final states, a subset of Q ({B} in the above example).
    • δ is the state transition relation: Q×Σ→Q (i.e., it is a function that takes two arguments -- a state in Q and a character in Σ -- and returns a state in Q).

    Regular Expressions

    Pay attention to the precedence of operators.
    "|" < "." < "*"
    

    for example: letter(letter|digit)* is different with
    letter.letter|digit*, which is (letter.letter)|(digit*).

    Using Regular Expressions and FSMs to Define a Scanner

    There is a theorem that says that for every regular expression, there is a finite-state machine that defines the same language, and vice versa.

    相关文章

      网友评论

          本文标题:CS536 Reading Notes: The Scanner

          本文链接:https://www.haomeiwen.com/subject/ligytttx.html