美文网首页
如何描述一种语言?

如何描述一种语言?

作者: AlphaHinex | 来源:发表于2022-04-24 09:05 被阅读0次

    原文地址:https://alphahinex.github.io/2022/04/24/bnf-ebnf/

    cover

    description: "BNF 和 EBNF 了解一下"
    date: 2022.04.24 08:26
    categories:
    - Others
    tags: [Compiler, Parser]
    keywords: BNF, EBNF, compiler, parser


    如何描述一种语言

    当我们需要定义一种语言的规范或语法时,需要将这种语言通过一种形式描述出来,用普通的文字来描述,会非常的繁琐,那么有什么工具来帮助我们描述这种语言的规范呢?

    BNF

    BNF 就是一种描述语法的规范,是 Backus–Naur form 或 Backus normal form 的缩写,其形式非常简单:

     <symbol> ::= __expression__
    

    在这里:

    • <symbol> 代表一个非终止符,通过 <> 来定义,如 <主语><rule>
    • __expression__ 由一个或多个符号组成,符号可以是一个终止符,也可以是一个非终止符;
    • ::= 代表一个非终止符的规则,左边的符号必须通过右边的表达式来替代,如 <rule1>::=__exp1__ <rule2><rule2>::=__exp2__,则 <rule1> 可以表示为 __exp1__ __exp2__
    • 多个符号可以通过 | 间隔,表示一个非终止符的可能值,如 <rule>::=__exp1__ | __exp2__
    • 从未出现在左边的符号代表终止符,意为可以终止替代操作;
    • 字面量可通过 "' 来定义。

    所以 BNF 自身语法中所使用到的符号,只有 <>|::="',简单,但却可以描述复杂的语法规则。

    关于 BNF 更详细的描述,可以参考 wikipedia,或其他相关资料。

    EBNF

    BNF 还有很多变种形式,EBNF 就是其中之一。

    EBNF 是 Extended Backus–Naur form 的缩写,对 BNF 的符号和语法进行了简化和扩充。

    较为混乱的是,EBNF 也有很多变种形式,国际标准化组织(International Organization for Standardization,简称 ISO)于 1996 年通过了一个 EBNF 的标准 ISO/IEC 14977,该标准中定义的 EBNF(下称 ISO EBNF)中可使用的符号如下:

    Usage Notation
    definition =
    concatenation ,
    termination ;
    alternation |
    optional [ ... ]
    repetition { ... }
    grouping ( ... )
    terminal string " ... "
    terminal string ' ... '
    comment (* ... *)
    special sequence ? ... ?
    exception -

    使用 ISO EBNF 描述整数的规则如下:

    digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
    digit                = "0" | digit excluding zero ;
    natural number = digit excluding zero, { digit } ;
    integer = "0" | [ "-" ], natural number ;
    

    遗憾的是,ISO EBNF 的出现,并没有统一 ENBF 的使用,即便在 ISO 自己的其他标准中,也并没有全部使用 ISO EBNF。

    用 EBNF 定义的语法,同样也可以通过 BNF 来描述,只不过使用 BNF 描述时会繁琐一些,例如重复就不太容易通过 BNF 来描述,需要借助中间规则来实现。

    关于 EBNF 更详细的描述,可以参考 wikipedia,或其他相关资料。

    实例

    让我们来看一些使用 BNF 或 EBNF 定义的语言实例。

    BNF

    BNF 自己的语法,可以使用 BNF 来描述:

     <syntax>         ::= <rule> | <rule> <syntax>
     <rule>           ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
     <opt-whitespace> ::= " " <opt-whitespace> | ""
     <expression>     ::= <list> | <list> <opt-whitespace> "|" <opt-whitespace> <expression>
     <line-end>       ::= <opt-whitespace> <EOL> | <line-end> <line-end>
     <list>           ::= <term> | <term> <opt-whitespace> <list>
     <term>           ::= <literal> | "<" <rule-name> ">"
     <literal>        ::= '"' <text1> '"' | "'" <text2> "'"
     <text1>          ::= "" | <character1> <text1>
     <text2>          ::= '' | <character2> <text2>
     <character>      ::= <letter> | <digit> | <symbol>
     <letter>         ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
     <digit>          ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
     <symbol>         ::=  "|" | " " | "!" | "#" | "$" | "%" | "&" | "(" | ")" | "*" | "+" | "," | "-" | "." | "/" | ":" | ";" | ">" | "=" | "<" | "?" | "@" | "[" | "\" | "]" | "^" | "_" | "`" | "{" | "}" | "~"
     <character1>     ::= <character> | "'"
     <character2>     ::= <character> | '"'
     <rule-name>      ::= <letter> | <rule-name> <rule-char>
     <rule-char>      ::= <letter> | <digit> | "-"
    

    BNF 的语法也可以用图来表示:

    BNF syntax diagram

    ISO EBNF

    使用 ISO EBNF 来描述自己:

    letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
           | "H" | "I" | "J" | "K" | "L" | "M" | "N"
           | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
           | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
           | "c" | "d" | "e" | "f" | "g" | "h" | "i"
           | "j" | "k" | "l" | "m" | "n" | "o" | "p"
           | "q" | "r" | "s" | "t" | "u" | "v" | "w"
           | "x" | "y" | "z" ;
    digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
    symbol = "[" | "]" | "{" | "}" | "(" | ")" | "<" | ">"
           | "'" | '"' | "=" | "|" | "." | "," | ";" ;
    character = letter | digit | symbol | "_" ;
     
    identifier = letter , { letter | digit | "_" } ;
    terminal = "'" , character , { character } , "'" 
             | '"' , character , { character } , '"' ;
     
    lhs = identifier ;
    rhs = identifier
         | terminal
         | "[" , rhs , "]"
         | "{" , rhs , "}"
         | "(" , rhs , ")"
         | rhs , "|" , rhs
         | rhs , "," , rhs ;
    
    rule = lhs , "=" , rhs , ";" ;
    grammar = { rule } ;
    

    ISO EBNF 语法图:

    ISO EBNF syntax diagram

    MySQL SQL Statement

    在 MySQL 的官方文档中,是通过 EBNF 来对 SQL 语句 进行描述的,比如 创建数据库 语句的语法:

    CREATE {DATABASE | SCHEMA} [IF NOT EXISTS] db_name
        [create_option] ...
    
    create_option: [DEFAULT] {
        CHARACTER SET [=] charset_name
      | COLLATE [=] collation_name
      | ENCRYPTION [=] {'Y' | 'N'}
    }
    

    更多语法实例可以查阅 官方文档

    H2 Grammar

    H2 Database 的 文档 中,包括 SQL 语句的 BNF 和铁轨图(railroad diagram)两个版本,比如 创建 SCHEMA

    CREATE SCHEMA [ IF NOT EXISTS ]
    { name [ AUTHORIZATION ownerName ] | [ AUTHORIZATION ownerName ] }
    [ WITH tableEngineParamName [,...] ]
    
    railroad diagram

    更多语法实例可以查阅 官方文档

    相关文章

      网友评论

          本文标题:如何描述一种语言?

          本文链接:https://www.haomeiwen.com/subject/vgwaertx.html