Spark SQL 2.3 Source Code Walkthrough - antlr4 && S

Author: sddyljsx | Published 2018-08-16 23:48

Continuing from the previous article, this chapter covers step 1: the SQL statement is parsed by SqlParser into an Unresolved Logical Plan.

When we execute:

    val sqlDF = spark.sql("SELECT name FROM people order by name")
    

let's look at the sql function:

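    // SparkSession.sql: parse the text into a LogicalPlan, then wrap it in a DataFrame
    def sql(sqlText: String): DataFrame = {
      Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
    }

    // Declared in ParserInterface, the trait behind sessionState.sqlParser
    def parsePlan(sqlText: String): LogicalPlan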
    

The parsePlan function turns the SQL statement into a LogicalPlan. The concrete parser is SparkSqlParser:

    class SparkSqlParser(conf: SQLConf) extends AbstractSqlParser {
      val astBuilder = new SparkSqlAstBuilder(conf)
    
      private val substitutor = new VariableSubstitution(conf)
    
      protected override def parse[T](command: String)(toResult: SqlBaseParser => T): T = {
        super.parse(substitutor.substitute(command))(toResult)
      }
    }
    
    // The two methods below are defined in the parent class AbstractSqlParser.

    /** Creates LogicalPlan for a given SQL string. */
    override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
      astBuilder.visitSingleStatement(parser.singleStatement()) match {
        case plan: LogicalPlan => plan
        case _ =>
          val position = Origin(None, None)
          throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
      }
    }
    
      protected def parse[T](command: String)(toResult: SqlBaseParser => T): T = {
        logDebug(s"Parsing command: $command")
    
        val lexer = new SqlBaseLexer(new UpperCaseCharStream(CharStreams.fromString(command)))
        lexer.removeErrorListeners()
        lexer.addErrorListener(ParseErrorListener)
    
        val tokenStream = new CommonTokenStream(lexer)
        val parser = new SqlBaseParser(tokenStream)
        parser.addParseListener(PostProcessor)
        parser.removeErrorListeners()
        parser.addErrorListener(ParseErrorListener)
    
        try {
          try {
            // first, try parsing with potentially faster SLL mode
            parser.getInterpreter.setPredictionMode(PredictionMode.SLL)
            toResult(parser)
          }
          catch {
            case e: ParseCancellationException =>
              // if we fail, parse with LL mode
              tokenStream.seek(0) // rewind input stream
              parser.reset()
    
              // Try Again.
              parser.getInterpreter.setPredictionMode(PredictionMode.LL)
              toResult(parser)
          }
        }
        catch {
          case e: ParseException if e.command.isDefined =>
            throw e
          case e: ParseException =>
            throw e.withCommand(command)
          case e: AnalysisException =>
            val position = Origin(e.line, e.startPosition)
            throw new ParseException(Option(command), e.message, position, position)
        }
      }
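
The nested try/catch in parse follows a two-stage pattern: first attempt the faster SLL prediction mode, and if that attempt is cancelled, rewind the input and retry in full LL mode. Below is a minimal sketch of that control flow in plain Scala, with no antlr4 dependency. Every name in it (`ToyParser`, `Fast`, `Full`, `FastModeGaveUp`) is a hypothetical stand-in invented for illustration, and the rule that "fast mode gives up on a parenthesis" is purely an assumption made to trigger the fallback; it is not how antlr4 decides.

```scala
object TwoStageParseSketch {
  // Hypothetical stand-ins for PredictionMode.SLL and PredictionMode.LL.
  sealed trait Mode
  case object Fast extends Mode
  case object Full extends Mode

  // Hypothetical stand-in for antlr4's ParseCancellationException.
  final class FastModeGaveUp(msg: String) extends RuntimeException(msg)

  // Toy "parser": a cursor over whitespace-separated tokens plus a mode flag.
  final class ToyParser(tokens: Vector[String]) {
    var mode: Mode = Fast
    private var pos = 0

    // Rewind to the first token, like tokenStream.seek(0) plus parser.reset().
    def reset(): Unit = { pos = 0 }

    // Consume every token. Assumption for the demo: in Fast mode a "("
    // is treated as too ambiguous, so the attempt is abandoned.
    def statement(): List[String] = {
      val out = List.newBuilder[String]
      while (pos < tokens.length) {
        val t = tokens(pos)
        pos += 1
        if (mode == Fast && t == "(") {
          throw new FastModeGaveUp(s"gave up at token $pos")
        }
        out += t
      }
      out.result()
    }
  }

  // Same curried shape as parse[T](command)(toResult): the caller decides
  // which rule to invoke, this method owns setup and the retry logic.
  def parse[T](command: String)(toResult: ToyParser => T): T = {
    val parser = new ToyParser(command.trim.split("\\s+").toVector)
    try {
      parser.mode = Fast // first, try the cheaper strategy
      toResult(parser)
    } catch {
      case _: FastModeGaveUp =>
        parser.reset()     // rewind the input
        parser.mode = Full // retry with the more general strategy
        toResult(parser)
    }
  }
}
```

Note that `toResult` runs twice when the first attempt fails, which is why the input has to be rewound before the retry. Calling `TwoStageParseSketch.parse("SELECT ( a )")(_.statement())` goes through the fallback path, while a command without parentheses succeeds on the first attempt.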
    

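A close reading of the parse function shows that the heavy lifting is done by SqlBaseLexer (together with SqlBaseParser) and SparkSqlAstBuilder, all of which are antlr4-related code. The next section introduces antlr4 in detail.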
