Notes on Spark: Configuring a Spark Development Environment in IDEA on Mac

Author: Andrew0000 | Published 2019-05-08 19:49

Installing IDEA and the JDK

Download the appropriate version of IDEA from the official site and install it; installing the JDK is just as simple — download it from the official site and click through the installer.

Installing Scala and Spark

  1. Open IDEA. On the Welcome screen, go to Configure -> Plugins, search for Scala, and install the plugin.
  2. Back on the Welcome screen, create a new Project, choose a Scala project (using the plugin installed above), and select IDEA as the build system. (sbt is not recommended; it keeps fetching dependencies and the downloads are very slow.)
  3. For downloading and installing Spark itself, see my previous post.
  4. After opening the project, go to File -> Project Structure -> Libraries, click the + button, and add the jars from the Spark installation directory. This completes the Spark configuration in IDEA (a quick sanity check is sketched after this list).
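
To sanity-check that the jars are wired up correctly, you can run a tiny job before writing any streaming code. This is only a minimal sketch (the object name VersionCheck is illustrative, not from the original setup):

import org.apache.spark.{SparkConf, SparkContext}

object VersionCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("VersionCheck")
    val sc = new SparkContext(conf)
    // Should print the version of the Spark jars added under Project Structure.
    println(s"Spark version: ${sc.version}")
    println(s"Scala: ${scala.util.Properties.versionString}")
    sc.stop()
  }
}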

Testing the Spark Development Environment

Under src/main, create a new Scala Object file with the following test code:

import org.apache.spark._
import org.apache.spark.streaming._

object StreamingTest1 {
  def main(args: Array[String]) {
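    // local[2]: a local streaming job needs at least two threads, since the socket receiver occupies one of them.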
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val sc = new SparkContext(conf)
    // Added on top of the official example: set the log level to WARN to avoid a flood of INFO messages.
    // As an aside, handling Spark logging from Python is almost identical.
    sc.setLogLevel("WARN")
    // To make the word counts on the stream easier to follow, set the batch interval to 5 seconds.
    val ssc = new StreamingContext(sc, Seconds(5))
    
    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    ssc.start()             // Start the computation
    ssc.awaitTermination()  // Wait for the computation to terminate

  }
}

Run the file; the console starts printing batch output.

In a terminal, run: nc -lk 9999 (listen on port 9999 and keep listening) and then type: hhh hhh aa aa cc

The result:

-------------------------------------------
Time: 1557314380000 ms
-------------------------------------------
(aa,2)
(hhh,2)
(cc,1)
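
Note that each 5-second batch is counted independently; the totals do not accumulate across batches. If you type the same words again after a batch boundary, they show up as a fresh count (keeping a running total would require a stateful operation such as updateStateByKey).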

Addendum

If you want to fix the development log level once and for all, create a log4j.properties file under the src/main directory.

Enter the following content:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
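
One caveat: log4j only picks this file up if it is on the classpath, so mark the directory containing it as a resources root in IDEA (e.g. src/main/resources). Alternatively, point the JVM at it directly by adding -Dlog4j.configuration=file:/path/to/log4j.properties to the run configuration's VM options (the path here is a placeholder).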

Summary

Compared with configuring a Python Spark development environment in PyCharm, configuring a Scala Spark environment in IDEA makes it easier to read the Spark source code and follow the logic.

The configuration itself is also straightforward: install JDK, IDEA, Scala, and Spark in turn and wire them together. Note also that you can install Scala and Spark yourself rather than relying on what IDEA bundles; all that remains is the configuration.

Finally, whether it's PyCharm or IDEA, these are IDE tools meant to ease development and testing: focus on the logic and let the tools handle the management for us.
