Installing IDEA and the JDK
Download the appropriate IDEA edition from the official website and install it; the JDK is likewise a download-from-the-official-site, click-next installation.
Installing Scala and Spark
- Open IDEA, go to Configure -> Plugins on the Welcome screen, search for Scala, and install the plugin.
- Back on the Welcome screen, create a new Project, choose a Scala project (using the plugin installed above), and pick IDEA as the build system. (sbt is not recommended: it keeps resolving dependencies and the downloads are very slow.)
- For downloading and installing Spark itself, see my previous post.
After opening the project, go to
File -> Project Structure -> Libraries
and click the plus sign to add the jars under the Spark installation directory. This completes the Spark configuration in IDEA.
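To verify that the jars really are on the classpath, a minimal sanity check can be run before moving on (the object name SparkVersionCheck is just an illustrative choice):

import org.apache.spark.{SparkConf, SparkContext}

object SparkVersionCheck {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process on all available cores.
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("VersionCheck"))
    // If the jars were added correctly, this compiles and prints the installed Spark version.
    println(s"Spark version: ${sc.version}")
    sc.stop()
  }
}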
Testing the Spark Development Environment
Create a new Object file under src/main; the test code is as follows:
import org.apache.spark._
import org.apache.spark.streaming._

object StreamingTest1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val sc = new SparkContext(conf)
    // Added on top of the official example: raise the log level to WARN
    // so the console is not flooded with INFO messages.
    // Incidentally, the Python API handles Spark logging almost identically.
    sc.setLogLevel("WARN")
    // Use a 5-second batch interval so the word counts over the stream are easy to follow.
    val ssc = new StreamingContext(sc, Seconds(5))
    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    ssc.start()             // Start the computation
    ssc.awaitTermination()  // Wait for the computation to terminate
  }
}
Run the file; the console starts printing output for each batch.
In a terminal, open a listening socket: nc -lk 9999
Then type into it: hhh hhh aa aa cc
The result:
-------------------------------------------
Time: 1557314380000 ms
-------------------------------------------
(aa,2)
(hhh,2)
(cc,1)
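As an aside, if nc is not available, Spark Streaming's queueStream can feed test data from inside the program instead of from a socket. A minimal sketch under that assumption (the object and variable names are illustrative):

import scala.collection.mutable
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object QueueStreamTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("QueueStreamTest"))
    sc.setLogLevel("WARN")
    val ssc = new StreamingContext(sc, Seconds(5))

    // A mutable queue of RDDs acts as the stream source, so no socket is needed.
    val rddQueue = new mutable.Queue[RDD[String]]()
    val wordCounts = ssc.queueStream(rddQueue)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    // Push one batch of test data, then let a few 5-second batches elapse before stopping.
    rddQueue += sc.makeRDD(Seq("hhh hhh aa aa cc"))
    Thread.sleep(15000)
    ssc.stop()
  }
}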
Supplement
If you want to fix the development log level once and for all, create a log4j.properties file under src/main.
Its content:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
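For reference, the same quieting can also be done programmatically with the log4j 1.x API that Spark bundles, placed at the top of main before the SparkContext is created; a minimal sketch mirroring a few of the entries above:

import org.apache.log4j.{Level, Logger}

// Equivalent in spirit to log4j.rootCategory=WARN in the properties file.
Logger.getRootLogger.setLevel(Level.WARN)
// Quiet the same noisy third-party loggers named above.
Logger.getLogger("org.spark-project.jetty").setLevel(Level.WARN)
Logger.getLogger("org.apache.parquet").setLevel(Level.ERROR)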
Summary
Compared with configuring a Python Spark development environment in PyCharm, configuring a Scala Spark environment in IDEA makes it easier to read the Spark source code and follow its logic.
The configuration is also simple and clear: install JDK, IDEA, Scala, and Spark in turn and connect them to each other. Note that you can install Scala and Spark yourself rather than using the versions bundled with IDEA; all that remains is the configuration.
Finally, PyCharm and IDEA are both IDE tools that make development and testing easier; focus on the logic and let the tool do the housekeeping.