Installing IDEA and the JDK
Download the appropriate IDEA edition from the official website and install it; the JDK is likewise a download-from-the-official-site, click-next installation.
Installing Scala and Spark
- Open IDEA, go to Configure -> Plugins on the Welcome screen, search for Scala, and install the plugin.
- Back on the Welcome screen, create a new Project, choose a Scala project (using the plugin installed above), and pick IDEA as the build system. (sbt is not recommended: it keeps resolving dependencies and the downloads are very slow.)
- For downloading and installing Spark itself, see my previous post.
After opening the project, go to
File -> Project Structure -> Libraries
and click the plus sign to add the jars under the Spark installation directory. This completes the Spark configuration in IDEA.
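To verify that the jars really are on the classpath, a minimal sanity check can be run before moving on (the object name SparkVersionCheck is just an illustrative choice):

import org.apache.spark.{SparkConf, SparkContext}

object SparkVersionCheck {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process on all available cores.
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("VersionCheck"))
    // If the jars were added correctly, this compiles and prints the installed Spark version.
    println(s"Spark version: ${sc.version}")
    sc.stop()
  }
}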
Testing the Spark Development Environment
Create a new Object file under src/main; the test code is as follows:
import org.apache.spark._
import org.apache.spark.streaming._

object StreamingTest1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val sc = new SparkContext(conf)
    // Added on top of the official example: raise the log level to WARN
    // so the console is not flooded with INFO messages.
    // Incidentally, the Python API handles Spark logging almost identically.
    sc.setLogLevel("WARN")
    // Use a 5-second batch interval so the word counts over the stream are easy to follow.
    val ssc = new StreamingContext(sc, Seconds(5))
    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    ssc.start()             // Start the computation
    ssc.awaitTermination()  // Wait for the computation to terminate
  }
}
Run the file; the console starts printing output for each batch.
In a terminal, open a listening socket: nc -lk 9999
Then type into it: hhh hhh aa aa cc
The result:
-------------------------------------------
Time: 1557314380000 ms
-------------------------------------------
(aa,2)
(hhh,2)
(cc,1)
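As an aside, if nc is not available, Spark Streaming's queueStream can feed test data from inside the program instead of from a socket. A minimal sketch under that assumption (the object and variable names are illustrative):

import scala.collection.mutable
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object QueueStreamTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("QueueStreamTest"))
    sc.setLogLevel("WARN")
    val ssc = new StreamingContext(sc, Seconds(5))

    // A mutable queue of RDDs acts as the stream source, so no socket is needed.
    val rddQueue = new mutable.Queue[RDD[String]]()
    val wordCounts = ssc.queueStream(rddQueue)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    // Push one batch of test data, then let a few 5-second batches elapse before stopping.
    rddQueue += sc.makeRDD(Seq("hhh hhh aa aa cc"))
    Thread.sleep(15000)
    ssc.stop()
  }
}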
Supplement
If you want to fix the development log level once and for all, create a log4j.properties file under src/main.
Its content:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
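For reference, the same quieting can also be done programmatically with the log4j 1.x API that Spark bundles, placed at the top of main before the SparkContext is created; a minimal sketch mirroring a few of the entries above:

import org.apache.log4j.{Level, Logger}

// Equivalent in spirit to log4j.rootCategory=WARN in the properties file.
Logger.getRootLogger.setLevel(Level.WARN)
// Quiet the same noisy third-party loggers named above.
Logger.getLogger("org.spark-project.jetty").setLevel(Level.WARN)
Logger.getLogger("org.apache.parquet").setLevel(Level.ERROR)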
Summary
Compared with configuring a Python Spark development environment in PyCharm, configuring a Scala Spark environment in IDEA makes it easier to read the Spark source code and follow its logic.
The configuration is also simple and clear: install JDK, IDEA, Scala, and Spark in turn and connect them to each other. Note that you can install Scala and Spark yourself rather than using the versions bundled with IDEA; all that remains is the configuration.
Finally, PyCharm and IDEA are both IDE tools that make development and testing easier; focus on the logic and let the tool do the housekeeping.