Implementing WordCount in Spark with Scala
Create a new project, selecting sbt 1.0.4 and Scala 2.11.8.
Configure the paths: mark the source directories under Project Sources and add the Spark jars under Dependencies.
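The post does not show the dependency configuration itself; a minimal build.sbt for this setup (a sketch, assuming the Spark 2.1.0 / Scala 2.11.8 combination used on the server below and the jar name from the submit command) might look like this:

name := "SparkScalaWork"
version := "1.0"
scalaVersion := "2.11.8"
// Marked "provided" so spark-core is not bundled into the exported jar;
// the cluster's own Spark installation supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"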
Create a new object: MyScalaWordCount.scala.
Local mode
import org.apache.spark.{SparkConf, SparkContext}

object MyScalaWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyScalaWordCount").setMaster("local")
    // Create a SparkContext
    val sc = new SparkContext(conf)
    // Run the WordCount pipeline: split lines into words, pair each word with 1, sum by word
    val result = sc.textFile("hdfs://bigdata02:9000/wordcount.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    // Print the results to the console
    result.foreach(println)
    // Release resources
    sc.stop()
  }
}
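To make the pipeline easier to follow, here is a small self-contained sketch (hypothetical sample data, not from the original) that runs the same three transformations on an in-memory collection and notes what each step produces:

import org.apache.spark.{SparkConf, SparkContext}

object PipelineDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PipelineDemo").setMaster("local[*]"))
    // Hypothetical lines standing in for wordcount.txt
    val lines  = sc.parallelize(Seq("hello world", "hello spark"))
    val words  = lines.flatMap(_.split(" "))  // "hello", "world", "hello", "spark"
    val pairs  = words.map((_, 1))            // ("hello",1), ("world",1), ("hello",1), ("spark",1)
    val counts = pairs.reduceByKey(_ + _)     // ("hello",2), ("world",1), ("spark",1)
    counts.foreach(println)
    sc.stop()
  }
}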
Exporting the jar and running it on the server
Adjust MyScalaWordCount.scala for cluster execution, then generate the jar:
import org.apache.spark.{SparkConf, SparkContext}

object MyScalaWordCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: MyScalaWordCount <input> <output>")
      sys.exit(1)
    }
    val conf = new SparkConf().setAppName("MyScalaWordCount")
    // Create a SparkContext
    val sc = new SparkContext(conf)
    // Run the WordCount pipeline
    val result = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    // Save the result to the output path. (The original printed to the console,
    // but the submit command below passes an HDFS output directory and the result
    // is read back from part-00000, so it must be written with saveAsTextFile.)
    result.saveAsTextFile(args(1))
    // Release resources
    sc.stop()
  }
}
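setMaster is deliberately omitted here: the master URL is supplied by spark-submit's --master flag at launch time. If you also want to launch this same object from the IDE, one possible pattern (a sketch, not from the original post) is to fall back to a local master only when none was provided:

val conf = new SparkConf().setAppName("MyScalaWordCount")
// Fall back to a local master only when spark-submit did not set one
if (!conf.contains("spark.master")) {
  conf.setMaster("local[*]")
}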
Packaging
In Project Structure, configure Project Sources and add an Artifact for the jar (see the referenced steps), then run Build Artifacts.
The jar is exported successfully once the build finishes.
Upload the jar to the server and run it; the first argument is the HDFS input file and the second is the output directory:
cd /home/bigdata/apps/spark-2.1.0-bin-hadoop2.7
./bin/spark-submit --master spark://bigdata02:7077 --class nx.MyScalaWordCount /home/bigdata/data/SparkScalaWork.jar hdfs://bigdata02:9000/wordcount.txt hdfs://bigdata02:9000/output/spark/wc0928
hdfs dfs -cat /output/spark/wc0928/part-00000