二种方法实现Spark计算WordCount

作者: printf200 | 来源:发表于2019-04-18 08:16 被阅读0次

二种方法实现Spark计算WordCount
二种方法实现Spark计算WordCount
二种方法实现Spark计算WordCount
二种方法实现Spark计算WordCount
二种方法实现Spark计算WordCount
史上最快! 10小时大数据入门实战(九)- 前沿技术拓展Spar
大数据面试
PySpark的使用
Spark 实现WordCount
Spark | WordCount

1.spark-shell
val lines = sc.textFile("hdfs://spark1:9000/spark.txt")
val words = lines.flatMap(line => line.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.foreach(wordcount => println(wordcount._1 + " appeared " + wordcount._2 + " times"))

2.Scala for idea
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.0</version>
</dependency>

package cn.spark.study.core

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {

def main(args: Array[String]) {
val conf = new SparkConf()
.setAppName("WordCount")
.setMaster("spark://hadoop:7077");
//.setMaster("local[2]");//本地运行（windows）
val sc = new SparkContext(conf)

val lines = sc.textFile(args(0), 1);
val words = lines.flatMap { line => line.split(" ")}
val pairs = words.map {word => (word, 1)}
val wordCount = pairs.reduceByKey(_ + _)
wordCount.foreach(wordCount => println(wordCount._1 + " appeared " + wordCount._2 + " times"))

}
}

最后，需要使用spark submit提交到spark集群中进行运行，执行脚本如下：
/usr/local/spark/bin/spark-submit
--class cn.spark.study.core.WordCount
/usr/local/spark-study/scala/wordcount.jar
/root/test.txt
~

注意：需要停止spark-shell，否则可能出现内存不足错误（Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources）