How to submit a Scala program to Spark
1. In spark-shell, type Scala code interactively, just as you would in a Python interactive session; a variable named sc, representing the SparkContext, is already defined for you (see the sketch after this list).
2. Use a command such as spark-submit --class Hello HelloWorld.jar to submit a jar, where Hello is the entry class and HelloWorld.jar is the packaged jar; here you specify the entry class yourself, and the program inside the jar must declare its own SparkContext variable.
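As a minimal sketch of the first approach, these are the kinds of lines you could type directly at the spark-shell prompt, relying on the pre-defined sc (the input path is only an illustrative assumption):

// typed interactively in spark-shell; sc already exists
val lines = sc.textFile("input/word.txt")   // hypothetical input path
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.collect().foreach(println)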
How to submit a Python program to Spark
1. Same as above, except you open pyspark instead of spark-shell.
2. Same as above, except you simply submit the xxx.py file:
spark-submit xxx.py
Writing a WordCount program in Scala and packaging it as a jar
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // setMaster sets the master URL; "local" means run locally, and 2 is the number of threads
    val sparkConf: SparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    val lines: RDD[String] = sc.textFile("C:/java/spark_practise/src/main/resources/input/word.txt")
    wordCount1(lines)
    sc.stop()
  }

  def wordCount1(lines: RDD[String]): Unit = {
    val words: RDD[String] = lines.flatMap(_.split(" "))
    val wordToOne: RDD[(String, Int)] = words.map((_, 1))
    val wordToCount: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
    wordToCount.foreach(println)
  }
}
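Note that when the jar is submitted to a cluster with spark-submit, hardcoding setMaster("local[2]") would override the --master flag given on the command line. A minimal sketch of a submission-friendly variant (the class name and argument layout are only assumptions for illustration) could look like this:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountSubmit {  // hypothetical entry class for spark-submit --class WordCountSubmit
  def main(args: Array[String]): Unit = {
    // no setMaster here: the master URL is supplied by spark-submit (--master local[2], yarn, ...)
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    sc.textFile(args(0))          // input path taken from the command line
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))    // output directory taken from the command line
    sc.stop()
  }
}

The jar would then be submitted with something like spark-submit --class WordCountSubmit wordcount.jar <input> <output>.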
Python script
import sys

from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    # create the Spark context with a Spark configuration
    conf = SparkConf().setAppName("Word Count - Python") \
        .set("spark.hadoop.yarn.resourcemanager.address", "192.168.0.104:8032")
    sc = SparkContext(conf=conf)
    # read in the text file and split each line into words
    words = sc.textFile("C:/java/spark_practise/src/main/resources/input/word.txt") \
        .flatMap(lambda line: line.split(" "))
    # count the occurrences of each word
    wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
    print("spark python output................")
    print(wordCounts.collect())
    sc.stop()