Submitting a First Spark Job (Scala and Python)

Author: 抬头挺胸才算活着 | Published 2021-12-16 08:47
  • Ways to submit a Scala program to Spark
    1. Type Scala code interactively in spark-shell, much like a Python interactive session; a variable named sc, representing the SparkContext, is created for you in advance (see the sketch after these lists).
    2. Submit a packaged jar with a command such as spark-submit --class Hello HelloWorld.jar, where Hello is the entry class and HelloWorld.jar is the packaged jar; a program submitted this way must create its own SparkContext.

  • Ways to submit a Python program to Spark
    1. Same as above, but open pyspark instead of spark-shell; sc is likewise pre-created.
    2. Same as above, but submit a .py file instead of a jar:
    spark-submit xxx.py
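
For the spark-shell route, a minimal interactive word count might look like the sketch below. The input path is only an illustrative placeholder, not taken from the original post.

// Inside spark-shell a SparkContext is already available as `sc`,
// so no SparkConf/SparkContext setup is needed.
val counts = sc.textFile("input/word.txt")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
counts.collect().foreach(println)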

  • Writing a WordCount program in Scala and packaging it as a jar

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // setMaster sets the master URL: "local" runs Spark locally, and [2] uses two threads
    val sparkConf: SparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
    val sc = new SparkContext(sparkConf)

    val lines: RDD[String] = sc.textFile("C:/java/spark_practise/src/main/resources/input/word.txt")
    wordCount1(lines)
    sc.stop()
  }

  def wordCount1(lines : RDD[String]): Unit = {
    val words: RDD[String] = lines.flatMap(_.split(" "))
    val wordToOne: RDD[(String, Int)] = words.map((_, 1))
    val wordToCount: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
    wordToCount.foreach(println(_))
  }
}
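
To produce the jar that spark-submit expects, a build definition along the following lines could be used. This is only a sketch: the project name, Scala version, and Spark version are assumptions, since the original post does not state them.

// build.sbt (hypothetical project name and versions)
name := "HelloWorld"
version := "0.1"
scalaVersion := "2.12.15"
// "provided": compile against spark-core, but do not bundle it into the jar,
// because spark-submit supplies the Spark classes at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.2" % "provided"

Running sbt package then leaves the jar under target/scala-2.12/, and it can be submitted with the spark-submit --class WordCount ... form described earlier.
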
  • Python script
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":

  # create Spark context with Spark configuration
  conf = SparkConf().setAppName("Word Count - Python").set("spark.hadoop.yarn.resourcemanager.address", "192.168.0.104:8032")
  sc = SparkContext(conf=conf)

  # read in text file and split each document into words
  words = sc.textFile("C:/java/spark_practise/src/main/resources/input/word.txt").flatMap(lambda line: line.split(" "))

  # count the occurrence of each word
  wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

  print("spark python output................")
  print(wordCounts.collect())
