Tools for running Spark jobs
I. spark-submit
Analogous to the hadoop jar command, which submits a MapReduce job (a jar file).
Run the official examples; the sources live under /home/bigdata/apps/spark-2.1.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/
SparkPi.scala estimates Pi with a Monte Carlo method.
org.apache.spark.examples.SparkPi
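For intuition, a minimal sketch of the Monte Carlo idea (this is not the official SparkPi source; the sample count n is chosen arbitrarily here, and sc is the SparkContext that spark-shell provides):

// sample random points in the unit square; the fraction that lands
// inside the quarter circle approaches Pi/4
val n = 100000
val count = sc.parallelize(1 to n).map { _ =>
  val x = math.random
  val y = math.random
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * count / n}")

To run the shipped example on the cluster: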
cd /home/bigdata/apps/spark-2.1.0-bin-hadoop2.7
./bin/spark-submit --master spark://bigdata02:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.1.0.jar 200
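Here --master points at the standalone master, --class names the main class inside the example jar, and the trailing 200 is SparkPi's own argument: the number of slices (partitions) used for sampling.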
II. spark-shell
Similar to the Scala REPL: an interactive command line for Spark.
1. Local mode
Start it with:
./bin/spark-shell
2. Cluster mode
bin/spark-shell --master spark://bigdata02:7077
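In both modes the shell pre-creates a SparkContext bound to the variable sc (and, in Spark 2.x, a SparkSession bound to spark); the examples below use sc directly.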
Example:
hdfs dfs -cat /wordcount.txt
sc.textFile("hdfs://bigdata02:9000/wordcount.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
sc.textFile("hdfs://bigdata02:9000/wordcount.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("hdfs://bigdata02:9000/output/spark/wc111")
hdfs dfs -cat /output/spark/wc111/part-00000
hdfs dfs -cat /output/spark/wc111/part-00001
Two part files appear because the final RDD has two partitions, and each part-NNNNN file is written from one partition. To produce a single output file, repartition down to one partition before saving:
sc.textFile("hdfs://bigdata02:9000/wordcount.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).repartition(1).saveAsTextFile("hdfs://bigdata02:9000/output/spark/wc112")
hdfs dfs -cat /output/spark/wc112/part-00000
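A variant worth knowing: since this only shrinks the partition count, coalesce(1) yields the same single-file output without a full shuffle (wc113 is a hypothetical output path):

sc.textFile("hdfs://bigdata02:9000/wordcount.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).coalesce(1).saveAsTextFile("hdfs://bigdata02:9000/output/spark/wc113")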
Running the same pipeline one statement at a time:
scala> val rdd1 = sc.textFile("hdfs://bigdata02:9000/wordcount.txt")
scala> rdd1.collect
scala> val rdd2 = rdd1.flatMap(_.split(" "))
scala> rdd2.collect
scala> val rdd3 = rdd2.map((_,1))
scala> rdd3.collect
scala> val rdd4 = rdd3.reduceByKey(_+_)
scala> rdd4.collect
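Note that textFile, flatMap, map, and reduceByKey are lazy transformations; nothing executes until an action such as collect runs. The lineage built so far can be inspected with:

scala> rdd4.toDebugString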