Developing Spark Applications Locally
The code is as follows:
package sparksql

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by IBM on 2017/4/15.
 */
object SparkSqlLearn extends App {
  // Point the driver at the remote standalone master and ship our jar to the cluster.
  val conf = new SparkConf().setMaster("spark://192.168.137.10:7077").setAppName("SparkSql")
    .setJars(List("D:\\java\\idea\\SparkLearn\\out\\artifacts\\SparkLearn_jar\\SparkLearn.jar"))
  // For a purely local run, use this instead:
  // val conf = new SparkConf().setMaster("local").setAppName("SparkSql")
  val sc = new SparkContext(conf)

  // A simple word count: split each line into words, pair each word with 1,
  // then sum the counts per word.
  val data = Array("app app", "asd app", "demo llp", "demo")
  val re: RDD[(String, Int)] = sc.parallelize(data)
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)

  // Pull the results back to the driver and print them.
  val re_array = re.collect()
  for ((key, value) <- re_array) {
    println(key + " value is " + value)
  }
  println(re.count()) // number of distinct words
  println("hello")

  sc.stop()
}
The first thing to note above is setMaster("spark://192.168.137.10:7077"), which points the driver at the remote Spark standalone master.
The other is .setJars(List("D:\\java\\idea\\SparkLearn\\out\\artifacts\\SparkLearn_jar\\SparkLearn.jar")), which tells the Spark cluster where the code for the job being submitted lives, i.e. the path to the jar that packages our program. Make sure the path contains no Chinese characters.
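Since Spark 2.x, SparkSession is the preferred entry point. For reference, a minimal sketch of the same word count written against it, assuming the same master URL and jar path as above (the object name SparkSqlSessionSketch is hypothetical, and spark.jars plays the same role as setJars on SparkConf):

import org.apache.spark.sql.SparkSession

object SparkSqlSessionSketch extends App {
  val spark = SparkSession.builder()
    .master("spark://192.168.137.10:7077")   // same assumed master as above
    .appName("SparkSql")
    // spark.jars is the config equivalent of SparkConf.setJars
    .config("spark.jars", "D:\\java\\idea\\SparkLearn\\out\\artifacts\\SparkLearn_jar\\SparkLearn.jar")
    .getOrCreate()

  // Same word count, driven through the session's underlying SparkContext.
  val counts = spark.sparkContext
    .parallelize(Seq("app app", "asd app", "demo llp", "demo"))
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)

  counts.collect().foreach { case (word, n) => println(word + " value is " + n) }
  spark.stop()
}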
Configure the jar packaging:
(screenshot: IDEA artifact settings) Be sure to check Include in project build.
Then run a build to produce the jar; the resulting jar is shown below:
(screenshot: the generated jar) Now run the code; the output is as follows:
(screenshot: console output of the job)
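For reference, the expected result can be read straight off the input array: "app" appears three times, "demo" twice, and "asd" and "llp" once each, while re.count() returns the 4 distinct words. Modulo the non-deterministic ordering of reduceByKey output, the console should show something like:

app value is 3
asd value is 1
demo value is 2
llp value is 1
4
hello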