macOS ships with Java 10 by default, but Spark needs Java 8.
If you install things with a proxy enabled, you may get errors complaining about SSL certificates; turn the proxy off first (the proxy has burned me more times than I can count).
0x01 Install Java 8
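One way to get it, assuming Homebrew is installed and the AdoptOpenJDK tap still exists under that name (cask names change over time; the plain .dmg installer from Oracle/AdoptOpenJDK works just as well):

brew tap adoptopenjdk/openjdk
brew cask install adoptopenjdk8
# list every JDK macOS knows about, to confirm 1.8 showed up
/usr/libexec/java_home -V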
0x02 Edit .zshrc
export JAVA_8_HOME=`/usr/libexec/java_home -v 1.8`
alias jdk8="export JAVA_HOME=$JAVA_8_HOME"
Run jdk8, then java -version should report 1.8.
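If you want to switch back and forth, a slightly fuller .zshrc sketch (assuming a Java 10 install is also present) looks like:

export JAVA_8_HOME=`/usr/libexec/java_home -v 1.8`
export JAVA_10_HOME=`/usr/libexec/java_home -v 10`
alias jdk8="export JAVA_HOME=$JAVA_8_HOME"
alias jdk10="export JAVA_HOME=$JAVA_10_HOME"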
0x03 Download Spark and add it to PATH
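For example, assuming the prebuilt spark-2.3.1-bin-hadoop2.7 tarball was unpacked into the home directory, the .zshrc additions would be roughly:

export SPARK_HOME=~/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin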
0x04 Run spark-shell
If you land in the interactive shell, you're basically fine (if it errors out, it's probably the Java version).
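A quick sanity check (spark-shell defaults to a local[*] master, but it can be pinned explicitly):

spark-shell --master 'local[2]'

Then, at the scala> prompt, something like spark.range(1000).count() should come back with Long = 1000.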
0x05 Run Spark Scala under sbt
Download Scala, unpack it, and add it to PATH. With Spark 2.x the Scala version must be 2.11.x (an infuriating gotcha).
scala -version
2.11.0
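The PATH setup is the same idea as for Spark; assuming the Scala tarball was unpacked to ~/scala-2.11.0:

export SCALA_HOME=~/scala-2.11.0
export PATH=$PATH:$SCALA_HOME/bin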
build.sbt for Spark 2.3.1:
name := "sparkLearning"
version := "1.0"
sbtVersion := "1.1.2"
scalaVersion := "2.11.0"
val sparkVersion = "2.3.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.scala-lang" % "scala-library" % "2.11.0"
)
build.sbt for Spark 2.2.0 (identical except for sparkVersion):
name := "sparkLearning"
version := "1.0"
sbtVersion := "1.1.2"
scalaVersion := "2.11.0"
val sparkVersion = "2.2.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.scala-lang" % "scala-library" % "2.11.0"
)
First, run the official Scala Pi example. Save it as src/main/scala/pi.scala (copied from the Spark examples on GitHub):
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
// scalastyle:off println
package org.apache.spark.examples

import scala.math.random

import org.apache.spark.sql.SparkSession

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName("Spark Pi")
      // After copying this from GitHub, if it fails complaining that no master is configured, add the line below
      .config("spark.master", "local")
      .getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}
// scalastyle:on println
Then, from the project root, start sbt:
sbt
sbt> sbtVersion
1.2.1
sbt> reload
sbt> update
sbt> run
# if you see "Pi is roughly 3.1456557282786415" (or thereabouts), it worked
The downloaded jars end up under ~/.ivy2/:
# 2.11 here is the Scala binary version; the number after it is the Spark version
➜ jars ls
spark-core_2.11-2.2.0.jar spark-core_2.11-2.3.1.jar.part
➜ jars pwd
/Users/<user_name>/.ivy2/cache/org.apache.spark/spark-core_2.11/jars
sbt-assembly packs the dependencies into the jar as well (a "fat jar").
Under project/, add the line below (the official docs say to put it in project/assembly.sbt, which sometimes errors; any *.sbt file under project/, such as project/plugins.sbt, also works):
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.7")
build.sbt:
lazy val root = (project in file("."))
  .settings(
    name := "name",
    version := "0.1",
    scalaVersion := "2.11.11",
    mainClass in Compile := Some("com.example.a.ClassA")
  )

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.2.2",
  "org.apache.spark" %% "spark-hive" % "2.2.2",
  "org.json4s" %% "json4s-jackson" % "3.6.0",
  "org.scala-lang.modules" %% "scala-xml" % "1.1.0",
  "commons-io" % "commons-io" % "2.6",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0",
  "org.scalaz" %% "scalaz-core" % "7.2.26",
  "com.typesafe" % "config" % "1.2.0",
  "org.scalaj" %% "scalaj-http" % "2.4.1"
)

// merge strategy: drop duplicate META-INF entries, keep the first copy of anything else that collides
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
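With that in place, a typical flow is to build the fat jar and hand it to spark-submit. The jar path below follows sbt-assembly's default <name>-assembly-<version>.jar naming under target/scala-2.11/, and the class comes from the build.sbt above, so adjust both for a real project:

sbt assembly
spark-submit --class com.example.a.ClassA target/scala-2.11/name-assembly-0.1.jar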