Compiling and Running Spark Standalone

Author: AlstonWilliams | Published: 2018-10-10 08:48


Reading source code inevitably means compiling and running it.

The source version I chose is Spark 2.4.0-SNAPSHOT.

Building is simple. From the root of the Spark source directory, run:

./build/mvn -DskipTests clean package

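The build takes a long time and uses a lot of CPU, so I suggest leaving the computer on overnight and letting it finish while you sleep.

Once the build is done, we can run Spark. To keep debugging simple, we run only Standalone mode, so there is no need to install the Hadoop stack, Mesos, or anything similar.

Running Standalone is also simple.

First, start the Spark master: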

sbin/start-master.sh

Then, in the master's log, we can see the master URL:

Remember this URL. We will need it several times later.
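Instead of copying the URL by hand, it can be pulled out of the log. The following is a minimal sketch, assuming the master log contains the URL in the form spark://host:port and that the log sits under logs/ in the Spark source directory. The helper name extract_master_url and the log file name pattern in the usage comment are illustrative assumptions, not something the original post shows.

```shell
# Hypothetical helper: print the first spark://host:port URL found in a log file.
extract_master_url() {
  grep -oE 'spark://[A-Za-z0-9._-]+:[0-9]+' "$1" | head -n 1
}

# Example usage from the Spark source directory (file name pattern is an assumption):
# extract_master_url logs/spark-*-org.apache.spark.deploy.master.Master-*.out
```

If the helper prints nothing, open the log and look for the URL manually, or check the Master WebUI, which also displays it.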

Next, start a slave node:

sbin/start-slave.sh spark://alstonwilliams:7077

The argument after start-slave.sh is the master URL. Replace it with your own.

Then edit the configuration (in the conf directory) so that spark-defaults.conf looks like this:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
spark.master                     spark://alstonwilliams:7077
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

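Here, spark.master tells the application how to find the master. spark.eventLog.enabled and spark.eventLog.dir work together with the HistoryServer: if they are not set, the application does not write event logs, so the applications we have run will not show up in the HistoryServer.

Also note that the directory named by spark.eventLog.dir must already exist, otherwise the HistoryServer reports an error on startup.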
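To avoid that startup error, the directory can be created ahead of time. The sketch below reads spark.eventLog.dir from a spark-defaults.conf-style file and creates the directory. It assumes the value is a local file:// URI, as in the configuration above. The function name ensure_event_log_dir is hypothetical.

```shell
# Hypothetical helper: create the local directory named by spark.eventLog.dir.
# Assumes the setting uses a file:// URI, e.g. file:///tmp/spark-events.
ensure_event_log_dir() {
  conf_file="$1"
  # Pick the last uncommented spark.eventLog.dir entry (commented lines start with "#").
  dir_uri=$(awk '$1 == "spark.eventLog.dir" { value = $2 } END { print value }' "$conf_file")
  case "$dir_uri" in
    file://*) local_dir="${dir_uri#file://}" ;;
    *) echo "not a local file:// path: $dir_uri" >&2; return 1 ;;
  esac
  # mkdir -p is a no-op if the directory already exists.
  mkdir -p "$local_dir" && echo "$local_dir"
}

# Example usage from the Spark source directory:
# ensure_event_log_dir conf/spark-defaults.conf
```

For the configuration shown above, this amounts to running mkdir -p /tmp/spark-events once before starting the HistoryServer.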

With all of the above done, start a HistoryServer with sbin/start-history-server.sh, and we are ready to start experimenting.

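One more thing: the Spark Master WebUI listens on port 8080 by default, and the Spark Worker WebUI on 8081. If you are also developing web applications on the same machine, these two ports are very likely already taken. We can set new ports by editing conf/spark-env.sh. On my machine, I set them to 9090 and 9091 respectively: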
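The exact lines from the author's file are not reproduced here. As a sketch, assuming the standard SPARK_MASTER_WEBUI_PORT and SPARK_WORKER_WEBUI_PORT environment variables are used, the relevant part of conf/spark-env.sh could look like this:

```shell
# conf/spark-env.sh (fragment; a sketch, not the author's exact file)
# Port for the Master WebUI (default 8080).
export SPARK_MASTER_WEBUI_PORT=9090
# Port for the Worker WebUI (default 8081).
export SPARK_WORKER_WEBUI_PORT=9091
```

The master and the slave need to be restarted for the new ports to take effect.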