Installing a Flink Standalone Cluster

Author: it_zzy | Published: 2018-09-12 17:44


This article describes how to run Flink in distributed mode on a cluster (possibly a heterogeneous one).

Requirements

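Flink runs on all UNIX-like environments, such as Linux, Mac OS X, and Cygwin (for Windows), and expects the cluster to consist of one master node and one or more worker nodes. Before setting up the system, make sure the following software is installed on every machine: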

Java 1.8.x or higher
ssh (sshd must be running so that the Flink scripts can manage remote components)

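If your cluster does not yet have this software on every machine, install or upgrade it. For example, on Ubuntu Linux you can install ssh and Java with the following commands: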

sudo apt-get install ssh 
sudo apt-get install openjdk-8-jre
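
Before going further, it can help to confirm that the prerequisites are actually on the PATH of each machine. The following is a minimal sketch, not part of Flink; the helper name missing_tools is made up for illustration.

```shell
#!/bin/sh
# missing_tools CMD...: print every command from the argument list
# that cannot be found on PATH (hypothetical helper, not a Flink script).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: report which of the prerequisites are absent on this machine.
# missing_tools java ssh
```

If the call prints nothing, both tools were found. Run it on the master and on every worker.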

Passwordless SSH Login

Translator's note: users who have set up a Hadoop or Spark cluster will find this section familiar and can skip it.

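To start and stop processes on remote hosts, the master node must be able to log in to every worker node without a password. The most convenient way to do this is SSH public key authentication. To set it up, first log in to the master node as the user that will eventually run Flink. The same user (that is, a user with the same user name) must exist on all worker nodes. Because I had previously installed ES under the es user, this article uses the es user as its example. Using the root account is strongly discouraged because it raises many security concerns.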

Once you are logged in to the master node as that user, generate a new public/private key pair. The following command creates one in the ~/.ssh directory:

ssh-keygen -b 2048 -P '' -f ~/.ssh/id_rsa

Next, append the public key to the authorized_keys file used for authentication:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

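Finally, distribute the authorized_keys file to every worker node in the cluster by running the following command once per worker. Note that this overwrites any authorized_keys file that already exists on the worker.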

scp ~/.ssh/authorized_keys <worker>:~/.ssh/

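Replace <worker> with the IP address or hostname of the respective worker node. Once the copy is done, you should be able to log in to the other machines from the master without a password: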

ssh <worker>
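
Checking each worker by hand gets tedious with more than a few nodes. The sketch below loops over a list of hosts and reports which ones accept key-based login. The function name check_passwordless is hypothetical; BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# check_passwordless HOST...: report which hosts accept key-based login.
# BatchMode=yes disables password prompts, so a host that still needs
# a password shows up as FAILED instead of blocking the loop.
check_passwordless() {
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Example with the hostnames used in this article:
# check_passwordless es1 es2
```

Any host reported as FAILED either lacks the public key in its authorized_keys file or is not reachable within the timeout.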

Configuring JAVA_HOME

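Flink requires JAVA_HOME to be configured on the master and on all worker nodes. There are two ways to do this.
One is to set the env.java.home key in conf/flink-conf.yaml to the Java installation path.
The other is to run sudo vi /etc/profile and add JAVA_HOME there: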

#java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH

#node
export NODE_HOME=/usr/local/es/node-v9.11.1-linux-x64
export PATH=$NODE_HOME/bin:$PATH

#maven
export MAVEN_HOME=/usr/local/software/maven-3.5.3
export PATH=$MAVEN_HOME/bin:$PATH

#hadoop
export HADOOP_HOME=/usr/local/software/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

#hive
export HIVE_HOME=/usr/local/software/hive-2.2.0
export PATH=$HIVE_HOME/bin:$PATH

Then reload the environment variables (for example with source /etc/profile) and verify that Java is installed correctly:

es@es1:/root$ java -version
openjdk version "1.8.0_162"
OpenJDK Runtime Environment (build 1.8.0_162-8u162-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.162-b12, mixed mode)
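
java -version only proves that some java binary is on the PATH. To check that JAVA_HOME itself points at a usable installation, a small test like the following may help. This is a sketch; the helper name java_home_ok is made up for illustration.

```shell
#!/bin/sh
# java_home_ok DIR: succeed if DIR looks like a Java installation,
# that is, it is non-empty and contains an executable bin/java.
java_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Example: check the variable set in /etc/profile.
# java_home_ok "$JAVA_HOME" && echo "JAVA_HOME is usable" || echo "JAVA_HOME is not usable"
```

Since every node needs the setting, the same check is worth repeating on each worker.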

Installing Flink

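Go to the downloads page. Pick a Flink package that matches your Hadoop version; if you do not plan to use Hadoop, any version will do.
I downloaded flink-1.5.1, and my Hadoop is hadoop-2.8.3.
After downloading the latest release package, copy it to the master node and extract it: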

tar xzf flink-*.tgz
cd flink-1.5.1/

Configuring Flink

After extracting the archive, edit conf/flink-conf.yaml to configure Flink.

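Set the jobmanager.rpc.address key to the address of your master node. In addition, define the maximum amount of memory the JVM may allocate on each node by setting jobmanager.heap.mb and taskmanager.heap.mb; both values are given in MB. If some worker nodes should give more memory to the Flink system, you can override the default on those nodes by setting the FLINK_TM_HEAP environment variable.
My flink-conf.yaml looks like this: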

es@es2:/usr/local/software/flink-1.5.1$ cat conf/flink-conf.yaml
################################################################################
#  Licensed to the Apache Software Foundation (ASF) under one
#  or more contributor license agreements.  See the NOTICE file
#  distributed with this work for additional information
#  regarding copyright ownership.  The ASF licenses this file
#  to you under the Apache License, Version 2.0 (the
#  "License"); you may not use this file except in compliance
#  with the License.  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
# limitations under the License.
################################################################################


#==============================================================================
# Common
#==============================================================================

# The external address of the host on which the JobManager runs and can be
# reached by the TaskManagers and any clients which want to connect. This setting
# is only used in Standalone mode and may be overwritten on the JobManager side
# by specifying the --host <hostname> parameter of the bin/jobmanager.sh executable.
# In high availability mode, if you use the bin/start-cluster.sh script and setup
# the conf/masters file, this will be taken care of automatically. Yarn/Mesos
# automatically configure the host name based on the hostname of the node where the
# JobManager runs.

#jobmanager.rpc.address: localhost
jobmanager.rpc.address: es2

# The RPC port where the JobManager is reachable.

jobmanager.rpc.port: 6123


# The heap size for the JobManager JVM

jobmanager.heap.mb: 1024


# The heap size for the TaskManager JVM

taskmanager.heap.mb: 1024


# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.

taskmanager.numberOfTaskSlots: 1

# The parallelism used for programs that did not specify any other parallelism.

parallelism.default: 1

# The default file system scheme and authority.
#
# By default file paths without scheme are interpreted relative to the local
# root file system 'file:///'. Use this to override the default and interpret
# relative paths relative to a different file system,
# for example 'hdfs://mynamenode:12345'
#
# fs.default-scheme

#==============================================================================
# High Availability
#==============================================================================

# The high-availability mode. Possible options are 'NONE' or 'zookeeper'.
#
# high-availability: zookeeper

# The path where metadata for master recovery is persisted. While ZooKeeper stores
# the small ground truth for checkpoint and leader election, this location stores
# the larger objects, like persisted dataflow graphs.
#
# Must be a durable file system that is accessible from all nodes
# (like HDFS, S3, Ceph, nfs, ...)
#
# high-availability.storageDir: hdfs:///flink/ha/

# The list of ZooKeeper quorum peers that coordinate the high-availability
# setup. This must be a list of the form:
# "host1:clientPort,host2:clientPort,..." (default clientPort: 2181)
#
# high-availability.zookeeper.quorum: localhost:2181


# ACL options are based on https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
# It can be either "creator" (ZOO_CREATE_ALL_ACL) or "open" (ZOO_OPEN_ACL_UNSAFE)
# The default value is "open" and it can be changed to "creator" if ZK security is enabled
#
# high-availability.zookeeper.client.acl: open

#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================

# The backend that will be used to store operator state checkpoints if
# checkpointing is enabled.
#
# Supported backends are 'jobmanager', 'filesystem', 'rocksdb', or the
# <class-name-of-factory>.
#
# state.backend: filesystem

# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
#
# state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints

# Default target directory for savepoints, optional.
#
# state.savepoints.dir: hdfs://namenode-host:port/flink-checkpoints

# Flag to enable/disable incremental checkpoints for backends that
# support incremental checkpoints (like the RocksDB state backend).
#
# state.backend.incremental: false

#==============================================================================
# Web Frontend
#==============================================================================

# The address under which the web-based runtime monitor listens.
#
#jobmanager.web.address: 0.0.0.0

# The port under which the web-based runtime monitor listens.
# A value of -1 deactivates the web server.

rest.port: 8081

# Flag to specify whether job submission is enabled from the web-based
# runtime monitor. Uncomment to disable.

#jobmanager.web.submit.enable: false

#==============================================================================
# Advanced
#==============================================================================

# Override the directories for temporary files. If not specified, the
# system-specific Java temporary directory (java.io.tmpdir property) is taken.
#
# For framework setups on Yarn or Mesos, Flink will automatically pick up the
# containers' temp directories without any need for configuration.
#
# Add a delimited list for multiple directories, using the system directory
# delimiter (colon ':' on unix) or a comma, e.g.:
#     /data1/tmp:/data2/tmp:/data3/tmp
#
# Note: Each directory entry is read from and written to by a different I/O
# thread. You can include the same directory multiple times in order to create
# multiple I/O threads against that directory. This is for example relevant for
# high-throughput RAIDs.
#
# io.tmp.dirs: /tmp

# Specify whether TaskManager's managed memory should be allocated when starting
# up (true) or when memory is requested.
#
# We recommend to set this value to 'true' only in setups for pure batch
# processing (DataSet API). Streaming setups currently do not use the TaskManager's
# managed memory: The 'rocksdb' state backend uses RocksDB's own memory management,
# while the 'memory' and 'filesystem' backends explicitly keep data as objects
# to save on serialization cost.
#
# taskmanager.memory.preallocate: false

# The classloading resolve order. Possible values are 'child-first' (Flink's default)
# and 'parent-first' (Java's default).
#
# Child first classloading allows users to use different dependency/library
# versions in their application than those in the classpath. Switching back
# to 'parent-first' may help with debugging dependency issues.
#
# classloader.resolve-order: child-first

# The amount of memory going to the network stack. These numbers usually need
# no tuning. Adjusting them may be necessary in case of an "Insufficient number
# of network buffers" error. The default min is 64MB, the default max is 1GB.
#
# taskmanager.network.memory.fraction: 0.1
# taskmanager.network.memory.min: 67108864
# taskmanager.network.memory.max: 1073741824

#==============================================================================
# Flink Cluster Security Configuration
#==============================================================================

# Kerberos authentication for various components - Hadoop, ZooKeeper, and connectors -
# may be enabled in four steps:
# 1. configure the local krb5.conf file
# 2. provide Kerberos credentials (either a keytab or a ticket cache w/ kinit)
# 3. make the credentials available to various JAAS login contexts
# 4. configure the connector to use JAAS/SASL

# The below configure how Kerberos credentials are provided. A keytab will be used instead of
# a ticket cache if the keytab path and principal are set.

# security.kerberos.login.use-ticket-cache: true
# security.kerberos.login.keytab: /path/to/kerberos/keytab
# security.kerberos.login.principal: flink-user

# The configuration below defines which JAAS login contexts

# security.kerberos.login.contexts: Client,KafkaClient

#==============================================================================
# ZK Security Configuration
#==============================================================================

# Below configurations are applicable if ZK ensemble is configured for security

# Override below configuration to provide custom ZK service name if configured
# zookeeper.sasl.service-name: zookeeper

# The configuration below must match one of the values set in "security.kerberos.login.contexts"
# zookeeper.sasl.login-context-name: Client

#==============================================================================
# HistoryServer
#==============================================================================

# The HistoryServer is started and stopped via bin/historyserver.sh (start|stop)

# Directory to upload completed jobs to. Add this directory to the list of
# monitored directories of the HistoryServer as well (see below).
#jobmanager.archive.fs.dir: hdfs:///completed-jobs/

# The address under which the web-based HistoryServer listens.
#historyserver.web.address: 0.0.0.0

# The port under which the web-based HistoryServer listens.
#historyserver.web.port: 8082

# Comma separated list of directories to monitor for completed jobs.
#historyserver.archive.fs.dir: hdfs:///completed-jobs/

# Interval in milliseconds for refreshing the monitored directories.
#historyserver.archive.fs.refresh-interval: 10000
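
Most of the file above is comments, so the settings that are actually active are easy to miss. A quick way to see only those is to filter out blank lines and comment lines. The helper name effective_conf below is hypothetical.

```shell
#!/bin/sh
# effective_conf FILE: print only the active settings of a config file,
# dropping blank lines and lines whose first non-blank character is '#'.
effective_conf() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example, run from the Flink directory:
# effective_conf conf/flink-conf.yaml
```

For the file shown above, this leaves seven settings: jobmanager.rpc.address, jobmanager.rpc.port, jobmanager.heap.mb, taskmanager.heap.mb, taskmanager.numberOfTaskSlots, parallelism.default, and rest.port.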

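Finally, you must provide a list of all worker nodes in the cluster. As with an HDFS setup, edit the file conf/slaves and enter the IP address or hostname of each worker node. Each worker node will later run a TaskManager.
Each entry goes on its own line, as shown below: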

es@es2:/usr/local/software/flink-1.5.1$ cat conf/slaves
#localhost
es1
es2

Translator's note: the conf/masters file is used for JobManager HA and does not need to be configured here.

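The Flink directory must be available under the same path on every worker node. You can use a shared NFS directory or copy the entire Flink directory to each worker node: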

scp -r /path/to/flink <worker>:/path/to/
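
With several workers, the copy can be driven from conf/slaves instead of being typed once per host. The sketch below reads the worker list, skips comments and blank lines, and leaves out the local host, because in this article the master (es2) is also listed as a worker and copying a directory onto itself over scp is best avoided. The helper name remote_workers is made up for illustration.

```shell
#!/bin/sh
# remote_workers FILE SELF: print the hosts listed in FILE, one per line,
# skipping blank lines, '#' comments, and the host named SELF.
remote_workers() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | grep -Fxv "$2"
}

# Example, run on the master from the Flink directory
# (/path/to/flink is the placeholder path used in this article):
# for w in $(remote_workers conf/slaves "$(hostname)"); do
#     scp -r /path/to/flink "$w":/path/to/
# done
```

This assumes the names in conf/slaves match what hostname prints on the master; if IP addresses are used instead, pass the master's IP as the second argument.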

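See the configuration page for details on the available Flink settings.
In particular, these

the amount of memory available to each TaskManager (taskmanager.heap.mb)
the number of CPUs available on each machine (taskmanager.numberOfTaskSlots)
the total number of CPUs in the cluster (parallelism.default)
the temporary directories (taskmanager.tmp.dirs; the flink-conf.yaml shown above names this key io.tmp.dirs)

are very important configuration values.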
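
The slot settings interact: the cluster offers (number of TaskManagers) x (taskmanager.numberOfTaskSlots) slots in total, and a job cannot run with a parallelism higher than that. The sketch below reads a value from the configuration file and does the multiplication. Both helper names, conf_value and total_slots, are hypothetical.

```shell
#!/bin/sh
# conf_value FILE KEY: print the value of the first active "KEY: value" line.
# Commented-out lines do not match because they start with '#'.
conf_value() {
    awk -v k="$2" 'index($0, k ":") == 1 { sub(/^[^:]*:[ \t]*/, ""); print; exit }' "$1"
}

# total_slots TASKMANAGERS SLOTS_PER_TASKMANAGER: print the cluster-wide slot count.
total_slots() {
    echo $(( $1 * $2 ))
}

# Example, run from the Flink directory:
# slots=$(conf_value conf/flink-conf.yaml taskmanager.numberOfTaskSlots)
# workers=$(grep -Evc '^[[:space:]]*(#|$)' conf/slaves)
# total_slots "$workers" "$slots"
```

With the values in this article (two workers in conf/slaves, one slot each) the cluster offers 2 slots, so parallelism.default: 1 leaves one slot free.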

Starting Flink

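The following script starts a JobManager on the local node and connects via SSH to all worker nodes listed in the slaves file to start a TaskManager on each of them. Your Flink system is then up and running. The JobManager on the local node now accepts jobs on the configured RPC port.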

Assuming you are on the master node and inside the Flink directory:

bin/start-cluster.sh

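[Screenshot: processes started on the master]

[Screenshot: processes started on the slave]

[Screenshot: web UI on port 8081]

The web UI shows that both TaskManagers have joined the cluster.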

To stop Flink, there is also a stop-cluster.sh script.
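
The same check that the web UI offers can be done from the command line through the REST port. The sketch below extracts the TaskManager count from the JSON returned by the /overview endpoint. It assumes the response contains a numeric field named taskmanagers, which should be verified against your Flink version; the function name taskmanager_count is made up for illustration.

```shell
#!/bin/sh
# taskmanager_count: read the /overview JSON on stdin and print the
# number of registered TaskManagers.
# Assumption: the response contains a field like "taskmanagers":2.
taskmanager_count() {
    sed -n 's/.*"taskmanagers"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example against the cluster from this article
# (es2 is the master, rest.port is 8081):
# curl -s http://es2:8081/overview | taskmanager_count
```

If the printed number is lower than the number of entries in conf/slaves, the log files of the missing worker are the first place to look.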

Adding JobManager/TaskManager Instances to a Cluster

You can add both JobManager and TaskManager instances to a running cluster with the bin/jobmanager.sh and bin/taskmanager.sh scripts.

Adding a JobManager

bin/jobmanager.sh (start cluster)|stop|stop-all

Adding a TaskManager

bin/taskmanager.sh start|stop|stop-all

Make sure to run these scripts on the hosts on which you want to start or stop the respective instance.
