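Apache Kafka: Deployment and Startup
With the Kafka basics covered, this part walks through deploying and starting a broker.
Environment preparation before installation
Kafka is written in Scala and Java and runs on the JVM, so a JDK must be installed before Kafka.
JDK 1.8 or later is recommended.
Install the JDK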
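Once a JDK is installed, the version requirement can be checked from the shell. The following is a minimal sketch, not part of Kafka: java_major is a hypothetical helper that reduces a Java version string to its major version, covering both the legacy 1.x numbering and the newer scheme.

```shell
# Extract the major Java version from a version string.
# Legacy scheme: "1.8.0_191" -> 8. Newer scheme: "11.0.2" -> 11.
java_major() {
  case "$1" in
    1.*) v=${1#1.}; echo "${v%%.*}" ;;
    *)   echo "${1%%.*}" ;;
  esac
}

# Usage (assumes `java` is on PATH; `java -version` writes to stderr):
#   ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
#   [ "$(java_major "$ver")" -ge 8 ] && echo "JDK OK" || echo "JDK too old"
java_major "1.8.0_191"   # prints 8
```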
Kafka depends on ZooKeeper, so ZooKeeper must be installed first.
Install ZooKeeper
Download the ZooKeeper archive:
[root@node-100 local]# mkdir zookeeper
[root@node-100 local]# cd zookeeper/
[root@node-100 local]# wget http://mirror.bit.edu.cn/apache/zookeeper/stable/zookeeper-3.4.12.tar.gz
Extract it:
[root@node-100 zookeeper]# tar -zxvf zookeeper-3.4.12.tar.gz
Enter the extracted directory and prepare the configuration file:
[root@node-100 zookeeper]# ls
zookeeper-3.4.12
[root@node-100 zookeeper]# cd zookeeper-3.4.12/
[root@node-100 zookeeper-3.4.12]# ls
bin conf dist-maven ivysettings.xml lib NOTICE.txt README_packaging.txt src zookeeper-3.4.12.jar.asc zookeeper-3.4.12.jar.sha1
build.xml contrib docs ivy.xml LICENSE.txt README.md recipes zookeeper-3.4.12.jar zookeeper-3.4.12.jar.md5
[root@node-100 zookeeper-3.4.12]# cd conf
[root@node-100 conf]# ls
configuration.xsl log4j.properties zoo_sample.cfg
[root@node-100 conf]# cp zoo_sample.cfg zoo.cfg.bak
[root@node-100 conf]# mv zoo_sample.cfg zoo.cfg
[root@node-100 conf]# ls
configuration.xsl log4j.properties zoo.cfg zoo.cfg.bak
[root@node-100 conf]#
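Set the data directory:
[root@node-100 conf]# vim zoo.cfg
Change the following entries. Keep comments on their own lines: zoo.cfg is read as a Java properties file, so text after a value on the same line becomes part of that value.
# data directory (snapshots, and transaction logs unless dataLogDir is set)
dataDir=/usr/local/zookeeper/zookeeper-3.4.12/logs
# client port
clientPort=2181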
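The same change can be scripted instead of made in vim. The sketch below assumes a plain key=value file; set_prop is a hypothetical helper, not something ZooKeeper ships. It rewrites the line for a key if one exists and appends the key otherwise.

```shell
# set_prop FILE KEY VALUE: replace the line for KEY in FILE, or append it if absent.
# Hypothetical helper, not part of ZooKeeper.
set_prop() {
  tmp=$(mktemp) || return 1
  awk -v k="$2" -v v="$3" '
    BEGIN { FS = "=" }
    $1 == k { print k "=" v; found = 1; next }  # rewrite the existing entry
    { print }                                   # keep every other line as-is
    END { if (!found) print k "=" v }           # append when the key was missing
  ' "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return $status
}

# Usage (paths taken from the walkthrough):
#   set_prop conf/zoo.cfg dataDir /usr/local/zookeeper/zookeeper-3.4.12/logs
#   set_prop conf/zoo.cfg clientPort 2181
```

Writing the result back with cat rather than mv keeps the original file's permissions.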
Start the server:
[root@node-100 zookeeper-3.4.12]# bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@node-100 zookeeper-3.4.12]#
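The STARTED message only says that the process was launched. Running bin/zkServer.sh status afterwards prints a "Mode:" line once the server is actually serving. A small sketch for pulling that mode out of the status output follows; zk_mode is a hypothetical helper name.

```shell
# zk_mode: read `zkServer.sh status` output on stdin and print the reported mode
# (for example "standalone", "leader" or "follower"). Hypothetical helper.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Usage, from the ZooKeeper installation directory:
#   bin/zkServer.sh status 2>&1 | zk_mode
```

For the single-node setup in this walkthrough the expected mode is standalone.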
Start the client:
[root@node-100 zookeeper-3.4.12]# bin/zkCli.sh -server 192.168.5.100:2181
Connecting to 192.168.5.100:2181
2019-01-03 23:15:32,779 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.12-e5259e437540f349646870ea94dc2658c4e44b3b, built on 03/27/2018 03:55 GMT
2019-01-03 23:15:32,782 [myid:] - INFO [main:Environment@100] - Client environment:host.name=node-100
2019-01-03 23:15:32,782 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_191
2019-01-03 23:15:32,783 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2019-01-03 23:15:32,783 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/local/java/jdk1.8.0_191/jre
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/local/zookeeper/zookeeper-3.4.12/bin/../build/classes:/usr/local/zookeeper/zookeeper-3.4.12/bin/../build/lib/*.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/slf4j-log4j12-1.7.25.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/slf4j-api-1.7.25.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/netty-3.10.6.Final.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/log4j-1.2.17.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/audience-annotations-0.5.0.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../zookeeper-3.4.12.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../src/java/lib/*.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../conf:.:/usr/local/java/jdk1.8.0_191/lib:/usr/local/java/jdk1.8.0_191/jre/lib:.:/usr/local/java/jdk1.8.0_191/lib:/usr/local/java/jdk1.8.0_191/jre/lib:.:/usr/local/java/jdk1.8.0_191/lib:/usr/local/java/jdk1.8.0_191/jre/lib:
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.0-327.el7.x86_64
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2019-01-03 23:15:32,784 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2019-01-03 23:15:32,785 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/usr/local/zookeeper/zookeeper-3.4.12
2019-01-03 23:15:32,786 [myid:] - INFO [main:ZooKeeper@441] - Initiating client connection, connectString=192.168.5.100:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@69d0a921
Welcome to ZooKeeper!
JLine support is enabled
2019-01-03 23:15:32,879 [myid:] - INFO [main-SendThread(192.168.5.100:2181):ClientCnxn$SendThread@1028] - Opening socket connection to server 192.168.5.100/192.168.5.100:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-03 23:15:32,967 [myid:] - INFO [main-SendThread(192.168.5.100:2181):ClientCnxn$SendThread@878] - Socket connection established to 192.168.5.100/192.168.5.100:2181, initiating session
2019-01-03 23:15:33,020 [myid:] - INFO [main-SendThread(192.168.5.100:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server 192.168.5.100/192.168.5.100:2181, sessionid = 0x1000030aeca0000, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 192.168.5.100:2181(CONNECTED) 0]
List the root znode:
[zk: 192.168.5.100:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: 192.168.5.100:2181(CONNECTED) 1]
Deploying Kafka
Step 1: Download the package
Create the kafka directory
[root@node-100 local]# cd /usr/local
[root@node-100 local]# mkdir kafka
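Download the package kafka_2.12-2.1.0.tgz into that directory. (2.1.0 is the latest version at the time of writing; for production use, an earlier, more established release such as 1.1.0 is a safer choice.)
cd kafka
wget http://mirrors.shu.edu.cn/apache/kafka/2.1.0/kafka_2.12-2.1.0.tgz
tar -xvf kafka_2.12-2.1.0.tgz
cd kafka_2.12-2.1.0/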
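The archive name encodes two versions: the first number is the Scala version the build was compiled against, and the second is the Kafka version. The sketch below splits a name of that form; kafka_versions is a hypothetical helper and assumes the .tgz naming used above.

```shell
# kafka_versions NAME: split an archive name of the form
# kafka_<scala-version>-<kafka-version>.tgz into its two version numbers.
kafka_versions() {
  name=${1%.tgz}          # drop the extension
  name=${name#kafka_}     # drop the "kafka_" prefix
  echo "scala=${name%%-*} kafka=${name#*-}"
}

kafka_versions kafka_2.12-2.1.0.tgz   # prints: scala=2.12 kafka=2.1.0
```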
Step 2: Start the service
Edit the configuration file server.properties:
[root@node-100 kafka_2.12-2.1.0]# cd config/
[root@node-100 config]# ls
connect-console-sink.properties connect-file-sink.properties connect-standalone.properties producer.properties trogdor.conf
connect-console-source.properties connect-file-source.properties consumer.properties server.properties zookeeper.properties
connect-distributed.properties connect-log4j.properties log4j.properties tools-log4j.properties
[root@node-100 config]# vim server.properties
server.properties:
############################# Server Basics #############################
# Unique identifier of each broker in the cluster; must be a non-negative integer. If the server's IP address changes but broker.id stays the same, consumers are not affected.
broker.id=0
# Port the broker listens on (deprecated setting; it is only used when listeners is not set)
port=9092
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
# Threads that handle network requests and responses; the default rarely needs changing
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
# Threads that process requests, including disk I/O; set this to at least the number of disks
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
# Socket tuning parameter SO_SNDBUF; 102400 bytes = 100 KiB
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
# Socket tuning parameter SO_RCVBUF; 102400 bytes = 100 KiB
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
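# Maximum size of a single socket request, which protects the server against OOM. message.max.bytes must be smaller than socket.request.max.bytes; the message size limit can be overridden per topic when the topic is created.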
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
# Where Kafka stores its data; separate multiple directories with commas, e.g. /data/kafka-logs-1,/data/kafka-logs-2
log.dirs=/usr/local/kafka/kafka_2.12-2.1.0/data/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
# Default number of partitions per topic; it applies only when a topic is created without an explicit partition count
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# The number of messages to accept before forcing a flush of data to disk
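# Number of messages to accumulate before the log is synced to disk. Disk I/O is slow, yet syncing is what makes data durable, so this setting
# trades durability against performance. If the value is too large, each fsync takes longer (blocking I/O); if it is too small,
# fsync runs more often, which adds latency to client requests. If the physical server fails, messages that were not yet fsynced are lost.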
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
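# Controlling flushes by message count alone is not enough. This setting bounds the time between fsync calls:
# if the message count never reaches its threshold but the time since the last sync does, a flush is triggered anyway.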
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The minimum age of a log file to be eligible for deletion due to age
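# How long a log file is kept before it is deleted (168 hours = 7 days). The default retention time applies to all topics.
# Both the time limit (log.retention.hours or log.retention.minutes) and log.retention.bytes trigger deletion; whichever limit is exceeded first applies.
# This setting can be overridden per topic.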
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
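# Maximum amount of data kept for each partition of a topic.
# Note that this is a per-partition limit, so a topic can hold up to this value multiplied by its number of partitions.
# Also note: if both log.retention.hours and log.retention.bytes are set,
# exceeding either limit causes a segment file to be deleted.
# This setting can be overridden per topic.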
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
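# The log of a topic partition is stored as many files in one directory; these files split the log into segments.
# This setting is the maximum size of each file; when it is reached, a new file is created. It can be overridden per topic.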
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
# How often log segments are checked against the retention policies. 300000 ms = 5 minutes.
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# ZooKeeper connection string in the form hostname:port,
# where host and port are those of a ZooKeeper server.
# To stay connected when one ZooKeeper machine goes down, list several servers,
# separated by commas: hostname1:port1,hostname2:port2,hostname3:port3
# A chroot path can be appended to the connection string;
# Kafka then stores all of its znodes under that path.
# Format: hostname1:port1,hostname2:port2,hostname3:port3/chroot/path
zookeeper.connect=192.168.5.100:2181
# Timeout in ms for connecting to zookeeper
# Maximum time the broker, acting as a ZooKeeper client, waits while establishing its connection to ZooKeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
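With this many comments in the file, it can help to print only the settings that are actually active before starting the broker. A minimal sketch follows; effective_props is a hypothetical helper that filters out blank lines and comment lines, including commented-out settings such as #log.retention.bytes.

```shell
# effective_props FILE: print only the active settings of a properties file,
# skipping blank lines and lines whose first non-blank character is "#".
# Hypothetical helper, not part of Kafka.
effective_props() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage, from the Kafka installation directory:
#   effective_props config/server.properties
```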
For more configuration options, see:
https://yq.aliyun.com/ziliao/417941
https://www.cnblogs.com/fillPv/p/5953852.html
Now start Kafka:
[root@node-100 kafka_2.12-2.1.0]# ls
bin config data libs LICENSE NOTICE site-docs
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-server-start.sh -daemon config/server.properties
[root@node-100 kafka_2.12-2.1.0]#
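Notes:
Startup script syntax: kafka-server-start.sh [-daemon] server.properties
The path to server.properties is a required argument.
-daemon runs the broker as a background process; without it, the broker stops when the SSH session ends.
(At startup Kafka uses the IP address associated with the Linux hostname,
so map the hostname to the machine's IP in the local hosts file: vim /etc/hosts)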
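Whether the hostname already has a mapping can be checked before starting the broker. The sketch below assumes the usual hosts-file layout (an address followed by one or more names, with "#" starting a comment); hosts_lookup is a hypothetical helper.

```shell
# hosts_lookup HOSTS_FILE HOSTNAME: print the address mapped to HOSTNAME,
# or nothing if the file has no entry for it. Hypothetical helper.
hosts_lookup() {
  awk -v h="$2" '
    $1 ~ /^#/ { next }                  # skip comment lines
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break            # stop at a trailing comment
        if ($i == h) { print $1; exit }
      }
    }
  ' "$1"
}

# Usage on the broker machine:
#   hosts_lookup /etc/hosts "$(hostname)"
```

An empty result means the mapping still has to be added.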
From the ZooKeeper directory, use the ZooKeeper client to look at the znode tree:
[zk: localhost:2181(CONNECTED) 1] ls /
[cluster, controller_epoch, controller, brokers, zookeeper, admin, isr_change_notification, consumers, log_dir_event_notification, latest_producer_id_block, config]
[zk: localhost:2181(CONNECTED) 2] ls /brokers/ids
[0]
[zk: localhost:2181(CONNECTED) 3]
Broker id 0 is listed under /brokers/ids, so Kafka started successfully.
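For a scripted check, the listing printed by ls /brokers/ids can be tested for a specific broker id. The sketch below assumes the bracketed, comma-separated format shown in the session; broker_registered is a hypothetical helper, and capturing the listing from the client is left out.

```shell
# broker_registered LIST ID: succeed if ID appears in a znode listing such as
# "[0]" or "[0, 1, 2]". Hypothetical helper.
broker_registered() {
  # Strip brackets and spaces, then put one id per line.
  ids=$(printf '%s' "$1" | sed 's/[][ ]//g' | tr ',' '\n')
  # -x matches whole lines, so id 1 does not match 10 or 11.
  printf '%s\n' "$ids" | grep -qx -- "$2"
}

# Usage with the listing from the session:
broker_registered "[0]" 0 && echo "broker 0 is registered"
```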