1. Overview
This is the second stop on our Hadoop journey. We already have Hadoop and HBase up; today we build Hive on top of them.
Continuing from the architecture diagram of the previous posts, we extend the layout as follows.
We assume there are only three servers, and we want high availability across all three.
Service | Server 1 | Server 2 | Server 3 |
---|---|---|---|
NameNode | √ | Δ | |
DataNode | √ | √ | √ |
JournalNode | √ | √ | √ |
ResourceManager | √ | | |
NodeManager | √ | √ | √ |
ZooKeeper | √ | √ | √ |
ZKFC | √ | √ | √ |
HMaster | √ | Δ | |
HRegionServer | √ | √ | √ |
Metastore | √ | | |
Hive | √ | √ | √ |

(√ = running instance, Δ = standby instance)
2. Setting Up MySQL
Hive needs an RDBMS behind its metastore, so we choose MySQL 8.
The .xz archive from the official site is unpacked in two steps: first decompress with xz -d FILENAME,
then unpack the resulting .tar with tar -xvf (no -z flag: the file is xz-compressed, not gzip).
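The two-step unpack can be illustrated end to end with a tiny synthetic archive (the tarball name below is a stand-in; the real download would be something like mysql-8.0.16-linux-glibc2.12-x86_64.tar.xz):

```shell
# Build a tiny .tar.xz stand-in for the real MySQL download.
mkdir -p demo/mysql-8.0.16
echo "server files" > demo/mysql-8.0.16/README
tar -C demo -cJf mysql-8.0.16.tar.xz mysql-8.0.16

# Step 1: strip the xz layer (-k keeps the .xz file around).
xz -dk mysql-8.0.16.tar.xz          # produces mysql-8.0.16.tar
# Step 2: unpack the plain tar (no -z: the gzip flag does not apply here).
tar -xf mysql-8.0.16.tar

# Equivalent single step, if your tar supports -J:
tar -xJf mysql-8.0.16.tar.xz
cat mysql-8.0.16/README
```

Modern GNU tar handles .xz directly via -J, so the two-step dance is optional.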
First, create the mysql group and user:
groupadd mysql
useradd -r -g mysql mysql
Then initialize the data directory:
mysqld --initialize
After initialization, open /etc/init.d/mysql
and point the paths at your installation:
basedir=/home/user/mysql-8.0.16/
datadir=/home/user/mysql_data/
MySQL can then be started with:
service mysql start
It will most likely fail to start at this point: since no dedicated user was set during initialization, many directories lack the right permissions. Follow the error messages and grant the mysql user whatever access it needs.
If a client cannot connect, find where your mysql.sock
file lives and create a symlink to it under /tmp.
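The symlink trick can be sketched with throwaway paths (the real socket location can be read from the server's error log or from my.cnf; everything below is a stand-in, not the live socket):

```shell
# Stand-in for wherever mysqld actually put its socket.
sockdir=$(mktemp -d)
touch "$sockdir/mysql.sock"

# Clients that are hard-wired to /tmp/mysql.sock now find it.
ln -sf "$sockdir/mysql.sock" /tmp/mysql.sock
ls -l /tmp/mysql.sock
```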
Log in to MySQL, change the root password, and then create the hive database and user:
create database hive;
CREATE USER 'hive' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
flush privileges;
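If you prefer to script this step, the four statements can be collected into one file and fed to the client in a single shot (the file name hive_init.sql is just a suggestion; running it still needs the live server, so here we only write the file):

```shell
# Write the metastore bootstrap SQL to a file.
cat > hive_init.sql <<'SQL'
CREATE DATABASE hive;
CREATE USER 'hive' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
SQL

# Then, against the running server:
#   mysql -uroot -p < hive_init.sql
wc -l < hive_init.sql
```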
Admittedly this MySQL install was messy, and deliberately so: I left out parameters during initialization on purpose to see which pitfalls come up.
MySQL 8 is seriously powerful; take some time to get a feel for it!
3. Configuring Hive
Unpack Hive into our user's home directory.
3.1 hive-env.sh
First create the configuration file from its template:
cp hive-env.sh.template hive-env.sh
Then edit it:
# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/home/user/hadoop2
# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export JAVA_HOME=/usr
export HADOOP_HOME=$HADOOP_HOME
export HIVE_HOME=/home/user/apache-hive-2.3.5-bin
export HIVE_CONF_DIR=${HIVE_HOME}/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
export HIVE_AUX_JARS_PATH=${HIVE_HOME}/lib
3.2 hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/user/hive_cache</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/user/hive_cache</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
Create the hive_cache folder at the path specified above.
Note that the MySQL port configured here must be 3306, even though netstat also shows 33060: port 33060 serves the MySQL X Protocol (mysqlx), while classic connections such as JDBC use 3306.
3.3 Downloading the JDBC Driver JAR
Find the Connector/J package matching your system on the official site:
https://dev.mysql.com/downloads/connector/j/
[root@hadoop1 user]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
If you downloaded the RPM, unpack it to get at the JAR (the mkdir is needed because tar -C does not create the target directory):
rpm2archive mysql-connector-java-8.0.16-1.el7.noarch.rpm
mkdir -p mysql-connector-java-8.0.16-1.el7
tar -zxvf mysql-connector-java-8.0.16-1.el7.noarch.rpm.tgz -C mysql-connector-java-8.0.16-1.el7/
Move the extracted JAR (inside the archive it typically sits under usr/share/java/) into the lib folder you configured above:
mv mysql-connector-java.jar apache-hive-2.3.5-bin/lib/
4. Deploying the Slave Servers
To keep things simple, just copy the whole folder over (a little slow):
scp -r apache-hive-2.3.5-bin root@hadoop2:/home/user/
scp -r apache-hive-2.3.5-bin root@hadoop3:/home/user/
On the slave side, edit the conf/hive-site.xml file and point it at the master's metastore:
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop1:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
5. Initializing Hive
[root@hadoop1 user]# apache-hive-2.3.5-bin/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/user/apache-hive-2.3.5-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/user/hadoop2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
Now start the Metastore on the master server:
./apache-hive-2.3.5-bin/bin/hive --service metastore &
At this point a RunJar process should be visible:
[root@hadoop1 user]# jps
2464 NodeManager
16417 HQuorumPeer
6033 Jps
32066 NameNode
32180 DataNode
18470 HMaster
5926 RunJar
18589 HRegionServer
32383 JournalNode
Open hive on hadoop2 and create a database and a table with SQL; they show up on hadoop3 as well. The Hive cluster is complete.
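Spelled out, that cross-node check looks like this (the database and table names are made up for the test; DRY_RUN=echo keeps the sketch runnable without the cluster):

```shell
DRY_RUN=echo   # drop this line to actually run against the cluster

# On hadoop2: create objects through the shared metastore on hadoop1.
$DRY_RUN hive -e "CREATE DATABASE test_db; CREATE TABLE test_db.t1 (id INT);"

# On hadoop3: the same objects are visible, because both clients talk
# to the single Metastore service (thrift://hadoop1:9083).
$DRY_RUN hive -e "SHOW TABLES IN test_db;"
```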