1. Hadoop Installation
Download the Hadoop tarball, unpack it, and add the following to ~/.bash_profile:
export HADOOP_HOME=~/hadoop/hadoop-3.1.4
export PATH=$PATH:$HADOOP_HOME/bin
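After reloading the profile (e.g. `source ~/.bash_profile`), a quick sanity check that the Hadoop bin directory actually landed on PATH (paths assume the layout above):

```shell
# assumes the tarball was unpacked to ~/hadoop/hadoop-3.1.4 as above
export HADOOP_HOME=~/hadoop/hadoop-3.1.4
export PATH=$PATH:$HADOOP_HOME/bin
# the hadoop bin directory should now appear as one PATH entry
echo "$PATH" | tr ':' '\n' | grep hadoop
```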
Create some input files:
$ cd $HADOOP_HOME/share/hadoop/mapreduce/
$ mkdir in
$ echo "hello world" > in/a.txt
$ echo "hello world bla bla" > in/b.txt
Run the wordcount example (at this point Hadoop is still in the default local standalone mode, so it reads and writes the local filesystem):
$ hadoop jar hadoop-mapreduce-examples-3.1.4.jar wordcount in out
View the results:
$ hdfs dfs -cat out/*
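Since the two input files are tiny, the expected counts can be double-checked with plain shell tools (a sketch of what the wordcount job computes):

```shell
# recreate the two input files from above
mkdir -p in
echo "hello world" > in/a.txt
echo "hello world bla bla" > in/b.txt
# split on spaces, then count occurrences of each word
cat in/*.txt | tr ' ' '\n' | sort | uniq -c
# prints: 2 bla / 2 hello / 2 world
```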

Configure SSH public-key login, so the start scripts can reach localhost without a password.
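The standard key setup (as in the Hadoop single-node guide; the keygen step is skipped here if a key already exists):

```shell
mkdir -p ~/.ssh
# generate a passphrase-less key only if one does not exist yet
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
# verify: this should log in without a password prompt
# ssh localhost
```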
Edit etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/local/opt/openjdk/
export HADOOP_HOME=<HOME_DIR>/hadoop/hadoop-3.1.4
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
Edit etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Edit etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Format the filesystem
$ hdfs namenode -format
Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
Visit http://localhost:9870; the NameNode web UI should be available.

YARN Configuration
Edit etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
Edit etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
Start the ResourceManager and NodeManager daemons:
$ sbin/start-yarn.sh
Check the ResourceManager web UI at http://localhost:8088.

2. Hive
Download and unpack it; if the official download is slow, the Tsinghua mirror works:
$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
$ tar -zxvf apache-hive-3.1.2-bin.tar.gz
$ cd ~/hive/apache-hive-3.1.2-bin
Set HIVE_HOME and add Hive's bin directory to PATH in ~/.bash_profile:
export HIVE_HOME=$HOME/hive/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
Create the Hive warehouse directories on HDFS and make them group-writable:
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir -p /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
Enter the Hive CLI and run some simple SQL:
$ hive
hive> show tables;
hive> CREATE TABLE pokes (foo INT, bar STRING);
Running HiveServer2 and Beeline
Starting from Hive 2.1, we need to run the schematool command below as an initialization step; for example, "derby" can be used as the db type.
$ $HIVE_HOME/bin/schematool -dbType <db type> -initSchema
HiveServer2 (introduced in Hive 0.11) has its own CLI called Beeline. HiveCLI is now deprecated in favor of Beeline, as it lacks the multi-user, security, and other capabilities of HiveServer2. To run HiveServer2 and Beeline from shell:
$ $HIVE_HOME/bin/hiveserver2
$ $HIVE_HOME/bin/beeline -u jdbc:hive2://$HS2_HOST:$HS2_PORT
Dependencies
Install Java:
$ brew install java
$ echo 'export PATH="/usr/local/opt/openjdk/bin:$PATH"' >> ~/.zshrc
$ source ~/.zshrc
Hadoop only supports Java 8 and 11, so install one of those; with Java 16, starting YARN fails with errors.
hadoop java versions: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions
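Given that constraint, a small shell helper can flag an unsupported JDK before starting the daemons (a sketch; `supported_java` is an ad-hoc name, not a Hadoop tool):

```shell
# returns 0 for Java 8 (which reports itself as 1.8.x) or Java 11
supported_java() {
  case "$1" in
    1.8.*|11|11.*) return 0 ;;
    *) return 1 ;;
  esac
}

# usage: extract the version string from `java -version` and check it
# supported_java "$(java -version 2>&1 | awk -F'"' '/version/ {print $2}')" \
#   && echo "OK" || echo "unsupported by Hadoop 3.x"
```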

Exceptions
Exception when running hive: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
Community comments suggest this is a Guava version conflict between the jars bundled with Hive and Hadoop.
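A commonly cited fix (an assumption; verify the exact jar versions on your machine): Hive 3.1.2 bundles guava-19.0.jar, while Hadoop 3.1.4 ships the newer guava-27.0-jre.jar, and the old jar on Hive's classpath triggers the NoSuchMethodError. Replacing Hive's copy with Hadoop's resolves it; sketched here as a small helper so the paths are explicit:

```shell
# swap_guava: replace Hive's bundled guava jar with Hadoop's newer one
# (hypothetical helper name; the two arguments are the respective install roots)
swap_guava() {
  hive_home="$1"
  hadoop_home="$2"
  rm -f "$hive_home"/lib/guava-*.jar
  cp "$hadoop_home"/share/hadoop/common/lib/guava-*.jar "$hive_home"/lib/
}

# usage: swap_guava "$HIVE_HOME" "$HADOOP_HOME"
```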