Original post: https://crowall.com/topic/84
PS: Make sure brew and the JDK are already installed before you start.
1. Install Hadoop
brew install hadoop
2. Configuration
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.0.0/
By default, the Hadoop configuration files live in /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/ (your version number may differ; list /usr/local/Cellar/hadoop/ first to see which version you actually installed).
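Every path below embeds the version number, so a small sketch like this can resolve it automatically (assuming brew's usual Cellar layout; `latest_version` is a hypothetical helper, not part of Hadoop or brew):

```shell
# Hypothetical helper: print the newest version directory under a prefix.
# sort -V sorts version numbers numerically (2.8.3 < 3.0.0).
latest_version() {
  ls "$1" | sort -V | tail -n 1
}
# Example:
# export HADOOP_HOME="/usr/local/Cellar/hadoop/$(latest_version /usr/local/Cellar/hadoop)"
```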
MBP:~ tony$ ll /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
total 304
drwxr-xr-x 30 tony admin 960B 5 3 17:43 .
drwxr-xr-x 3 tony admin 96B 12 9 03:17 ..
-rw-r--r-- 1 tony admin 7.7K 12 9 03:30 capacity-scheduler.xml
-rw-r--r-- 1 tony admin 1.3K 12 9 03:32 configuration.xsl
-rw-r--r-- 1 tony admin 1.2K 12 9 03:30 container-executor.cfg
-rw-r--r-- 1 tony admin 774B 12 9 03:17 core-site.xml
-rw-r--r-- 1 tony admin 16K 12 9 03:42 hadoop-env.sh
-rw-r--r-- 1 tony admin 3.2K 12 9 03:17 hadoop-metrics2.properties
-rw-r--r-- 1 tony admin 10K 12 9 03:17 hadoop-policy.xml
-rw-r--r-- 1 tony admin 3.3K 12 9 03:17 hadoop-user-functions.sh.example
-rw-r--r-- 1 tony admin 775B 12 9 03:19 hdfs-site.xml
-rw-r--r-- 1 tony admin 1.4K 12 9 03:19 httpfs-env.sh
-rw-r--r-- 1 tony admin 1.6K 12 9 03:19 httpfs-log4j.properties
-rw-r--r-- 1 tony admin 21B 12 9 03:19 httpfs-signature.secret
-rw-r--r-- 1 tony admin 620B 12 9 03:19 httpfs-site.xml
-rw-r--r-- 1 tony admin 3.4K 12 9 03:17 kms-acls.xml
-rw-r--r-- 1 tony admin 1.3K 12 9 03:17 kms-env.sh
-rw-r--r-- 1 tony admin 1.7K 12 9 03:17 kms-log4j.properties
-rw-r--r-- 1 tony admin 682B 12 9 03:17 kms-site.xml
-rw-r--r-- 1 tony admin 13K 12 9 03:17 log4j.properties
-rw-r--r-- 1 tony admin 1.7K 12 9 03:32 mapred-env.sh
-rw-r--r-- 1 tony admin 4.0K 12 9 03:32 mapred-queues.xml.template
-rw-r--r-- 1 tony admin 758B 12 9 03:32 mapred-site.xml
drwxr-xr-x 3 tony admin 96B 12 9 03:17 shellprofile.d
-rw-r--r-- 1 tony admin 2.3K 12 9 03:17 ssl-client.xml.example
-rw-r--r-- 1 tony admin 2.6K 12 9 03:17 ssl-server.xml.example
-rw-r--r-- 1 tony admin 2.6K 12 9 03:19 user_ec_policies.xml.template
-rw-r--r-- 1 tony admin 10B 12 9 03:17 workers
-rw-r--r-- 1 tony admin 5.3K 12 9 03:30 yarn-env.sh
-rw-r--r-- 1 tony admin 690B 12 9 03:30 yarn-site.xml
Edit the hadoop-env.sh file
cd /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
vim hadoop-env.sh
# Locate HADOOP_OPTS
MBP:hadoop tony$ cat hadoop-env.sh |grep -n "export HADOOP_OPTS"
90:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
92:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
106: export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
107: export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
108: export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "
# Uncomment line 92 and add a line setting JAVA_HOME (don't copy my path
# verbatim; on macOS, `/usr/libexec/java_home -v 1.8` prints the right path)
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home"
Configure the HDFS address and storage path
# Create a tmp directory for Hadoop (any path you like works)
mkdir -p /tmp/hadoop/hdfs/tmp
chmod -R 777 /tmp/hadoop/hdfs/tmp
# Edit core-site.xml
vim core-site.xml
# Add the following properties
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
(fs.default.name is the deprecated spelling of fs.defaultFS; Hadoop 3 accepts both, logging a deprecation warning for the old key.)
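A typo in these XML files only surfaces later as a confusing stack trace, so it can be worth sanity-checking the file right away. A small sketch using the stock python3 (`check_site_xml` is a hypothetical helper, not a Hadoop command):

```shell
# Hypothetical helper: parse a Hadoop *-site.xml and print its name = value
# pairs; a malformed file makes python3 exit non-zero.
check_site_xml() {
  python3 -c '
import sys, xml.etree.ElementTree as ET
for p in ET.parse(sys.argv[1]).getroot().findall("property"):
    print(p.findtext("name"), "=", p.findtext("value"))
' "$1"
}
# Usage: check_site_xml core-site.xml
```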
Set the MapReduce address
vim mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
(mapred.job.tracker is an MRv1 JobTracker setting; Hadoop 3 runs MapReduce on YARN, so this key is effectively ignored, but it does no harm.)
Set the replication factor
We are running pseudo-distributed locally, so the default of 3 replicas is unnecessary; 1 is enough.
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Format the NameNode
hdfs namenode -format
The output:
MBP:hadoop tony$ hdfs namenode -format
WARNING: /usr/local/Cellar/hadoop/3.0.0/libexec/logs does not exist. Creating.
2018-05-18 14:13:14,899 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = MBP.local/{my IP...}
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.0.0
..... (a long chunk omitted here)
2018-05-18 14:13:16,119 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1297562978-{my IP...}-1526623996110
2018-05-18 14:13:16,137 INFO common.Storage: Storage directory /tmp/hadoop/hdfs/tmp/dfs/name has been successfully formatted.
2018-05-18 14:13:16,177 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2018-05-18 14:13:16,312 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 389 bytes saved in 0 seconds.
2018-05-18 14:13:16,329 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-05-18 14:13:16,335 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at MBP.local/{my IP...}
************************************************************/
The configuration is finally done; time to run it.
3. Running
Start HDFS
The Hadoop control scripts are in /usr/local/Cellar/hadoop/3.0.0/sbin/ (mind the version number).
cd /usr/local/Cellar/hadoop/3.0.0/sbin/
./start-dfs.sh   # start HDFS
./stop-dfs.sh    # stop HDFS
MBP:sbin tony$ ./start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [MBP.local]
2018-05-18 14:38:31,125 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
# You can check the processes with jps
MBP:sbin tony$ jps
16816 Jps
56752
90759 NameNode
69335 Launcher
91002 SecondaryNameNode
98799
90863 DataNode
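Besides eyeballing the jps output, a small sketch can check that all three HDFS daemons are up (`check_daemons` is a hypothetical helper that reads jps output on stdin):

```shell
# Hypothetical helper: report which of the three HDFS daemons are missing
# from `jps` output; prints OK when all are running.
check_daemons() {
  procs=$(cat)
  missing=""
  for d in NameNode DataNode SecondaryNameNode; do
    printf '%s\n' "$procs" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -z "$missing" ]; then
    echo "OK"
  else
    echo "missing:$missing"
  fi
}
# Usage: jps | check_daemons
```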
Common problems when starting HDFS
Problem 1: localhost: ssh: connect to host localhost port 22: Connection refused
Solution:
Allow remote login for all users:
System Preferences -> Sharing -> Remote Login -> Allow access for: All Users
Then set up passwordless SSH:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
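If login still prompts for a password after this, the usual culprit is permissions: sshd ignores key files that are group- or world-writable. A sketch of the fix (`tighten_ssh_perms` is a hypothetical helper):

```shell
# Hypothetical helper: clamp an .ssh directory's permissions to what
# sshd requires (700 on the directory, 600 on authorized_keys).
tighten_ssh_perms() {
  chmod 700 "$1"
  chmod 600 "$1/authorized_keys"
}
# Usage: tighten_ssh_perms ~/.ssh && ssh localhost date
```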
Finally, rerun the start script and it should work.
Problem 2: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Rerun ./start-dfs.sh with DEBUG logging turned on and inspect the output:
MBP:sbin tony$ ./start-dfs.sh
Starting namenodes on [localhost]
localhost: namenode is running as process 80560. Stop it first.
Starting datanodes
localhost: datanode is running as process 80661. Stop it first.
Starting secondary namenodes [MBP.local]
MBP.local: secondarynamenode is running as process 80796. Stop it first.
2018-05-18 14:58:45,871 DEBUG util.Shell: setsid is not available on this machine. So not using it.
2018-05-18 14:58:45,872 DEBUG util.Shell: setsid exited with exit code 0
2018-05-18 14:58:46,065 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since startup], valueName=Time)
2018-05-18 14:58:46,077 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since last successful login], valueName=Time)
2018-05-18 14:58:46,078 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
2018-05-18 14:58:46,108 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
2018-05-18 14:58:46,138 DEBUG security.Groups: Creating new Groups object
2018-05-18 14:58:46,140 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: java.library.path=/Users/tn-ma-l30000122/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2018-05-18 14:58:46,142 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-05-18 14:58:46,143 DEBUG util.PerformanceAdvisory: Falling back to shell based
2018-05-18 14:58:46,145 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
2018-05-18 14:58:46,256 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
2018-05-18 14:58:46,260 DEBUG security.UserGroupInformation: hadoop login
2018-05-18 14:58:46,261 DEBUG security.UserGroupInformation: hadoop login commit
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: tony
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: tony" with name tony
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: User entry: "tony"
2018-05-18 14:58:46,265 DEBUG security.UserGroupInformation: UGI loginUser:tony (auth:SIMPLE)
2018-05-18 14:58:46,266 DEBUG security.UserGroupInformation: PrivilegedAction as:tony (auth:SIMPLE) from:org.apache.hadoop.hdfs.tools.GetConf.run(GetConf.java:315)
All the fixes I found online target Hadoop 2.x, and I'm on 3.0, so they didn't apply. Some more Googling turned up a Stack Overflow thread whose answer is simply to change the log level, so errors are still shown but this warning is not 😂 (On macOS the warning is expected anyway: the brew build ships without the native libhadoop library, so Hadoop falls back to its built-in Java classes.)
# Edit etc/hadoop/log4j.properties
vim /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/log4j.properties
# Add this line
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
4. Trying it out
Command-line operations
Create a directory
MBP:sbin tony$ hadoop fs -ls /
MBP:sbin tony$ hadoop fs -mkdir /demo
MBP:sbin tony$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - tony supergroup 0 2018-05-22 11:06 /demo
Create directories with the hdfs command
cd /usr/local/Cellar/hadoop/3.0.0/
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/tony
Copy some files in (relative HDFS paths such as input resolve under /user/<username>, hence the directories created above)
cd /usr/local/Cellar/hadoop/3.0.0/
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put libexec/etc/hadoop/*.xml input
Run the example job
bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'
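What this job computes: it extracts every match of the regex dfs[a-z.]+ from the input files and counts the occurrences, most frequent first. The same idea, sketched with plain shell tools against the local copies of the files (`count_matches` is a hypothetical helper, not part of Hadoop):

```shell
# Hypothetical helper: emulate the grep example locally -- pull out every
# regex match from the given files and count them, most frequent first.
count_matches() {
  pattern="$1"; shift
  grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn
}
# Usage: count_matches 'dfs[a-z.]+' libexec/etc/hadoop/*.xml
```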
Check the results
bin/hdfs dfs -get output output
cat output/*
or
bin/hdfs dfs -cat output/*
Web UI
Open http://localhost:9870/ and it should now load (Hadoop 3 moved the NameNode UI from port 50070 to 9870). The overview page shows detailed cluster information, and the file browser shows the directories created from the command line, plus the files we copied in (each of these small files occupies a single block; the default block size is 128 MB).