Trying Out Hadoop on macOS

Author: _已注销 | Published 2018-05-23 20:18

    Original article: https://crowall.com/topic/84

    PS: Make sure you already have brew and the JDK installed before starting.

    1. Install Hadoop

    brew install hadoop
    

    2. Configure

    export HADOOP_HOME=/usr/local/Cellar/hadoop/3.0.0/
    
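    To keep HADOOP_HOME across shell sessions, you can persist it in your shell profile. This is a minimal sketch, assuming bash and the Homebrew path above (zsh users would use ~/.zshrc instead):

    # Persist HADOOP_HOME and put the Hadoop control scripts on PATH
    echo 'export HADOOP_HOME=/usr/local/Cellar/hadoop/3.0.0/' >> ~/.bash_profile
    echo 'export PATH="$HADOOP_HOME/sbin:$PATH"' >> ~/.bash_profile
    source ~/.bash_profile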

    By default, the configuration directory of a Homebrew-installed Hadoop is /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/ (mind the version number; list /usr/local/Cellar/hadoop/ first to see which version you actually got).

    MBP:~ tony$ ll /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
    total 304
    drwxr-xr-x  30 tony  admin   960B  5  3 17:43 .
    drwxr-xr-x   3 tony  admin    96B 12  9 03:17 ..
    -rw-r--r--   1 tony  admin   7.7K 12  9 03:30 capacity-scheduler.xml
    -rw-r--r--   1 tony  admin   1.3K 12  9 03:32 configuration.xsl
    -rw-r--r--   1 tony  admin   1.2K 12  9 03:30 container-executor.cfg
    -rw-r--r--   1 tony  admin   774B 12  9 03:17 core-site.xml
    -rw-r--r--   1 tony  admin    16K 12  9 03:42 hadoop-env.sh
    -rw-r--r--   1 tony  admin   3.2K 12  9 03:17 hadoop-metrics2.properties
    -rw-r--r--   1 tony  admin    10K 12  9 03:17 hadoop-policy.xml
    -rw-r--r--   1 tony  admin   3.3K 12  9 03:17 hadoop-user-functions.sh.example
    -rw-r--r--   1 tony  admin   775B 12  9 03:19 hdfs-site.xml
    -rw-r--r--   1 tony  admin   1.4K 12  9 03:19 httpfs-env.sh
    -rw-r--r--   1 tony  admin   1.6K 12  9 03:19 httpfs-log4j.properties
    -rw-r--r--   1 tony  admin    21B 12  9 03:19 httpfs-signature.secret
    -rw-r--r--   1 tony  admin   620B 12  9 03:19 httpfs-site.xml
    -rw-r--r--   1 tony  admin   3.4K 12  9 03:17 kms-acls.xml
    -rw-r--r--   1 tony  admin   1.3K 12  9 03:17 kms-env.sh
    -rw-r--r--   1 tony  admin   1.7K 12  9 03:17 kms-log4j.properties
    -rw-r--r--   1 tony  admin   682B 12  9 03:17 kms-site.xml
    -rw-r--r--   1 tony  admin    13K 12  9 03:17 log4j.properties
    -rw-r--r--   1 tony  admin   1.7K 12  9 03:32 mapred-env.sh
    -rw-r--r--   1 tony  admin   4.0K 12  9 03:32 mapred-queues.xml.template
    -rw-r--r--   1 tony  admin   758B 12  9 03:32 mapred-site.xml
    drwxr-xr-x   3 tony  admin    96B 12  9 03:17 shellprofile.d
    -rw-r--r--   1 tony  admin   2.3K 12  9 03:17 ssl-client.xml.example
    -rw-r--r--   1 tony  admin   2.6K 12  9 03:17 ssl-server.xml.example
    -rw-r--r--   1 tony  admin   2.6K 12  9 03:19 user_ec_policies.xml.template
    -rw-r--r--   1 tony  admin    10B 12  9 03:17 workers
    -rw-r--r--   1 tony  admin   5.3K 12  9 03:30 yarn-env.sh
    -rw-r--r--   1 tony  admin   690B 12  9 03:30 yarn-site.xml
    
    

    Edit the hadoop-env.sh file

    cd /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
    vim hadoop-env.sh
    
    # Find HADOOP_OPTS
    MBP:hadoop tony$ cat hadoop-env.sh |grep -n "export HADOOP_OPTS"
    90:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
    92:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
    106:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
    107:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
    108:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "
    
    # Uncomment line 92 and add a JAVA_HOME line (don't copy my path verbatim; use your own JDK path)
    
    export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
    export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home"
    
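    If you'd rather not hard-code the JDK path, macOS ships a helper that resolves it for you. A sketch; it prints whatever JDK is active on your machine:

    # Resolve the active JDK location instead of hard-coding it
    export JAVA_HOME="$(/usr/libexec/java_home)"
    # or pin a major version, e.g.:
    # export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"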

    Configure the HDFS address and storage path

    # Create a tmp directory for Hadoop (any location works; see the note just below about /tmp on macOS)
    
    mkdir -p /tmp/hadoop/hdfs/tmp
    chmod -R 777 /tmp/hadoop/hdfs/tmp
    
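    # Note: macOS clears /tmp on reboot (and prunes it periodically), which
    # would wipe your HDFS data. For data that should persist, a directory
    # under your home works too, e.g. (remember to match core-site.xml below):
    # mkdir -p ~/hadoop/hdfs/tmp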
    # Edit the core-site.xml file
    vim core-site.xml
    
    # Add the following properties
    
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/tmp/hadoop/hdfs/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
    
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:8020</value>
        </property>
    </configuration>
    
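    Incidentally, fs.default.name is a deprecated alias; since Hadoop 2 the preferred key is fs.defaultFS (the old name still works and maps to it). Once the file is saved you can check what the configuration resolves to:

    # Print the effective default filesystem from the loaded config
    hdfs getconf -confKey fs.defaultFS
    # expected output: hdfs://localhost:8020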

    Set the MapReduce job tracker address

    (Note: mapred.job.tracker is a Hadoop 1.x property; Hadoop 3 selects the MapReduce framework via mapreduce.framework.name, which defaults to local, so the example job in section 4 runs with the local runner regardless. The setting is kept here as in the original walkthrough.)

    vim mapred-site.xml
    
    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>localhost:8021</value>
        </property>
    </configuration>
    

    Set the replication factor

    Since this is a local pseudo-distributed setup, we don't need the default replication factor of 3; set it to 1.

    vim hdfs-site.xml
    
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
    

    Format the NameNode

    hdfs namenode -format
    
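    Formatting is meant to be a one-time step: re-running it generates a new cluster ID, and DataNodes still holding data under the old ID will refuse to start. If you ever need a clean slate, stop the daemons and wipe the storage directory first. A sketch, assuming the tmp path configured above:

    # Destroys all HDFS data: wipe the old state, then re-format
    rm -rf /tmp/hadoop/hdfs/tmp/*
    hdfs namenode -format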

    Output:

    MBP:hadoop tony$ hdfs namenode -format
    WARNING: /usr/local/Cellar/hadoop/3.0.0/libexec/logs does not exist. Creating.
    2018-05-18 14:13:14,899 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = MBP.local/{my IP...}
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 3.0.0
    
    ..... (a long stretch omitted here)
    
    2018-05-18 14:13:16,119 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1297562978-{my IP...}-1526623996110
    2018-05-18 14:13:16,137 INFO common.Storage: Storage directory /tmp/hadoop/hdfs/tmp/dfs/name has been successfully formatted.
    2018-05-18 14:13:16,177 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    2018-05-18 14:13:16,312 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 389 bytes saved in 0 seconds.
    2018-05-18 14:13:16,329 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    2018-05-18 14:13:16,335 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at MBP.local/{my IP...}
    ************************************************************/
    

    The configuration is finally done; time to run it.

    3. Run

    Start HDFS

    The Hadoop control scripts live in /usr/local/Cellar/hadoop/3.0.0/sbin/ (mind the version number).

    cd /usr/local/Cellar/hadoop/3.0.0/sbin/
    
    ./start-dfs.sh  # start HDFS
    ./stop-dfs.sh   # stop HDFS
    
    
    MBP:sbin tony$ ./start-dfs.sh
    Starting namenodes on [localhost]
    Starting datanodes
    Starting secondary namenodes [MBP.local]
    2018-05-18 14:38:31,125 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    
    # You can check the running processes with jps
    MBP:sbin tony$ jps
    16816 Jps
    56752 
    90759 NameNode
    69335 Launcher
    91002 SecondaryNameNode
    98799 
    90863 DataNode
    

    Common problems when starting HDFS

    Problem 1: localhost: ssh: connect to host localhost port 22: Connection refused. See the fix below:

    Solution:

    Allow remote login for all users:

    System Preferences -> Sharing -> Remote Login -> Allow access for: All users
    

    Then set up passwordless SSH to localhost:

    ssh-keygen -t rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    
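    Before retrying, it's worth verifying that key-based login actually works (the chmod matters: sshd ignores an authorized_keys file with loose permissions):

    # Tighten permissions, then confirm passwordless login
    chmod 600 ~/.ssh/authorized_keys
    ssh localhost    # should log in without a password prompt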

    Finally, run the start script again.

    Problem 2: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    Turn on debug logging, rerun ./start-dfs.sh, and inspect the output (one way to enable it is sketched below):

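    The original doesn't say how the extra DEBUG output was enabled; one way (an assumption on my part) is the HADOOP_ROOT_LOGGER environment variable, which the launch scripts honor:

    # Route the root logger's DEBUG output to the console for this session
    export HADOOP_ROOT_LOGGER=DEBUG,console
    ./start-dfs.sh

    With debug output on, the run looked like this: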
    MBP:sbin tony$ ./start-dfs.sh 
    Starting namenodes on [localhost]
    localhost: namenode is running as process 80560.  Stop it first.
    Starting datanodes
    localhost: datanode is running as process 80661.  Stop it first.
    Starting secondary namenodes [MBP.local]
    MBP.local: secondarynamenode is running as process 80796.  Stop it first.
    2018-05-18 14:58:45,871 DEBUG util.Shell: setsid is not available on this machine. So not using it.
    2018-05-18 14:58:45,872 DEBUG util.Shell: setsid exited with exit code 0
    2018-05-18 14:58:46,065 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
    2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
    2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
    2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since startup], valueName=Time)
    2018-05-18 14:58:46,077 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since last successful login], valueName=Time)
    2018-05-18 14:58:46,078 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
    2018-05-18 14:58:46,108 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
    2018-05-18 14:58:46,138 DEBUG security.Groups:  Creating new Groups object
    2018-05-18 14:58:46,140 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
    2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
    2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: java.library.path=/Users/tn-ma-l30000122/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
    2018-05-18 14:58:46,142 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2018-05-18 14:58:46,143 DEBUG util.PerformanceAdvisory: Falling back to shell based
    2018-05-18 14:58:46,145 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
    2018-05-18 14:58:46,256 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
    2018-05-18 14:58:46,260 DEBUG security.UserGroupInformation: hadoop login
    2018-05-18 14:58:46,261 DEBUG security.UserGroupInformation: hadoop login commit
    2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: tony
    2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: tony" with name tony
    2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: User entry: "tony"
    2018-05-18 14:58:46,265 DEBUG security.UserGroupInformation: UGI loginUser:tony (auth:SIMPLE)
    2018-05-18 14:58:46,266 DEBUG security.UserGroupInformation: PrivilegedAction as:tony (auth:SIMPLE) from:org.apache.hadoop.hdfs.tools.GetConf.run(GetConf.java:315)
    

    The fixes I found online were all for Hadoop 2.x and didn't work on my 3.0 install. After some Googling I found this Stack Overflow post:

    hadoop WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

    The answer there is simply to raise the log level for that class, so only errors are shown and the warning goes away 😂

    # Edit the log config file etc/hadoop/log4j.properties
    vim /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/log4j.properties
    
    # Add this line
    log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR 
    

    4. Try It Out

    Command-line operations

    Create a directory

    MBP:sbin tony$ hadoop fs -ls /
    MBP:sbin tony$ hadoop fs -mkdir /demo
    MBP:sbin tony$ hadoop fs -ls /
    Found 1 items
    drwxr-xr-x   - tony supergroup          0 2018-05-22 11:06 /demo
    

    Create a home directory with the hdfs command

    cd /usr/local/Cellar/hadoop/3.0.0/
    bin/hdfs dfs -mkdir /user
    bin/hdfs dfs -mkdir /user/tony
    
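    If your username isn't tony, substitute your own; relative HDFS paths resolve under /user/<username>. With -p you can create both levels at once:

    # Create /user/<your-username> in one step
    bin/hdfs dfs -mkdir -p /user/$(whoami)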

    Copy some files in

    cd /usr/local/Cellar/hadoop/3.0.0/
    bin/hdfs dfs -mkdir input
    bin/hdfs dfs -put libexec/etc/hadoop/*.xml input
    
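    A quick sanity check that the upload landed (the relative path input resolves to /user/<username>/input):

    # Should list all the copied *.xml config files
    bin/hdfs dfs -ls input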

    Run the example

    bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'
    
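    One gotcha worth knowing: the job aborts if the output directory already exists, so clear it before a re-run:

    # MapReduce refuses to overwrite an existing output dir
    bin/hdfs dfs -rm -r -f output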

    Check the result

    bin/hdfs dfs -get output output
    cat output/*
    
    # or
    
    bin/hdfs dfs -cat output/*
    

    Web UI

    Open http://localhost:9870/ (in Hadoop 3 the NameNode web UI moved from port 50070 to 9870) and it should now be reachable, showing an overview page.


    You can see detailed cluster information there.

    image

    In the file browser you can see the directories created from the command line.



    The copied files are there too; each occupies a single block, since they are all far smaller than the default 128 MB block size.


