Installing Hadoop on a Mac

Author: 李飞_fd28 | Published 2019-12-12 16:25

Official site

https://hadoop.apache.org/

Environment preparation

1. Download the package

https://mirrors.cnnic.cn/apache/hadoop/common/stable2/

Save the archive as ~/opt/hadoop-3.2.1.tar.gz

2. Extract the archive

tar xzf hadoop-3.2.1.tar.gz
mkdir -p ~/opt/hadoop
mv hadoop-3.2.1/* ~/opt/hadoop/
cd ~/opt/hadoop/
ls -l
-rw-r--r--@  1 lifei  staff  150569  9 10 22:35 LICENSE.txt
-rw-r--r--@  1 lifei  staff   22125  9 10 22:35 NOTICE.txt
-rw-r--r--@  1 lifei  staff    1361  9 10 22:35 README.txt
drwxr-xr-x@ 13 lifei  staff     416  9 11 00:51 bin
drwxr-xr-x@  3 lifei  staff      96  9 10 23:58 etc
drwxr-xr-x@  7 lifei  staff     224  9 11 00:51 include
drwxr-xr-x@  3 lifei  staff      96  9 11 00:51 lib
drwxr-xr-x@ 14 lifei  staff     448  9 11 00:51 libexec
drwxr-xr-x@ 29 lifei  staff     928  9 10 23:58 sbin
drwxr-xr-x@  4 lifei  staff     128  9 11 01:11 share
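Before moving on, it can help to confirm that the extracted tree matches the layout above. A minimal sketch (the function name is illustrative; the entry list is taken from the listing above):

```shell
#!/bin/sh
# Check that a Hadoop install dir contains the expected top-level entries.
# Prints "ok" when all are present, otherwise lists what is missing.
check_hadoop_layout() {
    dir="$1"
    missing=""
    for entry in bin etc lib libexec sbin share; do
        [ -e "$dir/$entry" ] || missing="$missing $entry"
    done
    if [ -z "$missing" ]; then
        echo "ok"
    else
        echo "missing:$missing"
    fi
}
```

Usage: `check_hadoop_layout ~/opt/hadoop` — anything other than `ok` means the move above did not complete.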

3. Hadoop operation modes

  1. Local/standalone mode: after download, Hadoop is configured in standalone mode by default and runs as a single Java process.
  2. Pseudo-distributed mode: a simulation of a distributed deployment on a single machine. Each Hadoop daemon (HDFS, YARN, MapReduce, etc.) runs as a separate Java process. This mode is very useful for development.
  3. Fully distributed mode: a true distributed deployment on a cluster of at least two machines.

Installing Hadoop in local mode

Everything runs in a single JVM, with no daemon processes. Standalone mode is well suited to running MapReduce programs during development, because they are easy to test and debug.
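One way to verify that nothing is daemonized is to look for Hadoop daemon processes in a `jps`-style listing (one `pid ClassName` per line; `jps` assumes a JDK is installed, and the function name here is illustrative):

```shell
#!/bin/sh
# Filter a jps-style listing down to the common Hadoop daemon classes.
# In standalone mode this prints nothing.
hadoop_daemons() {
    grep -E 'NameNode|DataNode|ResourceManager|NodeManager' || true
}
```

Usage: `jps | hadoop_daemons` — empty output is what you expect in local mode.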

Setting up Hadoop

ls ~/.bashrc
ls: /Users/lifei/.bashrc: No such file or directory
vi ~/.bashrc
export HADOOP_HOME=/Users/lifei/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save ~/.bashrc with :wq, then reload it. (On macOS Catalina and later the default shell is zsh, so put the same exports in ~/.zshrc instead.)
source ~/.bashrc
Then run: hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
 or    hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
  where CLASSNAME is a user-provided Java class

  OPTIONS is none or any of:

--config dir                     Hadoop config directory
--debug                          turn on shell script debug mode
--help                           usage information
--buildpaths                       attempt to add class files from build tree
--hostnames list[,of,host,names]   hosts to use in slave mode
--hosts filename                   list of hosts to use in slave mode
--loglevel level                   set the log4j level for this command
--workers                          turn on worker mode

  SUBCOMMAND is one of:


    Admin Commands:

daemonlog     get/set the log level for each daemon

    Client Commands:

archive       create a Hadoop archive
checknative   check native Hadoop and compression libraries availability
classpath     prints the class path needed to get the Hadoop jar and the required libraries
conftest      validate configuration XML files
credential    interact with credential providers
distch        distributed metadata changer
distcp        copy file or directories recursively
dtutil        operations related to delegation tokens
envvars       display computed Hadoop environment variables
fs            run a generic filesystem user client
gridmix       submit a mix of synthetic job, modeling a profiled from production load
jar <jar>     run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath       prints the java.library.path
kdiag         Diagnose Kerberos Problems
kerbname      show auth_to_local principal conversion
key           manage keys via the KeyProvider
rumenfolder   scale a rumen input trace
rumentrace    convert logs into a rumen trace
s3guard       manage metadata on S3
trace         view and modify Hadoop tracing settings
version       print the version

    Daemon Commands:

kms           run KMS, the Key Management Server

SUBCOMMAND may print help when invoked w/o parameters or with -h.

Deployment succeeded.

Running Hadoop

Run the "Hello World" of the Hadoop world: the word-count example.
Preparation

mkdir -p ~/hadoop_workspace/input
echo 'Lightbatis 增强 MyBatis 版Java 数据库持久层,更简洁列易用。Lightbatis 增强 MyBatis 版Java 数据库持久层,更简洁列易用。' > ~/hadoop_workspace/input/hello.txt
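For slightly more interesting statistics, it can help to generate a bigger input file than the single-line sample above. A small sketch (the function name and file names are illustrative):

```shell
#!/bin/sh
# Write N copies of a line into a file, creating parent dirs as needed.
make_input() {
    line="$1"; n="$2"; out="$3"
    mkdir -p "$(dirname "$out")"
    : > "$out"                      # create/truncate the output file
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s\n' "$line" >> "$out"
        i=$((i + 1))
    done
}
```

Usage: `make_input 'hello hadoop' 100 ~/hadoop_workspace/input/big.txt`.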

After the preparation, the workspace looks like this:

(base) lifeideMacBook-Pro:input lifei$ pwd
/Users/lifei/hadoop_workspace/input
(base) lifeideMacBook-Pro:input lifei$ ls
hello.txt
(base) lifeideMacBook-Pro:input lifei$ cat hello.txt 
Lightbatis 增强 MyBatis 版Java 数据库持久层,更简洁列易用。Lightbatis 增强 MyBatis 版Java 数据库持久层,更简洁列易用。

Run the job

cd ~/hadoop_workspace
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output
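Conceptually, the wordcount job tokenizes each line on whitespace (map), groups identical tokens (shuffle), and counts each group (reduce). On small inputs the same result can be approximated in plain shell, which makes a handy sanity check (a sketch, not Hadoop itself; the function name is illustrative):

```shell
#!/bin/sh
# A plain-shell approximation of the MapReduce wordcount example:
# split on whitespace (map), sort (shuffle), count duplicates (reduce).
wordcount() {
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c
}
```

Usage: `wordcount input/hello.txt` — the counts should match the job's part-r-00000 output.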

The console output during the run is as follows:

(base) lifeideMacBook-Pro:hadoop_workspace lifei$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output
2019-12-12 16:20:18,928 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-12-12 16:20:19,110 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2019-12-12 16:20:24,165 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2019-12-12 16:20:24,165 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2019-12-12 16:20:29,421 INFO input.FileInputFormat: Total input files to process : 1
2019-12-12 16:20:29,470 INFO mapreduce.JobSubmitter: number of splits:1
2019-12-12 16:20:29,573 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1284066301_0001
2019-12-12 16:20:29,573 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-12-12 16:20:29,675 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2019-12-12 16:20:29,675 INFO mapreduce.Job: Running job: job_local1284066301_0001
2019-12-12 16:20:29,676 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2019-12-12 16:20:29,681 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-12-12 16:20:29,681 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-12-12 16:20:29,682 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2019-12-12 16:20:29,714 INFO mapred.LocalJobRunner: Waiting for map tasks
2019-12-12 16:20:29,715 INFO mapred.LocalJobRunner: Starting task: attempt_local1284066301_0001_m_000000_0
2019-12-12 16:20:29,731 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-12-12 16:20:29,731 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-12-12 16:20:29,738 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2019-12-12 16:20:29,738 INFO mapred.Task:  Using ResourceCalculatorProcessTree : null
2019-12-12 16:20:29,741 INFO mapred.MapTask: Processing split: file:/Users/lifei/hadoop_workspace/input/hello.txt:0+153
2019-12-12 16:20:29,798 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2019-12-12 16:20:29,798 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2019-12-12 16:20:29,798 INFO mapred.MapTask: soft limit at 83886080
2019-12-12 16:20:29,798 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2019-12-12 16:20:29,798 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2019-12-12 16:20:29,801 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2019-12-12 16:20:29,806 INFO mapred.LocalJobRunner: 
2019-12-12 16:20:29,806 INFO mapred.MapTask: Starting flush of map output
2019-12-12 16:20:29,806 INFO mapred.MapTask: Spilling map output
2019-12-12 16:20:29,806 INFO mapred.MapTask: bufstart = 0; bufend = 189; bufvoid = 104857600
2019-12-12 16:20:29,806 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214364(104857456); length = 33/6553600
2019-12-12 16:20:29,826 INFO mapred.MapTask: Finished spill 0
2019-12-12 16:20:29,841 INFO mapred.Task: Task:attempt_local1284066301_0001_m_000000_0 is done. And is in the process of committing
2019-12-12 16:20:29,843 INFO mapred.LocalJobRunner: map
2019-12-12 16:20:29,843 INFO mapred.Task: Task 'attempt_local1284066301_0001_m_000000_0' done.
2019-12-12 16:20:29,848 INFO mapred.Task: Final Counters for attempt_local1284066301_0001_m_000000_0: Counters: 18
    File System Counters
        FILE: Number of bytes read=316857
        FILE: Number of bytes written=840334
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=1
        Map output records=9
        Map output bytes=189
        Map output materialized bytes=172
        Input split bytes=115
        Combine input records=9
        Combine output records=6
        Spilled Records=6
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=257425408
    File Input Format Counters 
        Bytes Read=153
2019-12-12 16:20:29,849 INFO mapred.LocalJobRunner: Finishing task: attempt_local1284066301_0001_m_000000_0
2019-12-12 16:20:29,849 INFO mapred.LocalJobRunner: map task executor complete.
2019-12-12 16:20:29,852 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2019-12-12 16:20:29,852 INFO mapred.LocalJobRunner: Starting task: attempt_local1284066301_0001_r_000000_0
2019-12-12 16:20:29,860 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-12-12 16:20:29,860 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-12-12 16:20:29,860 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2019-12-12 16:20:29,860 INFO mapred.Task:  Using ResourceCalculatorProcessTree : null
2019-12-12 16:20:29,864 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@67d50939
2019-12-12 16:20:29,866 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2019-12-12 16:20:29,885 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=2672505600, maxSingleShuffleLimit=668126400, mergeThreshold=1763853824, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-12-12 16:20:29,888 INFO reduce.EventFetcher: attempt_local1284066301_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-12-12 16:20:29,923 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1284066301_0001_m_000000_0 decomp: 168 len: 172 to MEMORY
2019-12-12 16:20:29,930 INFO reduce.InMemoryMapOutput: Read 168 bytes from map-output for attempt_local1284066301_0001_m_000000_0
2019-12-12 16:20:29,932 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 168, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->168
2019-12-12 16:20:29,933 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2019-12-12 16:20:29,934 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-12-12 16:20:29,934 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2019-12-12 16:20:29,948 INFO mapred.Merger: Merging 1 sorted segments
2019-12-12 16:20:29,949 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 155 bytes
2019-12-12 16:20:29,957 INFO reduce.MergeManagerImpl: Merged 1 segments, 168 bytes to disk to satisfy reduce memory limit
2019-12-12 16:20:29,958 INFO reduce.MergeManagerImpl: Merging 1 files, 172 bytes from disk
2019-12-12 16:20:29,958 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2019-12-12 16:20:29,958 INFO mapred.Merger: Merging 1 sorted segments
2019-12-12 16:20:29,958 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 155 bytes
2019-12-12 16:20:29,959 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-12-12 16:20:29,979 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2019-12-12 16:20:29,981 INFO mapred.Task: Task:attempt_local1284066301_0001_r_000000_0 is done. And is in the process of committing
2019-12-12 16:20:29,982 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-12-12 16:20:29,982 INFO mapred.Task: Task attempt_local1284066301_0001_r_000000_0 is allowed to commit now
2019-12-12 16:20:29,983 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1284066301_0001_r_000000_0' to file:/Users/lifei/hadoop_workspace/output
2019-12-12 16:20:29,984 INFO mapred.LocalJobRunner: reduce > reduce
2019-12-12 16:20:29,984 INFO mapred.Task: Task 'attempt_local1284066301_0001_r_000000_0' done.
2019-12-12 16:20:29,984 INFO mapred.Task: Final Counters for attempt_local1284066301_0001_r_000000_0: Counters: 24
    File System Counters
        FILE: Number of bytes read=317233
        FILE: Number of bytes written=840660
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=6
        Reduce shuffle bytes=172
        Reduce input records=6
        Reduce output records=6
        Spilled Records=6
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=257425408
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters 
        Bytes Written=154
2019-12-12 16:20:29,984 INFO mapred.LocalJobRunner: Finishing task: attempt_local1284066301_0001_r_000000_0
2019-12-12 16:20:29,985 INFO mapred.LocalJobRunner: reduce task executor complete.
2019-12-12 16:20:30,682 INFO mapreduce.Job: Job job_local1284066301_0001 running in uber mode : false
2019-12-12 16:20:30,683 INFO mapreduce.Job:  map 100% reduce 100%
2019-12-12 16:20:30,684 INFO mapreduce.Job: Job job_local1284066301_0001 completed successfully
2019-12-12 16:20:30,692 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=634090
        FILE: Number of bytes written=1680994
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=1
        Map output records=9
        Map output bytes=189
        Map output materialized bytes=172
        Input split bytes=115
        Combine input records=9
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=172
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=514850816
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=153
    File Output Format Counters 
        Bytes Written=154

Check the results:

(base) lifeideMacBook-Pro:hadoop_workspace lifei$ ls -l output/
total 8
-rw-r--r--  1 lifei  staff    0 12 12 16:20 _SUCCESS
-rw-r--r--  1 lifei  staff  142 12 12 16:20 part-r-00000
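The empty `_SUCCESS` marker is written only when the job commits cleanly, so scripts can gate on it before consuming the output. A sketch (the function name is illustrative):

```shell
#!/bin/sh
# Succeed only when a MapReduce output dir carries the _SUCCESS marker.
job_succeeded() {
    [ -f "$1/_SUCCESS" ]
}
```

Usage: `job_succeeded output && cat output/part-r-*`.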

View the result in part-r-00000:

(base) lifeideMacBook-Pro:hadoop_workspace lifei$ cat output/part-r-00000 
Lightbatis  1
MyBatis 2
增强  2
数据库持久层,更简洁列易用。  1
数据库持久层,更简洁列易用。Lightbatis    1
版Java   2
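Note that these counts are whitespace tokens, not words: wordcount's default tokenizer splits on whitespace only, so an unspaced run of Chinese text such as `数据库持久层,更简洁列易用。Lightbatis` is counted as a single token, punctuation and all. The effect is easy to reproduce in shell (a sketch of the tokenization only, not Hadoop itself):

```shell
#!/bin/sh
# Show how whitespace-only tokenization treats unspaced CJK text:
# the whole run, punctuation included, becomes one token.
tokens() {
    printf '%s\n' "$1" | tr -s ' ' '\n'
}
```

Usage: `tokens '版Java 数据库持久层,更简洁列易用。Lightbatis'` prints two tokens, matching the grouping seen in part-r-00000.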

The standalone Hadoop installation is complete.


Original link: https://www.haomeiwen.com/subject/fkrtnctx.html