Official address
Environment preparation
1. Download the installation package
https://mirrors.cnnic.cn/apache/hadoop/common/stable2/
~/opt/hadoop-3.2.1.tar.gz
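If you prefer to download from the command line (curl ships with macOS; this assumes the archive is still published under the mirror directory shown above):
mkdir -p ~/opt
cd ~/opt
curl -L -O https://mirrors.cnnic.cn/apache/hadoop/common/stable2/hadoop-3.2.1.tar.gz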
2. Extract the archive
tar xzf hadoop-3.2.1.tar.gz
mkdir -p ~/opt/hadoop
mv hadoop-3.2.1/* ~/opt/hadoop/
cd ~/opt/hadoop/
ls -l
-rw-r--r--@ 1 lifei staff 150569 9 10 22:35 LICENSE.txt
-rw-r--r--@ 1 lifei staff 22125 9 10 22:35 NOTICE.txt
-rw-r--r--@ 1 lifei staff 1361 9 10 22:35 README.txt
drwxr-xr-x@ 13 lifei staff 416 9 11 00:51 bin
drwxr-xr-x@ 3 lifei staff 96 9 10 23:58 etc
drwxr-xr-x@ 7 lifei staff 224 9 11 00:51 include
drwxr-xr-x@ 3 lifei staff 96 9 11 00:51 lib
drwxr-xr-x@ 14 lifei staff 448 9 11 00:51 libexec
drwxr-xr-x@ 29 lifei staff 928 9 10 23:58 sbin
drwxr-xr-x@ 4 lifei staff 128 9 11 01:11 share
3. Hadoop operating modes
- Local/standalone mode: after you download Hadoop, it is configured in standalone mode by default and runs as a single Java process.
- Pseudo-distributed mode: a simulation of a distributed deployment on a single machine. Each Hadoop daemon (HDFS, YARN, MapReduce, and so on) runs as a separate Java process. This mode is very useful for development.
- Fully distributed mode: a genuinely distributed cluster of at least two machines.
Installing Hadoop in local mode
Everything runs inside a single JVM, with no daemons at all. Standalone mode is well suited to running MapReduce programs during development, because they are easy to test and debug.
Setting up Hadoop
ls ~/.bashrc
ls: /Users/lifei/.bashrc: No such file or directory
vi ~/.bashrc
export HADOOP_HOME=/Users/lifei/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save ~/.bashrc with :wq, then reload it:
source ~/.bashrc
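Note: on macOS, Terminal opens login shells, which read ~/.bash_profile rather than ~/.bashrc. If a newly opened terminal cannot find hadoop, let ~/.bash_profile pull in ~/.bashrc:
echo 'source ~/.bashrc' >> ~/.bash_profile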
Run: hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
or hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
where CLASSNAME is a user-provided Java class
OPTIONS is none or any of:
--config dir Hadoop config directory
--debug turn on shell script debug mode
--help usage information
buildpaths attempt to add class files from build tree
hostnames list[,of,host,names] hosts to use in slave mode
hosts filename list of hosts to use in slave mode
loglevel level set the log4j level for this command
workers turn on worker mode
SUBCOMMAND is one of:
Admin Commands:
daemonlog get/set the log level for each daemon
Client Commands:
archive create a Hadoop archive
checknative check native Hadoop and compression libraries availability
classpath prints the class path needed to get the Hadoop jar and the required libraries
conftest validate configuration XML files
credential interact with credential providers
distch distributed metadata changer
distcp copy file or directories recursively
dtutil operations related to delegation tokens
envvars display computed Hadoop environment variables
fs run a generic filesystem user client
gridmix submit a mix of synthetic job, modeling a profiled from production load
jar <jar> run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath prints the java.library.path
kdiag Diagnose Kerberos Problems
kerbname show auth_to_local principal conversion
key manage keys via the KeyProvider
rumenfolder scale a rumen input trace
rumentrace convert logs into a rumen trace
s3guard manage metadata on S3
trace view and modify Hadoop tracing settings
version print the version
Daemon Commands:
kms run KMS, the Key Management Server
SUBCOMMAND may print help when invoked w/o parameters or with -h.
Deployment succeeded.
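As an additional sanity check: hadoop version prints the build information, and since standalone mode leaves fs.defaultFS at its default value of file:///, the generic filesystem client simply operates on the local disk:
hadoop version
hadoop fs -ls ~/opt/hadoop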
Running Hadoop
Let's run the "Hello World" of the Hadoop world: counting the number of words.
Preparation
mkdir ~/hadoop_workspace
mkdir ~/hadoop_workspace/input
echo 'Lightbatis 增强 MyBatis 版Java 数据库持久层,更简洁列易用。Lightbatis 增强 MyBatis 版Java 数据库持久层,更简洁列易用。' > ~/hadoop_workspace/input/hello.txt
When the preparation is done, it looks like this:
(base) lifeideMacBook-Pro:input lifei$ pwd
/Users/lifei/hadoop_workspace/input
(base) lifeideMacBook-Pro:input lifei$ ls
hello.txt
(base) lifeideMacBook-Pro:input lifei$ cat hello.txt
Lightbatis 增强 MyBatis 版Java 数据库持久层,更简洁列易用。Lightbatis 增强 MyBatis 版Java 数据库持久层,更简洁列易用。
Run the job
cd ~/hadoop_workspace
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output
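Note that MapReduce refuses to start if the output directory already exists, so remove it before rerunning the job:
rm -rf ~/hadoop_workspace/output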
The console output during the run is as follows:
(base) lifeideMacBook-Pro:hadoop_workspace lifei$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output
2019-12-12 16:20:18,928 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-12-12 16:20:19,110 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2019-12-12 16:20:24,165 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2019-12-12 16:20:24,165 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2019-12-12 16:20:29,421 INFO input.FileInputFormat: Total input files to process : 1
2019-12-12 16:20:29,470 INFO mapreduce.JobSubmitter: number of splits:1
2019-12-12 16:20:29,573 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1284066301_0001
2019-12-12 16:20:29,573 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-12-12 16:20:29,675 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2019-12-12 16:20:29,675 INFO mapreduce.Job: Running job: job_local1284066301_0001
2019-12-12 16:20:29,676 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2019-12-12 16:20:29,681 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-12-12 16:20:29,681 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-12-12 16:20:29,682 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2019-12-12 16:20:29,714 INFO mapred.LocalJobRunner: Waiting for map tasks
2019-12-12 16:20:29,715 INFO mapred.LocalJobRunner: Starting task: attempt_local1284066301_0001_m_000000_0
2019-12-12 16:20:29,731 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-12-12 16:20:29,731 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-12-12 16:20:29,738 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2019-12-12 16:20:29,738 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
2019-12-12 16:20:29,741 INFO mapred.MapTask: Processing split: file:/Users/lifei/hadoop_workspace/input/hello.txt:0+153
2019-12-12 16:20:29,798 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2019-12-12 16:20:29,798 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2019-12-12 16:20:29,798 INFO mapred.MapTask: soft limit at 83886080
2019-12-12 16:20:29,798 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2019-12-12 16:20:29,798 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2019-12-12 16:20:29,801 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2019-12-12 16:20:29,806 INFO mapred.LocalJobRunner:
2019-12-12 16:20:29,806 INFO mapred.MapTask: Starting flush of map output
2019-12-12 16:20:29,806 INFO mapred.MapTask: Spilling map output
2019-12-12 16:20:29,806 INFO mapred.MapTask: bufstart = 0; bufend = 189; bufvoid = 104857600
2019-12-12 16:20:29,806 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214364(104857456); length = 33/6553600
2019-12-12 16:20:29,826 INFO mapred.MapTask: Finished spill 0
2019-12-12 16:20:29,841 INFO mapred.Task: Task:attempt_local1284066301_0001_m_000000_0 is done. And is in the process of committing
2019-12-12 16:20:29,843 INFO mapred.LocalJobRunner: map
2019-12-12 16:20:29,843 INFO mapred.Task: Task 'attempt_local1284066301_0001_m_000000_0' done.
2019-12-12 16:20:29,848 INFO mapred.Task: Final Counters for attempt_local1284066301_0001_m_000000_0: Counters: 18
File System Counters
FILE: Number of bytes read=316857
FILE: Number of bytes written=840334
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=9
Map output bytes=189
Map output materialized bytes=172
Input split bytes=115
Combine input records=9
Combine output records=6
Spilled Records=6
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=257425408
File Input Format Counters
Bytes Read=153
2019-12-12 16:20:29,849 INFO mapred.LocalJobRunner: Finishing task: attempt_local1284066301_0001_m_000000_0
2019-12-12 16:20:29,849 INFO mapred.LocalJobRunner: map task executor complete.
2019-12-12 16:20:29,852 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2019-12-12 16:20:29,852 INFO mapred.LocalJobRunner: Starting task: attempt_local1284066301_0001_r_000000_0
2019-12-12 16:20:29,860 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-12-12 16:20:29,860 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-12-12 16:20:29,860 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2019-12-12 16:20:29,860 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
2019-12-12 16:20:29,864 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@67d50939
2019-12-12 16:20:29,866 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2019-12-12 16:20:29,885 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=2672505600, maxSingleShuffleLimit=668126400, mergeThreshold=1763853824, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-12-12 16:20:29,888 INFO reduce.EventFetcher: attempt_local1284066301_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-12-12 16:20:29,923 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1284066301_0001_m_000000_0 decomp: 168 len: 172 to MEMORY
2019-12-12 16:20:29,930 INFO reduce.InMemoryMapOutput: Read 168 bytes from map-output for attempt_local1284066301_0001_m_000000_0
2019-12-12 16:20:29,932 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 168, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->168
2019-12-12 16:20:29,933 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2019-12-12 16:20:29,934 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-12-12 16:20:29,934 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2019-12-12 16:20:29,948 INFO mapred.Merger: Merging 1 sorted segments
2019-12-12 16:20:29,949 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 155 bytes
2019-12-12 16:20:29,957 INFO reduce.MergeManagerImpl: Merged 1 segments, 168 bytes to disk to satisfy reduce memory limit
2019-12-12 16:20:29,958 INFO reduce.MergeManagerImpl: Merging 1 files, 172 bytes from disk
2019-12-12 16:20:29,958 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2019-12-12 16:20:29,958 INFO mapred.Merger: Merging 1 sorted segments
2019-12-12 16:20:29,958 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 155 bytes
2019-12-12 16:20:29,959 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-12-12 16:20:29,979 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2019-12-12 16:20:29,981 INFO mapred.Task: Task:attempt_local1284066301_0001_r_000000_0 is done. And is in the process of committing
2019-12-12 16:20:29,982 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-12-12 16:20:29,982 INFO mapred.Task: Task attempt_local1284066301_0001_r_000000_0 is allowed to commit now
2019-12-12 16:20:29,983 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1284066301_0001_r_000000_0' to file:/Users/lifei/hadoop_workspace/output
2019-12-12 16:20:29,984 INFO mapred.LocalJobRunner: reduce > reduce
2019-12-12 16:20:29,984 INFO mapred.Task: Task 'attempt_local1284066301_0001_r_000000_0' done.
2019-12-12 16:20:29,984 INFO mapred.Task: Final Counters for attempt_local1284066301_0001_r_000000_0: Counters: 24
File System Counters
FILE: Number of bytes read=317233
FILE: Number of bytes written=840660
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=6
Reduce shuffle bytes=172
Reduce input records=6
Reduce output records=6
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=257425408
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=154
2019-12-12 16:20:29,984 INFO mapred.LocalJobRunner: Finishing task: attempt_local1284066301_0001_r_000000_0
2019-12-12 16:20:29,985 INFO mapred.LocalJobRunner: reduce task executor complete.
2019-12-12 16:20:30,682 INFO mapreduce.Job: Job job_local1284066301_0001 running in uber mode : false
2019-12-12 16:20:30,683 INFO mapreduce.Job: map 100% reduce 100%
2019-12-12 16:20:30,684 INFO mapreduce.Job: Job job_local1284066301_0001 completed successfully
2019-12-12 16:20:30,692 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=634090
FILE: Number of bytes written=1680994
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=9
Map output bytes=189
Map output materialized bytes=172
Input split bytes=115
Combine input records=9
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=172
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=514850816
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=153
File Output Format Counters
Bytes Written=154
Check the result of the run:
(base) lifeideMacBook-Pro:hadoop_workspace lifei$ ls -l output/
total 8
-rw-r--r-- 1 lifei staff 0 12 12 16:20 _SUCCESS
-rw-r--r-- 1 lifei staff 142 12 12 16:20 part-r-00000
View the contents of part-r-00000:
(base) lifeideMacBook-Pro:hadoop_workspace lifei$ cat output/part-r-00000
Lightbatis 1
MyBatis 2
增强 2
数据库持久层,更简洁列易用。 1
数据库持久层,更简洁列易用。Lightbatis 1
版Java 2
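The wordcount example tokenizes on whitespace only, which is why each whole Chinese phrase is counted as a single "word" above. As a quick cross-check, plain Unix tools reproduce the same counts (run from ~/hadoop_workspace):
tr ' ' '\n' < input/hello.txt | sort | uniq -c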
The standalone Hadoop installation is complete.