Introduction
Big data:
    Structured data: constrained by a schema
    Semi-structured data
    Unstructured data: no metadata
        Log data is unstructured data
Search engines: a search component and an indexing component
    Crawlers (spider programs) -- the crawled data is unstructured or semi-structured
    Tokenizer (word segmentation)
    Storage
    Analysis and processing
Google's three papers:
    GFS, 2003: The Google File System
    MapReduce, 2004: MapReduce: Simplified Data Processing on Large Clusters
    BigTable, 2006: Bigtable: A Distributed Storage System for Structured Data
The open-source "knock-off" versions:
    HDFS
    MapReduce
    HBase
    HDFS + MapReduce = Hadoop
Nutch crawled the data --> Lucene indexed it: the bigger the data, the slower the processing; while a solution was being sought, Google published its papers
MapReduce is a batch-processing framework, so its speed and performance are comparatively poor
NAS, SAN: shared storage
    With only one storage system the I/O pressure is too high, so it does not fit; this is the traditional centralized solution
Distributed storage
    With a central node (which stores the metadata): GFS/HDFS
    Without a central node
NN: NameNode
    SNN: SecondaryNameNode -- a second node that merges checkpoints so that, after the NN goes down, replaying the metadata files does not take too long
    Metadata persistence: transaction (edit) log --> fsimage, an on-disk image, so the metadata is never lost;
    Since Hadoop 2.0, high availability is built on ZooKeeper, with the metadata kept on shared storage such as NFS.
DN: DataNode
    Data replicas keep the data intact;
    heartbeat
    Block lists (see the sketch after this list):
        data-centric: which nodes a given block is stored on;
        node-centric: which blocks a given node holds;
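To make the two views concrete, here is a minimal sketch (assuming the standard HDFS Java client API; the hdfs://node4:8020 address and the /test/fstab path are borrowed from the walkthrough later in these notes) that asks the NameNode which DataNodes hold each block of a file -- the data-centric view:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address; matches fs.defaultFS in core-site.xml below
        conf.set("fs.defaultFS", "hdfs://node4:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus st = fs.getFileStatus(new Path("/test/fstab"));
            // One BlockLocation per block: block -> the nodes that store it
            for (BlockLocation bl : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println("offset " + bl.getOffset() + " -> "
                        + String.join(",", bl.getHosts()));
            }
        }
    }
}
```

The node-centric view (which blocks a given node holds) is what each DataNode reports back to the NameNode in its block reports.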
JobTracker, TaskTracker
    The program goes to where the data is (data locality)
    Running the NameNode and the JobTracker on the same host easily makes it a system bottleneck
    The DataNode and the TaskTracker run on the same host
Functional programming
    Passing one function to another function as an argument
    Lisp and ML are functional programming languages; their higher-order functions include:
        map, fold (see the sketch after this list)
    map:
        map(f())
        map: takes a function as a parameter and applies it to every element of a list, producing a list of results;
    fold:
        takes two parameters: a function and an initial value
        fold(g(), init)
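A minimal sketch of map and fold using Java 8 streams (f, g, and init above are generic placeholders; here f squares a number and g is addition -- Java spells fold as reduce):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapFoldDemo {
    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);

        // map: apply the function to every element, producing a result list
        List<Integer> squares = xs.stream()
                .map(x -> x * x)
                .collect(Collectors.toList());
        System.out.println(squares);   // [1, 4, 9, 16]

        // fold: combine the elements with the function, starting
        // from an initial value
        int sum = xs.stream().reduce(0, (acc, x) -> acc + x);
        System.out.println(sum);       // 10
    }
}
```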
mapreduce:
    mapper, reducer
    shuffle and sort: transferring the intermediate results and sorting them
    key-value data
    every record with a given key must be sent to the same reducer
    a job may need to run mapreduce several times (chained jobs)
    mapper --> combiner --> partitioner --> reducer (see the sketch after this list)
    for the mapper and the reducer, the input keys and the output keys differ
    for the combiner, the input keys and the output keys are the same
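A minimal sketch of the classic WordCount mapper and reducer (the same example that hadoop-mapreduce-examples runs at the end of these notes), written against the Hadoop 2.x mapreduce API. The mapper's input key (a byte offset) differs from its output key (a word); the reducer's input and output key/value types are identical, which is exactly why the same class can also serve as the combiner:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountSketch {

    // (LongWritable offset, Text line) in --> (Text word, IntWritable 1) out
    public static class WcMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String w : value.toString().split("\\s+")) {
                if (w.isEmpty()) continue;
                word.set(w);
                ctx.write(word, ONE);
            }
        }
    }

    // (Text word, IntWritable counts) in --> (Text word, IntWritable sum) out:
    // same key/value types on both sides, so this also works as the combiner
    public static class WcReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }
}
```

The default partitioner hashes the key modulo the number of reducers, which is what guarantees that every record with the same key reaches the same reducer.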
MRv1 (Hadoop 1) --> MRv2 (Hadoop 2)
    MRv1: cluster resource management + data processing in a single framework
    MRv2:
        YARN: cluster resource management
        MRv2: data processing
            MR: batch processing
            Tez: execution engine
    RM: ResourceManager
    NM: NodeManager
    AM: ApplicationMaster
    container: the unit in which MR tasks run
As shown in Figure 1 below.
The Hadoop ecosystem, as shown in Figure 2 below:
Sqoop
    pulls data out of relational databases and imports it into Hadoop;
    extracts data from Hadoop, structures it, and exports it back into a relational database;
Flume
    collects logs and stores them in Hadoop
hive
pig
HBase: a column-oriented store
Data serialization: converting non-stream data into a stream form -- reversibly, so it can be restored later (see the sketch after this list)
storm: data statistics and analysis
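As a sketch of what "serialization" means here: a Hadoop-style Writable flattens an object into a byte stream in write() and rebuilds it in readFields(). The LogEntryWritable class and its fields are invented for illustration; only the Writable interface itself is Hadoop's:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// A (clientIp, hits) record that can travel through HDFS/MapReduce as a stream
public class LogEntryWritable implements Writable {
    private String clientIp;
    private long hits;

    public LogEntryWritable() { }   // Writables need a no-arg constructor

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(clientIp);     // object --> stream
        out.writeLong(hits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        clientIp = in.readUTF();    // stream --> object (the reverse direction)
        hits = in.readLong();
    }
}
```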
Hadoop Distribution:
Cloudera:CDH
Hortonworks:HDP
Commercial editions:
Intel:IDH
MapR
Standalone mode: for testing whether a program can run on Hadoop at all
Pseudo-distributed mode: the whole stack runs on one machine
Fully distributed mode: cluster mode
Hadoop is written in Java
Hadoop pseudo-distributed setup
centos 7 1804
NAT 192.168.25.14
Host-only 192.168.50.14
Disable the firewall
Disable SELinux
yum -y install wget vim lrzsz net-tools ntpdate
yum -y install epel-release-latest-7.noarch.rpm
cat /etc/hosts
192.168.25.11 node1.fgq.com node1
192.168.25.12 node2.fgq.com node2
192.168.25.13 node3.fgq.com node3
192.168.25.14 node4.fgq.com node4
192.168.25.15 node5.fgq.com node5
crontab -e
*/5 * * * * ntpdate time3.aliyun.com && hwclock -w
[root@node4 ~]# mkdir -p /fgq/base-env/
[root@node4 ~]# cd /fgq/base-env/
Download the JDK package jdk-8u152-linux-x64.tar.gz
Download the Hadoop package hadoop-2.9.2.tar.gz
Upload both to this directory and unpack them
[root@node4 base-env]# tar zxf jdk-8u152-linux-x64.tar.gz
[root@node4 base-env]# tar zxf hadoop-2.9.2.tar.gz
[root@node4 base-env]# ln -s jdk1.8.0_152 jdk
[root@node4 base-env]# ln -s hadoop-2.9.2 hadoop
[root@node4 ~]# vim /etc/profile
Append the following at the bottom:
export JAVA_HOME=/fgq/base-env/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
[root@node4 ~]# source /etc/profile
[root@node4 ~]# java -version
java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
[root@node4 ~]# vim /etc/profile.d/hadoop.sh
export HADOOP_HOME=/fgq/base-env/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
[root@node4 ~]# source /etc/profile.d/hadoop.sh
[root@node4 ~]# cd /fgq/base-env/hadoop
[root@node4 hadoop]# groupadd hadoop
[root@node4 hadoop]# useradd -g hadoop yarn
[root@node4 hadoop]# useradd -g hadoop hdfs
[root@node4 hadoop]# useradd -g hadoop mapred
[root@node4 hadoop]# mkdir -p /fgq/data/hadoop/hdfs/{nn,snn,dn}
[root@node4 hadoop]# chown -R hdfs:hadoop /fgq/data/hadoop/hdfs
[root@node4 hadoop]# ll /fgq/data/hadoop/hdfs
[root@node4 hadoop]# mkdir logs
[root@node4 hadoop]# chmod g+w logs    # make sure the group has write permission on logs
[root@node4 hadoop]# chown -R yarn:hadoop logs
[root@node4 hadoop]# chown -R yarn:hadoop ./*
[root@node4 hadoop]# ll
[root@node4 hadoop]# cd etc/hadoop/
[root@node4 hadoop]# vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node4:8020</value>
        <final>true</final>
    </property>
</configuration>
[root@node4 hadoop]# vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///fgq/data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///fgq/data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///fgq/data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///fgq/data/hadoop/hdfs/snn</value>
    </property>
</configuration>
[root@node4 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@node4 hadoop]# vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
[root@node4 hadoop]# vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>node4:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>node4:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>node4:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>node4:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>node4:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
[root@node4 hadoop]# vim slaves
node4
[root@node4 hadoop]# su - hdfs
## Format the NameNode
[hdfs@node4 ~]$ hdfs namenode -format
19/03/02 10:45:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = node4.fgq.com/192.168.25.14
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.9.2
... ...
19/03/02 10:45:05 INFO common.Storage: Storage directory /fgq/data/hadoop/hdfs/nn has been successfully formatted.
19/03/02 10:45:05 INFO namenode.FSImageFormatProtobuf: Saving image file /fgq/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
19/03/02 10:45:05 INFO namenode.FSImageFormatProtobuf: Image file /fgq/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds .
19/03/02 10:45:05 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/03/02 10:45:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node4.fgq.com/192.168.25.14
************************************************************/
The "successfully formatted" message in the output means the format worked.
[hdfs@node4 ~]$ ls /fgq/data/hadoop/hdfs/nn/current/
fsimage_0000000000000000000 fsimage_0000000000000000000.md5 seen_txid VERSION
## Start the NameNode
[hdfs@node4 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-namenode-node4.fgq.com.out
[hdfs@node4 ~]$ less /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-namenode-node4.fgq.com.log
[hdfs@node4 ~]$ jps    # Java's ps; lists JVM processes
1769 NameNode
1851 Jps
[hdfs@node4 ~]$ jps -h
illegal argument: -h
usage: jps [-help]
jps [-q] [-mlvV] [<hostid>]
Definitions:
<hostid>: <hostname>[:<port>]
[hdfs@node4 ~]$ jps -v
1879 Jps -Denv.class.path=.:/fgq/base-env/jdk/lib/dt.jar:/fgq/base-env/jdk/lib/tools.jar -Dapplication.home=/fgq/base-env/jdk1.8.0_152 -Xms8m
1769 NameNode -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/fgq/base-env/hadoop-2.9.2/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/fgq/base-env/hadoop-2.9.2 -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=/fgq/base-env/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/fgq/base-env/hadoop-2.9.2/logs -Dhadoop.log.file=hadoop-hdfs-namenode-node4.fgq.com.log -Dhadoop.home.dir=/fgq/base-env/hadoop-2.9.2 -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/fgq/base-env/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS
## Start the SecondaryNameNode
[hdfs@node4 ~]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-secondarynamenode-node4.fgq.com.out
[hdfs@node4 ~]$ jps
1990 Jps
1769 NameNode
1945 SecondaryNameNode
## Start the DataNode
[hdfs@node4 ~]$ hadoop-daemon.sh start datanode
starting datanode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-datanode-node4.fgq.com.out
A name node normally does not double as a data node, but this is a pseudo-distributed setup
[hdfs@node4 ~]$ jps
1769 NameNode
1945 SecondaryNameNode
2073 DataNode
2155 Jps
[hdfs@node4 ~]$ hdfs dfs -ls /    # the root path has no directories yet; create one
[hdfs@node4 ~]$ hdfs dfs -mkdir /test
[hdfs@node4 ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x - hdfs supergroup 0 2019-03-02 11:29 /test
Note the owner and group of the new directory.
Note: if other users need write access to HDFS, add the following property to hdfs-site.xml (dfs.permissions is the legacy name; Hadoop 2.x prefers dfs.permissions.enabled):
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
## Upload a file
[hdfs@node4 ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hdfs@node4 ~]$ hdfs dfs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hdfs supergroup 0 2019-03-02 11:37 /test
-rw-r--r-- 1 hdfs supergroup 501 2019-03-02 11:37 /test/fstab
The file /test/fstab lives on the remote HDFS.
Its location on the local filesystem:
[root@node4 ~]# vim /fgq/data/hadoop/hdfs/dn/current/BP-1435152656-192.168.25.14-1551494705143/current/finalized/subdir0/subdir0/blk_1073741825
When a file is big enough to be split into several blocks, the blocks can still be read from this local path, but they may be stored under different subdirectories
Viewing it through the dfs interface:
[hdfs@node4 ~]$ hdfs dfs -cat /test/fstab
#
# /etc/fstab
# Created by anaconda on Thu Feb 28 17:13:02 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=aebd58c2-fdc1-44ad-b33a-9b6efdf9488c / xfs defaults 0 0
UUID=2fe4fb6b-aab1-42f7-b024-45be2e7065f5 /boot xfs defaults 0 0
UUID=79f97775-e6bb-494d-827a-1f5aa3423c6d swap swap defaults 0 0
[hdfs@node4 ~]$ exit
logout
## Switch to the yarn user and start the YARN services
[root@node4 hadoop]# su - yarn
[yarn@node4 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-resourcemanager-node4.fgq.com.out
[yarn@node4 ~]$ jps
3376 Jps
3141 ResourceManager
[yarn@node4 ~]$ yarn-daemon.sh start nodemanager
starting nodemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-nodemanager-node4.fgq.com.out
[yarn@node4 ~]$ jps
3141 ResourceManager
3525 Jps
3417 NodeManager
Browsing the web UIs
HDFS and the YARN ResourceManager each provide a web interface
through which you can view status information for the HDFS and YARN clusters
HDFS NameNode: http://192.168.25.14:50070 (see Figure 1 below)
YARN ResourceManager: http://192.168.25.14:8088 (see Figure 2 below)
Note: if the yarn.resourcemanager.webapp.address property in yarn-site.xml is set to "localhost:8088", the web UI listens only on port 8088 of 127.0.0.1.
Running a program on Hadoop
[root@node4 ~]# cd /fgq/base-env/hadoop/share/hadoop/mapreduce/
[root@node4 mapreduce]# ls
Hadoop/YARN ships with many sample programs; among them, hadoop-mapreduce-examples-2.9.2.jar can be used as a MapReduce test program
Note: switch to the hdfs user first
[root@node4 ~]# su - hdfs
[hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar
An example name must be given as the first argument, as the usage message shows:
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
19/03/02 15:49:46 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
19/03/02 15:49:47 INFO input.FileInputFormat: Total input files to process : 1
19/03/02 15:49:48 INFO mapreduce.JobSubmitter: number of splits:1
19/03/02 15:49:48 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/02 15:49:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551512925744_0001
19/03/02 15:49:49 INFO impl.YarnClientImpl: Submitted application application_1551512925744_0001
19/03/02 15:49:49 INFO mapreduce.Job: The url to track the job: http://node4:8088/proxy/application_1551512925744_0001/
19/03/02 15:49:49 INFO mapreduce.Job: Running job: job_1551512925744_0001
19/03/02 15:49:57 INFO mapreduce.Job: Job job_1551512925744_0001 running in uber mode : false
19/03/02 15:49:57 INFO mapreduce.Job: map 0% reduce 0%
19/03/02 15:50:02 INFO mapreduce.Job: map 100% reduce 0%
19/03/02 15:50:06 INFO mapreduce.Job: map 100% reduce 100%
19/03/02 15:50:07 INFO mapreduce.Job: Job job_1551512925744_0001 completed successfully
19/03/02 15:50:07 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=591
FILE: Number of bytes written=397951
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=594
HDFS: Number of bytes written=433
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2693
Total time spent by all reduces in occupied slots (ms)=2021
Total time spent by all map tasks (ms)=2693
Total time spent by all reduce tasks (ms)=2021
Total vcore-milliseconds taken by all map tasks=2693
Total vcore-milliseconds taken by all reduce tasks=2021
Total megabyte-milliseconds taken by all map tasks=2757632
Total megabyte-milliseconds taken by all reduce tasks=2069504
Map-Reduce Framework
Map input records=11
Map output records=54
Map output bytes=625
Map output materialized bytes=591
Input split bytes=93
Combine input records=54
Combine output records=38
Reduce input groups=38
Reduce shuffle bytes=591
Reduce input records=38
Reduce output records=38
Spilled Records=76
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=193
CPU time spent (ms)=1250
Physical memory (bytes) snapshot=462479360
Virtual memory (bytes) snapshot=4232617984
Total committed heap usage (bytes)=292028416
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=501
File Output Format Counters
Bytes Written=433
[hdfs@node4 ~]$ hdfs dfs -lsr /test
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 1 hdfs supergroup 501 2019-03-02 11:37 /test/fstab
drwxr-xr-x - hdfs supergroup 0 2019-03-02 15:50 /test/fstab_out
-rw-r--r-- 1 hdfs supergroup 0 2019-03-02 15:50 /test/fstab_out/_SUCCESS
-rw-r--r-- 1 hdfs supergroup 433 2019-03-02 15:50 /test/fstab_out/part-r-00000
A /test/fstab_out directory was created, containing two files, _SUCCESS and part-r-00000, which means the job succeeded
## View the word counts
[hdfs@node4 ~]$ hdfs dfs -cat /test/fstab_out/part-r-00000
# 7
'/dev/disk' 1
/ 1
/boot 1
/etc/fstab 1
0 6
17:13:02 1
2019 1
28 1
Accessible 1
Created 1
Feb 1
See 1
Thu 1
UUID=2fe4fb6b-aab1-42f7-b024-45be2e7065f5 1
UUID=79f97775-e6bb-494d-827a-1f5aa3423c6d 1
UUID=aebd58c2-fdc1-44ad-b33a-9b6efdf9488c 1
anaconda 1
and/or 1
are 1
blkid(8) 1
by 2
defaults 3
filesystems, 1
findfs(8), 1
for 1
fstab(5), 1
info 1
maintained 1
man 1
more 1
mount(8) 1
on 1
pages 1
reference, 1
swap 2
under 1
xfs 2
Troubleshooting
An error occurred while running the jar program:
[hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
19/03/02 15:16:11 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
19/03/02 15:16:13 INFO ipc.Client: Retrying connect to server: node4/192.168.25.14:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
The client keeps retrying the connection; the cause is that the YARN ResourceManager service was never started
##### Fix
[root@node4 ~]# su - yarn
[yarn@node4 ~]$ jps
5440 Jps
3417 NodeManager
[yarn@node4 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-resourcemanager-node4.fgq.com.out
[yarn@node4 ~]$ jps
5478 ResourceManager
3417 NodeManager
5711 Jps
## Running the MapReduce job again still fails
[hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
19/03/02 15:29:10 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
19/03/02 15:29:11 INFO input.FileInputFormat: Total input files to process : 1
19/03/02 15:29:11 INFO mapreduce.JobSubmitter: number of splits:1
19/03/02 15:29:11 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/02 15:29:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551511594876_0001
19/03/02 15:29:12 INFO impl.YarnClientImpl: Submitted application application_1551511594876_0001
19/03/02 15:29:12 INFO mapreduce.Job: The url to track the job: http://node4:8088/proxy/application_1551511594876_0001/
19/03/02 15:29:12 INFO mapreduce.Job: Running job: job_1551511594876_0001
19/03/02 15:29:17 INFO mapreduce.Job: Job job_1551511594876_0001 running in uber mode : false
19/03/02 15:29:17 INFO mapreduce.Job: map 0% reduce 0%
19/03/02 15:29:17 INFO mapreduce.Job: Job job_1551511594876_0001 failed with state FAILED due to: Application application_1551511594876_0001 failed 2 times due to AM Container for appattempt_1551511594876_0001_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2019-03-02 15:29:17.428]Exception from container-launch.
Container id: container_1551511594876_0001_02_000001
Exit code: 1
[2019-03-02 15:29:17.429]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000ed300000, 35651584, 0) failed; error='Cannot allocate memory' (errno=12)
[2019-03-02 15:29:17.429]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000ed300000, 35651584, 0) failed; error='Cannot allocate memory' (errno=12)
For more detailed output, check the application tracking page: http://node4:8088/cluster/app/application_1551511594876_0001 Then click on links to logs of each attempt.
. Failing the application.
19/03/02 15:29:17 INFO mapreduce.Job: Counters: 0
The error shows that memory cannot be allocated: shut down the VM, increase its memory to 3 GB, then boot it and start the services again
Startup recap
su - hdfs
hdfs namenode -format    # only needed on the first run
hadoop-daemon.sh start namenode
hadoop-daemon.sh start secondarynamenode
hadoop-daemon.sh start datanode
jps
1769 NameNode
1945 SecondaryNameNode
2073 DataNode
2155 Jps
hdfs dfs -mkdir /test    # only needed on the first run
hdfs dfs -put /etc/fstab /test/fstab    # only needed on the first run
hdfs dfs -lsr /    # only needed on the first run
su - yarn
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
jps
3141 ResourceManager
3525 Jps
3417 NodeManager
su - hdfs
yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
hdfs dfs -lsr /test
hdfs dfs -cat /test/fstab_out/part-r-00000