1. Background: the big data era
Mobile internet, social networks, and e-commerce are causing data of every kind to grow rapidly.
1 PB = 1,024 TB = 1,048,576 GB = 1,125,899,906,842,624 Bytes
1 EB = 1,024 PB = 1,048,576 TB = 1,152,921,504,606,846,976 Bytes
1 ZB = 1,024 EB = 1,180,591,620,717,411,303,424 Bytes
1 YB = 1,024 ZB = 1,208,925,819,614,629,174,706,176 Bytes
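The unit table above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming binary units where each step is a factor of 1,024; the names `UNITS` and `bytes_in` are illustrative and not taken from any library.

```python
# Binary storage units, each 1,024 times larger than the previous one.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, assuming 1 KB = 1,024 bytes."""
    exponent = UNITS.index(unit) + 1  # KB is 1024**1, MB is 1024**2, ...
    return 1024 ** exponent


if __name__ == "__main__":
    # Print the same rows as the table, with thousands separators.
    for unit in ["PB", "EB", "ZB", "YB"]:
        print(f"1 {unit} = {bytes_in(unit):,} Bytes")
```

Python integers have arbitrary precision, so the ZB and YB values do not overflow the way a 64-bit counter would.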
What can data bring us? Business value.
How do we process massive amounts of data? Hadoop.
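MapReduce: a parallel computing framework
BigTable: joins are resource-intensive, so data is stored in a column-oriented layout
Creator: Doug Cutting. The mascot is a yellow elephant.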
Installing Hadoop:
3 virtual machines
Cluster mode
Set the hostname: vim /etc/sysconfig/network, then set HOSTNAME=h101
Reboot the virtual machine: init 6
Set the virtual machine's IP address: vim /etc/sysconfig/network-scripts/ifcfg-eth0
Apply the network settings: service network restart
Map hostnames to IP addresses: vim /etc/hosts
Add:
192.168.16.101 h101
192.168.16.102 h102
192.168.16.103 h103
Once this is configured, ssh h102 connects straight to h102.
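The same /etc/hosts entries have to be added on every node, so it can help to generate them rather than type them three times. Here is a small sketch, assuming the nodes have consecutive IP addresses as in the listing above; `hosts_entries` is a hypothetical helper name.

```python
import ipaddress
from typing import List


def hosts_entries(first_ip: str, hostnames: List[str]) -> List[str]:
    """Build /etc/hosts lines, giving each hostname the next consecutive IP."""
    start = ipaddress.IPv4Address(first_ip)  # raises ValueError on a malformed IP
    return [f"{str(start + offset)} {name}" for offset, name in enumerate(hostnames)]


if __name__ == "__main__":
    # Addresses and hostnames are the ones used in this walkthrough.
    for line in hosts_entries("192.168.16.101", ["h101", "h102", "h103"]):
        print(line)
```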
Create the hadoop user: useradd hadoop
Set its password: passwd hadoop
Install the JDK on all three virtual machines:
tar -zxvf jdk.XX.tar.gz -C /usr/
Configure the environment variables: vim /etc/profile
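Apply the environment variables: source /etc/profile (or reboot with init 6)
Verify the JDK: /usr/jdk1.7.0_25/bin/java -version
1. Upload and extract the Hadoop installation package
2. Append the following as the last line of hadoop-env.sh in the conf directory:
export JAVA_HOME=/usr/jdk1.7.0_25/
3. Edit core-site.xml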
<property>
<name>fs.default.name</name>
<value>hdfs://h101:9000</value>
</property>
4. Edit hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
5. Edit mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>h101:9001</value>
</property>
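The three property fragments in steps 3 to 5 each go inside the `<configuration>` root element of their site file. The sketch below renders all three files from one mapping, using the values from this walkthrough; `render_site_xml` and `SITE_FILES` are hypothetical names, and the script only builds strings rather than writing into conf/.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def render_site_xml(properties: Dict[str, object]) -> str:
    """Wrap name/value pairs in the <configuration> root used by Hadoop site files."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# One entry per file edited in steps 3 to 5; values are copied from those steps.
SITE_FILES = {
    "core-site.xml": {"fs.default.name": "hdfs://h101:9000"},
    "hdfs-site.xml": {"dfs.replication": 2},
    "mapred-site.xml": {"mapred.job.tracker": "h101:9001"},
}

if __name__ == "__main__":
    for filename, properties in SITE_FILES.items():
        print(filename)
        print(render_site_xml(properties))
```

Keeping the values in one place makes it harder for the hostname or ports to drift apart between files.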
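6. Edit masters
Replace localhost with the master node's hostname: h101
7. Edit slaves
Replace localhost with the slave nodes' hostnames, one per line:
h102
h103
8. Copy the Hadoop directory to the other two nodes
scp -r /home/hadoop/hadoop-0.20.2-cdh3u5 h102:/home/hadoop/
scp -r /home/hadoop/hadoop-0.20.2-cdh3u5 h103:/home/hadoop/
9. Ownership: on every node, make the hadoop user the owner of the hadoop-0.20.2-cdh3u5 directory
chown -R hadoop:hadoop /home/hadoop/hadoop-0.20.2-cdh3u5/
10. Passwordless SSH: run the following as the hadoop user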
ssh-keygen -t rsa
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h101
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h102
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h103
11. Format the NameNode:
cd /home/hadoop/hadoop-0.20.2-cdh3u5
bin/hadoop namenode -format
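12. Verify
Start the services: bin/start-all.sh
Check the running processes with jps: the master (h101) should list NameNode, SecondaryNameNode, and JobTracker; the slaves (h102, h103) should list DataNode and TaskTracker.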
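Checking jps by eye on three machines is easy to get wrong, so the check can be scripted. The sketch below parses jps-style text ("PID Name" per line) and reports which daemons are missing. It assumes the layout configured above: NameNode, SecondaryNameNode, and JobTracker on the master; DataNode and TaskTracker on the slaves. The sample text and its PIDs are made up for illustration, not captured from a real cluster.

```python
from typing import Set

# Daemons expected per role in this Hadoop 0.20-style setup (an assumption
# based on the masters/slaves files configured in the steps above).
EXPECTED = {
    "master": {"NameNode", "SecondaryNameNode", "JobTracker"},
    "slave": {"DataNode", "TaskTracker"},
}


def missing_daemons(jps_output: str, role: str) -> Set[str]:
    """Return the expected daemons for `role` that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # "<pid> <main class name>"
            running.add(parts[1])
    return EXPECTED[role] - running


if __name__ == "__main__":
    # Hypothetical jps output from the master node.
    sample = "2481 NameNode\n2650 SecondaryNameNode\n2731 JobTracker\n2912 Jps"
    print(missing_daemons(sample, "master") or "all master daemons are running")
```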
Distributed file system: HDFS
In Hadoop 2.0 the default block size is 128 MB (earlier 1.x releases default to 64 MB).
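To make the 128 MB figure concrete, here is a rough sketch of how a file maps onto HDFS blocks and how much raw disk it consumes once replicated. It assumes the final block holds only the remaining bytes and uses the dfs.replication value of 2 configured earlier; `split_into_blocks` and `raw_storage` are illustrative names, not Hadoop APIs.

```python
from typing import List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> List[int]:
    """Return the size in bytes of each block a file of `file_size` bytes occupies."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block is only as large as the leftover data
    return sizes


def raw_storage(file_size: int, replication: int = 2) -> int:
    """Total bytes stored across the cluster when every block has `replication` copies."""
    return file_size * replication


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])  # block sizes in MB
    print(raw_storage(300 * MB) // MB)      # raw storage in MB
```

A 300 MB file therefore occupies three blocks (128 MB, 128 MB, 44 MB) and about 600 MB of raw disk with two replicas.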
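Basic Hadoop commands
$HADOOP_HOME/bin/hadoop fs <args>
List a path: hadoop fs -ls <uri> (use -lsr for a recursive listing)
Example: ./hadoop fs -ls hdfs://h101:9000/
On a node whose core-site.xml sets fs.default.name (such as the master), this can be shortened to ./hadoop fs -ls /
cat: print the contents of one or more files to the console
put: upload one or more local files to HDFS
get: download a file from HDFS to the local file system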
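The short form of the -ls example works because a path without a scheme is resolved against the default file system from core-site.xml. The sketch below mimics that resolution for absolute paths only; relative paths are deliberately rejected because HDFS resolves them against the user's home directory, which this sketch does not model. `qualify` is a hypothetical helper, not part of the Hadoop CLI.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str = "hdfs://h101:9000") -> str:
    """Turn an absolute path into a full URI using the configured default file system."""
    if urlparse(path).scheme:
        return path  # already a full URI such as hdfs://h101:9000/
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    return default_fs.rstrip("/") + path


if __name__ == "__main__":
    print(qualify("/"))
    print(qualify("/a.txt"))
```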
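Enter safe mode: ./hadoop dfsadmin -safemode enter
Leave safe mode: ./hadoop dfsadmin -safemode leave
Files cannot be deleted while HDFS is in safe mode.
Delete a file: ./hadoop fs -rmr /a.txt
Report cluster status: ./hadoop dfsadmin -report
Introduction to MapReduce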
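As a rough illustration of the programming model, the following in-memory sketch runs a word count through the three phases usually described for MapReduce: map, shuffle, and reduce. It is not Hadoop's Java API and runs in a single process; the function names are chosen for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello hdfs"]))
```

In a real cluster the map and reduce calls run in parallel on different nodes, and the shuffle moves data across the network; only the data flow is the same here.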