- Conveniently switching between Hadoop's three modes, and making the shell prompt show the full path
(1) Conveniently switching between Hadoop's three modes
First, run the following commands:
[yylin@big etc]$ cp -r hadoop local
[yylin@big etc]$ cp -r hadoop pseudo
[yylin@big etc]$ cp -r hadoop full
[yylin@big etc]$ rm -rf hadoop
[yylin@big etc]$ ln -s pseudo hadoop
These commands copy the original hadoop directory three times, naming the copies local, pseudo and full for local mode, pseudo-distributed mode and fully distributed mode respectively; the original hadoop directory is then removed and a symbolic link named hadoop is created that points to the pseudo directory.
The benefit is that switching between the three modes now only requires repointing the hadoop symlink at a different directory, as sketched below.
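For example, a minimal sketch of switching from pseudo-distributed to fully distributed mode (run from ${HADOOP_HOME}/etc; only the symlink is touched, the three configuration directories stay intact):
[yylin@big etc]$ rm hadoop          # remove the symlink only, not the directories
[yylin@big etc]$ ln -s full hadoop  # repoint at the fully distributed configuration
[yylin@big etc]$ ls -l hadoop       # should show: hadoop -> full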
(2) Make the shell prompt show the full path
(a) Edit the profile file under /etc and set the PS1 environment variable:
export PS1='[\u@\h `pwd`]\$'
(b) To make it take effect, run:
source /etc/profile
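A minimal alternative, assuming a bash shell: the built-in \w prompt escape shows the current working directory (with the home directory abbreviated as ~) without running pwd on every prompt:
export PS1='[\u@\h \w]\$'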
- Configure Hadoop pseudo-distributed mode
(1) Go to the ${HADOOP_HOME}/etc/hadoop directory. It contains many files; the four that mainly need to be configured are core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
![](https://img.haomeiwen.com/i15108298/bd34ae87a6a25fc1.png)
As the screenshot shows, there is no mapred-site.xml, only mapred-site.xml.template; copy it and rename the copy to mapred-site.xml, as shown below.
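A minimal sketch of that copy, run from ${HADOOP_HOME}/etc/hadoop:
[yylin@big hadoop]$ cp mapred-site.xml.template mapred-site.xml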
(2) Edit core-site.xml:
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://s135/</value>
    </property>
</configuration>
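Once the file is saved, a quick sanity check (a sketch, assuming the Hadoop bin directory is on PATH) is to ask Hadoop which filesystem it resolved:
[yylin@big hadoop]$ hdfs getconf -confKey fs.defaultFS   # should print hdfs://s135/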
(3) Edit hdfs-site.xml. The replication factor here is set to 3; note that on a single-node pseudo-distributed cluster a value of 1 is more common, since there is only one DataNode to hold replicas.
<?xml version="1.0"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <!-- set the hadoop.tmp.dir base directory -->
        <name>hadoop.tmp.dir</name>
        <value>/home/yylin/hadoop</value>
    </property>
</configuration>
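hadoop.tmp.dir is the base directory under which HDFS keeps its name and data directories by default, so it should exist and be writable by the hadoop user; a minimal sketch:
[yylin@big hadoop]$ mkdir -p /home/yylin/hadoop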
(4) Edit mapred-site.xml:
Note: first run cp mapred-site.xml.template mapred-site.xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
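The three HADOOP_MAPRED_HOME entries tell YARN containers where to find the MapReduce classes; on Hadoop 3.x, leaving them out is a common cause of jobs failing with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster". A hedged check that the MapReduce jars resolve on this machine:
[yylin@big hadoop]$ hadoop classpath | tr ':' '\n' | grep mapreduce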
(5) Edit yarn-site.xml:
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>s135</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
- Configure SSH
(1) Check whether the SSH packages are installed (openssh-server + openssh-clients + openssh):
[yylin@big hadoop]$ yum list installed | grep ssh
![](https://img.haomeiwen.com/i15108298/72d2dda241584a36.png)
(2) Check whether the sshd process is running:
[yylin@big hadoop]$ ps -Af | grep sshd
![](https://img.haomeiwen.com/i15108298/b3c4d50f7ab94f73.png)
(3) Generate a public/private key pair on the client side:
[yylin@big ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
![](https://img.haomeiwen.com/i15108298/680b37d97a122282.png)
(4) This creates the ~/.ssh directory, containing id_rsa (the private key) and id_rsa.pub (the public key).
![](https://img.haomeiwen.com/i15108298/984f5579f7349a84.png)
(5) Append the public key to ~/.ssh/authorized_keys (the file name and location are fixed):
[yylin@big ~]$ cd ~/.ssh
[yylin@big .ssh]$ cat id_rsa.pub >> authorized_keys
![](https://img.haomeiwen.com/i15108298/d251817e0b891f38.png)
(6) Change the permissions of authorized_keys to 644:
[yylin@big .ssh]$ chmod 644 authorized_keys
![](https://img.haomeiwen.com/i15108298/f5d2eeb48ce436a6.png)
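If ssh still prompts for a password after this, the permissions on the .ssh directory itself are the usual culprit: sshd refuses to use keys when the directory is group- or world-writable. A minimal sketch of the conventional fix:
[yylin@big .ssh]$ chmod 700 ~/.ssh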
(7) Test:
[yylin@big .ssh]$ ssh localhost
![](https://img.haomeiwen.com/i15108298/7a2de01e6b3a7791.png)
![](https://img.haomeiwen.com/i15108298/e41e5f8db9255bcc.png)
(8) Format the NameNode
Run: hdfs namenode -format
At this point, the pseudo-distributed configuration is complete.
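A hedged way to confirm the format succeeded: with hadoop.tmp.dir set to /home/yylin/hadoop as above, the NameNode's default metadata directory is ${hadoop.tmp.dir}/dfs/name, so a fresh VERSION file and fsimage should now exist there:
[yylin@big ~]$ ls /home/yylin/hadoop/dfs/name/current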
(9) Start the Hadoop services
Run:
start-dfs.sh
This starts the namenodes, datanodes and secondary namenodes.
start-yarn.sh
This starts the resourcemanager and nodemanagers.
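A quick check that all five daemons came up is the JDK's jps tool:
[yylin@big ~]$ jps   # should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps)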
(10) Open the NameNode web UI in a browser
http://s135:9870
You should see the Hadoop cluster information.
![](https://img.haomeiwen.com/i15108298/d6d24af8437191c0.png)
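The YARN side has its own web UI: the ResourceManager listens on port 8088 by default, so (assuming the default port) running applications can be viewed at:
http://s135:8088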
- Run the introductory WordCount program
(1) About the WordCount program
WordCount is the "hello world" exercise of Hadoop: it counts the number of occurrences of each English word. Hadoop ships with this example program; the code is as follows:
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: split each input line into tokens and emit (word, 1) for every token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // All arguments except the last are input paths; the last one is the output path.
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
        new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Copy the code into your project.
Next, build the project with Maven and export the jar from IDEA.
The project's pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>nwpu</groupId>
    <artifactId>WordCount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <appendAssemblyId>false</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <!-- Note: this must be the fully qualified name of the class containing main() -->
                            <mainClass>org.apache.hadoop.examples.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Then export the jar:
![](https://img.haomeiwen.com/i15108298/9c8dda6ceb0d4f96.png)
In IDEA's Maven panel on the right, click maven -> WordCount -> Lifecycle -> (clean, compile, package, install) in turn to build the jar; the output directory is shown below:
![](https://img.haomeiwen.com/i15108298/34dbe6924561f855.png)
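An equivalent, assuming Maven is also available on the command line, is to build from the project root; with the pom above the assembled jar ends up at target/WordCount-1.0-SNAPSHOT.jar:
mvn clean package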
Copy the jar to the server, and put the input directory (containing hello.txt) into HDFS:
hdfs dfs -put input /user/yylin/
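For reference, a minimal sketch of preparing that input locally first (the file contents here are just an example):
[yylin@big ~]$ mkdir -p input
[yylin@big ~]$ echo "hello hadoop hello world" > input/hello.txt
[yylin@big ~]$ hdfs dfs -mkdir -p /user/yylin
[yylin@big ~]$ hdfs dfs -put input /user/yylin/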
Run:
hadoop jar WordCount-1.0-SNAPSHOT.jar /user/yylin/input /user/yylin/output
![](https://img.haomeiwen.com/i15108298/d40c16dbb8cd20c8.png)
![](https://img.haomeiwen.com/i15108298/b5da3afcfb2f240a.png)
![](https://img.haomeiwen.com/i15108298/f087230b866e3c07.png)
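To double-check the result from the command line, the reducer output lands under the output directory (part-r-00000 when there is a single reducer):
[yylin@big ~]$ hdfs dfs -cat /user/yylin/output/part-r-00000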
Success.