HDFS is part of the Hadoop ecosystem, so deploying HDFS means deploying Hadoop.
Since the number of nodes here is limited, we go with a single-node (pseudo-distributed) deployment.
Deploying HDFS 3.2.2
- Install jdk1.8.0_201
- Download jdk-8u201-linux-x64.tar.gz from the official site and place it under /usr/local/java, then unpack it with
tar -zxvf jdk-8u201-linux-x64.tar.gz
- Configure the environment variables: edit /etc/profile and append the following at the end:
export JAVA_HOME=/usr/local/java/jdk1.8.0_201
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Then run
source /etc/profile
to make the changes take effect.
- Verify with the following commands:
# java
# java -version
# javac
- Configure passwordless SSH login
If a key pair has already been generated, run:
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# chmod 0600 ~/.ssh/authorized_keys
If not, first generate one with:
# ssh-keygen -t rsa
and then append the public key as above.
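As a quick check (a pseudo-distributed deployment only needs to SSH into the local machine), the following should now log you in without a password prompt:
# ssh localhost
# exit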
- Install Hadoop
Download Hadoop from https://archive.apache.org/dist/hadoop/core/hadoop-3.2.2/hadoop-3.2.2.tar.gz, place it under /usr/local/, and unpack it with
tar -zxvf hadoop-3.2.2.tar.gz
Edit /etc/profile and append:
export HADOOP_HOME=/usr/local/hadoop-3.2.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Run
source /etc/profile
to make the changes take effect.
Verify with the following command:
# hadoop version
- Modify the HDFS configuration files
Since the MapReduce module is not used, only the HDFS configuration files need to be edited. The configuration files live under /usr/local/hadoop-3.2.2/etc/hadoop.
- hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_201
- core-site.xml
<!-- Default file system URI: the NameNode address and port that DataNodes report (heartbeat) to -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.48.141:9000</value>
</property>
<!-- Directory for temporary files; create it if it does not exist -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp</value>
</property>
Note: replace the IP address with your own.
- hdfs-site.xml
<!-- Single node, so set the replication factor to 1 -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<!-- Block size in bytes -->
<property>
    <name>dfs.blocksize</name>
    <value>698351616</value>
</property>
<!-- NameNode storage directory; create it if it does not exist -->
<property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop-3.2.2/hdfs/name</value>
</property>
<!-- DataNode storage directory; create it if it does not exist -->
<property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop-3.2.2/hdfs/data</value>
</property>
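The comments above call for creating the storage directories when they are missing, for example:
# mkdir -p /usr/local/hadoop-3.2.2/hdfs/name
# mkdir -p /usr/local/hadoop-3.2.2/hdfs/data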
- Add the following at the top of /usr/local/hadoop-3.2.2/sbin/start-dfs.sh and /usr/local/hadoop-3.2.2/sbin/stop-dfs.sh:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
- (Optional, since YARN is not used here) Add the following at the top of /usr/local/hadoop-3.2.2/sbin/start-yarn.sh and /usr/local/hadoop-3.2.2/sbin/stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
- Startup
Before the first startup, format HDFS:
# /usr/local/hadoop-3.2.2/bin/hdfs namenode -format
Start:
# /usr/local/hadoop-3.2.2/sbin/start-dfs.sh
Individual daemons can also be started separately:
# /usr/local/hadoop-3.2.2/sbin/hadoop-daemon.sh start namenode
# /usr/local/hadoop-3.2.2/sbin/hadoop-daemon.sh start datanode
# /usr/local/hadoop-3.2.2/sbin/hadoop-daemon.sh start secondarynamenode
Stop:
# /usr/local/hadoop-3.2.2/sbin/stop-dfs.sh
Individual daemons can also be stopped separately:
# /usr/local/hadoop-3.2.2/sbin/hadoop-daemon.sh stop namenode
# /usr/local/hadoop-3.2.2/sbin/hadoop-daemon.sh stop datanode
# /usr/local/hadoop-3.2.2/sbin/hadoop-daemon.sh stop secondarynamenode
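Note that in Hadoop 3.x hadoop-daemon.sh still works but prints a deprecation warning; the equivalent hdfs --daemon form (just an alternative, not required for this setup) is:
# /usr/local/hadoop-3.2.2/bin/hdfs --daemon start namenode
# /usr/local/hadoop-3.2.2/bin/hdfs --daemon stop namenode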
Check whether the processes are running:
# jps
The output should list the NameNode, DataNode, and SecondaryNameNode processes.
Also remember to turn off the firewall. If anything goes wrong, check the logs under /usr/local/hadoop-3.2.2/logs.
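For example, on a CentOS-style host with firewalld (an assumption about the OS; the exact log file name depends on your user and hostname):
# systemctl stop firewalld
# systemctl disable firewalld
# tail -n 50 /usr/local/hadoop-3.2.2/logs/hadoop-root-namenode-*.log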
Once startup succeeds, the web UI at http://IP:9870/ (in my case http://192.168.48.141:9870/) shows HDFS usage and can also be used to manage files. The page has no password by default, so anyone who can reach it can operate on the cluster.
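As a quick command-line smoke test (assuming $HADOOP_HOME/bin is on the PATH as configured above, and using a throwaway /test directory):
# hdfs dfs -mkdir -p /test
# hdfs dfs -put /etc/hosts /test
# hdfs dfs -ls /test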
Operating HDFS with the Java API
- First, set up the Hadoop environment on Windows. There are quite a few steps; see https://github.com/autopear/Intellij-Hadoop for a walkthrough.
- Create a project; the code is as follows:
package edu.ucr.cs.merlin;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.log4j.BasicConfigurator;
import org.junit.Before;
import org.junit.Test;

public class HDFSDemo {

    FileSystem fs = null;

    @Before
    public void init() throws Exception {
        // Connect to the NameNode as user "root"; change the IP to your own
        fs = FileSystem.get(new URI("hdfs://192.168.48.141:9000"), new Configuration(), "root");
    }

    @Test
    public void testUpload() throws Exception {
        // Upload a local file to HDFS as /log123.log
        InputStream in = new FileInputStream("/root/install.log");
        OutputStream out = fs.create(new Path("/log123.log"));
        IOUtils.copyBytes(in, out, 1024, true);
    }

    @Test
    public void testMkdir() throws IllegalArgumentException, IOException {
        // Create the directory /a/aa (including missing parents)
        boolean flag = fs.mkdirs(new Path("/a/aa"));
        System.out.println(flag);
    }

    @Test
    public void testDel() throws IllegalArgumentException, IOException {
        // Recursively delete /a
        boolean flag = fs.delete(new Path("/a"), true);
        System.out.println(flag);
    }

    public static void main(String[] args) throws Exception {
        BasicConfigurator.configure();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.48.141:9000"), new Configuration(), "root");
        // Download example:
        // InputStream in = fs.open(new Path("/jdk"));
        // OutputStream out = new FileOutputStream("/home/jdk1.7.tar.gz");
        // IOUtils.copyBytes(in, out, 4096, true);
        FSDataOutputStream fsout = fs.create(new Path("/test2.txt"));
        fsout.write("hello world".getBytes());
        // Close the streams so the data is actually flushed to HDFS
        fsout.close();
        fs.close();
    }
}
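The project's pom.xml is as follows: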
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>wordcount</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<hadoop.version>3.2.2</hadoop.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.2</version>
<scope>compile</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>apache</id>
<url>https://repo.maven.apache.org/maven2</url>
</repository>
</repositories>
</project>
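With this pom in place, the tests above can be run from the IDE or with mvn test (assuming Maven is installed).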
log4j.properties (place it under src/main/resources so it ends up on the classpath) is as follows:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
Common HDFS commands
List directory contents
hadoop fs -ls <path>
# Recursively list directory contents
hadoop fs -ls -R <path>
Create a directory
hadoop fs -mkdir <path>
# Recursively create directories
hadoop fs -mkdir -p <path>
Delete operations
# Delete a file
hadoop fs -rm <path>
# Recursively delete directories and files
hadoop fs -rm -R <path>
Upload a local file to HDFS
# Either of the following works
hadoop fs -put [localsrc] [dst]
hadoop fs -copyFromLocal [localsrc] [dst]
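For completeness (standard hadoop fs subcommands not listed above), files can also be copied back out of HDFS or printed:
# Download a file from HDFS to the local filesystem
hadoop fs -get [src] [localdst]
# Print the contents of a file stored in HDFS
hadoop fs -cat [src]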