Quickly Setting Up a Hadoop Test Environment with Docker

Author: JeryZen | Published: 2017-04-06 16:35

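    Anyone who has set up Hadoop knows how tedious the process is: there is a lot of environment to configure and many configuration files to edit, so building a usable test environment takes a great deal of time. Docker addresses exactly this kind of problem. With it, we can quickly bring up a working Hadoop cluster for testing.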

    This article builds on a Dockerfile from GitHub, with a few small changes that improve the experience for users in mainland China. GitHub address

    Clone the GitHub repository and change into the repository directory:

    The following is excerpted from README.md

    Apache Hadoop 2.7.1 Docker image

    Note: this is the master branch - for a particular Hadoop version always check the related branch

    A few weeks ago we released an Apache Hadoop 2.3 Docker image - this quickly became the most popular Hadoop image in the Docker registry.

    Following the success of our previous Hadoop Docker images, and the feedback and feature requests we received, we aligned with the Hadoop release cycle and released an Apache Hadoop 2.7.1 Docker image. Like the previous version, it is available as a trusted and automated build on the official Docker registry.

    FYI: All the former Hadoop releases (2.3, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.5.2, 2.6.0) are available in the GitHub branches or our Docker Registry - check the tags.

    Changes for use in mainland China

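    This version sets the time zone in the Dockerfile to China. Downloading the following files over Chinese networks is very slow, so the Dockerfile no longer fetches them with curl. You supply them yourself instead, which means these files must be present in the current directory: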

    Each of them can be downloaded separately through other channels.
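
    Because the build now depends on files supplied locally, a small pre-flight check can catch a missing download before `docker build` fails partway through. The sketch below is one way to do it. The function name `check_build_files` and the file names in the usage comment are hypothetical; pass whatever files the Dockerfile actually copies.

```shell
#!/bin/sh
# Pre-flight check before `docker build`: report which of the given files
# are missing from the current directory.
# The file names are supplied by the caller; none are taken from the article.
check_build_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Exit status 0 when every file is present, 1 otherwise.
    return "$missing"
}

# Usage (hypothetical file names):
#   check_build_files some-archive.tar.gz some-package.rpm \
#       && docker build -t sequenceiq/hadoop-docker:2.7.1 .
```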

    A docker-compose.yml file has been added, with a volume mapping for the logs, so the environment can be started quickly.
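
    A minimal sketch of what such a compose file might look like, written from the shell with a heredoc. This is not the author's actual file. The service name `hadoop`, the published ports, and the container log path `/usr/local/hadoop/logs` are all assumptions; adjust them to match the image you built.

```shell
#!/bin/sh
# Write an illustrative docker-compose.yml into the current directory.
# Everything below the image name is an assumption, not the article's file.
cat > docker-compose.yml <<'EOF'
version: "2"
services:
  hadoop:
    image: sequenceiq/hadoop-docker:2.7.1
    ports:
      - "50070:50070"   # NameNode web UI (Hadoop 2.x default port)
      - "8088:8088"     # ResourceManager web UI (Hadoop 2.x default port)
    volumes:
      # Assumed log location inside the container, mapped to ./logs on the host
      - ./logs:/usr/local/hadoop/logs
EOF
```

    With the log directory mapped to the host, the daemon logs stay readable after the container is removed.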

    Build the image

    If you'd like to try directly from the Dockerfile you can build the image as:

    docker build -t sequenceiq/hadoop-docker:2.7.1 .
    

    Pull the image

    The image is also released as an official Docker image from Docker's automated build repository - you can always pull or refer to the image when launching containers.

    docker pull sequenceiq/hadoop-docker:2.7.1
    

    Start with docker-compose

    docker-compose up -d
    

    Verify the test environment

    Run

    docker exec -it <container-name> bash

    to open a shell inside the container.
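
    If you do not know the container name, it can be looked up from the image name. The helper below is a sketch: the function name `hadoop_container` is made up for illustration, and it assumes the container was started from an image tagged `sequenceiq/hadoop-docker:2.7.1`. It relies on the `ancestor` filter and `--format` option of `docker ps`.

```shell
#!/bin/sh
# Print the name of the first running container created from the given image.
# Defaults to the image tag used in this article.
hadoop_container() {
    image="${1:-sequenceiq/hadoop-docker:2.7.1}"
    docker ps --filter "ancestor=$image" --format '{{.Names}}' | head -n 1
}

# Usage:
#   docker exec -it "$(hadoop_container)" bash
```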

    Then run the following commands:

    cd $HADOOP_PREFIX
    # run the mapreduce
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
    
    # check the output
    bin/hdfs dfs -cat output/*
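
    The example job searches the files under `input` for strings matching the regular expression `dfs[a-z.]+` and reports how often each match occurs. To see what the pattern picks up without running a job, the sketch below approximates it locally with standard tools. The sample lines are made up for illustration and the function name `count_dfs_matches` is hypothetical.

```shell
#!/bin/sh
# Local approximation of the grep example: extract every match of the
# pattern from stdin, then count the occurrences of each distinct match,
# most frequent first.
count_dfs_matches() {
    grep -oE 'dfs[a-z.]+' | sort | uniq -c | sort -rn
}

# Made-up sample input resembling Hadoop configuration entries.
printf '%s\n' \
    '<name>dfs.replication</name>' \
    '<name>dfs.namenode.name.dir</name>' \
    '<name>dfs.replication</name>' | count_dfs_matches
```

    Here `dfs.replication` is listed with a count of 2 and `dfs.namenode.name.dir` with a count of 1.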
    

    Hadoop native libraries, build, Bintray, etc

    Building Hadoop is no easy task: it requires many libraries in the right versions (protobuf, etc.) and takes some time. We have simplified all of this, made the build, and released a 64-bit version of the Hadoop native libraries on this Bintray repo. Enjoy.

    Automate everything

    As we have mentioned previously, a Dockerfile was created and released in the official Docker repository.

    Conclusion

    Finally, here are a few commonly used Hadoop web URLs:
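
    As a sketch only: the helper below prints web UI addresses under the assumption that the daemons use the stock Hadoop 2.x ports (50070 for the NameNode, 8088 for the ResourceManager, 8042 for the NodeManager) and that those ports are published to the host. The function name `hadoop_web_urls` is hypothetical, and your port mappings may differ.

```shell
#!/bin/sh
# Print web UI addresses for a host, assuming stock Hadoop 2.x ports
# that have been published from the container to that host.
hadoop_web_urls() {
    host="${1:-localhost}"
    printf 'NameNode         http://%s:50070\n' "$host"
    printf 'ResourceManager  http://%s:8088\n' "$host"
    printf 'NodeManager      http://%s:8042\n' "$host"
}

# Usage:
#   hadoop_web_urls              # addresses on localhost
#   hadoop_web_urls 192.0.2.10   # addresses on another Docker host
```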
