hadoop cluster config

Author: xncode | Published 2017-08-17 10:03

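    Reference documentation for the default values of the configuration files is under share/doc in the Hadoop installation directory.

    Read-only default configuration: core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml

    Site-specific configuration: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml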

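    The Hadoop scripts additionally read environment variables from etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh. At a minimum JAVA_HOME must be set. Each daemon can also be given its own options through a separate OPTS variable: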

    Daemon                           Environment variable
    NameNode                         HADOOP_NAMENODE_OPTS
    DataNode                         HADOOP_DATANODE_OPTS
    Secondary NameNode               HADOOP_SECONDARYNAMENODE_OPTS
    ResourceManager                  YARN_RESOURCEMANAGER_OPTS
    NodeManager                      YARN_NODEMANAGER_OPTS
    WebAppProxy                      YARN_PROXYSERVER_OPTS
    Map Reduce Job History Server    HADOOP_JOB_HISTORYSERVER_OPTS
    

    hadoop-env.sh

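    HADOOP_PID_DIR  Directory where the daemons' process ID files are stored
    HADOOP_LOG_DIR  Directory where the daemons' log files are stored
    HADOOP_HEAPSIZE / YARN_HEAPSIZE Maximum heap size in MB. Per-daemon overrides: YARN_RESOURCEMANAGER_HEAPSIZE, YARN_NODEMANAGER_HEAPSIZE, YARN_PROXYSERVER_HEAPSIZE, HADOOP_JOB_HISTORYSERVER_HEAPSIZE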
    

    core-site.xml: common properties

    fs.defaultFS    NameNode URI
    io.file.buffer.size Buffer size in bytes used when reading and writing sequence files, e.g. 131072
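
    As a rough sketch, the two properties above could appear in etc/hadoop/core-site.xml as follows. The host name and port are placeholders that do not come from the original post; replace them with the address of your own NameNode.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/core-site.xml: illustrative sketch only -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- Assumption: hypothetical NameNode host and port -->
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <!-- 128 KB buffer, the value quoted in the list above -->
    <value>131072</value>
  </property>
</configuration>
```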
    

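    hdfs-site.xml: HDFS properties

    dfs.namenode.name.dir   Local path where the NameNode stores the namespace and transaction logs. A comma-separated list of paths makes the NameNode replicate this data into every directory, for redundancy
    dfs.hosts / dfs.hosts.exclude   Lists of permitted / excluded DataNodes
    dfs.blocksize   HDFS block size. The default is 134217728 (128 MB); 268435456 (256 MB) is a common choice for large file systems
    dfs.namenode.handler.count  Number of NameNode server threads that handle RPC requests from DataNodes. The default is 10; raise it (e.g. to 100) when the cluster has many DataNodes
    
    dfs.datanode.data.dir   Local path where the DataNode stores its blocks. With a comma-separated list of paths the data is spread across all of them; putting the paths on different devices spreads the I/O load and can speed up reads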
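
    A minimal sketch of how these properties might look in etc/hadoop/hdfs-site.xml. All directory paths are hypothetical examples rather than values from the original post; the block size and handler count are the large-cluster values mentioned above.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/hdfs-site.xml: illustrative sketch only -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <!-- Assumption: two hypothetical directories; the NameNode writes to both -->
    <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <!-- 256 MB blocks -->
    <value>268435456</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- Assumption: hypothetical directories, ideally on separate disks -->
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
  </property>
</configuration>
```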
    

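    yarn-site.xml: YARN properties

    yarn.acl.enable Whether ACLs are enabled; default false
    yarn.admin.acl  ACL for cluster administrators; the default * means anyone
    yarn.log-aggregation-enable Whether log aggregation is enabled; default false
    yarn.nodemanager.remote-app-log-dir HDFS directory in which the aggregated logs are stored
    yarn.nodemanager.remote-app-log-dir-suffix  Suffix appended to the remote log directory (a suffix, not a prefix)
    yarn.log-aggregation.retain-seconds How long to keep aggregated logs before deleting them; -1 disables deletion
    yarn.log-aggregation.retain-check-interval-seconds  Interval between retention checks; if 0 or negative, one tenth of the retention time is used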
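
    The log aggregation settings above might be combined as in the sketch below. The 7-day retention value is an assumption chosen for illustration, not a value from the original post.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/yarn-site.xml (log aggregation part): illustrative sketch only -->
<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <!-- Directory in HDFS, not on the local file system -->
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <!-- Assumption: keep aggregated logs for 7 days (7 * 24 * 3600 seconds) -->
    <value>604800</value>
  </property>
</configuration>
```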
    
    yarn.resourcemanager.hostname   Host name of the ResourceManager
    yarn.resourcemanager.scheduler.class    Scheduler class: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
    
    yarn.scheduler.minimum-allocation-mb    Minimum memory, in MB, allocated to each container request
    yarn.scheduler.maximum-allocation-mb    Maximum memory, in MB, allocated to each container request
    yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path   Lists of permitted / excluded NodeManagers
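
    A possible ResourceManager section of yarn-site.xml, as a sketch. The host name and the two memory limits are illustrative assumptions; the scheduler value is the fully qualified class name of the CapacityScheduler.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/yarn-site.xml (ResourceManager part): illustrative sketch only -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <!-- Assumption: hypothetical ResourceManager host -->
    <value>rm.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <!-- Assumption: 1 GB minimum per container request -->
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <!-- Assumption: 8 GB maximum per container request -->
    <value>8192</value>
  </property>
</configuration>
```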
    
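    yarn.nodemanager.resource.memory-mb Physical memory, in MB, that a NodeManager makes available to running containers
    yarn.nodemanager.vmem-pmem-ratio    Maximum ratio of virtual memory to physical memory that a task may use
    yarn.nodemanager.local-dirs Comma-separated list of local paths where intermediate data is written
    yarn.nodemanager.log-dirs   Comma-separated list of local paths where logs are written
    yarn.nodemanager.log.retain-seconds How long to keep log files on the NodeManager, default 10800 (3 hours); only applies when log aggregation is disabled
    yarn.nodemanager.aux-services   mapreduce_shuffle, the shuffle service required by MapReduce applications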
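
    The NodeManager properties could be set along these lines. The memory size, ratio and directory paths are illustrative assumptions to adapt to the actual hardware; only mapreduce_shuffle comes from the list above.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/yarn-site.xml (NodeManager part): illustrative sketch only -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <!-- Assumption: 8 GB of this machine's RAM is offered to containers -->
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <!-- Assumption: virtual memory may reach 2.1x the physical allocation -->
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <!-- Assumption: hypothetical local directories on separate disks -->
    <value>/data/1/yarn/local,/data/2/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <!-- Assumption: hypothetical local log directories -->
    <value>/data/1/yarn/logs,/data/2/yarn/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```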
    

    mapred-site.xml: MapReduce properties

    mapreduce.framework.name                 yarn        Execution framework set to Hadoop YARN.
    mapreduce.map.memory.mb                  1536        Larger resource limit for maps.
    mapreduce.map.java.opts                  -Xmx1024M   Larger heap size for child JVMs of maps.
    mapreduce.reduce.memory.mb               3072        Larger resource limit for reduces.
    mapreduce.reduce.java.opts               -Xmx2560M   Larger heap size for child JVMs of reduces.
    mapreduce.task.io.sort.mb                512         Higher memory limit while sorting data, for efficiency.
    mapreduce.task.io.sort.factor            100         More streams merged at once while sorting files.
    mapreduce.reduce.shuffle.parallelcopies  50          Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps.
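
    Written out as a file, the first rows of the table above might look like the sketch below. It uses only the values from the table. In both cases the JVM heap (-Xmx) is smaller than the container limit, which leaves headroom for non-heap memory.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/mapred-site.xml: illustrative sketch using the values listed above -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <!-- Container limit for a map task -->
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <!-- Heap of 1024 MB fits inside the 1536 MB container -->
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <!-- Container limit for a reduce task -->
    <value>3072</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <!-- Heap of 2560 MB fits inside the 3072 MB container -->
    <value>-Xmx2560M</value>
  </property>
</configuration>
```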
