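For the default values of the configuration files, see share/doc under the Hadoop installation directory.
Read-only default configuration: core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml
Site-specific configuration: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml
In addition, the Hadoop scripts read environment variables from etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh. At minimum JAVA_HOME must be set; each daemon can also be given its own options through the corresponding OPTS variable: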
NameNode HADOOP_NAMENODE_OPTS
DataNode HADOOP_DATANODE_OPTS
Secondary NameNode HADOOP_SECONDARYNAMENODE_OPTS
ResourceManager YARN_RESOURCEMANAGER_OPTS
NodeManager YARN_NODEMANAGER_OPTS
WebAppProxy YARN_PROXYSERVER_OPTS
Map Reduce Job History Server HADOOP_JOB_HISTORYSERVER_OPTS
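hadoop-env.sh: other commonly set variables
HADOOP_PID_DIR Directory where the daemons' process ID files are stored
HADOOP_LOG_DIR Directory where the daemons' log files are stored
HADOOP_HEAPSIZE YARN_HEAPSIZE YARN_RESOURCEMANAGER_HEAPSIZE YARN_NODEMANAGER_HEAPSIZE YARN_PROXYSERVER_HEAPSIZE HADOOP_JOB_HISTORYSERVER_HEAPSIZE Maximum heap size, in MB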
core-site.xml: common properties
fs.defaultFS NameNode URI
io.file.buffer.size Size of the read/write buffer used for SequenceFiles, e.g. 131072
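As a rough illustration of the two core-site.xml properties above, a minimal fragment might look like the following. The host name and port are hypothetical placeholders, not values from these notes; only the buffer size comes from the text.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/core-site.xml: minimal sketch -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- hypothetical NameNode host and port; replace with your own -->
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <!-- 131072 bytes = 128 KB, the value listed above -->
    <value>131072</value>
  </property>
</configuration>
```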
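hdfs-site.xml: HDFS properties
dfs.namenode.name.dir Where the NameNode stores its data; may be a comma-separated list of paths, in which case the data is replicated to each of them for redundancy
dfs.hosts / dfs.hosts.exclude Lists of permitted / excluded DataNodes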
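dfs.blocksize HDFS block size; 268435456 (256 MB) is the value suggested for large file systems, not the default (hdfs-default.xml ships 134217728, i.e. 128 MB)
dfs.namenode.handler.count Number of NameNode handler threads serving RPC requests from DataNodes; 100 is a suggested value for clusters with many DataNodes rather than the default (10 in hdfs-default.xml), so raise it as the number of DataNodes grows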
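dfs.datanode.data.dir Where the DataNode stores its blocks; may be a comma-separated list of paths, in which case data is spread across them, which can speed up reads when the paths are on different devices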
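The HDFS properties above could be combined into an hdfs-site.xml along these lines. This is only a sketch: every directory and file path is a hypothetical placeholder, while the block size and handler count are the values listed in these notes.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/hdfs-site.xml: minimal sketch -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <!-- hypothetical paths; two directories give a redundant copy -->
    <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <!-- hypothetical file listing DataNodes to exclude, one per line -->
    <value>/etc/hadoop/conf/dfs.exclude</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <!-- 268435456 bytes = 256 MB -->
    <value>268435456</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- hypothetical paths, ideally on separate devices -->
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
  </property>
</configuration>
```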
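yarn-site.xml: YARN properties
yarn.acl.enable Whether ACLs are enabled; defaults to false
yarn.admin.acl ACL of cluster administrators; defaults to *, meaning anyone
yarn.log-aggregation-enable Whether to aggregate logs; defaults to false
yarn.nodemanager.remote-app-log-dir Directory, in HDFS, where aggregated logs are stored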
yarn.nodemanager.remote-app-log-dir-suffix Suffix appended to the remote log directory for aggregated logs
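yarn.log-aggregation.retain-seconds How long, in seconds, to keep aggregated logs before deleting them; -1 disables deletion
yarn.log-aggregation.retain-check-interval-seconds Interval, in seconds, between checks for aggregated logs to delete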
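yarn.resourcemanager.hostname Hostname of the ResourceManager
yarn.resourcemanager.scheduler.class Scheduler class: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
yarn.scheduler.minimum-allocation-mb Minimum memory allocated to a container, in MB
yarn.scheduler.maximum-allocation-mb Maximum memory allocated to a container, in MB
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path Lists of permitted / excluded NodeManagers
yarn.nodemanager.resource.memory-mb Physical memory, in MB, that a node makes available to running containers
yarn.nodemanager.vmem-pmem-ratio Ratio of virtual memory to physical memory allowed for containers
yarn.nodemanager.local-dirs Paths where intermediate data is read and written
yarn.nodemanager.log-dirs Paths where log files are written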
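yarn.nodemanager.log.retain-seconds How long, in seconds, to retain log files on the NodeManager; defaults to 10800 (3 hours). Only applies when log aggregation is disabled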
yarn.nodemanager.aux-services mapreduce_shuffle Shuffle service that must be set for MapReduce applications
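A yarn-site.xml covering the main groups of properties above might be sketched as follows. The host name, directories, memory sizes and retention period are illustrative assumptions rather than recommendations from these notes; the scheduler class is the fully qualified name of the CapacityScheduler.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/yarn-site.xml: minimal sketch -->
<configuration>
  <!-- ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <!-- hypothetical host name -->
    <value>rm.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <!-- illustrative value -->
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <!-- illustrative value -->
    <value>8192</value>
  </property>

  <!-- NodeManager -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <!-- illustrative value; size to the RAM you can spare for containers -->
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <!-- hypothetical paths -->
    <value>/data/1/yarn/local,/data/2/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <!-- hypothetical paths -->
    <value>/data/1/yarn/logs,/data/2/yarn/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <!-- Log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <!-- hypothetical HDFS directory -->
    <value>/tmp/logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <!-- 604800 s = 7 days, an illustrative retention period -->
    <value>604800</value>
  </property>
</configuration>
```

With log aggregation enabled as in this sketch, yarn.nodemanager.log.retain-seconds no longer governs how long logs are kept.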
mapred-site.xml: MapReduce properties
mapreduce.framework.name yarn Execution framework set to Hadoop YARN.
mapreduce.map.memory.mb 1536 Larger resource limit for maps.
mapreduce.map.java.opts -Xmx1024M Larger heap-size for child jvms of maps.
mapreduce.reduce.memory.mb 3072 Larger resource limit for reduces.
mapreduce.reduce.java.opts -Xmx2560M Larger heap-size for child jvms of reduces.
mapreduce.task.io.sort.mb 512 Higher memory-limit while sorting data for efficiency.
mapreduce.task.io.sort.factor 100 More streams merged at once while sorting files.
mapreduce.reduce.shuffle.parallelcopies 50 Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.
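Written out as a file, the MapReduce settings listed above would look roughly like this. The values are the ones given in the list; whether they suit a particular cluster depends on its memory and workload, so treat this as a starting sketch. Note that each -Xmx heap size is kept below the matching container limit (1024 MB under 1536 MB, 2560 MB under 3072 MB) to leave room for non-heap JVM memory.

```xml
<?xml version="1.0"?>
<!-- etc/hadoop/mapred-site.xml: minimal sketch using the values listed above -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
</configuration>
```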