2018-01-24 6 HDFS Architecture a

Author: 鸭鸭学语言 | Source: published 2018-01-25 05:01

Architecture

Summary: 

HDFS is a scalable distributed filesystem. Hadoop distributes big data as blocks stored on the local disks of the nodes, close to where the computation runs. The nodes consist of heterogeneous, low-cost commodity hardware.

Key point of design:

Distribute data as blocks across a scalable set of data nodes.
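
As a rough illustration of this design point, the minimal sketch below splits a file of a given size into fixed-size blocks and spreads them over a list of data nodes. The function names `split_into_blocks` and `assign_round_robin` and the node names are hypothetical, and round-robin is a simplification chosen for illustration; it is not HDFS's actual placement policy.

```python
from collections import defaultdict

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default quoted in these notes


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def assign_round_robin(num_blocks, nodes):
    """Map each node name to the block indices it stores (illustrative policy)."""
    placement = defaultdict(list)
    for block_id in range(num_blocks):
        placement[nodes[block_id % len(nodes)]].append(block_id)
    return dict(placement)


blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
three_nodes = assign_round_robin(len(blocks), ["dn1", "dn2", "dn3"])
four_nodes = assign_round_robin(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

Adding a fourth node to the list spreads the same blocks more thinly, which is the sense in which the set of data nodes is scalable.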

Features:

High data availability is achieved by replicating data across different nodes.

Simplified coherency model: write once, read many.

Move computation close to the data.

Relaxed POSIX requirements, to increase throughput.
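
The replication and data-locality features can be sketched as follows. This is a toy model under stated assumptions: a replication factor of 3, a simple rotating placement, and the hypothetical helpers `place_replicas`, `readable_blocks`, and `pick_compute_node`. It does not reproduce HDFS's real placement or scheduling logic.

```python
REPLICATION = 3  # assumed replication factor for this sketch


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Give each block `replication` distinct nodes (simple rotating policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + i) % len(nodes)] for i in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a live node."""
    return [b for b, holders in placement.items() if set(holders) - set(failed)]


def pick_compute_node(placement, block_id, busy=()):
    """Prefer running work on a node that already stores the block."""
    for node in placement[block_id]:
        if node not in busy:
            return node
    return None  # no local node is free; a real scheduler would read remotely


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(4, nodes)
survivors = readable_blocks(placement, failed=["dn1", "dn2"])
local_node = pick_compute_node(placement, block_id=0, busy=["dn1"])
```

With three replicas per block, every block in this example remains readable after two of the four nodes fail, and a task for block 0 can still run on a node that holds a local copy.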

Architecture:

Name Node - manages the file system namespace and regulates access to files by clients.

Data Nodes - manage storage; serve read/write requests from clients; perform block creation, deletion, and replication on instruction from the Name Node.
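
The division of labour between the two roles can be sketched with a small in-memory model. The classes and functions below (`NameNode`, `DataNode`, `client_write`, `client_read`) are hypothetical stand-ins, not the Hadoop API, and the sketch leaves out replication for brevity. The point it tries to show is that the Name Node holds only metadata, while block contents are stored on and served by the Data Nodes.

```python
class DataNode:
    """Stores block contents and serves read/write requests."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]

    def delete_block(self, block_id):
        self.blocks.pop(block_id, None)


class NameNode:
    """Holds only metadata: the namespace and where each block lives."""

    def __init__(self, datanodes):
        self.datanodes = {dn.name: dn for dn in datanodes}
        self.namespace = {}  # path -> list of block ids
        self.locations = {}  # block_id -> data node name
        self._next_id = 0

    def allocate_block(self, path):
        """Register a new block for path and choose a data node for it."""
        block_id = self._next_id
        self._next_id += 1
        names = sorted(self.datanodes)
        target = names[block_id % len(names)]
        self.namespace.setdefault(path, []).append(block_id)
        self.locations[block_id] = target
        return block_id, self.datanodes[target]

    def get_block_locations(self, path):
        return [(b, self.datanodes[self.locations[b]]) for b in self.namespace[path]]

    def delete(self, path):
        """Drop path from the namespace and instruct data nodes to delete."""
        for block_id in self.namespace.pop(path):
            self.datanodes[self.locations.pop(block_id)].delete_block(block_id)


def client_write(namenode, path, data, block_size):
    """Ask the name node where each block goes, then send data to that node."""
    for start in range(0, len(data), block_size):
        block_id, dn = namenode.allocate_block(path)
        dn.write_block(block_id, data[start:start + block_size])


def client_read(namenode, path):
    """Look up block locations, then fetch the bytes from the data nodes."""
    return b"".join(dn.read_block(b) for b, dn in namenode.get_block_locations(path))


nn = NameNode([DataNode("dn1"), DataNode("dn2")])
client_write(nn, "/logs/a.txt", b"hello hdfs world", block_size=6)
content = client_read(nn, "/logs/a.txt")
```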


Performance Envelope

Every block is represented as an object.

The default block size is 64 MB (the Hadoop 1.x value; later releases default to 128 MB). A file's size determines how many blocks are created for it, which in turn:

affects memory usage and network load, from the perspective of the namespace

affects the number of map tasks that process the blocks, and consequently disk I/O performance.
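
A back-of-the-envelope calculation may make the effect concrete. The sketch below compares one 1 GB file with 1024 files of 1 MB each, which hold the same amount of data. Two assumptions are not from the notes: the figure of roughly 150 bytes of Name Node memory per object is a commonly quoted rule of thumb, not a measurement, and each file is counted as one object plus one object per block. It also assumes one map task per block.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # default quoted in these notes
BYTES_PER_OBJECT = 150         # assumed rule of thumb, not a measured value


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold file_size bytes (integer ceiling)."""
    return (file_size + block_size - 1) // block_size


def namespace_cost(file_sizes):
    """Return (objects, estimated bytes): one object per file plus one per block."""
    objects = sum(1 + block_count(size) for size in file_sizes)
    return objects, objects * BYTES_PER_OBJECT


GB = 1024 ** 3
MB = 1024 ** 2
one_big = namespace_cost([1 * GB])       # a single 1 GB file
many_small = namespace_cost([MB] * 1024)  # 1024 files of 1 MB each
map_tasks_big = block_count(1 * GB)      # assuming one map task per block
map_tasks_small = sum(block_count(MB) for _ in range(1024))
```

Under these assumptions the small-file layout needs 2048 namespace objects and 1024 map tasks, against 17 objects and 16 map tasks for the single large file.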

How to improve performance:

- merge small files

- sequence files

- HBase, Hive configuration

- CombineFileInputFormat
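
The idea behind the first two items, packing many small files into one larger container keyed by file name, can be sketched as below. The record layout (two 4-byte big-endian lengths followed by the name and the data) and the helper names `pack_small_files` and `unpack_small_files` are invented for illustration; this is not the on-disk format of Hadoop's SequenceFile.

```python
import io
import struct


def pack_small_files(files):
    """Concatenate {name: bytes} into one blob of length-prefixed records."""
    out = io.BytesIO()
    for name, data in files.items():
        key = name.encode("utf-8")
        out.write(struct.pack(">II", len(key), len(data)))  # two 4-byte lengths
        out.write(key)
        out.write(data)
    return out.getvalue()


def unpack_small_files(blob):
    """Inverse of pack_small_files: yield (name, data) pairs in order."""
    view = io.BytesIO(blob)
    while True:
        header = view.read(8)
        if not header:
            break
        key_len, data_len = struct.unpack(">II", header)
        name = view.read(key_len).decode("utf-8")
        yield name, view.read(data_len)


small = {"a.log": b"first", "b.log": b"second", "c.log": b""}
merged = pack_small_files(small)
restored = dict(unpack_small_files(merged))
```

Stored this way, the three inputs become a single file, so the namespace tracks one entry where it would otherwise track three.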


Write/Replication/Read Processes on HDFS

Initially, data is cached in a client-side buffer until it reaches one block size. Then:


lesson 6 - slides

HDFS command list

HDFS Architecture (official document)

