Hadoop input split size vs block

Author: SeanC52111 | Published: 2017-10-09 19:27

The answer by @user1668782 is a great explanation of the question; I'll try to give a graphical depiction of it.
Assume we have a 400MB file that consists of 4 records (e.g., a 400MB CSV file with 4 rows of 100MB each).

If the HDFS block size is configured as 128MB, the 4 records will not be distributed evenly among the blocks: the 400MB file is stored as four blocks of 128MB, 128MB, 128MB and 16MB, so records 2, 3 and 4 each straddle a block boundary.

Block 1 contains the entire first record and a 28MB chunk of the second record.
If a mapper were run on Block 1 alone, it could not process the second record, because it would not have the whole record.
This is exactly the problem that input splits solve: input splits respect logical record boundaries.
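
A minimal Python sketch of this layout, assuming sizes in whole MB and fixed-size 100MB records. It computes how many MB of each record land in each 128MB block. The helper names (`byte_ranges`, `overlap`, `block_contents`) are illustrative only and are not part of any Hadoop API.

```python
# Sizes are in MB so the arithmetic stays readable (an assumption of this sketch).
FILE_SIZE = 400    # one 400MB file ...
RECORD_SIZE = 100  # ... made of 4 records of 100MB each
BLOCK_SIZE = 128   # HDFS block size from the example


def byte_ranges(total, chunk):
    """Cut [0, total) into consecutive half-open ranges of at most `chunk`."""
    return [(start, min(start + chunk, total)) for start in range(0, total, chunk)]


def overlap(a, b):
    """Length of the overlap between two half-open ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def block_contents(blocks, records):
    """For each block, list (record_number, MB of that record stored in the block)."""
    layout = []
    for block in blocks:
        parts = []
        for number, record in enumerate(records, start=1):
            size = overlap(block, record)
            if size > 0:
                parts.append((number, size))
        layout.append(parts)
    return layout


records = byte_ranges(FILE_SIZE, RECORD_SIZE)  # [(0, 100), (100, 200), (200, 300), (300, 400)]
blocks = byte_ranges(FILE_SIZE, BLOCK_SIZE)    # [(0, 128), (128, 256), (256, 384), (384, 400)]

if __name__ == "__main__":
    for n, parts in enumerate(block_contents(blocks, records), start=1):
        print(f"Block {n}: " + ", ".join(f"record {r} ({mb}MB)" for r, mb in parts))
```

Running it reports block 1 as record 1 (100MB) plus record 2 (28MB), matching the description of Block 1 above, and block 2 as the remaining 72MB of record 2 plus 56MB of record 3.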

Let's assume the input split size is 200MB.

Therefore input split 1 contains both record 1 and record 2. Input split 2 does not start with record 2, because record 2 has already been assigned to input split 1; instead, input split 2 starts with record 3.
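
This assignment can be modelled with a simplified rule, which is an assumption of this sketch rather than Hadoop's exact code: a record belongs to the split that contains its first byte, and that split's reader consumes the whole record even if it runs past the split's end. (Hadoop's line-oriented record readers behave in a comparable way, reading past the split end to finish the last record, with the next split skipping that record.) The helper names are hypothetical.

```python
FILE_SIZE = 400    # MB
RECORD_SIZE = 100  # MB per record


def byte_ranges(total, chunk):
    """Cut [0, total) into consecutive half-open ranges of at most `chunk`."""
    return [(start, min(start + chunk, total)) for start in range(0, total, chunk)]


def assign_records(splits, records):
    """Simplified rule: a record belongs to the split holding its first byte.

    That split's reader is assumed to read the record to its end, even past
    the split boundary, so no record is processed in pieces.
    """
    assignment = []
    for split_start, split_end in splits:
        owned = [number for number, (record_start, _) in enumerate(records, start=1)
                 if split_start <= record_start < split_end]
        assignment.append(owned)
    return assignment


records = byte_ranges(FILE_SIZE, RECORD_SIZE)

if __name__ == "__main__":
    for split_size in (200, 128):
        splits = byte_ranges(FILE_SIZE, split_size)
        print(f"{split_size}MB splits -> records per split: {assign_records(splits, records)}")
```

With the 200MB split size, split 1 owns records 1 and 2 and split 2 starts with record 3. With 128MB splits (one per block), split 1 still owns record 2 and would read its last 72MB from block 2, while the fourth split (offsets 384-400MB) owns no record at all.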

This is why an input split is only a logical chunk of data: it points to start and end locations within blocks.
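
To make the "logical chunk" idea concrete, here is a sketch that represents each split only as a `(start, length)` pair over the file and then maps it onto block-relative offsets. The split-size formula `max(minSize, min(maxSize, blockSize))` follows `FileInputFormat.computeSplitSize` in Hadoop's `mapreduce` API, and the loop in `make_splits` is modelled on the 10% slop that `FileInputFormat.getSplits` allows for the last split. Everything else (`locate`, MB units, the 200MB minimum chosen to reproduce the example) is an illustrative assumption; in a real job the minimum would typically be raised through `mapreduce.input.fileinputformat.split.minsize`.

```python
BLOCK_SIZE = 128  # MB
FILE_SIZE = 400   # MB
LONG_MAX = 2**63 - 1


def compute_split_size(block_size, min_size, max_size):
    """Same formula as FileInputFormat.computeSplitSize (mapreduce API)."""
    return max(min_size, min(max_size, block_size))


def make_splits(file_size, split_size, slop=1.1):
    """Logical splits as (start, length) pairs; no data is copied or moved.

    Keeps cutting full-size splits while the remainder is more than `slop`
    times the split size, so the last split may be up to 10% larger.
    """
    splits, start, remaining = [], 0, file_size
    while remaining / split_size > slop:
        splits.append((start, split_size))
        start += split_size
        remaining -= split_size
    if remaining:
        splits.append((start, remaining))
    return splits


def locate(split, block_size):
    """Map a split to (block_number, start_in_block, end_in_block) pieces."""
    start, length = split
    end = start + length
    pieces, pos = [], start
    while pos < end:
        index = pos // block_size              # 0-based block index
        block_start = index * block_size
        piece_end = min(end, block_start + block_size)
        pieces.append((index + 1, pos - block_start, piece_end - block_start))
        pos = piece_end
    return pieces


if __name__ == "__main__":
    # Raising the minimum split size above the block size yields 200MB splits.
    split_size = compute_split_size(BLOCK_SIZE, min_size=200, max_size=LONG_MAX)
    for n, split in enumerate(make_splits(FILE_SIZE, split_size), start=1):
        print(f"Split {n} {split}: {locate(split, BLOCK_SIZE)}")
```

For the 200MB case, split 1 covers offsets 0-128MB of block 1 plus 0-72MB of block 2, and split 2 covers 72-128MB of block 2, all of block 3 and 0-16MB of block 4. With a minimum of 1 and a maximum of `LONG_MAX`, the split size simply equals the 128MB block size.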

Hope this helps.

Article link: https://www.haomeiwen.com/subject/hmgpyxtx.html