Introduction to Hadoop/MapReduce

Author: 张荣恩Sophia | Published 2017-09-09 06:30

What is MapReduce?

Parallel programming model for big data processing:

Split the data into chunks

Define the steps to process each chunk

Process the chunks in parallel
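The three steps above can be sketched in a single process. This is a minimal sketch, assuming Python's standard `concurrent.futures` module stands in for a cluster and word counting stands in for the per-chunk work; the function names (`split_into_chunks`, `process_chunk`, `parallel_word_count`) are hypothetical. Threads keep the example self-contained, but in CPython they do not run Python code truly in parallel, so `ProcessPoolExecutor` (or an actual cluster) would be needed for a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers; threads stand in for cluster nodes in this sketch.


def split_into_chunks(lines, num_chunks):
    """Step 1: split the input into at most num_chunks contiguous chunks."""
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def process_chunk(chunk):
    """Step 2: the processing defined for one chunk (here, counting words)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def parallel_word_count(lines, num_chunks=4):
    """Step 3: process all chunks concurrently, then combine partial results."""
    chunks = split_into_chunks(lines, num_chunks)
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partial_counts = list(pool.map(process_chunk, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    print(parallel_word_count(["big data big", "data processing", "big"], num_chunks=2))
```

Each chunk is processed without looking at any other chunk, which is what makes the work parallelizable; only the final merge needs all the partial results.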

Hadoop is a platform that implements MapReduce.

1. Map

<key1, value1> -> <key2, value2>

e.g. <line#, text string> -> <word, count>

After mapping, the output is passed to the Reduce phase.
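A word-count mapper can be written as a function from one input record to a sequence of output pairs. This is a hedged sketch with hypothetical names (`map_line`, `run_map_phase`), not Hadoop's Java `Mapper` API; it emits `<word, 1>` once per occurrence, a common choice, so the per-word totals only appear after the Reduce phase.

```python
def map_line(line_no, text):
    """Map step: one <line#, text string> record -> <word, 1> pairs."""
    # The input key (the line number) is not needed for word counting.
    for word in text.split():
        yield (word, 1)


def run_map_phase(lines):
    """Apply the mapper to every record (Hadoop runs one map task per input split)."""
    pairs = []
    for line_no, text in enumerate(lines, start=1):
        pairs.extend(map_line(line_no, text))
    return pairs


if __name__ == "__main__":
    print(run_map_phase(["big data", "big"]))
```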

2. Reduce

Merges (reduces) the output of the Map phase. The Reduce phase is optional.

The output of MapReduce can be printed, summed, counted, loaded into a database, or sent to the next MapReduce job.
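Between the two phases the framework groups the mapper output by key (the "shuffle"), and the reducer then merges all values for each key. The sketch below simulates both steps in memory, assuming word-count pairs as input; `shuffle`, `reduce_word`, and `run_reduce_phase` are hypothetical names, not Hadoop APIs.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group mapper output by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Within each reducer, Hadoop presents keys in sorted order; mimic that here.
    return sorted(groups.items())


def reduce_word(word, counts):
    """Reduce step: merge all values for one key into a single result."""
    return (word, sum(counts))


def run_reduce_phase(pairs):
    return [reduce_word(word, counts) for word, counts in shuffle(pairs)]


if __name__ == "__main__":
    mapped = [("big", 1), ("data", 1), ("big", 1)]
    print(run_reduce_phase(mapped))
```

The returned list could then be printed, written to a database, or fed as input to another job, matching the options listed above.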

Idea: MapReduce plus massive storage for unstructured data

Physical: Java classes for MapReduce and the Hadoop Distributed File System (HDFS)

Hadoop Operational Modes

Java MapReduce mode: reads records incrementally

Streaming mode: any language; the input can be a line or a stream
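In Streaming mode, the mapper and reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output; the framework sorts the mapper output by key before the reducer reads it. A minimal word-count sketch, assuming a hypothetical script name `wordcount_streaming.py` whose only argument (`map` or `reduce`) is this sketch's own convention:

```python
import sys
from itertools import groupby

# Hypothetical script: wordcount_streaming.py map|reduce (mode argument is an assumption).


def mapper(stdin, stdout):
    """Streaming mapper: read raw text lines, write 'word<TAB>1' lines."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


def reducer(stdin, stdout):
    """Streaming reducer: input is sorted by key, so equal words are adjacent."""
    records = (line.rstrip("\n").split("\t", 1) for line in stdin if line.strip())
    for word, group in groupby(records, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        stdout.write(f"{word}\t{total}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        mapper(sys.stdin, sys.stdout)
    elif sys.argv[1] == "reduce":
        reducer(sys.stdin, sys.stdout)
    else:
        sys.exit("mode must be 'map' or 'reduce'")
```

The pipeline can be checked locally with `cat input.txt | python wordcount_streaming.py map | sort | python wordcount_streaming.py reduce`, where `sort` plays the role of the shuffle. On a cluster, the two commands would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (together with `-input` and `-output`); the jar's location depends on the installation.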

MapReduce and HDFS

Query Languages for Hadoop

Built on core Hadoop to improve development on, and manipulation of, a Hadoop cluster

Pig: data flow language and execution environment

Hive (HiveQL): SQL-based query language for building MapReduce jobs

HBase: column-oriented database
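HBase groups columns into column families and stores sparse, versioned cells addressed by row key, column (family:qualifier), and timestamp. The toy class below models only that addressing scheme as nested dictionaries; it is an illustrative assumption, not the HBase client API, and `TinyColumnFamilyTable` is a hypothetical name.

```python
import time


class TinyColumnFamilyTable:
    """Toy in-memory model of an HBase-style table (hypothetical, not the HBase API).

    Layout: row key -> column family -> qualifier -> {timestamp: value}.
    """

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = {}

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time_ns() if ts is None else ts
        cells = self.rows.setdefault(row, {}).setdefault(family, {})
        # Older versions are kept; a new timestamp adds a version.
        cells.setdefault(qualifier, {})[ts] = value

    def get(self, row, family, qualifier):
        """Return the newest version of one cell, or None if it is absent."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]


if __name__ == "__main__":
    table = TinyColumnFamilyTable(["info"])
    table.put("user1", "info", "name", "Ann", ts=1)
    table.put("user1", "info", "name", "Anna", ts=2)
    print(table.get("user1", "info", "name"))
```

Rows only hold the cells that were actually written, which is why this layout suits sparse data.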

Pig (data flow language: Pig Latin)

Two execution environment modes:

Local file system

MapReduce in a Hadoop environment

Suitable for large datasets and batch processing
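As a data flow language, Pig Latin expresses a job as a chain of transformations on relations, which Pig compiles into MapReduce jobs when running in MapReduce mode. The Python sketch below only mirrors the shape of a typical Pig Latin word count (LOAD, FOREACH ... GENERATE FLATTEN(TOKENIZE(...)), GROUP ... BY, COUNT); the function names are hypothetical, and `str.split()` only approximates `TOKENIZE`.

```python
from collections import defaultdict


def load(lines):
    # Like LOAD: each input line becomes one record.
    return ({"line": line} for line in lines)


def tokenize(records):
    # Like FOREACH ... GENERATE FLATTEN(TOKENIZE(line)) AS word (approximated by split()).
    return ({"word": word} for rec in records for word in rec["line"].split())


def group_by(records, field):
    # Like GROUP ... BY field: one bag of records per distinct key.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec)
    return groups


def count_groups(groups):
    # Like FOREACH grouped GENERATE group, COUNT(bag).
    return sorted((key, len(bag)) for key, bag in groups.items())


def word_count_flow(lines):
    # Each step consumes the previous step's output, as in a Pig data flow.
    return count_groups(group_by(tokenize(load(lines)), "word"))


if __name__ == "__main__":
    print(word_count_flow(["pig latin", "pig"]))
```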


Article link: https://www.haomeiwen.com/subject/kqkojxtx.html
