Introduction to Hadoop / MapReduce

Author: 张荣恩Sophia | Published: 2017-09-09 06:30

What is MapReduce?

Parallel programming model for big data processing:

split the data into chunks

define the steps to process the chunks

process the chunks in parallel

    Hadoop is a platform that implements MapReduce.
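The three steps above can be sketched on a single machine. This is only a minimal illustration, not how Hadoop itself schedules work: the names split_into_chunks, process_chunk and run_in_parallel are made up for this example, and a thread pool stands in for the cluster nodes that would process chunks in a real deployment.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Step 1: split the records into consecutive chunks of at most chunk_size items."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Step 2: the per-chunk work. Here: count the words in the chunk's lines."""
    return sum(len(line.split()) for line in chunk)


def run_in_parallel(records, chunk_size=2, workers=4):
    """Step 3: process the chunks concurrently, then combine the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: threads stand in for cluster workers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    lines = ["a b c", "d e", "f", "g h i j"]
    print(run_in_parallel(lines))  # total number of words across all chunks
```

Each chunk is processed independently of the others, which is what makes the model easy to spread across many machines.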

1. Map

<key1, value1>  -> <key2, value2>

e.g. <line#, text string> -> <word, count>

After mapping, the output is passed to the Reduce phase.
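The word count mapping can be sketched as a plain function. The name map_line is hypothetical, and emitting a count of 1 per word occurrence is an assumption based on the usual word count example; the notes above only say that the output has the form <word, count>.

```python
def map_line(line_number, text):
    """Map one <line#, text string> record to a list of <word, count> pairs.

    Assumption: each occurrence of a word is emitted with a count of 1,
    and the totals are left to the Reduce phase.
    """
    # The line number (key1) is not needed to produce the word pairs.
    return [(word.lower(), 1) for word in text.split()]


if __name__ == "__main__":
    for pair in map_line(1, "Hello hello world"):
        print(pair)
```

Note that the input key and the output key have different types here (line number in, word out), which matches the <key1, value1> -> <key2, value2> signature.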

2. Reduce

Merge/reduce the output of the Map phase. The Reduce phase is optional.

The output of MapReduce can be printed, summed, counted, loaded into a database, or sent to the next MapReduce job.
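For the word count example, the Reduce side can be sketched as below. This is an illustration only: the helper names group_by_key, reduce_counts and run_reduce are invented for this sketch, and the in-memory grouping stands in for the step that brings all values for one key together before reducing.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values that share a key, e.g. ("a", 1), ("a", 1) -> {"a": [1, 1]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce one key: merge the list of counts for a word into a single total."""
    return word, sum(counts)


def run_reduce(pairs):
    """Apply reduce_counts to every key and return the totals as a dict."""
    grouped = group_by_key(pairs)
    return dict(reduce_counts(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    mapped = [("hello", 1), ("world", 1), ("hello", 1)]
    print(run_reduce(mapped))
```

The resulting dictionary could then be printed, written to a database, or used as the input of another job, as listed above.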

Idea: MapReduce, massive unstructured data storage

Physical: Java classes for MapReduce and the Hadoop Distributed File System (HDFS)

Hadoop Operational Modes

Java MapReduce Mode: reads records incrementally

Streaming Mode: any language; the input can be a line or a stream
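Because Streaming Mode works with any language, a word count can be written in Python. In Hadoop Streaming the mapper and reducer exchange text lines, by default with the key and value separated by a tab, and the reducer receives its input sorted by key. The sketch below follows that convention; the function names are hypothetical, and the sorted() call only simulates locally the sorting that the framework would do between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Read text lines and emit one "word<TAB>1" line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Read "word<TAB>count" lines sorted by word and emit one total per word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local simulation only: in a real job each function would read from
    # standard input and write to standard output in its own process.
    text = ["a b a", "b c"]
    for result in reducer(sorted(mapper(text))):
        print(result)
```

The reducer relies on its input being sorted: groupby only merges adjacent lines with the same word, so unsorted input would produce several partial totals for one word.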

MapReduce and HDFS

Query Languages for Hadoop

These build on core Hadoop to make it easier to develop for and work with a Hadoop cluster.

Pig: data flow language and execution environment

Hive (HiveQL): query language based on SQL for building MapReduce jobs

HBase: column-oriented database

Pig (data flow language, Pig Latin)

2 Execution environment modes:

Local file system

MapReduce in Hadoop environment

Suitable for large datasets and batch processing


Article link: https://www.haomeiwen.com/subject/kqkojxtx.html