Big Data

Author: 虎耳 | Published: 2016-02-22 12:11 | Read 16 times


- [BigData](#bigdata)
    - [Glossary](#glossary)
    - [Hadoop Ecosystem](#hadoop-ecosystem)

BigData

Glossary

  • HDFS - Hadoop Distributed File System

The storage layer: a distributed, scalable, Java-based file system for large volumes of unstructured data.

  • MapReduce

The compute layer: a software framework in which each job is expressed as a Map function and a Reduce function.
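To make the Map function and Reduce function concrete, here is a minimal single-process sketch of a word-count job in plain Python. It does not use the Hadoop API; the names `map_fn`, `reduce_fn`, and `run_job` are made up for illustration, and the in-memory grouping step stands in for the shuffle that a real cluster performs between the two phases.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: combine all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Run one job: map every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # stand-in for the shuffle phase
    return dict(reduce_fn(key, values) for key, values in groups.items())


word_counts = run_job(["big data", "big cluster"])
print(word_counts)
```

On a real cluster the map calls run in parallel on different nodes, and each reducer receives only the keys assigned to it.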

  • Hive

A Hadoop-based data warehousing framework. HiveQL, its SQL-like language,
is converted to MapReduce jobs to query data in Hadoop.
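As a rough illustration of how a SQL-like query can be expressed as MapReduce, the sketch below hand-translates one `GROUP BY` query into a map step and a reduce step in plain Python. This is not how Hive actually compiles queries; the `EMPLOYEES` rows, the table name in the comment, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical table rows; in Hive these would be files stored in Hadoop.
EMPLOYEES = [
    {"name": "ann", "dept": "ops"},
    {"name": "bob", "dept": "dev"},
    {"name": "cy", "dept": "dev"},
]

# Query being illustrated:
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept


def map_row(row):
    """Map phase: the GROUP BY column becomes the key, each row counts as 1."""
    return row["dept"], 1


def reduce_group(dept, ones):
    """Reduce phase: COUNT(*) is the sum of the ones collected for a key."""
    return dept, sum(ones)


def group_by_count(rows):
    """Map every row, group the values by key, then reduce each group."""
    shuffled = defaultdict(list)
    for row in rows:
        key, value = map_row(row)
        shuffled[key].append(value)
    return dict(reduce_group(key, values) for key, values in shuffled.items())


result = group_by_count(EMPLOYEES)
print(result)
```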

  • Pig

A Hadoop-based language (Pig Latin) for building data pipelines.

  • HBase

A non-relational database, an open-source implementation of Google's BigTable. A column-oriented
database that provides lookups in Hadoop and adds transactional capability on top of Hadoop.
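The column-oriented lookup model can be pictured as a nested map: row key, then column family and qualifier, then timestamped versions of a value. The sketch below is an in-memory toy in plain Python and not the HBase client API; `ToyTable` and its methods are hypothetical names.

```python
import time


class ToyTable:
    """In-memory stand-in for a wide-column table (not the HBase API)."""

    def __init__(self, families):
        self.families = set(families)
        # row key -> {(family, qualifier): [(timestamp, value), ...]}
        self.rows = {}

    def put(self, row_key, family, qualifier, value, ts=None):
        """Store a new version of one cell; families must be declared up front."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time() if ts is None else ts
        cells = self.rows.setdefault(row_key, {})
        cells.setdefault((family, qualifier), []).append((ts, value))

    def get(self, row_key, family, qualifier):
        """Return the newest value for one cell, or None if it is absent."""
        versions = self.rows.get(row_key, {}).get((family, qualifier))
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]


table = ToyTable(families=["info"])
table.put("user#1", "info", "city", "Oslo", ts=1)
table.put("user#1", "info", "city", "Bergen", ts=2)
latest = table.get("user#1", "info", "city")
print(latest)
```

Lookups go through the row key, which is why row key design matters so much in this kind of store.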

  • Flume

A framework for populating Hadoop with data.
It can be used to collect logs. Its stages are agent (file, syslog), collector, and storage (file, HDFS).
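The agent, collector, storage flow can be sketched as three chained stages. The code below is a plain-Python toy and not Flume itself: the function names, the `batch_size` parameter, and the in-memory lists standing in for a log file and for HDFS are all assumptions made for illustration.

```python
def agent(source_lines):
    """Agent: read raw events from a source (a list standing in for a file or syslog)."""
    for line in source_lines:
        line = line.strip()
        if line:  # skip empty lines
            yield {"body": line}


def collector(events, batch_size=2):
    """Collector: gather events coming from agents into batches."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def store(batches, sink):
    """Storage: append each batch to the sink (a list standing in for a file or HDFS)."""
    for batch in batches:
        sink.extend(event["body"] for event in batch)
    return sink


log_lines = ["GET /\n", "\n", "POST /login\n", "GET /about\n"]
stored = store(collector(agent(log_lines)), [])
print(stored)
```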

  • Oozie

A workflow processing system that supports multiple languages. Similar to Aether.

  • Ambari

A web-based tool to deploy, manage, and monitor a Hadoop cluster.

  • Avro

An RPC and data serialization framework.
There is no need to run code generation when the schema changes.
Similar to Thrift and Protocol Buffers.

  • Mahout

A data mining library that implements modelling using the MapReduce model.

  • Sqoop

A connectivity tool that moves data from non-Hadoop data stores into Hadoop.

  • HCatalog
  • BigTop
  • Zookeeper

Provides a distributed configuration service, a synchronization service, and a
naming registry.

  • Storm
  • Kafka
  • Spark
  • Mesos

Abstracts compute resources (CPU, memory, storage) away from machines (physical or virtual).

  • Docker
  • Kubernetes
  • ElasticSearch
  • Jenkins

Hadoop Ecosystem

