AWS

Author: klory | Published 2017-09-20 06:29

    IaaS (Infrastructure as a Service; e.g. AWS, Azure, Google Cloud): virtualized infrastructure services. This model is aimed at large deployments involving many different kinds of servers.

    Unlike DigitalOcean and Linode (VPS, i.e. virtual private server), which are better suited to hosting WordPress or other small single-server websites.

    Services

    • CDN (CloudFront)
      Content delivery network; serves the website from the location closest to the user.
    • Glacier
      Stores data that is accessed infrequently (cold/archival storage)
    • Storage (S3)
      Stores data that is used frequently
    • Virtual Server (EC2)
    • Lambda
      Pure compute without worrying about servers.
    • Database

    Benefits

    • Scalable (just spend more money)
    • Total Cost of Ownership is low: running your own hardware means hiring people to deal with different servers and facilities such as power and cooling.
    • Highly reliable for price point
    • Centralized Billing and Management

    Problems

    • vendor lock-in
    • learning curve
    • costs add up

    Pricing

    • compute
    • storage
    • bandwidth
    • interaction

    Normal File system

    • Linux default disk block size = 4 KB; if a file is smaller than a block, the rest of the block is wasted
    • GFS <-> HDFS (HDFS is the open-source counterpart of Google's GFS)
    • MapReduce <-> Hadoop (Hadoop implements Google's MapReduce model)
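
To make the wasted-space point concrete, here is a minimal shell sketch of the arithmetic (the 1000-byte file size is just an illustrative value, not from the notes):

```shell
#!/bin/sh
# Internal fragmentation: a file always occupies whole disk blocks,
# so the unused tail of its last block is wasted.
BLOCK=4096   # Linux default block size (4 KB)
SIZE=1000    # illustrative file size in bytes

# blocks allocated = ceil(SIZE / BLOCK), via integer arithmetic
blocks=$(( (SIZE + BLOCK - 1) / BLOCK ))
wasted=$(( blocks * BLOCK - SIZE ))

echo "blocks allocated: $blocks"               # 1
echo "bytes wasted in the last block: $wasted" # 3096
```

A 1000-byte file still occupies a full 4 KB block, so roughly three quarters of that block is lost; with millions of small files this overhead adds up.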

    HDFS

    • A file system specially designed for storing big data with a streaming access pattern (write once, read many times)
    • default block size = 64 MB; if a file is smaller than a block, the rest of the block is NOT wasted

    Hadoop

    daemons

    • master daemons: name node, secondary name node, job tracker
    • slave daemons: data node, task tracker

    example - theory

    • we (the client) have 200 MB of data, so we need 4 blocks (3 full 64 MB blocks plus one 8 MB block)
    • we need 1 name node (nn) and several data nodes (dn), e.g. 8 data nodes
    • the nn creates the metadata and starts the daemons
    • the nn passes the metadata back to the client; the client then distributes the blocks to the data nodes and makes replications based on the info from the name node
    • each data node sends heartbeats back to the nn to signal that it is alive
    • the client sends the code to the data nodes
    • the job tracker tells the task trackers to do their jobs
    • after the map tasks are finished, the job tracker assigns a reducer
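
The arithmetic in the walkthrough above can be sketched in shell; note that the replication factor of 3 is HDFS's default and an assumption here, since the notes do not state one:

```shell
#!/bin/sh
# A 200 MB file split into 64 MB HDFS blocks:
# 3 full blocks plus one 8 MB remainder block = 4 blocks.
DATA_MB=200
BLOCK_MB=64
REPLICATION=3   # HDFS default replication factor (assumption)

blocks=$(( (DATA_MB + BLOCK_MB - 1) / BLOCK_MB ))   # ceiling division
copies=$(( blocks * REPLICATION ))

echo "blocks needed: $blocks"              # 4
echo "block copies cluster-wide: $copies"  # 12, spread across the data nodes
```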

    example - real world

    • split the data (documents) into input splits and pass them to Record Readers,
      which then send records to the mappers (the default for text jobs is to split a document into lines and send the lines to the mappers)
    • then shuffle the data so that pairs with the same key end up together; the default shuffle (sort) in Hadoop orders keys alphabetically
    • then reduce (each reducer reduces one key)
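
The map, shuffle, reduce flow above can be mimicked with plain coreutils; this is only an analogy, not Hadoop itself: `tr` plays the mapper (emit one word per line), `sort` plays the alphabetical shuffle, and `uniq -c` plays the reducer that counts each key.

```shell
#!/bin/sh
# Word count as a pipeline, mirroring map -> shuffle -> reduce.
printf 'the quick fox\nthe lazy dog\n' |
  tr -s ' ' '\n' |   # map: emit one (word) record per line
  sort |             # shuffle: sorting brings identical keys together
  uniq -c            # reduce: count the records for each key ("the" -> 2)
```

Because `sort` groups identical keys onto adjacent lines, `uniq -c` only ever looks at one key at a time, just as a Hadoop reducer receives all values for a single key together.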

    HDFS instructions

    • step 1, basic commands:
      hdfs dfs -ls /, hdfs dfs -mkdir, hdfs dfs -put, hdfs dfs -get
    • step 2, move a file to HDFS:
      hdfs dfs -put input.txt /user/class/
    • step 3, compile:
      javac -cp $HADOOP_core.jar *.java
    • step 4, package the classes into a jar:
      jar cvf test.jar *.class
    • step 5, run the job:
      hadoop jar wordcount.jar ...WordCount

    Setup

    Set up your AWS account by following the steps below:

    1. Go to AWS (https://aws.amazon.com/) and create an account. You need to enter your credit card info.
    2. You can find your AWS account number in your AWS profile. Use that account number to apply for AWS Educate credits at https://aws.amazon.com/education/awseducate/apply/ It will take a few hours before you receive an email confirming your credits are active.

    If you have not received your AWS Educate credits and are not using free-tier services, your credit card will be charged for usage, and you will be responsible for any costs incurred.


    Original post: https://www.haomeiwen.com/subject/gdrnsxtx.html