美文网首页
Week 1 July 7 ~ July 13

Week 1 July 7 ~ July 13

作者: Beck码字的日常 | 来源:发表于2018-07-07 14:11 被阅读0次

    TODO: Basics + step by step + TinyURL + Pastebin

    Basics:

    A/C: 看Doc 理解基本概念,google + 实际工作经验 消化理解。
    ETA 7号
    实际完成时间:14号


    Why System Design Interviews?

    Points:

    1. high-level design

    2. guide and move the conversation forward

    3. discussion with interviewer --> core

    4. gather all requirements 因为interviewer 不会告诉你

    5. Leading the conversation: candidate leads the discussion to go broad and deep -> take the interviewer with you step by step

    6. Solving by breaking down: top-down and modularization
      6.1 break problems into modules and solve them independently
      6.2 each component -> sub-problem -> algorithm
      6.3 NOTE:
      大多数情况是没有以前6.2这么理想的情况的(得到具体solution)
      真正重要的是:how you make progress on 解决问题和采用哪种strategy

    7. handle bottleneck:
      7.1 each solution is a kind of trade-off
      7.2 talk about these trade-offs and to measure their impact on the system keeping all the constraints and use cases in mind

    8. Steps:
      8.1 Scoping the problem: No assumptions, Ask, 限制,use cases
      8.2 Abstract design: blocks of the system and the relationships between them.
      8.3 Identify and address the bottlenecks by using the fundamental principles of scalable system design.

    9. Know the preference of the interviewer
      9.1 focus on the right things while discussing the problem 心里有大局掌控的同时 要多问面试官是否想要多讨论一下 当前/某个 部分

    Must-do.

    1. 花几分钟跟interviewer搞清楚 the full scope of the system
    2. 在完成High-level design之后,make sure that interviewer is OK with that. then move on to details --> scale

    Super Important:

    1. NEVER assume things that are not stated!!!

    LB:

    1. LB is to distribute load to multiple resources

    2. Where to ADD:
      2.1 user <--> web server
      2.2 web server <--> internal platform layer(e.g. app server or cache server)
      2.3 internal platform layer <--> DB


      image.png
    3. Implement
      3.1 Smart Clients
      In a word: 纯自己develop,可以在各个layer,包括user和db server
      3.2 Hardware Load Balancers
      In a word: 纯硬件
      3.3 Software Load Balancers
      In a word: 用LB的软件,但是有时候需要software + hardware结合(如果如果无法控制user的host的port)


    Cache

    1. 各种Cache
    1.1 Application server cache
    request 到哪个node,哪个node就cache一下
    pros: straightforward
    cons: 如果LB是random分的request,会miss cache
    1.2 Distributed cache
    增加一个node来cache "cache是否available",利用hashing function,每个node上面只存自己需要负责存的cache。当查到自己这儿有相应的cache的时候,(在接到request的node去database那儿拿data前)发一个request告诉接到这个request 的node他这儿有cache
    pros: 解决了application server cache的cons
    cons: 如果有Missing node会很麻烦,即使利用"让多份copy of data到不同的nodes上"来解决这个con也会使整个cache变得很复杂
    1.3 Global Cache
    有两种global cache:

    image.png
    第一种的好处不用说
    第二种的对于有的情况会有好处,比如:(1)cache的file很大 (2)cache的东西是static的,不希望被evict

    1.4 Content Distribution Network (CDN)
    个人理解(To be changed if it's wrong) 本质就是global cache
    2. Cache Invalidation
    三种strategies:
    2.1 Write-through cache
    两边(cache + DB)一起更新
    cons: 特别慢,load大
    2.2 Write-around cache
    只存DB
    cons: 慢, miss cache
    2.3 Write-back cache
    只存cache,定期更新
    cons: 如果cache在DB更新前突然崩了,data可能会丢失
    3. Cache eviction policies
    FIFO, LIFO, LRU, MRU, LFU, RR


    Sharding or Data Partitioning

    -> break up a big database (DB) into many smaller parts

    1. Partition Methods
      1.1 Horizontal partitioning: range-based
      cons: 分布不均
      1.2 Vertical Partitioning: column-based
      cons: 一般都得需要再partition
      1.3 Directory Based Partitioning: mapping

    2. Partitioning Criteria
      2.1 Key or Hash-based partitioning:
      hash -> consistent hashing
      2.2 List partitioning:
      Only contains the content with specific values
      2.3 Round-robin partitioning
      2.4 Composite partitioning

    3. Common Problems of Sharding
      a. Joins and Denormalization:
      problem: 不能join
      solution: denormalization -> con: have to deal with data inconsistency(caused by denormalization)

    b. Referential integrity:
    problem: not support foreign key constraint
    solution: application needs to (1)handle itself (2)clean up dangling references

    c. Rebalancing:
    problem: 有时候各种原因会导致各个shard间unbalance
    solution: Hash-based partitioning with consistent hashing


    Index

    • used to improve the speed of data retrieval operations on the data store
    • It's a data structure

    Proxies

    Used to filter requests or log requests or transform requests
    ?As a cache?
    Collapse the same requests from a system-wide perspective
    Collapse the requests that are 空间相同(spatially close together,如同一个DB)

    在(1)high load的情况下(2)cache 有限的情况下 非常有用


    Queues

    LB有时候也会让request分配不均,产生可以本避免的latency。
    用了Queue, client不需要就在那儿等着某个Server的response。

    好处:

    1. asynchronously run tasks
    2. 可以有更灵活的retry机制, fault tolerance.

    Queue的限制: size of data and the number of outstanding requests


    Redundancy and Replication

    Backup important data or service.

    1. Failover
    2. shared-nothing architecture

    SQL vs. NoSQL

    Differ in (1) the way they were built (2) the kind of info they stored (3) how they store it

    SQL: structured, pre-defined schema(例子:电话簿 姓名-电话-地址)
    NoSQL: unstructured, distributed, dynamic schema(例子:folder 姓名-一切关于这个人的东西,如地址、电话、FB点赞数)

    NoSQL types:

    1. Key-Value Stores:
      Stored in key-value pairs.
      e.g. Redis, Dynamo
    2. Document Databases:
      Stored in documents. Documents are grouped by collections. Each document 可以有完全不同的数据结构。
      e.g. MongoDB
    3. Wide-Column Databases:
      Colum families --> container of rows. 不需要提前了解columns, 每个row 不需要有同样的column.
      常用于large dataset.
      e.g. Cassandra, HBase.
    4. Graph Databases:
      用于存那些关系容易用graph来表示的数据

    High level differences between SQL and NoSQL

    • Storage:
      SQL: row --> entity
      NoSQL: different models
    • Schema
      SQL: fixed schema <-- whole database
      NoSQL: dynamic
    • Querying
      SQL: SQL -> data
      NoSQL: focus on collections, different DB --> different syntax
    • Scalability
      SQL: vertically scalable
      NoSQL: horizontally scalable
    • Reliability or ACID Compliancy
      SQL: Good
      NoSQL: Bad

    SQL VS. NoSQL - Which one to use?

    • When SQL:
      (1) ensure ACID compliance
      (2)data is structured and unchanging
    • When NoSQL:
      (1) data with no or little structure.
      (2) Used in Cloud Computing. Scalable.
      (3) Rapid development.

    CAP Theorem

    三个中最多满足两个。


    CAP

    Consistent Hashing

    如果想不起来了可以参考wiki 或者
    https://www.educative.io/collection/page/5668639101419520/5649050225344512/5709068098338816
    上面的flash


    Long-Polling vs WebSockets vs Server-Sent Events

    Ajax Polling
    client 发request --> server 收到然后处理发response(不管有没有能回的都要回,所以会产生empty response)
    一般都是固定频率的发request
    cons: too many requests
    Long polling
    client 发一个request,但是NOT expect 马上回复 --> server收到request后,每当有东西需要回的时候再回 --> client收到response再马上发一个request

    WebSocket
    persistent connection between client and server
    双方想什么时候发data就什么时候发

    Server-Sent Events
    只server想什么时候给client发就什么时候发
    如果client想给server发data需要用别的protocol

    相关文章

      网友评论

          本文标题:Week 1 July 7 ~ July 13

          本文链接:https://www.haomeiwen.com/subject/zkuwuftx.html