(GeekBand) System Design and Practice: Case Studies

Author: Linary_L | Published: 2017-01-10 12:58

Case Studies

News Feeds
Stats Server
Web Crawler
Amazon Product Page

News Feed

Define feed

Organize

aggregate (group by category)
dedup (remove duplicates)
sort (order the items)
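
The three organizing steps can be sketched in a few lines. This is only an illustration: the item fields `id`, `category`, and `timestamp` are assumed names, not part of the original notes.

```python
from collections import defaultdict

def organize(items):
    """Dedup, aggregate by category, then sort each group newest first.

    `items` is an iterable of dicts with the hypothetical keys
    "id", "category", and "timestamp".
    """
    seen_ids = set()
    groups = defaultdict(list)
    for item in items:
        if item["id"] in seen_ids:      # dedup: skip items already seen
            continue
        seen_ids.add(item["id"])
        groups[item["category"]].append(item)   # aggregate by category
    for group in groups.values():       # sort: newest first within a group
        group.sort(key=lambda it: it["timestamp"], reverse=True)
    return dict(groups)
```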

Level 1.0

Database Schema:

User
Friendship
News

GetNewsfeed:

merge news
Newsfeed vs News

Why is this bad?

With 100+ friends, every newsfeed request costs:

1 query to get the friend list

1 query to fetch the news:

SELECT * FROM news

WHERE timestamp > xxx
AND sourceid IN (friend list)
LIMIT 1000

IN is slow

The database does either a sequential scan or 100+ index lookups
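
A minimal sketch of the Level 1.0 read path, using Python's built-in `sqlite3` with an in-memory database. The table and column names (`friendship.user_id`, `friendship.friend_id`, `news.source_id`, and so on) are assumptions made for illustration, as is the `ORDER BY`; the notes only name the three tables.

```python
import sqlite3

# In-memory database with an assumed minimal schema for the three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friendship (user_id INTEGER, friend_id INTEGER);
CREATE TABLE news (
    id INTEGER PRIMARY KEY,
    source_id INTEGER,
    timestamp INTEGER,
    content TEXT
);
""")

def get_newsfeed(conn, user_id, since, limit=1000):
    # Query 1: get the friend list.
    friend_ids = [row[0] for row in conn.execute(
        "SELECT friend_id FROM friendship WHERE user_id = ?", (user_id,))]
    if not friend_ids:
        return []
    # Query 2: one big IN (...) over every friend. This is the slow part.
    placeholders = ",".join("?" * len(friend_ids))
    sql = (
        "SELECT source_id, timestamp, content FROM news "
        f"WHERE timestamp > ? AND source_id IN ({placeholders}) "
        "ORDER BY timestamp DESC LIMIT ?"
    )
    return conn.execute(sql, (since, *friend_ids, limit)).fetchall()
```

The cost of the second query grows with the size of the friend list, which is the problem Level 2.0 addresses.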

Level 2.0

Pull vs Push

Pull: get news from each friend and merge it together (the newsfeed is generated when the user requests it).

Push: the newsfeed is generated when the news is created (a separate table stores each user's newsfeed, so the same news item may be stored many times).

Push:

1 query to select the latest 1000 newsfeed entries.
100+ insert queries per post (async)

Disadvantage: news delay.
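
A rough sketch of the push model, with in-memory structures standing in for the newsfeed table and the async job queue. All names here are illustrative assumptions. It also shows where the delay comes from: a post is invisible to followers until the background worker has drained the queue.

```python
from collections import defaultdict, deque

followers = defaultdict(set)   # author_id -> ids of users who see the author's news
newsfeed = defaultdict(list)   # stand-in for the newsfeed table: user_id -> [(timestamp, news_id)]
pending = deque()              # stand-in for the async job queue

def post_news(author_id, news_id, timestamp):
    """Enqueue one newsfeed insert per follower and return immediately."""
    for follower_id in followers[author_id]:
        pending.append((follower_id, timestamp, news_id))

def run_fanout_worker():
    """Drain the queue; in a real system this would run in background workers."""
    while pending:
        follower_id, timestamp, news_id = pending.popleft()
        newsfeed[follower_id].append((timestamp, news_id))

def get_newsfeed(user_id, limit=1000):
    """A single read against the precomputed feed, newest first."""
    return sorted(newsfeed[user_id], reverse=True)[:limit]
```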

Level 3.0

Popular star (Justin Bieber)

Followers: 13M+

Async push may take over 30 minutes (13M+ insertions, so the delay is too long)

Push + Pull

For a popular star, don't push news to followers

For every newsfeed request, merge the pushed newsfeed from non-popular users with the news pulled from popular users
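
One way the push + pull mix could look, as a sketch. The class name, the `popular_threshold` parameter, and the in-memory dictionaries are assumptions for illustration; the notes do not say how "popular" is decided.

```python
from collections import defaultdict

class HybridFeed:
    def __init__(self, popular_threshold):
        self.popular_threshold = popular_threshold  # assumed cut-off for "popular"
        self.followers = defaultdict(set)   # author -> users following the author
        self.following = defaultdict(set)   # user -> authors the user follows
        self.own_news = defaultdict(list)   # author -> [(timestamp, news_id)]
        self.pushed = defaultdict(list)     # user -> pushed [(timestamp, news_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_popular(self, author):
        return len(self.followers[author]) >= self.popular_threshold

    def post(self, author, news_id, timestamp):
        self.own_news[author].append((timestamp, news_id))
        if not self.is_popular(author):     # push only for non-popular authors
            for user in self.followers[author]:
                self.pushed[user].append((timestamp, news_id))

    def get_newsfeed(self, user, limit=1000):
        # A set removes duplicates if an author became popular after
        # some of their posts had already been pushed.
        items = set(self.pushed[user])
        for author in self.following[user]:
            if self.is_popular(author):     # pull for popular authors
                items.update(self.own_news[author])
        return sorted(items, reverse=True)[:limit]
```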

Level 4.0

Push disadvantages

Realtime (delivery is delayed)
Storage (duplicates)
Edit (an edited item exists in many copies)

Go back to PULL:

Cache each user's latest (14 days) news
Broadcast requests to multiple servers (sharded by userId).
Merge & sort the newsfeed
Cache the newsfeed for this user with a timestamp
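
The four pull steps above might be sketched as follows. The shard count, the modulo sharding rule, and the `ttl` used to decide when a cached feed is stale are all assumptions for illustration; the notes only say that the feed is cached together with a timestamp.

```python
import heapq
from collections import defaultdict
from itertools import islice

NUM_SHARDS = 4                        # assumed shard count
WINDOW_SECONDS = 14 * 24 * 3600       # keep the latest 14 days of news

class Shard:
    """One server holding the recent news of the users assigned to it."""
    def __init__(self):
        self.news_by_user = defaultdict(list)   # user_id -> [(timestamp, news_id)]

    def add(self, user_id, timestamp, news_id):
        self.news_by_user[user_id].append((timestamp, news_id))

    def recent(self, user_ids, now):
        cutoff = now - WINDOW_SECONDS
        items = [it for uid in user_ids
                 for it in self.news_by_user.get(uid, [])
                 if it[0] >= cutoff]
        return sorted(items, reverse=True)      # newest first

shards = [Shard() for _ in range(NUM_SHARDS)]
feed_cache = {}                       # user_id -> (built_at, feed)

def shard_for(user_id):
    return shards[user_id % NUM_SHARDS]         # shard by user id

def get_newsfeed(user_id, friend_ids, now, limit=1000, ttl=60):
    cached = feed_cache.get(user_id)
    if cached and now - cached[0] < ttl:        # cached feed is still fresh
        return cached[1]
    by_shard = defaultdict(list)
    for fid in friend_ids:                      # one request per shard involved
        by_shard[fid % NUM_SHARDS].append(fid)
    partials = [shards[i].recent(ids, now) for i, ids in by_shard.items()]
    # Each partial list is already newest first, so a k-way merge suffices.
    merged = list(islice(heapq.merge(*partials, reverse=True), limit))
    feed_cache[user_id] = (now, merged)         # cache with its timestamp
    return merged
```

Because each shard returns its results already sorted, the merge step costs a k-way merge rather than a full re-sort.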

Click Stats Server

How are click stats stored?

A poor candidate will suggest writing back to a data store on every click.

A good candidate will suggest some form of aggregation tier that accepts clickstream data, aggregates it, and periodically writes it back to a persistent data store.

A great candidate will suggest a low-latency messaging system to buffer the click data and transfer it to the aggregation tier.

If stats are needed daily, storing the data in HDFS and running MapReduce jobs to compute the stats is a reasonable approach.

If stats are needed in near real time, the aggregation logic itself should compute them.

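PS: How do you count mouse clicks and the regions that were clicked? An ordinary programmer stores every click record (log) directly in the database layer. A better programmer adds a middle tier between the front end and the database that aggregates the click stream and flushes it to the database at intervals (every 1 or 10 minutes), which greatly reduces the load on the back end. An excellent programmer combines both ideas: when the data volume is large and the real-time requirement is low, use distributed batch processing and set the aggregation tier's flush interval to one day; when freshness matters, shorten the flush interval accordingly.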
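
A minimal sketch of such an aggregation tier: clicks are counted in memory and written back to the store only when the flush interval has elapsed. The class name, the dict used as a stand-in for the persistent store, and the explicit `now` argument are assumptions for illustration.

```python
from collections import Counter

class ClickAggregator:
    def __init__(self, store, flush_interval=60.0):
        self.store = store                  # dict standing in for the persistent store
        self.flush_interval = flush_interval
        self.pending = Counter()            # clicks not yet written back
        self.last_flush = 0.0

    def record(self, key, now):
        """Count one click; write back only when the interval has elapsed."""
        self.pending[key] += 1
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        """One write per distinct key instead of one write per click."""
        for key, count in self.pending.items():
            self.store[key] = self.store.get(key, 0) + count
        self.pending.clear()
        self.last_flush = now
```

Lengthening `flush_interval` trades freshness for fewer writes, which matches the daily versus near-real-time distinction above.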

Cache Requirement

When a request comes in, look it up in the cache; on a hit, return the response from the cache and do not pass the request to the system.
If the request is not found in the cache, pass it on to the system.
Since the cache can only store the last n requests, insert the (n+1)th request into the cache and delete one of the older requests.
Design the cache so that all operations (lookup, delete, and insert) run in O(1).

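PS: How to design the cache (related to LRU design):

Cache the results for some requests in this tier; if a received request already has a cached result, there is no need to send it to the back-end system
If no cached result is found in the tier, send the request to the back end
Cache in the form of a fixed-length queue that holds the most recent n requests, in at the head and out at the tail
Keep the tier's lookup operation within O(1)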
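
One common way to meet the O(1) requirement is an LRU cache built on Python's `collections.OrderedDict`, which combines a hash map with a doubly linked list. This is a sketch of that technique under assumed names (`LRUCache`, `handle`, `backend`), not necessarily the design the course had in mind.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = OrderedDict()        # oldest entry first, newest last

    def get(self, request):
        """O(1) lookup; a hit marks the entry as most recently used."""
        if request not in self.entries:
            return None
        self.entries.move_to_end(request)
        return self.entries[request]

    def put(self, request, response):
        """O(1) insert; evicts the least recently used entry when full."""
        if request in self.entries:
            self.entries.move_to_end(request)
        self.entries[request] = response
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

def handle(cache, request, backend):
    """Serve from the cache on a hit; otherwise call the backend and cache it."""
    cached = cache.get(request)
    if cached is not None:
        return cached
    response = backend(request)
    cache.put(request, response)
    return response
```

Note that `handle` treats a cached `None` as a miss, so this sketch assumes the backend never returns `None`.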

Web Crawler

Web crawler (spider)

Amazon Product Page

The product page includes information such as

product information
user information
recommended products (what other customers buy after viewing this item, recommendations for you based on this product, etc.)

Reference

http://highscalability.com
The Log: What every software engineer should know about real-time data's unifying abstraction
Job Interviews: How should I prepare system design questions for Google/Facebook Interview?
HOW TO ACE A SYSTEMS DESIGN INTERVIEW
<Design Pattern>
<Design_Patterns_For_Dummies.pdf>
http://www.hiredintech.com/app
