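User research found that a Netflix member typically loses interest after 60 to 90 seconds of browsing, having reviewed 10 to 20 titles on one or two screens. The goal of the recommender system is to help the member find something interesting within those two screens.
Input signals include how each member watches (e.g., the device, time of day, day of week, intensity of watching).
The system is made up of several recommendation algorithms: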
1) Personalized Video Ranker (PVR)
Orders the entire catalog of videos (or subsets selected by genre or other filtering) for each member profile in a personalized way.
Because we use PVR so widely, it must be good at general-purpose relative rankings throughout the entire catalog; this limits how personalized it can actually be.
In other words, PVR has to rank all the videos within a category, and do so for every category, which in practice limits how personalized it can be.
2) Top-N Video Ranker
Finds the best few personalized recommendations in the entire catalog for each member, that is, focusing only on the head of the ranking, a freedom that PVR does not have because it gets used to rank arbitrary subsets of the catalog.
The Top-N ranker only needs to rank the head of the catalog and pick out the top N, so it has more freedom in method than PVR. The two rankers nevertheless share many of the same attributes.
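The difference between the two rankers can be sketched as follows. This is only an illustration: the `scores` dictionary, the `comedies` subset, and both function names are hypothetical, and a real ranker would use a learned model rather than fixed scores.

```python
import heapq


def rank_subset(scores, subset):
    """PVR-style: order every video in an arbitrary subset of the catalog."""
    return sorted(subset, key=lambda video: scores[video], reverse=True)


def top_n(scores, n):
    """Top-N style: only the head of the whole-catalog ranking matters."""
    return heapq.nlargest(n, scores, key=scores.get)


# Hypothetical per-member relevance scores (higher means more relevant).
scores = {"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.4, "E": 0.8}
comedies = {"B", "C", "D"}  # hypothetical genre subset

print(rank_subset(scores, comedies))  # ['C', 'D', 'B']
print(top_n(scores, 2))               # ['A', 'E']
```

The subset ranker must produce a sensible order for any slice of the catalog, while the top-N function never looks past the best few items.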
3) Trending Now
Used to drive the Trending Now row. It works well in two kinds of situations:
- seasonal trends, such as Valentine's Day
- short-term, real-time events, such as a hurricane
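4) Continue Watching
The continue watching ranker sorts the subset of recently viewed titles based on our best estimate of whether the member intends to resume watching or rewatch. The main features are:
- the time elapsed since the title was last watched
- the point of abandonment (middle, beginning, or end)
- the device used
- whether other [related] titles have been watched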
5) Video-Video Similarity
An unpersonalized algorithm that computes a ranked list of videos (the similars) for every video in our catalog. The choice of which Because You Watched (BYW) rows make it onto a homepage is personalized.
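As a rough illustration of an unpersonalized similars table, the sketch below ranks every other video for each video by cosine similarity. The notes do not say how Netflix measures similarity, so the cosine measure, the feature vectors, and the function names are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similars(vectors):
    """For every video, return the other videos ordered by similarity."""
    return {
        vid: sorted(
            (other for other in vectors if other != vid),
            key=lambda other: cosine(vectors[vid], vectors[other]),
            reverse=True,
        )
        for vid in vectors
    }


# Hypothetical feature vectors; not taken from the source.
vectors = {
    "drama_1": [1.0, 0.0, 0.2],
    "drama_2": [0.9, 0.1, 0.3],
    "comedy_1": [0.0, 1.0, 0.1],
}
table = similars(vectors)
print(table["drama_1"])  # ['drama_2', 'comedy_1']
```

The table is the same for every member; personalization only enters later, when deciding which rows built from it appear on a given homepage.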
6) Page Generation: Row Selection and Ranking
Selects and orders rows from a large pool of candidates to create an ordering optimized for relevance and diversity. (How are relevance and diversity evaluated? See the recent blog post Learning a Personalized Homepage.)
7) Evidence
Evidence selection algorithms evaluate all the possible evidence items that we can display for every recommendation, to select the few that we think will be most helpful to the member viewing the recommendation. In other words, they choose and display the reasons behind a recommendation.
For example, they decide whether to show that a certain movie won an Oscar or instead show the member that the movie is similar to another video recently watched by that member.
8) Search
a) Search recommends videos for a given query as alternative results for a failed search.
b) What we know about the searching member’s taste is also especially important for us.
- One algorithm attempts to find the videos that match a given query
- Another algorithm predicts interest in a concept given a partial query
- A third algorithm finds video recommendations for a given concept
-
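Business Value
The effective catalog size (ECS) is a metric that describes how spread viewing is across the items in our catalog. It tells us how many videos are required to account for a typical hour streamed.
ECS is computed as follows:
ECS(p) = 2 (Σ_{i=1}^{N} p_i · i) − 1
where p_i is the share of all hours streamed that came from the i-th most watched video. Note that p_i ≥ p_{i+1} for i = 1, ..., N−1 and the p_i sum to 1.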
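A minimal sketch of the computation, assuming the definition ECS(p) = 2 (Σ p_i · i) − 1 with the shares p_i sorted from most to least watched. The function name and the sample numbers are made up for illustration.

```python
def effective_catalog_size(hours):
    """ECS(p) = 2 * sum_i(p_i * i) - 1, with shares p sorted in decreasing order."""
    total = sum(hours)
    if total <= 0:
        raise ValueError("need a positive total of hours streamed")
    # Convert hours per video into shares and rank them, most watched first.
    shares = sorted((h / total for h in hours), reverse=True)
    return 2 * sum(p * i for i, p in enumerate(shares, start=1)) - 1


print(effective_catalog_size([100, 0, 0, 0]))    # 1.0: one video gets all viewing
print(effective_catalog_size([25, 25, 25, 25]))  # 4.0: viewing is spread evenly
```

The two extremes show the range of the metric: ECS is 1 when a single video accounts for all viewing and N when all N videos are watched equally.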
-
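Metrics
Intuition does not necessarily match online results. For example, for House of Cards, similars that looked more closely related performed worse than a broader set of results.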
we have observed that improving engagement—the time that our members spend viewing Netflix content—is strongly correlated with improving retention.
Statistical significance depends heavily on the number of members in each test cell. For example, if we find that 50% of the members in the test have retained when we compute our retention metric, then we need roughly 2 million members per cell to measure a retention delta of 50.05% − 49.95% = 0.1% with statistical confidence. This type of plot can be used as a guide to choose the sample size for the cells in a test. For example, detecting a retention delta of 0.2% requires the sample size traced by the black line labeled 0.2%, which changes as a function of the average retention rate when the experiment stops, being maximum (south of 500k members per cell) when the retention rate is 50%.
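[Figure: sample size required per test cell as a function of the average retention rate, one curve per detectable retention delta]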
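The quoted numbers can be approximated with a standard two-proportion calculation. The sketch below is an assumption about how such figures might be derived, not the method used in the paper: it uses a normal approximation at 95% two-sided confidence and ignores statistical power, which gives n ≥ 2 z² p (1 − p) / δ² members per cell. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def members_per_cell(retention, delta, confidence=0.95):
    """Members per cell so that a retention difference of `delta` between
    two cells is significant at the given two-sided confidence level.
    Normal approximation; statistical power is ignored (assumption)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance = retention * (1 - retention)  # Bernoulli variance of "retained"
    return math.ceil(2 * z * z * variance / delta ** 2)


print(members_per_cell(0.5, 0.001))  # roughly 1.9 million
print(members_per_cell(0.5, 0.002))  # roughly 480 thousand
```

Under these assumptions the results line up with the "roughly 2 million" and "south of 500k" figures above. The required size peaks at a retention rate of 50% because p (1 − p) is largest there.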
Offline testing speeds up iteration. Offline experiments allow us to iterate quickly on algorithm prototypes, and to prune the candidate variants that we use in actual A/B experiments.
-
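Key Open Problems
1) Better Experimentation Protocols
Better offline and online evaluation metrics are still needed to capture the overall benefit, for example when weighing long-term gains against short-term ones.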
2) Global Algorithms
3) Controlling for Presentation Bias
Introduce randomness into the recommendations.
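One simple way to introduce randomness is to perturb the ranked list occasionally before it is shown. The notes do not say which scheme Netflix uses, so the epsilon-style swapping below, the function name, and its parameters are assumptions for illustration only.

```python
import random


def explore(ranked, epsilon, rng):
    """With probability epsilon per slot, swap in a randomly chosen later item."""
    result = list(ranked)
    for i in range(len(result) - 1):
        if rng.random() < epsilon:
            j = rng.randrange(i + 1, len(result))
            result[i], result[j] = result[j], result[i]
    return result


rng = random.Random(0)  # seeded only to make the sketch reproducible
ranked = ["A", "B", "C", "D", "E"]
shown = explore(ranked, epsilon=0.2, rng=rng)
print(shown)
```

Because some items are shown in positions the ranker would not have chosen, the resulting plays are less confounded with the position in which a title was presented.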
4) Page Construction
It took us a couple of years to find a fully personalized algorithm to construct a page of recommendations that A/B tested better than a page based on a template (itself optimized through years of A/B testing).
5) Member Coldstarting
Today, our member coldstart approach has evolved into a survey given during the sign-up process, during which we ask new members to select videos from an algorithmically populated set that we use as input into all of our algorithms.
6) Choosing the Best Evidence to Support Each Recommendation
Highlight different aspects of a video, such as an actor or director involved in it.
-
Further Reading
Learning a Personalized Homepage
We want our recommendations to be accurate in that they are relevant to the tastes of our members, but they also need to be diverse so that we can address the spectrum of a member’s interests versus only focusing on one. We want to be able to highlight the depth in the catalog we have in those interests and also the breadth we have across other areas to help our members explore and even find new interests. We want our recommendations to be fresh and responsive to the actions a member takes, such as watching a show, adding to their list, or rating; but we also want some stability so that people are familiar with their homepage and can easily find videos they’ve been recommended in the recent past
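With a two-dimensional layout of multiple rows, the horizontal direction (within a row) naturally serves relevance and the vertical direction (across rows) naturally serves diversity.
Factors we consider important: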
- the quality of the videos in the row,
- the amount of diversity on the page
- the affinity of members for specific kinds of rows
- and the quality of the evidence we can surface for each video.
A simple way to add in diversity is to switch from a row-ranking approach to a stage-wise approach using a scoring function that considers both a row as well as its relationship to both the previous rows and the previous videos already chosen for the page. Other approaches to greedily add diversity based on submodular function maximization can also be used.
Diversity can also be additionally incorporated into the scoring model when considering the features of a row compared to the rest of the page by looking at how similar the row is to the rest of the rows or the videos in the row to the videos on the rest of the page.
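The stage-wise idea can be sketched as a greedy loop that scores each candidate row by its relevance minus a penalty for overlap with rows already placed. The row names, relevance scores, tag sets, Jaccard overlap, and the `penalty` weight are all hypothetical; the blog post does not specify the actual scoring function.

```python
def build_page(candidates, similarity, n_rows, penalty=0.5):
    """Stage-wise selection: at each step pick the row whose relevance,
    minus a penalty for similarity to rows already on the page, is highest."""
    page = []
    remaining = dict(candidates)  # row name -> relevance score
    while remaining and len(page) < n_rows:
        def gain(row):
            overlap = max((similarity(row, chosen) for chosen in page), default=0.0)
            return remaining[row] - penalty * overlap
        best = max(remaining, key=gain)
        page.append(best)
        del remaining[best]
    return page


# Hypothetical candidate rows with relevance scores and descriptive tags.
rows = {"Dramas": 0.9, "Political Dramas": 0.85, "Comedies": 0.7}
tags = {
    "Dramas": {"drama"},
    "Political Dramas": {"drama", "politics"},
    "Comedies": {"comedy"},
}


def jaccard(a, b):
    """Overlap between the tag sets of two rows."""
    return len(tags[a] & tags[b]) / len(tags[a] | tags[b])


print(build_page(rows, jaccard, n_rows=2))               # ['Dramas', 'Comedies']
print(build_page(rows, jaccard, n_rows=2, penalty=0.0))  # ['Dramas', 'Political Dramas']
```

With the penalty switched off the page is a plain relevance ranking. With it on, the second slot goes to a less similar row even though its raw relevance is lower.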