How Not to Sort by Average Rating
Wrong ways to score ratings
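- Score = (Positive ratings) − (Negative ratings)
- item 1: 1000 ratings (600 positive ratings, 400 negative ratings), 60% positive, score = 200
- item 2: 10000 ratings (5500 positive ratings, 4500 negative ratings), 55% positive, score = 1000
This ranks item 2 above item 1, even though item 1 has the higher share of positive ratings: item 2 should not be ranked above item 1.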
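- Score = Average rating = (Positive ratings) / (Total ratings)
- item 1: 1 positive, 0 negative (average = 100%)
- item 2: 100 positive, 1 negative (average ≈ 99%)
This ranks item 1 above item 2, but item 1 should not be ranked above item 2: a single rating says far less than 101 ratings.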
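A quick check of the arithmetic behind both wrong orderings (the function names are illustrative, not from the source): the difference score prefers item 2 in the first example even though its positive share is lower, and the average prefers the single-rating item in the second.

```python
def score_difference(pos: int, neg: int) -> int:
    """Wrong way #1: positive ratings minus negative ratings."""
    return pos - neg


def score_average(pos: int, neg: int) -> float:
    """Wrong way #2: fraction of ratings that are positive."""
    return pos / (pos + neg)


# Example 1: item 1 (600 pos / 400 neg) vs item 2 (5500 pos / 4500 neg)
print(score_difference(600, 400), score_difference(5500, 4500))  # 200 1000
print(score_average(600, 400), score_average(5500, 4500))        # 0.6 0.55

# Example 2: item 1 (1 pos / 0 neg) vs item 2 (100 pos / 1 neg)
print(score_average(1, 0), score_average(100, 1))  # 1.0 vs about 0.99
```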
The correct formula for scoring ratings
Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter
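Given the ratings observed, there is a 95% chance that the true fraction of positive ratings is at least what?
(pos: number of positive ratings; n: total number of ratings; confidence: confidence level, e.g. 0.95)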
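For reference, the standard lower bound of the Wilson score interval written with these variables is given below. The symbols \hat{p} (observed positive fraction) and \Phi^{-1} (inverse CDF of the standard normal) are introduced here for readability; taking z two-sided gives z ≈ 1.96 for confidence = 0.95.

```latex
\hat{p} = \frac{\text{pos}}{n}, \qquad
z = \Phi^{-1}\!\left(1 - \frac{1 - \text{confidence}}{2}\right)

\text{Score} = \frac{\hat{p} + \dfrac{z^{2}}{2n} - z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^{2}}{4n^{2}}}}{1 + \dfrac{z^{2}}{n}}
```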
(Figure: R implementation)
(Figure: SQL implementation)
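As a Python counterpart to the R and SQL implementations, here is a minimal sketch of the Wilson lower bound. The name `wilson_lower_bound` and the choice to return 0 for an item with no ratings are assumptions; `statistics.NormalDist` (Python 3.8+) supplies the z-value for the chosen confidence.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower_bound(pos: int, n: int, confidence: float = 0.95) -> float:
    """Lower bound of the Wilson score interval for the positive fraction."""
    if n == 0:
        return 0.0  # assumption: unrated items sink to the bottom
    # Two-sided z-value: about 1.96 when confidence = 0.95
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = pos / n  # observed fraction of positive ratings
    numerator = (phat + z * z / (2 * n)
                 - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
    return numerator / (1 + z * z / n)


# (label, positive, total) for the items in the two wrong-way examples
examples = [("600 of 1000", 600, 1000), ("5500 of 10000", 5500, 10000),
            ("1 of 1", 1, 1), ("100 of 101", 100, 101)]
ranked = sorted(examples, key=lambda e: wilson_lower_bound(e[1], e[2]),
                reverse=True)
for label, pos, n in ranked:
    print(f"{label}: {wilson_lower_bound(pos, n):.3f}")
```

Ranked by this score, 600 of 1000 comes before 5500 of 10000, and 100 of 101 comes before 1 of 1, which is the order both wrong-way examples call for.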
Use cases (not limited to sorting)
- Spam detection: What percentage of people who see this item will mark it as spam?
- Creating a "best of" list: What percentage of people who see this item will mark it as "best of"?
- "Most emailed" list: What percentage of people who see this page will click "Email"?
How the Hacker News ranking algorithm works
The score decreases as time passes, and it decreases faster when gravity is increased.
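This decay matches the formula commonly quoted for Hacker News ranking, where P is the story's points (votes), T its age in hours and G the gravity (1.8 by default in that description); P − 1 discounts the submitter's own vote. The denominator grows with T, so the score falls over time, and a larger G makes it grow faster.

```latex
\text{Score} = \frac{P - 1}{(T + 2)^{G}}
```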
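(Figure: Python implementation)
For old stories, time has less and less effect (the curve flattens out), so ranking depends mainly on votes.
For new stories, time and votes both matter.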
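A minimal Python sketch, assuming the commonly quoted form Score = (P − 1) / (T + 2)^G with P the story's points, T its age in hours and gravity G = 1.8; `hn_score` is a hypothetical name, and this covers only the formula, not any other ranking logic the site may apply.

```python
def hn_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Score = (P - 1) / (T + 2) ** G, the commonly quoted Hacker News form."""
    # points - 1 discounts the submitter's own vote;
    # the + 2 also avoids dividing by zero at age 0
    return (points - 1) / (age_hours + 2) ** gravity


# The score falls with age, and falls faster when gravity is larger
for g in (1.5, 1.8, 2.0):
    print(g, [round(hn_score(100, t, g), 2) for t in (0, 6, 24)])

# One extra hour costs a new story far more score than an old one:
# the curve flattens for old stories
print("new:", hn_score(100, 0) - hn_score(100, 1))
print("old:", hn_score(100, 48) - hn_score(100, 49))
```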
Reader comments