A Collection of Data Science Tak

Author: Echo_1cc5 | Published 2019-01-21 11:17

1⃣️. How would you improve engagement on FB? 【How do you increase X on site Y?】

1. Define the metric

- Base it on the company's mission.

- Make it measurable (e.g., the proportion of users who take at least one action (like, post, upload) per day; the % of questions that get at least one response within a day; the % of responses with >= 3 upvotes within the first day).

2. Pick the variables

- User characteristics: sex, age, country, # of friends

- Behavior: device, acquisition source (ads / SEO (search engine optimization) / direct link), session time

3. Pick a model

- Random forest: usually high accuracy; works well in high dimensions and copes with categorical variables and outliers

- Get model insights from partial dependence and variable importance plots

4. Come up with one good and one bad scenario (realistic segments)

- E.g., users from Argentina are not very engaged; Indian users under 30 are very engaged.

5. Define next steps

- Check the Spanish translation; would a more localized version help?

- Reach more young Indian users via ads or other marketing campaigns.
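Steps 1 and 4 above can be sketched in a few lines: compute the metric (share of active users with at least one engaging action per day), then compare engagement across segments to find a good and a bad one. This is a minimal sketch under assumptions: the event-log layout `(user_id, day, action)`, the segment definition `(country, under_30)` and all the sample rows are hypothetical, not a real FB schema.

```python
from collections import defaultdict

# Hypothetical event log rows: (user_id, day, action). The schema and the set of
# "engaging" actions are illustrative assumptions, not a real FB data model.
ENGAGING_ACTIONS = {"like", "post", "upload"}


def daily_engagement_rate(events, active_users, day):
    """Share of active users with at least one engaging action on `day`."""
    engaged = {
        user for user, d, action in events
        if d == day and action in ENGAGING_ACTIONS
    }
    return len(engaged & active_users) / len(active_users)


def engagement_by_segment(users, engaged_ids):
    """Engagement rate per (country, under_30) segment."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for user_id, country, age in users:
        segment = (country, age < 30)
        totals[segment] += 1
        hits[segment] += user_id in engaged_ids  # bool counts as 0 or 1
    return {seg: hits[seg] / totals[seg] for seg in totals}


events = [(1, 1, "like"), (2, 1, "view"), (3, 1, "post"), (4, 0, "upload")]
active = {1, 2, 3, 4}
rate = daily_engagement_rate(events, active, day=1)
print(rate)  # 0.5: users 1 and 3 took an engaging action on day 1

users = [(1, "IN", 25), (2, "AR", 40), (3, "IN", 22), (4, "AR", 35)]
segments = engagement_by_segment(users, engaged_ids={1, 3})
print(segments)  # {('IN', True): 1.0, ('AR', False): 0.0}
```

In practice the segment table would come from the model insights in step 3 (variable importance and partial dependence), rather than from a hand-picked grouping like this one.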

2⃣️【Can't split randomly】Outline a testing strategy to see whether the new app is better.

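***It's difficult to design an A/B test in marketplaces or social networks, because users are connected.

1. You can't randomly split users, because treated users interact with control users and that contaminates the control group.

2. Test by market: pair up comparable markets (markets whose main metrics are expected to be similar), then put one market of each pair in treatment and the other in control.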

3. Choose the sample size

    - Precision based: 

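    - Power based: $n = f(\alpha, \beta)\,\frac{2s^2}{\delta^2}$ per group, where $\alpha$ is the significance level, $1-\beta$ is the power, $s$ is the SD, $\delta$ is the smallest difference worth detecting, and $f(\alpha, \beta) = (z_{1-\alpha/2} + z_{1-\beta})^2$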

4. Check if the result is significant

[Bonus] Check for a novelty effect (wait a couple of weeks and see whether the improvement persists).
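The market-pairing step and the power-based sample-size formula above can be sketched as two small helpers. This is a minimal sketch, assuming a two-sided test with the normal approximation $f(\alpha, \beta) = (z_{1-\alpha/2} + z_{1-\beta})^2$; the market names, baseline values and the "sort then pair neighbours" matching rule are illustrative assumptions, not a prescribed method.

```python
import math
import random
from statistics import NormalDist


def pair_markets(baseline, seed=0):
    """Pair markets with similar baselines, then randomize within each pair.

    `baseline` maps market -> baseline value of the main metric. Returns a list
    of (treatment, control) tuples; with an odd count, the last market is left out.
    """
    rng = random.Random(seed)
    ordered = sorted(baseline, key=baseline.get)  # adjacent markets are most alike
    pairs = []
    for a, b in zip(ordered[0::2], ordered[1::2]):
        if rng.random() < 0.5:  # coin flip decides which one gets the new app
            a, b = b, a
        pairs.append((a, b))
    return pairs


def sample_size_per_group(alpha, power, sd, min_diff):
    """Power-based n = f(alpha, beta) * 2 s^2 / delta^2 for a two-sided test."""
    z = NormalDist().inv_cdf
    f = (z(1 - alpha / 2) + z(power)) ** 2  # power = 1 - beta
    return math.ceil(f * 2 * sd ** 2 / min_diff ** 2)


markets = {"market_A": 0.31, "market_B": 0.30, "market_C": 0.52, "market_D": 0.50}
print(pair_markets(markets))
print(sample_size_per_group(alpha=0.05, power=0.8, sd=1.0, min_diff=0.2))  # 393
```

Randomizing inside each matched pair keeps the comparison fair even though the number of markets is small; when markets are the unit of randomization, the sample size should be read as a number of comparable units, which is usually far fewer than a user-level test would get.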

3⃣️The A/B test gives a significant p-value, but you choose not to make the change. Why?

1. Human labor cost: 1) engineering, 2) PM, 3) customer service, 4) opportunity cost

2. Risk of bugs

3. Future maintenance cost

4. Inferential statistics: with a large enough sample, even a tiny difference gives a significant p-value, so check the effect size as well (Cohen's $d = (\bar{x} - \bar{x}_0)/s$, or Pearson's $r$)

5. Possibly a novelty effect
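Point 4 can be made concrete: with enough users, a practically negligible lift is still "significant". This sketch computes Cohen's d and a two-sided two-sample z-test p-value from summary statistics, assuming equal SDs and equal group sizes; the means, SD and sample size are made-up illustrative numbers.

```python
import math
from statistics import NormalDist


def cohens_d(mean_treat, mean_ctrl, pooled_sd):
    """Cohen's d = (x - x0) / SD."""
    return (mean_treat - mean_ctrl) / pooled_sd


def two_sample_z_p_value(mean_treat, mean_ctrl, sd, n_per_group):
    """Two-sided z-test p-value for equal means (equal SDs and group sizes)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = (mean_treat - mean_ctrl) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative numbers: a 0.01-SD lift measured on 1,000,000 users per group.
d = cohens_d(10.01, 10.00, 1.0)
p = two_sample_z_p_value(10.01, 10.00, 1.0, 1_000_000)
print(f"d = {d:.3f}, p = {p:.2e}")  # tiny effect, yet p is far below 0.05
```

A d of about 0.01 is well below the conventional "small" effect of 0.2, so the business costs in points 1 to 3 can easily outweigh it.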

4⃣️【Missing Values】Will Uber trips without a rider review be better, worse, or the same?

- The values are not missing at random, so you can't assume they have the same distribution as the non-missing ones.

- Predict them with ML: train a supervised model on the reviewed trips (features: waiting time, trip duration, cost, driver/rider info, time...) and score the unreviewed ones.

- The company should keep running experiments to reduce the issue, e.g., incentivize users to leave a review (coupons, etc.).
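The "predict the missing reviews" idea can be sketched with the simplest possible supervised model: learn the mean rating per feature value on reviewed trips, then score the unreviewed ones. This is a hypothetical stand-in for a real model; the single feature (`wait_bucket`) and all ratings are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trips: (wait_bucket, rating); rating is None when no review was left.
trips = [
    ("short", 5), ("short", 5), ("short", 4),
    ("long", 3), ("long", 2),
    ("long", None), ("long", None), ("long", None), ("short", None),
]


def fit_bucket_means(labelled):
    """'Train' a minimal supervised model: mean rating per feature value."""
    by_bucket = defaultdict(list)
    for bucket, rating in labelled:
        by_bucket[bucket].append(rating)
    return {bucket: mean(ratings) for bucket, ratings in by_bucket.items()}


reviewed = [(b, r) for b, r in trips if r is not None]
missing = [b for b, r in trips if r is None]

model = fit_bucket_means(reviewed)
predicted = [model[b] for b in missing]

observed_mean = mean(r for _, r in reviewed)
predicted_mean = mean(predicted)
print(observed_mean, predicted_mean)  # unreviewed trips skew to long waits
```

Here the unreviewed trips are mostly long-wait trips, so their predicted mean comes out lower than the observed mean; this is exactly why the missing reviews cannot be assumed to look like the observed ones. A real model would use the full feature list above, and its predictions still rely on the assumption that, given those features, reviewed and unreviewed trips behave alike.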

5⃣️Jeans are not doing well: is it a demand or a supply problem?

- Run a campaign for the jeans. A high CTR (click-through rate) means high demand.

- Look at the conversion rate (to remove noise, only consider people who used filters or whose session time is above 5 minutes).

- If there is a supply problem, check filter usage: is the price too high?
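The CTR and the noise-filtered conversion rate above can be computed as follows. This is a minimal sketch: the `Session` fields and the six sample sessions are hypothetical, and it assumes one campaign impression per session so that CTR is clicks divided by sessions.

```python
from typing import NamedTuple


class Session(NamedTuple):
    # Hypothetical fields for a session that saw the jeans campaign.
    clicked_ad: bool
    used_filters: bool
    minutes: float
    purchased: bool


sessions = [
    Session(True, True, 12, True),
    Session(True, True, 7, False),
    Session(True, False, 2, False),
    Session(False, False, 1, False),
    Session(True, False, 9, True),
    Session(False, False, 3, False),
]

# CTR: share of campaign impressions (here, one per session) that got a click.
ctr = sum(s.clicked_ad for s in sessions) / len(sessions)

# Noise filter from the text: keep people who used filters or stayed > 5 minutes.
engaged = [s for s in sessions if s.used_filters or s.minutes > 5]
conversion = sum(s.purchased for s in engaged) / len(engaged)

print(f"CTR = {ctr:.2f}, conversion among engaged sessions = {conversion:.2f}")
# CTR = 0.67, conversion among engaged sessions = 0.67
```

A high CTR paired with low conversion among engaged visitors points away from demand and toward what the visitors found (price, sizes, stock), which is where the filter-usage check comes in.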

6⃣️Drawbacks of using supervised learning to predict fraud

- The vast majority of transactions are legitimate, so the model can reach high accuracy just by predicting "legitimate" while catching little fraud.

- Remedies: change the model's internal loss to penalize false negatives more, use an extremely aggressive cut-off point (e.g., flag anything with a predicted fraud probability > 0.1), or reweight the training events. All of these still require a large amount of data with positive (fraud) cases.

- If some fraud went undetected in the past, it is labeled as legitimate in the training data, which negatively reinforces the model.

- There is always a time lag, because people keep coming up with new techniques to cheat.

- Alternatively, combine anomaly detection with supervised ML (problem with anomaly detection: in high dimensions it tends to consider every transaction an outlier, and it needs a massive investment of time).
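The imbalance problem and two of the remedies (an aggressive cut-off and reweighting) can be illustrated on a handful of scores. This is a sketch only: the scores stand in for any trained classifier's fraud probabilities and are made up. The weight formula n / (k * n_c) is the same one scikit-learn uses for `class_weight="balanced"`.

```python
# Illustrative fraud scores from some trained classifier and true labels (1 = fraud).
scores = [0.02, 0.05, 0.01, 0.12, 0.03, 0.04, 0.02, 0.06, 0.01, 0.08,
          0.03, 0.02, 0.05, 0.01, 0.04, 0.02, 0.07, 0.03, 0.15, 0.40]
labels = [0] * 18 + [1, 1]


def evaluate(threshold):
    """Return (accuracy, recall on fraud, false positives) at a cut-off."""
    preds = [int(s > threshold) for s in scores]
    pairs = list(zip(preds, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    accuracy = sum(p == y for p, y in pairs) / len(pairs)
    return accuracy, tp / (tp + fn), fp


def balanced_class_weights(y):
    """Weight n / (k * n_c): rare classes count more in a weighted loss."""
    classes = sorted(set(y))
    return {c: len(y) / (len(classes) * y.count(c)) for c in classes}


print(evaluate(0.5))  # (0.9, 0.0, 0): high accuracy, no fraud caught
print(evaluate(0.1))  # (0.95, 1.0, 1): aggressive cut-off catches both frauds
print(balanced_class_weights(labels))  # fraud cases weighted 9x legitimate ones
```

With only two positives, every number here is fragile. That is the "massive data with positive cases" caveat in practice: neither the cut-off nor the weights can be tuned reliably without many labeled frauds.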

**False positive: Type I error (declaring the innocent guilty)

False negative: Type II error (declaring the guilty innocent)

Article link: https://www.haomeiwen.com/subject/onmqjqtx.html