Q: During analysis, how do you treat missing values?
A:
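First, we need to know the pattern of the missing data:
1. Missing completely at random (MCAR): the chance that a value is missing is unrelated to any variable, observed or unobserved. (The easiest situation to handle.)
2. Missing at random (MAR): the missingness depends only on other observed variables, not on the missing value itself.
3. Missing not at random (MNAR): the missingness depends on the unobserved value itself, for example when people with high incomes decline to report their income.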
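To see why the pattern matters, here is a minimal sketch that simulates two of the patterns above and compares the mean of the values that remain. The income data, the 30% and 80% drop rates, the 60,000 threshold, and the function names are all assumptions made up for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: 10,000 incomes drawn from a normal distribution.
incomes = [random.gauss(50_000, 10_000) for _ in range(10_000)]

def drop_mcar(values, rate=0.3):
    """MCAR: every value has the same chance of going missing."""
    return [None if random.random() < rate else v for v in values]

def drop_mnar(values, threshold=60_000, rate=0.8):
    """MNAR: values above the threshold go missing far more often."""
    return [None if v > threshold and random.random() < rate else v
            for v in values]

def observed_mean(values):
    """Mean of the values that are still observed."""
    return statistics.mean(v for v in values if v is not None)

true_mean = statistics.mean(incomes)
mcar_mean = observed_mean(drop_mcar(incomes))
mnar_mean = observed_mean(drop_mnar(incomes))

print(f"true mean:       {true_mean:,.0f}")
print(f"mean under MCAR: {mcar_mean:,.0f}")
print(f"mean under MNAR: {mnar_mean:,.0f}")
```

Under MCAR the observed mean should stay close to the true mean, while under MNAR it is pulled downward because the large values are the ones that disappear.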
Then we can choose among different methods for dealing with the missing values:
Deletion: If we have enough observations and the data are missing completely at random, we can delete the observations with missing values without introducing bias, at the cost of a smaller sample.
Imputation: 1. Replace missing values with the mean, median, or mode, or with a default value; 2. Predict the missing values with a model (e.g., regression or KNN).
Others: More complex methods such as Multiple Imputation (MI) and Hot Deck imputation.
Ignore: Some models, such as certain random forest implementations, can handle missing values on their own.
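The simpler options above can be sketched in plain Python. This is only an illustration: the records, column names, and helper functions are hypothetical, and in practice a library such as pandas or scikit-learn would usually do this work.

```python
import random
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "city": "Seattle"},
    {"age": None, "city": "Seattle"},
    {"age": 31, "city": None},
    {"age": 40, "city": "Portland"},
    {"age": 31, "city": "Seattle"},
]

def listwise_delete(records):
    """Deletion: keep only records with no missing fields."""
    return [r for r in records if all(v is not None for v in r.values())]

def impute(records, column, strategy):
    """Simple imputation: fill a column with a statistic of its observed values."""
    observed = [r[column] for r in records if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]}
            for r in records]

def hot_deck(records, column, rng):
    """Hot deck: fill each gap with a value drawn from an observed donor record."""
    donors = [r[column] for r in records if r[column] is not None]
    return [{**r, column: rng.choice(donors) if r[column] is None else r[column]}
            for r in records]

complete = listwise_delete(rows)                      # drops the two incomplete rows
age_filled = impute(rows, "age", statistics.median)   # numeric column: median
city_filled = impute(rows, "city", statistics.mode)   # categorical column: mode
age_hot_deck = hot_deck(rows, "age", random.Random(0))
```

Each helper returns new records and leaves `rows` unchanged, so the methods can be compared side by side on the same data. Note that single-value imputation shrinks the variance of the filled column, which is one reason methods such as Multiple Imputation exist.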
Interview questions are from DataAppLab (WeChat: Datalaus)
Jun. 27th, 2017, Seattle