逻辑检错每个变量不一样,差别大,大家遇到了随机应变就行了。
缺失值和异常值都在下面的例子了。这些例子可以很轻易地在您的机器上实现,唯一需要一个外部命令,这样来安装:
- 联网的电脑
- 打开stata
- 命令窗口键入:
ssc install fmiss
注意:安装只需要一次即可。
检缺失值
/*make a dataset that contains missing values*/
sysuse auto, clear
set more off
replace make = "" in 1
replace price = . in 2
replace price = . in 7
replace mpg = . in 7
replace rep78 = . in 10
replace headroom = . in 11
replace trunk = . in 9
replace weight = . in 5
fmiss /*a simple glimpse works sometimes*/
misstable summarize, generate(miss_)
keep miss_*
egen obs_with_na = rowtotal(*)
drop if obs_with_na == 0
list /*I personally prefer this version of result, perhaps you should save this into .xlsx*/
/*optional*/
drop obs_with_na
tostring *, replace
for var *: replace X = "" if X == "0"
for var *: replace X = "missing" if X == "1"
list
export excel using "chk_missing.xlsx", replace
批量绘图导出箱图_检错或可使用(histogram may also help)
sysuse auto, clear
set more off
cap log close
des
!rmdir /s chk_miss_figure /*be careful that anything in this directory will be removed*/
!mkdir chk_miss_figure
foreach var of varlist price mpg rep head tru wei len turn dis gear fore {
graph box `var'
graph export "chk_miss_figure/`var'.png", replace
}
批量检异常值
sysuse auto, clear
set more off
cap log close
gen ID = _n
gen test = _n + 1
des
/*you are required to revise the folling four lines of code to setup*/
log using chk_outlier.txt, text replace /*the file name of results*/
global threshold 2 /*bigger than which times of sd is consisdered as a outlier, 2 or 3 recommended*/
global must_shown ID test /*variables showed in results, usually used to identify obs*/
global chk_vars price mpg rep head tru wei len turn dis gear fore /*variables to be checked, very important*/
/*do not change anything of below codes unless you understand exactly what you doing*/
foreach var of global chk_vars {
qui: su `var'
cap drop temp
qui: gen temp = 1 if abs(`var' - r(mean)) > $threshold * r(sd) & `var' != .
qui: su temp
if r(sum) != 0 {
di "===========`var'==============="
list $must_shown `var' if temp == 1
di "==============================="
}
}
log close
/*another example, helpful in your analysis*/
gen outlier = 0
foreach var of global chk_vars{
qui: su `var'
qui: replace outlier = 1 if abs(`var' - r(mean)) > $threshold * r(sd) & `var' != .
}
tab outlier
网友评论