Error-Checking Techniques (by Stata)_14Nov2019

Author: liang_rujiang | Published 2019-11-14 23:14

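Logic checks differ a lot from one variable to the next, so handle them case by case as they come up.
Missing values and outliers are both covered in the examples below. The examples are easy to run on your own machine; the only thing they need is one external command, installed like this: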

  1. Use a computer that is connected to the internet
  2. Open Stata
  3. In the Command window, type: ssc install fmiss

Note: the installation only needs to be done once.
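If the installation step has to live inside a do-file, one possible sketch is to test for the command first and install it only when Stata cannot find it. This is not part of the original workflow; it assumes the machine is online the first time it runs.

```stata
* Install fmiss from SSC only if Stata cannot already find it
capture which fmiss
if _rc {
    ssc install fmiss
}
```

Because of the `capture which` test, running the do-file again does not trigger another download.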

Checking for missing values

/*make a dataset that contains missing values*/
sysuse auto, clear
set more off
replace make = "" in 1
replace price = . in 2
replace price = . in 7
replace mpg = . in 7
replace rep78 = . in 10
replace headroom = . in 11
replace trunk = . in 9
replace weight = . in 5

fmiss /*a quick glance is sometimes enough*/

gen obs_no = _n /*keep the original row number so each flagged row can be traced back*/
misstable summarize, generate(miss_)
keep obs_no miss_*

egen obs_with_na = rowtotal(miss_*)
drop if obs_with_na == 0
list /*I personally prefer this form of the result; you may want to save it to .xlsx*/

/*optional*/
drop obs_with_na
tostring miss_*, replace
foreach v of varlist miss_* {
    replace `v' = "" if `v' == "0"
    replace `v' = "missing" if `v' == "1"
}
list

export excel using "chk_missing.xlsx", firstrow(variables) replace
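The indicator table above is built on misstable, which as far as I know only reports numeric variables, so the blank make in row 1 would not show up there. A minimal sketch of a per-row count that also covers string variables, using the missing() function, might look like this. The names obs_no and n_missing are made up for the illustration.

```stata
sysuse auto, clear
replace make = "" in 1
replace price = . in 2

* obs_no and n_missing are hypothetical names used only in this sketch
gen obs_no = _n
gen n_missing = 0

* missing() returns 1 for "" in a string variable and for . in a numeric one
foreach v of varlist make-foreign {
    quietly replace n_missing = n_missing + missing(`v')
}

list obs_no n_missing if n_missing > 0
```

Note that the auto dataset already ships with a few missing rep78 values, so the list also includes rows that were not edited here.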

Batch-exporting box plots, possibly useful for error checking (histograms may also help)

sysuse auto, clear
set more off
cap log close
des

!rmdir /s chk_miss_figure /*Windows shell command; be careful: everything in this directory will be removed*/
!mkdir chk_miss_figure

foreach var of varlist price mpg rep head tru wei len turn dis gear fore {
    graph box `var'
    graph export "chk_miss_figure/`var'.png", replace
}
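Since the heading mentions that histograms may also help, here is a rough sketch of the same loop with histogram instead of graph box. The directory name chk_hist_figure and the four variables are picked only for illustration; Stata's built-in mkdir is used so the sketch does not depend on a Windows shell.

```stata
sysuse auto, clear
set more off

* chk_hist_figure is a hypothetical directory name;
* capture ignores the error raised when the directory already exists
capture mkdir chk_hist_figure

foreach var of varlist price mpg weight length {
    histogram `var', frequency
    graph export "chk_hist_figure/`var'.png", replace
}
```

Existing files with the same name are overwritten by the replace option, so there is no need to delete the directory first.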

Batch-checking for outliers

sysuse auto, clear
set more off
cap log close

gen ID = _n
gen test = _n + 1
des

/*revise the following four lines to set things up*/
log using chk_outlier.txt, text replace /*file name for the results*/
global threshold 2 /*a value more than this many SDs from the mean is considered an outlier; 2 or 3 recommended*/
global must_shown ID test /*variables shown in the results, usually used to identify observations*/
global chk_vars price mpg rep head tru wei len turn dis gear fore /*variables to be checked, very important*/

/*do not change any of the code below unless you understand exactly what you are doing*/
foreach var of global chk_vars {
    qui: su `var'
    
    cap drop temp
    qui: gen temp = 1 if abs(`var' - r(mean)) > $threshold * r(sd) & `var' != .
    qui: su temp
    
    if r(sum) != 0 {
        di "===========`var'==============="
        list $must_shown `var' if temp == 1
        di "==============================="
    }
}

log close

/*another example, helpful in your analysis: flag every observation that is an outlier on at least one checked variable*/
gen outlier = 0

foreach var of global chk_vars {
    qui: su `var'
    qui: replace outlier = 1 if abs(`var' - r(mean)) > $threshold * r(sd) & `var' != .
}

tab outlier
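The mean and SD used above are themselves pulled around by extreme values. As an alternative that is not in the original code, a sketch based on the 1.5 x IQR fences that box plots conventionally use could look like this. The name iqr_outlier, the multiplier 1.5 and the four variables are assumptions made for the illustration.

```stata
sysuse auto, clear
set more off

* iqr_outlier is a hypothetical flag; 1.5 is the conventional box-plot multiplier
gen iqr_outlier = 0

foreach var of varlist price mpg weight length {
    quietly summarize `var', detail
    local iqr = r(p75) - r(p25)
    local lower = r(p25) - 1.5 * `iqr'
    local upper = r(p75) + 1.5 * `iqr'
    * missing values compare as larger than any number, so exclude them explicitly
    quietly replace iqr_outlier = 1 if (`var' < `lower' | `var' > `upper') & !missing(`var')
}

tab iqr_outlier
```

The flag can be compared with the SD-based outlier variable to see which observations both rules agree on.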

Link to this article: https://www.haomeiwen.com/subject/vrodictx.html