模型系列-双重差分|DID 直观介绍

作者: 5a41eb2ceec6 | 来源:发表于2018-10-13 15:58 被阅读67次

模型系列-双重差分|DID 直观介绍
双重差分模型DID 模型介绍
模型系列-DID直观介绍
双重差分模型DID python操作
双重差分模型DID stata操作
双重差分模型DID 结果分析
学习笔记DID（其一）
Stata：双重差分的固定效应模型 (DID)
双重差分模型DID 概念性理解
PSM-DID资料

image.png

This article is taken from my reading notes.And the contents are based on sildes wrriten by Jeff Wooldridge and video shared by Douglas McKee, a senior lecturer in the Cornell Economics department.While the note is published in WeChat Official Accounts called Think_Lab

In the basic setting, outcomes are observed for two groups for two,time periods. One of the groups is exposed to a treatment in the second period but not in the first period. The second group is not exposed to the treatment during either period.Structure can apply to repeated cross sections or panel data.(Jeff Wooldridge,2011)

Suppose Sao Paulo (Brazil) institutes a free lunch program in elementary school in 2009. There are many reasons to expect students to perform better in school if they are guaranteed a free meal. But how large such effects might be ?Suppose also that Brazilian fifth graders take a standardized math test at the end of every year.So how we evaluate the effects of this program(free lunch) on test scores?

One way to evaluate the program would be to compare test scores of kids in Sao Paul in 2010 with the scores of Sao Paul in 2008 before the program was implemented. This difference is certainly partially due to the program.We can call the difference D1.Why we say D1 is partially due to the program? Suppose there was an important international soccer tournament druing the week of the exam in 2008 but not in 2010.So the tournament might also influence the differnce in the test scores between the two periods.So what we have is that D1 is both the program effects and what we are going to call the trend what else might be happening at the same time.

Suppose we also observe test scores in 2008 and 2010 in Rio another large city in Brazil.If we are willing to assume that the difference across time in Rio is reflective of what would have happened in Sao Paulo then we can use the difference(D2) as an approximation of what the trend is.Now we get our difference-in-difference "D1-D2"

DID1.png

When is diff-in-diff(DID) useful?

You want to evalueate a program or treatment
You have treatment and control groups
You observe them before and after

BUT

Treatment is not random
Other things were happening while the program was in effect
You can't control for all the potential confounders

Key Asuumption

Trend in control group approximates what would have happend in the treatment group in the absence of the treatment

Diff-in-diff data needs

You need data on

The treatment group and a control group
Covering pre and post treatment(i.e. program intervention)

Structure

Aggregate data or
Pooled cross-section or
Longitudinal data

DID2.png

DID3.png

We are saying we believe test scores are determined by this regression model.If we take the conditional expected value of y (E[y|x]),we get this linear combination and the error term drops out.

The difference across time in the control group is going to be β1.And the difference across time in the treatment group is going to be β1+β3.Between the two is β3(DID estimate)