HPOL8539
John Graves
September 21, 2022
learnr exercise for eight DID estimators.
Data selection and setup is key.
Important
time - cohort_tx_time), never treated units should get a value of -Inf.
Once you do all the above, you are teed up to fit any of the NextGen DID estimators.
Benefit of new estimators: in principle, you’re being transparent about your comparisons and the “weights” you use to aggregate across groups.
Double-edged sword: many of the estimators have R & Stata commands, but they can be a bit of a “black box.”
Roth et al. (2022) provide a very nice, approachable summary of the recent literature. Link to paper.
Let’s start out with the simplest possible data structure.
The data contain six variables: unit, year, y_it, d_it, x_i, and x_it.
We actually won’t use x_i or x_it today, so we can ignore them for now.
Let’s take a look at the first few rows of our data:
| unit | year | y_it | d_it | x_i | x_it |
|---|---|---|---|---|---|
| 1 | 2005 | 0.627 | 0 | 1.553 | -0.484 |
| 1 | 2006 | -0.157 | 0 | 1.553 | 0.472 |
| 1 | 2007 | -0.261 | 0 | 1.553 | -0.406 |
| 1 | 2008 | -0.319 | 0 | 1.553 | -0.735 |
| 1 | 2009 | -1.670 | 0 | 1.553 | -2.017 |
| 1 | 2010 | -2.920 | 0 | 1.553 | -2.331 |
Our next objective is to identify the various treatment cohorts in our data.
To do this, we will ask: “In what year is this unit first treated?”
The answer lets us group units into cohorts by the first year in which they are treated.
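To make the question concrete, here is a minimal base R sketch on a made-up three-unit panel. The toy values and the name first_year are assumptions for illustration only. Never-treated units are coded as 0, a convention chosen here to match the treatment_cohort_0 column that appears in the final data.

```r
# Toy panel (made-up values): unit 1 is treated from 2007,
# unit 2 from 2006, and unit 3 is never treated.
toy <- data.frame(
  unit = rep(1:3, each = 4),
  year = rep(2005:2008, times = 3),
  d_it = c(0, 0, 1, 1,
           0, 1, 1, 1,
           0, 0, 0, 0)
)

# For each unit, take the earliest year with d_it == 1.
# Units with no treated year get 0 (assumed code for "never treated").
first_year <- sapply(split(toy, toy$unit), function(u) {
  if (any(u$d_it == 1)) min(u$year[u$d_it == 1]) else 0
})

first_year         # one value per unit
table(first_year)  # number of units in each cohort
```

Each distinct value of first_year defines one treatment cohort. This is only a cross-check of the idea and does not replace the step-by-step pipeline used in these slides.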
library(dplyr)  # %>%, arrange(), mutate(), group_by(), lag()

df_ <-
  df %>%
  #########
  # Step 1a
  #########
  arrange(unit, year) %>%
  group_by(unit) %>%
  # What year is the unit first treated in?
  # lag() is taken within unit so it never reads the previous unit's last row.
  mutate(first_treated = as.integer(lag(d_it == 0) & d_it == 1)) %>%
  # Is the unit never treated (i.e., d_it is never 1)
  mutate(never_treated = as.integer(sum(d_it) == 0)) %>%
  ungroup() %>%
  # Replace the 0/1 flag with the year of first treatment (0 otherwise)
  mutate(first_treated = year * first_treated) %>%
  group_by(unit) %>%
  # Treatment cohort (0 for never-treated units)
  mutate(treatment_cohort = max(first_treated)) %>%
  ungroup()
Important
If all groups are eventually treated, you’ll need to “trim” the data of all observations on or after the last treated cohort date.
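A minimal base R sketch of this trimming step, on a made-up two-unit panel in which every unit is eventually treated. The toy values and the names last_cohort and trimmed are assumptions for illustration.

```r
# Toy panel (made-up values): both units are eventually treated,
# in cohorts 2006 and 2008.
toy <- data.frame(
  unit = rep(1:2, each = 5),
  year = rep(2005:2009, times = 2),
  treatment_cohort = rep(c(2006, 2008), each = 5)
)
toy$d_it <- as.integer(toy$year >= toy$treatment_cohort)

# Treatment date of the last cohort to be treated
last_cohort <- max(toy$treatment_cohort)

# Drop all observations on or after that date
trimmed <- toy[toy$year < last_cohort, ]

range(trimmed$year)                      # years that remain
tapply(trimmed$d_it, trimmed$unit, sum)  # treated periods per unit
```

Within the trimmed window the last cohort has d_it equal to 0 in every period, so it can serve as the untreated comparison group.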
-Inf for never treated units.
fastDummies package.
The “-” sign can complicate variable names, so we just replace it with “lag”.
library(dplyr)        # %>%, mutate(), rename_at(), mutate_at()
library(fastDummies)  # dummy_cols()

df_ <-
  df %>%
  #########
  # Step 1a
  #########
  ...
  ##########
  # Step 2
  ##########
  # Relative time variable
  mutate(rel_time = year - treatment_cohort) %>%
  mutate(rel_time = ifelse(never_treated == 1, -Inf, rel_time)) %>%
  # Create dummy indicators for treatment cohorts, years, and relative time.
  dummy_cols(c("treatment_cohort", "year", "rel_time")) %>%
  # Rename to remove the minus sign
  rename_at(vars(contains("rel_time_")), function(x) gsub("-", "lag", x)) %>%
  # Never treated units should have values of 0 for all relative-time indicators.
  mutate_at(vars(contains("rel_time_")), function(x) ifelse(is.na(x), 0, x))
Let’s take a look at a few rows of our final data:
| unit | year | y_it | d_it | x_i | x_it | first_treated | never_treated | treatment_cohort | rel_time | treatment_cohort_0 | treatment_cohort_2010 | treatment_cohort_2013 | year_2005 | year_2006 | year_2007 | year_2008 | year_2009 | year_2010 | year_2011 | year_2012 | year_2013 | year_2014 | year_2015 | year_2016 | year_2017 | year_2018 | year_2019 | year_2020 | year_2021 | year_2022 | rel_time_lag1 | rel_time_lag2 | rel_time_lag3 | rel_time_lag4 | rel_time_lag5 | rel_time_lag6 | rel_time_lag7 | rel_time_lag8 | rel_time_lagInf | rel_time_0 | rel_time_1 | rel_time_2 | rel_time_3 | rel_time_4 | rel_time_5 | rel_time_6 | rel_time_7 | rel_time_8 | rel_time_9 | rel_time_10 | rel_time_11 | rel_time_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2010 | -2.920 | 0 | 1.553 | -2.331 | 0 | 0 | 2013 | -3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2011 | -2.979 | 0 | 1.553 | -1.304 | 0 | 0 | 2013 | -2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2012 | -0.744 | 0 | 1.553 | -0.905 | 0 | 0 | 2013 | -1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2013 | 0.194 | 1 | 1.553 | -1.677 | 2013 | 0 | 2013 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2014 | -0.641 | 1 | 1.553 | -2.088 | 0 | 0 | 2013 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2015 | -1.243 | 1 | 1.553 | -0.602 | 0 | 0 | 2013 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
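To see how the relative-time columns in a table like this come about, here is a minimal base R sketch on a made-up two-unit panel. It builds the indicators with a plain loop rather than a dummy-variable package, and the toy values are assumptions for illustration.

```r
# Toy panel (made-up values): unit 1 is in the 2006 cohort,
# unit 2 is never treated (cohort coded 0).
toy <- data.frame(
  unit = rep(1:2, each = 3),
  year = rep(2005:2007, times = 2),
  treatment_cohort = rep(c(2006, 0), each = 3),
  never_treated = rep(c(0, 1), each = 3)
)

# Relative time: years since first treatment, -Inf for never-treated units
toy$rel_time <- ifelse(toy$never_treated == 1, -Inf,
                       toy$year - toy$treatment_cohort)

# One 0/1 indicator per distinct relative-time value,
# with "-" replaced by "lag" in the column name
for (v in sort(unique(toy$rel_time))) {
  col <- gsub("-", "lag", paste0("rel_time_", v))
  toy[[col]] <- as.integer(toy$rel_time == v)
}

names(toy)
# Every row should switch on exactly one relative-time indicator
rowSums(toy[grep("^rel_time_", names(toy))])
```

The row-sum check at the end is a quick way to confirm that the indicators are mutually exclusive and cover every observation.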
learnr exercise to fit the various DID estimators.