where \(i\) indexes individual units and \(t\) indexes time, \(\mathbf{X}_i\) are unit-level attributes, \(D_{it}\) is a treatment indicator (set to 1 if the observation is treated and in the post-treatment period). Finally, \(U_i\) captures unobserved unit-level heterogeneity.
Estimation: Random vs. Fixed Effects
Random effects uses an approach called “quasi-demeaning” or “partial pooling.”
In the above, \(\sigma_u\) is the variance of unit-level heterogeneity, and \(\sigma_{\epsilon}\) is the variance of \(\epsilon_{it}\). And recall \(T\) is the total number of repeated unit-level observations in our panel (e.g., the total number of time periods, the total number of patients for a given physician, etc.)
When \(\theta = 0\), it just reduces pooled regression.
When \(\theta = 1\), it is equivalent to fixed effects regression. This only isolates variation within units to estimate the regression coefficients (i.e., units are only compared to themselves, which is why the unobserved heterogeneity is accounted for).
Essentially, when you fit a random effects regression, Stata estimates \(\theta\) and then plugs that estimate it into the above model.
(When you fit a fixed effects regression, Stata uses the demeaning approach–unless, of course, you manually fit a dummy variable model)
Often \(0 < \theta < 1\), hence the term “partial-pooling.” That is, the regression draws on both variation “within” units and variation “between” units.
The degree to which within and between variation is used depends on the data context.
Remember, when \(\theta=0\) the random effects regression reduces to pooled OLS. When is this the case?
If the term \(T\sigma^2_u\) is 0, i.e., \(\sigma^2_u=0\) or there is no variation in individual heterogeneity.
Remember, when \(\theta=1\) the random effects regression reduces to fixed effects. When is this the case?
If the term \(T\sigma^2_u\) gets super large (more formally, it would need to blast off towards infinity…).
If we have a very “large” panel of observations on each unit (i.e., large \(T\)) there is sufficient “within” variation and we really don’t need to do any pooling across units.
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
library(broom)library(knitr)theme_set(theme_bw())
Code
params_panel <-list(N =1000,T =2,tx_time =2, rho_t =0.8,beta_0 =0.5,beta_1 =2,tau =0.5,p_d =0.5)dgp_panel <-function(params) {with(params, {# Time effects t_ <-data.frame(t =1:T,gamma_t =arima.sim(n=T, list(ar = rho_t, order=c(1,0,0))) %>%as.vector())# Individual measures and effects i_ <-data.frame(unit_id =1:N,x_i =rnorm(N, mean =0, sd =1),u_i =rnorm(N, mean =0, sd =1)) %>%rowwise() %>%# This allows us to get each value's pr_treated in the line below. mutate(pr_treated = boot::inv.logit(u_i)) %>%ungroup() %>%# This undoes the rowwise # Treatment indicatormutate(d_i =rbinom(N, size =1, prob = pr_treated)) %>%ungroup()crossing(unit_id = i_$unit_id,t = t_$t) %>%left_join(i_,"unit_id") %>%left_join(t_,"t") %>%mutate(d_i =ifelse(t<tx_time,0,d_i)) %>%mutate(y_i = beta_0 + beta_1 * x_i + tau * d_i + u_i + gamma_t +rnorm(N, mean =0, sd =1)) %>%select(unit_id, t, y_i , x_i, d_i) })}estimator_fn_re <-function(df) {lmer(y_i ~ d_i + (1|unit_id) , df)}disc_fn_re <-function(fit) { fit %>%summary() %>%pluck("coefficients") %>%data.frame() %>%rownames_to_column() %>% janitor::clean_names() %>%filter(rowname=="d_i") %>%pull(estimate) %>%as.vector()}generate_estimate_discriminate_re <-function(params) { params %>%# Step 1: Parameterize the problemdgp_panel() %>%# Step 2: Define the data generation processestimator_fn_re() %>%# Step 3: Estimate disc_fn_re() %>%# Step 4: Pull out what you needdata.frame(tau_hat = .) # store the result as a data frame object}construct_dm <-function(df) { df_ <- df %>%# mutate(y_i = y_i + mean(y_i),# d_i = d_i + mean(d_i)) %>% # group_by(t) %>% # mutate(y_i = y_i - mean(y_i),# d_i = d_i - mean(d_i)) %>% group_by(unit_id) %>%mutate(y_i = y_i -mean(y_i),d_i = d_i -mean(d_i))return(df_)}estimate_dm <-function(df) {lm(y_i ~ d_i , data = df)}disc_fn_dm =function(fit) { fit_ =broom::tidy(fit) # This cleans up the fitted regression object out =fit_ %>%filter(term=="d_i") %>%pull(estimate)return(out)}set.seed(123)df_c <-modifyList(params_panel,list(T=400,tx_time=200)) %>%dgp_panel() %>%mutate(index = t-200) %>%mutate(T2=as.integer(index %in%c(-1,0))) %>%mutate(T4 =as.integer(index >-3& index <2)) %>%mutate(T10 =as.integer(index >=-5& index<5)) %>%mutate(T20 =as.integer(index >=-10& index <10 )) %>%mutate(T50 =1 ) df <- df_c %>%filter(T4==1)params_panel %>%dgp_panel() %>%head() %>%kable(digits=3)
unit_id
t
y_i
x_i
d_i
1
1
2.855
0.502
0
1
2
2.825
0.502
0
2
1
5.893
1.434
0
2
2
7.620
1.434
1
3
1
3.225
2.166
0
3
2
6.555
2.166
0
Let’s first show the equivalance between the random effect model and a quasi-demeaned linear regression model. We’ll do so by focusing on a case where \(T=4\).
fit_re <- df %>%estimator_fn_re()summary(fit_re)
Linear mixed model fit by REML ['lmerMod']
Formula: y_i ~ d_i + (1 | unit_id)
Data: df
REML criterion at convergence: 14733.6
Scaled residuals:
Min 1Q Median 3Q Max
-2.0705 -0.7249 0.1037 0.6245 2.0637
Random effects:
Groups Name Variance Std.Dev.
unit_id (Intercept) 5.462 2.337
Residual 1.085 1.041
Number of obs: 4000, groups: unit_id, 1000
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.29151 0.07661 3.805
d_i -0.51889 0.04550 -11.403
Correlation of Fixed Effects:
(Intr)
d_i -0.152
Notice the use of the term “fixed effects” is based on a traditional statistical definition–these are the regression parameters not assumed to come from a stochastic (statsitical) distribution, such as the random effects.
Let’s now extract out the variance parameter estimates \(\sigma^2_{\epsilon}\) and \(\sigma^2_u\):