Quasi-Demeaning: The Relationship Between Fixed and Random Effects

Setup

Assume the following data generation process:

\[ Y_{it} = \mathbf{X}_{i}'\beta + \tau D_{it} + U_i +\epsilon_{it} \]

where \(i\) indexes individual units and \(t\) indexes time, \(\mathbf{X}_i\) are unit-level attributes, \(D_{it}\) is a treatment indicator (set to 1 if the observation is treated and in the post-treatment period). Finally, \(U_i\) captures unobserved unit-level heterogeneity.

Estimation: Random vs. Fixed Effects

  • Random effects uses an approach called “quasi-demeaning” or “partial pooling.”

  • The random effects model can be represented as:

\[ \begin{align} (Y_{it}-\theta \bar{Y_i}) = (\mathbf{X}_{it}-\theta \bar{\mathbf{X}_i})'\beta + \tau (D_{it}-\theta \bar{D_i}) + (\epsilon_{it} - \theta \bar{\epsilon_i}) \quad \quad \quad Eq. 9 \end{align} \]

where

\[ \theta = 1 - \bigg [\frac{\sigma^2_\epsilon}{(\sigma^2_\epsilon + T\sigma^2_u)}\bigg ]^{1/2} \]

In the above, \(\sigma_u\) is the variance of unit-level heterogeneity, and \(\sigma_{\epsilon}\) is the variance of \(\epsilon_{it}\). And recall \(T\) is the total number of repeated unit-level observations in our panel (e.g., the total number of time periods, the total number of patients for a given physician, etc.)

  • When \(\theta = 0\), it just reduces pooled regression.

  • When \(\theta = 1\), it is equivalent to fixed effects regression. This only isolates variation within units to estimate the regression coefficients (i.e., units are only compared to themselves, which is why the unobserved heterogeneity is accounted for).

  • Essentially, when you fit a random effects regression, Stata estimates \(\theta\) and then plugs that estimate it into the above model.

  • (When you fit a fixed effects regression, Stata uses the demeaning approach–unless, of course, you manually fit a dummy variable model)

  • Often \(0 < \theta < 1\), hence the term “partial-pooling.” That is, the regression draws on both variation “within” units and variation “between” units.

  • The degree to which within and between variation is used depends on the data context.

  • Remember, when \(\theta=0\) the random effects regression reduces to pooled OLS. When is this the case?

    • If the term \(T\sigma^2_u\) is 0, i.e., \(\sigma^2_u=0\) or there is no variation in individual heterogeneity.
  • Remember, when \(\theta=1\) the random effects regression reduces to fixed effects. When is this the case?

    • If the term \(T\sigma^2_u\) gets super large (more formally, it would need to blast off towards infinity…).

    • If we have a very “large” panel of observations on each unit (i.e., large \(T\)) there is sufficient “within” variation and we really don’t need to do any pooling across units.

Setup

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(lme4)
Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack
library(broom)
library(knitr)
theme_set(theme_bw())
Code
params_panel <- list(
  N = 1000,
  T = 2,
  tx_time = 2, 
  rho_t = 0.8,
  beta_0 = 0.5,
  beta_1 = 2,
  tau = 0.5,
  p_d = 0.5
)

dgp_panel <- function(params) {
  with(params, {

    # Time effects
    t_ <-
      data.frame(t = 1:T,
                 gamma_t = arima.sim(n=T, list(ar = rho_t, order=c(1,0,0))) %>% as.vector())

    # Individual measures and effects
    i_ <-
      data.frame(
        unit_id = 1:N,
        x_i = rnorm(N, mean = 0, sd = 1),
        u_i = rnorm(N, mean = 0, sd = 1)) %>%
      rowwise() %>% # This allows us to get each value's pr_treated in the line below. 
      mutate(pr_treated = boot::inv.logit(u_i)) %>% 
      ungroup() %>%  # This undoes the rowwise 
      # Treatment indicator
      mutate(d_i = rbinom(N, size = 1, prob = pr_treated)) %>% 
      ungroup()

    crossing(unit_id = i_$unit_id,t = t_$t) %>%
      left_join(i_,"unit_id") %>%
      left_join(t_,"t") %>%
      mutate(d_i = ifelse(t<tx_time,0,d_i)) %>%
      mutate(y_i = beta_0 + beta_1 * x_i + tau * d_i + u_i + gamma_t + rnorm(N, mean = 0, sd = 1)) %>% 
      select(unit_id, t, y_i , x_i, d_i)
  })
}

estimator_fn_re <- function(df) {
  lmer(y_i ~  d_i + (1|unit_id) , df)
}

disc_fn_re <- function(fit) {
  fit %>% summary() %>% pluck("coefficients") %>%
    data.frame() %>%
    rownames_to_column() %>%
    janitor::clean_names() %>%
    filter(rowname=="d_i") %>%
    pull(estimate) %>%
    as.vector()
}

generate_estimate_discriminate_re <- function(params) {
  params %>% # Step 1: Parameterize the problem
      dgp_panel() %>%  # Step 2: Define the data generation process
        estimator_fn_re() %>%  # Step 3: Estimate 
          disc_fn_re() %>% # Step 4: Pull out what you need
            data.frame(tau_hat = .) # store the result as a data frame object
}

construct_dm <- function(df) {
  df_ <- 
    df %>% 
      # mutate(y_i = y_i + mean(y_i),
      #       d_i = d_i + mean(d_i)) %>% 
      #       group_by(t) %>% 
      #       mutate(y_i = y_i - mean(y_i),
      #       d_i = d_i - mean(d_i)) %>% 
            group_by(unit_id) %>% 
            mutate(y_i = y_i - mean(y_i),
            d_i = d_i - mean(d_i))

  return(df_)
}

estimate_dm <- function(df) {
  lm(y_i ~ d_i   , data = df)
}

disc_fn_dm = function(fit) {
  fit_ =broom::tidy(fit)   # This cleans up the fitted regression object
  out =fit_ %>% 
    filter(term=="d_i") %>% 
    pull(estimate)
  
  return(out)
}


set.seed(123)
df_c <- 
  modifyList(params_panel,list(T=400,tx_time=200)) %>% 
  dgp_panel() %>% 
  mutate(index = t-200) %>% 
  mutate(T2= as.integer(index %in% c(-1,0))) %>% 
  mutate(T4 = as.integer(index > -3 & index <2)) %>%
  mutate(T10 = as.integer(index >= -5 & index<5)) %>% 
  mutate(T20 = as.integer(index >= -10 & index <10 )) %>%
  mutate(T50 = 1 ) 

df <- df_c %>% filter(T4==1)

params_panel %>% 
  dgp_panel() %>% 
  head() %>% 
  kable(digits=3)
unit_id t y_i x_i d_i
1 1 2.855 0.502 0
1 2 2.825 0.502 0
2 1 5.893 1.434 0
2 2 7.620 1.434 1
3 1 3.225 2.166 0
3 2 6.555 2.166 0

Let’s first show the equivalance between the random effect model and a quasi-demeaned linear regression model. We’ll do so by focusing on a case where \(T=4\).

fit_re <- 
  df %>% 
  estimator_fn_re()

summary(fit_re)
Linear mixed model fit by REML ['lmerMod']
Formula: y_i ~ d_i + (1 | unit_id)
   Data: df

REML criterion at convergence: 14733.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.0705 -0.7249  0.1037  0.6245  2.0637 

Random effects:
 Groups   Name        Variance Std.Dev.
 unit_id  (Intercept) 5.462    2.337   
 Residual             1.085    1.041   
Number of obs: 4000, groups:  unit_id, 1000

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.29151    0.07661   3.805
d_i         -0.51889    0.04550 -11.403

Correlation of Fixed Effects:
    (Intr)
d_i -0.152
  • Notice the use of the term “fixed effects” is based on a traditional statistical definition–these are the regression parameters not assumed to come from a stochastic (statsitical) distribution, such as the random effects.

Let’s now extract out the variance parameter estimates \(\sigma^2_{\epsilon}\) and \(\sigma^2_u\):

get_re <- function(fit) {
    x <- VarCorr(fit) %>% data.frame() %>% 
    tibble() %>% 
    select(grp,var1,vcov)
}

re_est <- 
  fit_re %>% 
  get_re()
grp var1 vcov
unit_id (Intercept) 5.462082
Residual NA 1.084670

We can then use these to calculatate \(\theta\):

sigma2_e = fit_re %>% get_re() %>% filter(grp=="Residual") %>% pull(vcov)
sigma2_ui = fit_re %>% get_re() %>% filter(grp=="unit_id") %>% pull(vcov)
theta_i = 1 - sqrt(sigma2_e/(4*sigma2_ui + sigma2_e))

theta_i
[1] 0.7825205

Next we will create a quasi-demeaned version of our input data:

df_ <- 
  df %>% 
  group_by(unit_id) %>% 
  mutate(y_i = y_i - theta_i * mean(y_i),
         d_i = d_i - theta_i * mean(d_i))

We now fit a quasi-demeaned

fit_qdm <- lm(y_i ~ d_i, data = df_)
summary(fit_qdm)

Call:
lm(formula = y_i ~ d_i, data = df_)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1586 -0.7768  0.0980  0.7425  3.2119 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.06340    0.01666   3.805 0.000144 ***
d_i         -0.51889    0.04550 -11.403  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.041 on 3998 degrees of freedom
Multiple R-squared:  0.0315,    Adjusted R-squared:  0.03126 
F-statistic:   130 on 1 and 3998 DF,  p-value: < 2.2e-16

Now let’s compare:

res <- 
  fit_qdm %>% tidy() %>% 
  select(term,estimate_qdm = estimate) 

cbind(res,df %>% estimator_fn_re() %>% summary() %>% pluck("coefficients") %>% 
  data.frame() %>% 
  select(estimate_re = Estimate)) %>% 
  mutate(theta = theta_i) %>% 
  select(term, estimate_re, estimate_qdm, theta) %>% 
  remove_rownames() %>% 
  kable(digits = 3)
term estimate_re estimate_qdm theta
(Intercept) 0.292 0.063 0.783
d_i -0.519 -0.519 0.783
Code
get_theta <- function(fit) {
  sigma2_e = fit %>% get_re() %>% filter(grp=="Residual") %>% pull(vcov)
  sigma2_ui = fit %>% get_re() %>% filter(grp=="unit_id") %>% pull(vcov)
  theta = 1 - sqrt(sigma2_e/(4*sigma2_ui + sigma2_e))
  theta 
}

t_qdm <- tibble(
  T2 = df_c %>% filter(T2==1) %>% estimator_fn_re() %>% disc_fn_re(),
  theta2  = df_c %>% filter(T2==1) %>% estimator_fn_re() %>% get_theta(),
  T2fe = df_c %>% filter(T2==1) %>% construct_dm() %>% estimate_dm() %>% disc_fn_dm()  ,
  T4 = df_c %>% filter(T4==1) %>% estimator_fn_re() %>% disc_fn_re(),
  theta4  = df_c %>% filter(T4==1) %>% estimator_fn_re() %>% get_theta(),
  T4fe = df_c %>% filter(T4==1) %>% construct_dm() %>% estimate_dm() %>% disc_fn_dm()    ,
  T10 = df_c %>% filter(T10==1) %>% estimator_fn_re() %>% disc_fn_re(),
  theta10  = df_c %>% filter(T10==1) %>% estimator_fn_re() %>% get_theta(),  
  T10fe = df_c %>% filter(T10==1) %>% construct_dm() %>% estimate_dm() %>% disc_fn_dm()   ,
  T20 = df_c %>% filter(T20==1) %>% estimator_fn_re() %>% disc_fn_re(),
  theta20  = df_c %>% filter(T20==1) %>% estimator_fn_re() %>% get_theta(),  
  T20fe = df_c %>% filter(T20==1) %>% construct_dm() %>% estimate_dm() %>% disc_fn_dm()  ,
  T400 = df_c %>%  estimator_fn_re() %>% disc_fn_re(),
  theta400 =  df_c %>%  estimator_fn_re() %>% get_theta(),
  T400fe = df_c %>% construct_dm() %>% estimate_dm() %>% disc_fn_dm()
)
  
t_qdm %>% 
  select(-starts_with("theta")) %>% 
  gather(measure,value) %>% 
  mutate(method = ifelse(grepl("fe",measure),"FE","RE")) %>% 
  mutate(measure = gsub("fe","",measure)) %>% 
  mutate(measure = case_when(measure == "T2" ~ "T = 2",
                             measure == "T4" ~ "T = 4",
                             measure == "T10" ~ "T = 10",
                             measure == "T20" ~ "T = 20",
                             measure == "T400" ~ "T = 400")) %>% 
  rename(time_periods = measure) %>% 
  spread(method, value) %>% 
  mutate(time_periods = factor(time_periods, levels = c("T = 2","T = 4", "T = 10","T = 20" , "T = 400"))) %>% 
  arrange(time_periods) %>% 
  kable()