Matching in the Context of Network Data:
Statistical inference without SIENA and ERGMs

George G. Vega Yon

November 17, 2016

Context: Identifying Contagion in Networks

Spatial Autocorrelation Models (SAR)

Spatial Autocorrelation Models (SAR) (cont.)

General overview

What Spurious Correlation Looks Like

But before…

Source: Google Correlate

Rubin Causal Model (1978)

Rosenbaum and Rubin (1983b)


  1. Unconfoundedness \(W_i \perp (Y_i(0), Y_i(1))\;|X_i\) “Beyond \(X_i\), there are no (unobserved) characteristics of the individual associated both with the potential outcomes and treatment.”

  2. Overlap \(0 < \Pr(W_i=1|X_i=x) < 1,\;\forall x\) “The conditional distribution of \(X_i\) given \(W_i=0\) completely overlaps the conditional distribution of \(X_i\) given \(W_i=1\).” In other words, all treatment probabilities are strictly between 0 and 1.

    Unconfoundedness+Overlap = Strong Ignorability

Rosenbaum and Rubin (1983b) (cont.)

Usually, we would like to estimate the Average Treatment Effect (\(ATE\))

\[ \begin{align} ATE(x) &= E[Y_i(1) - Y_i(0)| X_i = x] \quad \mbox{(which we cannot estimate directly)}\\ & \mbox{From unconfoundedness we have} \\ &= E[Y_i(1) | W_i = 1, X_i = x] - E[Y_i(0) | W_i = 0, X_i = x] \\ & \mbox{Since we observe } Y_i = Y_i(W_i) \mbox{, and overlap guarantees both groups exist at every } x \mbox{, we get} \\ &= E[Y_i | W_i = 1, X_i = x] - E[Y_i | W_i = 0, X_i = x] \end{align} \]

Which is something that we can estimate
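
As an illustration of the argument above, the following is a minimal simulation sketch, not part of the original analysis. All numbers are assumptions chosen for the example: a single binary covariate `x`, treatment probabilities of 0.8 and 0.2, and a true treatment effect of 2. It compares the naive difference in means with the difference computed within each value of `x`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (assumed for illustration only):
# x confounds both the treatment w and the outcome y.
x = rng.binomial(1, 0.5, size=n)
p = np.where(x == 1, 0.8, 0.2)           # Pr(W = 1 | X = x), strictly in (0, 1)
w = rng.binomial(1, p)
y = 2.0 * w + 3.0 * x + rng.normal(size=n)  # true treatment effect is 2

# Naive comparison ignores x and is contaminated by the confounder.
naive = y[w == 1].mean() - y[w == 0].mean()


def conditional_difference(y, w, x, value):
    """E[Y | W=1, X=value] - E[Y | W=0, X=value], estimated by sample means."""
    stratum = x == value
    return y[stratum & (w == 1)].mean() - y[stratum & (w == 0)].mean()


# ATE(x) for each value of x, then averaged over the distribution of x.
ate_by_x = {v: conditional_difference(y, w, x, v) for v in (0, 1)}
ate = sum(ate_by_x[v] * np.mean(x == v) for v in (0, 1))

print(f"naive difference: {naive:.2f}")
print(f"ATE(x=0): {ate_by_x[0]:.2f}, ATE(x=1): {ate_by_x[1]:.2f}, ATE: {ate:.2f}")
```

Under these assumptions the naive difference should land well above 2, while the within-stratum differences should be close to 2, because conditioning on `x` removes the confounding.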

Propensity score

Nearest Neighbor Matching Estimator

  1. Roughly, for each individual \(i\), we find a set \(\mathcal{J}_M(i) \subset \{1,\dots,N\}\) containing the \(M\) individuals \(j\) with \(W_j \neq W_i\) (so we match on the opposite group) that are closest to \(i\) in terms of \(\|X_i - X_j\|\).

    The distance \(\|X_i - X_j\|\) can also be the difference in estimated propensity scores, \(|\hat p(X_i) - \hat p(X_j)|\), and the simplest case uses a single match, \(M=1\).

  2. Then, for each individual \(i\) we generate the following

    \[ \begin{align} \hat Y_i(0) & = Y_i\times1\{W_i=0\} + \left(\frac{1}{M}\sum_{j\in\mathcal{J}_M(i)}Y_j\right)\times 1\{W_i = 1\} \\ \hat Y_i(1) & = Y_i\times1\{W_i=1\} + \left(\frac{1}{M}\sum_{j\in\mathcal{J}_M(i)}Y_j\right)\times 1\{W_i = 0\} \\ \end{align} \]

  3. Then we estimate the \(ATE\) as \(\widehat{ATE} = \frac{1}{N}\sum_{i=1}^N\left(\hat Y_i(1) - \hat Y_i(0)\right)\)
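
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `nn_matching_ate` is hypothetical, matching is done with replacement, ties are broken by position in the data, and no bias correction or variance estimate is included.

```python
import numpy as np


def nn_matching_ate(y, w, x, m=1):
    """Nearest-neighbor matching estimate of the ATE (hypothetical helper).

    y: outcomes, w: 0/1 treatment indicator, x: covariates (or an estimated
    propensity score), m: number of matches per individual.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=int)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                       # treat a vector as one covariate

    n = y.shape[0]
    y0_hat = np.empty(n)
    y1_hat = np.empty(n)
    for i in range(n):
        # Step 1: the M closest individuals in the opposite treatment group.
        opposite = np.flatnonzero(w != w[i])
        dist = np.linalg.norm(x[opposite] - x[i], axis=1)
        nearest = opposite[np.argsort(dist, kind="stable")[:m]]

        # Step 2: keep the observed outcome, impute the missing one.
        imputed = y[nearest].mean()
        if w[i] == 1:
            y1_hat[i], y0_hat[i] = y[i], imputed
        else:
            y0_hat[i], y1_hat[i] = y[i], imputed

    # Step 3: average the individual differences.
    return float(np.mean(y1_hat - y0_hat))


# Toy data (made up): each treated unit has a control with the same x,
# and the outcome is x + 2 * w, so the estimate should equal 2.
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
w = np.array([1, 1, 1, 0, 0, 0])
y = x + 2.0 * w
ate = nn_matching_ate(y, w, x, m=1)
print(ate)
```

Passing a vector of estimated propensity scores as `x` gives the propensity-score version of the same estimator.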

Regression including propensity score

In a world where observations are statistically independent, Imbens and Wooldridge (2009) suggest estimating the following model

\[ \begin{align} &\min_{\alpha_0,\beta_0}\sum_{i:W_i=0}\frac{(Y_i - \alpha_0 - \beta_0'(X_i - \bar X))^2}{\hat p(X_i)}\\ \tag{20} \mbox{and} &\\ &\min_{\alpha_1,\beta_1}\sum_{i:W_i=1}\frac{(Y_i - \alpha_1 - \beta_1'(X_i - \bar X))^2}{1 - \hat p(X_i)} \end{align} \]

Network exposure

Context of Aral et al. (2009)

Multiple treatments: Aral et al. (2009)


  1. Probability of treatment: They used a logit model to estimate the propensity scores

    \[ p_{lt}(X_{it}) = \Pr\{W_{ilt} = 1|X_{it}\} = \frac{\exp(\alpha_{lt} + \beta_{lt}X_{it})}{1 + \exp(\alpha_{lt} + \beta_{lt}X_{it})} \]

  2. Removing outliers: For each individual \(i\) in the set of treated, they found a match \(j\) by solving the following problem: \(j = \arg\min_{j: W_{ilt}\neq W_{jlt}} |p_{lt}(X_i) - p_{lt}(X_j) |\)

    subject to \(|p_{lt}(X_i) - p_{lt}(X_j) |\leq 2\sigma_d\), where \(\sigma_d = \sqrt{\hat V(|p_{lt}(X_i) - p_{lt}(X_j) |)}\)

  3. Pseudo treatment effect: Then, out of the entire set of matches, they counted the number of individuals who had adopted the behavior (\(n_+\)) and the number who had not (\(n_-\)), and computed the ratio.
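
The matching rule in step 2 can be sketched as follows. This is an illustrative reading of the rule, not the authors' code. The function name `caliper_match` is hypothetical, matching is done with replacement, and \(\sigma_d\) is taken to be the sample standard deviation of the nearest-match distances across treated individuals, which is an assumption about how the variance is computed.

```python
import numpy as np


def caliper_match(pscore, w):
    """Match each treated unit to its nearest control on the propensity score,
    dropping pairs farther apart than 2 * sigma_d (hypothetical helper)."""
    pscore = np.asarray(pscore, dtype=float)
    w = np.asarray(w, dtype=int)
    treated = np.flatnonzero(w == 1)
    control = np.flatnonzero(w == 0)

    # |p_i - p_j| for every treated i (rows) and control j (columns).
    dist = np.abs(pscore[treated][:, None] - pscore[control][None, :])
    best = dist.argmin(axis=1)
    best_dist = dist[np.arange(treated.size), best]

    # Assumption: sigma_d is the standard deviation of the matched distances.
    sigma_d = best_dist.std(ddof=1)
    keep = best_dist <= 2.0 * sigma_d

    return list(zip(treated[keep].tolist(), control[best[keep]].tolist()))


# Made-up scores: the treated unit with score 0.95 has no close control
# and is removed by the caliper.
pscore = [0.20, 0.30, 0.40, 0.50, 0.95, 0.21, 0.31, 0.41, 0.51]
w = [1, 1, 1, 1, 1, 0, 0, 0, 0]
pairs = caliper_match(pscore, w)
print(pairs)
```

The returned pairs are (treated index, control index). Counting adopters among the matched individuals, as in step 3, would then operate on these indices.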

One important point: the authors do not discuss another key assumption of the Rubin Causal Model. The Stable Unit Treatment Value Assumption (SUTVA) states that treatment effects have no spillovers, i.e., one individual's treatment does not affect another individual's outcome (Sekhon 2007). In the dynamic part of the model, this can be assumed by arguing that any spillovers happen with time lags and thus won’t affect the outcomes at the current time.