In this note, we explore adding noise to regression problems.
Multiplicative Gaussian noise
In linear / ridge regression, let $X, y$ be data; we will assume that $X$ is standardized, thus $(1/N)\text{diag}(X^TX)=I$. Consider the following multiplicative perturbation
$$ x_{ij} \leftarrow \epsilon_{ij}x_{ij} $$ where $\epsilon_{ij}\sim\mathcal{N}(1,\sigma^2)$ i.i.d. We will demonstrate that, in expectation, this is equivalent to adding Tikhonov regularization to the OLS problem $y\sim X\beta$.
In the infinite data case, consider
$$ \min_{\beta}\mathbb{E}\left[ \| y - (E\odot X)\beta \|^2 \right] $$ where $E = (\epsilon_{ij})$ is the Gaussian noise matrix and $\odot$ denotes the Hadamard (entrywise) product.
We expand
$$ \begin{align*} \|y - (E\odot X)\beta\|^2 &= (y - (E\odot X)\beta)^{\top}(y - (E\odot X)\beta) \\ &= y^{\top}y - 2y^{\top}(E\odot X)\beta + \beta^{\top}(E\odot X)^{\top}(E\odot X)\beta \end{align*} $$
For the quadratic term, we define
$$ A := (E\odot X)^{\top}(E\odot X) $$ where each entry
$$ A_{ij} = \sum_k \epsilon_{ki}\epsilon_{kj}x_{ki}x_{kj} $$
Taking expectation on both sides, we have
$$ \begin{align*} \mathbb{E}\left[A_{ij}\right] &= \sum_k\mathbb{E}\left[\epsilon_{ki}\epsilon_{kj}\right]x_{ki}x_{kj} \end{align*} $$
For $i\neq j$, $\epsilon_{ki}$ and $\epsilon_{kj}$ are independent with mean $1$, so $\mathbb{E}\left[\epsilon_{ki}\epsilon_{kj}\right]=1$. For $i=j$, the two factors are the same variable, and $\mathbb{E}\left[\epsilon_{ki}^2\right]=\text{Var}(\epsilon_{ki})+\mathbb{E}[\epsilon_{ki}]^2=1+\sigma^2$. This implies that
$$ \begin{align*} \mathbb{E}[A] &= (\mathbf{1}\mathbf{1}^{\top} + \sigma^2I)\odot (X^{\top}X) \\ &= X^{\top}X + \sigma^2\,\text{diag}(X^{\top}X) \end{align*} $$
Finally,
$$ \begin{align*} \mathbb{E}\left[ \| y - (E\odot X)\beta \|^2 \right] &= \mathbb{E}\left[ y^Ty - 2y^T(E\odot X)\beta + \beta^TA\beta \right] \\ &= y^Ty - 2y^TX\beta + \beta^TX^TX\beta + \sigma^2\beta^T\text{diag}(X^TX)\beta \\ &= \|y - X\beta\|^2 + \sigma^2\| \Gamma\beta \|^2 \end{align*} $$ where $\Gamma := \text{diag}(X^TX)^{1/2}$. Since $X$ is standardized, $\text{diag}(X^TX)=N\cdot I$, so $\Gamma = \sqrt{N}\cdot I$ and $\sigma^2\|\Gamma\beta\|^2 = N\sigma^2\|\beta\|^2$. Therefore, adding multiplicative noise to the problem is equivalent to ridge regression with $\lambda = N\sigma^2$.
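As a sanity check, the equivalence can be verified numerically. The sketch below (using NumPy; the sample size, dimension, and $\sigma$ are illustrative choices) approximates the expected noisy objective by Monte Carlo averaging over noise draws, and compares its minimizer to the closed-form ridge solution with $\lambda = N\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, sigma = 2000, 3, 0.5

# Standardized design: zero-mean, unit-variance columns, so diag(X^T X) = N * I.
X = rng.standard_normal((N, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_normal(N)

# Monte Carlo estimate of the minimizer of E[ ||y - (E ⊙ X)β||² ]:
# average the normal-equation terms over many draws of E with entries N(1, σ²).
A = np.zeros((p, p))
b = np.zeros(p)
n_draws = 500
for _ in range(n_draws):
    E = 1.0 + sigma * rng.standard_normal((N, p))
    XE = E * X                      # Hadamard product E ⊙ X
    A += XE.T @ XE
    b += XE.T @ y
beta_noisy = np.linalg.solve(A / n_draws, b / n_draws)

# Closed-form ridge solution with lambda = N * sigma^2.
lam = N * sigma**2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(np.round(beta_noisy, 3))
print(np.round(beta_ridge, 3))
```

The two coefficient vectors agree up to Monte Carlo error, which shrinks as `n_draws` grows.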
Additive noise
We consider a simple linear regression case.
$$ y = \beta_0 + \beta_1x + e, \quad \mathbb{E}\left[e |x\right] = 0 $$
The infinite-data case gives
$$ \beta_1 = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}. $$
Given data $\{(x_i, y_i)\}$, define $z_i = x_i + \epsilon_i$ where $\epsilon_i\sim\mathcal{N}(0,\sigma^2)$ is independent of $X$ and $e$. The new regression coefficient would be
$$ \begin{align*} \tilde{\beta}_1 &= \frac{\text{Cov}(Y,Z)}{\text{Var}(Z)} \\ &= \frac{\text{Cov}(Y,X+\epsilon)}{\text{Var}(X+\epsilon)} \\ &= \frac{\text{Cov}(Y,X)}{\text{Var}(X) + \sigma^2} \\ &= \beta_1\cdot\frac{\text{Var}(X)}{\text{Var}(X)+\sigma^2} \end{align*} $$
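This attenuation factor is easy to check in simulation. A minimal sketch with NumPy (the particular variances below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta0, beta1 = 2.0, 3.0
sigma = 1.5                              # std of additive noise on the regressor

x = rng.normal(0.0, 2.0, n)              # Var(X) = 4
e = rng.standard_normal(n)               # regression error, E[e | x] = 0
y = beta0 + beta1 * x + e

z = x + rng.normal(0.0, sigma, n)        # noisy regressor Z = X + eps
beta1_tilde = np.cov(y, z)[0, 1] / np.var(z, ddof=1)

# Predicted attenuation: beta1 * Var(X) / (Var(X) + sigma^2)
predicted = beta1 * 4.0 / (4.0 + sigma**2)
print(beta1_tilde, predicted)
```

The fitted slope on the noisy regressor lands close to the predicted attenuated value, with sampling error of order $1/\sqrt{n}$.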
Therefore, $\sigma^2\rightarrow\infty$ will cause $\tilde{\beta}_1\rightarrow 0$, thus providing shrinkage. We will discuss the multiple linear regression case later.