A curated list of math and statistics problems with detailed write-ups.
Adding noise to regression is equivalent to regularization
In this note, we explore adding noise to regression problems.

Multiplicative Gaussian noise

In linear / ridge regression, let $X, y$ be data; we will assume that $X$ is standardized, so that $(1/N)\text{diag}(X^TX)=I$. Consider the following multiplicative perturbation $$ x_{ij} \leftarrow \epsilon_{ij}x_{ij} $$ where $\epsilon_{ij}\sim\mathcal{N}(1,\sigma^2)$. We will demonstrate that this is equivalent to applying Tikhonov regularization to the OLS problem $y\sim X\beta$. In the infinite-data case, consider $$ \min_{\beta}\mathbb{E}\left[ \| y - (E\odot X)\beta \|^2 \right] $$ where $E=(\epsilon_{ij})$ is the Gaussian noise matrix and $\odot$ denotes the Hadamard product. ...
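Filling in the expansion (a sketch; we assume the $\epsilon_{ij}$ are mutually independent with mean $1$ and variance $\sigma^2$): writing $E\odot X = X + (E-\mathbf{1})\odot X$, the cross term vanishes in expectation, so

```latex
\mathbb{E}\left[\|y-(E\odot X)\beta\|^2\right]
= \|y-X\beta\|^2 + \sum_{i,j}\operatorname{Var}(\epsilon_{ij})\,x_{ij}^2\beta_j^2
= \|y-X\beta\|^2 + \sigma^2\sum_j \beta_j^2\sum_i x_{ij}^2
= \|y-X\beta\|^2 + N\sigma^2\|\beta\|^2,
```

where the last equality uses the standardization $\sum_i x_{ij}^2 = N$. This is exactly the ridge objective with penalty $N\sigma^2$.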
Conditional expectation is the unique minimizer
In this note, we briefly prove a well-known claim.

Question

Claim: let $X,Y$ be real-valued random variables with $\mathbb{E}[Y^2]<\infty$. Then, over measurable functions $f$, $\mathbb{E}[Y|X]$ is the (almost surely) unique minimizer of $$ \min_f\mathbb{E}\left[ (Y-f(X))^2 \right] $$

Solution

We expand, noting that the cross term $2\,\mathbb{E}\left[(Y-\mathbb{E}[Y|X])(\mathbb{E}[Y|X]-f(X))\right]$ vanishes by the tower property: $$ \mathbb{E}\left[ (Y-f(X))^2 \right] = \underbrace{\mathbb{E}\left[ (Y-\mathbb{E}[Y|X])^2 \right]}_{\text{indep. of $f$}} + \mathbb{E}\left[ (f(X)-\mathbb{E}[Y|X])^2 \right] $$ Minimizing $\mathbb{E}\left[ (Y-f(X))^2\right]$ over all $f$ is therefore equivalent to minimizing $\mathbb{E}\left[ (f(X)-\mathbb{E}[Y|X])^2\right]$. The second term is nonnegative, and equals zero precisely when $f(X) = \mathbb{E}[Y|X]$ almost surely. ...
Correlation changes after conditioning
Question If we have two features, Feature A and Feature B. In a large and sufficiently diverse population, you find that their correlation is $0$. If you recompute the correlation on a restricted subpopulation, should you still expect it to be $0$? Solution Not necessarily. Correlation measures linear association, and depends on the population being sampled. Even if Feature A and Feature B have correlation $0$ in the full population, the correlation can change after restricting to a subpopulation. This can happen because the two features may have a nonlinear relationship, and the full population may have enough symmetry that the overall correlation cancels out. ...
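As a concrete illustration (our own example, not from the original note): take $A\sim\mathcal{N}(0,1)$ and $B=A^2$. Over the full population the correlation is $0$ by symmetry, but restricting to the subpopulation $A>0$ makes it strongly positive.

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200_000)]
b = [x * x for x in a]   # nonlinear relationship; zero linear correlation overall

full = corr(a, b)        # close to 0 by symmetry of the normal
restricted = [(x, y) for x, y in zip(a, b) if x > 0]   # subpopulation A > 0
sub = corr([x for x, _ in restricted], [y for _, y in restricted])

print(round(full, 3), round(sub, 3))
```

The restricted correlation is large even though the full-population correlation vanishes.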
Reservoir sampling
Question

Given a large set or stream of values, design an algorithm to draw a uniform random sample.

Solution

If the dataset is very large or arrives as a stream, reservoir sampling applies. Goal: sample $k$ items uniformly at random from $n \gg 1$ values, without storing all $n$ values in memory.

Base case $k=1$: store the first item as the current sample. When the $i$-th item arrives $(i \ge 2)$, replace the current sample with it with probability $1/i$. After processing all items, each item is the sample with probability exactly $1/n$.

General $k$ case: put the first $k$ items into the reservoir. For item $i=k+1,\dots,n$, generate a random integer $j$ uniformly from $\{1,\dots,i\}$. If $j \le k$, replace the $j$-th element of the reservoir with the new item. After processing all $n$ items, each item is included in the reservoir with probability $k/n$. ...
Hypothesis testing for coin toss
We consider hypothesis testing in this note.

Question

What is a $p$-value?

Solution

A $p$-value is the probability, assuming the null hypothesis $H_0$ is true, of observing a test statistic at least as extreme as the one actually obtained. Suppose the observed test statistic is $t$, and under $H_0$ the statistic $T$ has a known reference distribution. For a right-tailed test, $$ p = \mathbb{P}(T \ge t \mid H_0). $$ For a left-tailed test, $$ p = \mathbb{P}(T \le t \mid H_0). $$ ...
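To tie this to the coin-toss setting of the title, here is a sketch (our own illustration; the function name and the numbers 60 heads out of 100 tosses are assumptions): testing $H_0: p = 1/2$ with a right-tailed exact binomial $p$-value.

```python
from math import comb

def binom_right_tail_p(heads, n, p0=0.5):
    """P(T >= heads | H0), where T ~ Binomial(n, p0)."""
    return sum(comb(n, t) * p0**t * (1 - p0)**(n - t) for t in range(heads, n + 1))

# observed: 60 heads in 100 fair-coin tosses
p = binom_right_tail_p(60, 100)
print(f"{p:.4f}")
```

A small $p$ here means that 60 or more heads would be unusual under a fair coin.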
Rotate images
Given an image stored in a 2D Python array, we provide code for rotation by different degrees.

Clockwise by 90 degrees: transpose, then reverse each row.

```python
def rotate(image):
    n = len(image)
    # transpose in place
    for i in range(n):
        for j in range(i + 1, n):
            image[i][j], image[j][i] = image[j][i], image[i][j]
    # reverse each row
    for i in range(n):
        row = image[i]
        l, r = 0, n - 1
        while l < r:
            row[l], row[r] = row[r], row[l]
            l += 1
            r -= 1
    return image
```

Counterclockwise by 90 degrees: reverse each row, then transpose. Applying either 90-degree rotation twice rotates by 180 degrees.

```python
def rotate_counterclockwise(image):
    n = len(image)
    # reverse each row
    for i in range(n):
        row = image[i]
        l, r = 0, n - 1
        while l < r:
            row[l], row[r] = row[r], row[l]
            l += 1
            r -= 1
    # transpose in place
    for i in range(n):
        for j in range(i + 1, n):
            image[i][j], image[j][i] = image[j][i], image[i][j]
    return image
```
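Equivalently, for a non-in-place variant (our own addition, allocating a new list rather than mutating the input), Python's `zip` gives a compact sketch:

```python
def rotate_cw(image):
    """Rotate a 2D list 90 degrees clockwise, returning a new list."""
    return [list(row) for row in zip(*image[::-1])]

def rotate_ccw(image):
    """Rotate a 2D list 90 degrees counterclockwise, returning a new list."""
    return [list(row) for row in zip(*image)][::-1]

image = [[1, 2],
         [3, 4]]
print(rotate_cw(image))   # [[3, 1], [4, 2]]
print(rotate_ccw(image))  # [[2, 4], [1, 3]]
```

This version also handles rectangular images, at the cost of extra memory.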
Max profit with bid and ask
We modify the popular "best time to buy and sell stock" problem series to allow differing buy (ask) and sell (bid) prices, with unlimited transactions.

```python
def max_profit(buy, sell):
    """buy[i]: price to buy on day i; sell[i]: price to sell on day i."""
    n = len(buy)
    dp_holding = -buy[0]   # best cash while holding a share after day 0
    dp_not_holding = 0     # best cash while not holding after day 0
    for i in range(1, n):
        tmp = dp_holding
        # either keep holding, or buy today out of yesterday's cash
        dp_holding = max(dp_holding, dp_not_holding - buy[i])
        # either stay out, or sell the share held up to yesterday
        dp_not_holding = max(tmp + sell[i], dp_not_holding)
    return dp_not_holding
```
Remove duplicate from unsorted array
We provide an implementation of removing duplicated values from an (unsorted) array, preserving first-occurrence order. Sorted version (link)

```python
def remove_duplicate(arr):
    seen = set()
    res = []
    for num in arr:
        if num not in seen:   # keep only the first occurrence
            seen.add(num)
            res.append(num)
    return res

print(remove_duplicate([1, 2, 2, 4, 4]))  # [1, 2, 4]
print(remove_duplicate([5, 4, 1, 5]))     # [5, 4, 1]
print(remove_duplicate([5, 2, 5, 1]))     # [5, 2, 1]
```

Time and space: $O(N)$. The extra space cannot be reduced to $O(1)$: since the array is unsorted, there is no ordering information telling us where a duplicate might be, so we must remember which values have already been encountered.
O(1) methods for test statistics
In this note, we provide code to compute useful statistics over a data stream in amortized $O(1)$ time per element, using a monotonic deque with $O(k)$ extra space.

Sliding window max and min

```python
from collections import deque

def sliding_window_maximum(nums, k):
    """
    nums: observed data stream
    k: window size (k <= len(nums))
    """
    res = []
    q = deque()  # indices with decreasing values; front holds the current max
    for i in range(k):
        num = nums[i]
        while q and nums[q[-1]] <= num:
            q.pop()
        q.append(i)
    res.append(nums[q[0]])
    for i in range(k, len(nums)):
        num = nums[i]
        # adjust window: drop indices that have fallen out
        if q and q[0] <= i - k:
            q.popleft()
        while q and nums[q[-1]] <= num:
            q.pop()
        q.append(i)
        res.append(nums[q[0]])
    return res

# test
nums = [1, 1, 2, 3, -1, 3, 2]
k = 2
# => [1, 2, 3, 3, 3, 3]
print(sliding_window_maximum(nums, k))

def sliding_window_minimum(nums, k):
    nums_neg = [-num for num in nums]
    neg_res = sliding_window_maximum(nums_neg, k)
    return [-num for num in neg_res]

nums = [1, 1, 2, 3, -1, 3, 2]
k = 2
# => [1, 1, 2, -1, -1, 2]
print(sliding_window_minimum(nums, k))
```

The outputs are the following ...
Expected loops in shoelaces
In this note, we solve a problem from Introduction to Probability (Blitzstein, Hwang).

Question

There are 100 shoelaces in a box. At each stage, you pick two random ends and tie them together. Either this results in a longer shoelace (if the two ends came from different pieces), or it results in a loop (if the two ends came from the same piece). What is the expected number of steps until everything is in loops, and the expected number of loops once everything is in loops?

Solution

Start with the small case: let $f(n)$ be the number of loops formed starting from $n$ shoelaces. $\mathbb{E}[f(1)] = 1$, since with probability 1 the two ends of the only shoelace are chosen and tied into a loop. Then, ...
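As a sanity check (our own simulation, not part of the original solution): each tying step removes exactly one free piece, so everything is in loops after exactly 100 steps, and with $m$ pieces remaining the chance of closing a loop is $1/(2m-1)$, giving expected loops $\sum_{k=1}^{100} \frac{1}{2k-1} \approx 3.28$.

```python
import random

def simulate_loops(n, trials=20000, seed=0):
    """Monte Carlo estimate of the expected number of loops from n shoelaces."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pieces = n          # number of non-loop pieces remaining
        loops = 0
        while pieces > 0:
            # picking 2 ends out of 2*pieces: P(same piece) = 1/(2*pieces - 1)
            if rng.random() < 1 / (2 * pieces - 1):
                loops += 1
            pieces -= 1     # either way, one free piece disappears
        total += loops
    return total / trials

exact = sum(1 / (2 * k - 1) for k in range(1, 101))
print(exact, simulate_loops(100))
```

The simulated mean agrees closely with the analytic sum.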