Finding (square) matrices
In this note, we discuss finding matrices. By that I mean: suppose one observes pairs of vectors in \(\mathbb{R}^d\), \(\{x_i, y_i\}_{i=1}^m\), and one wishes to find the best matrix \(A\in\mathbb{R}^{d\times d}\) such that:
\begin{equation} y_i \approx Ax_i \end{equation}
for all \(i=1,2,\ldots, m\). A natural first step is to collect the vectors into matrices:
\[X = \begin{bmatrix} \mid & \mid & \mid & \mid\\ x_1 & x_2 & \cdots & x_m \\ \mid & \mid & \mid & \mid \end{bmatrix}, Y = \begin{bmatrix} \mid & \mid & \mid & \mid\\ y_1 & y_2 & \cdots & y_m \\ \mid & \mid & \mid & \mid \end{bmatrix}\]Then one aims to find
\[Y \approx AX\]
Unconstrained problem, full rank
Perhaps by solving the least-squares problem:
\[\min_{A\in\mathbb{R}^{d\times d}}|| Y - AX ||_F^2\]Suppose for simplicity that \(X\) has full rank. An SVD of \(X\) gives:
\[X = U\Sigma V^T\]where \(U\in\mathbb{R}^{d\times d}\) and \(V\in \mathbb{R}^{m\times m}\) are orthogonal matrices and \(\Sigma\in\mathbb{R}^{d\times m}\) is (rectangular) diagonal, containing the singular values (assumed sorted in decreasing order).
The minimizer is then \(A^* = YX^{\dagger}\), where \(X^{\dagger} = V\Sigma^{\dagger}U^T\) is the pseudoinverse of \(X\) and \(\Sigma^{\dagger}\) inverts the nonzero singular values. When \(X\) has full column rank (\(m \le d\)), \(X^{\dagger}\) is a left inverse of \(X\), so \(X^{\dagger}X = I\) and the fit is exact: \(Y - A^*X = Y - YX^{\dagger}X = 0\).
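As a quick sanity check, here is a minimal MATLAB sketch (synthetic data, arbitrarily chosen dimensions) comparing the pseudoinverse assembled from the SVD with MATLAB's built-in pinv:
% minimal sketch: build X^dagger from the SVD and compare with pinv
% (synthetic data; with random entries, X has full rank almost surely)
d = 5; m = 8;
X = randn(d, m);
Y = randn(d, d) * X;                  % generate Y from some "true" matrix
[U, S, V] = svd(X, 'econ');           % economy-size SVD
X_dag = V * diag(1 ./ diag(S)) * U';  % invert the nonzero singular values
A1 = Y * X_dag;
A2 = Y * pinv(X);
norm(A1 - A2, 'fro')                  % agreement up to machine precision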
We might wish to use this matrix to make predictions. The following MATLAB code implements just this: let \(u\) be a matrix of training snapshots (one state per column) and let \(u_0\) be an initial condition, e.g., a new observation. Taking consecutive snapshots as the pairs, \(x_i = u_i\) and \(y_i = u_{i+1}\), we fit the matrix and then generate a sequence of new states.
function u_new = predict(u0, u, m)
% ----------
% u: training data
% u0: initial condition
% m: number of snapshots to use
% ----------
[d, n] = size(u);
% take m snapshots
X = u(:, 1:m);
Y = u(:, 2:m+1);
% fit matrix
A = Y*pinv(X);
% generate m new vectors
u_new = zeros(d, m);
u_new(:, 1) = u0;
for i = 2:m
u_new(:,i) = A*u_new(:,i-1);
end
end
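For instance, with a hypothetical snapshot matrix u holding at least m+1 states, one might call it as follows (the data here is just a placeholder):
% hypothetical usage of predict on placeholder data
m = 50;
u = randn(100, m + 1);             % placeholder training data, one state per column
u_new = predict(u(:, 1), u, m);    % restart the trajectory from the first state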
More generally, we might say \(y_i \approx f(x_i)\)
and minimize:
\[\min_{f}\sum_{i=1}^m||y_i-f(x_i)||_2^2\]Let’s stick to linear \(f\) and say:
\[y_i \approx x_i + Bx_i\]Then in matrix form:
\[Y-X \approx BX\]whose solution follows from before: \(B^* = (Y-X)X^{\dagger}\).
function u_new = predict(u0, u, m)
% ----------
% u: training data
% u0: initial condition
% m: number of snapshots to use
% ----------
[d, n] = size(u);
% take m snapshots
X = u(:, 1:m);
Y = u(:, 2:m+1);
% fit matrix
B = (Y-X)*pinv(X);
% generate m new vectors
u_new = zeros(d, m);
u_new(:, 1) = u0;
for i = 2:m
u_new(:,i) = u_new(:,i-1) + B*u_new(:,i-1);
end
end
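As an aside, the two fits are related: if \(X\) has full row rank, so that \(XX^{\dagger} = I\), then \(I + B^* = I + YX^{\dagger} - XX^{\dagger} = YX^{\dagger} = A^*\), and the two predictors coincide. A minimal sketch checking this on synthetic data (sizes chosen so that \(m \ge d\)):
% minimal check (assumption: m >= d, so X has full row rank almost surely
% and X*pinv(X) = I, making the two parameterizations agree)
d = 3; m = 10;
u = randn(d, m + 1);
X = u(:, 1:m);  Y = u(:, 2:m+1);
A = Y * pinv(X);
B = (Y - X) * pinv(X);
norm(A - (eye(d) + B), 'fro')      % ~ machine precision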
We can also add other terms, for instance a constant offset:
\[y_i \approx Ax_i + b\]and find both \(A\) and \(b\). This is readily solved as before, because we can rewrite:
\[Y \approx \begin{bmatrix} A & b \end{bmatrix} \cdot \begin{bmatrix} X\\ \mathbf{1}^T \end{bmatrix} =: \tilde{A}\tilde{X}\]where \(\mathbf{1}\) is a vector of all \(1\)’s.
function u_new = predict(u0, u, m)
% ----------
% u: training data
% u0: initial condition
% m: number of snapshots to use
% ----------
[d, n] = size(u);
% take m snapshots
X = u(:, 1:m);
X_tilde = [X; ones(1, m)]; % append a row of ones
Y = u(:, 2:m+1);
% fit matrix
A_tilde = Y*pinv(X_tilde);
A = A_tilde(:,1:end-1);
b = A_tilde(:,end);
% generate m new vectors
u_new = zeros(d, m);
u_new(:, 1) = u0;
for i = 2:m
u_new(:,i) = A*u_new(:,i-1) + b;
end
end
We can also write:
\[y_i \approx x_i + Bx_i + b\]which has the same form as before.
function u_new = predict(u0, u, m)
% ----------
% u: training data
% u0: initial condition
% m: number of snapshots to use
% ----------
[d, n] = size(u);
% take m snapshots
X = u(:, 1:m);
X_tilde = [X; ones(1, m)]; % append a row of ones
Y = u(:, 2:m+1);
% fit matrix
B_tilde = (Y-X)*pinv(X_tilde);
B = B_tilde(:,1:end-1);
b = B_tilde(:,end);
% generate m new vectors
u_new = zeros(d, m);
u_new(:, 1) = u0;
for i = 2:m
u_new(:,i) = u_new(:,i-1) + B*u_new(:,i-1) + b;
end
end
We will continue the code in another note, where we find slightly more interesting matrices. Also, it is not always necessary to assume \(X\) is full rank.
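Indeed, pinv already handles a rank-deficient \(X\) by discarding singular values below a tolerance, so the fits above remain well-defined; a minimal sketch with deliberately rank-deficient synthetic data:
% minimal sketch: pinv truncates small singular values, so the least-squares
% fit is still well-defined when X is rank-deficient
d = 4; m = 10;
X = randn(d, m);
X(3, :) = X(1, :) + X(2, :);       % force rank(X) = 3
Y = randn(d, d) * X;
A = Y * pinv(X);                   % minimum-norm least-squares solution
norm(Y - A*X, 'fro')               % ~ 0 here, since Y was generated from X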