# Finding (square) matrices

In this note, we discuss finding matrices. By that I mean: suppose one observes pairs of vectors \(\{(x_i, y_i)\}_{i=1}^m\) in \(\mathbb{R}^d\); one wishes to find the best matrix \(A\in\mathbb{R}^{d\times d}\) such that:

\begin{equation} y_i \approx Ax_i \end{equation}

for all \(i=1,2,\ldots, m\). A natural first step is to collect the vectors into matrices:

\[X = \begin{bmatrix} \mid & \mid & & \mid\\ x_1 & x_2 & \cdots & x_m \\ \mid & \mid & & \mid \end{bmatrix}, \qquad Y = \begin{bmatrix} \mid & \mid & & \mid\\ y_1 & y_2 & \cdots & y_m \\ \mid & \mid & & \mid \end{bmatrix}\]

Then one aims to find

\[Y \approx AX\]

## Unconstrained problem, full rank

Perhaps by solving:

\[\min_{A\in\mathbb{R}^{d\times d}}\| Y - AX \|_F^2\]

Suppose for simplicity that \(X\) has full row rank (so \(m \ge d\)). An SVD of \(X\) gives us:

\[X = U\Sigma V^T\]

Here \(U\in\mathbb{R}^{d\times d}\) and \(V\in \mathbb{R}^{m\times m}\) are orthogonal matrices, and \(\Sigma\in\mathbb{R}^{d\times m}\) is diagonal, containing the singular values (assumed sorted in decreasing order).

Then the solution is \(A^* = YX^{\dagger}\), where \(X^{\dagger} = V\Sigma^{\dagger}U^T\) is the Moore–Penrose pseudoinverse of \(X\). Since \(X\) has full row rank, \(X^{\dagger} = X^T(XX^T)^{-1}\) is a right inverse of \(X\), i.e. \(XX^{\dagger} = I\); the residual \(Y - A^*X = Y(I - X^{\dagger}X)\) vanishes exactly when the rows of \(Y\) lie in the row space of \(X\).
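To see why, set the gradient of the objective with respect to \(A\) to zero:

\[\nabla_A \, \| Y - AX \|_F^2 = -2\,(Y - AX)X^T = 0 \quad\Longrightarrow\quad A\,(XX^T) = YX^T,\]

and since \(XX^T\) is invertible under the full-row-rank assumption,

\[A^* = YX^T(XX^T)^{-1} = YX^{\dagger}.\]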

We might wish to use this matrix to predict things given a vector. The following MATLAB code implements just this: let \(u\) be a matrix of training snapshots from a time series, so that \(x_i = u_i\) and \(y_i = u_{i+1}\) are consecutive columns, and let \(u_0\) be some new observation. We generate a sequence of new observations after fitting.

```
function u_new = predict(u0, u, m)
% ----------
% u:  training data (d x n, one snapshot per column)
% u0: initial condition
% m:  number of snapshots to use
% ----------
[d, n] = size(u);
assert(n >= m+1, 'need at least m+1 snapshots');
% take m consecutive snapshot pairs
X = u(:, 1:m);
Y = u(:, 2:m+1);
% fit matrix by least squares
A = Y*pinv(X);
% generate m new vectors
u_new = zeros(d, m);
u_new(:, 1) = u0;
for i = 2:m
    u_new(:,i) = A*u_new(:,i-1);
end
end
```
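For readers following along without MATLAB, here is a minimal NumPy sketch of the same fit (the function name `predict` and the variable names are mine, mirroring the MATLAB above):

```python
import numpy as np

def predict(u0, u, m):
    """Fit Y ~ A X on m consecutive snapshot pairs, then roll out m steps."""
    d, n = u.shape
    assert n >= m + 1, "need at least m+1 snapshots"
    X = u[:, :m]        # x_i = u_i
    Y = u[:, 1:m+1]     # y_i = u_{i+1}
    A = Y @ np.linalg.pinv(X)   # least-squares fit A* = Y X^+
    # generate m new vectors starting from u0
    u_new = np.zeros((d, m))
    u_new[:, 0] = u0
    for i in range(1, m):
        u_new[:, i] = A @ u_new[:, i - 1]
    return u_new
```

On data generated exactly by a linear map, the fit recovers the dynamics and the rollout reproduces the trajectory.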

More generally, we might posit \(y_i \approx f(x_i)\) and minimize:

\[\min_{f}\sum_{i=1}^m\|y_i-f(x_i)\|_2^2\]

Let’s stick to linear \(f\) and say:

\[y_i \approx x_i + Bx_i\]

Then in matrix form:

\[Y-X \approx BX\]

whose solution follows from before: \(B^* = (Y-X)X^{\dagger}\).
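Note that this is the same model in disguise: since \(XX^{\dagger} = I\) under the full-row-rank assumption,

\[I + B^* = I + (Y-X)X^{\dagger} = I - XX^{\dagger} + YX^{\dagger} = YX^{\dagger} = A^*,\]

so the two parameterizations produce identical predictions.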

```
function u_new = predict(u0, u, m)
% ----------
% u:  training data (d x n, one snapshot per column)
% u0: initial condition
% m:  number of snapshots to use
% ----------
[d, n] = size(u);
assert(n >= m+1, 'need at least m+1 snapshots');
% take m consecutive snapshot pairs
X = u(:, 1:m);
Y = u(:, 2:m+1);
% fit matrix by least squares
B = (Y-X)*pinv(X);
% generate m new vectors
u_new = zeros(d, m);
u_new(:, 1) = u0;
for i = 2:m
    u_new(:,i) = u_new(:,i-1) + B*u_new(:,i-1);
end
end
```

We can add other terms, such as a constant offset:

\[y_i \approx Ax_i + b\]

and find \(A, b\). This is readily solved as before, because we can rewrite:

\[Y \approx \begin{bmatrix} A & b \end{bmatrix} \cdot \begin{bmatrix} X\\ \mathbf{1}^T \end{bmatrix} =: \tilde{A}\tilde{X}\]

where \(\mathbf{1}\) is a vector of all \(1\)’s.
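Solving \(\tilde{A}^* = Y\tilde{X}^{\dagger}\) and splitting off the last column recovers \(A^*\) and \(b^*\). Equivalently, setting the gradient of \(\|Y - AX - b\mathbf{1}^T\|_F^2\) with respect to \(b\) to zero shows the offset matches the sample means:

\[b^* = \bar{y} - A^*\bar{x}, \qquad \bar{x} = \frac{1}{m}\sum_{i=1}^m x_i, \quad \bar{y} = \frac{1}{m}\sum_{i=1}^m y_i,\]

so \(A^*\) can also be obtained by running the plain fit on mean-centered data.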

```
function u_new = predict(u0, u, m)
% ----------
% u:  training data (d x n, one snapshot per column)
% u0: initial condition
% m:  number of snapshots to use
% ----------
[d, n] = size(u);
assert(n >= m+1, 'need at least m+1 snapshots');
% take m consecutive snapshot pairs, append a row of ones
X = u(:, 1:m);
X_tilde = [X; ones(1, m)];
Y = u(:, 2:m+1);
% fit augmented matrix by least squares
A_tilde = Y*pinv(X_tilde);
A = A_tilde(:,1:end-1);
b = A_tilde(:,end);
% generate m new vectors
u_new = zeros(d, m);
u_new(:, 1) = u0;
for i = 2:m
    u_new(:,i) = A*u_new(:,i-1) + b;
end
end
```
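The same translation to NumPy applies to the affine fit; a sketch (the name `predict_affine` is mine):

```python
import numpy as np

def predict_affine(u0, u, m):
    """Fit Y ~ A X + b 1^T via the stacked least-squares trick, then roll out."""
    d, n = u.shape
    assert n >= m + 1, "need at least m+1 snapshots"
    X = u[:, :m]
    Y = u[:, 1:m+1]
    X_tilde = np.vstack([X, np.ones((1, m))])   # append a row of ones
    A_tilde = Y @ np.linalg.pinv(X_tilde)       # [A, b] in a single solve
    A, b = A_tilde[:, :-1], A_tilde[:, -1]
    # generate m new vectors starting from u0
    u_new = np.zeros((d, m))
    u_new[:, 0] = u0
    for i in range(1, m):
        u_new[:, i] = A @ u_new[:, i - 1] + b
    return u_new
```

Again, on data generated exactly by an affine map the rollout reproduces the trajectory, provided \(\tilde{X}\) has full row rank.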

We can also write:

\[y_i \approx x_i + Bx_i + b\]

which has the same form as before.
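And again the two parameterizations agree: with \(\begin{bmatrix} B^* & b^* \end{bmatrix} = (Y-X)\tilde{X}^{\dagger}\), note that \(X = \begin{bmatrix} I & 0 \end{bmatrix}\tilde{X}\), so \(X\tilde{X}^{\dagger} = \begin{bmatrix} I & 0 \end{bmatrix}\) when \(\tilde{X}\) has full row rank, and therefore

\[\begin{bmatrix} I + B^* & b^* \end{bmatrix} = (Y-X)\tilde{X}^{\dagger} + X\tilde{X}^{\dagger} = Y\tilde{X}^{\dagger} = \begin{bmatrix} A^* & b^* \end{bmatrix},\]

matching the previous model's predictions.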

```
function u_new = predict(u0, u, m)
% ----------
% u:  training data (d x n, one snapshot per column)
% u0: initial condition
% m:  number of snapshots to use
% ----------
[d, n] = size(u);
assert(n >= m+1, 'need at least m+1 snapshots');
% take m consecutive snapshot pairs, append a row of ones
X = u(:, 1:m);
X_tilde = [X; ones(1, m)];
Y = u(:, 2:m+1);
% fit augmented matrix by least squares
B_tilde = (Y-X)*pinv(X_tilde);
B = B_tilde(:,1:end-1);
b = B_tilde(:,end);
% generate m new vectors
u_new = zeros(d, m);
u_new(:, 1) = u0;
for i = 2:m
    u_new(:,i) = u_new(:,i-1) + B*u_new(:,i-1) + b;
end
end
```

We will continue the code in another note, where we find slightly more interesting matrices. There, it will also not always be necessary to assume that \(X\) has full rank.
