In this note, we discuss finding matrices. By that I mean: suppose one observes pairs of vectors in \(\mathbb{R}^d\), \(\{(x_i, y_i)\}_{i=1}^m\), and one wishes to find the best matrix \(A\in\mathbb{R}^{d\times d}\) such that:

\begin{equation} y_i \approx Ax_i \end{equation}

for all \(i=1,2,\ldots, m\). The natural starting point is to collect the vectors into the matrices \(X, Y\in\mathbb{R}^{d\times m}\):

\[X = \begin{bmatrix} \mid & \mid & \mid & \mid\\ x_1 & x_2 & \cdots & x_m \\ \mid & \mid & \mid & \mid \end{bmatrix}, Y = \begin{bmatrix} \mid & \mid & \mid & \mid\\ y_1 & y_2 & \cdots & y_m \\ \mid & \mid & \mid & \mid \end{bmatrix}\]

Then one aims to find

\[Y \approx AX\]

Unconstrained problem, full rank


Perhaps by solving:

\[\min_{A\in\mathbb{R}^{d\times d}}|| Y - AX ||_F^2\]
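
One way to see why this has a closed-form solution: the Frobenius norm decouples over the rows of \(A\), so each row \(a_j^T\) of \(A\) solves an ordinary least-squares problem

\[\min_{a_j\in\mathbb{R}^{d}} \left\| Y_{j,:}^T - X^T a_j \right\|_2^2,\]

where \(Y_{j,:}\) denotes the \(j\)-th row of \(Y\).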

Suppose for simplicity that \(X\) has full column rank, i.e. \(\operatorname{rank}(X) = m\) (which requires \(m \le d\)). An SVD of \(X\) gives us:

\[X = U\Sigma V^T\]

Here \(U\in\mathbb{R}^{d\times d}\) and \(V\in \mathbb{R}^{m\times m}\) are orthogonal matrices, and \(\Sigma\in\mathbb{R}^{d\times m}\) is a rectangular diagonal matrix containing the singular values (assumed sorted in decreasing order).

Then the solution is \(A^* = YX^{\dagger}\), where \(X^{\dagger}\) is the (left) pseudoinverse of \(X\). Under the full-column-rank assumption, \(X^{\dagger}X = I_m\), so the fit is exact: \(Y - A^*X = Y - YX^{\dagger}X = 0\).
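
In terms of the SVD above, the pseudoinverse, and hence the minimizer, can be written explicitly as

\[X^{\dagger} = V\Sigma^{\dagger}U^T, \qquad A^* = YX^{\dagger} = YV\Sigma^{\dagger}U^T,\]

where \(\Sigma^{\dagger}\in\mathbb{R}^{m\times d}\) is obtained by transposing \(\Sigma\) and inverting its nonzero entries. This is exactly the pseudoinverse that MATLAB's pinv computes.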

We might wish to use this matrix to make predictions from a given vector. The following MATLAB code does just this: let \(u\) be a set of training data (snapshots stored as columns) and \(u_0\) a new initial observation; after fitting, we generate a sequence of predictions by applying \(A\) repeatedly.

function u_new = predict(u0, u, m)
    % ----------
    % u: training data
    % u0: initial condition
    % m: number of snapshots to use
    % ----------
    d = size(u, 1);
    % X holds snapshots 1..m, Y the one-step-shifted snapshots 2..m+1
    X = u(:, 1:m);
    Y = u(:, 2:m+1);
    % fit matrix
    A = Y*pinv(X);
    % generate m new vectors
    u_new = zeros(d, m);
    u_new(:, 1) = u0;
    for i = 2:m
        u_new(:,i) = A*u_new(:,i-1);
    end
end
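
As a quick sanity check, here is a minimal sketch of how one might call predict on synthetic snapshots produced by a known linear map (the matrix A_true and the sizes below are made up purely for illustration). Since the data are exactly linear, re-predicting the training trajectory from its first snapshot should give essentially zero error.

% illustrative synthetic data from a known linear map (A_true is made up)
d = 20;                          % state dimension
m = 10;                          % number of snapshot pairs (here m <= d)
A_true = 0.9 * orth(randn(d));   % scaled random orthogonal matrix
u = zeros(d, m+1);
u(:, 1) = randn(d, 1);
for k = 2:m+1
    u(:, k) = A_true * u(:, k-1);
end
% fit on the snapshots, then re-predict the training trajectory
u_new = predict(u(:, 1), u, m);
disp(norm(u_new - u(:, 1:m), 'fro'))   % should be close to zero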

More generally, we might say \(y_i \approx f(x_i)\),

and minimize:

\[\min_{f}\sum_{i=1}^m||y_i-f(x_i)||_2^2\]

Let’s stick to linear \(f\) and say:

\[y_i \approx x_i + Bx_i\]

Then in matrix form:

\[Y-X \approx BX\]

whose solution follows from before: \(B^* = (Y-X)X^{\dagger}\). Note that this models the increment \(y_i - x_i\) rather than \(y_i\) itself.

function u_new = predict(u0, u, m)
    % ----------
    % u: training data
    % u0: initial condition
    % m: number of snapshots to use
    % ----------
    d = size(u, 1);
    % X holds snapshots 1..m, Y the one-step-shifted snapshots 2..m+1
    X = u(:, 1:m);
    Y = u(:, 2:m+1);
    % fit matrix
    B = (Y-X)*pinv(X);
    % generate m new vectors
    u_new = zeros(d, m);
    u_new(:, 1) = u0;
    for i = 2:m
        u_new(:,i) = u_new(:,i-1) + B*u_new(:,i-1);
    end
end

We can add other terms as well, for example a constant offset \(b\in\mathbb{R}^d\):

\[y_i \approx Ax_i + b\]

and solve for both \(A\) and \(b\). This reduces to the previous problem because we can rewrite it as:

\[Y \approx \begin{bmatrix} A & b \end{bmatrix} \cdot \begin{bmatrix} X\\ \mathbf{1}^T \end{bmatrix} =: \tilde{A}\tilde{X}\]

where \(\mathbf{1}\) is a vector of all \(1\)’s.
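
This augmented system is again an unconstrained least-squares problem of the same shape, with \(\tilde{X}\in\mathbb{R}^{(d+1)\times m}\), so the solution follows as before:

\[\tilde{A}^* = Y\tilde{X}^{\dagger} \in \mathbb{R}^{d\times(d+1)}.\]

The first \(d\) columns of \(\tilde{A}^*\) give \(A^*\) and the last column gives \(b^*\), which is exactly how the code below splits A_tilde.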

function u_new = predict(u0, u, m)
    % ----------
    % u: training data
    % u0: initial condition
    % m: number of snapshots to use
    % ----------
    d = size(u, 1);
    % X holds snapshots 1..m, Y the one-step-shifted snapshots 2..m+1
    X = u(:, 1:m);
    % append a row of ones so the last column of A_tilde becomes b
    X_tilde = [X; ones(1, m)];
    Y = u(:, 2:m+1);
    % fit matrix
    A_tilde = Y*pinv(X_tilde);
    A = A_tilde(:,1:end-1);
    b = A_tilde(:,end);
    % generate m new vectors
    u_new = zeros(d, m);
    u_new(:, 1) = u0;
    for i = 2:m
        u_new(:,i) = A*u_new(:,i-1) + b;
    end
end

We can also write:

\[y_i \approx x_i + Bx_i + b\]

which has the same form as before.
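
Combining the two previous rewrites, the matrix form is

\[Y - X \approx \begin{bmatrix} B & b \end{bmatrix}\begin{bmatrix} X\\ \mathbf{1}^T \end{bmatrix} =: \tilde{B}\tilde{X}, \qquad \tilde{B}^* = (Y-X)\tilde{X}^{\dagger},\]

which the code below implements, splitting B_tilde into \(B\) and \(b\).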

function u_new = predict(u0, u, m)
    % ----------
    % u: training data
    % u0: initial condition
    % m: number of snapshots to use
    % ----------
    d = size(u, 1);
    % X holds snapshots 1..m, Y the one-step-shifted snapshots 2..m+1
    X = u(:, 1:m);
    % append a row of ones so the last column of B_tilde becomes b
    X_tilde = [X; ones(1, m)];
    Y = u(:, 2:m+1);
    % fit matrix
    B_tilde = (Y-X)*pinv(X_tilde);
    B = B_tilde(:,1:end-1);
    b = B_tilde(:,end);
    % generate m new vectors
    u_new = zeros(d, m);
    u_new(:, 1) = u0;
    for i = 2:m
        u_new(:,i) = u_new(:,i-1) + B*u_new(:,i-1) + b;
    end
end

We will continue the code in another note, where we find slightly more interesting matrices. Note also that it is not always necessary to assume \(X\) is full rank.