This is a note that documents an idea that we are exploring.


Consider the Van der Pol random ODE (RODE) model:

\[\begin{cases}\dot{x}(t) = y(t) \\ \dot{y}(t) = \mu(1-x(t)^2)y(t) - x(t) + \sigma\cdot \eta(t) \end{cases}\]

where \(\eta\) is an Ornstein-Uhlenbeck (OU) process representing noisy perturbations of the system; in particular, it is a scaled temporal transform of the standard Wiener process. One may append the stochastic differential equation for \(\eta\) to the system above, and the probability density of the augmented system is then described by a Fokker-Planck equation, joint in all state variables \(x, y, \eta\).
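For concreteness, here is a minimal Monte Carlo sketch of the system above; the OU parameters, initial conditions, and explicit Euler time-stepping are illustrative assumptions rather than part of the model specification.

```python
import numpy as np

def simulate_vdp_rode(n_paths=10_000, t_end=10.0, dt=1e-3,
                      mu=1.0, sigma=0.5, theta_ou=1.0, sigma_ou=1.0, seed=0):
    """Euler discretization of the Van der Pol RODE driven by an OU process.

    The OU process eta is taken to solve d(eta) = -theta_ou * eta dt + sigma_ou dW;
    all parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(t_end / dt)
    x = 0.1 * rng.standard_normal(n_paths)          # arbitrary initial conditions
    y = 0.1 * rng.standard_normal(n_paths)
    eta = rng.standard_normal(n_paths) * sigma_ou / np.sqrt(2 * theta_ou)  # stationary OU draw
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        x_new = x + dt * y
        y_new = y + dt * (mu * (1 - x**2) * y - x + sigma * eta)
        eta = eta - dt * theta_ou * eta + sigma_ou * dW   # Euler-Maruyama for the OU noise
        x, y = x_new, y_new
    return x, y, eta
```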

Suppose we are only interested in the probability density of \(y\). Recall:

\[f_{x,y,\eta} = f_{x,\eta|y}\cdot f_{y}\]

where \(f_{x,\eta|y}\) denotes the conditional density of \((x,\eta)\) given \(y\). We then notice that an equation for \(f_y\) alone can be derived by integrating out the variables \(x, \eta\) in the joint Fokker-Planck equation, which should look like the following:

\begin{equation} \label{eqn:reduced-order-pdf} \partial_{t} f_y + \partial_{y}\bigg[ (\mu \cdot y \cdot \mathbb{E}[1-x^2|y] - \mathbb{E}[x|y] + \sigma \cdot \mathbb{E}[\eta | y])f_y \bigg] = 0 \end{equation}

or generically:

\[\partial_{t}f_y + \mathcal{L}f_y = 0\]

where \(\mathcal{L}\) is a linear operator.
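For completeness, here is a sketch of how \eqref{eqn:reduced-order-pdf} is obtained, assuming the joint density decays fast enough at infinity that the \(\partial_x\), \(\partial_\eta\), and diffusion terms vanish upon integration. Integrating the joint Fokker-Planck equation over \(x\) and \(\eta\) (and exchanging integration with \(\partial_y\)) leaves only the \(y\)-flux term:

\[\iint \partial_y\Big[\big(\mu(1-x^2)y - x + \sigma\eta\big)\, f_{x,y,\eta}\Big]\, dx\, d\eta = \partial_y\Big[\big(\mu\, y\, \mathbb{E}[1-x^2\,|\,y] - \mathbb{E}[x\,|\,y] + \sigma\,\mathbb{E}[\eta\,|\,y]\big) f_y\Big],\]

since \(\iint g(x,\eta)\, f_{x,y,\eta}\, dx\, d\eta = \mathbb{E}[g(x,\eta)\,|\,y]\, f_y\).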

To solve the above equation, there is an issue we need to deal with: the conditional expectations \(\mathbb{E}[x|y], \mathbb{E}[1-x^2|y]\) (and \(\mathbb{E}[\eta|y]\)) are not known analytically. Approximations are possible in several ways. One may make assumptions about the noise, under which the integrals reduce to something more tractable. One may abstract away the math with “data-driven methods”, replacing the differential terms with other differential terms via Pawula’s theorem (which does not always apply). One can also approximate the conditional expectation terms with simulated scatter-plot data, which tends to place far fewer assumptions on the problem (see, for instance, synthetic numerical examples in the related literature).
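As a rough illustration of the scatter-plot approach, one can estimate the conditional expectations by binned sample means over a grid of \(y\)-values; the binning below is an assumption, and kernel regression or splines would be natural drop-in replacements.

```python
import numpy as np

def binned_conditional_mean(y_samples, z_samples, n_bins=50):
    """Estimate E[z | y] on a grid of y-bins via sample averages (crude nonparametric estimator)."""
    edges = np.linspace(y_samples.min(), y_samples.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(y_samples, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=z_samples, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    cond_mean = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    return centers, cond_mean

# Example usage with samples from the simulation sketch above:
# x, y, eta = simulate_vdp_rode()
# centers, ex_given_y = binned_conditional_mean(y, x)          # E[x | y]
# _, e1mx2_given_y = binned_conditional_mean(y, 1 - x**2)      # E[1 - x^2 | y]
```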

I am assuming that for practical engineering applications, the goal is to reduce the amount of “sitting in front of your desk with a pen and thinking for half a day”-type work, and to solve numerical problems that are rather generic (i.e. I throw you a system of stochastic equations, you give me the answer). Therefore, perhaps the methodologies referenced above are too problem-dependent (we would have to repeat the analytic derivations every time we see a new problem). To avoid that, let us do regression, that is, solve:

\begin{equation} \label{eqn:regression-problem} \mathbb{E}[z(x, y, \eta)|y] \approx m(y;\theta) \end{equation}

The above has implicit time dependence. \(m(y;\theta)\) is a parameterized function approximator fitted to observed values \(\{z_i, y_i\}_{i=1}^{N}\). A good question is why this is necessarily better than blindly plugging data \(\{y_i\}_{i=1}^N\) into a kernel density estimator. Showing this rigorously requires a few more posts; the short answer lies in “physics-informed” methods, which interpolate between using hand calculations to derive formulas and using nonparametric methods without caring about formulas at all.
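As a minimal instance of \eqref{eqn:regression-problem}, one could take \(m(y;\theta)\) to be a polynomial fitted by least squares; the basis and degree below are purely illustrative choices, not a recommendation.

```python
import numpy as np

def fit_conditional_expectation(y_samples, z_samples, degree=5):
    """Fit m(y; theta) ~ E[z | y] by least squares over a polynomial basis in y."""
    theta = np.polynomial.polynomial.polyfit(y_samples, z_samples, degree)
    return lambda y: np.polynomial.polynomial.polyval(y, theta)

# e.g. m_eta = fit_conditional_expectation(y, eta)   # approximates E[eta | y] at the current time
```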

Suppose for now that we indeed use the regression method in \eqref{eqn:regression-problem}. Inevitably there will be errors, and that is a problem, since we are then solving the PDE \eqref{eqn:reduced-order-pdf} with the wrong coefficients. At best (setting aside pure-math existence arguments), the numerical solution will accumulate errors over the time integration.

Suppose:

\begin{equation} \mathbb{E}[z|y] = m(y;\theta) + e(y; t) \end{equation}

where \(e\) is the (time-dependent) residual of the estimation problem \eqref{eqn:regression-problem}. The residual enters the PDE formally as follows:

\[\partial_t f_y + \widehat{\mathcal{L}}f_y = \underbrace{-\widehat{\mathcal{E}}f_y}_{\text{``model defect''}}\]

where \(\mathcal{L} = \widehat{\mathcal{L}} + \widehat{\mathcal{E}}\), i.e. we make some error in the linear operator due to estimation. (For instance, if the total error in the estimated drift is \(e(y;t)\), then \(\widehat{\mathcal{E}}f_y = \partial_y\big[e(y;t)\,f_y\big]\).)

One way to deal with this defect is to use observed values of \(\{y_i\}\) to correct what is predicted by the Fokker-Planck/reduced-order equation. This is the same idea as using PDE solutions to solve inverse problems with physics-informed neural nets. We propose to learn the defect term as follows; put \(S(y,t) := -\widehat{\mathcal{E}}f_y\).

If \(S(y,t)\) were known exactly, one would use operator splitting (e.g. Lie-Trotter) to solve:

\[\begin{cases} \partial_t f_y^{(1)} + \widehat{\mathcal{L}}f_y^{(1)} = 0 \\ f_y^{(1)}(y, t_n) = f_y(y,t_n) \end{cases}\]

where \(f_y(\cdot, t_n)\) is the initial condition at \(t=t_n\), and we propagate the PDE to time \(t_{n+1}\). Then solve

\[\begin{cases} \partial_t f_y^{(2)} = S(y,t) \\ f_y^{(2)}(y, t_n) = f_y^{(1)}(y, t_{n+1}) \end{cases}\]

\(f_y^{(2)}(y, t_{n+1})\) is our numerical solution at \(t=t_{n+1}\).
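A sketch of one such split step on a uniform \(y\)-grid is given below; the first-order upwind discretization of the transport term, the forward-Euler treatment of the source, and the function names are all assumptions made for illustration.

```python
import numpy as np

def lie_trotter_step(f, y_grid, drift, source, t_n, dt):
    """One Lie-Trotter step for  d_t f + d_y[ drift(y,t) * f ] = source(y,t).

    Step 1 advances the transport part with a first-order upwind scheme;
    step 2 adds the source ("model defect") with forward Euler.  Purely a
    sketch: stability needs dt small enough for the CFL condition, and the
    boundaries simply freeze the flux at the edges.
    """
    dy = y_grid[1] - y_grid[0]
    v = drift(y_grid, t_n)
    flux = v * f
    # upwind divergence of the flux: backward difference where v > 0, forward otherwise
    dflux = np.where(v > 0,
                     np.diff(flux, prepend=flux[0]),
                     np.diff(flux, append=flux[-1]))
    f1 = f - dt / dy * dflux              # transport sub-step: d_t f + d_y[v f] = 0
    f2 = f1 + dt * source(y_grid, t_n)    # defect sub-step:    d_t f = S
    return f2
```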

So far we have reviewed the Lie-Trotter scheme. In the practical problem described above, however, we do not know \(S(y,t)\). We can nevertheless reverse-engineer it by considering the following:

\[\begin{cases} \partial_t f_y^{(1)} + \widehat{\mathcal{L}}f_y^{(1)} = 0 \\ f_y^{(1)}(y, t_n) = \widehat{f}_y(y,t_n) \end{cases}\]

where \(\widehat{f}_y\) is an empirical distribution from observed data \(\{y_i\}\). Then the model defect can be computed just from numerical differentiation:

\[\frac{\widehat{f}_{y}(y, t_{n+k}) - f_y^{(1)}(y, t_{n+k})}{k\Delta t} = \frac12(\underbrace{S(y, t_n)}_{\text{available at step $n$}} + S(y, t_{n+k}))\]

then:

\[S(y, t_{n+k}) = 2\bigg(\frac{\widehat{f}_{y}(y, t_{n+k}) - f_y^{(1)}(y, t_{n+k})}{k\Delta t}\bigg) - S(y, t_n)\]
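In code, this update is a one-liner once the transport-only prediction \(f_y^{(1)}(\cdot, t_{n+k})\) and the empirical density \(\widehat{f}_y(\cdot, t_{n+k})\) (e.g. a kernel density estimate of observed samples) are available on the grid; the variable names below are illustrative.

```python
def update_defect(f_hat_next, f1_next, S_prev, k, dt):
    """Recover S(y, t_{n+k}) from the trapezoidal relation above.

    f_hat_next : empirical density \hat{f}_y(., t_{n+k}) on the y-grid
    f1_next    : transport-only prediction f_y^{(1)}(., t_{n+k})
    S_prev     : defect S(., t_n) available from the previous step
    """
    return 2.0 * (f_hat_next - f1_next) / (k * dt) - S_prev
```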

This method is documented in my related papers (2024); however, its theoretical properties are pending thorough investigation.

Potential Use Cases

  • One observes some histogram (e.g. a sample attribute from a population), suspects that this histogram changes in time according to some PDE model, and wishes to make predictions cheaply.

  • Cheaply obtaining samples, by acceptance-rejection sampling, of a pre-defined quantity of interest that depends on a set of stochastic constraints, and computing probabilistic profiles (e.g. moments) that are not attainable with brute-force simulation, e.g. damages in rare cascading failures.