Learning the Defect in Reduced-order PDF Method
This is a note that documents an idea that we are exploring.
Consider the Van der Pol random ODE (RODE) model:
\[\begin{cases}\dot{x}(t) &= y(t) \\ \dot{y}(t) &= \mu(1-x(t)^2)y(t) - x(t) + \sigma\cdot \eta(t) \end{cases}\]where \(\eta\) is an Ornstein-Uhlenbeck (OU) process representing noisy perturbations of the system. In particular, it is a scaled temporal transform of the standard Wiener process. One may append the stochastic differential equation for \(\eta\) to the above system, whose joint probability density in all state variables \(x, y, \eta\) is then described by the Fokker-Planck equation.
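To ground the setup, here is a minimal Monte Carlo sketch of this system by Euler-Maruyama. All parameter values, the initial ensemble, and the specific OU form \(d\eta = -\theta_{\mathrm{ou}}\,\eta\,dt + s_{\mathrm{ou}}\,dW\) are illustrative assumptions, not choices from this note:

```python
# Minimal Monte Carlo sketch of the Van der Pol RODE above (Euler-Maruyama).
# Parameter values and the OU form d(eta) = -theta_ou*eta*dt + s_ou*dW
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5                   # Van der Pol stiffness, noise amplitude
theta_ou, s_ou = 1.0, 1.0              # OU mean-reversion rate, diffusion
dt, n_steps, n_paths = 1e-3, 5_000, 10_000

x = rng.normal(0.0, 0.1, n_paths)      # assumed initial ensemble
y = rng.normal(0.0, 0.1, n_paths)
eta = rng.normal(0.0, s_ou / np.sqrt(2 * theta_ou), n_paths)  # stationary OU draw

for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    x_new = x + y * dt
    y_new = y + (mu * (1.0 - x**2) * y - x + sigma * eta) * dt
    eta += -theta_ou * eta * dt + s_ou * dW
    x, y = x_new, y_new

# A histogram of `y` now approximates the marginal f_y at time T = n_steps*dt.
```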
Suppose we are only interested in the probability density of \(y\). Recall:
\[f_{x,y,\eta} = f_{x,\eta|y}\cdot f_{y}\]where \(f_{x,\eta|y}\) denotes a conditional density. Then we notice that we may derive an equation for \(f_y\) alone by integrating out the variables \(x, \eta\) in the joint Fokker-Planck equation, which should look like the following:
\begin{equation} \label{eqn:reduced-order-pdf} \partial_{t} f_y + \partial_{y}\bigg[ (\mu \cdot y \cdot \mathbb{E}[1-x^2|y] - \mathbb{E}[x|y] + \sigma \cdot \mathbb{E}[\eta | y])f_y \bigg] = 0 \end{equation}
or generically:
\[\partial_{t}f_y + \mathcal{L}f_y = 0\]where \(\mathcal{L}\) is a linear operator.
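To see where the conditional expectations in \eqref{eqn:reduced-order-pdf} come from, consider a representative drift term \(g(x,y,\eta)\) appearing under \(\partial_y\) in the joint Fokker-Planck equation and integrate over \(x\) and \(\eta\):
\[\int\!\!\int \partial_{y}\big[g\, f_{x,y,\eta}\big]\,dx\,d\eta = \partial_{y}\bigg[f_y \int\!\!\int g\, f_{x,\eta|y}\,dx\,d\eta\bigg] = \partial_{y}\big[\mathbb{E}[g\,|\,y]\,f_y\big],\]while the \(\partial_x\)- and \(\partial_\eta\)-derivative terms integrate to zero, assuming the joint density decays at infinity.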
To solve the above equation, there is an issue that we need to deal with: the conditional expectations \(\mathbb{E}[x|y], \mathbb{E}[1-x^2|y], \mathbb{E}[\eta|y]\) are not known analytically. Approximations are possible either by making simplifying assumptions about the noise, or by estimating these quantities from data.
I am assuming that for practical engineering applications, the goal is to reduce the amount of “sitting in front of your desk with a pen and thinking for half a day”-type of work, and to solve numerical problems that are rather generic (i.e. I throw you a system of stochastic equations, you give me the answer). Therefore, perhaps the methodology should be data-driven: estimate the unknown conditional expectations by regression,
\begin{equation} \label{eqn:regression-problem} \mathbb{E}[z(x, y, \xi)|y] \approx m(y;\theta) \end{equation}
The above has implicit time dependence. \(m(y;\theta)\) is a parameterized function approximator that fits observed values \(\{z_i, y_i\}_{i=1}^{N}\). A good question is why this is necessarily better than blindly plugging data \(\{y_i\}_{i=1}^N\) into a kernel density estimator. Showing this rigorously requires a few more posts; however, the reasoning underlies “physics-informed” methods, which interpolate between using hand calculations to derive formulas, and using nonparametric methods and not caring about formulas at all.
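As a concrete (hypothetical) instance of \eqref{eqn:regression-problem}, one could take \(m(y;\theta)\) to be a low-degree polynomial fit to Monte Carlo samples at a fixed time; the basis and degree below are illustrative assumptions, and any smooth regressor could be substituted:

```python
# A hypothetical instance of the regression step: fit m(y; theta) ~= E[z|y]
# at a fixed time by least squares on a polynomial basis. The degree is an
# illustrative assumption.
import numpy as np

def fit_conditional_mean(y_samples, z_samples, degree=5):
    theta = np.polyfit(y_samples, z_samples, degree)
    return np.poly1d(theta)             # callable m(y; theta)

# e.g., with Monte Carlo samples (x_i, y_i) of the Van der Pol system:
#   m1 = fit_conditional_mean(y, 1.0 - x**2)   # ~ E[1 - x^2 | y]
#   m2 = fit_conditional_mean(y, x)            # ~ E[x | y]
```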
Suppose for now that we indeed use the regression method in \eqref{eqn:regression-problem}. Inevitably there will be errors, and that is a problem, since we are then solving the PDE \eqref{eqn:reduced-order-pdf} with the wrong coefficients. At best (setting aside pure math-type existence arguments), the numerical solution will have errors accumulating over the time integration.
Suppose:
\begin{equation} \mathbb{E}[z|y] = m(y;\theta) + e(y; t) \end{equation}
where \(e\) is the (time-dependent) residual of the estimation problem \eqref{eqn:regression-problem}. The residual enters the PDE formally as follows:
\[\partial_t f_y + \widehat{\mathcal{L}}f_y = \underbrace{-\widehat{\mathcal{E}}f_y}_{\text{``model defect''}}\]where \(\mathcal{L} = \widehat{\mathcal{L}} + \widehat{\mathcal{E}}\), i.e. we make some error in the linear operator due to estimation.
One way to deal with this defect is to use observed values of \(\{y_i\}\) to correct what is predicted by the Fokker-Planck equation/reduced-order equation. This is the same idea as using PDE solutions to solve inverse problems with physics-informed neural networks. We propose to learn the defect term as follows; put \(S(y,t) := -\widehat{\mathcal{E}}f_y\).
If \(S(y,t)\) were known exactly, one would use operator splitting (e.g. Lie-Trotter) to solve:
\[\begin{cases} \partial_t f_y^{(1)} + \widehat{\mathcal{L}}f_y^{(1)} = 0 \\ f_y^{(1)}(y, t_n) = f_y(y,t_n) \end{cases}\]where \(f_y(y,t_n)\) is the initial condition at \(t=t_n\), and we propagate the PDE to time \(t_{n+1}\). Then solve
\[\begin{cases} \partial_t f_y^{(2)} = S(y,t) \\ f_y^{(2)}(y, t_n) = f_y^{(1)}(y, t_{n+1}) \end{cases}\]where \(f_y^{(2)}(y, t_{n+1})\) is our numerical solution at \(t=t_{n+1}\).
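A minimal sketch of one such splitting step on a uniform grid in \(y\), assuming a first-order upwind discretization of the conservative drift term and forward Euler in time (both discretization choices and the zero-flux boundaries are assumptions, not prescribed by the scheme above):

```python
# Sketch of one Lie-Trotter step on a uniform grid in y. Sub-step 1 advects
# f_y under the estimated operator (conservative form, first-order upwind);
# sub-step 2 applies the source S by forward Euler.
import numpy as np

def lie_trotter_step(f, y_grid, drift, S, t, dt):
    """Advance f_y(., t_n) to f_y(., t_n + dt); `drift` is the estimated
    coefficient mu*y*E[1-x^2|y] - E[x|y] + sigma*E[eta|y]."""
    dy = y_grid[1] - y_grid[0]
    a = drift(y_grid)
    flux = a * f
    # first-order upwind divergence d(flux)/dy, zero flux outside the grid
    fl = np.concatenate(([0.0], flux))          # left-padded flux
    fr = np.concatenate((flux, [0.0]))          # right-padded flux
    dflux = np.where(a >= 0.0, flux - fl[:-1], fr[1:] - flux) / dy
    f1 = f - dt * dflux                         # sub-step 1: advection
    return f1 + dt * S(y_grid, t)               # sub-step 2: source term
```

As usual for explicit upwind schemes, stability requires the CFL restriction \(\max_y |a(y)|\,\Delta t/\Delta y \le 1\).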
So far we have reviewed the Lie-Trotter scheme. But in the practical problem that we described, we do not know \(S(y,t)\). However, we can reverse-engineer it by considering the following:
\[\begin{cases} \partial_t f_y^{(1)} + \widehat{\mathcal{L}}f_y^{(1)} = 0 \\ f_y^{(1)}(y, t_n) = \widehat{f}_y(y,t_n) \end{cases}\]where \(\widehat{f}_y\) is an empirical density estimated from observed data \(\{y_i\}\). Then the model defect can be computed just from numerical differentiation:
\[\frac{\widehat{f}_{y}(y, t_{n+k}) - f_y^{(1)}(y, t_{n+k})}{k\Delta t} = \frac12(\underbrace{S(y, t_n)}_{\text{available at step $n$}} + S(y, t_{n+k}))\]then:
\[S(y, t_{n+k}) = 2\bigg(\frac{\widehat{f}_{y}(y, t_{n+k}) - f_y^{(1)}(y, t_{n+k})}{k\Delta t}\bigg) - S(y, t_n)\]This method is documented in my related papers (2024).
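A sketch of this update, assuming \(\widehat{f}_y\) at \(t_{n+k}\) is obtained by a Gaussian kernel density estimate on the observed samples (the KDE and its default bandwidth are assumptions):

```python
# Sketch of the defect update: compare a model-only forecast f1 at t_{n+k}
# with a kernel density estimate of observed samples, then invert the
# trapezoid relation above. Gaussian KDE with SciPy's default bandwidth is
# an illustrative assumption.
import numpy as np
from scipy.stats import gaussian_kde

def update_defect(S_prev, f1_forecast, y_obs, y_grid, k, dt):
    """Return S(., t_{n+k}) on y_grid, given S(., t_n) = S_prev."""
    f_hat = gaussian_kde(y_obs)(y_grid)         # empirical density at t_{n+k}
    rate = (f_hat - f1_forecast) / (k * dt)     # finite difference in time
    return 2.0 * rate - S_prev                  # trapezoid-rule inversion
```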
Potential Use Cases
- One observes some histogram (e.g. a sample attribute from a population), suspects that this histogram changes in time according to some PDE model, and wishes to make predictions cheaply.
- One wishes to cheaply obtain samples of a pre-defined quantity of interest, which depends on a set of stochastic constraints, by acceptance-rejection sampling, and to compute probabilistic profiles (e.g. moments) that are infeasible with brute-force simulation, e.g. damages in rare cascading failures (a minimal sampling sketch follows below).
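For the second use case, here is a minimal sketch of acceptance-rejection sampling from a density known only on a grid (e.g. the reduced-order PDE solution), assuming a uniform proposal over the grid's support; function and variable names are illustrative:

```python
# Sketch: acceptance-rejection sampling from a gridded density f_y with a
# uniform proposal over the grid support (an assumption).
import numpy as np

def rejection_sample(f_vals, y_grid, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = y_grid[0], y_grid[-1]
    f_max = f_vals.max()                        # envelope for the uniform proposal
    out = []
    while len(out) < n_samples:
        y = rng.uniform(lo, hi)                 # propose
        f_at_y = np.interp(y, y_grid, f_vals)   # interpolate the gridded density
        if rng.uniform() <= f_at_y / f_max:     # accept with prob f(y)/f_max
            out.append(y)
    return np.array(out)
```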