Why do we say $(dB_t)^2 \approx dt$?
Let $B_t$ be standard Brownian motion.
In Itô calculus we track contributions up to order $dt$.
Although ordinary calculus would ignore second-order terms, in stochastic calculus
the term $(dB_t)^2$ is of order $dt$, not negligible.
To see this:
$$ dB_t \approx B_{t+\Delta t} - B_t \sim \mathcal{N}(0, \Delta t). $$
Let $Z_{\Delta t} \sim \mathcal{N}(0, \Delta t)$.
Then $(dB_t)^2$ behaves like $Z_{\Delta t}^2$ and
$$ \mathbb{E}[Z_{\Delta t}^2] = \Delta t. $$
Next compute the variance:
$$ \mathrm{Var}(Z_{\Delta t}^2) = \mathbb{E}[Z_{\Delta t}^4] - (\Delta t)^2. $$
The fourth moment of a centered normal with variance $\Delta t$ is
$$ \mathbb{E}[Z_{\Delta t}^4] = 3 (\Delta t)^2, $$
so
$$ \mathrm{Var}(Z_{\Delta t}^2) = 3(\Delta t)^2 - (\Delta t)^2 = 2(\Delta t)^2. $$
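The two moments above can be checked numerically. The following is a minimal Monte Carlo sketch using only the Python standard library; the function name `squared_increment_moments`, the sample size, and the seed are illustrative choices, not part of the derivation.

```python
import random

def squared_increment_moments(dt, n_samples=200_000, seed=0):
    """Monte Carlo estimates of E[Z^2] and Var(Z^2) for Z ~ N(0, dt)."""
    rng = random.Random(seed)
    sigma = dt ** 0.5  # standard deviation of an increment over a step dt
    squares = [rng.gauss(0.0, sigma) ** 2 for _ in range(n_samples)]
    mean = sum(squares) / n_samples
    var = sum((s - mean) ** 2 for s in squares) / n_samples
    return mean, var

dt = 0.01
mean, var = squared_increment_moments(dt)
print(f"E[Z^2]   ~ {mean:.6f}  (theory: dt = {dt:.6f})")
print(f"Var(Z^2) ~ {var:.3e}  (theory: 2 dt^2 = {2 * dt**2:.3e})")
```

Up to sampling noise, the estimates should land near $\Delta t$ and $2(\Delta t)^2$ respectively.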
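As $\Delta t \to 0$,
- the mean is of order $\Delta t$,
- the standard deviation, $\sqrt{2}\,\Delta t$, is also of order $\Delta t$,
so a single squared increment is not close to its mean in relative terms. The concentration appears only after summing. Over $[0, t]$ there are $n = t/\Delta t$ independent increments, and
$$ \mathbb{E}\Big[\sum_{i} (\Delta B_{t_i})^2\Big] = n\,\Delta t = t, \qquad \mathrm{Var}\Big(\sum_{i} (\Delta B_{t_i})^2\Big) = 2n(\Delta t)^2 = 2t\,\Delta t \to 0. $$
Thus, once accumulated over an interval, $(dB_t)^2$ makes a deterministic first-order contribution $\approx dt$.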
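One way to see the deterministic contribution is to sum squared increments over a fixed interval $[0, T]$ and watch the spread across simulated paths shrink as the grid is refined. This is a rough sketch under assumed settings: the helper `qv_stats`, the uniform grid, the path count, and the seed are all hypothetical choices made for illustration.

```python
import random

def qv_stats(T, n_steps, n_paths=100, seed=1):
    """Mean and std, across simulated paths, of sum (Delta B)^2 over [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    sigma = dt ** 0.5  # each increment is N(0, dt)
    totals = []
    for _ in range(n_paths):
        totals.append(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n_steps)))
    mean = sum(totals) / n_paths
    std = (sum((x - mean) ** 2 for x in totals) / n_paths) ** 0.5
    return mean, std

T = 1.0
for n_steps in (10, 100, 1_000, 10_000):
    mean, std = qv_stats(T, n_steps)
    theory_std = (2 * T * T / n_steps) ** 0.5  # sqrt(2 * T * dt)
    print(f"n={n_steps:>6}  mean={mean:.4f}  std={std:.4f}  sqrt(2 T dt)={theory_std:.4f}")
```

The mean should stay near $T$ at every resolution, while the standard deviation should fall roughly like $\sqrt{2T\,\Delta t}$.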
Given this, the discrete Itô expansion looks like:
$$ f(B_{t+\Delta t}) = f(B_t) + f'(B_t)\,\Delta B_t + \frac{1}{2} f''(B_t)\, (\Delta B_t)^2 + o(\Delta t). $$
Since $(\Delta B_t)^2 \approx \Delta t$, this becomes
$$ f(B_{t+\Delta t}) - f(B_t) \approx f'(B_t)\,\Delta B_t + \frac{1}{2} f''(B_t)\, \Delta t. $$
Passing to differential notation:
$$ df = f'(B_t)\, dB_t + \frac{1}{2} f''(B_t)\, dt. $$
This shows that $df$ has drift coefficient $\tfrac12 f''(B_t)$ and diffusion coefficient $f'(B_t)$.
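The formula for $df$ can be sanity-checked on a simulated path by comparing $f(B_T) - f(B_0)$ with the accumulated right-hand side, with and without the $\tfrac12 f''$ term. The sketch below assumes the test function $f(x) = x^3$, a uniform grid, and left-endpoint evaluation; the name `ito_vs_chain_rule` and all parameters are hypothetical.

```python
import random

def ito_vs_chain_rule(f, df, d2f, T=1.0, n_steps=2_000, seed=0):
    """Return (error with the 1/2 f'' dt term, error without it) on one path."""
    rng = random.Random(seed)
    dt = T / n_steps
    sigma = dt ** 0.5
    b = 0.0           # B_0 = 0
    ito_sum = 0.0     # sum of f'(B) dB + 1/2 f''(B) dt
    chain_sum = 0.0   # sum of f'(B) dB only (ordinary chain rule)
    for _ in range(n_steps):
        db = rng.gauss(0.0, sigma)
        # Coefficients are evaluated at the left endpoint, before updating b.
        chain_sum += df(b) * db
        ito_sum += df(b) * db + 0.5 * d2f(b) * dt
        b += db
    exact = f(b) - f(0.0)
    return exact - ito_sum, exact - chain_sum

def cube(x): return x ** 3
def cube_d1(x): return 3 * x ** 2
def cube_d2(x): return 6 * x

errors = [ito_vs_chain_rule(cube, cube_d1, cube_d2, seed=s) for s in range(50)]
ito_rms = (sum(e[0] ** 2 for e in errors) / len(errors)) ** 0.5
chain_rms = (sum(e[1] ** 2 for e in errors) / len(errors)) ** 0.5
print(f"RMS error with 1/2 f'' dt term: {ito_rms:.4f}")
print(f"RMS error without it:           {chain_rms:.4f}")
```

With the correction term the error should shrink as `n_steps` grows; without it, the error stays of order one no matter how fine the grid is.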
Finally, note that
$$ dB_t\,dt \approx 0, $$
because $\Delta B_t \sim \mathcal{N}(0, \Delta t)$ has typical size $\sqrt{\Delta t}$, so $\Delta B_t \cdot \Delta t$ is of order $(\Delta t)^{3/2} = o(\Delta t)$ and drops out at first order in $dt$.
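The same scaling can be observed numerically: even the sum of $|\Delta B_{t_i}|\,\Delta t$ over $[0, T]$, which bounds the accumulated cross term, vanishes as the grid is refined. A small sketch, with the hypothetical helper `cross_term_total` and an arbitrary seed:

```python
import math
import random

def cross_term_total(T, n_steps, seed=2):
    """Sum of |Delta B_i| * dt over a uniform grid on [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    sigma = math.sqrt(dt)
    return sum(abs(rng.gauss(0.0, sigma)) * dt for _ in range(n_steps))

T = 1.0
for n_steps in (100, 10_000, 100_000):
    dt = T / n_steps
    # E|N(0, dt)| = sqrt(2 dt / pi), so the expected total is T * sqrt(2 dt / pi).
    expected = T * math.sqrt(2 * dt / math.pi)
    print(f"n={n_steps:>7}  total={cross_term_total(T, n_steps):.5f}  expected={expected:.5f}")
```

The total should decay like $\sqrt{\Delta t}$, in contrast to the sum of $(\Delta B_{t_i})^2$, which stays near $T$.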