Nonlinear Inversion


In the previous sections we showed how observations from a linear system can be inverted to generate a candidate for the underlying model $m$ that produced the data. The model was found by minimizing an objective function composed of a misfit and a regularization function. When both terms were represented as a sum of squares, the objective function was quadratic and the solution could be found by solving a linear system of equations. In practice, things are often more complicated. This occurs when the mapping in the forward problem, $d = \mathcal{F}[m]$, is non-linear, or when a non-quadratic regularization or misfit function is desired. In that case the objective function is non-quadratic and likely has multiple local minima in addition to the desired global minimum. There are many ways to attack this problem and some excellent references are (e.g. XXXX). In our work we will use iterative techniques: we start at a guessed solution and compute a perturbation step that takes us toward the nearest local minimum.

In this chapter we …..

Linear Inversion Review

The inverse problem is set up to minimize the objective function $\phi = \phi_d + \beta\phi_m$, with the data misfit function $\phi_d = \sum_{j=1}^{N}\left(\frac{\mathcal{F}_j(m) - d_j^{obs}}{\varepsilon_j}\right)^2$. For the linear problem the forward mapping is $\mathbf{d} = \mathbf{Gm}$, and we make use of the quadratic regularization $\int_v \left(m - m_{ref}\right)^2 \, dv$ so that the problem can be solved in one step.
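To make the one-step linear solution concrete, here is a minimal numerical sketch with a made-up $G$, uncertainties, and observed data (none of which correspond to a real survey). Setting the gradient of $\phi$ to zero, with $m_{ref} = 0$, gives a single linear system for $m$.

```python
import numpy as np

# A small synthetic linear problem: d = G m (nothing physical is implied).
np.random.seed(0)
N, M = 10, 20                       # number of data and model parameters
G = np.random.randn(N, M)
m_true = np.zeros(M)
m_true[8] = 1.0
eps = 0.05 * np.ones(N)             # assumed data uncertainties
d_obs = G @ m_true + eps * np.random.randn(N)

# phi(m) = ||Wd (G m - d_obs)||^2 + beta ||m||^2,  with Wd = diag(1/eps)
Wd = np.diag(1.0 / eps)
beta = 1e-1

# Setting grad(phi) = 0 gives one linear system:
#   (G^T Wd^T Wd G + beta I) m = G^T Wd^T Wd d_obs
A = G.T @ Wd.T @ Wd @ G + beta * np.eye(M)
rhs = G.T @ Wd.T @ Wd @ d_obs
m_rec = np.linalg.solve(A, rhs)
```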

Non-linear Inversion

The inverse problem becomes non-linear in any of the following cases:

  • when the forward operation $\mathcal{F}[m]$ is non-linear
  • when the data misfit $\phi_d$ is not defined using an $l_2$-norm, e.g. $\sum_i\left|\frac{\mathcal{F}_i[m] - d_i}{\varepsilon_i}\right|$
  • when the model objective function is not quadratic

Forward

The forward problem is defined as $d = \mathcal{F}[m]$. If the forward operation is linear, then $\mathbf{d} = \mathbf{Gm}$; if it is non-linear, then $\mathcal{F}[am_1 + bm_2] \neq a\mathcal{F}[m_1] + b\mathcal{F}[m_2]$. Examples of non-linear problems include seismic, Maxwell's first-order wave equation, and DC resistivity.
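A quick numerical way to see the distinction is to test superposition. The sketch below uses two toy forward operators invented purely for illustration: a matrix multiply, which is linear, and a map involving an elementwise exponential, which is not.

```python
import numpy as np

np.random.seed(1)
G = np.random.randn(5, 4)

def F_linear(m):
    # Linear forward: a plain matrix-vector product.
    return G @ m

def F_nonlinear(m):
    # A toy non-linear forward (purely illustrative, not a physical simulation).
    return G @ np.exp(m)

m1, m2 = np.random.randn(4), np.random.randn(4)
a, b = 2.0, -0.5

for F in (F_linear, F_nonlinear):
    lhs = F(a * m1 + b * m2)        # apply F to a linear combination
    rhs = a * F(m1) + b * F(m2)     # linear combination of the responses
    print(F.__name__, "satisfies superposition:", np.allclose(lhs, rhs))
```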

Inverse Problem

We now solve an optimization problem to minimize the objective function $\phi(m) = \phi_d + \beta\phi_m(m)$. Step 1 is to discretize the PDE and solve the forward problem; Step 2 is to iterate until a suitable model is found.

Non-linear Optimization

Consider a single variable $x$ and a function $f$. We want to minimize the function $f(x)$. If $f$, shown below, is the quadratic function

$$\begin{aligned} f(x) &= \frac{1}{4}x^2 - 3x + 9 \\ &= \left(\frac{1}{2}x - 3\right)^2 \end{aligned}$$

then we can take the derivative, set it equal to 0, and solve for $x$.

$$\begin{aligned} f'(x) &= \frac{1}{2}x - 3 = 0 \\ x &= 6 \end{aligned}$$

We know that $x = 6$ is a minimum because $f'(x) = 0$ and $f''(x) > 0$ there.
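As a quick check of this arithmetic (a throwaway symbolic computation, nothing more):

```python
from sympy import symbols, diff, solve, Rational

x = symbols("x")
f = Rational(1, 4) * x**2 - 3 * x + 9

crit = solve(diff(f, x), x)       # where f'(x) = 0
print(crit, diff(f, x, 2))        # prints [6] and 1/2, a positive second derivative
```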

Suppose we have a function that is not quadratic, such as

$$f(x) = \left(\frac{1}{2}x - 3\right)^2 + ax^3 + bx^4$$

where there are now multiple minima, as shown below.
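To see the practical consequence, the sketch below minimizes this $f$ from a few different starting points. The coefficients $a$ and $b$ are arbitrary values chosen only so that more than one minimum exists; the basin you end up in depends on where you start.

```python
import numpy as np
from scipy.optimize import minimize

# Arbitrary coefficients chosen only so that f has more than one minimum.
a, b = 1.0, 0.05

def f(x):
    return (0.5 * x - 3) ** 2 + a * x**3 + b * x**4

# The same local optimizer started from different points can land in different
# minima; this is the basic difficulty with non-quadratic objectives.
for x0 in (-10.0, 0.0, 5.0):
    res = minimize(lambda v: f(v[0]), x0=[x0], method="BFGS")
    print(f"start {x0:6.1f}  ->  minimum at x = {res.x[0]:7.3f}, f = {res.fun:9.3f}")
```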

Newton’s Method

  1. Begin with an initial guess of the minimum, $x_k$, along the non-quadratic function $f$ we want to minimize.

  2. Approximate $f$ with a local quadratic $\hat{f}$ at $x_k$ and solve for a step $\delta x$ that minimizes $\hat{f}$. The local quadratic $\hat{f}$ is indicated in the example below by the black dashed parabola.

    $$\begin{aligned} \hat{f}(x_k + \delta x) &= f(x_k) + f'(x_k)\,\delta x + \frac{1}{2}f''(x_k)\,\delta x^2 + \cancel{\mathcal{O}(\delta x^3)} \\ \frac{d\hat{f}}{d(\delta x)} &= f'(x_k) + f''(x_k)\,\delta x = 0 \\ f''(x_k)\,\delta x &= -f'(x_k) \\ \delta x &= -\frac{f'(x_k)}{f''(x_k)} \end{aligned}$$

  3. Update the guess: $x_{k+1} = x_k + \delta x$

Quadratic convergence: $\left|x_{k+1} - x^*\right| < c\left|x_k - x^*\right|^2$, where $x^*$ is the minimizer.

Things can go wrong with $\delta x = -\frac{f'(x_k)}{f''(x_k)}$.

If we move in the wrong direction, we likely have negative curvature ($f''(x) < 0$) and should instead step along the negative gradient, $\delta x = -c\,f'(x)$, where $c$ is a constant chosen so that the objective decreases, $f(x + \delta x) < f(x)$. It is also possible to move in the correct direction but with the wrong curvature $f''(x)$ (the local quadratic is a poor approximation), indicating that our step length $\alpha$ is too large. We can reduce $\alpha$ and scale the step, so the updated guess becomes $x_{k+1} = x_k + \alpha\,\delta x$ with $\alpha < 1$.

Convergence Conditions

  • $\left|f'(x)\right| <$ tolerance
  • $\|\delta x\| <$ tolerance
  • $f''(x) > 0$
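Putting the update, the safeguards, and the stopping criteria together, a damped Newton iteration in one variable might look like the following sketch (the simple backtracking rule and the tolerances are illustrative choices, not the only options):

```python
def newton_1d(f, fp, fpp, x0, tol=1e-8, max_iter=50):
    """Damped Newton iteration in one variable (illustrative only)."""
    x = x0
    for _ in range(max_iter):
        g, h = fp(x), fpp(x)
        if abs(g) < tol:              # stop when |f'(x)| is below tolerance
            break
        dx = -g / h if h > 0 else -g  # fall back to steepest descent if curvature <= 0
        alpha = 1.0
        while f(x + alpha * dx) > f(x) and alpha > 1e-8:
            alpha *= 0.5              # shrink the step until the objective decreases
        x = x + alpha * dx
    return x

# The same arbitrary non-quadratic example as above (a = 1, b = 0.05):
a, b = 1.0, 0.05
f   = lambda x: (0.5 * x - 3) ** 2 + a * x**3 + b * x**4
fp  = lambda x: (0.5 * x - 3) + 3 * a * x**2 + 4 * b * x**3
fpp = lambda x: 0.5 + 6 * a * x + 12 * b * x**2

print(newton_1d(f, fp, fpp, x0=0.0))    # converges to the local minimum near x ~ 0.9
print(newton_1d(f, fp, fpp, x0=-10.0))  # converges to the deeper minimum near x ~ -14.8
```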

Summary (Newton’s Method)

Linear:

  • solution in one step
  • $f''(x)\,\delta x = -f'(x)$
  • $x^* = x - \frac{f'(x)}{f''(x)}$

Non-linear:

  • iterate to convergence
  • $f''(x_k)\,\delta x = -f'(x_k)$
  • $\delta x = -\frac{f'(x_k)}{f''(x_k)}$
  • $x_{k+1} = x_k + \alpha\,\delta x, ~ \alpha < 1$

Multivariate functions

  • Minimize $\phi(m), ~ m \in \{m_1, m_2, \dots, m_M\}$
  • Taylor expansion of $\phi(m)$:
    • $\phi(m + \delta m) = \phi(m) + (\nabla_m\phi(m))^T\delta m + \frac{1}{2}\delta m^T\nabla_m(\nabla_m\phi(m))^T\delta m + \mathcal{O}(\delta m^3)$
  • Note the similarity to the single-variable expansion:
    • $f(x + \delta x) = f(x) + f'(x)\,\delta x + \frac{1}{2}f''(x)\,\delta x^2 + \mathcal{O}(\delta x^3)$

Define the Gradient

$$\begin{aligned} g(m) &= \nabla_m\phi, \quad g \in \mathbb{R}^M \\ &= \begin{pmatrix} \frac{\partial\phi}{\partial m_1} \\ \vdots \\ \frac{\partial\phi}{\partial m_M} \end{pmatrix} \end{aligned}$$

At the minimum $m^*$, the gradient vanishes: $g(m^*) = 0$.

Define the Hessian (symmetric matrix)

$$\begin{aligned} H(m) &= \nabla_m\nabla_m^T\phi, \quad H \in \mathbb{R}^{M\times M} \\ &= \begin{pmatrix} \frac{\partial^2\phi}{\partial m_1^2} & \frac{\partial^2\phi}{\partial m_1\partial m_2} & \cdots & \frac{\partial^2\phi}{\partial m_1\partial m_M} \\ \frac{\partial^2\phi}{\partial m_2\partial m_1} & \frac{\partial^2\phi}{\partial m_2^2} & \cdots & \frac{\partial^2\phi}{\partial m_2\partial m_M} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2\phi}{\partial m_M\partial m_1} & \frac{\partial^2\phi}{\partial m_M\partial m_2} & \cdots & \frac{\partial^2\phi}{\partial m_M^2} \end{pmatrix} \\ H_{i,j} &= \frac{\partial^2\phi}{\partial m_i\,\partial m_j} \end{aligned}$$

At the minimum $m^*$, the Hessian $H(m^*)$ is positive definite.
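For a small made-up two-parameter objective, the gradient and Hessian can be checked symbolically; this example exists only to make the definitions concrete.

```python
from sympy import symbols, hessian, Matrix

# A made-up two-parameter objective (quadratic, so g and H have simple forms).
m1, m2 = symbols("m1 m2")
phi = (m1 + 2 * m2 - 1) ** 2 + 3 * (m1 - m2) ** 2

g = Matrix([phi]).jacobian([m1, m2]).T  # gradient g(m), a 2x1 column vector
H = hessian(phi, [m1, m2])              # 2x2 symmetric Hessian
print(g)
print(H)  # constant and positive definite here, since phi is quadratic
```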

Finding a Solution

  1. Begin with an initial model $m^{(k)}$
  2. Solve $H\left(m^{(k)}\right)\delta m = -g\left(m^{(k)}\right)$, cf. $f''(x)\,\delta x = -f'(x)$
  3. Update the model: $m^{(k+1)} = m^{(k)} + \alpha\,\delta m$
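A minimal multivariate version of these three steps, written for user-supplied gradient and Hessian callables (the fixed step length $\alpha$ stands in for a proper line search, and the quadratic test objective is made up for the demonstration):

```python
import numpy as np

def newton_minimize(grad, hess, m0, tol=1e-8, max_iter=20, alpha=1.0):
    """Multivariate Newton iteration: solve H(m) dm = -g(m), then update m."""
    m = np.array(m0, dtype=float)
    for _ in range(max_iter):
        g = grad(m)
        if np.linalg.norm(g) < tol:        # converged: gradient is (numerically) zero
            break
        dm = np.linalg.solve(hess(m), -g)  # H dm = -g
        m = m + alpha * dm                 # fixed alpha; a line search would choose it
    return m

# Example: a strictly convex quadratic, phi(m) = 1/2 m^T A m - b^T m,
# for which Newton's method converges in a single step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
m_star = newton_minimize(lambda m: A @ m - b, lambda m: A, np.zeros(2))
print(m_star, np.linalg.solve(A, b))       # the two should agree
```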

Our Inversion

Minimize

$$\phi(m) = \frac{1}{2}\left\|\mathcal{F}[m] - d^{obs}\right\|^2 + \frac{\beta}{2}\left\|m\right\|^2$$

Gradient

$$\begin{aligned} g(m) &= \nabla_m\phi \\ &= J^T\left(\mathcal{F}[m] - d^{obs}\right) + \beta m \end{aligned}$$

Sensitivity:

$$\begin{aligned} \nabla_m\mathcal{F}(m) &= J \\ J_{ij} &= \frac{\partial\mathcal{F}_i[m]}{\partial m_j} \end{aligned}$$

Hessian:

$$\begin{aligned} H(m) &= \nabla_m g(m) \\ &= J^TJ + \cancel{(\nabla_m J)^T\left(\mathcal{F}[m] - d^{obs}\right)} + \beta \end{aligned}$$

*we neglect the second term, which involves second derivatives of $\mathcal{F}$ and is small when the residual $\mathcal{F}[m] - d^{obs}$ is small (the Gauss-Newton approximation)

Final:

$$\begin{aligned} H\,\delta m &= -g \\ &\downarrow \\ \left(J^TJ + \beta\right)\delta m &= -\left(J^T\delta d + \beta m\right) \\ \delta d &= \mathcal{F}[m] - d^{obs} \end{aligned}$$
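A sketch of this single Gauss-Newton solve, assuming $J$ is available as a dense array and using a scaled identity for the regularization term:

```python
import numpy as np

def gauss_newton_step(J, m, d_pred, d_obs, beta):
    """Solve (J^T J + beta I) dm = -(J^T (d_pred - d_obs) + beta m)."""
    delta_d = d_pred - d_obs                 # data residual
    lhs = J.T @ J + beta * np.eye(m.size)    # approximate Hessian
    rhs = -(J.T @ delta_d + beta * m)        # negative gradient
    return np.linalg.solve(lhs, rhs)
```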

Linear v. Non-linear

Non-linear Problem $\left(\mathcal{F}[m] = d\right)$:

$$\left(J^TJ + \beta\right)\delta m = -\left(J^T\delta d + \beta m\right), \quad \delta d = \mathcal{F}[m] - d^{obs}$$

Linear Problem $(Gm = d)$:

$$\left(G^TG + \beta\right)\delta m = -\left(G^T\delta d + \beta m\right), \quad \delta d = Gm - d^{obs}$$

Starting from $m = 0$, this reduces to the familiar one-step solution $\left(G^TG + \beta\right)m = G^Td^{obs}$.

The sensitivity $J$ acts as a local linearization of the non-linear operator: $J\,\delta m = \delta d$.
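When an analytic $J$ is not available, or as a check on one that is, the sensitivity can be approximated column by column with finite differences (expensive for large $M$; the forward operator used at the end is the same toy $\mathcal{F}[m] = G\,e^m$ as in the earlier superposition sketch):

```python
import numpy as np

def finite_difference_jacobian(F, m, h=1e-6):
    """Approximate J_ij = dF_i/dm_j by perturbing one model parameter at a time."""
    m = np.asarray(m, dtype=float)
    d0 = np.asarray(F(m))
    J = np.zeros((d0.size, m.size))
    for j in range(m.size):
        m_pert = m.copy()
        m_pert[j] += h
        J[:, j] = (np.asarray(F(m_pert)) - d0) / h
    return J

# Toy usage: for F[m] = G exp(m) the analytic J at m = 0 is just G.
G = np.random.randn(5, 4)
J = finite_difference_jacobian(lambda m: G @ np.exp(m), np.zeros(4))
print(np.allclose(J, G, atol=1e-4))
```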

General Algorithm

Minimize $\phi = \phi_d + \beta\phi_m$

  1. Initialize the model and $\beta$: $m^{(0)}, ~\beta^{(0)}$
  2. Until convergence is reached, iterate through the following:
    1. $H\,\delta m = -g$
    2. $m^{(k+1)} = m^{(k)} + \alpha\,\delta m$ (via line search)
    3. $\beta^{(k+1)} = \frac{\beta^{(k)}}{\gamma}$ (via cooling)

There are a variety of ways to solve the system, to choose the step length beyond a simple line search, and to select $\beta$ beyond a fixed cooling rate.
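Assembled into code, the cooled Gauss-Newton loop above might look like this schematic (the forward $F$, the Jacobian routine, the fixed step length $\alpha$, and the cooling factor $\gamma$ are placeholders for illustration; a real implementation would add a line search and convergence tests on the misfit):

```python
import numpy as np

def invert(F, jacobian, d_obs, m0, beta0=1.0, gamma=2.0, alpha=0.5,
           max_outer=20, tol=1e-6):
    """Schematic Gauss-Newton inversion with beta cooling.

    F(m) returns predicted data; jacobian(m) returns the sensitivity J.
    A fixed alpha stands in for a proper line search.
    """
    m, beta = np.array(m0, dtype=float), beta0
    for _ in range(max_outer):
        d_pred = F(m)
        J = jacobian(m)
        g = J.T @ (d_pred - d_obs) + beta * m    # gradient of phi
        if np.linalg.norm(g) < tol:
            break
        H = J.T @ J + beta * np.eye(m.size)      # Gauss-Newton Hessian
        dm = np.linalg.solve(H, -g)              # H dm = -g
        m = m + alpha * dm                       # model update
        beta = beta / gamma                      # cool the trade-off parameter
    return m
```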

Summary

$$\phi(m) = \frac{1}{2}\left\|\mathcal{F}[m] - d^{obs}\right\|^2 + \frac{\beta}{2}\left\|m\right\|^2$$

Linear

  • Forward: $d = Gm$
  • Inverse: solved in one step, $\left(G^TG + \beta\right)m = G^Td^{obs}$

Non-linear

  • Forward: $d = \mathcal{F}[m]$
  • Inverse: $\left(J^TJ + \beta\right)\delta m = -\left(J^T\delta d + \beta m\right)$
    • $\delta d = \mathcal{F}[m] - d^{obs}$
  • Update: $m^{(k+1)} = m^{(k)} + \alpha\,\delta m$

All of the understanding developed for the linear problem carries over to the non-linear problem.