# Data Misfit

Authors
Affiliations
University of British Columbia
University of British Columbia

The observations $d^{obs}$ encode our knowledge about the model obtained from the survey. Any candidate solution $m$ must produce simulated (or predicted) data $d$ that “fit” the observations. Central to this goal is the choice of a “criterion” for measuring misfit between observed and predicted data, and a “tolerance” for deciding what value constitutes an acceptable fit. This is a subject on which an enormous amount of literature has been written but a good reference, written for the inverse problem, is Parker (1994). When errors are Gaussian with zero mean and known standard deviation, then the $L_2$ -norm, is an appropriate choice. We define the misfit to be

$\phi_d=\sum^N_{j=1}\left(\frac{d_j-d_j^{obs}}{\epsilon_j}\right)^2$

where $d_j^{obs}$ is the observation, $\epsilon_j$ is its estimated standard deviation, and $d_j$ is the predicted datum. The quantity

$\eta_j = \frac {d_j - d_j^{obs}} {\epsilon_j}$

is a random variable with zero mean and unit standard deviation and $\phi_d$ , which is the sum of squares of these variables, is the well known $\chi^2$ statistical variable. It has an expected value ((3)) and variance ((4)).

$E[\chi^2] = N$
$Var[\chi^2] = 2N$

Thus if we are attempting to find a model $m$ that acceptably fits the data, then models with $\phi_d \simeq N$ are good candidates. For many problems we often denote a target misfit $\phi_d^*$ to be

$\phi_d^*=N$

but this must only be regarded as a reasonable estimate and some flexibility should be entertained.

In times past, it was felt that getting an acceptable fit to the data was a sufficient criterion for having a successful inversion. The observed data $\mathbf{d}_{obs}$ are inverted to produce a model $\mathbf{m}$, which is used to forward model the predicted data $\mathbf{d}$. The observed and predicted data are then compared using the data misfit measure. If $\phi_d<\phi_d^*$ the model $\mathbf{m}$ is accepted, but if not, the inversion parameters are adjusted and the process is repeated until an acceptable model is reached. The workflow procedure is delineated below.

Finding a model that fits the data is a necessary component of an inversion algorithm, but as shown in the next section, it is not sufficient.

References
1. Parker, R. L. (1994). Geophysical Inverse Theory (Vol. 1). Princeton University Press. 10.2307/j.ctvs32s89