
4.3.1 Mathematical Background

The optimization target is to find the parameter vector $\vec{x}$ for which the residual vector $\vec{f}$ becomes minimal. Using half the sum of squares of the elements of $\vec{f}$ as the target function, the problem can be written as


\begin{displaymath}
\mathop{\rm min}\limits _{\vec{x} \in R^n} r(\vec{x})
\end{displaymath} (4.27)

where $r(\vec{x})$ is the cost function which is calculated by


\begin{displaymath}
r(\vec{x}) = \frac{1}{2} \sum_{i=1}^{m} f_{i}(\vec{x})^2 =
\frac{1}{2} \vec{f}(\vec{x})^T \vec{f}(\vec{x})
\end{displaymath} (4.28)

and

\begin{displaymath}
\vec{f}(\vec{x})\in R^m
.
\end{displaymath} (4.29)

This function can be minimized by a general unconstrained optimization method explained in Section 4.4.
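
The cost function (4.28) can be illustrated with a short numerical sketch. The residual model $\vec{f}(\vec{x})$ used below is a hypothetical assumption chosen only for illustration; it is not taken from the text.

\begin{verbatim}
# Minimal sketch of (4.28): half sum-of-squares cost for an assumed
# residual vector f(x) in R^m with parameter vector x in R^n.
import numpy as np

def f(x):
    # hypothetical residual vector (m = 3, n = 2), for illustration only
    return np.array([x[0] - 1.0,
                     x[1] - 2.0,
                     x[0] * x[1] - 2.0])

def r(x):
    fx = f(x)
    return 0.5 * fx @ fx          # r(x) = 1/2 f(x)^T f(x)

print(r(np.array([0.0, 0.0])))    # cost at an arbitrary starting point
\end{verbatim}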

From an algorithmic point of view, the feature which distinguishes least-squares problems from the general unconstrained optimization problem is the structure of the Hessian matrix $\mathop{\nabla }\nolimits ^2 r(\vec{x})$ (see Section 4.1.6).

For a differentiable residual vector $\vec{f}(\vec{x})$ the gradient of the cost function $r(\vec{x})$ can be expressed as


\begin{eqnarray*}
\mathop{\nabla }\nolimits r(\vec{x}) & = & \frac{1}{2} \mathop{\nabla }\nolimits \left( \vec{f}(\vec{x})^T \vec{f}(\vec{x}) \right)  \qquad (4.30) \\
 & = & \frac{1}{2}
\left (
\begin{array}{c}
\frac{\partial }{\partial x_1} \left( f_{1}(\vec{x})^2 + \cdots + f_{m}(\vec{x})^2 \right) \\
\vdots \\
\frac{\partial }{\partial x_n} \left( f_{1}(\vec{x})^2 + \cdots + f_{m}(\vec{x})^2 \right) \\
\end{array}
\right )  \qquad (4.31) \\
 & = & \left (
\begin{array}{c}
f_{1}(\vec{x}) \frac{\partial f_{1}(\vec{x}) }{\partial x_1} + \cdots + f_{m}(\vec{x}) \frac{\partial f_{m}(\vec{x}) }{\partial x_1} \\
\vdots \\
f_{1}(\vec{x}) \frac{\partial f_{1}(\vec{x}) }{\partial x_n} + \cdots + f_{m}(\vec{x}) \frac{\partial f_{m}(\vec{x}) }{\partial x_n} \\
\end{array}
\right )  \qquad (4.32) \\
 & = & \left (
\begin{array}{ccc}
\frac{\partial f_{1}(\vec{x}) }{\partial x_1} & \cdots & \frac{\partial f_{m}(\vec{x}) }{\partial x_1} \\
\vdots & & \vdots \\
\frac{\partial f_{1}(\vec{x}) }{\partial x_n} & \cdots & \frac{\partial f_{m}(\vec{x}) }{\partial x_n} \\
\end{array}
\right )
\left (
\begin{array}{c}
f_{1}(\vec{x}) \\
\vdots \\
f_{m}(\vec{x}) \\
\end{array}
\right ).  \qquad (4.33)
\end{eqnarray*}

This can be expressed compactly in terms of the Jacobian matrix $J(\vec{x})$ of $\vec{f}(\vec{x})$ (see Appendix B.1)


\begin{displaymath}
\mathop{\nabla }\nolimits r(\vec{x}) = J(\vec{x})^T \vec{f}(\vec{x})
.
\end{displaymath} (4.34)
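
As a sketch of (4.34), the gradient $J(\vec{x})^T \vec{f}(\vec{x})$ can be checked against a finite-difference approximation of $r(\vec{x})$. The residual model and its Jacobian below are the same illustrative assumptions as above.

\begin{verbatim}
# Sketch of (4.34): gradient of r via J(x)^T f(x), checked against a
# central finite-difference approximation. f and J are assumed models.
import numpy as np

def f(x):
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0] * x[1] - 2.0])

def J(x):
    # Jacobian of f: J[i, j] = d f_i / d x_j
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

def r(x):
    fx = f(x)
    return 0.5 * fx @ fx

x = np.array([0.5, 1.5])
grad = J(x).T @ f(x)                       # gradient according to (4.34)

eps = 1e-6                                 # finite-difference check
fd = np.array([(r(x + eps * e) - r(x - eps * e)) / (2.0 * eps)
               for e in np.eye(2)])
print(grad, fd)                            # the two should agree closely
\end{verbatim}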

The Hessian matrix $\mathop{\nabla }\nolimits ^2 r(\vec{x})$ can be expressed as the Jacobian of the gradient of $r(\vec{x})$ (the operator $\mathop{\rm Jac}\nolimits $ builds the Jacobian matrix of a vector function, see Appendix B.1)


\begin{eqnarray*}
\mathop{\nabla }\nolimits ^2 r(\vec{x}) & = & \mathop{\rm Jac}\nolimits \left ( \mathop{\nabla }\nolimits r(\vec{x}) \right )
 = \mathop{\rm Jac}\nolimits \left (
\begin{array}{c}
\sum_{i=1}^{m} f_{i}(\vec{x}) \frac{\partial f_{i}(\vec{x})}{\partial x_{1}} \\
\vdots \\
\sum_{i=1}^{m} f_{i}(\vec{x}) \frac{\partial f_{i}(\vec{x})}{\partial x_{n}} \\
\end{array}
\right )  \qquad (4.35) \\
 & = & \left (
\begin{array}{ccc}
\frac{\partial}{\partial x_1} \sum_{i=1}^{m} f_{i}(\vec{x}) \frac{\partial f_{i}(\vec{x})}{\partial x_{1}} & \cdots &
\frac{\partial}{\partial x_n} \sum_{i=1}^{m} f_{i}(\vec{x}) \frac{\partial f_{i}(\vec{x})}{\partial x_{1}} \\
\vdots & & \vdots \\
\frac{\partial}{\partial x_1} \sum_{i=1}^{m} f_{i}(\vec{x}) \frac{\partial f_{i}(\vec{x})}{\partial x_{n}} & \cdots &
\frac{\partial}{\partial x_n} \sum_{i=1}^{m} f_{i}(\vec{x}) \frac{\partial f_{i}(\vec{x})}{\partial x_{n}} \\
\end{array}
\right )  \qquad (4.36) \\
 & = & \left (
\begin{array}{ccc}
\sum_{i=1}^{m} \left (
\left ( \frac{\partial f_{i}(\vec{x})}{\partial x_1} \right )^2 + f_{i}(\vec{x}) \frac{\partial^2 f_{i}(\vec{x})}{\partial x_1^2}
\right ) & \cdots &
\sum_{i=1}^{m} \left (
\frac{\partial f_{i}(\vec{x})}{\partial x_n} \frac{\partial f_{i}(\vec{x})}{\partial x_1} + f_{i}(\vec{x}) \frac{\partial^2 f_{i}(\vec{x})}{\partial x_1 \partial x_n}
\right ) \\
\vdots & & \vdots \\
\sum_{i=1}^{m} \left (
\frac{\partial f_{i}(\vec{x})}{\partial x_1} \frac{\partial f_{i}(\vec{x})}{\partial x_n} + f_{i}(\vec{x}) \frac{\partial^2 f_{i}(\vec{x})}{\partial x_n \partial x_1}
\right ) & \cdots &
\sum_{i=1}^{m} \left (
\left ( \frac{\partial f_{i}(\vec{x})}{\partial x_n} \right )^2 + f_{i}(\vec{x}) \frac{\partial^2 f_{i}(\vec{x})}{\partial x_n^2}
\right ) \\
\end{array}
\right )  \qquad (4.37) \\
 & = & J(\vec{x})^T J(\vec{x}) + \sum_{i=1}^m f_{i}(\vec{x}) \mathop{\nabla }\nolimits ^2 f_{i}(\vec{x}).  \qquad (4.38)
\end{eqnarray*}

The first term of the Hessian matrix $\mathop{\nabla }\nolimits ^2 r(\vec{x})$ can be calculated from the Jacobian matrix $J(\vec{x})$ alone. For small residuals this first term dominates the second-order term.
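
A small numerical sketch of the split in (4.38): the first term $J^T J$ is computed from the Jacobian alone, while the second term additionally needs the residuals and their second derivatives. The residual model and its derivatives below are illustrative assumptions.

\begin{verbatim}
# Sketch of (4.38): Hessian of r as J^T J plus the second-order term.
# The residual model and its derivatives are assumptions for illustration.
import numpy as np

def f(x):
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0] * x[1] - 2.0])

def J(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

def hess_f(x):
    # nabla^2 f_i for each residual; only f_3 = x_1*x_2 is nonlinear
    return [np.zeros((2, 2)),
            np.zeros((2, 2)),
            np.array([[0.0, 1.0],
                      [1.0, 0.0]])]

x = np.array([0.9, 2.1])
first_term = J(x).T @ J(x)                                   # from J only
second_term = sum(fi * Hi for fi, Hi in zip(f(x), hess_f(x)))
print(first_term + second_term)   # full Hessian nabla^2 r(x)
print(first_term)                 # dominates when the residuals are small
\end{verbatim}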

Using the steepest-descent algorithm the search direction for the next step is given by


\begin{displaymath}
\vec{p} = - {\mathop{\nabla }\nolimits r(\vec{x})} = -J(\vec{x})^T \vec{f}(\vec{x})
\end{displaymath} (4.39)

using (4.8) and (4.34).

On the other hand, for the Newton direction the linear system obtained from (4.10) and (4.38), where the second-order term of (4.38) is neglected,


\begin{displaymath}
J(\vec{x})^T J(\vec{x}) \vec{p} = -J(\vec{x})^T \vec{f}(\vec{x})
\end{displaymath} (4.40)

has to be solved.

These two strategies for the search direction, the Newton direction and the steepest-descent direction, have different convergence properties; which one is preferable depends on the target function. A short numerical comparison of the two directions is sketched below.
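
The following sketch computes both the steepest-descent direction (4.39) and the direction obtained from the linear system (4.40) for the illustrative residual model assumed above.

\begin{verbatim}
# Sketch: steepest-descent direction (4.39) vs. the direction obtained
# by solving (4.40); f and J are the illustrative assumptions from above.
import numpy as np

def f(x):
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0] * x[1] - 2.0])

def J(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

x = np.array([0.5, 1.5])
g = J(x).T @ f(x)                          # gradient of r, cf. (4.34)

p_sd = -g                                  # steepest descent, (4.39)
p_n  = np.linalg.solve(J(x).T @ J(x), -g)  # solves (4.40)
print(p_sd, p_n)                           # the two generally differ
\end{verbatim}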

If the quadratic model of the target function is accurate, the Newton method converges quadratically to the optimum, provided that the Hessian matrix is positive definite, the starting point is sufficiently close to the optimum, and the step length converges to 1 [19].

Using the steepest-descent algorithm, the search direction of the next iteration is always a descent direction for a non-vanishing gradient. In practice, however, the steepest-descent method typically requires a large number of iterations to make progress towards the solution. Trust region methods (see Section 4.1.8) enable a smooth transition from the steepest-descent direction to the Newton direction in a way that combines the global convergence properties of steepest descent with the fast local convergence of Newton's method. The Levenberg-Marquardt algorithm uses these properties to solve least-squares problems.
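
As a rough preview of the next section, one common way to realize such a transition is to damp the system (4.40) with a parameter $\lambda$. The sketch below uses the illustrative residual model from above; the chosen $\lambda$ values are assumptions.

\begin{verbatim}
# Sketch of a damped system (J^T J + lambda*I) p = -J^T f that blends the
# two directions: lambda -> 0 recovers (4.40), while a large lambda
# approaches a (scaled) steepest-descent step. Model and lambda values
# are assumptions for illustration only.
import numpy as np

def f(x):
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0] * x[1] - 2.0])

def J(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

x = np.array([0.5, 1.5])
A = J(x).T @ J(x)
g = J(x).T @ f(x)

for lam in (0.0, 1.0, 100.0):
    p = np.linalg.solve(A + lam * np.eye(2), -g)
    print(lam, p)
\end{verbatim}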


