# Graph Optimization 3 - Optimization Update

From the last article, we obtained the following negative log-likelihood function as our optimization objective:

$F(x)=\sum_{ij}{e_{ij}(x)^T\Omega_{ij}e_{ij}(x)}$

The optimization problem then becomes:

$x^*=\arg\min_xF(x)$
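
To make the objective concrete, here is a minimal sketch of evaluating $$F(x)$$ over a set of edges; `error_fn` and the edge layout are hypothetical placeholders, not any particular library's API:

```python
def total_cost(x, edges, error_fn):
    """Evaluate F(x) = sum_ij e_ij(x)^T * Omega_ij * e_ij(x).

    edges    -- iterable of (i, j, Omega_ij), with Omega_ij the information
                matrix of the constraint between nodes i and j (assumed layout)
    error_fn -- hypothetical callable returning the error vector e_ij(x)
    """
    F = 0.0
    for i, j, omega in edges:
        e = error_fn(x, i, j)  # residual e_ij(x)
        F += e @ omega @ e     # scalar term e^T * Omega * e
    return F
```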

This article explains how this optimization problem can be solved with the Gauss-Newton method.

Given an initial rough estimate $$x_0$$ of $$x$$, the next step is to look for a $$\Delta x^*$$ that achieves:

$\Delta x^* = \arg \min_{\Delta x}F(x_0+\Delta x)$

$$F(x_0+\Delta x)$$ can be expanded as:

$F(x_0+\Delta x)=\sum_{ij}{e_{ij}(x_0+\Delta x)}^T\Omega_{ij}e_{ij}(x_0+\Delta x)$

Applying a first-order Taylor expansion to $$e_{ij}$$ around $$x_0$$, we get:

$e_{ij}(x_0+\Delta x)\approx e_{ij}(x_0)+\frac{\partial e_{ij}}{\partial x}\Big|_{x_0}\Delta x,$

where the Jacobian matrix is defined as:

$J_{ij}(x_0)=\frac{\partial e_{ij}}{\partial x}\Big|_{x_0}$
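
When an analytic Jacobian is inconvenient to derive, $$J_{ij}$$ can be approximated numerically. The central-difference sketch below is a generic illustration; the step size `eps` is an arbitrary assumption:

```python
import numpy as np

def numerical_jacobian(error_fn, x0, i, j, eps=1e-6):
    """Central-difference approximation of J_ij = d e_ij / dx at x0."""
    e0 = error_fn(x0, i, j)
    J = np.zeros((e0.size, x0.size))
    for k in range(x0.size):
        dx = np.zeros_like(x0, dtype=float)
        dx[k] = eps
        # perturb the k-th coordinate of x in both directions
        J[:, k] = (error_fn(x0 + dx, i, j) - error_fn(x0 - dx, i, j)) / (2 * eps)
    return J
```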

Then we get:

$F(x_0+\Delta x)\approx \sum_{ij}(e_{ij}(x_0)+J_{ij}(x_0)\Delta x)^T\Omega_{ij}(e_{ij}(x_0)+J_{ij}(x_0)\Delta x)$

$=\sum_{ij}\{e_{ij}(x_0)^T\Omega_{ij}e_{ij}(x_0)+2e_{ij}(x_0)^T\Omega_{ij}J_{ij}(x_0)\Delta x+\Delta x^TJ_{ij}(x_0)^T\Omega_{ij}J_{ij}(x_0)\Delta x\}$

now we define:

$c_{ij}(x_0)=e_{ij}(x_0)^T\Omega_{ij}e_{ij}(x_0)$

$b_{ij}(x_0)=J_{ij}(x_0)^T\Omega_{ij}e_{ij}(x_0)$

$H_{ij}(x_0)=J_{ij}(x_0)^T\Omega_{ij}J_{ij}(x_0)$

The objective then simplifies to:

$F(x_0+\Delta x)\approx\sum_{ij}\left[c_{ij}(x_0)+2b_{ij}(x_0)^T\Delta x+\Delta x^TH_{ij}(x_0)\Delta x\right]$
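
These three terms map directly to code. The sketch below computes them for a single edge, reusing the hypothetical `error_fn` and the `numerical_jacobian` helper from above:

```python
def edge_terms(error_fn, x0, i, j, omega):
    """Return (c_ij, b_ij, H_ij) of the quadratic approximation at x0."""
    e = error_fn(x0, i, j)                      # e_ij(x0)
    J = numerical_jacobian(error_fn, x0, i, j)  # J_ij(x0)
    c = e @ omega @ e                           # c_ij = e^T * Omega * e
    b = J.T @ omega @ e                         # b_ij = J^T * Omega * e
    H = J.T @ omega @ J                         # H_ij = J^T * Omega * J
    return c, b, H
```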

Moreover, if $$F$$ is convex, or at least locally convex in a region around $$x_0$$, $$F(x_0+\Delta x)$$ is minimized where its derivative with respect to $$\Delta x$$ equals zero:

$\frac{dF(x_0+\Delta x)}{d\Delta x}=0,$

$\sum_{ij}[2b_{ij}(x_0)+2H_{ij}(x_0)\Delta x^*]=0,$

then we can solve for $$\Delta x^*$$:

$\sum_{ij}H_{ij}(x_0)\Delta x^*=-\sum_{ij}b_{ij}(x_0)$

The above equation can be arranged into a sparse linear system of the form

$H(x_0)\Delta x^*=-b(x_0)$
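
Because each edge touches only a few entries of the state, $$H$$ is assembled in sparse form. A minimal SciPy sketch follows; the `idx` layout (which global state indices an edge touches) is a hypothetical convention, and `spsolve` uses a sparse LU here, which a true sparse Cholesky (e.g. CHOLMOD via scikit-sparse) could replace since $$H$$ is symmetric positive definite:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def solve_increment(edge_blocks, n):
    """Assemble the global sparse system H * dx = -b and solve for dx.

    edge_blocks -- iterable of (idx, b_ij, H_ij); idx lists the global state
                   indices this edge touches (a hypothetical state layout),
                   b_ij and H_ij are the local gradient and Hessian blocks
    n           -- dimension of the full state vector x
    """
    rows, cols, vals = [], [], []
    b = np.zeros(n)
    for idx, b_ij, H_ij in edge_blocks:
        idx = np.asarray(idx)
        r, c = np.meshgrid(idx, idx, indexing="ij")
        rows.append(r.ravel())
        cols.append(c.ravel())
        vals.append(np.asarray(H_ij).ravel())
        b[idx] += b_ij
    # duplicate (row, col) entries are summed when COO is converted to CSC
    H = sparse.coo_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(n, n),
    ).tocsc()
    return spsolve(H, -b)  # sparse LU here; a sparse Cholesky would also work
```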

Now $$\Delta x^*$$ can be solved by sparse Cholesky factorization. Once $$\Delta x^*$$ is found, we can compute:

$x_1=x_0+\Delta x^*$

The final $$x$$ can then be solved iteratively:

$x_n=x_{n-1}+\Delta x^*_{n-1}$
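
Putting the pieces together, a sketch of the full Gauss-Newton loop might look as follows; for readability it sums dense per-edge terms (via the `edge_terms` sketch above) and uses a dense solve, while a large problem would use the sparse assembly shown earlier. The iteration cap and convergence threshold are arbitrary assumptions:

```python
import numpy as np

def gauss_newton(x0, edges, error_fn, max_iters=20, tol=1e-8):
    """Iterate x_n = x_{n-1} + dx*_{n-1} until the increment is negligible."""
    x = x0.astype(float)
    for _ in range(max_iters):
        H = np.zeros((x.size, x.size))
        b = np.zeros(x.size)
        for i, j, omega in edges:
            # quadratic terms of this edge at the current linearization point
            _, b_ij, H_ij = edge_terms(error_fn, x, i, j, omega)
            H += H_ij
            b += b_ij
        dx = np.linalg.solve(H, -b)   # H(x_{n-1}) * dx* = -b(x_{n-1})
        x = x + dx                    # x_n = x_{n-1} + dx*
        if np.linalg.norm(dx) < tol:  # stop once the update is negligible
            break
    return x
```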

