Graph Optimization 3 - Optimization Update
From the last article, we get the following negative log likelihood function as our optimization target:
\[F(x)=\sum_{ij}{e_{ij}(x)^T\Omega_{ij}e_{ij}(x)}\]The optimization problem turns to be:
\[x^*=\arg\min_xF(x)\]This article will explain how can this optimization problem be solved using Gauss-Newton method.
If we find \(x_0\) as the initial/rough estimation of \(x\), the next step would be looking for a \(\Delta x^*\) to achieve:
\[\Delta x^* = \arg \min_{\Delta x}F(x_0+\Delta x)\]\(F(x_0+\Delta x)\) can be expanded to be:
\[F(x_0+\Delta x)=\sum_{ij}{e_{ij}(x_0+\Delta x)}^T\Omega_{ij}e_{ij}(x_0+\Delta x)\]Apply first order Taylor approximation on \(e_{ij}\), we get:
\[e_{ij}(x_0+\Delta x)\approx e_{ij}(x_0)+\frac{d e_{ij}}{dx_{ij}}\Delta x,\]a Jacobian matrix is defined as:
\[J_{ij}=\frac{de_{ij}}{dx_{ij}}\]Then we get:
\[F(x_0+\Delta x)\approx \sum_{ij}(e_{ij}(x_0)+J_{ij}\Delta x)^T\Omega_{ij}(e_{ij}(x_0)+J_{ij}\Delta x)\] \[=\sum_{ij}\{e_{ij}(x_0)^T\Omega_{ij}e_{ij}(x_0)+[2e_{ij}(x_0)^T\Omega_{ij}J_{ij}(x_0)]\Delta x+\Delta x^T[J_{ij}(x_0)^T\Omega_{ij}J_{ij}(x_0)]\Delta x\}\]now we define:
\[c_{ij}(x_0)=e_{ij}(x_0)^T\Omega_{ij}e_{ij}(x_0)\] \[b_{ij}(x_0)=e_{ij}(x_0)^T\Omega_{ij}J_{ij}(x_0)\] \[H_{ij}(x_0)=J_{ij}(x_0)^T\Omega_{ij}J_{ij}\]Then the equation system is simplified to be:
\[F(x_0+\Delta x)\approx\sum_{ij}c_{ij}(x_0)+2b_{ij}(x_0)\Delta x+\Delta x^TH_{ij}(x_0)\Delta x\]What’s more, if \(F\) is convex, or at least locally convex in the region close to \(x_0\), \(F(x_0+\Delta x)\) is the minimum if the derivative of \(F(x_0+\Delta x)\) to \(\Delta x\) equals to 0.
\[\frac{dF(x_0+\Delta x)}{d\Delta x}=0,\]which leads to:
\[\sum_{ij}[2b_{ij}(x_0)+2H_{ij}\Delta x^*]=0,\]then we can solve \(\Delta x^*\):
\[\sum_{ij}H_{ij}(x_0)\Delta x^*=-\sum_{ij}b_{ij}(x_0)\]The above equation can be arranged in a sparse matrix in the form of
\[H(x_0)\Delta x^*=-b(x_0)\]Now \(\Delta x^*\) can be solved by sparse Cholesky factorization. When \(\Delta x^*\) is solved, we can compute:
\[x_1=x_0+\Delta x^*\]Obviously, the final \(x\) can be solved iteratively:
\[x_n=x_{n-1}+\Delta x^*_{n-1}\]