In Graph Optimization 1 to 3, the math was introduced. Now we can get some hands-on programming practice. The most widely used graph optimization library is g2o, thanks to its good performance in ORB-SLAM. g2o also has its well-known drawbacks: it is sparsely commented and not easy to understand. What's more, most tutorials are based on the original sample examples, so when you want to implement your own vertex or edge, you will be lost again.

This introduction is therefore built around an easy-to-understand graph optimization problem with a customized edge implementation.

Modelling the GPS-based odometry

Model as a graph

The problem is quite simple: a vehicle moves around, and a GPS measures its absolute 3D positions. Starting from some rough guesses as initialization, we want to estimate the vehicle's positions from the GPS measurements.

In a SLAM system, we usually want to fuse different sensors. The most discussed combination is camera plus IMU, which is also the problem setup of the g2o examples. Fusing GPS information is rarely covered, yet it is actually easier to understand. It can be modeled by the following diagram:

[Figure: g2o_gps_edge]

From the last article, we get the following negative log likelihood function as our optimization target:

$$F(\mathbf{x}) = \sum_k \mathbf{e}_k(\mathbf{x})^T \Omega_k\, \mathbf{e}_k(\mathbf{x}), \qquad \mathbf{e}_k(\mathbf{x}) = \mathbf{x}_k - \mathbf{z}_k$$

where $\mathbf{x}_k$ is the estimated vehicle position at step $k$, $\mathbf{z}_k$ the corresponding GPS measurement, and $\Omega_k$ the information matrix (inverse covariance) of that measurement. The optimization problem then becomes:

$$\mathbf{x}^* = \operatorname*{argmin}_{\mathbf{x}} F(\mathbf{x})$$

This article will explain how this optimization problem can be solved using the Gauss-Newton method.
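Before turning to g2o itself, it helps to see the whole pipeline in plain numpy. The sketch below builds the normal equations $H\,\Delta x = -b$ from all edges and iterates. The measurement values, noise levels, and the extra odometry edges between consecutive positions are invented for illustration, not taken from the article's example; only the structure (unary GPS edges, a Gauss-Newton solve) mirrors the problem above:

```python
import numpy as np

# A minimal Gauss-Newton sketch of the GPS graph (numpy only, no g2o).
# Assumptions: 3 unknown 3D positions, a unary GPS edge e_i = p_i - z_i
# on each, and binary odometry edges e_i = (p_{i+1} - p_i) - u_i.
np.random.seed(0)
true_pos = np.array([[0., 0., 0.], [1., 0., 0.], [2., 0.1, 0.]])
z = true_pos + 0.05 * np.random.randn(3, 3)                   # noisy GPS fixes
u = np.diff(true_pos, axis=0) + 0.02 * np.random.randn(2, 3)  # noisy odometry

omega_gps = np.eye(3) / 0.05**2   # information = inverse measurement variance
omega_odo = np.eye(3) / 0.02**2

x = np.zeros(9)                   # rough initial guess for the 3 positions

for it in range(10):
    H = np.zeros((9, 9))
    b = np.zeros(9)
    # GPS edges: e = p_i - z_i, Jacobian w.r.t. p_i is the identity
    for i in range(3):
        e = x[3*i:3*i+3] - z[i]
        H[3*i:3*i+3, 3*i:3*i+3] += omega_gps
        b[3*i:3*i+3] += omega_gps @ e
    # odometry edges: e = (p_{i+1} - p_i) - u_i, Jacobians are -I and +I
    for i in range(2):
        e = x[3*(i+1):3*(i+1)+3] - x[3*i:3*i+3] - u[i]
        J = np.zeros((3, 9))
        J[:, 3*i:3*i+3] = -np.eye(3)
        J[:, 3*(i+1):3*(i+1)+3] = np.eye(3)
        H += J.T @ omega_odo @ J
        b += J.T @ omega_odo @ e
    dx = np.linalg.solve(H, -b)   # Gauss-Newton step: H dx = -b
    x += dx
    if np.linalg.norm(dx) < 1e-8:
        break

print(x.reshape(3, 3))            # estimated positions, close to true_pos
```

Because the GPS error $\mathbf{e}_k = \mathbf{x}_k - \mathbf{z}_k$ is linear in the state, Gauss-Newton converges here in a single iteration; the loop is kept to show the general shape of the algorithm.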

Usually, graph optimization papers and tutorials start by posing the optimization problem: minimize the following function:

$$F(\mathbf{x}) = \sum_{k} \mathbf{e}_k(\mathbf{x})^T \Omega_k\, \mathbf{e}_k(\mathbf{x})$$

This article will explain where this optimization comes from and why it is related to Gaussian noise.

Maximum Likelihood Estimation with Gaussian noise

The probabilistic modeling of graph optimization is based on Maximum Likelihood Estimation (MLE).
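As a preview of where we are headed, here is the standard argument in compact form (a sketch written with the same symbols as the objective above; the article's own notation may differ). Assume each measurement $\mathbf{z}_k$ is corrupted by independent zero-mean Gaussian noise with covariance $\Sigma_k = \Omega_k^{-1}$:

$$
\begin{aligned}
p(\mathbf{z}_k \mid \mathbf{x}) &\propto \exp\!\left(-\tfrac{1}{2}\,\mathbf{e}_k(\mathbf{x})^T \Omega_k\, \mathbf{e}_k(\mathbf{x})\right) \\
\mathbf{x}^* = \operatorname*{argmax}_{\mathbf{x}} \prod_k p(\mathbf{z}_k \mid \mathbf{x})
&= \operatorname*{argmin}_{\mathbf{x}} \left(-\sum_k \log p(\mathbf{z}_k \mid \mathbf{x})\right)
= \operatorname*{argmin}_{\mathbf{x}} \sum_k \mathbf{e}_k(\mathbf{x})^T \Omega_k\, \mathbf{e}_k(\mathbf{x})
\end{aligned}
$$

Maximizing the likelihood of independent Gaussian measurements is therefore exactly minimizing the weighted squared-error objective.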

The case we use for this article will also be the one we used in the last article:

[Figure: g2o_edge_vertex_noedge]

The idea behind PCA

In machine learning, PCA is used to reduce the dimensionality of features. We usually collect many features to feed the model, believing that more features provide more information and lead to better results.

But some features don't really bring new information because they are correlated with other features. PCA removes this correlation by approximating the original data in a lower-dimensional subspace: features that are correlated in the original space become decorrelated in the subspace approximation.

Visually, let's assume that our original data points have 2 features, so they can be plotted in a 2D plane; when the two features are correlated, the points scatter along one dominant direction, as the sketch below makes concrete.
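A small numpy sketch of PCA on such 2-feature data follows; the generated data and all variable names are invented for illustration:

```python
import numpy as np

# PCA on correlated 2-feature data: feature 2 is roughly
# twice feature 1 plus noise, so the two are strongly correlated.
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
X = np.column_stack([f1, 2.0 * f1 + 0.3 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                  # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)          # 2x2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # principal directions and variances

Y = Xc @ eigvecs                         # project onto the principal directions
print(np.round(eigvals, 3))              # variance along each principal direction
print(np.round(np.cov(Y.T), 3))          # off-diagonal entries are ~0
```

The off-diagonal entries of the projected data's covariance are numerically zero: the correlation present in the original features disappears in the principal-direction coordinates.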

The case we are handling: a 2-layer network

[Figure: 2layer_nn_bpp]

Background: the network and symbols

First, the network architecture is shown below:

[Figure: 2layer_nn]
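Since the figure's notation is not reproduced here, the following is a generic sketch of such a 2-layer network's forward pass, with invented shapes and sigmoid activations; the article's actual symbols and activation functions may differ:

```python
import numpy as np

# A generic 2-layer network: one hidden layer, one output layer.
rng = np.random.default_rng(0)

x = rng.normal(size=(3, 1))        # input vector
W1 = rng.normal(size=(4, 3))       # hidden-layer weights
b1 = np.zeros((4, 1))              # hidden-layer bias
W2 = rng.normal(size=(1, 4))       # output-layer weights
b2 = np.zeros((1, 1))              # output-layer bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: two affine maps with a sigmoid nonlinearity after each.
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
y = sigmoid(z2)
print(y)
```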

Dropout regularization

Dropout is a commonly used regularization method. It can be described by the diagram below: in each training step, only part of the neurons in the network are kept active and updated. Mathematically, we keep each neuron active with some probability (we use 0.5) and silence it otherwise; a code sketch follows the diagram:

[Figure: dropout]
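Here is a minimal numpy sketch of this mechanism (the common "inverted dropout" variant, which rescales by the keep probability so that no adjustment is needed at test time); the layer shapes and data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, keep_prob=0.5, training=True):
    """Randomly zero each activation with probability 1 - keep_prob.

    Dividing by keep_prob ("inverted dropout") keeps the expected
    activation unchanged, so the network needs no rescaling at test time.
    """
    if not training:
        return a
    mask = rng.random(a.shape) < keep_prob   # 1 = keep neuron, 0 = drop it
    return a * mask / keep_prob

a = rng.normal(size=(4, 6))                  # activations of a hidden layer
print(dropout(a))                            # about half the entries are zero
```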