# Discriminatively Trained Part Based Models notes

The LSVM (SVM with latent variable) is mostly used for human figure detection, it is very efficiency because it puts the human figure’s structure into consideration: a human figure has hands, head and legs. The LSVM models the human figure structure with 6 parts, and the position of these 6 parts are latent value.

The basic logic is sliding a window on the image, for every position we get a small image patch, by scoring this image patch we can predict whether this image patch contains a human figure or not.

### Defining the score function

Anyway, the first thing to do: defining a score function:

# PCA and Face Recognition - Eigen Face

PCA (Principal component analysis), just as its name shows, it computes the data set’s internal structure, its “principal components”.

Considering a set of 2 dimensional data, for one data point, it has 2 dimensions $$x_1$$ and $$x_2$$ . Now we get n such data points . What is the relationship between the first dimension $$x_1$$ and the second dimension $$x_2$$ ? We compute the so called covariance:

$cov(x_1,x_2)=\frac{\displaystyle \sum_{i=1}^n{(x_1^i-\overline{x_1})(x_2^i-\overline{x_2})} }{n-1}$

the covariance shows how strong is the relationship between $$x_1$$ and $$x_2$$. Its logic is the same as Chebyshev’s sum inequality:

# HoG Feature

HoG (Histograms of Oriented Gradients) feature is a kind of feature used for human figure detection. At an age without deep learning, it is the best feature to do this work.

Just as its name described, HoG feature compute the gradients of all pixels of an image patch/block. It computes both the gradient’s magnitude and orientation, that’s why it’s called “oriented”, then it computes the histogram of the oriented gradients by separating them to 9 ranges.

One image block (upper left corner of the image) is combined of 4 cells, one cell owns a 9 bins histogram, so for one image block we get 4 histograms, and all these 4 histograms will be flattened to one feature vector with a length of 4x9. Compute the feature vectors for all blocks in the image, we get a feature vector map.

Taking one pixel (marked red) from the yellow cell as an example: compute the $$\bigtriangledown_x$$ and $$\bigtriangledown_y$$ of this pixel, then we get its magnitude and orientation(represented by angle). When calculating the histogram, we vote its magnitude to its neighboring 2 bins using bilinear interpolation of angles.

Finally, when we get the 4 histograms of the 4 cells, we normalize them according to the summation of all the 4x9 values.

The details are described in the following chart: