### Review

For the \(i\)th sample \((x_i,y_i)\) in the training set, we have the following loss function:

\[L_i = \sum_{j \neq y_i} \max(0,\ w_j^T\cdot x_i - w_{y_i}^T\cdot x_i + \Delta)\]

Here \(w_j^T\cdot x_i\) is the score for classifying \(x_i\) as class \(j\), \(w_{y_i}^T\cdot x_i\) is the score for the correct class \(y_i\), \(\Delta\) is the margin, and \(w_j\) (also written \(\omega_j\)) is the \(j\)th row of \(W\).
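As a minimal sketch of this formula, the loss for one sample can be computed in plain Python. The function name `svm_loss_i`, the default `delta=1.0`, and the toy values of `W` and `x` are assumptions made for illustration; they do not come from the text above.

```python
def svm_loss_i(W, x, y, delta=1.0):
    """Multiclass hinge loss L_i for a single sample.

    W: list of rows, one weight vector w_j per class
    x: feature vector of the sample
    y: index of the correct class
    delta: margin (assumed to be 1.0 here)
    """
    # Score of every class: s_j = w_j^T . x
    scores = [sum(w_k * x_k for w_k, x_k in zip(w_j, x)) for w_j in W]
    # Sum the hinge terms over all classes except the correct one
    return sum(
        max(0.0, s_j - scores[y] + delta)
        for j, s_j in enumerate(scores)
        if j != y
    )


# Hypothetical toy values: 3 classes, 2 features
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, 1.0]
loss = svm_loss_i(W, x, y=0)
print(loss)
```

With these values the scores are 2, 1 and 3, so only class 2 violates the margin against the correct class 0 and contributes to the sum.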

### Problem

Problem 1:

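Considering the geometric meaning of the weight vector \(\omega\), it is easy to see that \(\omega\) is not unique: \(\omega\) can vary within a small region and still give the same \(L_i\). For example, when every margin is met with room to spare, \(L_i\) stays at 0 under small changes to \(\omega\).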
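The non-uniqueness can be checked numerically. The sketch below uses made-up weights and a made-up perturbation (both are assumptions chosen only so that the margin is comfortably satisfied) and shows two different weight matrices producing the same loss.

```python
def svm_loss_i(W, x, y, delta=1.0):
    # Multiclass hinge loss for one sample (delta = 1.0 is an assumption)
    scores = [sum(w_k * x_k for w_k, x_k in zip(w_j, x)) for w_j in W]
    return sum(
        max(0.0, s_j - scores[y] + delta)
        for j, s_j in enumerate(scores)
        if j != y
    )


x = [3.0, 1.0]
y = 0

# Hypothetical weights that classify x correctly with a wide margin
W_a = [[2.0, 0.0], [0.0, 1.0]]
# A slightly perturbed copy of the same weights
W_b = [[2.1, -0.1], [0.05, 1.0]]

loss_a = svm_loss_i(W_a, x, y)
loss_b = svm_loss_i(W_b, x, y)
print(loss_a, loss_b)  # both losses are zero, although W_a != W_b
```

Both matrices keep the correct-class score more than the margin above the other score, so the hinge term is clipped to zero in each case.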

Problem 2:

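If the values in \(\omega\) are scaled, the score differences are scaled by the same ratio, and the loss changes with them. Consider a score difference \(w_j^T\cdot x_i - w_{y_i}^T\cdot x_i\) of 15: if we scale all the weights in \(\omega\) by 2, the difference becomes 30, and the corresponding loss term grows from \(15+\Delta\) to \(30+\Delta\). But this kind of scaling is meaningless: the classifier makes exactly the same predictions, so the larger number doesn't really represent a larger **loss**.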
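A small numeric check of the scaling effect, as a sketch under assumed toy values (the weights, the sample, and `delta=1.0` are all hypothetical). Note that with a fixed margin it is the score gap that doubles exactly; the loss grows with it but is offset by \(\Delta\).

```python
def class_scores(W, x):
    # s_j = w_j^T . x for every class j
    return [sum(w_k * x_k for w_k, x_k in zip(w_j, x)) for w_j in W]


def svm_loss_i(W, x, y, delta=1.0):
    # Multiclass hinge loss for one sample (delta = 1.0 is an assumption)
    s = class_scores(W, x)
    return sum(max(0.0, s_j - s[y] + delta) for j, s_j in enumerate(s) if j != y)


# Hypothetical setup: the correct class is 1, but class 0 scores higher
W = [[1.0, 0.0], [0.0, 1.0]]
x = [20.0, 5.0]
y = 1

# Scale every weight by 2
W_scaled = [[2.0 * w for w in row] for row in W]

gap = class_scores(W, x)[0] - class_scores(W, x)[y]
gap_scaled = class_scores(W_scaled, x)[0] - class_scores(W_scaled, x)[y]
loss = svm_loss_i(W, x, y)
loss_scaled = svm_loss_i(W_scaled, x, y)

print(gap, gap_scaled)    # score gap before and after scaling
print(loss, loss_scaled)  # loss before and after scaling
```

The predicted class (the index of the highest score) is the same for `W` and `W_scaled`, which is why the change in the loss carries no information about classification quality.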