Third Weekly Report GSoC 2012 – Clash of Assumptions

After a weekend of debugging I can proudly say that I am able to run toy examples using my implementation of SO-SVM for the multiclass classification case of use \o/

Just for the record, and for the fun of its discovery, I am going to write about the last bug I fixed in my code. As it is commented in the previous posts, part of the project deals with solving an optimization problem, a QP in particular. The fact is that I could run the code just fine, no more failing ASSERTS popping up nor segmentation faults. However, something weird was going on with the solution. The weight vector $\vec{w}$ was zero for every problem instance. After double checking with MATLAB’s function quadprog that the correct solution for the problem was effectively non-trivial, I tried adding some lines here and there to ensure that the parameters given to MOSEK’s were the correct ones. All the matrices and vectors were ok. After some more lines to print out the bounds of the problem I found it. The variables of the optimization problem were all fixed to zero!

At the beginning I just assumed that if no bound is given for a variable, then MOSEK would consider that this one is free; i.e. it may take values within $(-\infty, \infty)$. However, this is false. If no bound is given, then the reality is that MOSEK fixes the value of the variable to zero. I still don’t get the point of including variables to the optimization vector whose values are fixed… but that’s another story.

For the next time I will try to remember to ensure that the assumptions I make are correct or, even better, to RTFM 🙂

My next objective is to try out the application with bigger examples. Unfortunately my MOSEK license file just allows me to solve problems with up to 300 variables or constraints, which currently prevents me from doing tests with training data bigger than 17 bidimensional training examples!

Second Weekly Report GSoC 2012

Let’s have some fun, fun with optimization!

Basically during the last days my work in the project has been strictly focused on the implementation of the optimization algorithm for SO-SVM (Structured Output – Support Vector Machine). This algorithm is presented and fully analysed in [1]. In the paper the dual formulation of the optimization problem is explored whereas we are concerned with the primal version of it at this point in the project. The reason why the primal has been selected stems from its properties regarding complexity  and efficiency; the dual requires an explicit expansion of the weight vector and the memory requirements are quadratic rather than linear as with the primal formulation. In [2] appears the primal formulation of the algorithm even though the main topic of this article is beyond our current approach for SO learning.

The algorithm belongs to a class called Cutting Plane Algorithms. These optimization methods are based on the idea of refining iteration by iteration the feasible set of solutions using the most violated constraints. In [3] there is a video lecture and around the instant 6:20 it is nicely introduced the intuition of how cutting plane methods work, which I really like.

As it was commented in the previous post, a quadratic program (QP) appears in the algorithm. In particular, every iteration of the algorithm we need to solve a QP that looks like

$\text{minimize}_{\vec{w}, \xi_i} \frac{1}{2} \vec{w}^TC\vec{w} + \sum_{i=1}^N \xi_i$
$s.t.~\forall i~\forall y \in \mathcal{Y} \setminus y_i : \vec{w} \cdot\delta\Psi_i(y) \ge \Delta(y_i, y) - \xi_i$

We are using MOSEK to solve this QP [4]. The QP shown above needs to be translated to a more general formulation as the one provided by MOSEK though. In order to do this, I have created an interface to MOSEK from Shogun inspired by the CPLEX one that is in the toolbox.

Right now the algorithm is close to be finished. However, nothing has been properly tested this far and this makes me feel uncomfortable. Once the algorithm is finally ready, I will start to prepare the first and simplest case of use of the framework, multiclass classification. Debugging and testing time will come then!

References
[1] Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y. Support vector machine learning for interdependent and structured output spaces.
[2] Finley, T., Joachims, T. Training Structural SVMs when Exact Inference is Intractable.
[3] Video lecture by Thomas Finley on Training Structural SVMs when Exact Inference is Intractable. http://videolectures.net/icml08_finley_tssvm/
[4] MOSEK C API Quadratic Optimization. http://docs.mosek.com/6.0/capi/node007.html#329747704
[5] Nowozin, S., Lampert, C. H. Structural Learning and Prediction in Computer Vision. The section 6 is mainly related with this part of the project.