# Shogun is participating in GSoC 2013

It is difficult to come up with a better way to start off the week than with news as good as this. As usual, we have two main types of project ideas in Shogun:

• Accessibility improvements.
• Core machine learning tasks or framework improvements.

Check out the full ideas list for more detailed descriptions.

Providing that Shogun has plenty of useful and interesting machine learning methods but, unfortunately, some of them are not so accessible for users that are not familiar with Shogun’s code base, this year the accessibility improvements projects seem to be particularly important. We expect to have after this summer more interactive demos showing off Shogun’s capabilities.

Nonetheless, there are also some interesting ideas concerning the implementation of new machine learning algorithms. For example, extensions in the Gaussian Processes to support classification, more dimension reduction techniques (are you an expert in ball trees? then we want you!) and some really challenging projects such as large-scale estimation of sparse Gaussian densities. Of course, last but not least, there is a very nice idea about my favourite topic: structured learning! This idea aims at providing some tools to target general structured output models with SO-SVMs, large-scale solvers and kernels.

# A memory-leak-killer combination

Once you have finished coding a program, tested it, debugged it and checked that its behaviour is the expected one, it is quite normal to be invaded by a superhero-like feeling. You just feel smart. The simple thought that your program may not be actually correct hurts your pride, a lot. Well, C/C++ programmers (along with developers of other languages that do not use garbage collection) know that this can be the case. Your program may be leaking memory; i.e. it has reserved and made use of some memory but not freed it properly.

During these last two years I have been using Valgrind [1] to help me find the source of memory leaks. Valgrind has a tool suited for this purpose and it is called memcheck [2]. Even though I have successfully used this tool several times, I must admit that there were some details in the output that I did not quite understand properly. For example, what was the difference between a memory leak caused by a memory block directly or indirectly lost, if the blocks of memory that were still reachable were caused by errors in my code, etc. I am sure that the answers to these questions are in the manual. But, to be honest, I have never read a manual from the first to the last page before starting to use it. In my honest opinion, it is just too time consuming and full of unintelligible details for a non experienced user. Today, I have found this document [3] extremely useful for this task. The concepts are presented in a simple and concise way, together with pictures and some examples to make the topic clear. A MUST read for anyone getting in touch with Valgrind memcheck.

Apart from this, in SHOGUN we use our own system of reference counting to track the number of objects that hold a pointer to another object. In this way, when one object is deleted, the reference counts to all the other objects this object in question was holding are decremented by one, effectively deleting the other objects if their counts go to zero. This system turns out to be very comfortable to use. And now, why am I talking about this reference counting here? Well, because it is indispensable that you make the correct use of it if you want to have your SHOGUN program free of memory leaks. There is a very nice option which enables traces to let you know when a new SHOGUN object is created, when the count to a reference is increased, decreased and to know when an object is deleted. This option, together with the use of Valgrind, has enabled me to detect a bunch of memory leaks in my code in a matter of minutes FTW! I am completely sure that this task would have taken me at least a couple of hours some time ago.

This definitely made my day and I must thank Sergey Lisitsyn for the suggestion of enabling this debug output and Aleksander for his wonderful document about Valgrind memory leaks reports.

PS. Take a look to the code profiling tool provided by Valgrind, cachegrind [4] and its KDE graphical interface kcachegrind! They are simply awesome.

References

[1] Valgrind. http://valgrind.org/
[2] Valgrind manual page about the tool memcheck.
http://valgrind.org/docs/manual/mc-manual.html
[3] Understanding Valgrind memory leaks reports.
https://sigquit.wordpress.com/2010/02/04/understanding-valgrind-memory-leak-reports/
[4] Valgrind manual page about the tool cachegrind.
http://valgrind.org/docs/manual/cg-manual.html
[5] An interesting thread is stack overflow about different types of memory leaks detected by valgrind memcheck.
http://stackoverflow.com/questions/3840582/still-reachable-leak-detected-by-valgrind

I tried to write a tutorial focusing on the type of memory leaks as detected internally by Valgrind and the generated output leak reports.

The PDF is available in the following URL:
http://es.gnu.org/~aleksander/valgrind/valgrind-memcheck.pdf

And the simple C tester to generate each type of memory leak (cases 1 to 9 in the Valgrind Manual) is available here:
http://es.gnu.org/~aleksander/valgrind/valgrind-memcheck.c

Comments, fixes, whatever… highly appreciated. You could even send me a diff file to the original LaTeX source, available in:
http://es.gnu.org/~aleksander/valgrind/valgrind-memcheck.tex

View original post

# Finally, a new project update

It is about time to say something again about how the project is going. First of all, as a matter of excuse (probably a bad one though) I want to say why the blog has not been updated lately. There have been a couple of weeks in which I have been debugging the code for the primal formulation of the SO-SVM based on MOSEK using a simple multiclass classification example. Those two weeks were basically that, debugging testing, debugging again, testing, … it is not interesting stuff to tell the truth but very necessary in any case! However, last week I started to work in the – probably – most interesting part of the project, the Hidden Markov – Support Vector Machine [1].

I like to think of HM-SVMs as a particular instance of the general Structured Output (SO) problem I have been talking about in the previous posts. In SO learning we are trying to find a function $f$ that takes an object from the space $\mathcal{X}$  and outputs and object from the space $\mathcal{Y}$. In HM-SVMs the objects of $\mathcal{Y}$ are sequences of discrete data (e.g. integers). In order to deal with the structure of the sequences properly, we extract features of the data in the space $\mathcal{Y}$ together with the data in the space $\mathcal{X}$. This is done via a function we denote as $\Psi(x, y)$ whose definition appears in the tutorial about HM-SVM that Nico, my GSoC mentor, wrote [2, pages 4 and 5].

Another important piece of the SO-SVM is the argmax function. In HM-SVMs this function is computed using the well-known Viterbi algorithm. I had never studied before this algorithm but I was happy to discover its similar structure to the forward-backward algorithm used to train an HMM from data that I have already studied and implemented [3]. I found this tutorial [4] on the Viterbi algorithm very useful and easy to understand. The implementation I have done in SHOGUN of this algorithm is based on the one in the HMSVM toolbox [5] (in particular, it is in this file [6]).

One more piece of code I have implemented in the toolbox for the HM-SVM is the loss function, commonly denoted as $\Delta$. In this case, we use the Hamming distance defined for strings. In our case, the inputs of the loss function are sequences of the same length so the Hamming loss is simple the count of the number of elements that are different in the sequences (think of it as the minimum number of modifications one may do to one of the sequences to make it equal to the other. Providing that the sequences are of the same length, the only modification allowed is to change the value of an element in one of the sequences).

These three functions, together with some data structures to hold features and labels used by the HM-SVM, constitute what I have implemented so far into SHOGUN about this interesting discriminative model. The plan now is to correct with Nico the possible mistakes I have introduced into the code and to generate some training and test data so I can start debugging phase! My idea is to port this function [7] from HM-SVM toolbox that generates random two-state data.

Comments and suggestions are very welcome!

You can find the code I have been talking about in github:
https://github.com/iglesias/shogun/tree/so/src/shogun/
(mainly in the files structure/HMSVMModel.[cpp|h], features/MatrixFeatures.[cpp|h], structure/HMSVMLabels.[cpp|h]).

References

[1] Altun, Y., Tsochantaridis, I., Hofmann, T. Hidden Markov Support Vector Machines.
http://cs.brown.edu/~th/papers/AltTsoHof-ICML2003.pdf
[2] SHOGUN Hidden Markov SO-SVM written by Nico Görnitz.
https://dl.dropbox.com/u/11020840/shogun/hmsvm.pdf
[3] The code of my HMM implementation for one of the homeworks of the Artificial Intelligence course that I took at KTH. https://github.com/iglesias/hmm-ai-ducks
[4] Viterbi algorithm tutorial.
http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/
html_dev/viterbi_algorithm/s1_pg2.html
[5] Entry in mloss.org for the HM-SVM toolbox written by Gunnar Raetsch and Georg Zeller.
http://mloss.org/software/tags/hmsvm/
[6] Viterbi algorithm implementation in the HM-SVM toolbox.
https://dl.dropbox.com/u/11020840/shogun/best_path.cpp
[7] Data generation for a two-state model in the HM-SVM toolbox.
https://dl.dropbox.com/u/11020840/shogun/simulate_data.m

# Third Weekly Report GSoC 2012 – Clash of Assumptions

After a weekend of debugging I can proudly say that I am able to run toy examples using my implementation of SO-SVM for the multiclass classification case of use \o/

Just for the record, and for the fun of its discovery, I am going to write about the last bug I fixed in my code. As it is commented in the previous posts, part of the project deals with solving an optimization problem, a QP in particular. The fact is that I could run the code just fine, no more failing ASSERTS popping up nor segmentation faults. However, something weird was going on with the solution. The weight vector $\vec{w}$ was zero for every problem instance. After double checking with MATLAB’s function quadprog that the correct solution for the problem was effectively non-trivial, I tried adding some lines here and there to ensure that the parameters given to MOSEK’s were the correct ones. All the matrices and vectors were ok. After some more lines to print out the bounds of the problem I found it. The variables of the optimization problem were all fixed to zero!

At the beginning I just assumed that if no bound is given for a variable, then MOSEK would consider that this one is free; i.e. it may take values within $(-\infty, \infty)$. However, this is false. If no bound is given, then the reality is that MOSEK fixes the value of the variable to zero. I still don’t get the point of including variables to the optimization vector whose values are fixed… but that’s another story.

For the next time I will try to remember to ensure that the assumptions I make are correct or, even better, to RTFM :)

My next objective is to try out the application with bigger examples. Unfortunately my MOSEK license file just allows me to solve problems with up to 300 variables or constraints, which currently prevents me from doing tests with training data bigger than 17 bidimensional training examples!

# Second Weekly Report GSoC 2012

Let’s have some fun, fun with optimization!

Basically during the last days my work in the project has been strictly focused on the implementation of the optimization algorithm for SO-SVM (Structured Output – Support Vector Machine). This algorithm is presented and fully analysed in [1]. In the paper the dual formulation of the optimization problem is explored whereas we are concerned with the primal version of it at this point in the project. The reason why the primal has been selected stems from its properties regarding complexity  and efficiency; the dual requires an explicit expansion of the weight vector and the memory requirements are quadratic rather than linear as with the primal formulation. In [2] appears the primal formulation of the algorithm even though the main topic of this article is beyond our current approach for SO learning.

The algorithm belongs to a class called Cutting Plane Algorithms. These optimization methods are based on the idea of refining iteration by iteration the feasible set of solutions using the most violated constraints. In [3] there is a video lecture and around the instant 6:20 it is nicely introduced the intuition of how cutting plane methods work, which I really like.

As it was commented in the previous post, a quadratic program (QP) appears in the algorithm. In particular, every iteration of the algorithm we need to solve a QP that looks like

$\text{minimize}_{\vec{w}, \xi_i} \frac{1}{2} \vec{w}^TC\vec{w} + \sum_{i=1}^N \xi_i$
$s.t.~\forall i~\forall y \in \mathcal{Y} \setminus y_i : \vec{w} \cdot\delta\Psi_i(y) \ge \Delta(y_i, y) - \xi_i$

We are using MOSEK to solve this QP [4]. The QP shown above needs to be translated to a more general formulation as the one provided by MOSEK though. In order to do this, I have created an interface to MOSEK from Shogun inspired by the CPLEX one that is in the toolbox.

Right now the algorithm is close to be finished. However, nothing has been properly tested this far and this makes me feel uncomfortable. Once the algorithm is finally ready, I will start to prepare the first and simplest case of use of the framework, multiclass classification. Debugging and testing time will come then!

References
[1] Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y. Support vector machine learning for interdependent and structured output spaces.
[2] Finley, T., Joachims, T. Training Structural SVMs when Exact Inference is Intractable.
[3] Video lecture by Thomas Finley on Training Structural SVMs when Exact Inference is Intractable. http://videolectures.net/icml08_finley_tssvm/
[4] MOSEK C API Quadratic Optimization. http://docs.mosek.com/6.0/capi/node007.html#329747704
[5] Nowozin, S., Lampert, C. H. Structural Learning and Prediction in Computer Vision. The section 6 is mainly related with this part of the project.

# First Weekly Report GSoC 2012

Let’s do the weekly reports kick-off of this summer!

Although GSoC started officially a couple of days ago, on May 21st, I have been working on the project for about two weeks. Next, I am going to summarize what progress has been made during this time.

First of all, based on the code skeletons [1] that Nico wrote, I started with the design of the SO framework in Shogun. The design decisions taken during this phase are summarized in the attached class diagram [2]. In the diagram, the classes in light green existed in Shogun previous to this project whereas the classes filled with light red are brand new. Among the new classes, the class CStructuredModel seeks to offer functionality to put together all the application-dependent parts of a SO problem instance.  The CLossFunction class became very handy since I just needed to extend it with a few methods in order to support the functionality required by SO. The idea of this class is to provide a generic interface for well-defined loss functions (e.g. Hinge loss). Needless to say, the design shown in the diagram is very likely to evolve. For example, CStructuredModel is currently implemented to be used with function pointers for some of its members and this will change to use a more understandable interface with classes.

Initial SO class diagram.

In addition, classes (labels/CStructuredLabels and lib/CStructuredData) to provide labels with structure (e.g. sequences, graphs) have been added. This is probably the feature that distinguishes the most SO learning from the other strategies already present in Shogun.

Finally, the optimization algorithm presented in [3]. This is still work in progress and the code is in CPrimalMosekSOSVM. The main difficulty I have found here is that, in order to solve the quadratic program (QP) that arises, we need to use a non Open Source tool since libqp does not support all the required constraints (in particular inequality constraints of the type $A \cdot x \leq b$ for the QP with box constraints). I have started to write some code in CPrimalMosekSOSVM that makes use of MOSEK to solve the QP. This piece of code is still rather poor and it is just in my local repository.

The current working plan is, in this order: finish the code in CPrimalMosekSOSVM mentioned above (I have set a deadline for this on Friday, June 1st), prepare the first case of use with multiclass SVMs, extend the design creating a class for the $\arg \max$ computation and another one for the structured loss function $\Delta(y_{pred}, y_{true})$.

References
[1] Gist with main concepts of the framework written by Nico Görnitz. https://gist.github.com/2634487.
[2] Structured Output framework – Class Diagram.
[3] Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y. Support vector machine learning for interdependent and structured output spaces.
[4] SO learning branch in git:  https://github.com/iglesias/shogun/tree/so.