First Weekly Report GSoC 2012

Let’s do the weekly reports kick-off of this summer!

Although GSoC started officially a couple of days ago, on May 21st, I have been working on the project for about two weeks. Next, I am going to summarize what progress has been made during this time.

First of all, based on the code skeletons [1] that Nico wrote, I started with the design of the SO framework in Shogun. The design decisions taken during this phase are summarized in the attached class diagram [2]. In the diagram, the classes in light green existed in Shogun previous to this project whereas the classes filled with light red are brand new. Among the new classes, the class CStructuredModel seeks to offer functionality to put together all the application-dependent parts of a SO problem instance.  The CLossFunction class became very handy since I just needed to extend it with a few methods in order to support the functionality required by SO. The idea of this class is to provide a generic interface for well-defined loss functions (e.g. Hinge loss). Needless to say, the design shown in the diagram is very likely to evolve. For example, CStructuredModel is currently implemented to be used with function pointers for some of its members and this will change to use a more understandable interface with classes.

Initial SO class diagram.

In addition, classes (labels/CStructuredLabels and lib/CStructuredData) to provide labels with structure (e.g. sequences, graphs) have been added. This is probably the feature that distinguishes the most SO learning from the other strategies already present in Shogun.

Finally, the optimization algorithm presented in [3]. This is still work in progress and the code is in CPrimalMosekSOSVM. The main difficulty I have found here is that, in order to solve the quadratic program (QP) that arises, we need to use a non Open Source tool since libqp does not support all the required constraints (in particular inequality constraints of the type A \cdot x \leq b for the QP with box constraints). I have started to write some code in CPrimalMosekSOSVM that makes use of MOSEK to solve the QP. This piece of code is still rather poor and it is just in my local repository.

The current working plan is, in this order: finish the code in CPrimalMosekSOSVM mentioned above (I have set a deadline for this on Friday, June 1st), prepare the first case of use with multiclass SVMs, extend the design creating a class for the \arg \max computation and another one for the structured loss function \Delta(y_{pred}, y_{true}).

[1] Gist with main concepts of the framework written by Nico Görnitz.
[2] Structured Output framework – Class Diagram.
[3] Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y. Support vector machine learning for interdependent and structured output spaces.
[4] SO learning branch in git:

GSoC 2012 is already here!

Last month I became selected to develop a project for Shogun in the frame of GSoC. The project’s name is Build a Generic Structured Output Framework Learning and it is mentorized by Nico Görnitz. Here follows a short description of the project:

The aim is to implement tools for structured output (SO) problems. The data in these problems have complex structure (e.g. graphs, sequences) and the traditional learning algorithms fail to find solutions efficiently. Structured output support vector machines and conditional random fields are methods for SO learning. They will be implemented to form Shogun’s first module for SO learning. Finally, these methods will be applied to hidden Markov models-type of problems such as gene prediction.

Feel free to visit my project proposal where, among some personal information, you will be able to find a thorough description of the project together with a tentative schedule and useful references on the topic.

This is going to be a fun summer of coding!