A memory-leak-killer combination

Once you have finished coding a program, tested it, debugged it and checked that its behaviour is the expected one, it is quite normal to be invaded by a superhero-like feeling. You just feel smart. The simple thought that your program may not be actually correct hurts your pride, a lot. Well, C/C++ programmers (along with developers of other languages that do not use garbage collection) know that this can be the case. Your program may be leaking memory; i.e. it has reserved and made use of some memory but not freed it properly.

During these last two years I have been using Valgrind [1] to help me find the source of memory leaks. Valgrind has a tool suited for this purpose and it is called memcheck [2]. Even though I have successfully used this tool several times, I must admit that there were some details in the output that I did not quite understand properly. For example, what was the difference between a memory leak caused by a memory block directly or indirectly lost, if the blocks of memory that were still reachable were caused by errors in my code, etc. I am sure that the answers to these questions are in the manual. But, to be honest, I have never read a manual from the first to the last page before starting to use it. In my honest opinion, it is just too time consuming and full of unintelligible details for a non experienced user. Today, I have found this document [3] extremely useful for this task. The concepts are presented in a simple and concise way, together with pictures and some examples to make the topic clear. A MUST read for anyone getting in touch with Valgrind memcheck.

Apart from this, in SHOGUN we use our own system of reference counting to track the number of objects that hold a pointer to another object. In this way, when one object is deleted, the reference counts to all the other objects this object in question was holding are decremented by one, effectively deleting the other objects if their counts go to zero. This system turns out to be very comfortable to use. And now, why am I talking about this reference counting here? Well, because it is indispensable that you make the correct use of it if you want to have your SHOGUN program free of memory leaks. There is a very nice option which enables traces to let you know when a new SHOGUN object is created, when the count to a reference is increased, decreased and to know when an object is deleted. This option, together with the use of Valgrind, has enabled me to detect a bunch of memory leaks in my code in a matter of minutes FTW! I am completely sure that this task would have taken me at least a couple of hours some time ago.

This definitely made my day and I must thank Sergey Lisitsyn for the suggestion of enabling this debug output and Aleksander for his wonderful document about Valgrind memory leaks reports.

PS. Take a look to the code profiling tool provided by Valgrind, cachegrind [4] and its KDE graphical interface kcachegrind! They are simply awesome.


[1] Valgrind. http://valgrind.org/
[2] Valgrind manual page about the tool memcheck.
[3] Understanding Valgrind memory leaks reports.
[4] Valgrind manual page about the tool cachegrind.
[5] An interesting thread is stack overflow about different types of memory leaks detected by valgrind memcheck.



I tried to write a tutorial focusing on the type of memory leaks as detected internally by Valgrind and the generated output leak reports.

The PDF is available in the following URL:

And the simple C tester to generate each type of memory leak (cases 1 to 9 in the Valgrind Manual) is available here:

Comments, fixes, whatever… highly appreciated. You could even send me a diff file to the original LaTeX source, available in:

The document is licensed under the GFDL 1.3, and the tester is published in the public domain.

View original post

Finally, a new project update

It is about time to say something again about how the project is going. First of all, as a matter of excuse (probably a bad one though) I want to say why the blog has not been updated lately. There have been a couple of weeks in which I have been debugging the code for the primal formulation of the SO-SVM based on MOSEK using a simple multiclass classification example. Those two weeks were basically that, debugging testing, debugging again, testing, … it is not interesting stuff to tell the truth but very necessary in any case! However, last week I started to work in the – probably – most interesting part of the project, the Hidden Markov – Support Vector Machine [1].

I like to think of HM-SVMs as a particular instance of the general Structured Output (SO) problem I have been talking about in the previous posts. In SO learning we are trying to find a function f that takes an object from the space \mathcal{X}  and outputs and object from the space \mathcal{Y}. In HM-SVMs the objects of \mathcal{Y} are sequences of discrete data (e.g. integers). In order to deal with the structure of the sequences properly, we extract features of the data in the space \mathcal{Y} together with the data in the space \mathcal{X}. This is done via a function we denote as \Psi(x, y) whose definition appears in the tutorial about HM-SVM that Nico, my GSoC mentor, wrote [2, pages 4 and 5].

Another important piece of the SO-SVM is the argmax function. In HM-SVMs this function is computed using the well-known Viterbi algorithm. I had never studied before this algorithm but I was happy to discover its similar structure to the forward-backward algorithm used to train an HMM from data that I have already studied and implemented [3]. I found this tutorial [4] on the Viterbi algorithm very useful and easy to understand. The implementation I have done in SHOGUN of this algorithm is based on the one in the HMSVM toolbox [5] (in particular, it is in this file [6]).

One more piece of code I have implemented in the toolbox for the HM-SVM is the loss function, commonly denoted as \Delta. In this case, we use the Hamming distance defined for strings. In our case, the inputs of the loss function are sequences of the same length so the Hamming loss is simple the count of the number of elements that are different in the sequences (think of it as the minimum number of modifications one may do to one of the sequences to make it equal to the other. Providing that the sequences are of the same length, the only modification allowed is to change the value of an element in one of the sequences).

These three functions, together with some data structures to hold features and labels used by the HM-SVM, constitute what I have implemented so far into SHOGUN about this interesting discriminative model. The plan now is to correct with Nico the possible mistakes I have introduced into the code and to generate some training and test data so I can start debugging phase! My idea is to port this function [7] from HM-SVM toolbox that generates random two-state data.

Comments and suggestions are very welcome!

You can find the code I have been talking about in github:
(mainly in the files structure/HMSVMModel.[cpp|h], features/MatrixFeatures.[cpp|h], structure/HMSVMLabels.[cpp|h]).


[1] Altun, Y., Tsochantaridis, I., Hofmann, T. Hidden Markov Support Vector Machines.
[2] SHOGUN Hidden Markov SO-SVM written by Nico Görnitz.
[3] The code of my HMM implementation for one of the homeworks of the Artificial Intelligence course that I took at KTH. https://github.com/iglesias/hmm-ai-ducks
[4] Viterbi algorithm tutorial.
[5] Entry in mloss.org for the HM-SVM toolbox written by Gunnar Raetsch and Georg Zeller.
[6] Viterbi algorithm implementation in the HM-SVM toolbox.
[7] Data generation for a two-state model in the HM-SVM toolbox.