April 7th 2009
Profiling with Valgrind/Callgrind
Profiling comes in three different flavors. The first is emulation, where the behavior of a processor is emulated. The second is sampling, where the profiler samples the state of the program at regular intervals. The last is instrumentation, where the profiler is notified each time a subroutine is called and each time it returns. As with Heisenberg's uncertainty principle, the act of profiling changes the exact behavior of your program. Keep this in mind when you analyze a profile.
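To make the instrumentation flavor concrete, here is a minimal sketch in Python. It is not how Valgrind works, and the names `instrumented` and `stats` are made up for the illustration. A decorator records an event on entry to and exit from each function, which is the information an instrumenting profiler collects.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process "profiler": call count and elapsed time per function.
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

def instrumented(func):
    """Record every entry to and exit from func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # entry event
        try:
            return func(*args, **kwargs)
        finally:
            # exit event: runs on normal return and on exceptions
            record = stats[func.__name__]
            record["calls"] += 1
            record["seconds"] += time.perf_counter() - start
    return wrapper

@instrumented
def square(x):
    return x * x

@instrumented
def sum_of_squares(n):
    return sum(square(i) for i in range(n))

result = sum_of_squares(1000)
for name, record in stats.items():
    print(f"{name}: {record['calls']} calls, {record['seconds']:.6f} s")
```

The time recorded for `sum_of_squares` includes the bookkeeping done by the wrapper around each of the 1000 calls to `square`. This is the perturbation mentioned above: the measurement itself changes what is measured.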
Valgrind is an open source emulation profiler, freely available on standard Linux platforms. Because it emulates the processor, the profiled program runs far slower than the actual program, and as a result the relative cost of I/O is underestimated. The advantage is that you get every detail of the memory behavior (cache misses, for instance). Valgrind does not emulate every processor, but you can tune it to approximate your own.
This is more or less a translation of my French tutorial on Valgrind profiling.
Calling Valgrind
Calling the profiler is really easy:
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes program arguments
Here, I ask valgrind to use the callgrind profiler plugin with three options: dump the executed instructions (which shows which part of a function is really expensive, not only which function), simulate the cache (to help improve processor usage), and collect jumps (to get a dynamic view of the program's behavior). Of course, the program must have been compiled with the appropriate options (at least -g).
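If you launch profiles from a script, the same command line can be assembled programmatically. The sketch below is only an illustration: the helper `callgrind_command`, the program `./program`, and the file names are hypothetical. Recent Valgrind manuals spell the cache option `--cache-sim=yes`, so the flag is a parameter here; check `valgrind --tool=callgrind --help` for the spelling your version accepts.

```python
import shlex

def callgrind_command(program, *arguments, out_file=None,
                      cache_flag="--simulate-cache=yes"):
    """Build the argument list for a callgrind run (hypothetical helper)."""
    command = [
        "valgrind",
        "--tool=callgrind",
        "--dump-instr=yes",      # per-instruction costs
        cache_flag,              # cache simulation; newer manuals use --cache-sim=yes
        "--collect-jumps=yes",   # record jumps for the dynamic view
    ]
    if out_file is not None:
        # Write the profile to a known file name instead of callgrind.out.<pid>
        command.append(f"--callgrind-out-file={out_file}")
    command.append(program)
    command.extend(arguments)
    return command

# Placeholder program and arguments, for illustration only.
command = callgrind_command("./program", "input.dat", out_file="callgrind.out.demo")
print(shlex.join(command))

# To actually run it (requires Valgrind to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

The resulting profile file is what you then open in KCacheGrind.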
Analyzing profiles
KCacheGrind is probably the best tool to visualize and analyze valgrind results (it can also display the results of other profilers).
When opening a profile, KCacheGrind may not find the associated source files. If so, add their folder to its annotation folders.

I think the most important graph KCacheGrind provides is the Callee Map. It can be colored by different criteria (files, classes, …). The main point is that the Callee Map draws an image where the area of a function represents its weight in the program execution (the weight being the number of instructions, cache misses, …). Unfortunately, in some cases KCacheGrind is not able to build any of the callee-related views. I don’t know why, but I ran into this on Red Hat 4 with its associated KCacheGrind and the latest valgrind.
Call graphs also show how much each function consumes. Double-clicking a function (in the call graph or in the Callee Map) “activates” it. KCacheGrind then shows the original source code (with jumps, if they were collected) and the cost of each instruction, along with the functions that call this function and the functions it calls. Another important point is the difference between the self cost (sometimes called exclusive cost) and the inclusive cost. The former is the cost of the function alone; the latter is the cost of the function plus the cost of the functions it calls.
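The relation between self and inclusive cost can be worked out on a toy example. The numbers and function names below are invented, and the sketch assumes a simple call tree in which each function is called from a single place. A real profile records costs per caller/callee pair.

```python
# Hypothetical self costs (for example, instructions executed in the function body).
self_cost = {"main": 10, "load": 40, "parse": 25, "compute": 120}

# Hypothetical call tree: caller -> list of callees.
calls = {
    "main": ["load", "compute"],
    "load": ["parse"],
    "parse": [],
    "compute": [],
}

def inclusive_cost(function):
    """Self cost of the function plus the inclusive cost of everything it calls."""
    return self_cost[function] + sum(inclusive_cost(callee) for callee in calls[function])

for name in self_cost:
    print(f"{name:8s} self={self_cost[name]:4d} inclusive={inclusive_cost(name):4d}")
```

Here `main` has a small self cost (10) but the largest inclusive cost (195), while `compute` is where the instructions are actually spent. Sorting by self cost points at the code to optimize; sorting by inclusive cost points at the call paths that lead there.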
Conclusion
Valgrind and KCacheGrind together are free tools for profiling an application. The combination is far from perfect, but it provides valuable information. Instrumentation- and sampling-based profilers need a patched kernel (on Linux) or administrator rights (on Windows and Linux), and at the moment they cannot report every cost, unlike emulation.