July 7th 2009

Review of Intel Parallel Studio

I’ve played a little with Intel Parallel Studio. Let’s say it has been a pleasant trip into the wilds of multithreaded applications.

Intel Parallel Studio is a set of tools geared toward multithreaded applications. It consists of three Visual Studio plugins (so you need a fully-fledged Visual Studio, not an Express edition):

  • Parallel Inspector for memory analysis
  • Parallel Amplifier for thread behavior and concurrency
  • Parallel Composer for parallel debugging

This is an update of the review I wrote for the beta version. Since that first review, I’ve tried the official first release.


Since the beta phase, Intel has added a lot of documentation and online help, as well as additional samples. The lack of these was my main complaint at the time, and I can now say that Intel provides a complete tool with appropriate help. There is still room for improvement, but not much. For instance, Intel provides videos presenting Parallel Studio.

There is a simple sample that shows how all the plugins can be used together: the NQueens solution, which is also the main Composer example. For Composer, though, several parallelization solutions are proposed. According to the Starting Guide and other documents, Intel’s workflow consists of using Advisor (I’ll try to use it in another post), then Composer to debug the parallelization, Inspector to check for contentions, … and then Amplifier to profile the application. One of the videos is dedicated to showing how to use the plugins with the NQueens sample.
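
For readers who do not know the problem behind the sample, here is a minimal serial sketch of an N-Queens counter. It is not Intel’s sample code: the names `solve` and `count_queens` are hypothetical, and the bitmask approach is just one common way to write it. It assumes a board size between 1 and 31.

```cpp
// Hypothetical serial N-Queens counter (not Intel's NQueens sample).
// cols, diag1 and diag2 are bitmasks of the columns attacked in the current row.
long solve(int n, int row, unsigned cols, unsigned diag1, unsigned diag2) {
    if (row == n) return 1;                      // all queens placed: one solution
    long count = 0;
    const unsigned all = (1u << n) - 1u;         // n low bits set: the board width
    unsigned free_cols = all & ~(cols | diag1 | diag2);
    while (free_cols != 0) {
        unsigned bit = free_cols & (~free_cols + 1u);  // lowest free column
        free_cols ^= bit;                              // mark it as tried
        count += solve(n, row + 1,
                       cols | bit,
                       (diag1 | bit) << 1,       // diagonals shift by one column per row
                       (diag2 | bit) >> 1);
    }
    return count;
}

// Counts the solutions for an n x n board (assumes 1 <= n <= 31).
long count_queens(int n) {
    return solve(n, 0, 0u, 0u, 0u);
}
```

Each choice of column for the first row starts an independent search, which is why this problem lends itself to the different parallelization approaches the sample demonstrates.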

As a final point, each plugin has its own configurable toolbar, with a distinct icon.

Parallel Composer

Parallel Composer is mainly a parallel extension to Visual Studio’s debugger. It is based on an Intel runtime, which means you have to use the Intel C++ Compiler. The compiler is provided, as are IPP (a primitives library) and TBB (a parallel library), but not MKL, the scientific library. Version 11.1 of the compiler provides OpenMP 3.0 (the Visual Studio compiler only provides 2.0) and thus task parallelism. Intel’s goal is to provide this to C developers (C++ programmers can use TBB, for instance).
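
To give an idea of what OpenMP 3.0 task parallelism looks like, here is a minimal sketch. It is my own illustration, not one of Intel’s samples, and the function names `fib` and `parallel_fib` are hypothetical. It needs a compiler with OpenMP 3.0 support and OpenMP enabled; without it, the pragmas are ignored and the code runs serially with the same result.

```cpp
// Hypothetical example of OpenMP 3.0 tasks (not taken from Intel's samples).
// Each recursive call becomes a task that any thread of the team may execute.
long fib(int n) {
    if (n < 2) return n;
    long a = 0, b = 0;
    #pragma omp task shared(a) firstprivate(n)
    a = fib(n - 1);
    #pragma omp task shared(b) firstprivate(n)
    b = fib(n - 2);
    #pragma omp taskwait          // wait for both child tasks before combining
    return a + b;
}

// One thread creates the root task inside a parallel region;
// the other threads of the team pick up the tasks it spawns.
long parallel_fib(int n) {
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

A real program would stop creating tasks below some problem size, since a task per call is far too fine-grained to be efficient. The sketch only shows the constructs that OpenMP 2.x lacks.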

The goal of the extension is to detect shared data and its implications for reentrancy (can this function be called simultaneously by different threads?), and to show the task and thread tree with OpenMP.
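
The reentrancy question can be illustrated with a small sketch. Both functions below are hypothetical examples of mine, not Composer output: the first keeps its result in shared static storage, the second keeps all of its state in its argument and return value.

```cpp
#include <cstdio>
#include <string>

// Not reentrant: every call writes into the same static buffer, so two threads
// calling it at the same time can overwrite each other's result.
const char* label_unsafe(int id) {
    static char buffer[32];
    std::snprintf(buffer, sizeof buffer, "item-%d", id);
    return buffer;
}

// Reentrant: no shared state, each call builds and returns its own string.
std::string label_safe(int id) {
    return "item-" + std::to_string(id);
}
```

Called from a single thread, both give the same text. The difference only shows up under concurrent calls, which is exactly the kind of problem that is hard to spot by reading the code.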

Parallel Composer: The additional options during debugging

The OpenMP panels are not only for OpenMP. They are available for every extension that needs /Qopenmp (for instance the parallel extensions such as __par), in which case they display useful information about the state of existing threads. It is also possible to disable multithreading and run the program single-threaded.

Parallel Composer: Task and thread views

It seems possible to debug several processes simultaneously, as TotalView does, but there are no examples and no tutorial explaining how to do this.

Parallel Composer is a powerful debugger extension that gives you access to a lot of information. On the one hand, Intel also did a good job of providing tutorials and online help. On the other hand, the documentation for the most important plugin is perhaps the shortest of the three.

Parallel Inspector

Parallel Inspector is in charge of detecting general memory issues as well as threading-related memory issues. Depending on the inspection level, the execution time can be several times longer. Each time a problem is detected, it is assigned a severity level and recorded in a list from which you can access its location and the source code.

The first analysis is the general memory one. It detects, for instance, memory leaks. Here is a result that it can give:

Parallel Inspector: Memory report

Parallel Inspector: Location of a memory leak

Usually, this kind of detection requires modifying your code or, on Linux, preloading a library that detects memory leaks (or using valgrind). Here, the really great point is that no modification of the code is needed and you can use the compiler of your choice.
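
As an illustration of the kind of leak such a tool reports, here is a minimal sketch. The functions are hypothetical and are not the code shown in the screenshots: the first forgets to release its buffer, the second lets a container own the memory. Both assume a non-negative `n`.

```cpp
#include <vector>

// Leaks: the buffer allocated with new[] is never released.
int sum_squares_leaky(int n) {
    int* squares = new int[n];
    for (int i = 0; i < n; ++i) squares[i] = i * i;
    int total = 0;
    for (int i = 0; i < n; ++i) total += squares[i];
    return total;                 // missing: delete[] squares;
}

// Fixed: std::vector releases its storage automatically when it goes out of scope.
int sum_squares(int n) {
    std::vector<int> squares(n);
    for (int i = 0; i < n; ++i) squares[i] = i * i;
    int total = 0;
    for (int value : squares) total += value;
    return total;
}
```

Both versions return the same value, which is why this kind of bug goes unnoticed without a memory checker.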

The real addition of Inspector is of course not checking for memory leaks. Parallel Inspector is not called “Parallel” for nothing. It can check concurrent memory accesses, and thus warn developers that some threads may read or write the same memory concurrently. Of course, once you’ve checked that an access is not dangerous, you can tell Inspector to skip it (so the next inspection is faster).

Parallel Inspector: Concurrent memory access

Parallel Inspector: Source code of a memory access
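
To make the idea of a concurrent memory access concrete, here is a minimal sketch using OpenMP. It is an assumption of mine and not the code from the screenshots, and `count_even` is a hypothetical function. All threads update the same counter, and the `atomic` directive is what makes that update safe.

```cpp
#include <vector>

// Counts the even values in parallel. The counter is shared by all threads.
long count_even(const std::vector<int>& values) {
    long count = 0;
    const int n = static_cast<int>(values.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (values[i] % 2 == 0) {
            // Without this directive, the increment is an unprotected
            // read-modify-write on shared memory: a data race.
            #pragma omp atomic
            ++count;
        }
    }
    return count;
}
```

Removing the `atomic` line is the kind of mistake that may still give the right answer on most runs. A `reduction(+:count)` clause on the loop would be another way to avoid the shared update altogether.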

Inspector is, in my opinion, the easiest-to-use plugin of Parallel Studio. I find it easy to use because memory checks are something developers always care about, so we know what to expect from it.

Parallel Amplifier

Parallel Amplifier is a profiler (I don’t know whether it is instrumentation- or sampling-based) like the one you can find in Visual Studio Team edition, or like VTune, the fully-fledged profiler Intel sells as a stand-alone product. Here, you can only get the execution time, but it is still valuable information (if you need more, go and get VTune or Visual Studio Team). Then, for the parallel profiles, you can get the concurrency quality as well as waiting time.

Hotspot is the first profile you can get. The goal is to find where the application spends most of its time, the so-called “hotspot”. In the next example, it is algorithm2, and double-clicking on it displays the annotated source code.

Parallel Amplifier: Hotspot profile

Parallel Studio: Hotspot annotated source code

How scalable is my program? This is the question the second profile tries to answer. In this case, the scalability is given in the panel at the lower right of the screen (here, for two processors, I get 1.57, which means 78.4% of use, or efficiency). The source code can then be displayed with annotations. Here, the lack of concurrency comes from the display routines. On the other hand, algorithm2 scales well. To optimize your concurrency, you need to reduce the red (“poor”) part of the bar and maximize the other ones.

Parallel Amplifier: Concurrency

Parallel Amplifier: Concurrency annotated source code
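
The efficiency figure quoted above is simply the speedup divided by the number of processors. The sketch below shows that arithmetic. As an addition of mine that Amplifier does not report, it also inverts Amdahl’s law to estimate the serial fraction a given speedup implies. Both function names are hypothetical.

```cpp
// Parallel efficiency: measured speedup divided by the number of processors.
double efficiency(double speedup, int processors) {
    return speedup / processors;
}

// Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / p),
// solved for f. Assumes processors > 1 and speedup > 0.
double serial_fraction(double speedup, int processors) {
    const double p = processors;
    return (p / speedup - 1.0) / (p - 1.0);
}
```

With the rounded figure from the panel, `efficiency(1.57, 2)` gives 0.785, close to the 78.4% displayed (presumably computed from the unrounded speedup). Under the Amdahl model, the same speedup corresponds to a serial fraction of roughly 27%, which fits a program whose display routines are not parallelized.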

Finally, a crucial issue is waiting and locks. Here again, Amplifier has a dedicated profile. In this example, the main thread only waits for the subthreads to return.

Parallel Amplifier: Waits and locks

Parallel Amplifier: Annotated source code for waits and locks
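
The pattern where the main thread only waits for its subthreads looks roughly like the sketch below. It is an illustration of mine, not the profiled program: it uses `std::thread` from C++11, which postdates this review, and `parallel_sum` is a hypothetical function that assumes at least one worker.

```cpp
#include <thread>
#include <vector>

// Sums the integers in [0, n) using `workers` threads (assumes workers >= 1).
long long parallel_sum(int n, int workers) {
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        // Each worker handles every workers-th value and writes only to its own slot.
        pool.emplace_back([&partial, w, n, workers] {
            for (int i = w; i < n; i += workers) partial[w] += i;
        });
    }
    // The main thread does no computation here: it blocks in join()
    // until every worker has returned. This is the wait a profiler would show.
    for (std::thread& t : pool) t.join();

    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

This kind of wait is harmless as long as the workers are busy. It becomes a problem when the time spent waiting is much longer than the time the workers actually need.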

Profiling should be done regularly, and it is interesting to see whether a given optimization actually improves the program. Amplifier can help you do this by comparing profiles.

Parallel Amplifier: Profile comparison

Amplifier comes with several examples and good online help. It is not meant to be a full guide to optimization (there are complete books dedicated to this topic), but it gives you access to the tools you need and some leads on how to use them correctly.

Conclusion

While Amplifier and Inspector are intuitive and simple to use, the same is perhaps not true of Composer. Intel provides several videos as tutorials to help you use all the plugins, as well as complete guides and samples. Parallel Composer is perhaps less documented, but above all it is more complicated to use, at least from my point of view.

This product is, in my opinion, very helpful, non-intrusive for the code (I’m thinking of Amplifier and Inspector, which detect issues without additional libraries) and efficient. The issues it tackles are not easy ones to solve, and it handles them brilliantly. Since the beta phase, Intel has done a tremendous job of providing better documentation for its tool, and for me it is now the best tool for multithreaded development.

A few days ago, Dr. Dobb’s published a short post on what is needed for multithreaded application development, and it said Parallel Studio is the perfect tool for the job.

Intel Parallel Studio
Price: $799.95
Designed for today’s serial applications and tomorrow’s software innovators
