Nice title, surfing on the many core hype, and with a practical approach! What more could one expect from a book on such an interesting subject?
Sometimes, it’s so easy to rewrite some existing code because it doesn’t fit exactly your bill.
I just so the example with an All To All communication that was written by hand. The goal was to share how many elements would be sent from one MPI process to another, and these elements were stored on one process in different structure instances, one for each MPI process. So in the end, you had n structures on each of the n MPI processes.
The MPI_Alltoall cannot map directly to this scattered structure, so it sounds fair to assume that using MPI_Isend and MPI_Irecv would be simpler to implement. The issue is that this pattern uses buffers on each process for each other process it will send values to or receive values from. A lot of MPI library allocate their buffer when needed, but will never let go of the memory until the end. So you end up with a memory consumption that doesn’t scale. In my case, when using more than 1000 cores, the MPI library uses more than 1GB per MPI process when it hits these calls, just for these additional hidden buffers. This is just no manageable.
Now, if you use MPI_Alltoall, two things happen:
- there are no additional buffer allocated, so this scales nicely when you increase the number of cores
- it is actually faster than your custom implementation
Now with MPI 3 standard having non-blocking collective operations, there is absolutely no reason to try to outsmart the library when you need a collective operation. It has heuristics when it knows that it is doing a collective call, so let them work. You won’t be smarter if you try, but you will if you use them.
In my case, the code to retrieve all values and store them in an intermediate buffer was smaller that the one with the Isend/Irecv.
It seems that Packt Publishing is on a publishing spree on Machine Learning in Python. After Building Machine Learning Systems In Python for which I was technical reviewer, Packt published Learning Scikit-Learn In Python last November.
We know now that we won’t have the same serial computing increase we had in the last decades. We have to cope with optimizing serial codes, and programming parallel and concurrent ones, and this means that all coders have to cope with this paradigm shift. If computer scientists are aware of the tools to use, it is not the same for the “average” scientist or engineer. And this is the purpose of this book: educate the average coder.
Continue reading Book review: Introduction to High Performance Computing for Scientists and Engineers
Massively parallel processors are in the mood today. We had small parallel processors with a few cores and the ability to launch serevral threads on one core, we have now many cores on one processor and at the other end of the spectrum, we have GPUs. CPUs vendors are now going in this direction with Larabee and Fusion, and GPUs will still have more cores/threads/… It’s thus mandatory to understand this shift now.
Continue reading Book review: Programming Massively Parallel Processors: A Hands-on Approach
I’ve played a little bit with Intel Parallel Studio. Let’s say it has been a pleasant trip out in the wildness of multithreaded applications.
Intel Parallel Studio is a set of tools geared toward multithreaded applications. It consists of three Visual Studio plugins (so you need a fully-fledged Visual Studio, not an Express edition):
- Parallel Inspector for memory analysis
- Parallel Amplifier for thread behavior and concurrency
- Parallel Composer for parallel debugging
This is an update of the review I’ve done for the beta version. Since this first review, I’ve tried the official first version.
Since this post, Intel has officially released Parallel Studio. This is why I’ve published a new, up-to-date review here.
Some months ago, I’ve decided to dig into raytracing, and more exactly interactive raytracing. So I’ve started writting my own library, based on several publications.
nVidia announced recently its own framework, Intel wants also to do raytracing on Larrabee, it is the current trend.
Continue reading Interactive RayTracer
Some months ago, I had a TotalView tutorial, thanks to my job. Now, I’ve actually used it to debug one of my parallel applications and I would like to share my experience with fantastic tool.
First TotalView is not only a parallel debugger available on several Linux and Unix platforms. It also is a memory checker (MemoryScape and the TotalView plugin) as well as a reverse debugger, that is, you can roll back the execution of a program, even after it crashed (where it would be useless with a standard debugger like GDB).
Continue reading Overview of TotalView, a parallel debugger