March 31st 2010
Massively parallel processors are all the rage today. We used to have small parallel processors with a few cores and the ability to launch several threads on one core; we now have many cores on one processor and, at the other end of the spectrum, GPUs. CPU vendors are now going in this direction with Larrabee and Fusion, and GPUs will keep gaining more cores/threads/… It’s thus mandatory to understand this shift now.
Content and opinions
First of all, it’s not a book on programming massively parallel processors in general; it’s a book about CUDA. One of the authors is an NVIDIA fellow, so it’s no wonder. I think there are three parts in the book: an introduction to CUDA, two examples, and then general considerations and the future.
The first 6 chapters (I don’t count the first chapter as a real chapter; it’s more of an introduction to massively parallel processors and their uses in a few pages) are the main CUDA tutorial. I say tutorial because it feels like all the beginner courses I’ve taken on CUDA. The content can be found in all the Internet classes, so the only advantage is that you have everything in one book. Nothing less, nothing more.
I had a feeling of déjà vu for the MRI example; the second example was unknown to me. There is not much code, only the relevant parts, so you won’t be able to test the different implementations with what is provided in the book. Besides, several times in the course of the text, new techniques are introduced without any indication of the speed-up they provide. Perhaps this is because the speed-up cannot be generalized, but still, with proper warnings, showing the different timings throughout the GPU port of both examples would have been great.
The last part is, as I’ve said, more general. It starts with a workflow to help with parallelizing on GPUs, then a (too short, IMHO) introduction to OpenCL, and finally the future of CUDA with Fermi and SDK 3.0. The workflow chapter is too small. Of course, the goal isn’t to be like The Art of Concurrency, and at least there is a chapter about the process of selecting the algorithm, … but it is too small. The OpenCL introduction really is just an introduction. I’ve seen one small complete OpenCL call, but that’s it: I couldn’t program a single kernel right now. Of course it’s a CUDA book, not an OpenCL one, but as it stands the chapter is useless. Perhaps it would be better to merge it with the “future” chapter, as OpenCL is not yet widely available. Finally, the last chapter states what can be expected of Fermi (really interesting) and of SDK 3.0.
What I missed in this book is an explanation of texture memory. The obvious matrix example uses constant memory for caching the memory accesses. Why isn’t texture memory used in this example? It’s far bigger than constant memory and also has a cache, so why not use it? It’s a CUDA book, but a lot of the content is freely available in several tutorials that are sometimes better shaped than the book, so why isn’t there some special content, like how the cache works? How can you manage grid sizes that are not a power of two? (It’s explained in one of the examples, with zero padding, but there are no bounds protections in the first chapters, which is dangerous.) What is memory coalescing and how can I optimize memory bandwidth with coalescing in mind? (The actual explanation, with an appropriate picture, is in the last appendix!)
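The bounds protection that the first chapters skip over amounts to a one-line guard in the kernel, paired with a rounded-up grid size at launch. A minimal sketch of the idea (the kernel and variable names here are mine, not the book’s):

```cuda
// Hypothetical element-wise vector addition, just to illustrate the guard.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: threads past the end of the arrays do nothing, so n does
    // not have to be a power of two or a multiple of the block size.
    if (i < n)
        c[i] = a[i] + b[i];
}

// At launch, round the number of blocks up so every element is covered:
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;
//   vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
```

With the guard in place, any array length is safe without zero padding; and since consecutive threads touch consecutive addresses, the accesses stay coalesced.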
I don’t say that the book is not useful; it’s really interesting as a companion book for a CUDA course or for a beginner. If you’re used to the electronic papers and tutorials available online, you will not be that interested. If you buy this book, don’t expect to know everything about CUDA afterwards, even less about massively parallel processors in general. You will have to dig deeper for specific topics, but at least you will have a good basis.

Tags: Book review, CUDA, Parallel and Distributed Computing, Parallel computing