<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>Matthieu Brucher&#039;s blog &#187; Profiler</title> <atom:link href="http://matt.eifelle.com/category/tools/profiler/feed/" rel="self" type="application/rss+xml" /><link>http://matt.eifelle.com</link> <description></description> <lastBuildDate>Tue, 27 Jul 2010 07:04:23 +0000</lastBuildDate> <generator>http://wordpress.org/?v=2.9.1</generator> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Book review: The Art of Concurrency: A Thread Monkey&#8217;s Guide to Writing Parallel Applications</title><link>http://matt.eifelle.com/2009/12/08/book-review-the-art-of-concurrency-a-thread-monkeys-guide-to-writing-parallel-applications/</link> <comments>http://matt.eifelle.com/2009/12/08/book-review-the-art-of-concurrency-a-thread-monkeys-guide-to-writing-parallel-applications/#comments</comments> <pubDate>Tue, 08 Dec 2009 07:57:54 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Book review]]></category> <category><![CDATA[C++]]></category> <category><![CDATA[Debugger]]></category> <category><![CDATA[O'Reilly]]></category> <category><![CDATA[Profiler]]></category> <category><![CDATA[Tools]]></category> <category><![CDATA[Multithreaded applications]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=847</guid> <description><![CDATA[Free lunch is over, it&#8217;s time to go concurrent. The Art of Concurrency addresses the need for a workflow to develop concurrent/parallel applications. Mainly based on multithreaded applications, the book covers pthreads, Windows threads, OpenMP or Intel Threading Building Blocks library. It also covers some part of multiprocess applications if there are differences with threaded ones.
Content [...]]]></description> <content:encoded><![CDATA[<p><a
href="http://www.gotw.ca/publications/concurrency-ddj.htm">Free lunch is over</a>, it&#8217;s time to go concurrent. <span
style="text-decoration: underline;">The Art of Concurrency</span> addresses the need for a workflow to develop concurrent/parallel applications.<br
/> <span
id="more-847"></span><br
/> Mainly based on multithreaded applications, the book covers pthreads, Windows threads, OpenMP or <a
href="http://matt.eifelle.com/2008/07/09/book-review-intel-threading-building-blocks-outfitting-c-for-multi-core-processor-parallelism/">Intel Threading Building Blocks library</a>. It also covers multiprocess applications where they differ from threaded ones.</p><h4>Content and opinions</h4><p>The book starts with two chapters on what to do before parallelizing and on what can and cannot be parallelized. Before getting to the usual parallelizable algorithms, the author takes three chapters to explain how you may achieve your goal. Ensuring correctness is a difficult task, so the book gives 8 rules to help, followed by an explanation of several support libraries that can be used.</p><p>The biggest part of the book, as I&#8217;ve hinted, is dedicated to simple but common algorithms that may be parallelized: sums and scans, mapreduce, sorts, searches, and graph algorithms. Each time, several different algorithms are first coded serially and then parallelized, possibly with different support libraries. Each section then concludes with an assessment of efficiency, simplicity, portability and scalability, which helps the reader step back.</p><p>The last chapter is a small overview of additional tools that you may use (they are not mandatory). They are mainly Intel&#8217;s tools, but that is because Intel provides developers with some of the best tools.</p><h4>Conclusion</h4><p>Although the author works for Intel, he doesn&#8217;t promote Intel tools more than others. The tone of the book is right: not too serious, but not a &#8220;For Dummies&#8221; either, so it is simply enjoyable.</p><p>If you need advice on parallelizing your applications and you don&#8217;t want to buy <a
href="http://matt.eifelle.com/2009/03/10/book-review-patterns-for-parallel-programming/">Patterns for Parallel Programming</a>, buy this one (well, buy it anyway).</p><div
style="border: 1px solid #000; padding: 5px; margin-bottom: 15px; background: url(http://matt.eifelle.com/wp-content/uploads/2009/12/BN_Logo_3tier.jpg) right bottom no-repeat #ffffff;"> <a
rel="nofollow" href="http://r.popshops.com/pp/69253/the-art-of-concurrency-a-thread-monkey-s-guide-to-writing-parallel-applications"><img
style="width: 150px;" src="http://images.barnesandnoble.com/images/37180000/37189689.JPG" border="0" alt="The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications" /></a><br
/> <a
rel="nofollow" href="http://r.popshops.com/pp/69253/the-art-of-concurrency-a-thread-monkey-s-guide-to-writing-parallel-applications">The Art of Concurrency: A Thread Monkey&#8217;s Guide to Writing Parallel Applications</a><br
/> Price: $40.49</div><div
class="subcolumns"><div
style="border: 1px solid #000; padding: 5px; margin-bottom: 15px; background: url(http://matt.eifelle.com/wp-content/plugins/amazonsimpleadmin/img/amazon_US_small.gif) right bottom no-repeat #ffffff;"><div
style="width: 57px; float: left; margin-right: 5px;"> <a
href="http://www.amazon.com/exec/obidos/ASIN/0596521537/masbl03-20" target="_blank"><img
src="http://ecx.images-amazon.com/images/I/51QaJYFLmGL._SL75_.jpg" width="57" height="75" border="0" /></a></div><div><p><a
href="http://www.amazon.com/exec/obidos/ASIN/0596521537/masbl03-20" target="_blank">The Art of Concurrency: A Thread Monkey&#8217;s Guide to Writing Parallel Applications</a> (Paperback)<br
/> <span
style="font-size: 0.8em;">by <strong>Clay Breshears</strong></span><br
/> ISBN: 0596521537</p><p><strong>Price:</strong> <span
style="color: #990000; font-weight: bold;">USD 38.48</span><br
/> <strong>40 used &#038; new</strong> available from <span
style="color: #990000; font-weight: bold;">USD 17.99</span></p><p> <img
src="http://matt.eifelle.com/wp-content/plugins/amazonsimpleadmin/img/stars-3.5.gif" class="asa_rating_stars" /> | 3.5 | 6</div><div
style="clear: both;"></div></div></div>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2009/12/08/book-review-the-art-of-concurrency-a-thread-monkeys-guide-to-writing-parallel-applications/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Parallel Studio: Using Advisor Lite</title><link>http://matt.eifelle.com/2009/09/22/parallel-studio-using-advisor-lite/</link> <comments>http://matt.eifelle.com/2009/09/22/parallel-studio-using-advisor-lite/#comments</comments> <pubDate>Tue, 22 Sep 2009 08:02:28 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[C++]]></category> <category><![CDATA[Distributed Computing]]></category> <category><![CDATA[Interactive RayTracer]]></category> <category><![CDATA[Profiler]]></category> <category><![CDATA[Tools]]></category> <category><![CDATA[Advisor]]></category> <category><![CDATA[Intel]]></category> <category><![CDATA[Parallel Studio]]></category> <category><![CDATA[Raytracing]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=647</guid> <description><![CDATA[After reviewing Parallel Studio, I&#8217;ve decided to look at Advisor Lite. Intel offers it for free, before the actual Advisor is released with a future Parallel Studio version. It aims at guiding multithreaded development with Parallel Studio. I&#8217;ve started with the Starting Guide, and in fact, it is the best way to learn how to use [...]]]></description> <content:encoded><![CDATA[<p>After reviewing <a
href="http://matt.eifelle.com/2009/07/07/review-of-intel-parallel-studio/">Parallel Studio</a>, I&#8217;ve decided to look at Advisor Lite. Intel offers it for free, before the actual Advisor is released with a future Parallel Studio version. It aims at guiding multithreaded development with Parallel Studio.<br
/> <span
id="more-647"></span><br
/> I&#8217;ve started with the Starting Guide, and in fact, it is the best way to learn how to use this plugin. Advisor offers four steps: two of them are shortcuts to the online help, and the other two link to Parallel Studio actions (namely the hotspot analysis in Amplifier and the threaded memory check in Inspector).<br
/> The online help is interesting, but once you know how to parallelize an application and what to look for, the two Parallel Studio actions, together with some macros presented in the Starting Guide, are all you need.</p><h4>Test on parallelizing a custom library</h4><p>I&#8217;ve decided to test Advisor Lite on my <a
href="http://matt.eifelle.com/category/cpp/interactive-raytracer/">Interactive Raytracer</a>. This is a test to verify if Advisor Lite finds the adequate parallelization and the memory sharing issues. It is a simple raytracer, so it can be parallelized for each pixel in the image. The only memory sharing issue that I know of is in the kd-tree ray traversal.</p><h4>Profiling the library</h4><p>First, I will profile the library. For the complete Advisor Lite workflow, I have to use Intel Compiler, and as it is faster than Microsoft&#8217;s compiler, I will use the <strong>timeit_image.py</strong> script instead of the <strong>measure_image.py</strong> I&#8217;ve used when profiling with <a
href="http://matt.eifelle.com/2009/04/07/profiling-with-valgrind/">Valgrind</a> or <a
href="http://matt.eifelle.com/2009/08/18/profiling-with-visual-studio-performance-tool/">Visual Studio</a>.</p><p>Amplifier can show the results bottom-up or top-down. Unfortunately, only exclusive timings are displayed. In my case, in the bottom-up view, the method <strong>getEntryExitDistances()</strong> is the most costly one. In the top-down view, unfortunately, I can&#8217;t get a simple tree, as can be seen in the following view:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/irt-profile-advisor.png"><img
class="aligncenter size-medium wp-image-727" title="IRT: Amplifier profile (call-tree view)" src="http://matt.eifelle.com/wp-content/uploads/2009/08/irt-profile-advisor-300x187.png" alt="IRT: Amplifier profile (call-tree view)" width="300" height="187" /></a></p><p>In Visual Studio, I get more or less the same results, but with a correct top-down call-tree:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/irt-profile-msvc.png"><img
class="aligncenter size-medium wp-image-695" title="Profile returned by Visual Studio Performance Tool (call-tree)" src="http://matt.eifelle.com/wp-content/uploads/2009/08/irt-profile-msvc-300x187.png" alt="Profile returned by Visual Studio Performance Tool (call-tree)" width="300" height="187" /></a></p><p>The method <strong>getEntryExitDistances()</strong> cannot be parallelized: it is called recursively, several times per pixel, which would lead to a lot of memory contention. The simpler task is thus to parallelize the pixel rendering, a perfect data-parallel problem.</p><h4>Annotation of the code</h4><p>OK, now I can annotate my code. I had to dig inside the help for this, as it was not mentioned in the Starting Guide that Intel provides a header, <strong>annotate.h</strong>, which mimics the issues you may encounter in a multithreaded application.</p><p>So you need to read the online help at least once, so that you know the available annotation macros, how you can get them and how they will retrieve what you need. Once the code is annotated, it must be recompiled and then the sharing issues can be detected.</p><h4>Detection of sharing issues</h4><p>As expected, Inspector detected errors in the kd-tree traversal:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/irt-advisor-annotate-correctness.png"><img
class="aligncenter size-medium wp-image-696" title="Memory sharing issues detected by Inspector" src="http://matt.eifelle.com/wp-content/uploads/2009/08/irt-advisor-annotate-correctness-300x187.png" alt="Memory sharing issues detected by Inspector" width="300" height="187" /></a><br
/> The solution in this case is to have one ray-traversal stack per thread, which will have to be implemented with whichever parallel library is chosen, or simply to put the stack in the traversal algorithm itself and not in the instance.</p><h4>Using TBB</h4><p>I&#8217;ve decided to go for Threading Building Blocks, as it has already been used for game development. It seemed a good idea to me, as it is an Open Source solution. So now, I will split the screen into 2D pieces, and add thread-specific storage to the kd-tree class. Of course, I will have to add a flag to disable this parallelization if TBB is not available.</p><p>The actual parallelization will be in a future post in the Interactive Raytracer category. It is pretty straightforward once I have the different elements Parallel Studio gave me.</p><h4>Conclusion</h4><p>In fact, Advisor is mainly the <strong>annotate.h</strong> header, as you have to know your program to put the macros at the correct locations. The parallelization must be done by hand, as must the correction of the memory sharing issues.</p><p>The only problem I had is that <strong>annotate.h</strong> includes <strong>windows.h</strong>. This header is not C++ compliant and declares some macros such as <strong>max()</strong> (in fact I got the same issue with TBB headers!).
As I use the <strong>max()</strong> function declared in <strong>std::numeric_limits</strong>, I had to explicitly undefine this macro.</p><p>Apart from this, Advisor Lite is a good plugin, and I&#8217;m looking forward to seeing Advisor in a future Parallel Studio release.</p>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2009/09/22/parallel-studio-using-advisor-lite/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Profiling with Visual Studio Performance Tool</title><link>http://matt.eifelle.com/2009/08/18/profiling-with-visual-studio-performance-tool/</link> <comments>http://matt.eifelle.com/2009/08/18/profiling-with-visual-studio-performance-tool/#comments</comments> <pubDate>Tue, 18 Aug 2009 08:14:39 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[C++]]></category> <category><![CDATA[General]]></category> <category><![CDATA[Interactive RayTracer]]></category> <category><![CDATA[Profiler]]></category> <category><![CDATA[Tools]]></category> <category><![CDATA[Microsoft]]></category> <category><![CDATA[Profiling]]></category> <category><![CDATA[Visual Studio]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=689</guid> <description><![CDATA[After presenting Valgrind as an emulation profiler, I will present Microsoft&#8217;s solution, Visual Studio Performance Tool. It is available in the Team Suite editions, and offers a sampling- and an instrumentation-based profiler. Of course, it is embedded in the Visual Studio IDE and accessible from a solution. Using the profiler
First of all, the code must be compiled [...]]]></description> <content:encoded><![CDATA[<p>After presenting <a
href="http://matt.eifelle.com/2009/04/07/profiling-with-valgrind/">Valgrind</a> as an emulation profiler, I will present Microsoft&#8217;s solution, Visual Studio Performance Tool. It is available in the Team Suite editions, and offers a sampling- and an instrumentation-based profiler. Of course, it is embedded in the Visual Studio IDE and accessible from a solution.<br
/> <span
id="more-689"></span></p><h4>Using the profiler</h4><p>First of all, the code must be compiled with <strong>/Zi</strong> or <strong>/Z7</strong>. Then, it must be linked with <strong>/DEBUG</strong>. Without these options, it won&#8217;t be possible to measure anything.</p><p>Performance Tool must be used in a Visual Studio solution, which I don&#8217;t have in the case of the Interactive Raytracer. I will therefore create an empty, bogus project, just to be able to browse the code in the IDE.</p><p>Once in Performance Tool, I create a new target which will be my generated library:<br
/><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-ajouter-binaire-01.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-ajouter-binaire-01-300x187.png" alt="Adding a new target to a Performance session" title="Adding a new target to a Performance session" width="300" height="187" class="aligncenter size-medium wp-image-704" /></a></center></p><p>Then, I have to add it to the targets to profile:<br
/><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-proprietes-02.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-proprietes-02-300x187.png" alt="Selected targets to profile" title="Selected targets to profile" width="300" height="187" class="aligncenter size-medium wp-image-703" /></a></center></p><p>Then I modify the launch properties to launch python with a sample script that is supposed to be descriptive enough. If you profile a Visual Studio project, you can use the debugging project properties (there are more environment parameters that you may modify).<br
/><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-ajouter-binaire-02.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-ajouter-binaire-02-300x187.png" alt="Modify launch properties" title="Modify launch properties" width="300" height="187" class="aligncenter size-medium wp-image-702" /></a></center></p><h4>Sampling</h4><p>To use the sampling profiler, you have to select it in the session parameters. Once done, you can launch the profiler.</p><p>These are the results I got from the IRT:<br
/><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-sampling-01.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-sampling-01-300x187.png" alt="Sampling: Main result page" title="Sampling: Main result page" width="300" height="187" class="aligncenter size-medium wp-image-707" /></a></center><br
/> Only the most relevant functions are shown in the first panel. The first table lists the most expensive functions by inclusive cost: the cost of a function itself plus the cost of the functions it calls. The second table lists the most expensive functions by exclusive cost: only the cost of the function itself, without the cost of the functions it calls.</p><p>The next view helps find the function to optimize. It displays the current function in the middle of the panel, with its callers at the top and its callees at the bottom. Both inclusive and exclusive costs are displayed.<br
/><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-sampling-03.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-sampling-03-300x187.png" alt="Sampling: Function callers and callees" title="Sampling: Function callers and callees" width="300" height="187" class="aligncenter size-medium wp-image-708" /></a></center></p><p>A more general view, the call-tree, can help find the hotspot as well. Obviously, callers cannot be displayed in this view.<br
/><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-sampling-04.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-sampling-04-300x187.png" alt="Sampling: Call-tree" title="Sampling: Call-tree" width="300" height="187" class="aligncenter size-medium wp-image-709" /></a></center></p><p>It is easy to get access to the code with a left click, which is one of the reasons I&#8217;ve created a bogus Visual Studio project.</p><h4>Instrumentation</h4><p>Instrumentation-based profiling is somewhat different from sampling-based profiling. Here, the executable is modified to record cycles or other counters, whereas in sampling-based profiling, the program is interrupted at regular intervals to record where it is, the counter values, &#8230;</p><p>Instrumentation results look much like the sampling ones:<br
/><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-instrumentation-01.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-instrumentation-01-300x187.png" alt="Instrumentation: Main result page" title="Instrumentation: Main result page" width="300" height="187" class="aligncenter size-medium wp-image-710" /></a></center></p><p><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-instrumentation-03.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-instrumentation-03-300x187.png" alt="Instrumentation: Callers and callees" title="Instrumentation: Callers and callees" width="300" height="187" class="aligncenter size-medium wp-image-711" /></a></center></p><p><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-instrumentation-04.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/08/msvc-instrumentation-04-300x187.png" alt="Instrumentation: Call-tree" title="Instrumentation: Call-tree" width="300" height="187" class="aligncenter size-medium wp-image-712" /></a></center></p><h4>Conclusion</h4><p>From the different results I got, I know that I have some work to do on the math library. Well, this was some months ago, so this code has now been optimized as well. Based on the Performance Tool profile, a lot has been optimized, such as not creating rays inside the loop, &#8230;</p><p>I cannot really compare Performance Tool to Valgrind, as they are not the same kind of profiler. They each give more or less precise information on some parts of your program.<br
/> Compared to KCacheGrind, the callee map is missing, but Microsoft did a great job with the UI. The information may not be visualized as a 2D image, but it is still easy to extract it from the different views.</p><p>As a conclusion, I&#8217;d simply say that Performance Tool is an excellent tool that Microsoft provides in its high-end Visual Studio versions. Too bad it is not accessible in the lower versions.</p>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2009/08/18/profiling-with-visual-studio-performance-tool/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>